Archives Discussions

Barsik107 · ‎10-21-2010

I want to process an array of 40000 items, but I'm treating it as two-dimensional (200x200). So I set globalThreads[0] = 200 and localThreads[0] = 200. Then, using simple logic, I calculate the index in the kernel like this:

uint tid = get_global_id(0);
uint lid = get_local_id(0);

const uint range=200;

index= tid*range+lid;

But it seems that this is the wrong idea. What's wrong with my logic?

dravisher · ‎10-21-2010

I don't know exactly what you want to do here, but if you just want each work-item to access an element, without using local memory in any particular way, then what you want is a 2D NDRange globally. What you seem to have done is create a 1D range with 200 work-items in total, with one work-group of size 200. What you should rather do is create a 200x200 2D NDRange globally. The local size is determined based on performance. Since you're not using a power-of-two size, it's probably going to be sub-optimal, but you could use a local size of say 10x10.

So you'd have something like:

uint gidx = get_global_id(0);

uint gidy = get_global_id(1);

uint range = get_global_size(0);

index = gidy * range + gidx;
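
To make the difference concrete, here is a minimal host-side sketch in plain C (not OpenCL kernel code) that models both indexing schemes and counts how many distinct array elements each one reaches. The helper names (mark, covered_1d, covered_2d) are made up for illustration, and the model assumes a zero global offset, so get_local_id(0) is modelled as tid % local_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RANGE 200u            /* side length used in the thread */
#define N (RANGE * RANGE)     /* 40000 array elements */

static unsigned char hit[N];  /* hit[i] != 0 once index i has been produced */

/* Mark one index; return 1 if it was in bounds and not seen before. */
static size_t mark(size_t index)
{
    if (index >= N || hit[index]) return 0;
    hit[index] = 1;
    return 1;
}

/* Original attempt: 1D range, index = tid * RANGE + lid. */
static size_t covered_1d(size_t global_size, size_t local_size)
{
    size_t covered = 0;
    assert(local_size > 0 && global_size % local_size == 0);
    memset(hit, 0, sizeof hit);
    for (size_t tid = 0; tid < global_size; ++tid) {
        size_t lid = tid % local_size;  /* models get_local_id(0), zero offset */
        covered += mark(tid * RANGE + lid);
    }
    return covered;
}

/* Suggested fix: 2D range, index = gidy * width + gidx. */
static size_t covered_2d(size_t width, size_t height)
{
    size_t covered = 0;
    assert(width * height <= N);
    memset(hit, 0, sizeof hit);
    for (size_t gidy = 0; gidy < height; ++gidy)
        for (size_t gidx = 0; gidx < width; ++gidx)
            covered += mark(gidy * width + gidx);
    return covered;
}
```

Under these assumptions, covered_1d(200, 200) returns 200, because tid and lid are equal in a single work-group and only the diagonal indices 0, 201, 402, ... are produced, while covered_2d(200, 200) returns 40000, one index per element.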

douglas125 · ‎10-21-2010

I think what you mean is to use a work dimension of two:

Use EnqueueNDRangeKernel using GlobalWorkSize = {200, 200}

Then inside the kernel use

int tid = get_global_id(0);

int lid = get_global_id(1);

ind = tid*200+lid;

Roughly speaking, a local work-group is a subdivision of the global work size (the NDRange). Refer to the OpenCL architecture chapter in the Khronos spec.
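
As a rough illustration of that subdivision, the sketch below models, in plain C on the host, how a global ID in one dimension splits into a work-group ID and a local ID. The type and function names (work_item_pos, split_global_id, join_global_id) are hypothetical, and a zero global offset is assumed. The local size of 10 in the usage note is just the 10x10 example mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Position of a work-item along one dimension of the NDRange. */
typedef struct {
    size_t group_id;  /* models get_group_id(d) */
    size_t local_id;  /* models get_local_id(d) */
} work_item_pos;

/* Split a global ID into work-group ID and local ID (zero global offset). */
static work_item_pos split_global_id(size_t global_id, size_t local_size)
{
    work_item_pos p;
    assert(local_size > 0);
    p.group_id = global_id / local_size;
    p.local_id = global_id % local_size;
    return p;
}

/* Inverse: global ID = group ID * local size + local ID. */
static size_t join_global_id(work_item_pos p, size_t local_size)
{
    return p.group_id * local_size + p.local_id;
}
```

With a local size of 10, split_global_id(137, 10) gives group 13 and local ID 7, and join_global_id turns that back into 137. This is why get_local_id alone cannot serve as the second coordinate of a 2D index: it only counts positions inside one work-group.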

Indexes in Kernel's