Archives Discussions

spectral · 05-19-2011

This may be a stupid question, but is it necessary to put the following line at the beginning of a kernel?

If I request the "exact" quantity of work, I don't need this, do I?

if (get_global_id(0) >= workAmount) return;
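To make the question concrete, here is a minimal CPU sketch in C++ (not OpenCL), assuming a 1-D NDRange. `runNDRange` calls a kernel-like function once per work-item, and `gid` stands in for `get_global_id(0)`. Both function names are hypothetical. When the global size is larger than `workAmount`, the guard is the only thing that keeps the extra work-items from indexing past the end of `data`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for an OpenCL kernel body: `gid` plays the role of
// get_global_id(0) and `workAmount` is the real number of elements.
void scaleKernel(std::size_t gid, std::size_t workAmount, std::vector<float>& data)
{
    if (gid >= workAmount) return;  // the guard from the question
    data[gid] *= 2.0f;
}

// Runs the "kernel" once per work-item of a 1-D NDRange of size globalSize,
// which may be larger than workAmount after rounding up.
void runNDRange(std::size_t globalSize, std::size_t workAmount, std::vector<float>& data)
{
    assert(data.size() == workAmount);
    for (std::size_t gid = 0; gid < globalSize; ++gid)
        scaleKernel(gid, workAmount, data);
}
```

For example, running it with a global size of 320 over 257 elements doubles exactly those 257 elements; without the guard, work-items 257 to 319 would write out of bounds.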

himanshu_gautam · 05-19-2011

Hi viewon01,

I think it's good to keep it there while developing the kernel, as one might do some weird things while writing it. But once development is done and performance is the concern, remove it.

spectral · 05-19-2011

OK, so I'm right!

Thanks, Himanshu.

Remark: do you have any news about my crash with SDK 2.4?

himanshu_gautam · 05-19-2011

Hi viewon01,

I just came across one scenario where that statement can be useful.

If your global NDRange is a prime number, you cannot divide it evenly into work-groups (except trivially, with a work-group size of 1). So you create a larger global NDRange and use that statement.

In some cases just doing some extra work can be better, but that statement is helpful in most cases.

Thanks

spectral · 05-19-2011

Thanks,

Could you give an example to explain, please?

Regards

himanshu_gautam · 05-19-2011

If the global NDRange is 257, you cannot divide it evenly into work-groups, so you can set the global NDRange to 320 (256 + 64) and the work-group size to 64. Now you need that statement to make sure that only the required number of work-items actually run.
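The arithmetic in this example can be sketched as follows, assuming OpenCL 1.x, where an explicitly specified global size must be a multiple of the local size. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a global work size up to the next multiple of the work-group size,
// so that an explicit local size divides it evenly (an OpenCL 1.x requirement).
std::size_t roundUpGlobalSize(std::size_t workAmount, std::size_t localSize)
{
    assert(localSize > 0);
    return ((workAmount + localSize - 1) / localSize) * localSize;
}

// Number of padding work-items that the guard in the kernel must turn away.
std::size_t idleWorkItems(std::size_t workAmount, std::size_t localSize)
{
    return roundUpGlobalSize(workAmount, localSize) - workAmount;
}
```

With `workAmount = 257` and `localSize = 64`, `roundUpGlobalSize` returns 320 and `idleWorkItems` returns 63, the number of work-items the guard has to turn away.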

rick_weber · 05-19-2011

Originally posted by: himanshu.gautam If the global NDRange is 257, you cannot divide it evenly into work-groups, so you can set the global NDRange to 320 (256 + 64) and the work-group size to 64. Now you need that statement to make sure that only the required number of work-items actually run.

I would argue that this isn't just one example, but the general case. Most applications operate on data sets of nearly arbitrary size, but you can't spawn an arbitrary number of threads and expect to get high performance. Since on the GPU you almost always want a global size that is a multiple of 64, you'll need these kinds of statements everywhere.

spectral · 05-20-2011

And if I don't specify the work-group size, isn't it the OpenCL runtime's job to check this?

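_queue->enqueueNDRangeKernel(
    *_kernel,
    cl::NullRange,     // global offset: none
    cl::NDRange(257),  // global size: total work
    cl::NullRange,     // local size: not specified, the runtime chooses
    NULL);             // no event wait list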

If I only specify the number of work-items, does the OpenCL runtime do the check?

I want to be sure I'm not duplicating a check the runtime already does!

himanshu_gautam · 05-20-2011

AFAIK, if you don't specify the work-group size, you don't need to check for out-of-range global work-items.

But you certainly lose a tunable parameter of your app, which can sometimes be important from a performance perspective.
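One way to see the performance cost: in OpenCL 1.x, when the local size is left unspecified, the runtime has to pick a work-group size that divides the global size evenly, and the specification leaves the exact choice to the implementation. The sketch uses a simple, hypothetical heuristic (the largest divisor not exceeding a maximum); it is not what any particular runtime does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical heuristic: the largest work-group size <= maxGroupSize that
// divides globalSize evenly. Real runtimes use their own, unspecified rules.
std::size_t pickLocalSize(std::size_t globalSize, std::size_t maxGroupSize)
{
    assert(globalSize > 0 && maxGroupSize > 0);
    for (std::size_t s = maxGroupSize; s > 1; --s)
        if (globalSize % s == 0) return s;
    return 1;  // prime (or otherwise awkward) sizes fall back to 1
}
```

For a prime global size such as 257, this heuristic can only fall back to work-groups of a single work-item, which is typically very inefficient on a GPU; for 320 it picks 160.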

rick_weber · 05-20-2011

OpenCL does not (and cannot) know how many threads to put in a work-group based on the size of your data. If you set it to NULL, it assigns an implementation-defined work-group size that has no bearing on program correctness. It is your responsibility to ensure that excess threads don't cause segfaults, write to things they aren't supposed to, etc. As such, you need these checks. Furthermore, these checks generally aren't a whole lot of work; if they're causing you significant overhead, you probably don't have enough work to amortize data transfers and OpenCL overhead in the first place.

If you've used CUDA, work-groups are analogous to their thread blocks and the NDRange to their grid, except that in CUDA you specify the block size and the number of blocks rather than the total number of work-items.
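A small sketch of the two launch conventions for the same 1-D problem, assuming a block/work-group size of 64. The struct and function names are hypothetical. In both APIs the global index (`blockIdx.x * blockDim.x + threadIdx.x` in CUDA, `get_global_id(0)` in OpenCL) can exceed the data size once the launch is padded, so the same bounds check is needed.

```cpp
#include <cassert>
#include <cstddef>

// CUDA-style launch: you choose the block size and the number of blocks.
struct CudaLaunch { std::size_t numBlocks; std::size_t threadsPerBlock; };

// OpenCL-style launch: you choose the total number of work-items (global size)
// and the work-group size (local size).
struct OpenCLLaunch { std::size_t globalSize; std::size_t localSize; };

// Enough blocks to cover n elements; the last block may be partly idle.
CudaLaunch toCuda(std::size_t n, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return { (n + threadsPerBlock - 1) / threadsPerBlock, threadsPerBlock };
}

// The same launch expressed as an OpenCL NDRange: same padding, same guard.
OpenCLLaunch toOpenCL(const CudaLaunch& c)
{
    return { c.numBlocks * c.threadsPerBlock, c.threadsPerBlock };
}
```

For 257 elements and blocks of 64, the CUDA-style launch is 5 blocks of 64, which corresponds to an OpenCL global size of 320 with a local size of 64.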

settle · 05-23-2011

Originally posted by: viewon01 This may be a stupid question, but is it necessary to put the following line at the beginning of a kernel?

If I request the "exact" quantity of work, I don't need this, do I?

if (get_global_id(0) >= workAmount) return;

If you start your kernel with something like this, doesn't that mean you are prohibited from using any kind of barrier or memory fence? Otherwise you risk hitting a barrier that some work-items never reach, so it never completes.

rick_weber · 05-23-2011

I don't think it matters for fences, but it can matter for barriers. For those, you'll need to do something like:

bool shouldExecute = false;
if (get_global_id(0) < size)
    shouldExecute = true;

if (shouldExecute)
{
    doWork();
}

// every work-item in the work-group reaches this barrier, in range or not
barrier(CLK_LOCAL_MEM_FENCE);

However, if you can guarantee that either every work-item in a work-group hits the barrier or none of them does, then you're still fine; the compiler doesn't need to be able to prove at compile time that the barrier is reached uniformly.
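The barrier-safe pattern can be illustrated with a sequential CPU emulation of a single work-group, assuming a power-of-two work-group size. The per-work-group sum reduction and the name `workGroupSum` are hypothetical choices for illustration, and the emulation does not model real concurrency: each loop over `lid` is one phase, and the end of a phase stands in for `barrier(CLK_LOCAL_MEM_FENCE)`. Out-of-range work-items skip the real work by contributing 0, but they never skip a barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential CPU emulation of one work-group. Each inner loop over `lid` is one
// phase; the end of each loop stands in for barrier(CLK_LOCAL_MEM_FENCE),
// which every work-item (in range or not) reaches.
float workGroupSum(const std::vector<float>& data, std::size_t groupId,
                   std::size_t localSize)
{
    assert(localSize > 0 && (localSize & (localSize - 1)) == 0);  // power of two
    std::vector<float> scratch(localSize);  // plays the role of __local memory

    // Phase 1: guarded load. Out-of-range work-items skip the real work
    // (they store 0) but still proceed to the barrier.
    for (std::size_t lid = 0; lid < localSize; ++lid) {
        std::size_t gid = groupId * localSize + lid;
        bool shouldExecute = gid < data.size();
        scratch[lid] = shouldExecute ? data[gid] : 0.0f;
    }
    // barrier(CLK_LOCAL_MEM_FENCE)

    // Tree reduction: the condition guards the work, never the barrier.
    for (std::size_t stride = localSize / 2; stride > 0; stride /= 2) {
        for (std::size_t lid = 0; lid < localSize; ++lid)
            if (lid < stride) scratch[lid] += scratch[lid + stride];
        // barrier(CLK_LOCAL_MEM_FENCE)
    }
    return scratch[0];
}
```

Summing 257 ones over 5 work-groups of 64 gives 257; the last group contains one in-range work-item and 63 that only participate in the barriers.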


Is it necessary: if (get_global_id(0) >= workAmount) return;