

steveh_ol

global_threads - local_threads - divisibility

Hi,

I am currently working on an OpenCL implementation of an algorithm that works on pictures. As it works pixel by pixel, I want to use one work-item per pixel and therefore set the number of global threads (2-dimensional) to (image_width, image_height). As far as I found out, the number of global threads must be divisible without a remainder by the number of local threads. This is of course not possible for every image size. I am not sure how to handle the case where it is not divisible without a remainder (as far as I know, the SDK samples do not cover that case). At the moment I round the global threads up to the next-highest number that is divisible. But on the CPU this causes a segfault, since I am accessing positions in the buffer that are beyond the image array bounds.
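To make the segfault concrete, here is a small plain-C model (no OpenCL calls) of rounding the global size up without a bounds check. It assumes the image lives in one row-major buffer indexed as `y * width + x`; the struct and function names are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what the padding work-items of an unguarded
 * kernel would touch, assuming a row-major buffer (index = y*width + x). */
typedef struct {
    size_t past_end;     /* index >= width*height: outside the allocation     */
    size_t wrong_pixel;  /* outside the image, but index still in the buffer  */
} overrun_stats;

static overrun_stats classify_padding(size_t width, size_t height,
                                      size_t global_w, size_t global_h)
{
    assert(global_w >= width && global_h >= height);
    overrun_stats s = {0, 0};
    const size_t n = width * height;          /* buffer length in pixels */
    for (size_t y = 0; y < global_h; ++y) {
        for (size_t x = 0; x < global_w; ++x) {
            if (x < width && y < height)
                continue;                     /* real pixel: no problem */
            size_t idx = y * width + x;       /* index an unguarded kernel uses */
            if (idx >= n)
                s.past_end++;                 /* would crash on the CPU */
            else
                s.wrong_pixel++;              /* wraps into the next row */
        }
    }
    return s;
}
```

For a 10x10 image padded to a 16x16 global size, `classify_padding(10, 10, 16, 16)` reports 102 work-items indexing past the end of the 100-pixel buffer and 54 that wrap into the next row, i.e. silent corruption of a real pixel rather than a crash.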

What is the common solution for this? Do I have to adjust the buffer size in the same way, transferring unnecessary "fill data" to the GPU?

Thanks in advance,

steve

nou

Or just place a check like this so you don't get an access violation:

size_t x = get_global_id(0);
size_t y = get_global_id(1);
if (x < width && y < height)   // skip padding work-items outside the image
{
    //kernel code
}

And on the GPU you don't get an error when you access memory outside the allocated buffer, but it can corrupt data somewhere else, so it is even worse than on the CPU.
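A plain-C emulation of the guarded kernel can show that the check is enough: every real pixel is processed exactly once and nothing past the buffer is touched. This is a sketch, not OpenCL code; the sequential loop only stands in for the NDRange launch, and `kernel_body` / `run_padded` are hypothetical names with a placeholder per-pixel operation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one work-item; the guard mirrors
 * "if (x < width && y < height)". Returns 1 if the item did any work. */
static int kernel_body(unsigned char *img, size_t width, size_t height,
                       size_t x, size_t y)
{
    if (x < width && y < height) {
        img[y * width + x] += 1;   /* placeholder per-pixel work */
        return 1;
    }
    return 0;                      /* padding work-item: does nothing */
}

/* Runs every work-item of a padded global range sequentially. */
static size_t run_padded(unsigned char *img, size_t width, size_t height,
                         size_t global_w, size_t global_h)
{
    assert(global_w >= width && global_h >= height);
    size_t active = 0;
    for (size_t y = 0; y < global_h; ++y)
        for (size_t x = 0; x < global_w; ++x)
            active += (size_t)kernel_body(img, width, height, x, y);
    return active;
}
```

With a 10x10 image and a 16x16 global range, exactly 100 work-items do work and any memory after the 100-pixel image stays untouched.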


I am not aware of a case where the global size would not be divisible by the local size. Are you trying to combine the work of many pixels inside one kernel?

Can you please tell us your global and local sizes?


As nou mentioned, you don't need to pad your image; you just need to pad the dimensions of your threads. For example, say you want a work-group size of 8x8 but your image is 10x10. Simply round the global size passed to clEnqueueNDRangeKernel() up to 16x16, and then clamp within the kernel itself.
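The host-side padding amounts to rounding each dimension up to the next multiple of the work-group size. A minimal sketch, assuming the resulting arrays are what you pass as `global_work_size` and `local_work_size` to clEnqueueNDRangeKernel(), while the real width and height still go to the kernel as arguments for the bounds check; `round_up` and `padded_global_size` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of `local` that is >= `n`. */
static size_t round_up(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Padded 2-D global size for an image of width x height pixels.
 * Only the NDRange is padded; the image buffer keeps its real size. */
static void padded_global_size(size_t width, size_t height,
                               const size_t local[2], size_t global[2])
{
    global[0] = round_up(width, local[0]);
    global[1] = round_up(height, local[1]);
}
```

For Jeff's example, a 10x10 image with 8x8 groups gives a 16x16 global size; a 1920x1080 image with 16x16 groups gives 1920x1088, so only the last row of groups is partly idle.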

You will lose some efficiency because some wavefronts will only be partially utilized, but for large images the loss should be marginal.
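The "marginal for large images" claim can be checked with simple arithmetic: the padding is always less than one work-group per dimension, so the idle share shrinks as the image grows. This sketch counts idle work-items only (a hypothetical `idle_fraction` helper); the real cost depends on how the hardware packs work-items into wavefronts.

```c
#include <assert.h>
#include <stddef.h>

/* Share of work-items in a padded 2-D range that fall outside the image
 * and therefore do no useful work. */
static double idle_fraction(size_t width, size_t height,
                            size_t global_w, size_t global_h)
{
    assert(global_w >= width && global_h >= height && global_w * global_h > 0);
    double total = (double)global_w * (double)global_h;
    double used  = (double)width * (double)height;
    return (total - used) / total;
}
```

A 10x10 image in a 16x16 range leaves 156 of 256 work-items idle (about 61%), whereas a 1000x1000 image padded to 1008x1008 leaves only about 1.6% idle.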

Jeff


Thank you a lot for the answers, and sorry for answering so late; I had completely forgotten that I hadn't replied yet.

I did just what Jeff Golds and nou suggested.

Works fine.

Thanks.

Greetings,

Steve

Also, one of the parameters for clEnqueueWriteImage etc. is the copy region, i.e. the size in pixels that should be copied. This enables you to safely move data between the host and device buffer without worrying about overruns.
