5 Replies Latest reply on Jan 19, 2011 9:32 PM by Jawed

    global_threads - local_threads - divisibility



      I am currently working on an OpenCL implementation of an algorithm that operates on images. Since it works pixel by pixel, I want to use one work-item per pixel, so I set the (2-dimensional) number of global threads to (image_width, image_height). As far as I have found, the number of global threads must be evenly divisible by the number of local threads in each dimension. That is of course not possible for every image size, and I am not sure how to handle the case where it does not divide evenly (afaik the SDK samples do not cover it). At the moment I round the number of global threads up to the next multiple of the local size. But on the CPU this would cause a SEGFAULT, since the extra work-items access positions in the buffer that lie beyond the bounds of the image array.
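      To make the rounding concrete, here is a minimal host-side sketch in C. The helper names `round_up` and `out_of_bounds_items` are hypothetical and chosen only for illustration; they are not part of the OpenCL API. The sketch shows how the padded global size is derived and how many work-items end up outside the image.

```c
#include <stddef.h>

/* Hypothetical helper: round `global` up to the next multiple of `local`.
   Assumes local > 0. */
static size_t round_up(size_t global, size_t local)
{
    size_t rem = global % local;
    return rem == 0 ? global : global + (local - rem);
}

/* Hypothetical helper: number of work-items in the padded 2D range that
   fall outside a width x height image. These are the work-items that
   would index past the end of the image buffer. */
static size_t out_of_bounds_items(size_t width, size_t height,
                                  size_t local_w, size_t local_h)
{
    size_t padded_w = round_up(width, local_w);
    size_t padded_h = round_up(height, local_h);
    return padded_w * padded_h - width * height;
}
```

      For example, a 1000 x 750 image with a 16 x 16 local size gives a padded global size of 1008 x 752, so 8016 work-items have no pixel of their own.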

      What is the common solution for this? Do I have to adjust the buffer size in the same way, transferring unnecessary "fill data" to the GPU?


      Thanks in advance,