1 Reply Latest reply on Jul 3, 2014 2:35 PM by pbanwait

    EnqueueArgs produces no results.


      I have been optimizing a few OpenCL kernels that I am developing. Until now the kernels have run quite well, and I keep a CPU implementation alongside them so that I can compare results. I queried my device, an AMD R9 290, for its maximum work-group sizes and divided up the work accordingly. I have also made my global work size a power of two. I am using the C++ OpenCL API and build my kernel functors with cl::make_kernel. The only optimizations I made were to take advantage of the wavefront size and to divide the work so that as many work-items as possible are active within a work-group. When I set up my NDRange with the following API call:


      cl::EnqueueArgs range_sliver_args = cl::EnqueueArgs( queue, cl::NullRange, cl::NDRange( dimensionX, dimensionY ), cl::NDRange( dimensionX / 16, dimensionX / 16 ) );


      the code fails to produce any results. Note that dimensionX and dimensionY are both powers of two. The first argument is the command queue, the second is the NDRange offset, the third is the global NDRange, and the fourth is the local NDRange. If I set the local NDRange to cl::NullRange, the kernel executes perfectly and produces the correct results. However, I would like to be able to adjust the local NDRange to see how the kernel performs. My kernel implementation does not depend on local IDs at all, so adjusting the local range is purely a way to test performance improvements. Yet setting a local NDRange produces no results. If anyone has any idea what the problem might be, I would appreciate any input.
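      For reference, the documented conditions that OpenCL 1.x places on an explicit local range can be checked with plain arithmetic before enqueueing. The following is only a minimal sketch: local_range_ok is a hypothetical helper, not part of the OpenCL API, and max_wg_size stands in for whatever the CL_KERNEL_WORK_GROUP_SIZE query returns for the kernel. The per-dimension device limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) are left out for brevity.

```cpp
#include <cstddef>

// Hypothetical helper (not an OpenCL call): checks a 2D local range against
// the constraints documented for clEnqueueNDRangeKernel in OpenCL 1.x.
// max_wg_size is assumed to hold the result of the CL_KERNEL_WORK_GROUP_SIZE
// query for the kernel in question.
bool local_range_ok(std::size_t global_x, std::size_t global_y,
                    std::size_t local_x, std::size_t local_y,
                    std::size_t max_wg_size)
{
    // A local dimension of zero is never valid.
    if (local_x == 0 || local_y == 0) {
        return false;
    }
    // Each global dimension must be a whole multiple of the local dimension.
    if (global_x % local_x != 0 || global_y % local_y != 0) {
        return false;
    }
    // The work-group size is the product of the local dimensions, and that
    // product (not each side on its own) is what the limit applies to.
    return local_x * local_y <= max_wg_size;
}
```

      As an illustration with assumed numbers, a 1024 x 1024 global range divided by 16 in each dimension gives a 64 x 64 local range, which is 4096 work-items per group; local_range_ok(1024, 1024, 64, 64, 256) is therefore false, while a 16 x 16 local range passes.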

        • Re: EnqueueArgs produces no results.

          I seem to have figured it out. Depending on the value by which I divide my local range, the program sometimes runs perfectly and sometimes does not. I had divided my local range so that it was exactly what CL_KERNEL_WORK_GROUP_SIZE specified, but it does not work at that size. When I divided the local work-group size by larger values, it worked perfectly. I am not sure why the work-group query gave me an invalid size, however. It seems that perhaps the product of my local range must be less than what CL_KERNEL_WORK_GROUP_SIZE specified. I could be wrong about this.
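          The idea of keeping the product of the local range within the queried limit can be sketched as a small helper that derives the local range from the limit, rather than from a fixed divisor of the global size. This is an illustration under stated assumptions, not the only way to choose a work-group shape: pick_local_range is a hypothetical name, the global sizes are assumed to be nonzero powers of two as in the question, and max_wg_size is assumed to be at least 1 and to come from the CL_KERNEL_WORK_GROUP_SIZE query.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not an OpenCL call): grows a 2D local range by
// doubling each side in turn, stopping when a further doubling would either
// no longer divide the global size or push the product past max_wg_size.
// Assumes nonzero global sizes and max_wg_size >= 1.
std::pair<std::size_t, std::size_t>
pick_local_range(std::size_t global_x, std::size_t global_y,
                 std::size_t max_wg_size)
{
    std::size_t local_x = 1;
    std::size_t local_y = 1;
    bool grew = true;
    while (grew) {
        grew = false;
        // Try to double the x side.
        if (global_x % (local_x * 2) == 0 &&
            (local_x * 2) * local_y <= max_wg_size) {
            local_x *= 2;
            grew = true;
        }
        // Try to double the y side.
        if (global_y % (local_y * 2) == 0 &&
            local_x * (local_y * 2) <= max_wg_size) {
            local_y *= 2;
            grew = true;
        }
    }
    return {local_x, local_y};
}
```

          With an assumed limit of 256 and a 1024 x 1024 global range, this yields a 16 x 16 local range, which could then be passed as cl::NDRange( local.first, local.second ) in place of the fixed division by 16.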