Yes, it seems correct.
Originally posted by: iLoop
Hi,
I have a CUDA program running like this:
clEnqueueNDRangeKernel(cmd_queue, kernel, grid, NULL, block,threads, 0, NULL, NULL);
That call is wrong.
It should look like this:
clEnqueueNDRangeKernel(cmd_queue, kernel[0], dim(grid), NULL, grid * block, block, 0, NULL, NULL);
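Remember that the grid and the block must have the same number of dimensions. The global work size is grid * block in each dimension because OpenCL counts total work-items there, not work-groups as a CUDA grid does.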
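To make the grid * block rule concrete, here is a minimal sketch in plain C. The helper name cuda_to_ndrange and its parameter names are hypothetical (they are not part of OpenCL or CUDA); it only computes the two size arrays and assumes grid and block already have the same number of dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API: converts a CUDA-style launch
   configuration into the sizes that clEnqueueNDRangeKernel expects. */
static void cuda_to_ndrange(unsigned work_dim,
                            const size_t grid[],   /* blocks per dimension (CUDA grid) */
                            const size_t block[],  /* threads per block (CUDA block) */
                            size_t global[],       /* out: total work-items per dimension */
                            size_t local[])        /* out: work-group size per dimension */
{
    /* OpenCL NDRanges are commonly 1, 2, or 3 dimensional. */
    assert(work_dim >= 1 && work_dim <= 3);

    for (unsigned d = 0; d < work_dim; ++d) {
        local[d]  = block[d];            /* local size is the CUDA block size */
        global[d] = grid[d] * block[d];  /* global size counts every work-item */
    }
}
```

For example, a grid of {4, 2} and a block of {16, 8} give a global size of {64, 16} and a local size of {16, 8}. The two output arrays would then be passed as the global_work_size and local_work_size arguments, along the lines of clEnqueueNDRangeKernel(cmd_queue, kernel, work_dim, NULL, global, local, 0, NULL, NULL).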
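The third parameter of the CUDA execution configuration is the size of shared memory in bytes, not a thread count. OpenCL has no matching launch parameter: declare a __local pointer argument on the kernel, then call clSetKernelArg for that argument with the size in bytes as arg_size and NULL as arg_value.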