atoz
Journeyman III

enqueueNDRangeKernel / Invalid work group size error

hello,

i am trying to use opencl (api 1.2) on somewhat older hardware and software: a socket am3 cpu, a 5770 and a 6550 gpu, debian/squeeze with a 3.2 backports kernel, and amd driver version 12.4. i use this outdated driver because of the old xorg version in debian/squeeze.

i use the enqueueNDRangeKernel method of the queue object to start a simple matrix multiplication kernel with different numbers of threads, to get a feeling for how many threads this device needs, and i am running into the following problem:

if i use the following call it works like a charm, but it only utilizes one compute unit on the gpu / one core on the cpu:

queue->enqueueNDRangeKernel(*kernel, cl::NullRange, numThreads, cl::NullRange, NULL, &event);

numThreads ... total number of threads.

if i try to specify the global and local dimensions:

queue->enqueueNDRangeKernel(*kernel, cl::NullRange, numThreads, threadsPerWorkgroup, NULL, &event);

numThreads ... still total number of threads

threadsPerWorkgroup ... currently fixed at 16

i get an error "Invalid work group size".

i thought the opencl library would select a proper layout for the device automatically.

can anyone give me some hints to resolve this problem?

cheers

v.

himanshu_gautam
Grandmaster

The global work size must be divisible by the local work-group size in each and every dimension.

Are you making sure of this?
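One common way to satisfy the divisibility rule is to pad the global size up to the next multiple of the local size. The sketch below is only an illustration: `roundUpToMultiple`, `totalItems` and the kernel argument `n` are hypothetical names, not part of the OpenCL API or of the original code.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not an OpenCL function): returns the smallest
// multiple of `local` that is greater than or equal to `global`.
std::size_t roundUpToMultiple(std::size_t global, std::size_t local) {
    if (local == 0) {
        throw std::invalid_argument("local size must be non-zero");
    }
    std::size_t remainder = global % local;
    return remainder == 0 ? global : global + (local - remainder);
}

// Possible use with the C++ bindings from the question. It is shown as a
// comment so the helper compiles without the OpenCL headers; `totalItems`
// is an assumed variable holding the real number of work-items needed.
//
//   std::size_t padded = roundUpToMultiple(totalItems, 16);
//   queue->enqueueNDRangeKernel(*kernel, cl::NullRange,
//                               cl::NDRange(padded), cl::NDRange(16),
//                               NULL, &event);
//
// The kernel then needs a bounds check, for example
//   if (get_global_id(0) >= n) return;
// so that the padding work-items do not touch memory out of range.
```

For example, a global size of 1000 with a local size of 16 would be padded to 1008, and the last 8 work-items would exit early through the bounds check.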

Also, 16 is not a good work-group size on a GPU. Use multiples of 64 to use the GPU hardware effectively; otherwise you waste hardware cycles on non-existent work-items.
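The arithmetic behind that advice can be sketched as follows, assuming the hardware runs work-items in fixed-width batches and that a work-group always occupies whole batches. The width of 64 is taken from the reply above; `lanesOccupied` and `laneUtilization` are hypothetical helper names used only for illustration.

```cpp
#include <cstddef>

// Assumption: work-items execute in batches of `batchWidth` lanes, and a
// work-group always occupies a whole number of batches.
// Both arguments must be non-zero.
std::size_t lanesOccupied(std::size_t localSize, std::size_t batchWidth) {
    std::size_t batches = (localSize + batchWidth - 1) / batchWidth;  // ceiling division
    return batches * batchWidth;
}

// Fraction of the occupied lanes that run a real work-item
// (1.0 means no lane is wasted).
double laneUtilization(std::size_t localSize, std::size_t batchWidth) {
    return static_cast<double>(localSize) /
           static_cast<double>(lanesOccupied(localSize, batchWidth));
}
```

Under these assumptions a work-group of 16 fills only a quarter of a 64-wide batch, while a work-group of 64 fills it completely.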


hi!

yes, thank you! you are right ... the problem was that the global work size has to be divisible by the local work size in each dimension!

cheers

v.
