I'm using PyOpenCL 2012.1 with a Radeon HD 6450 and the drivers included in Ubuntu 12.04 (fglrx_8.960).
With the attached Python script (it is only a test), the GPU gives me poor performance (roughly a 40x slowdown) just from changing the vector size by two elements:
Time for ASIZE: 29120 [GPU]: 0.296946 s
Time for ASIZE: 29120 [CPU]: 0.354775 s
Time for ASIZE: 29122 [GPU]: 11.4285 s
Time for ASIZE: 29122 [CPU]: 0.429958 s
Is this a problem with my graphics card?
property using the clGetKernelWorkGroupInfo() API
That will help. You need to query your kernel object to get that.
It is defined by the hardware. Most AMD cards need 64; low-end AMD cards have 32. NVIDIA uses 32, and from the Intel OpenCL programming guide for their accelerator card it seems to use a width of 16 or 32.
I think nou answered it right.
Although I am not familiar with PyOpenCL, I believe the following line launches the kernel:
exec_evt = prg.test(queue, a.shape, None, a_buf, b_buf, dest_buf)
a.shape == global size == 29120 or 29122
None == local size ==> Find out a suitable local size (Is this correct?)
As nou put it, 29122 is not divisible by 64, 128, 192 or 256.
Also, since 29122 = 2 x 14561 and 14561 is a prime number, 2 is the largest local size that divides the global size evenly.
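To make the divisibility argument concrete, here is a small sketch in plain Python (no OpenCL needed). It assumes a maximum work-group size of 256, which is the largest value mentioned in this thread; the real limit should be queried from the device. The helper name divisors_up_to is made up for this illustration.

```python
def divisors_up_to(n, limit=256):
    """Return every divisor of n that does not exceed limit.

    limit=256 is an assumed maximum work-group size, not a queried value.
    """
    return [d for d in range(1, min(n, limit) + 1) if n % d == 0]


for global_size in (29120, 29122):
    candidates = divisors_up_to(global_size)
    # Local sizes that are whole multiples of a 64-wide wavefront.
    wavefront_friendly = [d for d in candidates if d % 64 == 0]
    print(global_size, "largest divisor:", max(candidates),
          "multiples of 64:", wavefront_friendly)
```

For 29122 the candidate list contains only 1 and 2, so no local size can fill a 64-wide wavefront, whereas 29120 is divisible by 64.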
+ Enabling profiling will slow down your operations. Try using external timers to measure the time; you might get better numbers.
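A minimal sketch of an external (host-side) timer, using only the Python standard library. The name time_wall and the launch callable are hypothetical; in the real script, launch would enqueue the kernel and then block until it finishes (for example by calling exec_evt.wait()), otherwise only the enqueue cost is measured.

```python
import time


def time_wall(launch, repeats=3):
    """Return the best wall-clock time in seconds over several runs.

    launch is any zero-argument callable; it must block until the
    work is done for the measurement to be meaningful.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        launch()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload so the sketch runs without a GPU.
elapsed = time_wall(lambda: sum(range(100000)))
print("best of 3: %.6f s" % elapsed)
```

Taking the best of a few runs also keeps one-time costs such as kernel compilation out of the number.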
On HD5xxx/HD6xxx the global size has to be divisible by 64, 128, 192 or 256 for optimal performance.
HD7xxx series (GCN architecture) supports partial launches. You should have the same performance for 29120 or 29122.
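For HD5xxx/HD6xxx cards, one common workaround (not something the original script does) is to round the global size up to the next multiple of 64 and have the kernel skip the padding work-items. The sketch below shows only the host-side arithmetic; round_up is a hypothetical helper, and the extra kernel argument n in the comments is an assumption about how the kernel would be changed.

```python
def round_up(global_size, multiple=64):
    """Round global_size up to the next whole multiple of `multiple`."""
    return ((global_size + multiple - 1) // multiple) * multiple


n = 29122                 # real number of elements
padded = round_up(n, 64)  # global size to launch with
print(n, "->", padded, "(", padded - n, "padding work-items )")

# Assumed changes elsewhere (not shown in the thread):
#   kernel:  if (get_global_id(0) >= n) return;   // skip padding items
#   launch:  global size (padded,), local size (64,), with n passed
#            as an additional kernel argument
```

With 29122 elements this launches 29184 work-items, of which 62 do nothing.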