OpenCL

abeamud
Journeyman III

Strange behaviour with different vector sizes

Hi all:

I'm using pyopencl 2012.1 with a Radeon HD 6450 and the drivers included in Ubuntu 12.04 (fglrx_8.960).

With the attached Python script (it is only a test), the GPU gives me poor performance (about a 40x slowdown) just by changing the vector size by two elements:

Time for ASIZE: 29120 [GPU]: 0.296946 s

Time for ASIZE: 29120 [CPU]: 0.354775 s

Time for ASIZE: 29122 [GPU]: 11.4285 s

Time for ASIZE: 29122 [CPU]: 0.429958 s

Is this a problem with my graphics card?

Thanks


Accepted Solutions
nou
Exemplar

Re: Strange behaviour with different vector sizes

The problem is your global size. 29120 is divisible by 64, so you get optimal performance. But 29122 factorizes as 2*14561, so it can run only with a local size of 2.
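The arithmetic behind this can be checked with a short, pure-Python sketch. It assumes the runtime must pick a local size that divides the global size evenly, and that the device allows at most 256 work-items per work-group (the `max_local=256` default is an assumption, not a value read from the HD 6450). `largest_local_size` is a hypothetical helper written for illustration, not part of pyopencl.

```python
def largest_local_size(global_size, max_local=256):
    """Return the largest divisor of global_size that is <= max_local.

    max_local=256 is an assumed device limit, used only for illustration.
    """
    for candidate in range(min(max_local, global_size), 0, -1):
        if global_size % candidate == 0:
            return candidate


for n in (29120, 29122):
    print(n,
          "divisible by 64:", n % 64 == 0,
          "largest divisor <= 256:", largest_local_size(n))
```

For 29122 the only divisors within the limit are 1 and 2, which matches the slowdown described above.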

himanshu_gautam
Grandmaster

Re: Strange behaviour with different vector sizes

I think nou answered it right.

Although I am not familiar with pyOpenCL, I believe the following line launches the kernel:

exec_evt = prg.test(queue, a.shape, None, a_buf, b_buf, dest_buf)

a.shape == global size == 29120 or 29122

None == local size ==> the runtime picks a suitable local size (is this correct?)

As nou put it, 29122 is not divisible by 64, 128, 192 or 256.

Also, since 14561 is a prime number, 2 is the largest local size available.

+ Enabling profiling will slow down your operations. Try using external timers to measure time. You might get better numbers.
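One way to time with an external (host-side) timer is sketched below. `wall_time` is a hypothetical helper, and it assumes the callable passed in both launches the kernel and waits for it to finish; otherwise only the enqueue cost is measured.

```python
import time


def wall_time(launch_and_wait, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs.

    `launch_and_wait` must block until the work is done.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        launch_and_wait()
        best = min(best, time.perf_counter() - start)
    return best
```

With the launch line quoted above, a call could look like `wall_time(lambda: prg.test(queue, a.shape, None, a_buf, b_buf, dest_buf).wait())`, where `.wait()` blocks on the event returned by the kernel call.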

german
Staff

Re: Strange behaviour with different vector sizes

On HD5xxx/HD6xxx the global size has to be divisible by 64, 128, 192 or 256 for optimal performance.

HD7xxx series (GCN architecture) supports partial launches. You should have the same performance for 29120 or 29122.
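A common workaround on hardware without partial launches is to round the global size up to a multiple of the local size and guard the kernel body with a bounds check. The sketch below is only an illustration: the original script is not shown in this thread, so the kernel body (`a + b`), the extra `n` argument, and the helpers `round_up` and `run_padded` are all assumptions. The local size of 64 is taken from the discussion above.

```python
# Placeholder kernel: the real `test` kernel from the attached script is not
# shown in the thread, so an element-wise add stands in for it here.
KERNEL_SRC = """
__kernel void test(__global const float *a,
                   __global const float *b,
                   __global float *dest,
                   const unsigned int n)
{
    size_t gid = get_global_id(0);
    if (gid < n)              /* padded work-items do nothing */
        dest[gid] = a[gid] + b[gid];
}
"""


def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def run_padded(a, b, local_size=64):
    """Run the placeholder kernel on float32 arrays a and b with a padded global size."""
    import numpy as np          # imported lazily; only needed to actually run
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    n = a.shape[0]
    global_size = (round_up(n, local_size),)   # e.g. 29122 -> 29184
    prg.test(queue, global_size, (local_size,),
             a_buf, b_buf, dest_buf, np.uint32(n))

    dest = np.empty_like(a)
    cl.enqueue_copy(queue, dest, dest_buf)     # blocking copy back to the host
    return dest
```

The buffers keep their original length; only the launch grid is padded, and the `gid < n` check keeps the extra work-items from touching memory past the end.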

abeamud
Journeyman III

Re: Strange behaviour with different vector sizes

Is this value (64) defined by the hardware or by the OpenCL framework? How can I get this value?

Thank you for your response.

himanshu_gautam
Grandmaster

Re: Strange behaviour with different vector sizes

Check out the CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE property using the clGetKernelWorkGroupInfo() API.

That will help. You need to query your kernel object to get it.
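From pyopencl, the same query can be sketched roughly as follows. `Kernel.get_work_group_info` and `kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE` are pyopencl's wrappers for the C API named above; the one-line kernel and the helpers `preferred_multiple`, `is_aligned` and `main` are made up for illustration.

```python
try:
    import pyopencl as cl
except ImportError:      # lets the pure-Python helper be used without pyopencl
    cl = None


def preferred_multiple(kernel, device):
    """Query CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE for a built kernel."""
    return kernel.get_work_group_info(
        cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE, device)


def is_aligned(global_size, multiple):
    """True if global_size is an exact multiple of the preferred multiple."""
    return global_size % multiple == 0


def main():
    # Needs pyopencl and an OpenCL device; the kernel is a throwaway example.
    ctx = cl.create_some_context()
    device = ctx.devices[0]
    src = "__kernel void test(__global float *x) { x[get_global_id(0)] *= 2.0f; }"
    prg = cl.Program(ctx, src).build()
    multiple = preferred_multiple(prg.test, device)
    for n in (29120, 29122):
        print(n, "aligned to", multiple, ":", is_aligned(n, multiple))
```

Calling `main()` on a machine with an OpenCL device prints the reported multiple for that kernel and device, along with whether each of the two vector sizes from the question is aligned to it.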

nou
Exemplar

Re: Strange behaviour with different vector sizes

It is defined by the hardware. Most AMD cards need 64; low-end AMD cards have 32. NVIDIA uses 32, and from Intel's OpenCL programming guide for their accelerator card it seems to use a width of 16/32.