

syoyofujita
Journeyman III

OpenCL on Windows too slow?

I've written a very simple OpenCL kernel which fills pixels by work ID.

I get terribly slow performance from this kernel with ATI Stream SDK 2.0 beta on Windows Vista 64.

It takes about 8 secs to execute, which I find unbelievable. On the other hand, Snow Leopard executes the same kernel within 0.0001 sec.

Does anyone know why it is so slow on Windows?

More details are available at the following site:

http://lucille.atso-net.jp/blog/?p=907

 

__kernel void main(__global uint *out, uint col /* not used */)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    out[x + y * get_global_size(0)] = (uint)(x | (y << 8) | (255 << 16) | (255 << 24));
}
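To check that both platforms produce the same pixels, a plain C reference of the kernel can be handy. This is only a sketch: `pack_pixel` and `fill_reference` are hypothetical names, not part of the original program, and the shifts use unsigned constants to avoid signed overflow in host C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs one pixel the same way the kernel body does.
 * x goes in bits 0-7, y in bits 8-15, and the two upper bytes are 255. */
static uint32_t pack_pixel(uint32_t x, uint32_t y)
{
    return x | (y << 8) | (255u << 16) | (255u << 24);
}

/* Hypothetical CPU reference: one loop iteration per work-item, with
 * width standing in for get_global_size(0). */
static void fill_reference(uint32_t *out, size_t width, size_t height)
{
    assert(out != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            out[x + y * width] = pack_pixel((uint32_t)x, (uint32_t)y);
        }
    }
}
```

Calling `fill_reference(buf, 256, 256)` on a 65536-element buffer and comparing it with the buffer read back from the device should show whether the slow run at least computes the same result.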

0 Likes
7 Replies
omkaranathan
Adept I

Which hardware are you running the Snow Leopard implementation on, and what global work size are you using?

0 Likes
heavensrevenge
Journeyman III

I'll venture a guess (slightly obvious): the ATI Stream SDK 2.0 beta can only use the CPU for computations in the current release, as stated in the release notes. Snow Leopard, on the other hand, has the drivers and kernel hooks built directly into the OS. So yes, on Windows only your CPU is doing the math, unlike a capable GPU running under Snow Leopard.

Hint: look at your screenshot and the CL_DEVICE_NAME, which tells you it's using the CPU and not your GPU.

Hope you have fun practicing on Snow Leopard until a Microsoft release has GPU support.

0 Likes

Both are running on the CPU (CL_DEVICE_TYPE_CPU), and the program uses the following work sizes.

global work size = (256, 256)

local work size = (1, 1)

0 Likes

Are you using the same source code (host and kernel code) on both platforms? Could you post the host-side code?

 

 

0 Likes

I am getting around 140 fps in the NBody sample running on CPU (using ATI StreamSDK 2.0 sample on Phenom Quad). It shouldn't take 8 secs to execute a simple kernel like yours. Can you post the host+kernel code?

0 Likes

The host code is the same as in OpenCL AO Bench.

http://kioku.sys-k.net/archives/2009/08/opencl_ao_bench.html

I am using VS2009, and I've found that running the OpenCL app through [Debug] -> [Start Debugging] causes a terrible slowdown in my case (8 secs, even when the app is built with Release settings). Running it through [Debug] -> [Start Without Debugging] gives normal performance (0.05 secs).

Hope this helps when you develop OpenCL apps with VS2009.

0 Likes

This particular issue with AO Bench is a known problem. It is actually due to a number of small things that add up; each has been addressed, and the fixes will appear in an upcoming refresh.

One thing to note: a local work size of (1, 1) may not always be the best choice on our implementation, and it might be worth trying different values, e.g. 8x8 or 16x16.
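To see what changing the local work size means for the 256x256 launch reported earlier in the thread, here is a small C sketch that counts work-groups. `work_group_count` is a hypothetical helper, not an OpenCL call. It relies on the OpenCL 1.x rule that each global dimension must be evenly divisible by the matching local dimension. That fewer, larger groups reduce per-group overhead on this implementation is an assumption, not something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups in a 2D launch.
 * Returns 0 when a local size is zero or does not evenly divide the
 * global size, which OpenCL 1.x rejects at enqueue time. */
static size_t work_group_count(const size_t global[2], const size_t local[2])
{
    size_t count = 1;
    for (int d = 0; d < 2; ++d) {
        assert(global[d] > 0);
        if (local[d] == 0 || global[d] % local[d] != 0) {
            return 0;
        }
        count *= global[d] / local[d];
    }
    return count;
}
```

For a global size of (256, 256), a local size of (1, 1) gives 65536 work-groups, 8x8 gives 1024, and 16x16 gives 256. The chosen local size would be passed as the `local_work_size` argument of `clEnqueueNDRangeKernel`.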

0 Likes