OpenCL performance on multicore CPU

Discussion created by FangQ on Feb 6, 2010
Latest reply on Feb 7, 2010 by nou


I just got my first OpenCL code working. There are still a lot of things that need to be fine-tuned and digested. One of them is the CPU load when running the code on a multicore CPU.

My computer has an Intel quad-core (Q6700) CPU and a Radeon 4650 card. I first called clGetPlatformIDs(), which returned 1 platform, called "ATI Stream". Then I used clCreateContextFromType() to create a CPU context on this platform. Calling clGetContextInfo() returned 4 devices, which I assume are the 4 cores of the CPU. Next, I created a command queue for devices[0], thinking that this attached the queue to the first core of the CPU. However, when I launched my kernel on this command queue, my CPU load jumped to 400%, indicating that all cores were being used.

Can anyone explain what happened? Would you expect the call

[code]commands=clCreateCommandQueue(context,devices[0] ... )[/code]

to limit all subsequent computation to a single core of the CPU? Or is the Stream SDK smart enough to spread it across all available devices within this context?


In addition, my card is supposed to have 320 cores, but when I ran CLInfo, it showed only 8 compute units. Is this right? (Running my code on the GPU was a lot slower than on the CPU.)