

Journeyman III

Confusing: CPU and GPU execution seems serialized. Help!

I want to use the CPU and GPU concurrently in an application, but their execution seems to be serialized. It's probably my mistake. Please help me find it.

My OS is OpenSUSE 10.3 with SDK 2.2. The computer has a 4-core processor and a 4870x2 GPU.

The main procedure is as follows. I use the main thread to control one GPU, and I create another 3 threads to perform some computation on 3 of the cores. The main code is attached.

First, I measured the execution time of the GPU and the CPU separately. The GPU takes 8s. Each CPU thread takes about 12s, and because the 3 threads run in parallel, the total CPU time is also 12s.
Then I ran the CPU and GPU work concurrently. I expected the total execution time to be 12s (max{12, 8}), but the measured time is 20s, which is the sum 12 + 8. It seems the CPU and GPU executions are serialized.
Using several time counters, I found that when the GPU starts running, the 3 threads on the CPU cores are blocked, and they only continue once the GPU finishes. I'm sure there is no synchronization between the main thread and the 3 worker threads, so the result is confusing: the 3 CPU threads have nothing to do with the GPU's execution. I think something is wrong here. Please help me find it. Thanks.


streamsdk::SDKThread *threads = new streamsdk::SDKThread[3];
threads[0].create(threadFunc2, (void *)&num[1]);
threads[1].create(threadFunc2, (void *)&num[2]);
threads[2].create(threadFunc2, (void *)&num[3]);

time_t time1;
time(&time1);
printf("Host Start %ld\n", time1);

for(int i = 0; i < 5; i++)
{
    kernelRun = clgpu.runCLKernels(); // call GPU kernel
    if(kernelRun != SDK_SUCCESS)
    {
        return kernelRun;
    }
}
clgpu.flush();

threads[0].join();
threads[1].join();
threads[2].join();

clgpu.runCLKernelsEnd(); // wait for GPU
time(&time1);
printf("Host End %ld\n", time1);
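
To check whether the CPU threads and the GPU call really overlap, per-thread start and end timestamps are more telling than a single `time()` pair, since `time()` only has one-second resolution. The following is a minimal sketch using standard C++ threads and `std::chrono::steady_clock` rather than the SDK's `SDKThread`. All names in it (`Span`, `Result`, `timed`, `overlap_ms`, `run_concurrently`) are hypothetical, and `gpu_work` is only a stand-in for whatever the main thread does to launch the kernels and wait for them.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Start and end time of one piece of work.
struct Span { Clock::time_point start, end; };

// Total wall time, and the smallest CPU/GPU overlap over all workers.
struct Result { double wall_ms; double min_overlap_ms; };

// Milliseconds elapsed between two time points.
static double ms(Clock::time_point a, Clock::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

// Run `work` and record when it started and when it finished.
static Span timed(const std::function<void()>& work) {
    Span s;
    s.start = Clock::now();
    work();
    s.end = Clock::now();
    return s;
}

// Length of the time window shared by two spans (0 if they are disjoint).
static double overlap_ms(const Span& a, const Span& b) {
    Clock::time_point lo = std::max(a.start, b.start);
    Clock::time_point hi = std::min(a.end, b.end);
    return hi > lo ? ms(lo, hi) : 0.0;
}

// Run `cpu_work` on `n_workers` threads while the calling thread runs
// `gpu_work` (hypothetical stand-in for the kernel launches plus the wait).
Result run_concurrently(int n_workers,
                        const std::function<void()>& cpu_work,
                        const std::function<void()>& gpu_work) {
    assert(n_workers > 0);
    std::vector<Span> cpu(n_workers);
    std::vector<std::thread> pool;

    Clock::time_point t0 = Clock::now();
    for (int i = 0; i < n_workers; ++i) {
        // Each worker writes only its own slot, so no locking is needed.
        pool.emplace_back([&, i] { cpu[i] = timed(cpu_work); });
    }
    Span gpu = timed(gpu_work);
    for (std::thread& t : pool) t.join();
    Clock::time_point t1 = Clock::now();

    double min_overlap = overlap_ms(cpu[0], gpu);
    for (int i = 1; i < n_workers; ++i) {
        min_overlap = std::min(min_overlap, overlap_ms(cpu[i], gpu));
    }
    return Result{ms(t0, t1), min_overlap};
}
```

If the two sides truly run in parallel, `wall_ms` should be close to the longer of the two durations and `min_overlap_ms` close to the shorter one. A `wall_ms` near the sum of the two, as in the 20s measurement above, would point to the workers being stalled while the main thread is in the GPU call.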

4 Replies

What is the CPU usage during the computation? Are all three or four cores utilized?

Journeyman III

I monitored the IPC of all the processor cores. Judging from the IPC, the processor is busy.

Could anyone provide sample code that uses the CPU and GPU devices concurrently in OpenCL?


Are you using the device fission extension to divide your CPU cores?


Thanks for your reply. I have moved to SDK 2.4, and the CPU and GPU now seem to execute in parallel.