4 Replies Latest reply on Jun 3, 2011 1:52 PM by wgbljl

    Confusing: CPU and GPU execution seems serialized. Help!


      I want to use the CPU and GPU concurrently in an application, but their execution seems to be serialized. It is probably my mistake. Please help me find it.

      My OS is OpenSUSE 10.3 with SDK 2.2. The computer has a 4-core processor and a 4870x2 GPU.

      The main procedure is as follows. I use the main thread to control one GPU, and I create 3 more threads to perform some computation on 3 CPU cores. The main code is attached.

      First, I measured the execution times of the GPU part and the CPU part separately. The GPU part takes 8s. Each CPU thread takes about 12s, and because the 3 threads run in parallel, the CPU part as a whole takes 12s.
      Then I ran the CPU and GPU parts concurrently. I expected a total execution time of 12s (max{12,8}), but the measured time is 20s, which equals 12 + 8. It seems the executions of the CPU and GPU are serialized.
      Using many time counters, I found that when the GPU starts running, the 3 threads on the CPU cores are blocked, and they continue only after the GPU finishes. I'm sure there is no synchronization between the main thread and the 3 worker threads, and the 3 CPU threads do not depend on the GPU's execution, so the result is confusing. I think something is wrong somewhere. Please help me find it. Thanks.


      streamsdk::SDKThread *threads = new streamsdk::SDKThread[3];
      threads[0].create(threadFunc2, (void *)&num[1]);
      threads[1].create(threadFunc2, (void *)&num[2]);
      threads[2].create(threadFunc2, (void *)&num[3]);

      time_t time1;
      time(&time1);
      printf("Host Start %ld\n", time1);

      for(int i = 0; i < 5; i++)
      {
          kernelRun = clgpu.runCLKernels(); //call GPU kernel
          if(kernelRun != SDK_SUCCESS)
          {
              return kernelRun;
          }
      }
      clgpu.flush();

      threads[0].join();
      threads[1].join();
      threads[2].join();
      clgpu.runCLKernelsEnd(); //wait for GPU

      time(&time1);
      printf("Host End %ld\n", time1);
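
      For reference, the time-counter check described above can be sketched roughly as follows. This is only a minimal sketch, not the code from my application: it assumes a C++11 compiler, uses std::thread in place of streamsdk::SDKThread, and replaces both threadFunc2 and the OpenCL calls with sleeps. The names Interval, timed, overlaps, Report and run_experiment are made up for this illustration. It records a start and end timestamp for each worker and for the main-thread work, then checks whether the intervals overlap. It uses std::chrono::steady_clock because time() only has one-second resolution.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Start and end timestamps of one piece of work.
struct Interval {
    Clock::time_point start;
    Clock::time_point end;
};

// Run `work` and record when it started and when it finished.
template <typename F>
Interval timed(F work) {
    Interval iv;
    iv.start = Clock::now();
    work();
    iv.end = Clock::now();
    return iv;
}

// Two intervals overlap if each one starts before the other ends.
bool overlaps(const Interval& a, const Interval& b) {
    return a.start < b.end && b.start < a.end;
}

// Length of an interval in seconds.
double seconds(const Interval& iv) {
    return std::chrono::duration<double>(iv.end - iv.start).count();
}

struct Report {
    double total;      // wall time of the whole run
    double cpu_max;    // longest worker thread
    double gpu;        // main-thread work (stand-in for the GPU part)
    bool all_overlap;  // did every worker overlap the main-thread work?
};

// Hypothetical stand-ins: the sleeps replace threadFunc2 and the
// runCLKernels()/runCLKernelsEnd() calls of the real application.
Report run_experiment(int workers, int cpu_ms, int gpu_ms) {
    std::vector<Interval> cpu(workers);
    std::vector<std::thread> pool;
    Interval gpu;

    Interval whole = timed([&] {
        // Start the workers first, as in the attached code.
        for (int i = 0; i < workers; ++i) {
            pool.emplace_back([&cpu, i, cpu_ms] {
                cpu[i] = timed([cpu_ms] {
                    std::this_thread::sleep_for(
                        std::chrono::milliseconds(cpu_ms));
                });
            });
        }
        // Main thread does the "GPU" part while the workers run.
        gpu = timed([gpu_ms] {
            std::this_thread::sleep_for(std::chrono::milliseconds(gpu_ms));
        });
        for (std::thread& t : pool) t.join();
    });

    Report r{seconds(whole), 0.0, seconds(gpu), true};
    for (const Interval& iv : cpu) {
        r.cpu_max = std::max(r.cpu_max, seconds(iv));
        r.all_overlap = r.all_overlap && overlaps(iv, gpu);
    }
    return r;
}
```

      Calling run_experiment(3, 200, 150) and comparing total against cpu_max + gpu shows the expected pattern: when the work really overlaps, total stays close to the larger of the two values. When it is serialized, as in my measurement, total approaches their sum. Because sleeping threads do not compete for cores, the sleeps would have to be replaced by the real CPU work and the real kernel calls to reproduce the actual behaviour.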