Concurrent kernel execution between CPU and GPU

Question asked by hschu on May 6, 2013
Latest reply on May 13, 2013 by himanshu.gautam

Hi. I am trying to verify simple heterogeneous computing on a CPU and a GPU with OpenCL. The kernel is a simple BLAS level 1 saxpy (single-precision scalar multiplication and vector addition). I assign "n" elements to the CPU and "nn-n" elements to the GPU, where "nn" is the vector length. By varying n, I want to find the splitting point "n" that minimizes the total computation time.
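To illustrate what a splitting point means here, the sketch below estimates a balanced split under an idealized model. It assumes each device processes elements at a constant rate and that the GPU rate already accounts for PCIe transfer time. The function name `balanced_split` and both rate parameters are hypothetical and are not part of the test program.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements to give the CPU so that both
// devices finish at the same time. Assumes constant throughput per device
// (elements per second); gpu_rate is assumed to include transfer cost.
// Solves n / cpu_rate == (nn - n) / gpu_rate for n.
std::size_t balanced_split(std::size_t nn, double cpu_rate, double gpu_rate) {
    assert(cpu_rate > 0.0 && gpu_rate > 0.0);
    const double cpu_share = cpu_rate / (cpu_rate + gpu_rate);
    // Round to the nearest whole element.
    return static_cast<std::size_t>(cpu_share * static_cast<double>(nn) + 0.5);
}
```

For example, if the GPU were three times as fast as the CPU, `balanced_split(1000, 1.0, 3.0)` would give the CPU 250 elements. In practice the rates are not constant, so a measured sweep over n is still needed.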

 

To find the ideal splitting point, OpenCL has to execute the two kernels concurrently on the heterogeneous system. So I tried to verify that concurrency with a simple test program as follows.

 

   CPerfCounter t1;

   ...

   t1.Reset();
   t1.Start();

   // Enqueue to write the target vectors x and y to GPU Global memory.
   clEnqueueWriteBuffer(cqCommandQueue_gpu, cl_x, CL_FALSE, 0, sizeof(FLOAT)*(nn-n), x, 0, NULL, NULL);
   clEnqueueWriteBuffer(cqCommandQueue_gpu, cl_y, CL_FALSE, 0, sizeof(FLOAT)*(nn-n), y, 0, NULL, NULL);

  
   // Enqueue NDRange to CPU
   err = clEnqueueNDRangeKernel(cqCommandQueue_cpu, ckKernel[1], 1, NULL, &GWS2, &LWS2, 0, NULL, NULL);

   // Enqueue NDRange to GPU
   clEnqueueNDRangeKernel(cqCommandQueue_gpu, ckKernel[0], 1, NULL, &GWS, &LWS, 0, NULL, NULL);
  
   // Enqueue to read the result vector to Host memory
   clEnqueueReadBuffer(cqCommandQueue_gpu, cl_y, CL_FALSE, 0, sizeof(FLOAT)*(nn-n), z, 0, NULL, NULL);
   //clFlush(cqCommandQueue_gpu);

   clFlush(cqCommandQueue_cpu);
   clFinish(cqCommandQueue_gpu);
   clFinish(cqCommandQueue_cpu);
   t1.Stop();
     
I intentionally removed "clFlush(cqCommandQueue_gpu)" since it made no big difference to the results. Here is the profile information from the AMD Profiler. I found some strange results.

 

Case1. Not executed in parallel

1.png

 

Case2. Working properly

2.png

 

Case3. Strangely, PCI Express holds the data while the CPU computes

3.png
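The three cases can also be told apart numerically by how much the CPU and GPU kernel intervals overlap. The following is a minimal sketch. It assumes start and end timestamps in nanoseconds are already available, for example from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on queues created with CL_QUEUE_PROFILING_ENABLE. It also assumes they have been converted to a common time base, which OpenCL does not guarantee across devices. The `Interval` type and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical record of one command's execution window, in nanoseconds.
struct Interval {
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Length of time during which both intervals are active (0 if disjoint).
std::uint64_t overlap_ns(const Interval& a, const Interval& b) {
    assert(a.start_ns <= a.end_ns && b.start_ns <= b.end_ns);
    const std::uint64_t lo = std::max(a.start_ns, b.start_ns);
    const std::uint64_t hi = std::min(a.end_ns, b.end_ns);
    return hi > lo ? hi - lo : 0;
}

// Overlap relative to the shorter interval: 0.0 means fully serialized,
// 1.0 means the shorter command ran entirely inside the longer one.
double overlap_fraction(const Interval& a, const Interval& b) {
    const std::uint64_t shorter =
        std::min(a.end_ns - a.start_ns, b.end_ns - b.start_ns);
    if (shorter == 0) return 0.0;
    return static_cast<double>(overlap_ns(a, b)) / static_cast<double>(shorter);
}
```

A fraction near 0 would correspond to Case1, and a fraction near 1 to Case2. The same check could be applied to the transfer and CPU kernel intervals to examine Case3.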

 

How can I analyze these results?

Thanks in advance.

 

------------- My information

Windows 7 64-bit, VS 2010

CPU : FX 8120

GPU : Radeon 7970
