
Problem using zero copy buffer OpenCL in discrete GPU

Question asked by arvin99 on Nov 29, 2013
Latest reply on Nov 30, 2013 by nou

Hello everybody,


I have a question about OpenCL, specifically about zero copy buffers. I created an OpenCL program for my assignment and I'm having difficulty using a zero copy buffer.

I use a pre-pinned buffer (CL_MEM_ALLOC_HOST_PTR) and map it (clEnqueueMapBuffer) for both the host-to-device and the device-to-host transfers in order to get zero copy behavior.

I use the AMD APP Profiler to measure the transfers, but the results look strange.

For the host-to-device transfer, the zero copy buffer produces a strange result that looks like a pinned buffer transfer (a larger transfer size is reported). The device-to-host transfer, however, looks correct: the profiler reports NA for the zero copy buffer.
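When comparing the host-to-device and device-to-host numbers, it can help to reduce each transfer to bytes moved and effective bandwidth. The sketch below is a minimal, hypothetical helper pair (`matrix_bytes` and `effective_gbps` are illustrative names, not part of OpenCL or the AMD tools). It assumes durations come from OpenCL event profiling, whose `CL_PROFILING_COMMAND_START` / `CL_PROFILING_COMMAND_END` timestamps are in nanoseconds; whether the AMD APP Profiler reports the same quantities is an assumption to verify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes in one size x size matrix of 32-bit floats,
   i.e. sizeof(cl_float)*size*size from the post. Returns 0 if the product
   would overflow size_t. */
size_t matrix_bytes(size_t size)
{
    if (size != 0 && size > SIZE_MAX / size / sizeof(float))
        return 0;
    return sizeof(float) * size * size;
}

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
   Assumes nanosecond timestamps, as returned by OpenCL event profiling;
   bytes per nanosecond equals gigabytes per second.
   Returns 0.0 for an empty or inverted interval. */
double effective_gbps(size_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns)
        return 0.0;
    return (double)bytes / (double)(end_ns - start_ns);
}
```

For example, a 1024 × 1024 float matrix is `matrix_bytes(1024)` = 4,194,304 bytes; if such a transfer hypothetically took 1 ms (1,000,000 ns), `effective_gbps` would give about 4.19 GB/s.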


Here is the result:

[AMD APP Profiler screenshot not preserved]

I'm using AMD APP SDK 2.6 on Windows 7 (64-bit).

I'm using a discrete GPU (AMD Radeon HD6630M) as my OpenCL device. Is this happening because I'm using a discrete GPU rather than an integrated one?

My discrete GPU supports virtual memory (VM).


Here is the code:

```c
// Allocate device memory for input and output.
// CL_MEM_ALLOC_HOST_PTR asks the runtime for host-accessible (pinned) memory.
// (err should be checked against CL_SUCCESS after each call.)
size_t bytes = sizeof(cl_float) * size * size;

d_A = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);
d_B = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);
d_C = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);

// Map the input buffers for writing (blocking map), fill them from the host arrays, then unmap.
void* mapPtrA = clEnqueueMapBuffer(queue, d_A, CL_TRUE, CL_MAP_WRITE, 0, bytes, 0, NULL, NULL, &err);
void* mapPtrB = clEnqueueMapBuffer(queue, d_B, CL_TRUE, CL_MAP_WRITE, 0, bytes, 0, NULL, NULL, &err);

memcpy(mapPtrA, matrixA, bytes);
memcpy(mapPtrB, matrixB, bytes);

clEnqueueUnmapMemObject(queue, d_A, mapPtrA, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue, d_B, mapPtrB, 0, NULL, NULL);

// This function enqueues the kernel.
MatrixMul(d_A, d_B, d_C, size);

// Map the output buffer for reading (blocking map), copy the result out, then unmap.
void* mapPtrC = clEnqueueMapBuffer(queue, d_C, CL_TRUE, CL_MAP_READ, 0, bytes, 0, NULL, NULL, &err);
memcpy(matrixC, mapPtrC, bytes);
clEnqueueUnmapMemObject(queue, d_C, mapPtrC, 0, NULL, NULL);

// Wait for the queued unmap to finish before releasing the buffers.
clFinish(queue);

err = clReleaseMemObject(d_A);
err = clReleaseMemObject(d_B);
err = clReleaseMemObject(d_C);
```
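To confirm that the mapped read-back really returns the kernel's output, one option is to compare `matrixC` against a plain CPU computation. This is a minimal sketch that assumes `MatrixMul` (whose kernel is not shown in the post) computes C = A × B for row-major `size` × `size` float matrices; `matrix_mul_ref` and `max_abs_diff` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference: c = a * b for row-major size x size
   matrices. Assumes MatrixMul (not shown in the post) computes the same product. */
void matrix_mul_ref(const float *a, const float *b, float *c, size_t size)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < size; ++i) {
        for (size_t j = 0; j < size; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < size; ++k)
                sum += a[i * size + k] * b[k * size + j];
            c[i * size + j] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two arrays of n floats. */
float max_abs_diff(const float *x, const float *y, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

After the final `memcpy` into `matrixC`, one could fill a host array `ref` with `matrix_mul_ref(matrixA, matrixB, ref, size)` and inspect `max_abs_diff(matrixC, ref, (size_t)size * size)`. Small nonzero differences may be normal, since the GPU may accumulate the sums in a different order.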