
arvin99
Adept II

Problem using zero copy buffer OpenCL in discrete GPU

Hello everybody,

I have a question about OpenCL, specifically about zero copy buffers. I wrote an OpenCL program for my assignment and I am having difficulty using zero copy buffers.

I use a pre-pinned buffer (CL_MEM_ALLOC_HOST_PTR) and map the buffer (clEnqueueMapBuffer) for both the host-to-device and the device-to-host transfers in order to get zero copy behavior.

I use AMD APP Profiler to get the result, but the result is weird.

The transfer from host to device using the zero copy buffer produces a weird result, as if I were using a pinned buffer (higher transfer size), but the transfer from device to host produces a good result (zero copy buffer result --> NA).

Here is the result:

[Image: Untitled.png]

I use AMD APP SDK 2.6 on Windows 7 (64-bit).

I use a discrete GPU (AMD Radeon HD6630M) as my OpenCL device. Is this because I use a discrete GPU rather than an integrated GPU?

My discrete GPU supports Virtual Memory (VM).

Here is the code:

// Allocate device memory for input and output (pre-pinned via CL_MEM_ALLOC_HOST_PTR)

  size_t bytes = sizeof(cl_float)*size*size;

  d_A = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);
  d_B = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);
  d_C = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, bytes, NULL, &err);

// Map the input buffers (blocking), fill them from the host arrays, then unmap them

  void* mapPtrA = clEnqueueMapBuffer(queue, d_A, CL_TRUE, CL_MAP_WRITE, 0, bytes, 0, NULL, NULL, &err);
  void* mapPtrB = clEnqueueMapBuffer(queue, d_B, CL_TRUE, CL_MAP_WRITE, 0, bytes, 0, NULL, NULL, &err);

  memcpy(mapPtrA, matrixA, bytes);
  memcpy(mapPtrB, matrixB, bytes);

  clEnqueueUnmapMemObject(queue, d_A, mapPtrA, 0, NULL, NULL);
  clEnqueueUnmapMemObject(queue, d_B, mapPtrB, 0, NULL, NULL);

// This function calls the kernel

  MatrixMul(d_A, d_B, d_C, size);

// Map the output buffer (blocking), copy the result to the host array, then unmap it

  void* mapPtrC = clEnqueueMapBuffer(queue, d_C, CL_TRUE, CL_MAP_READ, 0, bytes, 0, NULL, NULL, &err);

  memcpy(matrixC, mapPtrC, bytes);

  clEnqueueUnmapMemObject(queue, d_C, mapPtrC, 0, NULL, NULL);

// Release the buffers

  err = clReleaseMemObject(d_A);
  err = clReleaseMemObject(d_B);
  err = clReleaseMemObject(d_C);

3 Replies
nou
Exemplar

It is zero copy, because a 3.4 TB/s copy rate over PCIe is impossible.
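To put a number on "impossible", here is a rough sketch of the theoretical ceiling of the link, assuming the PCIe 2.0 x16 slot mentioned elsewhere in this thread (5 GT/s per lane with 8b/10b encoding). The function names are made up for illustration, and real transfers land below this peak because of protocol overhead.

```c
/* Theoretical peak of a PCIe link in GB/s, one direction.
 * gt_per_s: raw signalling rate per lane (5.0 GT/s for PCIe 2.0)
 * lanes:    link width (16 for an x16 slot)
 * Assumes 8b/10b encoding (PCIe 1.x and 2.0): 8 payload bits per 10 wire bits. */
double pcie_peak_gb_per_s(double gt_per_s, int lanes)
{
    double payload_gbit_per_lane = gt_per_s * 8.0 / 10.0; /* Gbit/s of payload per lane */
    return payload_gbit_per_lane * lanes / 8.0;            /* bits -> bytes */
}

/* How many times faster than the link a reported rate claims to be. */
double times_over_link(double reported_gb_per_s, double peak_gb_per_s)
{
    return reported_gb_per_s / peak_gb_per_s;
}
```

With these assumptions, pcie_peak_gb_per_s(5.0, 16) comes out at 8 GB/s, so a reported 3.4 TB/s (3400 GB/s) would be several hundred times faster than the bus could carry. The only reasonable reading is that no data was moved during the map.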

0 Likes

Thanks for the reply.

I'm sorry, I still don't understand. I know that pinned host memory and zero copy buffers are limited by PCIe bandwidth.

My PCIe is PCIe 2.0 x16.

If I am correct, NA for the data transfer means "Not Available" (zero copy buffer), doesn't it?

So why does only the third map (mapPtrC, used to read the result of the kernel) become a zero copy buffer?

Why do mapPtrA and mapPtrB (used to write the kernel inputs) not become zero copy buffers? The transfer rate for mapPtrA and mapPtrB is 3.4 TB/s, isn't it?

Could you explain it to me clearly?

0 Likes

CodeXL calculates the transfer rate for all transfers. The transfer rate is ~20 TB/s; for some reason it shows as NA, maybe because it is just too big. All of them must be zero copy, because a 3.4 TB/s transfer rate is impossible to achieve. That means nothing is copied from device memory to host memory; in other words, it is a zero copy buffer. NA only means the rate is too big to show, not that the buffer is zero copy.
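If you want to check this yourself instead of relying on the profiler display, a minimal sketch is to divide the buffer size by the duration of the map command. This assumes the rate is simply bytes over elapsed time (I do not know exactly how the profiler computes it) and that the timestamps come from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with CL_QUEUE_PROFILING_ENABLE. The helper names are hypothetical.

```c
#include <stdint.h>

/* Effective rate in GB/s for `bytes` handled between two timestamps given in
 * nanoseconds (the unit OpenCL event profiling reports).
 * Returns 0 for an empty or inverted interval. */
double effective_gb_per_s(uint64_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns)
        return 0.0;
    /* bytes per nanosecond is numerically equal to GB/s (1e9 B per 1e9 ns) */
    return (double)bytes / (double)(end_ns - start_ns);
}

/* Nonzero when the rate is above what the link can carry,
 * i.e. the command cannot have been a real copy across the bus. */
int exceeds_link(double rate_gb_per_s, double link_peak_gb_per_s)
{
    return rate_gb_per_s > link_peak_gb_per_s;
}
```

As a made-up example, a 4 MiB buffer (4194304 bytes) whose map command takes 1200 ns works out to roughly 3.5 TB/s, far above the roughly 8 GB/s theoretical peak of PCIe 2.0 x16. The same buffer taking 1 ms would be about 4.2 GB/s, which is what a real copy over the bus looks like.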