Hello everybody,
I have a question about OpenCL, specifically about zero copy buffers. I am writing an OpenCL program for an assignment and I am having difficulty using zero copy buffers.
I use pre-pinned buffers (CL_MEM_ALLOC_HOST_PTR) and map them (clEnqueueMapBuffer) for both the host-to-device and the device-to-host transfers in order to get zero copy behavior.
I used AMD APP Profiler to measure the transfers, but the results are strange.
The host-to-device transfers through the zero copy buffers look like pinned-buffer transfers (a higher transfer size is reported), while the device-to-host transfer looks as expected (the zero copy result shows NA).
Here is the result:
I use AMD APP SDK 2.6 on Windows 7 (64-bit).
My OpenCL device is a discrete GPU (AMD Radeon HD 6630M). Is this happening because I use a discrete GPU rather than an integrated one?
My discrete GPU supports Virtual Memory (VM).
Here is the code:
// Allocate device buffers for input and output (host-accessible, pre-pinned)
d_A = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);
d_B = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);
d_C = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);
// Map the input buffers for writing (blocking calls)
float* mapPtrA = (float*)clEnqueueMapBuffer(queue, d_A, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);
float* mapPtrB = (float*)clEnqueueMapBuffer(queue, d_B, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);
// Fill the mapped regions from the host matrices
memcpy(mapPtrA, matrixA, sizeof(cl_float)*size*size);
memcpy(mapPtrB, matrixB, sizeof(cl_float)*size*size);
// Unmap so that the kernel can use the buffers
clEnqueueUnmapMemObject(queue, d_A, mapPtrA, 0, NULL, NULL);
clEnqueueUnmapMemObject(queue, d_B, mapPtrB, 0, NULL, NULL);
// This function calls the kernel
MatrixMul(d_A, d_B, d_C, size);
// Map the output buffer for reading (blocking call) and copy the result out
float* mapPtrC = (float*)clEnqueueMapBuffer(queue, d_C, CL_TRUE, CL_MAP_READ, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);
memcpy(matrixC, mapPtrC, sizeof(cl_float)*size*size);
clEnqueueUnmapMemObject(queue, d_C, mapPtrC, 0, NULL, NULL);
// Release the buffers
err = clReleaseMemObject(d_A);
err = clReleaseMemObject(d_B);
err = clReleaseMemObject(d_C);
It is zero copy, because a copy rate of 3.4 TB/s over PCIe is impossible.
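A rough check of that claim, as a minimal C sketch. It assumes a PCIe 2.0 x16 link (the link the original poster reports in this thread), with 5 GT/s per lane and 8b/10b encoding. The function names are made up for illustration, and real transfers achieve less than this theoretical peak.

```c
#include <assert.h>

/* Theoretical one-direction peak of a PCIe link, in bytes per second.
   Hypothetical helper for illustration; not part of any OpenCL or AMD API. */
double pcie_peak_bytes_per_sec(int lanes, double transfers_per_sec,
                               double encoding_efficiency)
{
    assert(lanes > 0);
    /* Each transfer carries one bit per lane; the line encoding
       (8b/10b for PCIe 2.0, i.e. 0.8) reduces the usable payload. */
    return lanes * transfers_per_sec * encoding_efficiency / 8.0;
}

/* How many times a reported rate exceeds the link peak. */
double ratio_to_peak(double reported_bytes_per_sec, double peak_bytes_per_sec)
{
    return reported_bytes_per_sec / peak_bytes_per_sec;
}
```

With pcie_peak_bytes_per_sec(16, 5.0e9, 0.8) the peak is about 8 GB/s per direction, so a reported 3.4 TB/s (taking 1 TB as 1e12 bytes) is roughly 425 times more than the link could carry.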
Thanks for the reply.
I'm sorry, I still don't understand. I know that transfers with pinned host memory and zero copy buffers are limited by PCIe bandwidth.
My PCIe link is PCIe 2.0 x16.
If I am correct, NA in the data transfer results means "Not Available" (zero copy buffer), doesn't it?
So why does only the third map (mapPtrC, used to read the result of the kernel) become a zero copy buffer?
Why do mapPtrA and mapPtrB (used to write the kernel inputs) not become zero copy buffers? The transfer rate for mapPtrA and mapPtrB is 3.4 TB/s, isn't it?
Could you explain this to me clearly?
CodeXL calculates a transfer rate for every transfer. The transfer rate is ~20 TB/s, which for some reason is shown as NA; maybe it is just too big. All of them must be zero copy, because a 3.4 TB/s transfer rate is impossible to achieve, which means nothing is copied from device memory to host memory; in other words, these are zero copy buffers. NA only means the value is too big to show, not that the buffer is zero copy.
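To see why a mapped zero copy buffer can produce such rates, here is a minimal C sketch of the rate arithmetic a profiler would presumably do: bytes divided by elapsed time. The exact formula CodeXL uses is an assumption, the function names are made up, and the numbers in the note below are hypothetical values chosen for illustration, not measurements from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Apparent transfer rate in bytes per second: size divided by elapsed time.
   Assumed formula; illustrative helper, not a CodeXL or OpenCL function. */
double apparent_rate(size_t bytes, double elapsed_ns)
{
    assert(elapsed_ns > 0.0);
    return (double)bytes / (elapsed_ns * 1e-9);
}

/* Returns 1 when the apparent rate is higher than the bus could deliver,
   which suggests the call did not copy the data across the bus. */
int exceeds_bus_peak(double rate, double bus_peak)
{
    return rate > bus_peak;
}
```

For example, a 1024 x 1024 float buffer (4,194,304 bytes) whose map call returns in a hypothetical 1200 ns has an apparent rate of about 3.5 TB/s. That is far above the roughly 8 GB/s theoretical peak of PCIe 2.0 x16, so exceeds_bus_peak returns 1: the map only handed back a pointer and did not move the data.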