
Mapping device memory

Question asked by skanur on Apr 15, 2015
Latest reply on Apr 16, 2015 by skanur

Hello all,

 

While working on my problem, I came across an interesting phenomenon that I'm trying to understand. I create a pinned host buffer and transfer data between host and device using clEnqueueWriteBuffer. I get a data rate of about 6 GB/s on a Kaveri CPU with a Hawaii GPU connected over a PCIe 3 bus. This is the maximum, as verified by AMD's BufferBandwidth sample. To illustrate the measurement, here is the pseudocode:

 

// Create device and pinned host memory
cl_mem dmem = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_float) * size, NULL, &err); // Error checks are done, but not shown here
cl_mem pinned_hmem = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, size * sizeof(cl_float), NULL, &err);
cl_float *transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_WRITE, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
memcpy(transfer_data, data, sizeof(cl_float) * size); // "data" consists of pre-defined stuff
clEnqueueUnmapMemObject(commands, pinned_hmem, (void*) transfer_data, 0, NULL, NULL);
// map again as read only
transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_READ, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
clFinish(commands);

startTimer();
// This is repeated for a few iterations and the average is calculated
err = clEnqueueWriteBuffer(commands, dmem, CL_FALSE, 0, sizeof(cl_float) * size, transfer_data, 0, NULL, NULL);
endTimer(); // Calculate the transfer rate
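
startTimer() and endTimer() are not shown in the pseudocode above. As a rough sketch only, the rate calculation could look like the following, assuming a POSIX monotonic clock and 1 GB = 1e9 bytes. The helper names now_seconds and transfer_rate_gbps are hypothetical and not part of OpenCL or any AMD sample. Note that the write is enqueued with CL_FALSE (non-blocking), so the figure is only meaningful if the end timestamp is taken after the commands have completed, for example after clFinish(commands).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: average rate in GB/s (1 GB = 1e9 bytes) for
 * `iterations` transfers of `bytes` bytes each that together took
 * `seconds` of wall-clock time. Returns 0.0 if the measured interval
 * is too short to be resolved by the clock. */
double transfer_rate_gbps(size_t bytes, int iterations, double seconds)
{
    assert(iterations > 0);
    if (seconds <= 0.0)
        return 0.0;
    return ((double)bytes * (double)iterations) / seconds / 1e9;
}
```

In the measurement loop, one would take t0 = now_seconds() before the first enqueue, wait for the queue to drain, take t1 = now_seconds(), and report transfer_rate_gbps(sizeof(cl_float) * size, iterations, t1 - t0).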



 

However, if I map the device memory and copy the data with memcpy instead of calling clEnqueueWriteBuffer, I get a data rate of close to 2.2 GB/s. I'm trying to understand the reason for this discrepancy. Here is the pseudocode:

 

// Creation of device and pinned host memory remains same as above

startTimer();
// This too is averaged over a few iterations
void *mapped_dmem = clEnqueueMapBuffer(commands, dmem, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float) * size, 0, NULL, NULL, &err);
memcpy(mapped_dmem, transfer_data, sizeof(cl_float) * size);
clEnqueueUnmapMemObject(commands, dmem, mapped_dmem, 0, NULL, NULL);
endTimer(); // Calculate the transfer rate



 

Could someone explain why the transfer rate drops to less than half?

 

Thanks for reading

 

Edit: Updated the first pseudocode and put memcpy in the right place
