4 Replies Latest reply on Apr 16, 2015 2:23 AM by skanur

    Mapping device memory


      Hello all,


      While working on my problem, I came across an interesting phenomenon that I'm trying to understand. I create a pinned host buffer and transfer its contents from host to device using clEnqueueWriteBuffer. I get a data rate of about 6 GB/s on a Kaveri CPU with a Hawaii GPU connected over a PCIe 3 bus. This is the maximum, as verified by AMD's BufferBandwidth sample. To illustrate the measurement, here is the pseudocode:


      // Create device and pinned host memory
      cl_mem dmem = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_float) * size, NULL, &err); // Error checks are done, but not shown here
      cl_mem pinned_hmem = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, size * sizeof(cl_float), NULL, &err);
      cl_float *transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_WRITE, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
      memcpy(transfer_data, data, sizeof(cl_float) * size); // "data" consists of pre-defined stuff
      clEnqueueUnmapMemObject(commands, pinned_hmem, (void*) transfer_data, 0, NULL, NULL);
      // map again as read only
      transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_READ, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
      // This is repeated for a few iterations and the average is calculated
      err = clEnqueueWriteBuffer(commands, dmem, CL_FALSE, 0, sizeof(cl_float) * size, transfer_data, 0, NULL, NULL);
      endTimer(); // Calculate the transfer rate
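
      For reference, here is a minimal sketch of how the transfer rate could be calculated from such a timed loop. The helper names (transfer_rate_gbps, now_seconds, time_host_copy) are hypothetical and not part of the code above, and a plain memcpy stands in for the OpenCL transfer so the sketch compiles without an OpenCL SDK. Note that a non-blocking clEnqueueWriteBuffer (CL_FALSE) returns before the copy has finished, so a real measurement would need to wait (for example with clFinish) before reading the timer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: bytes moved per iteration, iteration count and
   elapsed seconds -> average rate in GB/s (1 GB = 1e9 bytes). */
double transfer_rate_gbps(size_t bytes_per_iter, int iterations, double seconds)
{
    assert(iterations > 0 && seconds > 0.0);
    return ((double)bytes_per_iter * (double)iterations) / seconds / 1e9;
}

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Stand-in for the timed loop: memcpy replaces the OpenCL transfer.
   Returns the total elapsed time for all iterations. */
double time_host_copy(void *dst, const void *src, size_t bytes, int iterations)
{
    double start = now_seconds();
    for (int i = 0; i < iterations; ++i) {
        memcpy(dst, src, bytes);
        /* With OpenCL, wait here until the transfer has completed
           (e.g. clFinish on the queue) before the timer is read. */
    }
    return now_seconds() - start;
}
```

      The elapsed time returned by time_host_copy would then be passed to transfer_rate_gbps together with the buffer size and the iteration count.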


      However, if I map the device memory and copy the data with memcpy instead of calling clEnqueueWriteBuffer, I get a data rate of close to 2.2 GB/s. I'm trying to understand the reason for this discrepancy. Here is the pseudocode:


      // Creation of device and pinned host memory remains same as above
      // This too is averaged over a few iterations
      void *mapped_dmem = clEnqueueMapBuffer(commands, dmem, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float) * size, 0, NULL, NULL, &err);
      memcpy(mapped_dmem, transfer_data, sizeof(cl_float) * size);
      clEnqueueUnmapMemObject(commands, dmem, mapped_dmem, 0, NULL, NULL);
      endTimer(); // Calculate the transfer rate


      Could someone explain why the transfer rate through the mapped device buffer is less than half of the clEnqueueWriteBuffer rate?


      Thanks for reading


      Edit: Updated the first pseudocode and put memcpy in the right place