4 Replies Latest reply on Apr 16, 2015 2:23 AM by skanur

    Mapping device memory

    skanur

      Hello all,

       

      While working on my problem, I came across an interesting phenomenon that I'm trying to understand. Basically, I create a pinned host buffer and transfer data from host to device using clEnqueueWriteBuffer. I get a data rate of about 6 GB/s on a Kaveri CPU with a Hawaii GPU connected over a PCIe 3 bus. This is the maximum, as verified by AMD's BufferBandwidth sample. To illustrate the measurement, here is the pseudocode:

       

      // Create device and pinned host memory
      cl_mem dmem = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_float) * size, NULL, &err); // Error checks are done, but not shown here
      cl_mem pinned_hmem = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, size * sizeof(cl_float), NULL, &err);
      cl_float *transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_WRITE, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
      memcpy(transfer_data, data, sizeof(cl_float) * size); // "data" consists of pre-defined stuff
      clEnqueueUnmapMemObject(commands, pinned_hmem, (void*) transfer_data, 0, NULL, NULL);
      // map again as read only
      transfer_data = (cl_float*) clEnqueueMapBuffer(commands, pinned_hmem, CL_TRUE, CL_MAP_READ, 0, size * sizeof(cl_float), 0, NULL, NULL, &err);
      clFinish(commands);
      
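      startTimer();
      // This is repeated for a few iterations and the average is calculated
      err = clEnqueueWriteBuffer(commands, dmem, CL_FALSE, 0, sizeof(cl_float) * size, transfer_data, 0, NULL, NULL);
      clFinish(commands); // The write is non-blocking, so wait for it to complete before stopping the timer
      endTimer(); // Calculate the transfer rate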
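
A minimal sketch of how the timing and the transfer-rate calculation above could be done, assuming a C11 toolchain. The names now_seconds and transfer_rate_gbps are hypothetical helpers for illustration, not part of the original code or of the OpenCL API, and 1 GB is taken as 1e9 bytes. It uses timespec_get, which reads the wall clock rather than a monotonic clock, so treat it as a rough measurement only.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: current wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: average transfer rate in GB/s (1 GB = 1e9 bytes)
 * for `iterations` transfers of `bytes_per_transfer` bytes each that
 * together took `elapsed_seconds`. Returns 0.0 for invalid input. */
static double transfer_rate_gbps(size_t bytes_per_transfer, int iterations,
                                 double elapsed_seconds)
{
    if (iterations <= 0 || elapsed_seconds <= 0.0) {
        return 0.0;
    }
    return ((double)bytes_per_transfer * (double)iterations)
           / elapsed_seconds / 1e9;
}
```

With helpers like these, startTimer() would record now_seconds(), endTimer() would subtract that from a second reading, and the rate would be transfer_rate_gbps(sizeof(cl_float) * size, iterations, elapsed). The timed region has to include a synchronization point such as clFinish, otherwise only the enqueue cost is measured.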

       

      However, if I map the device buffer and copy the data with memcpy instead of calling clEnqueueWriteBuffer, I get a data rate of only about 2.2 GB/s. I'm trying to understand where this discrepancy comes from. Here is the pseudocode:

       

      // Creation of device and pinned host memory remains same as above
      
      startTimer();
      // This too is averaged over a few iterations
      void *mapped_dmem = clEnqueueMapBuffer(commands, dmem, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float) * size, 0, NULL, NULL, &err);
      memcpy(mapped_dmem, transfer_data, sizeof(cl_float) * size);
      clEnqueueUnmapMemObject(commands, dmem, mapped_dmem, 0, NULL, NULL);
      clFinish(commands); // The unmap is non-blocking, so wait for it to complete before stopping the timer
      endTimer(); // Calculate the transfer rate

       

      Could someone explain why the transfer rate with the mapped buffer is so much lower (about 2.2 GB/s versus 6 GB/s)?

       

      Thanks for reading

       

      Edit: Updated the first pseudocode and put the memcpy in the right place