10 Replies Latest reply on Jan 12, 2017 9:01 AM by boxerab

    Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour

    boxerab

      Windows 10

      Latest Crimson Driver

      RX 470

       

      I created a 1024x1024 image and mapped it with the CL_MAP_WRITE flag,

      but I only mapped a 64x64 region in the image.

       

      Using the returned pointer, I filled the entire 1024x1024 image, and called clEnqueueUnmapMemObject.

       

      I expected only the 64x64 region to be copied from host to device, but in fact the entire 1024x1024 region gets copied.

       

      Is this expected behaviour? This doesn't seem very efficient to me: if I only map a 64x64 region, then only this region

      should get copied to the card.

       

      Thanks,

      Aaron

        • Re: Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour
          dipak

          Hi Aaron,

          Thanks for reporting it.

          May I know how the image was created? Was it in pinned host memory? It would be really helpful if you could share a code snippet that reproduces the above behaviour.

           

          Regards,

            • Re: Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour
              boxerab

              Thanks, Dipak. Here is the creation and mapping code.

              So, it looks like the image dimensions are ignored by clEnqueueMapImage,

              and the entire image is always mapped.

               

               

               

               

              // create image
              cl_mem_flags flags = CL_MEM_ALLOC_HOST_PTR;
              flags |= CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY;

              cl_image_desc desc;
              desc.image_type = CL_MEM_OBJECT_IMAGE2D;
              desc.image_width = 540;
              desc.image_height = 1080;
              desc.image_depth = 0;
              desc.image_array_size = 0;
              desc.image_row_pitch = 0;
              desc.image_slice_pitch = 0;
              desc.num_mip_levels = 0;
              desc.num_samples = 0;
              desc.buffer = NULL;

              cl_image_format format;
              format.image_channel_order = CL_RGBA;
              format.image_channel_data_type = CL_UNSIGNED_INT32;

              // NOTE: per the OpenCL spec, hostBuffer must be NULL here, since neither
              // CL_MEM_USE_HOST_PTR nor CL_MEM_COPY_HOST_PTR is set in flags
              cl_mem image = clCreateImage(ocl->context, flags, &format, &desc, hostBuffer, &error_code);
              if (CL_SUCCESS != error_code) {
                  Util::LogError("Error: clCreateImage (CL_QUEUE_CONTEXT) returned %s.\n", Util::TranslateOpenCLError(error_code));
              }


              // map region of image
              cl_int error_code = CL_SUCCESS;
              size_t image_dimensions[3] = { 64, 64, 1 };
              size_t image_origin[3] = { 0, 0, 0 };
              size_t image_row_pitch;

              *mappedPtr = clEnqueueMapImage(queue,
                  img,
                  CL_TRUE,             // blocking map
                  CL_MAP_WRITE,
                  image_origin,
                  image_dimensions,
                  &image_row_pitch,
                  NULL,                // slice pitch, unused for a 2D image
                  0,
                  NULL,
                  NULL,
                  &error_code);
              if (CL_SUCCESS != error_code) {
                  Util::LogError("Error: clEnqueueMapImage returned %s.\n", Util::TranslateOpenCLError(error_code));
              }
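
For anyone following the mapping code above: the pointer returned by clEnqueueMapImage points at the origin of the mapped region, and the OpenCL spec leaves accesses outside that region undefined, so the host should only write the 64x64 rows it asked for, stepping by the returned row pitch. Below is a minimal sketch of that addressing in plain C. The helper name fill_mapped_region is hypothetical (it is not from the post or from OpenCL), and the 16 bytes per pixel assumes the CL_RGBA / CL_UNSIGNED_INT32 format used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: CL_RGBA with CL_UNSIGNED_INT32 channels, i.e. 4 x 4 bytes. */
enum { BYTES_PER_PIXEL = 16 };

/* Hypothetical helper: write one RGBA value to every pixel of a mapped
 * width x height region and nothing else.
 *   mapped    - pointer returned by the map call (region origin)
 *   row_pitch - row pitch in bytes returned by the map call */
static void fill_mapped_region(void *mapped, size_t row_pitch,
                               size_t width, size_t height,
                               const uint32_t rgba[4])
{
    unsigned char *base = (unsigned char *)mapped;

    /* A row of the region must fit inside one pitch-sized row. */
    assert(row_pitch >= width * BYTES_PER_PIXEL);

    for (size_t y = 0; y < height; ++y) {
        /* Rows are row_pitch bytes apart, which may exceed width * pixel size. */
        unsigned char *row = base + y * row_pitch;
        for (size_t x = 0; x < width; ++x) {
            memcpy(row + x * BYTES_PER_PIXEL, rgba, BYTES_PER_PIXEL);
        }
    }
}
```

With the snippet above this would be called as fill_mapped_region(*mappedPtr, image_row_pitch, 64, 64, value) between the map and the unmap.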

              • Re: Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour
                boxerab

                Dipak,

                I can confirm that the problem is CL_MEM_ALLOC_HOST_PTR.

                I took the ImageBandwidth sample from the APP SDK and made some small modifications to the program:

                created an 8192x8192 image for writing, set the map region to 64x64, and looked at the timing for the unmap

                call.

                 

                Without CL_MEM_ALLOC_HOST_PTR, unmap is very fast, as expected from such a small region.

                 

                With CL_MEM_ALLOC_HOST_PTR, unmap takes the same time as with map region set to 8192x8192.

                 

                So, there is a bug with region mapping when CL_MEM_ALLOC_HOST_PTR is set on image creation.

                 

                Cheers,

                Aaron
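
To put the timing difference above in perspective, here is a rough size calculation in plain C. The helper region_bytes is hypothetical, and the 16 bytes per pixel is an assumption carried over from the CL_RGBA / CL_UNSIGNED_INT32 format in the earlier snippet (the format used in the modified ImageBandwidth run is not stated). The ratio between the two sizes does not depend on the pixel size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes covered by a width x height pixel region. */
static size_t region_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* How many times larger the full image is than the mapped region. */
static size_t full_to_region_ratio(size_t image_w, size_t image_h,
                                   size_t region_w, size_t region_h)
{
    assert(region_w > 0 && region_h > 0);
    return (image_w * image_h) / (region_w * region_h);
}
```

Under that assumption a 64x64 region is 64 KiB while the full 8192x8192 image is 1 GiB, a factor of 16384, which is consistent with unmap being very fast in one case and slow in the other.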

                  • Re: Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour
                    dipak

                    Hi Aaron,

                    Sorry for this delayed reply.

                    I've learned that the above behaviour is expected. CL_MEM_ALLOC_HOST_PTR means the image is allocated in system memory, so there is no such thing as a “region” transfer in this case. It’s the same memory as seen by the CPU or GPU.

                     

                    Regards,

                      • Re: Strange clEnqueueMapImage/clEnqueueUnmapMemObject behaviour
                        boxerab

                        Thanks, Dipak. But, it still sounds like a bug to me, or perhaps a missed opportunity?

                         

                        For a discrete GPU, there is host memory and there is device memory. When I enqueue an unmap,

                        even with the CL_MEM_ALLOC_HOST_PTR flag, memory is transferred from host to device.

                        So, it must be possible to specify a region in this case.

                         

                        If the driver always transfers the entire image, then this is very inefficient.

                         

                        For an APU, the situation is different. But for a dGPU, it still makes sense to transfer only a region, even with CL_MEM_ALLOC_HOST_PTR.