3 Replies Latest reply on Oct 25, 2011 9:27 AM by stgatilov

    Mapping image allocated with CL_MEM_USE_PERSISTENT_MEM_AMD

    stgatilov

      I have an image2D that was allocated with the CL_MEM_USE_PERSISTENT_MEM_AMD flag. I map this image and memcpy data to the returned pointer. However, after I unmap the image, the kernel fetches wrong data from it.

      I do not see any API errors in the APP Profiler trace. Also, the row pitch returned by the map command exactly matches the size of a single image row in bytes.

      The problem disappears if I do ANY of the following:

      1. Remove the CL_MEM_USE_PERSISTENT_MEM_AMD flag.

      2. Replace CL_MEM_USE_PERSISTENT_MEM_AMD with CL_MEM_ALLOC_HOST_PTR.

      3. Use a buffer instead of an image (the code is very different in this case).

      The first two options give up the zero-copy behavior, and the last one gives up texture caching in the kernel.

      The device is an A8-3850 APU on a remote server. Unfortunately, I don't have an AMD GPU at home.

      I suppose mapping device-resident, host-visible images is currently unsupported?

      What I actually need is to overlap copying data from malloc'ed host memory into a device image2D with kernel execution.
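The overlap described above is usually done with double buffering: while the kernel reads one staging object, the host fills the other one with the next block. The sketch below only models that scheduling on the CPU and uses no OpenCL API. It assumes `std::async` stands in for enqueueing the kernel, `std::future::get()` stands in for waiting on the kernel's event, and `processBlock` is a hypothetical placeholder for the kernel itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the device kernel: reduces one block to a sum.
static double processBlock(const std::vector<float> &block) {
    return std::accumulate(block.begin(), block.end(), 0.0);
}

// Double buffering with two staging slots: while an asynchronous task
// (the "kernel") reads one slot, the host copies the next block into the other.
double processAllBlocks(const std::vector<float> &host, std::size_t blockSize) {
    assert(blockSize > 0 && host.size() % blockSize == 0);
    const std::size_t numBlocks = host.size() / blockSize;
    std::vector<float> staging[2] = {std::vector<float>(blockSize),
                                     std::vector<float>(blockSize)};
    std::future<double> inFlight[2];  // default-constructed: not valid yet
    double total = 0.0;

    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t slot = b % 2;
        // Wait for the task that last read this slot before overwriting it.
        if (inFlight[slot].valid()) total += inFlight[slot].get();
        std::memcpy(staging[slot].data(), host.data() + b * blockSize,
                    blockSize * sizeof(float));
        const std::vector<float> *buf = &staging[slot];
        inFlight[slot] = std::async(std::launch::async,
                                    [buf] { return processBlock(*buf); });
    }
    // Drain whatever is still running before the staging slots go away.
    for (std::future<double> &f : inFlight)
        if (f.valid()) total += f.get();
    return total;
}
```

In OpenCL terms, each slot would be its own image or buffer, and the wait before reuse corresponds to waiting on the event of the kernel that last read that object.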


      //...............................
      //create image in host-visible device memory
      inputImage = cl::Image2D(context, CL_MEM_READ_ONLY | CL_MEM_USE_PERSISTENT_MEM_AMD,
                               cl::ImageFormat(CL_A, CL_FLOAT), WIDTH, HEIGHT, 0, 0, &err);
      //...............................
      //map the whole image for writing
      cl::Event evMap;
      cl::size_t<3> origin, size;
      size.push_back(WIDTH); size.push_back(HEIGHT); size.push_back(1);
      origin.push_back(0); origin.push_back(0); origin.push_back(0);
      void *mappedPtr = pQueue->enqueueMapImage(inputImage, CL_FALSE, CL_MAP_WRITE, origin, size,
                                                &inputImagePitch, 0, 0, &evMap);
      evMap.wait();
      //...............................
      //copy data to mapped image (k rows, sz bytes each);
      //byte pointers keep both strides in bytes
      char *dstPtr = (char*)mappedPtr;
      const char *src = (const char*)srcPtr;
      for (int i = 0; i < k; i++) {
          memcpy(dstPtr, src, sz);
          dstPtr += inputImagePitch;  //destination rows are inputImagePitch bytes apart
          src += sz;                  //source rows are tightly packed
      }
      //...............................
      //unmap the image, passing the pointer returned by enqueueMapImage
      cl::Event evUnmap;
      pQueue->enqueueUnmapMemObject(inputImage, mappedPtr, 0, &evUnmap);
      evUnmap.wait();
      //...............................
      //run the kernel
      devProcessBlockKernel.setArg(0, inputImage);
      devProcessBlockKernel.setArg(1, outBuffer);
      cl::Event evKernel;
      pQueue->enqueueNDRangeKernel(devProcessBlockKernel, cl::NullRange, OverallThreads,
                                   ThreadsInBlock, 0, &evKernel);
      //...............................
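The copy step above has to respect two different row strides: rows in the mapped image start `inputImagePitch` bytes apart, while rows in the packed source are `sz` bytes apart. A minimal sketch of a pitch-aware row copy as a standalone helper follows; the name `copyRowsToPitched` is hypothetical and not part of any OpenCL API. It walks byte pointers so that every offset is in bytes regardless of the pixel type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `rowBytes` bytes from a tightly packed source into a
// destination whose rows start `dstPitch` bytes apart (for example, the
// pointer and row pitch returned when an image is mapped).
void copyRowsToPitched(void *dst, std::size_t dstPitch,
                       const void *src, std::size_t rowBytes,
                       std::size_t rows) {
    assert(dstPitch >= rowBytes);  // a row must fit within one pitch
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(d, s, rowBytes);
        d += dstPitch;  // next row of the mapped image
        s += rowBytes;  // next row of the packed source
    }
}
```

With the base pointer and row pitch returned by the map call, the loop reduces to one call such as `copyRowsToPitched(ptr, inputImagePitch, srcPtr, sz, k)`, where `ptr` is that base pointer; the same base pointer must later be passed to the unmap call.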

          stgatilov

          Eventually I went with the third option and switched to buffers completely, without even rewriting anything. Surprisingly enough, I don't see any performance difference =). Perhaps images and buffers benefit from the cache equally well.

          Still, I am curious whether mapping a host-visible 2D image in device memory is supported at all...

          By the way, this is AMD APP SDK 2.5 on Windows 7 64-bit.

              genaganna

              Stgatilov,

              There is a known issue with this in SDK 2.5. Are WIDTH and HEIGHT powers of 2? Try power-of-2 sizes for both WIDTH and HEIGHT.
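To try the suggested workaround while keeping arbitrary data sizes, one option (an assumption, not something confirmed to avoid the SDK 2.5 issue) is to allocate the image with WIDTH and HEIGHT rounded up to the next power of two and copy the real rows into its top-left region using the row pitch reported by the map call. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// True if n is a nonzero power of two (exactly one bit set).
constexpr bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two >= n. Requires 1 <= n <= the largest power of two
// representable in std::size_t, otherwise the shift would overflow.
std::size_t nextPowerOfTwo(std::size_t n) {
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

For example, a 1000 x 600 image would be allocated as 1024 x 1024, and the kernel would simply ignore the padded texels.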