Archives Discussions

stgatilov · 10-15-2011

I have an image2D that was allocated with the CL_MEM_USE_PERSISTENT_MEM_AMD flag. I map this image and memcpy data to the returned pointer. However, after I unmap the image, the kernel fetches wrong data from it.

I do not see any API errors in the APP Profiler trace. Also, the row pitch returned by the map command exactly matches the size of a single image row in bytes.
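Because the rows of a mapped image are "pitch" bytes apart, a row-by-row copy should advance the destination by the returned pitch and the packed source by the row size, and it should do that arithmetic on byte pointers rather than on float pointers (where adding n moves n * sizeof(float) bytes). The following is a minimal plain C++ sketch that only models the copy into an already-mapped pointer; the helper name copyRowsToPitched is hypothetical and no OpenCL calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `rowBytes` bytes each from a tightly packed source
// into a destination whose rows start `dstPitch` bytes apart (for example,
// the row pitch reported by a map call). All pointer arithmetic is in bytes.
void copyRowsToPitched(void* dst, std::size_t dstPitch,
                       const void* src, std::size_t rowBytes,
                       std::size_t rows) {
    assert(dstPitch >= rowBytes);  // a row must fit within one pitch
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(d, s, rowBytes);
        d += dstPitch;  // destination (mapped image) advances by the pitch
        s += rowBytes;  // packed host source advances by the row size
    }
}
```

When the pitch equals the row size, as reported above, the result is the same as one large memcpy; the loop only matters when the runtime pads rows.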

The problem disappears if I do ANY of the following:

1. Remove the CL_MEM_USE_PERSISTENT_MEM_AMD flag.

2. Replace CL_MEM_USE_PERSISTENT_MEM_AMD with CL_MEM_ALLOC_HOST_PTR.

3. Use a buffer instead of an image (the code is very different in this case).

The first two options lose the zero-copy behavior, and the last one loses texture caching in the kernel.

The device is an A8-3850 APU on a remote server. Unfortunately, I don't have an AMD GPU at home.

I suppose mapping device-resident, host-visible images is currently unsupported?

What I actually need is to overlap copying data from malloc'ed host memory into a device image2D with kernel execution.

//...............................
//create image in host-visible device memory
inputImage = cl::Image2D(context, CL_MEM_READ_ONLY | CL_MEM_USE_PERSISTENT_MEM_AMD,
                         cl::ImageFormat(CL_A, CL_FLOAT), WIDTH, HEIGHT, 0, 0, &err);
//...............................
//map the whole image for writing
cl::Event evMap;
cl::size_t<3> origin, size;
size.push_back(WIDTH); size.push_back(HEIGHT); size.push_back(1);
origin.push_back(0); origin.push_back(0); origin.push_back(0);
void* mappedPtr = pQueue->enqueueMapImage(inputImage, CL_FALSE, CL_MAP_WRITE, origin, size,
                                          &inputImagePitch, 0, 0, &evMap);
evMap.wait();
//...............................
//copy data to mapped image (k lines, sz bytes each):
//image rows are inputImagePitch bytes apart, host rows are packed
unsigned char* dst = static_cast<unsigned char*>(mappedPtr);
const unsigned char* src = reinterpret_cast<const unsigned char*>(srcPtr);
for (int i = 0; i < k; i++) {
    memcpy(dst, src, sz);
    dst += inputImagePitch;  //next row of the mapped image
    src += sz;               //next row of the packed host data
}
//...............................
//unmap the image with the pointer returned by the map call
cl::Event evUnmap;
pQueue->enqueueUnmapMemObject(inputImage, mappedPtr, 0, &evUnmap);
evUnmap.wait();
//...............................
//run the kernel
devProcessBlockKernel.setArg(0, inputImage);
devProcessBlockKernel.setArg(1, outBuffer);
cl::Event evKernel;
pQueue->enqueueNDRangeKernel(devProcessBlockKernel, cl::NullRange, OverallThreads, ThreadsInBlock, 0, &evKernel);
//...............................
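Overlapping host copies with kernel execution is commonly done with double buffering (ping-pong): while the kernel reads one staging area, the host fills the other with the next block. The sketch below models only that schedule on the host, under loudly stated assumptions: std::async stands in for the kernel launch, future::get() stands in for waiting on the kernel's event, and processBlock and runDoubleBuffered are hypothetical names. In OpenCL terms the two staging areas would be two images or buffers, each mapped, filled, and unmapped while the kernel works on the other.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the device kernel: reduces one block to a sum.
float processBlock(const std::vector<float>& block) {
    return std::accumulate(block.begin(), block.end(), 0.0f);
}

// Ping-pong schedule: "kernel" works on staging[cur] while the host
// copies the next block into staging[1 - cur].
std::vector<float> runDoubleBuffered(const std::vector<std::vector<float>>& blocks) {
    std::vector<float> results;
    if (blocks.empty()) return results;
    std::vector<float> staging[2];
    staging[0] = blocks[0];  // prime the first staging area
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const std::size_t cur = i % 2;
        // Launch asynchronous work on the current staging area.
        std::future<float> done =
            std::async(std::launch::async, processBlock, std::cref(staging[cur]));
        // Overlap: fill the other staging area while the work runs.
        if (i + 1 < blocks.size()) staging[1 - cur] = blocks[i + 1];
        // Wait before staging[cur] is refilled in the next iteration.
        results.push_back(done.get());
    }
    assert(results.size() == blocks.size());
    return results;
}
```

The key constraint is that a staging area is refilled only after the work reading it has finished; here each iteration calls get() before the following iteration overwrites the area that was just processed.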

stgatilov · 10-15-2011

Eventually I went with the third option and switched to buffers entirely, without even rewriting anything. Surprisingly enough, I don't see any performance difference =). Perhaps images and buffers benefit from the cache equally well.

Anyway, I am still curious whether mapping a host-visible 2D image in device memory is supported...

By the way: AMD APP SDK 2.5, Windows 7 64-bit.
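Moving from an image to a buffer also means the kernel no longer has a sampler, so addressing and edge handling become the kernel's job. The thread does not show the kernel, so this is only a plain C++ model of one possibility: a fetch from a row-major buffer with a row pitch given in elements, clamping out-of-range coordinates the way a sampler with CLK_ADDRESS_CLAMP_TO_EDGE would. The name fetchClampToEdge is hypothetical, and whether the original kernel relied on clamping at all is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Buffer-based replacement for an image fetch: reads element (x, y) of a
// width x height float array stored row by row, `pitch` elements per row.
// Out-of-range coordinates are clamped to the nearest edge element.
float fetchClampToEdge(const float* data, long width, long height,
                       long pitch, long x, long y) {
    assert(width > 0 && height > 0 && pitch >= width);
    x = std::min(std::max(x, 0L), width - 1);
    y = std::min(std::max(y, 0L), height - 1);
    return data[y * pitch + x];
}
```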

genaganna · 10-25-2011

Stgatilov,

There is a known issue in SDK 2.5 related to this. Are WIDTH and HEIGHT powers of 2? Try power-of-2 sizes for both WIDTH and HEIGHT.
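One way to try this suggestion is to allocate the image with each dimension rounded up to the next power of two and have the kernel read only the original region. The helpers below are a sketch with hypothetical names; whether padding actually avoids the SDK 2.5 issue is only what the reply suggests, and the thread never confirms it.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// True if n is a power of two (1, 2, 4, ...).
bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two >= n, e.g. to pad an image dimension.
std::size_t nextPowerOfTwo(std::size_t n) {
    // Guard: the result must be representable in size_t.
    assert(n <= (std::numeric_limits<std::size_t>::max() >> 1) + 1);
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

For a 1920x1080 image both dimensions round up to 2048, so the padded 2048x2048 image holds about twice as many texels (4,194,304 instead of 2,073,600).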

stgatilov · 10-25-2011

Thank you for reply!

The image size was 1920x1080.

I ran into this bug during the AMD APP performance challenge. The competition is over now, and I no longer have access to an AMD GPU.

Since this issue is already known, the topic can be closed =)

Mapping image allocated with CL_MEM_USE_PERSISTENT_MEM_AMD