    clEnqueueReadImage() slow on fairly small images




      I'm just getting started with OpenCL programming, so please be gentle!


      I have written a simple image resizing kernel using JOCL (Java OpenCL bindings) and I'm running it on an AMD HD 7970 GHz with the latest Catalyst drivers (9.002-120928m-149042C-ATI). The code works fine, but it is fairly slow: ~100ms to resize a 6299x4725 image to 1/4 of its size. After running JProfiler I find that 99% of the 100ms is spent in clEnqueueReadImage(). What could be causing this?


      I have also tried enabling OpenCL profiling and reading the results using CL_PROFILING_COMMAND_START/CL_PROFILING_COMMAND_END, but (end-start) gives me 9506814 ns (~9.5ms), which does not match the results I'm getting from JProfiler. Could this indicate a bug in JOCL?


      Code here: https://github.com/oddbjornkvalsund/opencl-img/blob/master/src/main/java/no/nixx/opencl/ImageResizer.java


          Hi oddbjornk,

          I have not worked with JOCL, but here are a few basic pointers which can be helpful:

          1. I have faced some issues with event profiling myself. It is usually safer to use standard system timers for time measurement.

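          2. Do a clFinish(queue) call before and after the clEnqueueNDRangeKernel, and read the timer right after the first clFinish and right after the second one. Enqueue calls can return before the work has completed, so without the clFinish calls the kernel's run time may be attributed to the next blocking call.

               Also try something similar to measure the clEnqueueReadImage time.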

          3. At a decent rate of 4 GB/s, 10ms corresponds to a data transfer of about 40 MB. That might be the size of your image, in which case the OpenCL event profiling is reporting the transfer time correctly.

          4. I do not see any kernel at the above link, but maybe the kernel is not very compute-intensive. Please share the kernel too.
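
          As a rough illustration of points 2 and 3, here is a minimal Java sketch. It does not call OpenCL itself: the `sync` hook stands in for clFinish(queue) (with JOCL that would presumably be something like `() -> CL.clFinish(queue)`), and the class and method names are made up for this example. The 4 bytes per pixel (8-bit RGBA) figure is an assumption, since the image format is not stated in the thread.

```java
// Hypothetical helper, not part of JOCL or of the code linked above.
class TransferTiming {

    /** Size in bytes of an uncompressed image. */
    static long imageBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Expected transfer time in ms at the given rate (GB/s, with 1 GB = 1e9 bytes). */
    static double transferMillis(long bytes, double gigabytesPerSecond) {
        return bytes / (gigabytesPerSecond * 1e9) * 1e3;
    }

    /**
     * Wall-clock time of `work`, bracketed by `sync` calls.
     * With OpenCL, `sync` would be a clFinish on the command queue.
     */
    static double timeMillis(Runnable sync, Runnable work) {
        sync.run();                      // drain anything queued earlier
        long start = System.nanoTime();
        work.run();                      // e.g. enqueue the kernel or the image read
        sync.run();                      // wait until that work has really finished
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) {
        // Assumption: 4 bytes per pixel (8-bit RGBA).
        long fullImage = imageBytes(6299, 4725, 4);
        System.out.println("Full image bytes: " + fullImage);
        System.out.println("Estimated ms at 4 GB/s: " + transferMillis(fullImage, 4.0));

        // Stand-in workload; replace both lambdas with real OpenCL calls.
        double ms = timeMillis(() -> { }, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        });
        System.out.println("Measured ms: " + ms);
    }
}
```

          Timing the kernel and the image read separately this way should show whether the ~100ms really belongs to the read, or whether the read is just the first call that waits for the kernel.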


          Hope it helps.

            Thanks guys! I'll look into this tomorrow and report my findings. I suspect I will have to try clEnqueueMapImage & friends, get some proper benchmarks and determine what's going on.


            Thanks again!