5 Replies Latest reply on Jan 11, 2013 1:03 PM by oddbjornk

    clEnqueueReadImage() slow on fairly small images




      I'm just getting started with OpenCL programming, so please be gentle!


      I have written a simple image resizing kernel using JOCL (Java OpenCL bindings) and I'm running it on an AMD HD 7970 GHz with the latest Catalyst drivers (9.002-120928m-149042C-ATI). The code works fine, but it is fairly slow: ~100ms to resize a 6299x4725 image to a quarter of its size. After running JProfiler I find that 99% of the 100ms is spent in clEnqueueReadImage(). What could be causing this?


      I have also tried enabling OpenCL profiling and reading the results using CL_PROFILING_COMMAND_START/CL_PROFILING_COMMAND_END, but (end-start) gives me 9506814 ns (~9.5ms), which does not match the results I'm getting in JProfiler. Could this indicate a bug in JOCL?
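
As a rough sanity check on that figure, the arithmetic can be sketched as below. Several things here are assumptions not stated in the post: that the profiled event belongs to the read command, that the output uses 4 bytes per pixel (for example 8-bit RGBA), and that "a quarter of its size" means half the width and half the height. The class and method names are made up for illustration.

```java
final class ReadbackEstimate {
    private ReadbackEstimate() {}

    /** Size in bytes of a width x height image with the given bytes per pixel. */
    public static long imageBytes(long width, long height, int bytesPerPixel) {
        return width * height * bytesPerPixel;
    }

    /** Converts an OpenCL profiling delta (nanoseconds) to milliseconds. */
    public static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    /** Throughput in GB/s (1 GB = 1e9 bytes) implied by moving bytes in nanos. */
    public static double gigabytesPerSecond(long bytes, long nanos) {
        return (double) bytes / nanos; // bytes per nanosecond equals GB per second
    }

    public static void main(String[] args) {
        long deltaNs = 9_506_814L; // (end - start) reported in the post

        // ASSUMPTION: output is half the source width and height, 4 bytes per pixel.
        long outBytes = imageBytes(6299 / 2, 4725 / 2, 4);

        System.out.printf("event duration: %.2f ms%n", nanosToMillis(deltaNs));
        System.out.printf("output image:   %.1f MB%n", outBytes / 1e6);
        System.out.printf("implied rate:   %.2f GB/s%n", gigabytesPerSecond(outBytes, deltaNs));
    }
}
```

Under those assumptions the output image is just under 30 MB, so the implied rate comes out at around 3 GB/s. With a different pixel format or scaling factor the numbers change accordingly.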


      Code here: https://github.com/oddbjornkvalsund/opencl-img/blob/master/src/main/java/no/nixx/opencl/ImageResizer.java


      [Screenshot from JProfiler; image not available]




        • Re: clEnqueueReadImage() slow on fairly small images

          Hi oddbjornk,

          I have not worked with JOCL, but here are a few basic pointers which can be helpful:

          1. I have faced some issues with event profiling myself. It is better to use standard system timers for time measurement.

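          2. Call clFinish(queue) immediately before and after clEnqueueNDRangeKernel, and time it in this order: clFinish(queue), start the timer, clEnqueueNDRangeKernel, clFinish(queue), stop the timer.

               Also try something similar to measure the clEnqueueReadImage time.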

          3. 10 ms corresponds to a data transfer of about 40 MB at a decent rate of 4 GB/s. That might be the size of your image, in which case the OpenCL profiler is reporting the transfer time correctly.

          4. I do not see any kernel at the above link, but maybe the kernel is not very compute intensive. Please share the kernel too.
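
The timing pattern from point 2 (drain the queue, start the clock, enqueue the work, drain again, stop the clock) can be sketched as a small helper. This is only an illustration: `QueueTimer`, `timeNanos` and the `drainQueue` parameter are hypothetical names, and the stand-in lambdas in `main` exist so the sketch runs without a GPU. With JOCL, `drainQueue` would presumably wrap `CL.clFinish(queue)` and `work` would wrap the enqueue call being measured.

```java
final class QueueTimer {
    private QueueTimer() {}

    /**
     * Returns the wall-clock time of work in nanoseconds. The queue is
     * drained before the clock starts and again before it stops, so that
     * earlier commands are excluded and the measured command has completed.
     */
    public static long timeNanos(Runnable drainQueue, Runnable work) {
        drainQueue.run();                 // finish anything enqueued earlier
        long start = System.nanoTime();
        work.run();                       // e.g. enqueue the kernel or the image read
        drainQueue.run();                 // wait until that command has completed
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-ins (assumptions): with JOCL, drain would be something like
        // () -> CL.clFinish(queue), and work would call clEnqueueNDRangeKernel
        // or clEnqueueReadImage with the real arguments.
        Runnable drain = () -> { };
        Runnable work = () -> {
            try {
                Thread.sleep(20);         // simulated workload
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long elapsed = timeNanos(drain, work);
        System.out.printf("elapsed: %.1f ms%n", elapsed / 1e6);
    }
}
```

Timing the kernel and the read separately this way should show whether the time attributed to clEnqueueReadImage is really the transfer, or kernel execution that a blocking read ends up waiting for.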


          Hope it helps.

          • Re: clEnqueueReadImage() slow on fairly small images

            Thanks guys! I'll look into this tomorrow and report my findings. I suspect I will have to try clEnqueueMapImage & friends, get some proper benchmarks and determine what's going on.


            Thanks again!