I'm just getting started with OpenCL programming, so please be gentle!
I have written a simple image resizing kernel using JOCL (Java OpenCL bindings), and I'm running it on an AMD HD 7970 GHz with the latest Catalyst drivers (9.002-120928m-149042C-ATI). The code works fine, but it is fairly slow: ~100 ms to resize a 6299x4725 image to 1/4 of its size. After running JProfiler, I find that 99% of those 100 ms is spent in clEnqueueReadImage(). What could be causing this?
I have also tried enabling OpenCL profiling and reading the results using CL_PROFILING_COMMAND_START/CL_PROFILING_COMMAND_END, but (end - start) gives me 9506814 ns (~9.5 ms), which does not match the ~100 ms I'm seeing in JProfiler. Could this indicate a bug in JOCL?
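To make the mismatch concrete, here is a minimal sketch of the arithmetic behind the two numbers above. It is plain Java with no JOCL calls. The class and method names (`ProfilingBreakdown`, `nanosToMillis`, `unaccountedMillis`) are hypothetical helpers made up for illustration, and the 100 ms wall-clock figure is the approximate JProfiler value, not an exact measurement.

```java
// Hypothetical helper (not part of JOCL): compares the device-side duration
// reported by OpenCL profiling with the wall-clock time seen in a Java profiler.
final class ProfilingBreakdown {

    /** Converts a duration in nanoseconds (the unit OpenCL profiling uses) to milliseconds. */
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    /**
     * Returns the part of the wall-clock time that is NOT covered by the
     * interval between two profiling timestamps, in milliseconds.
     */
    static double unaccountedMillis(double wallClockMillis, long startNanos, long endNanos) {
        return wallClockMillis - nanosToMillis(endNanos - startNanos);
    }

    public static void main(String[] args) {
        // (end - start) as reported in the question; only the difference is
        // known, so the start timestamp is set to 0 here as an assumption.
        long startNanos = 0L;
        long endNanos = 9_506_814L;

        // Approximate time spent in clEnqueueReadImage() according to JProfiler.
        double wallClockMillis = 100.0;

        System.out.printf("Device-side read: %.3f ms%n",
                nanosToMillis(endNanos - startNanos));
        System.out.printf("Not covered by START..END: %.3f ms%n",
                unaccountedMillis(wallClockMillis, startNanos, endNanos));
    }
}
```

The same helper could be applied to the CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT timestamps of the same event, to see whether the remaining time falls between enqueueing the read and the point where the device actually starts executing it.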
Screenshot from JProfiler: