ilmarih
Journeyman III

Optimizing a repetitive dot product between two float4 images

Hi, I'm trying to optimize an image correlation algorithm that computes dot products between the different intersections of two float4 images. On the CPU, the OpenCL version reaches ~90% of L1 bandwidth. On an HD 4850 I'm seeing 70% of L1 bandwidth with GLSL and around 50% with OpenCL (I suppose it's slower because the OpenCL array fetches are uncached, so I have to resort to terrible caching hacks). I'm wondering how the 5xxx series handles this algorithm, and whether it's possible to reach 90% of L1 bandwidth (or more) on the GPU.
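For anyone who wants the operation spelled out, here is a rough plain-C reference of what I mean by a dot product over one intersection. It is only a sketch, not the code from the repo: the `float4` struct, `dot4`, `correlate_at`, the row-major layout and the equal image sizes are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for an OpenCL float4 pixel. */
typedef struct { float x, y, z, w; } float4;

/* Four-component dot product of two pixels. */
static float dot4(float4 a, float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/*
 * Sum of per-pixel dot products over the region where image b,
 * shifted by (dx, dy), overlaps image a. Both images are assumed
 * to be w x h pixels, stored row-major. An empty overlap gives 0.
 */
static float correlate_at(const float4 *a, const float4 *b,
                          int w, int h, int dx, int dy) {
    assert(a != 0 && b != 0 && w > 0 && h > 0);

    /* Bounds of the overlap, in the coordinates of image a. */
    int x0 = dx > 0 ? dx : 0;
    int y0 = dy > 0 ? dy : 0;
    int x1 = dx < 0 ? w + dx : w;
    int y1 = dy < 0 ? h + dy : h;

    float sum = 0.0f;
    for (int y = y0; y < y1; y++) {
        for (int x = x0; x < x1; x++) {
            /* Pixel (x, y) of a lines up with pixel (x - dx, y - dy) of b. */
            sum += dot4(a[y * w + x], b[(y - dy) * w + (x - dx)]);
        }
    }
    return sum;
}
```

Calling `correlate_at` once per offset (dx, dy) gives the full correlation. Every offset rereads most of both images, which is why the whole thing ends up bound by cache bandwidth rather than arithmetic.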

I have the code up at github: 

http://github.com/kig/correlate_opencl

correlate.cl is the GPU kernel, correlate2.cl is the CPU OpenCL kernel, correlate.fs is the GLSL kernel, correlate_naive.cl is a naive GPU kernel and correlate_image.cl is an untested GPU kernel that uses images (untested because I have no hardware with image support.)


