
gales
Journeyman III

Optimizing OpenCL 1D downsampler

I would like a very fast 1D array resampler but haven't been able to find one, so I'm trying a two-stage approach: linearly interpolate in an upsampler to some integer multiple of the length I actually want, then downsample to the length I need.
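A minimal C sketch of the first stage: linear interpolation of the input onto an arbitrary number of evenly spaced points. The function name `linear_resample` and the endpoint-aligned mapping (first and last input samples preserved) are assumptions for illustration, not the poster's code. Each output sample depends only on its own position, so the loop body maps directly onto one work-item per output sample.

```c
#include <assert.h>

/* Linearly interpolate n_in input samples onto n_out evenly spaced output
 * points. Assumed mapping: out[0] == in[0] and out[n_out-1] == in[n_in-1]. */
void linear_resample(const float *in, int n_in, float *out, int n_out)
{
    assert(n_in >= 2 && n_out >= 2);
    const double step = (double)(n_in - 1) / (double)(n_out - 1);

    for (int j = 0; j < n_out; j++) {
        double pos = j * step;       /* fractional position in the input */
        int i0 = (int)pos;           /* pos >= 0, so truncation == floor */
        if (i0 > n_in - 2)
            i0 = n_in - 2;           /* keep in[i0 + 1] in range at the end */
        double frac = pos - i0;      /* weight of the right-hand neighbour */
        out[j] = (float)(in[i0] + frac * (in[i0 + 1] - in[i0]));
    }
}
```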

Example: the incoming data set is 1024 samples long and I want 801 samples in the final array. I upsample to 1602 samples, then downsample by a factor of 2 to get 801.
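The choice of intermediate length in that example is integer arithmetic: the smallest factor k with k × 801 ≥ 1024 is k = ceil(1024 / 801) = 2, which gives 2 × 801 = 1602 intermediate samples. The sketch below assumes the smallest such factor is the one wanted; `upsample_factor` is a hypothetical helper name.

```c
#include <assert.h>

/* Smallest integer factor k such that k * out_len >= in_len,
 * i.e. ceil(in_len / out_len) using integer arithmetic only.
 * The intermediate (upsampled) length is then k * out_len. */
int upsample_factor(int in_len, int out_len)
{
    assert(in_len > 0 && out_len > 0);
    return (in_len + out_len - 1) / out_len;
}
```

With k from this helper, the downsampling stage then uses ratio = k.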

The problem is that it's just not fast enough, even on my W9100 GPU: I'm trying to process 1G samples/sec. Funnily enough, clFFT has no problem processing at these rates, but a simple downsampler chokes.
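A quick estimate of the memory traffic involved, counting input reads only and assuming 4-byte floats: at 1G samples/sec, every kernel pass that reads the whole input moves 4 GB/s, and k such passes move k times that. Comparing the result with the card's rated memory bandwidth gives a rough sense of whether raw data volume or the access pattern is the more likely bottleneck. The helper name is hypothetical, and output writes and host transfers are deliberately left out.

```c
#include <assert.h>

/* Input bytes read per second when `passes` kernels each read the whole
 * input once. Counts input reads only; output writes are ignored. */
double input_read_bytes_per_sec(double samples_per_sec, int passes)
{
    assert(samples_per_sec >= 0.0 && passes > 0);
    return samples_per_sec * (double)sizeof(float) * passes;
}
```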

I'm computing maximum, minimum, and average outputs in parallel on separate queues. I tried computing them all together into a float3 array output, but that was even slower.
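A single-pass CPU reference for the three statistics: each bin of `ratio` input samples is read once and all three results are produced together, rather than three kernels each reading the entire input. Results go to three separate float arrays. One possible factor in the float3 slowdown, not confirmed for this case: in OpenCL C a float3 element occupies 16 bytes in a buffer, the same as a float4, so a float3 output array carries an unused fourth component. The function name and signature are illustrative, not from the thread; the inner loop is the part that would move into a kernel, and the function is also handy for checking GPU output.

```c
#include <assert.h>

/* For each of n_out bins of `ratio` consecutive samples, compute max, min
 * and mean in one pass. `in` must hold n_out * ratio samples. Outputs use
 * three separate arrays (structure-of-arrays layout). */
void downsample_minmaxavg(const float *in, int n_out, int ratio,
                          float *out_max, float *out_min, float *out_avg)
{
    assert(ratio > 0);
    for (int o = 0; o < n_out; o++) {
        const float *bin = in + (long)o * ratio;
        float vmax = bin[0], vmin = bin[0], sum = bin[0];
        for (int i = 1; i < ratio; i++) {
            float v = bin[i];        /* each input sample is loaded once */
            vmax = v > vmax ? v : vmax;
            vmin = v < vmin ? v : vmin;
            sum += v;
        }
        out_max[o] = vmax;
        out_min[o] = vmin;
        out_avg[o] = sum / ratio;
    }
}
```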

Does anyone have any ideas on how to optimize this?

__kernel void Downsample(__global const float *in, __global float *out, int type, int ratio)
{
    /* One work-item per output sample, flattened from a 2D NDRange. */
    int gx = get_global_id(0);
    int gw = get_global_size(0);
    int gy = get_global_id(1);
    int index = gy * gw + gx;

    /* Each output reduces `ratio` consecutive input samples. */
    __global const float *ptr = in + index * ratio;
    float value = ptr[0];

    for (int i = 1; i < ratio; i++)
    {
        if (type == 0)          /* maximum */
            value = max(value, ptr[i]);
        else if (type == 1)     /* minimum */
            value = min(value, ptr[i]);
        else                    /* sum; converted to the average below */
            value += ptr[i];
    }

    if (type == 2)
        value /= ratio;

    out[index] = value;
}

ekondis
Adept II

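You should coalesce your global memory accesses. In this kernel, on each loop iteration neighbouring work-items read addresses ratio floats apart, so their reads are not contiguous. Try loading the input into local memory as a buffer first, with consecutive work-items reading consecutive addresses, and do the reduction from there.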
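A minimal sketch of that suggestion, emulated in plain C so the index arithmetic can be checked on the CPU. It models one work-group of `wg` work-items, each producing one output from `ratio` consecutive inputs, and assumes the input length is a multiple of wg * ratio. The function name, the `tile` array standing in for a __local buffer, and the choice to show only the maximum are illustrative assumptions, not code from the thread.

```c
#include <assert.h>

/* Emulate one work-group computing the maximum over bins of `ratio` inputs.
 * Phase 1 (cooperative load): on step k, work-item lid copies
 *   base[k*wg + lid] into tile[k*wg + lid], so neighbouring work-items
 *   read neighbouring addresses (the coalesced pattern).
 * Phase 2: work-item lid reduces tile[lid*ratio .. lid*ratio + ratio - 1].
 * `tile` stands in for a __local buffer of wg * ratio floats. */
void emulate_group_max(const float *in, float *out, float *tile,
                       int group, int wg, int ratio)
{
    assert(wg > 0 && ratio > 0);
    const float *base = in + (long)group * wg * ratio;

    for (int k = 0; k < ratio; k++)          /* phase 1: stride-wg steps */
        for (int lid = 0; lid < wg; lid++)   /* parallel across work-items */
            tile[k * wg + lid] = base[k * wg + lid];

    /* In a real kernel: barrier(CLK_LOCAL_MEM_FENCE); */

    for (int lid = 0; lid < wg; lid++) {     /* phase 2: per-item reduce */
        const float *bin = tile + lid * ratio;
        float v = bin[0];
        for (int i = 1; i < ratio; i++)
            v = bin[i] > v ? bin[i] : v;
        out[group * wg + lid] = v;
    }
}
```

The global reads in phase 1 touch the same data as the original kernel, only reordered so that each step is contiguous across work-items; the strided per-bin reads move to local memory.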
