Parallel reduction on GPU/CPU using OpenCL

Discussion created by erman_amd on May 22, 2011
Latest reply on May 23, 2011 by LeeHowes


Is there a way to do parallel reduction without using local memory?

I studied the reduction sample provided in AMD APP SDK 2.4. It uses local memory and only receives one vector as its input.

I want to do a parallel reduction without using local memory. For example, the kernel receives three input vectors and outputs three values (each being the reduction of one vector). Is it possible to write this kind of kernel? Any hints or help?

Thank you