
Reduction kernel problem
berathebrain May 17, 2009 8:58 PM (in response to berathebrain)
I may have found a solution, but it is extremely slow.
The host code:
float product;
::brook::Stream<float> streamB(1,&n);
::brook::Stream<float2> streamB2(1,&n);
streamB.read(b);
matrix_combine_gpu_ati(streamB,streamB,streamB2);
matrix_DotProduct_gpu_ati(streamB2,product);
Kernel code:
kernel void matrix_combine_gpu_ati(float in1<>,float in2<>,out float2 out1<>)
{
out1.x=in1;
out1.y=in2;
}
// multiply vector by vector (each vector should have one dimension equal to 1)
reduce void matrix_DotProduct_gpu_ati(float2 a<>,reduce float c<>){
c += a.x * a.y;
}
eduardoschardong May 18, 2009 1:15 AM (in response to berathebrain)
I'm surprised your solution worked. To work, it should be something like:
kernel void product(float a<>, float b<>, out float c<>)
{
c = a * b;
}
reduce void reduce_sum(float a<>, reduce float b<>)
{
b += a;
}
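For checking the GPU result, a plain CPU reference of the same two stages can be handy. The following is only a sketch in ordinary C++, not Brook+ code: the function names elementwise_product and dot_two_pass are made up for illustration, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// CPU analogue of the `product` kernel: c[i] = a[i] * b[i].
// Assumption: a and b have the same length.
std::vector<float> elementwise_product(const std::vector<float>& a,
                                       const std::vector<float>& b) {
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] * b[i];
    }
    return c;
}

// CPU analogue of the `reduce_sum` kernel: adds up every element.
float reduce_sum(const std::vector<float>& a) {
    float sum = 0.0f;
    for (float x : a) {
        sum += x;
    }
    return sum;
}

// Hypothetical helper: the dot product as "multiply, then reduce".
float dot_two_pass(const std::vector<float>& a, const std::vector<float>& b) {
    return reduce_sum(elementwise_product(a, b));
}
```

A GPU reduction may add the terms in a different order, so it is safer to compare against this reference with a small tolerance than to expect bit-identical floats.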
As for the performance issue, reduction kernels won't help. In fact, a dot product is likely to be limited by memory bandwidth. The best you can do is cut the number of trips to memory by launching fewer kernels and doing your own reduction; LDS (local data share) may help.
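To make the "fewer trips to memory" idea concrete, here is a rough CPU sketch in ordinary C++ (not Brook+ code). It fuses the multiply and the sum into one pass, so no intermediate product array is ever written, and it accumulates one partial sum per block, roughly the way a group of threads might accumulate in LDS before a final small reduction. The name dot_fused_blocked and the default block_size of 64 are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product in a single pass over the inputs.
// Each block keeps its own running sum; the per-block sums are then
// added together in a second, much smaller reduction.
float dot_fused_blocked(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::size_t block_size = 64) {
    // Assumption: only the common prefix is used if the lengths differ.
    const std::size_t n = std::min(a.size(), b.size());
    if (block_size == 0) {
        block_size = 1;  // avoid an endless loop on a zero block size
    }

    std::vector<float> partial;
    for (std::size_t start = 0; start < n; start += block_size) {
        const std::size_t end = std::min(start + block_size, n);
        float acc = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            acc += a[i] * b[i];  // multiply and accumulate, no temporary array
        }
        partial.push_back(acc);
    }

    // Final reduction over the per-block partial sums.
    float total = 0.0f;
    for (float p : partial) {
        total += p;
    }
    return total;
}
```

Each input element is read once and nothing is written back except the handful of partial sums, which is the property that matters when the computation is bandwidth-bound.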
