
reduction function which finds the index of max value in array
MicahVillmow Sep 2, 2008 7:22 PM (in response to bjang)bjang,
This is where the vector properties of the graphic card come in very handy. This can actually be handled with very minimal changes to the code.
reduce void red(float a<>, reduce float4 b<> )
{
if(b.x < a)
{
b.x = a;
b.y = indexof(a).x;
b.z = indexof(a).y;
}
}
reduction function which finds the index of max value in array
bjang Sep 2, 2008 8:26 PM (in response to MicahVillmow)Micah, thanks very much for helping me.
BTW, I tried your suggestion as follows and got incorrect answer. I am suspecting that indexof() is not valid in reduction kernel.
====================================
reduce void red(float a<>, reduce float4 b<> {
if(b.x > a) {
b.x = a;
b.y = indexof(a).x;
b.z = indexof(a).y;
}
}
int main(int argc, char* argv[])
{
float input[8] = { 5, 2, 8, 6, 3, 7, 9, 4} ;
float4 result;
{
float inputStream<8>;
streamRead(inputStream, input);
red(inputStream, result);
}
printf("result.x is %f \n", result.x);
printf("result.y is %f \n", result.y);
printf("result.z is %f \n", result.z);
return 0;
}========================
result.x is 2.000000
result.y is 2.000000
result.z is 2.000000
========================

reduction function which finds the index of max value in array
bjang Sep 2, 2008 8:29 PM (in response to bjang)Ceq,
I am very positive that you have a correct answer. Thanks.
BTW, do you have any other idea to do it efficiently on GPU, other than reduction kernel?



reduction function which finds the index of max value in array
Ceq Sep 2, 2008 8:25 PM (in response to bjang)That would be a nice solution, however looks like there is a issues in the current Brook+ version:
reduce void red1(float a<>, reduce float4 b<> )
This doesn't work, it would just compute the maximum value.
Looks like only b.x value assignation is performed, so the result would be b.y = b.z = b.x = max(a<> );
reduce void red2(float4 a<>, reduce float4 b<> )
Now the assignment works properly, looks like there is a bug and both streams should be equal type.
However the result is wrong again, b.x = max(a<> ); b.y = b.z = 0;
Note that in the documentation section 6.1 states that indexof operator is not valid for reduction
Note: Only tested on Radeon 4850 
reduction function which finds the index of max value in array
Ceq Sep 2, 2008 8:50 PM (in response to bjang)Well, I have a workaround, however it's better to wait for Micah's answer.
Anyway, if you want to see it:
kernel void setIndex(float a<>, out float2 b<>) {
b.x = a;
b.y = indexof(a);
}
reduce void red1(float2 a<>, reduce float2 b<>) {
if(b < a) b = a;
}
int main(int argc, char* argv[]) {
float input[8] = { 5, 2, 8, 6, 3, 7, 9, 4};
float2 result = float2(0, 0);
{
float inputS1<8>;
float2 inputS2<8>;
streamRead(inputS1, input);
setIndex(inputS1, inputS2);
red1(inputS2, result);
}
printf("Max value is input[%.0f] = %f\n", result.y, result.x);
return 0;
}
Note that it would be better to use a separate index stream, but you can't use more than one input stream in a reduction.

Without using Brook+ reductions you would have to implement a multistage algorithm with gather, calling a kernel several times using domain operator, it's a bit complex, so I just write a example:
iterative_max_calc(float input[ ], float size, out float pos<>, out float max<>);
input1 : 5, 2, 8, 6, 3, 7, 9, 4
size: 8;
output1max : 5, 8, 7, 9
output1pos : 0, 2, 5, 6
input2 : output1max
size : 4;
output2max : 8, 9
output2pos : 2, 6
input3 : output2max
size : 2;
output3max : 9
output3pos : 6 
reduction function which finds the index of max value in array
MicahVillmow Sep 2, 2008 9:38 PM (in response to bjang)Ahh yeah, I forgot that indexof cannot be used in a reduction kernel. The reason is because a reduction kernel is just a shortcut for doing a multistage algorithm, but the Brook+ Runtime handles the multiple stages for the developer. A seperate pass can be used to encode the index into a seperate component of vector is a solution similiar to what I was attempting to get across.
Hope this solved your problem.
reduction function which finds the index of max value in array
bjang Sep 3, 2008 12:25 AM (in response to MicahVillmow)Thanks, Ceq and Micah,
BTW, I keep wondering what is benefit of using reduction kernel in terms of performance. I couldn't find any information on performance of reduction kernel. Of course, it looks pretty just like recursive function in C.
The disassembled binary of reduction kernel looks pretty neat in GSA, seeming very fast. I didn't measure the performance yet though.
Thanks in advance.


reduction function which finds the index of max value in array
nberger Sep 3, 2008 7:07 AM (in response to bjang)Hi!
Performance wise I am a bit disappointed in reductions; I sum over a 8192 element 1Darray. For numeric reasons, the sum has to be performed in doubles, and because reductions require identical types in in and output, I need another "conversion" kernel. Together, these kernels are ~10x slower than copying the stream to host memory and summing with a simple for loop on the CPU. I also tested just summing floats, and even this is slower than the CPU.
Cheers
Nik
reduction function which finds the index of max value in array
zimoch Oct 19, 2008 2:43 PM (in response to nberger)My problem is similar but I have to find minimum of my 3D stream and reduce it to 2D. As a result of result of reduction I sould have index of cvalue from input1 stream.
reduce kernel void test2(float input1 <a,b,c> , reduce float output1<a,b> )
{
}I don't really know how to deal with this. Could anyone help how to write in Brook?
