9 Replies Latest reply on Oct 19, 2008 2:43 PM by zimoch

    reduction function which finds the index of max value in array

    bjang

      Hello,

      I would like to write a reduction kernel which finds (or returns) the index of max value in array.

      We have seen a beautiful reduction function which finds max value in array as follows by Ceq and by jski.

      reduce void red(double a<>, reduce double b<>{
          if(b < a)   b = a;
      }

      reduce void red( float value<>, reduce float result<> ){
         result = max( value, value );
      }

      Now, I would like to find the index of that max value. Do you guys think it is possible with reduction kernel?

        • reduction function which finds the index of max value in array
          MicahVillmow
          bjang,
          This is where the vector properties of the graphic card come in very handy. This can actually be handled with very minimal changes to the code.
          reduce void red(float a<>, reduce float4 b<> )
          {

          if(b.x < a)
          {
          b.x = a;
          b.y = indexof(a).x;
          b.z = indexof(a).y;
          }
          }
            • reduction function which finds the index of max value in array
              bjang

              Micah, thanks very much for helping me.

              BTW, I tried your suggestion as follows and got incorrect answer. I am suspecting that indexof() is not valid in reduction kernel.

              ====================================

              reduce void red(float a<>, reduce float4 b<> {
                  if(b.x > a) {
                      b.x = a;
                      b.y = indexof(a).x;
                      b.z = indexof(a).y;
                  }
              }

              int main(int argc, char* argv[])
              {
                  float input[8] = { 5, 2, 8, 6, 3, 7, 9, 4} ;
                  float4 result;

                  {
                      float inputStream<8>;
                      streamRead(inputStream, input);
                      red(inputStream, result);
                  }

                  printf("result.x is %f \n", result.x);
                  printf("result.y is %f \n", result.y);
                  printf("result.z is %f \n", result.z);

                  return 0;
              }

              ========================

              result.x is 2.000000

              result.y is 2.000000

              result.z is 2.000000

              ========================

            • reduction function which finds the index of max value in array
              Ceq
              That would be a nice solution, however looks like there is a issues in the current Brook+ version:

              reduce void red1(float a<>, reduce float4 b<> )

              This doesn't work, it would just compute the maximum value.
              Looks like only b.x value assignation is performed, so the result would be b.y = b.z = b.x = max(a<> );

              reduce void red2(float4 a<>, reduce float4 b<> )

              Now the assignment works properly, looks like there is a bug and both streams should be equal type.
              However the result is wrong again, b.x = max(a<> ); b.y = b.z = 0;
              Note that in the documentation section 6.1 states that indexof operator is not valid for reduction

              Note: Only tested on Radeon 4850
              • reduction function which finds the index of max value in array
                Ceq
                Well, I have a workaround, however it's better to wait for Micah's answer.
                Anyway, if you want to see it:

                kernel void setIndex(float a<>, out float2 b<>) {
                b.x = a;
                b.y = indexof(a);
                }

                reduce void red1(float2 a<>, reduce float2 b<>) {
                if(b < a) b = a;
                }


                int main(int argc, char* argv[]) {
                float input[8] = { 5, 2, 8, 6, 3, 7, 9, 4};
                float2 result = float2(0, 0);
                {
                float inputS1<8>;
                float2 inputS2<8>;
                streamRead(inputS1, input);
                setIndex(inputS1, inputS2);
                red1(inputS2, result);
                }
                printf("Max value is input[%.0f] = %f\n", result.y, result.x);
                return 0;
                }

                Note that it would be better to use a separate index stream, but you can't use more than one input stream in a reduction.

                ---------------

                Without using Brook+ reductions you would have to implement a multistage algorithm with gather, calling a kernel several times using domain operator, it's a bit complex, so I just write a example:

                iterative_max_calc(float input[ ], float size, out float pos<>, out float max<>);

                input1 : 5, 2, 8, 6, 3, 7, 9, 4
                size: 8;
                output1max : 5, 8, 7, 9
                output1pos : 0, 2, 5, 6

                input2 : output1max
                size : 4;
                output2max : 8, 9
                output2pos : 2, 6

                input3 : output2max
                size : 2;
                output3max : 9
                output3pos : 6
                • reduction function which finds the index of max value in array
                  MicahVillmow
                  Ahh yeah, I forgot that indexof cannot be used in a reduction kernel. The reason is because a reduction kernel is just a shortcut for doing a multi-stage algorithm, but the Brook+ Runtime handles the multiple stages for the developer. A seperate pass can be used to encode the index into a seperate component of vector is a solution similiar to what I was attempting to get across.

                  Hope this solved your problem.
                    • reduction function which finds the index of max value in array
                      bjang

                      Thanks, Ceq and Micah,

                      BTW, I keep wondering what is benefit of using reduction kernel in terms of performance. I couldn't find any information on performance of reduction kernel. Of course, it looks pretty just like recursive function in C.

                      The disassembled binary of reduction kernel looks pretty neat in GSA, seeming very fast. I didn't measure the performance yet though.

                      Thanks in advance.

                    • reduction function which finds the index of max value in array
                      nberger
                      Hi!

                      Performance wise I am a bit disappointed in reductions; I sum over a 8192 element 1D-array. For numeric reasons, the sum has to be performed in doubles, and because reductions require identical types in in- and output, I need another "conversion" kernel. Together, these kernels are ~10x slower than copying the stream to host memory and summing with a simple for loop on the CPU. I also tested just summing floats, and even this is slower than the CPU.
                      Cheers

                      Nik