15 Replies Latest reply on Jul 16, 2009 3:44 PM by gaurav.garg

    Finding max, min, and mean/median in a multidimensional array

    riza.guntur

      Dear all,

      I have a problem like this:

      float input[100][16];

      I want to split the input into groups of 5 rows, giving 20 groups of 5 rows, each row with 16 dimensions.

      For each group, I need to find the max, min, and mean/median within each dimension separately, so I get 16 max, min, and mean/median values per group. The next step is rather confusing, so I am asking about this first pass first.

      Is there any way to do this with Brook+? I thought of using a reduce kernel.

      reduce void
      sum(float i<>, reduce float o<>)
      {
          o = o + i/(float)5;
      }

      for each set of 5 values within a dimension in a group. But how? Each set of 5 is spread across rows, while a reduce kernel operates along columns.

      Is there any way to do it by offset, taking each 5 values and reducing them to one? Yes, I would need to flatten the 2-dimensional array to 1 dimension, but that is not hard.

      The results will later be processed on the CPU for the next operation.

      As for the max and min values, I haven't got any ideas yet. Does anyone have a suggestion?

      Thanks anyway.

        • Finding max, min, and mean/median in a multidimensional array
          gaurav.garg

          It seems you want something similar to partial reduction, where you reduce multiple groups of the input stream to a single value per group.

          E.g., in your case, you want to reduce the stream from [100][16] to [20][16].

          Take a look at section A.4.1.3 of the Stream Computing User Guide on partial reduction.
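
As a point of comparison for GPU results, the partial reduction described above (every block of 5 rows collapsed to one row, per dimension) can be sketched on the CPU in plain C++. This is a minimal reference sketch, not Brook+ code: the names GroupStats and reduceGroups are hypothetical, and it assumes a row-major buffer whose row count is a multiple of the group size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Statistics of one group for one dimension (column).
struct GroupStats {
    float min;
    float max;
    float mean;
};

// Reduce a rows x cols row-major buffer to (rows / groupSize) x cols entries,
// taking min, max and mean over each block of groupSize consecutive rows.
// Assumes rows is a multiple of groupSize and groupSize is greater than zero.
std::vector<GroupStats> reduceGroups(const std::vector<float>& input,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t groupSize)
{
    const std::size_t groups = rows / groupSize;
    std::vector<GroupStats> out(groups * cols);
    for (std::size_t g = 0; g < groups; ++g) {
        for (std::size_t c = 0; c < cols; ++c) {
            const float first = input[(g * groupSize) * cols + c];
            GroupStats s = {first, first, 0.0f};
            float sum = 0.0f;
            for (std::size_t r = 0; r < groupSize; ++r) {
                const float v = input[(g * groupSize + r) * cols + c];
                s.min = std::min(s.min, v);
                s.max = std::max(s.max, v);
                sum += v;  // accumulate only; divide once after the loop
            }
            s.mean = sum / static_cast<float>(groupSize);
            out[g * cols + c] = s;
        }
    }
    return out;
}
```

For the [100][16] input from the question, reduceGroups(input, 100, 16, 5) would return 20 * 16 entries, one per group and dimension.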

              • Finding max, min, and mean/median in a multidimensional array
                riza.guntur

                I've tried section A.4.1.3 and also used the min and max reduction functions from there.

                But my program doesn't run properly: there is an access violation, and I don't understand why.

                Here is what I wrote, based on the example:

                #include "brookgenfiles/percobaan_pertama.h"
                #include <iostream>
                #include <iomanip>
                #include <fstream>
                using namespace std;
                using namespace brook;

                template <typename T>
                T **AllocateDynamicArray2D( int nRows, int nCols)
                {
                      T **dynamicArray;

                      dynamicArray = new T*[nRows];
                      for( int i = 0 ; i < nRows ; i++ )
                          dynamicArray[i] = new T [nCols];

                      return dynamicArray;
                }

                template <typename T>
                void FreeDynamicArray2D(T** dArray)
                {
                      delete [] *dArray;
                      delete [] dArray;
                }

                template <typename T>
                T *AllocateDynamicArray1D( int nDims)
                {
                      T *dynamicArray;

                      dynamicArray = new T[nDims];

                      return dynamicArray;
                }

                template <typename T>
                void FreeDynamicArray1D(T* dArray)
                {
                      delete [] dArray;
                }

                int
                main(int argc, char* argv[])
                {
                    int jumlahData = 480;
                    int jumlahDiSatuGrup = 5;
                    int jumlahDimensi = 16;
                    int jumlahOutput = 6;
                    unsigned int streamSize[] = {jumlahDimensi, jumlahData};
                    unsigned int streamSizeReduce[] = {jumlahDimensi, jumlahData/jumlahDiSatuGrup};

                    unsigned int rank = 2;
                    //int baris = jumlahData * jumlahDimensi / jumlahDiSatuGrup;

                    //float ** arr0 = AllocateDynamicArray<float>(baris,jumlahDiSatuGrup);

                    float **arr0 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData);
                    float **arr1 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);
                    float **arr2 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);
                    float **arr3 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);

                    ifstream inFile;
                    
                    inFile.open("train data.txt");
                    if (!inFile) {
                        cout << "Unable to open file";
                        exit(1); // terminate with error
                    }

                    //for( int i = 0; i < baris * jumlahDiSatuGrup; i++)
                    for( int i = 0; i < jumlahDimensi * jumlahData; i++)
                    {
                        float temp;
                        if( inFile >> temp)
                            arr0[i/jumlahData][i%jumlahData] = temp;
                    }

                    /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
                    {
                        printf("%f ", arr0[i/jumlahData][i%jumlahData]);
                    }*/

                    Stream<float> streami0(rank, streamSize);
                    Stream<float> streami1(rank, streamSizeReduce);
                    Stream<float> streami2(rank, streamSizeReduce);
                    Stream<float> streami3(rank, streamSizeReduce);

                    streamRead(streami0,arr0);
                    mean_of_five(streami0,streami1);
                    max_reduce(streami0,streami2);
                    min_reduce(streami0,streami3);
                    streami1.write(arr1);
                    streami2.write(arr2);
                    streami3.write(arr3);

                    inFile.close();
                    FreeDynamicArray2D<float>(arr0);
                    FreeDynamicArray2D<float>(arr1);
                    FreeDynamicArray2D<float>(arr2);
                    FreeDynamicArray2D<float>(arr3);


                    getchar();
                    return 0;
                }


                The Brook+ code is as follows:

                reduce void mean_of_five(float a<>, reduce float b<>)
                {
                    b += (float)0.2*a;
                }

                reduce void max_reduce(float a<>, reduce float b<>)
                {
                    if(a > b)
                        b = a;
                }

                reduce void min_reduce(float a<>, reduce float b<>)
                {
                    if(a < b)
                        b = a;
                }

                 

                Why does the access violation happen? Judging by the example, the first subscript looks fine.

                  • Finding max, min, and mean/median in a multidimensional array
                    riza.guntur

                    I'm sorry about the paste-from-Word warning; I edited using paste from Word, but it still came out badly.

                    After changing the array to 1D it works fine, though I don't understand why. I am quite familiar with C but not C++, so I would appreciate an explanation.

                    The changes are below; to see what I added, just search for "//here the change".

                    #include "brookgenfiles/percobaan_pertama.h"
                    #include <iostream>
                    #include <iomanip>
                    #include <fstream>
                    using namespace std;
                    using namespace brook;

                    template <typename T>
                    T **AllocateDynamicArray2D( int nRows, int nCols)
                    {
                          T **dynamicArray;

                          dynamicArray = new T*[nRows];
                          for( int i = 0 ; i < nRows ; i++ )
                              dynamicArray[i] = new T [nCols];

                          return dynamicArray;
                    }

                    template <typename T>
                    void FreeDynamicArray2D(T** dArray)
                    {
                          delete [] *dArray;
                          delete [] dArray;
                    }

                    template <typename T>
                    T *AllocateDynamicArray1D( int nDims)
                    {
                          T *dynamicArray;

                          dynamicArray = new T[nDims];

                          return dynamicArray;
                    }

                    template <typename T>
                    void FreeDynamicArray1D(T* dArray)
                    {
                          delete [] dArray;
                    }

                    int
                    main(int argc, char* argv[])
                    {
                        unsigned int jumlahData = 480;
                        unsigned  int jumlahDiSatuGrup = 5;
                        unsigned  int jumlahDimensi = 16;
                        unsigned int jumlahOutput = 6;
                        unsigned int streamSize[] = {jumlahDimensi, jumlahData};
                        unsigned int streamSizeReduce[] = {jumlahDimensi, jumlahData/jumlahDiSatuGrup};

                        unsigned int rank = 2;
                        //int baris = jumlahData * jumlahDimensi / jumlahDiSatuGrup;

                        //float ** arr0 = AllocateDynamicArray<float>(baris,jumlahDiSatuGrup);

                    //here the change
                        float *arr0 = new float[jumlahDimensi*jumlahData];
                        float *arr1 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                        float *arr2 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                        float *arr3 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                        memset(arr0, 0, jumlahDimensi * jumlahData * sizeof(float));
                        memset(arr1, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
                        memset(arr2, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
                        memset(arr3, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));

                        ifstream inFile;
                        
                        inFile.open("train data.txt");
                        if (!inFile) {
                            cout << "Unable to open file";
                            exit(1); // terminate with error
                        }

                        //for( int i = 0; i < baris * jumlahDiSatuGrup; i++)
                        /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
                        {
                            float temp;
                            if( inFile >> temp)
                                arr0[i/jumlahData][i%jumlahData] = temp;
                        }*/

                    //here the change
                        for(unsigned int i = 0; i < jumlahDimensi; i++)
                        {
                            for(unsigned int j = 0; j < jumlahData; j++)
                            {
                                unsigned int index = i * jumlahData + j;
                                float temp;
                                if( inFile >> temp)
                                    arr0[index] = temp;
                            }
                        }

                        /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
                        {
                            printf("%f ", arr0[i/jumlahData][i%jumlahData]);
                        }*/

                        Stream<float> streami0(rank, streamSize);
                        Stream<float> streami1(rank, streamSizeReduce);
                        Stream<float> streami2(rank, streamSizeReduce);
                        Stream<float> streami3(rank, streamSizeReduce);

                        streamRead(streami0,arr0);
                        mean_of_five(streami0,streami1);
                        max_reduce(streami0,streami2);
                        min_reduce(streami0,streami3);
                        streamWrite(streami1,arr1);
                        streamWrite(streami2,arr2);
                        streamWrite(streami3,arr3);

                        inFile.close();
                        delete[] arr0;
                        delete[] arr1;
                        delete[] arr2;
                        delete[] arr3;

                        getchar();
                        return 0;
                    }

                      • Finding max, min, and mean/median in a multidimensional array
                        riza.guntur

                        Oh man, after checking the values of each processed stream output, I get unexpected results:

                        cout << (arr0[0]+arr0[1]+arr0[2]+arr0[3]+arr0[4])/5 << endl; // mean of the first five values, computed on the CPU
                        cout << arr1[0] << endl; // the mean from the GPU, which is still broken (at least for me)
                        cout << arr2[0] << endl; // the maximum output, which is broken
                        cout << arr3[0] << endl; // the minimum output, which is also broken

                        Oh man, I'm confused...

                          • Finding max, min, and mean/median in a multidimensional array
                            riza.guntur

                            Changing the last part to:

                                mean_of_five((float)5.0,streami0,streami1);
                                max_reduce(streami0,streami2);
                                min_reduce(streami0,streami3);
                                streamWrite(streami1,arr1);
                                streamWrite(streami2,arr2);
                                streamWrite(streami3,arr3);
                                cout << (arr0[0]+arr0[1]+arr0[2]+arr0[3]+arr0[4])/5 <<endl;
                                cout << arr1[0] << endl;
                                cout << arr2[0] << endl;
                                cout << arr3[0] << endl;

                            with mean_of_five changed to:

                            reduce void mean_of_five(float x,float a<>, reduce float b<>)
                            {
                                b += a/x;
                            }

                            gives a different result.

                            Can anyone help?

                          • Finding max, min, and mean/median in a multidimensional array
                            gaurav.garg

                             

                            After changing the array to 1D it does fine. Though I don't understand why.


                            The streamRead and streamWrite functions require a pointer to linear memory of size width * height * sizeof(stream type), not an array of pointers.
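
To illustrate the difference: an array built with new T*[nRows] is a table of pointers to separately allocated rows, so the float values are not guaranteed to sit next to each other in memory. One way to keep 2D indexing while still owning a single linear block is a small wrapper like the sketch below. FlatGrid and its members are hypothetical names, and row-major order (x varies fastest) is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// A width x height grid stored as one contiguous block of floats.
class FlatGrid {
public:
    FlatGrid(std::size_t width, std::size_t height)
        : width_(width), height_(height), values_(width * height, 0.0f) {}

    // Element at column x, row y (row-major: x varies fastest).
    float& at(std::size_t x, std::size_t y) { return values_[y * width_ + x]; }

    // Pointer to the first element of the linear block.
    float* data() { return values_.data(); }

    // Total size of the block: width * height * sizeof(float).
    std::size_t sizeInBytes() const { return values_.size() * sizeof(float); }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<float> values_;
};
```

Under those assumptions, grid.data() would be the pointer to pass where the thread passes arr0, for example streamRead(streami0, grid.data()).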

                            To calculate the mean, use the reduce kernel only to add the values, and do the averaging afterwards on the CPU or in a separate kernel. (Read the last paragraph of section A.4.1.2 of the Stream Computing User Guide for more detail.)
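
One way to see why scaling inside the reduce kernel can go wrong: if a reduction is carried out in more than one pass, partial results are fed through the same combine step again, so a factor such as 0.2 gets applied more than once. The plain C++ sketch below simulates that with a hypothetical two-pass schedule. It is not a description of how Brook+ actually schedules its reductions, and all function names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef float (*Combine)(float, float);

// Combine step of the thread's first kernel: b += 0.2 * a.
float scaledAdd(float b, float a) { return b + 0.2f * a; }

// Combine step of a plain sum: b += a.
float plainAdd(float b, float a) { return b + a; }

// Fold values[begin, end) into one result, starting from zero.
float foldRange(const std::vector<float>& values, std::size_t begin,
                std::size_t end, Combine combine)
{
    float acc = 0.0f;
    for (std::size_t i = begin; i < end; ++i)
        acc = combine(acc, values[i]);
    return acc;
}

// Single pass over all values.
float foldOnce(const std::vector<float>& values, Combine combine)
{
    return foldRange(values, 0, values.size(), combine);
}

// Hypothetical two-pass schedule: fold each chunk into a partial result,
// then fold the partial results with the same combine step.
// chunkSize must be greater than zero.
float foldInTwoPasses(const std::vector<float>& values, std::size_t chunkSize,
                      Combine combine)
{
    std::vector<float> partials;
    for (std::size_t start = 0; start < values.size(); start += chunkSize) {
        const std::size_t end = std::min(start + chunkSize, values.size());
        partials.push_back(foldRange(values, start, end, combine));
    }
    return foldOnce(partials, combine);
}
```

With four values of 10, plainAdd gives 40 under either schedule, while scaledAdd gives about 8 in one pass and about 1.6 in two passes, so only the plain sum is independent of how the work is split.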

                            I have written a max/min reduction kernel on my end and it works fine. I just want to point out that you are reducing in the y direction of the stream. When you declare a stream, the first value in the dimension array is the width of the stream (jumlahDimensi in your case).
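
The effect of the dimension order can be checked on paper with a small index calculation. The sketch below lists which linear positions of the input buffer would feed one output element, assuming a row-major layout (x fastest) and assuming each output element covers one contiguous block of inputs. Both are assumptions of this illustration rather than statements from the user guide, and groupIndices is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Linear indices of the inputs that feed output element (outX, outY) when a
// width x height buffer is reduced to outWidth x outHeight.
// Assumes width / outWidth and height / outHeight divide exactly.
std::vector<std::size_t> groupIndices(std::size_t width, std::size_t height,
                                      std::size_t outWidth, std::size_t outHeight,
                                      std::size_t outX, std::size_t outY)
{
    const std::size_t blockW = width / outWidth;    // inputs per output along x
    const std::size_t blockH = height / outHeight;  // inputs per output along y
    std::vector<std::size_t> indices;
    for (std::size_t dy = 0; dy < blockH; ++dy)
        for (std::size_t dx = 0; dx < blockW; ++dx)
            indices.push_back((outY * blockH + dy) * width + (outX * blockW + dx));
    return indices;
}
```

Under these assumptions, a stream declared as {16, 480} and reduced to {16, 96} gathers positions 0, 16, 32, 48, 64 for the first output, whereas {480, 16} reduced to {96, 16} gathers positions 0 to 4. That would explain why the first output did not match the mean of arr0[0] through arr0[4] with the original declaration.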

                            Also, you can check correctness with the CPU backend (set the environment variable BRT_RUNTIME=cpu).

                              • Finding max, min, and mean/median in a multidimensional array
                                riza.guntur

                                I see. From now on I will use linear memory for every array.

                                Okay, I now understand that "this kernel is undefined" means I can't do any operation other than summing in there. So how do I do the averaging in a separate kernel? I haven't seen any examples of it.

                                I wrote my min and max functions, which find the min and max in each group of five, the same way as mean_of_five. Are there any changes I should make to my current implementation to make it work? I really want to reduce 480 to 96 to get the max and min of each group of five.

                                  • Finding max, min, and mean/median in a multidimensional array
                                    gaurav.garg

                                     

                                    so how to do it in separate kernel?


                                    You can write a kernel that does the averaging, taking the output of the reduction kernel as its input, or do the averaging on the CPU.
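
For the CPU option, the averaging step is a single pass over the buffer written back from the sum reduction. A minimal sketch in plain C++ follows; sumsToMeans is a hypothetical helper, and it assumes every element of the buffer is the sum of exactly groupSize values.

```cpp
#include <cstddef>
#include <vector>

// Turn per-group sums into per-group means by dividing every element by
// the number of values that were added into each sum.
std::vector<float> sumsToMeans(const std::vector<float>& sums, std::size_t groupSize)
{
    std::vector<float> means(sums.size());
    for (std::size_t i = 0; i < sums.size(); ++i)
        means[i] = sums[i] / static_cast<float>(groupSize);
    return means;
}
```

In the thread's setup this would be applied to the contents of arr1 after the write-back, with groupSize equal to jumlahDiSatuGrup (5).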

                                    You should probably debug your application with the CPU backend (use the -nl option to compile the .br file). I have implemented min/max on my end and it works fine.

                                    Here are some modifications to your code that work:

                                    int
                                    main(int argc, char* argv[])
                                    {
                                        unsigned int jumlahData = 480;
                                        unsigned  int jumlahDiSatuGrup = 5;
                                        unsigned  int jumlahDimensi = 16;
                                        unsigned int jumlahOutput = 6;
                                        unsigned int streamSize[] = {jumlahData, jumlahDimensi};
                                        unsigned int streamSizeReduce[] = {jumlahData/jumlahDiSatuGrup, jumlahDimensi};

                                        unsigned int rank = 2;
                                        float *arr0 = new float[jumlahDimensi*jumlahData];
                                        float *arr1 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                                        float *arr2 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                                        float *arr3 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
                                        memset(arr0, 0, jumlahDimensi * jumlahData * sizeof(float));
                                        memset(arr1, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
                                        memset(arr2, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
                                        memset(arr3, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));


                                    //here the change
                                        for(unsigned int i = 0; i < jumlahData; i++)
                                        {
                                            for(unsigned int j = 0; j < jumlahDimensi; j++)
                                            {
                                                unsigned int index = i * jumlahDimensi + j;
                                                    arr0[index] = (float)index;
                                            }
                                        }

                                        Stream<float> streami0(rank, streamSize);
                                        Stream<float> streami1(rank, streamSizeReduce);
                                        Stream<float> streami2(rank, streamSizeReduce);
                                        Stream<float> streami3(rank, streamSizeReduce);

                                        streamRead(streami0,arr0);
                                        mean_of_five(streami0,streami1);
                                        max_reduce(streami0,streami2);
                                        min_reduce(streami0,streami3);
                                        streamWrite(streami1,arr1);
                                        streamWrite(streami2,arr2);
                                        streamWrite(streami3,arr3);

                                        for(unsigned int i = 0; i < 5; i++)
                                        {
                                            for(unsigned int j = 0; j < jumlahDimensi; j++)
                                            {
                                                unsigned int index = i * jumlahDimensi + j;
                                                printf("%.2f  ", arr0[index]);
                                            }
                                            printf("\n");
                                        }

                                        printf("\n\n");

                                        for(unsigned int i = 0; i < 5; i++)
                                        {
                                            for(unsigned int j = 0; j < jumlahDimensi; j++)
                                            {
                                                unsigned int index = i * jumlahDimensi + j;
                                                printf("%.2f  ", arr1[index]);
                                            }
                                            printf("\n");
                                        }

                                        printf("\n\n");

                                        for(unsigned int i = 0; i < 5; i++)
                                        {
                                            for(unsigned int j = 0; j < jumlahDimensi; j++)
                                            {
                                                unsigned int index = i * jumlahDimensi + j;
                                                printf("%.2f  ", arr2[index]);
                                            }
                                            printf("\n");
                                        }

                                        printf("\n\n");

                                        for(unsigned int i = 0; i < 5; i++)
                                        {
                                            for(unsigned int j = 0; j < jumlahDimensi; j++)
                                            {
                                                unsigned int index = i * jumlahDimensi + j;
                                                printf("%.2f  ", arr3[index]);
                                            }
                                            printf("\n");
                                        }

                                        delete[] arr0;
                                        delete[] arr1;
                                        delete[] arr2;
                                        delete[] arr3;

                                        return 0;
                                    }

                                      • Finding max, min, and mean/median in a multidimensional array
                                        riza.guntur

                                        Thanks a lot. I just realized that the array in the stream is laid out like a transpose of the CPU array.

                                        So I will need to write a kernel to average the sums:

                                        kernel void average( float div, float a<>, out float b<> )
                                        {
                                            b = a / div;
                                        }

                                        then call it from main:

                                        average( 5.0f, streami1, streamAverage);

                                        At first I was confused by your explanation, since on page A-10 of the Stream Computing User Guide there is this line in the first paragraph:

                                        ..., or used as a subkernel by an enclosing kernel (which can itself be a reduction kernel).

                                        Now I think that part of the guide does not apply to this situation, so for now I will not use it.

                                          • Finding max, min, and mean/median in a multidimensional array
                                            riza.guntur

                                            I added -nl when compiling the .br file, but I get an error.

                                            mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc.exe" -nl "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

                                            1>------ Rebuild All started: Project: percobaan_pertama, Configuration: Debug Win32 ------
                                            1>Deleting intermediate and output files for project 'percobaan_pertama', configuration 'Debug|Win32'
                                            1>Performing Custom Build Step
                                            1>Brook+ Compiler
                                            1>Version: 1.4  Built: Mar  2 2009, 13:08:36
                                            1>brcc [-hkrbfilxaec] [-D macro] [-n flag] [-w level] [-o prefix] [-p shader ] foo.br
                                            1>   -h            Help (print this message)
                                            1>   -k            Keep generated IL program (in foo.il)
                                            1>   -r            Disable address virtualization
                                            1>   -o <prefix>   Prefix prepended to all output files
                                            1>   -p <shader>   cpu or cal (can specify multiple)
                                            1>   -s            Tokenize into char list generated IL program
                                            1>   -b            Turn on bison debugging
                                            1>   -f            Turn on flex debugging
                                            1>   -i            Specify include directory for passing to external preprocessor
                                            1>   -l            Insert #line directives into generated code
                                            1>   -w <level>    Specify level of warning level. level can be 0, 1, 2, 3
                                            1>                 0 level is default
                                            1>   -x            Turn on warnings as errors
                                            1>   -a            Disable strong type checking.
                                            1>   -e            Adds extern C for non kernel function declarations
                                            1>   -c            Disable cached gather array feature
                                            1>   -pp           Enables the preprocessor
                                            1>   -D <name>     Define macro
                                            1>   -D <name>{=}<int-value> Define macro with integer value
                                            1>                 No spaces allowed between macro name and macro value
                                            1>   -n flag       Disable the specified flag
                                            1>                 flag = l -> Disable line directive information to debug
                                            1>                 presently -l flag is only valid flag
                                            1>Note : Usage of -x and -w flags are valid only with -a flag
                                            1>A subdirectory or file brookgenfiles already exists.
                                            1>Project : error PRJ0019: A tool returned an error code from "Performing Custom Build Step"
                                            1>Build log was saved at "file://c:\Documents and Settings\mic\My Documents\Visual Studio 2008\Projects\TA\percobaan_pertama\built_d\xp_x86_32\BuildLog.htm"
                                            1>percobaan_pertama - 1 error(s), 0 warning(s)
                                            ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========