riza_guntur
Journeyman III

Finding max, min, and mean/median in a multidimensional array

Dear all,

I have a problem like this:

float input[100][16];

I want to split the input into groups of 5 rows, giving 20 groups of 5 rows, each row with 16 dimensions.

For each group, I need to find the max, min, and mean/median within each dimension separately, so I get 16 max, min, and mean/median values per group. The next step is rather confusing, so I am asking about this first pass first.

Is there any way to do this using Brook+? I thought of using a reduce kernel.

reduce void
sum(float i<>, reduce float o<>)
{
    o = o + i/(float)5;
}

for each 5 values within a dimension in a group. But how? Each set of 5 is spread across rows, while the reduce kernel operates along columns.

Is there any way to do it by offset, taking each 5 values and reducing them to one? Yes, I would need to change the 2-dimensional array to 1 dimension, but that is not hard.

Later, the results will be compiled on the CPU for the next operation.

As for the max and min values, I haven't got any ideas yet. Does anyone have a suggestion?
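
For reference, here is a minimal CPU-side sketch of the grouping described above, written in plain C++ rather than Brook+. It assumes a row-major linear buffer and a group size that divides the row count; the names GroupStats and groupStats are made up for illustration and are not part of any Brook+ API. It could serve as a baseline to check GPU output against.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Min, max and mean of one dimension within one group (hypothetical helper type).
struct GroupStats {
    float min;
    float max;
    float mean;
};

// input is row-major: element (row, d) lives at input[row * dims + d].
// Returns one entry per (group, dimension): result[g * dims + d].
std::vector<GroupStats> groupStats(const std::vector<float>& input,
                                   std::size_t rows, std::size_t dims,
                                   std::size_t groupSize) {
    assert(groupSize > 0 && rows % groupSize == 0);
    assert(input.size() == rows * dims);

    const std::size_t groups = rows / groupSize;
    std::vector<GroupStats> result(groups * dims);

    for (std::size_t g = 0; g < groups; ++g) {
        for (std::size_t d = 0; d < dims; ++d) {
            const float first = input[(g * groupSize) * dims + d];
            GroupStats s{first, first, 0.0f};
            float sum = 0.0f;
            // Walk down the rows of this group, staying in dimension d.
            for (std::size_t r = 0; r < groupSize; ++r) {
                const float v = input[(g * groupSize + r) * dims + d];
                s.min = std::min(s.min, v);
                s.max = std::max(s.max, v);
                sum += v;
            }
            s.mean = sum / static_cast<float>(groupSize);
            result[g * dims + d] = s;
        }
    }
    return result;
}
```

For the case in this post, the call would be groupStats(data, 100, 16, 5), giving 20 * 16 entries.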

Thanks anyway.

0 Likes
15 Replies
gaurav_garg
Adept I

It seems you want to use something similar to partial reduction, where you reduce multiple groups of the input stream to a single value for each group.

E.g., in your case, you want to reduce the stream from [100][16] to [20][16].

Take a look at section A.4.1.3 of the Stream Computing User Guide on partial reduction.

0 Likes

Okay, I'll try it

0 Likes

I've tried A.4.1.3, using the min and max reduction functions there as well.

But my program doesn't execute properly: there is an access violation, and I don't understand why.

Based on the example, I wrote:

#include "brookgenfiles/percobaan_pertama.h"
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
using namespace brook;

template <typename T>
T **AllocateDynamicArray2D( int nRows, int nCols)
{
      T **dynamicArray;

      dynamicArray = new T*[nRows];
      for( int i = 0 ; i < nRows ; i++ )
            dynamicArray[i] = new T [nCols];

      return dynamicArray;
}

template <typename T>
void FreeDynamicArray2D(T** dArray)
{
      delete [] *dArray;
      delete [] dArray;
}

template <typename T>
T *AllocateDynamicArray1D( int nDims)
{
      T *dynamicArray;

      dynamicArray = new T[nDims];

      return dynamicArray;
}

template <typename T>
void FreeDynamicArray1D(T* dArray)
{
      delete [] dArray;
}

int
main(int argc, char* argv[])
{
    int jumlahData = 480;
    int jumlahDiSatuGrup = 5;
    int jumlahDimensi = 16;
    int jumlahOutput = 6;
    unsigned int streamSize[] = {jumlahDimensi, jumlahData};
    unsigned int streamSizeReduce[] = {jumlahDimensi, jumlahData/jumlahDiSatuGrup};

    unsigned int rank = 2;
    //int baris = jumlahData * jumlahDimensi / jumlahDiSatuGrup;

    //float ** arr0 = AllocateDynamicArray<float>(baris,jumlahDiSatuGrup);

    float **arr0 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData);
    float **arr1 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);
    float **arr2 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);
    float **arr3 = AllocateDynamicArray2D<float>(jumlahDimensi,jumlahData/jumlahDiSatuGrup);

    ifstream inFile;
    
    inFile.open("train data.txt");
    if (!inFile) {
        cout << "Unable to open file";
        exit(1); // terminate with error
    }

    //for( int i = 0; i < baris * jumlahDiSatuGrup; i++)
    for( int i = 0; i < jumlahDimensi * jumlahData; i++)
    {
        float temp;
        if( inFile >> temp)
            arr0[i/jumlahData][i%jumlahData] = temp;
    }

    /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
    {
        printf("%f ", arr0[i/jumlahData][i%jumlahData]);
    }*/

    Stream<float> streami0(rank, streamSize);
    Stream<float> streami1(rank, streamSizeReduce);
    Stream<float> streami2(rank, streamSizeReduce);
    Stream<float> streami3(rank, streamSizeReduce);

    streamRead(streami0,arr0);
    mean_of_five(streami0,streami1);
    max_reduce(streami0,streami2);
    min_reduce(streami0,streami3);
    streami1.write(arr1);
    streami2.write(arr2);
    streami3.write(arr3);

    inFile.close();
    FreeDynamicArray2D<float>(arr0);
    FreeDynamicArray2D<float>(arr1);
    FreeDynamicArray2D<float>(arr2);
    FreeDynamicArray2D<float>(arr3);


    getchar();
    return 0;
}


The Brook+ code is as follows:

reduce void mean_of_five(float a<>, reduce float b<>)
{
    b += (float)0.2*a;
}

reduce void max_reduce(float a<>, reduce float b<>)
{
    if(a > b)
        b = a;
}

reduce void min_reduce(float a<>, reduce float b<>)
{
    if(a < b)
        b = a;
}

 

Why does the access violation happen? Judging by the example, the first subscript should be fine.

0 Likes

I'm sorry about the "paste from Word" warning; I edited the post using paste from Word, but the formatting still went bad.

After changing the arrays to 1D it works fine, though I don't understand why. I am quite familiar with C but not C++, so I would appreciate an explanation.

The changes are below; to see what I added, just search for "//here the change".

#include "brookgenfiles/percobaan_pertama.h"
#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstring> // for memset
using namespace std;
using namespace brook;

template <typename T>
T **AllocateDynamicArray2D( int nRows, int nCols)
{
      T **dynamicArray;

      dynamicArray = new T*[nRows];
      for( int i = 0 ; i < nRows ; i++ )
            dynamicArray[i] = new T [nCols];

      return dynamicArray;
}

template <typename T>
void FreeDynamicArray2D(T** dArray)
{
      delete [] *dArray;
      delete [] dArray;
}

template <typename T>
T *AllocateDynamicArray1D( int nDims)
{
      T *dynamicArray;

      dynamicArray = new T[nDims];

      return dynamicArray;
}

template <typename T>
void FreeDynamicArray1D(T* dArray)
{
      delete [] dArray;
}

int
main(int argc, char* argv[])
{
    unsigned int jumlahData = 480;
    unsigned  int jumlahDiSatuGrup = 5;
    unsigned  int jumlahDimensi = 16;
    unsigned int jumlahOutput = 6;
    unsigned int streamSize[] = {jumlahDimensi, jumlahData};
    unsigned int streamSizeReduce[] = {jumlahDimensi, jumlahData/jumlahDiSatuGrup};

    unsigned int rank = 2;
    //int baris = jumlahData * jumlahDimensi / jumlahDiSatuGrup;

    //float ** arr0 = AllocateDynamicArray<float>(baris,jumlahDiSatuGrup);

//here the change
    float *arr0 = new float[jumlahDimensi*jumlahData];
    float *arr1 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    float *arr2 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    float *arr3 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    memset(arr0, 0, jumlahDimensi * jumlahData * sizeof(float));
    memset(arr1, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
    memset(arr2, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
    memset(arr3, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));

    ifstream inFile;
    
    inFile.open("train data.txt");
    if (!inFile) {
        cout << "Unable to open file";
        exit(1); // terminate with error
    }

    //for( int i = 0; i < baris * jumlahDiSatuGrup; i++)
    /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
    {
        float temp;
        if( inFile >> temp)
            arr0[i/jumlahData][i%jumlahData] = temp;
    }*/

//here the change
    for(unsigned int i = 0; i < jumlahDimensi; i++)
    {
        for(unsigned int j = 0; j < jumlahData; j++)
        {
            unsigned int index = i * jumlahData + j;
            float temp;
            if( inFile >> temp)
                arr0[index] = temp;
        }
    }

    /*for( int i = 0; i < jumlahDimensi * jumlahData; i++)
    {
        printf("%f ", arr0[i/jumlahData][i%jumlahData]);
    }*/

    Stream<float> streami0(rank, streamSize);
    Stream<float> streami1(rank, streamSizeReduce);
    Stream<float> streami2(rank, streamSizeReduce);
    Stream<float> streami3(rank, streamSizeReduce);

    streamRead(streami0,arr0);
    mean_of_five(streami0,streami1);
    max_reduce(streami0,streami2);
    min_reduce(streami0,streami3);
    streamWrite(streami1,arr1);
    streamWrite(streami2,arr2);
    streamWrite(streami3,arr3);

    inFile.close();
    delete[] arr0;
    delete[] arr1;
    delete[] arr2;
    delete[] arr3;

    getchar();
    return 0;
}

0 Likes

Oh man, after checking the values of each processed stream's output, I get unexpected results:

cout << (arr0[0]+arr0[1]+arr0[2]+arr0[3]+arr0[4])/5 << endl;
cout << arr1[0] << endl; // the GPU mean, which is still broken (at least for me)
cout << arr2[0] << endl; // the maximum output, which is broken
cout << arr3[0] << endl; // the minimum output, which is also broken

I'm confused...

0 Likes

Changing the last part to:

    mean_of_five((float)5.0,streami0,streami1);
    max_reduce(streami0,streami2);
    min_reduce(streami0,streami3);
    streamWrite(streami1,arr1);
    streamWrite(streami2,arr2);
    streamWrite(streami3,arr3);
    cout << (arr0[0]+arr0[1]+arr0[2]+arr0[3]+arr0[4])/5 <<endl;
    cout << arr1[0] << endl;
    cout << arr2[0] << endl;
    cout << arr3[0] << endl;

with mean_of_five:

reduce void mean_of_five(float x,float a<>, reduce float b<>)
{
    b += a/x;
}

shows a different result.

Can anyone help?

0 Likes

After changing the array to 1D it does fine. Though I don't understand why.


The streamRead and streamWrite methods require a pointer to linear memory of size width * height * sizeof(stream type), not an array of pointers.
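
To illustrate the difference on the host side, here is a small sketch in plain C++ (no Brook+ calls). A float** allocated row by row is an array of separate allocations, so it cannot be handed over as one block; a single buffer of width * height floats can. The LinearGrid type and its member names are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous block of width * height floats, addressed as (row, col).
struct LinearGrid {
    std::size_t width;
    std::size_t height;
    std::vector<float> data;

    LinearGrid(std::size_t w, std::size_t h)
        : width(w), height(h), data(w * h, 0.0f) {}

    // Row-major addressing: rows are stored one after another.
    float& at(std::size_t row, std::size_t col) {
        assert(row < height && col < width);
        return data[row * width + col];
    }

    // Pointer to the whole block; this is the kind of pointer a
    // linear-memory API expects, unlike a float** of separate rows.
    float* raw() { return data.data(); }
};
```

With this layout you keep two-index access through at(row, col) while still having one pointer to pass to anything that wants linear memory.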

To calculate the mean, the reduce kernel should only add the values together; do the averaging afterwards on the CPU or in a separate kernel. (Read the last paragraph of section A.4.1.2 of the Stream Computing User Guide for more detail.)
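
The two-step structure can be sketched on the CPU like this. It is plain C++ standing in for the two kernels, not Brook+ code, and the function names sumGroups and divideBy are invented for the example. The first pass only accumulates; the second pass is an ordinary element-wise divide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1: sum each consecutive run of groupSize elements (reduction only).
std::vector<float> sumGroups(const std::vector<float>& in, std::size_t groupSize) {
    assert(groupSize > 0 && in.size() % groupSize == 0);
    std::vector<float> sums(in.size() / groupSize, 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) {
        sums[i / groupSize] += in[i];
    }
    return sums;
}

// Pass 2: element-wise division, done after the reduction has finished.
std::vector<float> divideBy(const std::vector<float>& sums, float divisor) {
    assert(divisor != 0.0f);
    std::vector<float> out(sums.size());
    for (std::size_t i = 0; i < sums.size(); ++i) {
        out[i] = sums[i] / divisor;
    }
    return out;
}
```

Keeping the division out of the accumulation step means the reduction stays a pure sum, so the order in which partial results are combined does not change what is being computed.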

I have written max/min reduction kernels on my end and they work fine. I just wanted to point out that you are reducing in the y direction of the stream. When you declare a stream, the first value in the dimension array is the width of the stream (jumlahDimensi in your case).

Also, you can check correctness with the CPU backend (set the environment variable BRT_RUNTIME=cpu).

0 Likes

I see. From now on I will use linear memory for every array.

Okay, I now understand that "this kernel is undefined" means I can't do any operation besides summing in there. So how do I do it in a separate kernel? I haven't seen any examples of that.

I wrote my min and max functions, for finding the min and max in a group of five, no differently from mean_of_five. Are there any changes I should make to my current implementation to make it work? I really want to reduce 480 to 96 to get the max and min of each group of five.

0 Likes

so how to do it in separate kernel?


You can write a kernel that does the averaging, taking the output of the reduction kernel as its input, or do the averaging on the CPU.

You should probably debug your application with the CPU backend (use the -nl option to compile the .br file). I have implemented min/max on my end and it works fine.

Here are some modifications to your code that work:

int
main(int argc, char* argv[])
{
    unsigned int jumlahData = 480;
    unsigned  int jumlahDiSatuGrup = 5;
    unsigned  int jumlahDimensi = 16;
    unsigned int jumlahOutput = 6;
    unsigned int streamSize[] = {jumlahData, jumlahDimensi};
    unsigned int streamSizeReduce[] = {jumlahData/jumlahDiSatuGrup, jumlahDimensi};

    unsigned int rank = 2;
    float *arr0 = new float[jumlahDimensi*jumlahData];
    float *arr1 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    float *arr2 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    float *arr3 = new float[jumlahDimensi*jumlahData/jumlahDiSatuGrup];
    memset(arr0, 0, jumlahDimensi * jumlahData * sizeof(float));
    memset(arr1, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
    memset(arr2, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));
    memset(arr3, 0, jumlahDimensi * jumlahData /jumlahDiSatuGrup * sizeof(float));


//here the change
    for(unsigned int i = 0; i < jumlahData; i++)
    {
        for(unsigned int j = 0; j < jumlahDimensi; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
                arr0[index] = (float)index;
        }
    }

    Stream<float> streami0(rank, streamSize);
    Stream<float> streami1(rank, streamSizeReduce);
    Stream<float> streami2(rank, streamSizeReduce);
    Stream<float> streami3(rank, streamSizeReduce);

    streamRead(streami0,arr0);
    mean_of_five(streami0,streami1);
    max_reduce(streami0,streami2);
    min_reduce(streami0,streami3);
    streamWrite(streami1,arr1);
    streamWrite(streami2,arr2);
    streamWrite(streami3,arr3);

    for(unsigned int i = 0; i < 5; i++)
    {
        for(unsigned int j = 0; j < jumlahDimensi; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
            printf("%.2f  ", arr0[index]);
        }
        printf("\n");
    }

    printf("\n\n");

    for(unsigned int i = 0; i < 5; i++)
    {
        for(unsigned int j = 0; j < jumlahDimensi; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
            printf("%.2f  ", arr1[index]);
        }
        printf("\n");
    }

    printf("\n\n");

    for(unsigned int i = 0; i < 5; i++)
    {
        for(unsigned int j = 0; j < jumlahDimensi; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
            printf("%.2f  ", arr2[index]);
        }
        printf("\n");
    }

    printf("\n\n");

    for(unsigned int i = 0; i < 5; i++)
    {
        for(unsigned int j = 0; j < jumlahDimensi; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
            printf("%.2f  ", arr3[index]);
        }
        printf("\n");
    }

    delete[] arr0;
    delete[] arr1;
    delete[] arr2;
    delete[] arr3;

    return 0;
}

0 Likes

Thanks a lot. I just realized that the array in the stream is kind of a transpose of the CPU array.

So I will need to write a kernel to average the sums:

kernel void average( float div, float a<>, out float b<> )
{
    b = a / div;
}

then call it from main:

average( 5.0f, streami1, streamAverage);

At first I was confused by your explanation, since on page A-10 of the Stream Computing User Guide there is this line in the first paragraph:

..., or used as a subkernel by an enclosing kernel (which can itself be a reduction kernel).

Now I think that part of the guide does not apply to this situation, so for now I will not use it.

0 Likes

I added -nl when compiling the .br file, but got an error.

mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc.exe" -nl "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

1>------ Rebuild All started: Project: percobaan_pertama, Configuration: Debug Win32 ------
1>Deleting intermediate and output files for project 'percobaan_pertama', configuration 'Debug|Win32'
1>Performing Custom Build Step
1>Brook+ Compiler
1>Version: 1.4  Built: Mar  2 2009, 13:08:36
1>brcc [-hkrbfilxaec] [-D macro] [-n flag] [-w level] [-o prefix] [-p shader ] foo.br
1>   -h            Help (print this message)
1>   -k            Keep generated IL program (in foo.il)
1>   -r            Disable address virtualization
1>   -o <prefix>   Prefix prepended to all output files
1>   -p <shader>   cpu or cal (can specify multiple)
1>   -s            Tokenize into char list generated IL program
1>   -b            Turn on bison debugging
1>   -f            Turn on flex debugging
1>   -i            Specify include directory for passing to external preprocessor
1>   -l            Insert #line directives into generated code
1>   -w <level>    Specify level of warning level. level can be 0, 1, 2, 3
1>                 0 level is default
1>   -x            Turn on warnings as errors
1>   -a            Disable strong type checking.
1>   -e            Adds extern C for non kernel function declarations
1>   -c            Disable cached gather array feature
1>   -pp           Enables the preprocessor
1>   -D <name>     Define macro
1>   -D <name>{=}<int-value> Define macro with integer value
1>                 No spaces allowed between macro name and macro value
1>   -n flag       Disable the specified flag
1>                 flag = l -> Disable line directive information to debug
1>                 presently -l flag is only valid flag
1>Note : Usage of -x and -w flags are valid only with -a flag
1>A subdirectory or file brookgenfiles already exists.
1>Project : error PRJ0019: A tool returned an error code from "Performing Custom Build Step"
1>Build log was saved at "file://c:\Documents and Settings\mic\My Documents\Visual Studio 2008\Projects\TA\percobaan_pertama\built_d\xp_x86_32\BuildLog.htm"
1>percobaan_pertama - 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

0 Likes

You forgot to give the -o option.

It should be:

"$(BROOKROOT)\sdk\bin\brcc.exe" -nl -o "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

0 Likes

Thank you, I'll try it.

0 Likes

About the CPU backend:

1. Where is BRT_RUNTIME=cpu? I didn't find it in the Windows XP environment variables.

2. The -nl parameter seems to work, no errors. How do you usually compare GPU vs CPU output? Do you make two .br files with different parameters? I ask because I haven't seen any performance gain (probably because of the small domain).
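
One possible way to compare the two outputs, sketched in plain C++: run the program once per backend, keep both result arrays, and compare them element by element with a small tolerance, since floating-point results from different backends need not match bit for bit. The function name firstMismatch and the choice of an absolute tolerance are assumptions made for this example, not something Brook+ provides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element where the two arrays differ by
// more than tol, or -1 if every element matches within tol.
std::ptrdiff_t firstMismatch(const std::vector<float>& a,
                             const std::vector<float>& b,
                             float tol) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

An absolute tolerance is the simplest choice; for data with a wide range of magnitudes a relative tolerance may be a better fit.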

Thanks a lot

0 Likes

You are supposed to add that environment variable yourself. If no environment variable is specified, Brook+ uses CAL by default.
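
On Windows the variable can be added under System Properties > Advanced > Environment Variables, or for a single session with "set BRT_RUNTIME=cpu" in the command prompt before launching the program. As a sanity check, the program itself can print what it sees. The sketch below uses only the standard std::getenv; describeBackend and currentBackend are hypothetical helper names, and treating an empty value the same as an unset one is an assumption of this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Turns the raw value of BRT_RUNTIME into a printable label.
// A null pointer means the variable is not set at all.
std::string describeBackend(const char* value) {
    // Assumption: an empty value is treated like an unset variable.
    std::string label = (value == nullptr || *value == '\0')
                            ? std::string("cal (default)")
                            : std::string(value);
    assert(!label.empty());
    return label;
}

// Reads the variable from the current process environment.
std::string currentBackend() {
    return describeBackend(std::getenv("BRT_RUNTIME"));
}
```

Printing currentBackend() at startup makes it easy to confirm which backend a given run was meant to use before comparing results.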

0 Likes