fandango
Journeyman III

Kernel function problem

Hi,

I have run into a small problem. My kernel function does not work correctly and returns only a partly correct result. The function is very simple, and I do not believe the mistake is mine. Could you please look at my example and tell me what is wrong?

kernel void func1(unsigned char src[][],  unsigned char str, out unsigned char o_img<>)
{
    // Output position
    int j = instance().x; // width
    int i = instance().y; // height
   
    int rest = j % 16;
   
    if (rest == 0)
    {
        o_img = src[i][j];
    }
    else
    {
        o_img = str;
    }
}

 

Input:

3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1   1   1   1   1   1

The kernel should simply replace each 1 with 9.

Wrong output:

  3      9   9   9   9   9   9   9   9   9   9   9   9   9   9   3      9   9   9   9   9   9   9   9   9   9   9   9   9   9   3      9   9   9   9   9   9   9

You can see garbage in memory. If I remove the if statement from the kernel, the output contains no garbage, but I need the first version.

 

 

0 Likes
74 Replies
gaurav_garg
Adept I

I could not reproduce this issue on my system. Could you send me your system configuration?

0 Likes

Yes,

ATI Radeon 4800 HD, driver 8.600.0.0

Intel Core 2 6300 1.86 GHz, 1 Gb DDR2

 

0 Likes

Chunk of code just in case:


 // Specifying the size of the 2D stream
    unsigned int streamSize[] = {width, height};

    // Specifying the rank of the stream
    unsigned int rank = 2;
   
    brook::Stream<unsigned char> inputStream(rank, streamSize);

    // Copying data from input buffer to input stream
    inputStream.read(src);

    //--------------------------------------------------------------------------
    // Creating the output stream
    //--------------------------------------------------------------------------    
    streamSize[0] = width;
    streamSize[1] = height;

    brook::Stream<unsigned char> outputStream(rank, streamSize);

    //--------------------------------------------------------------------------
    // Executing kernel and copying back data
    //--------------------------------------------------------------------------    
    unsigned char str = '9';
    
    // Calling the kernel on the input and output streams
    func1(inputStream, str, outputStream);

    // Creating an output buffer
    unsigned char* ref = new unsigned char[width * height];
    memset(ref, 0, width * height);


    // Copying data from output stream to output buffer
    outputStream.write(ref);

0 Likes

Any ideas?

0 Likes

No idea; some of these errors are too strange.

0 Likes

OK. I will try reinstalling the driver or the Stream SDK.

0 Likes

I am also using 8.60 driver, but I don't see the issue.

What is your OS, and what are the values of width and height?

0 Likes

I have tried Windows XP 32-bit and Windows Vista 64-bit SP1, with the latest Stream SDK and driver. The result is the same.

I have tried different widths and heights. Full code:

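#include <cstdio>    // sprintf, fprintf
#include <cstring>   // memset
#include <iostream>
#include <windows.h> // OutputDebugString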

#include "brook/Stream.h"
#include "brook/KernelInterface.h"
#include "brookgenfiles/kernel.h"


void print(unsigned char* arr, int width, int height)
{
    for (int i = 0; i < height; i++)
    {
        for(int j = 0; j < width; j++)
        {
            char cStr[256];
            sprintf(cStr, "% 3c ", arr[j + i * width]);
            OutputDebugString(cStr);
        }

        OutputDebugString("\n");
    }
}

int
main(int argc, char* argv[])
{
// Specifying the width and height of the 2D buffer
    const unsigned int width = 49;
    const unsigned int height = 6;

    //--------------------------------------------------------------------------
    // Creating and initializing the input buffer
    //--------------------------------------------------------------------------

    // Creating an input buffer
    unsigned char* src = new unsigned char[width * height];
    //memset(src, 7, width * height);

    for (int i = 0; i < height; i++)
    {
        for(int j = 0; j < width; j++)
        {
            if (j % 16 == 0)
            {
                src[j + i * width] = '3';
            }
            else
            {
                src[j + i * width] = '1';
            }
        }

        OutputDebugString("\n");
    }

     print(src, width, height);

    // Initializing the input buffer such that
    // input(i,j) = i*width + j
//    fillBuffer(inputBuffer, width, height);

    // Printing input buffer
    fprintf(stdout, "Input buffer:\n");

    //--------------------------------------------------------------------------
    // Creating the input stream and copying data from input buffer
    //--------------------------------------------------------------------------

    // Specifying the size of the 2D stream
    unsigned int streamSize[] = {width, height};

    // Specifying the rank of the stream
    unsigned int rank = 2;

    // Create a 2D stream of the specified size (width x height unsigned char values)
    brook::Stream<unsigned char> inputStream(rank, streamSize);

    // Copying data from input buffer to input stream
    inputStream.read(src);

    //--------------------------------------------------------------------------
    // Creating the output stream
    //--------------------------------------------------------------------------   
    streamSize[0] = width;
    streamSize[1] = height;

    brook::Stream<unsigned char> outputStream(rank, streamSize);

    //--------------------------------------------------------------------------
    // Executing kernel and copying back data
    //--------------------------------------------------------------------------   
    unsigned char str = '9';
   
    // Calling the kernel on the input and output streams
    func1(inputStream, str, outputStream);

    // Creating an output buffer
    unsigned char* ref = new unsigned char[width * height];
    memset(ref, 0, width * height);
    //memset(ref, 0, width * height * sizeof(just));

    //print(ref, width, height);

    // Copying data from output stream to output buffer
    outputStream.write(ref);

    print(ref, width, height);

    // Check error on stream
    if(outputStream.error())
    {
        // Print error Log associated to stream
        fprintf(stdout, "%s\n", outputStream.errorLog());
    }

    fprintf(stdout, "Output buffer:\n");
//    printBuffer(outputBuffer, width, 0, 0, 8, 8);

    //--------------------------------------------------------------------------
    // Checking whether the result is correct or not
    //--------------------------------------------------------------------------
   

    //--------------------------------------------------------------------------
    // Cleaning up
    //--------------------------------------------------------------------------
   
    delete[] src;
    delete[] ref;
   
    return 0;
}

0 Likes

I have just noticed one thing. Depending on the width and height in streamSize for the output Brook stream, I get different results, that is, the garbage appears in different locations.

I suppose something is wrong in my kernel function.

0 Likes

If I set

    streamSize[0] = width;
    streamSize[1] = 1;

    brook::Stream<unsigned char> outputStream(rank, streamSize);

The result is correct. As soon as I set height > 1, the problem occurs.

0 Likes

Hmm. I obtained the correct result. The change:

kernel void func1(unsigned char src[],  unsigned char str, unsigned char str2, out unsigned char o_img<>)

src is now a one-dimensional array.

I also set the src rank to 1 and the dst rank to 2.

Please comment on this. What was the reason for the problem? Is it my allocation approach?

unsigned char* src = new unsigned char[width * height];

Please respond.

0 Likes

Is the dst height 1?

Are you able to run samples\legacy\tests\sum?

Let me know whether the sum sample runs.

0 Likes

Yes, I am able to run it with no problems. Now the height and width can be anything, and my example works as I expected. I think I have misunderstood the concept, and I would ask you to explain what I have got wrong.

0 Likes

What changes did you make to your code?

I did not see any problems with the code pasted at the top.

0 Likes

The changes were:

1. I set the stream rank of the src stream to 1 instead of 2.

unsigned int rank = 1;

brook::Stream<unsigned char> inputStream(rank, streamSize);

 inputStream.read(src);

2. I changed my kernel function accordingly. Note the one-dimensional src array [] instead of [][] in the previous version.

kernel void func1(unsigned char src[],  unsigned char str,  out unsigned char o_img<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int j = vPos.x; // width
    int i = vPos.y; // height
   
    int rest = j % 16;
   
    if (rest > 0)
    {
        o_img = str;
    }
    else
    {
        o_img = src[j + i * 40];
    }
}

That's all.

0 Likes

The constant 40 in the code above is the width.

0 Likes

In kernel code, the dimension of the output is important.

You can also use src[][], but in that case src and dst must have the same size and dimensions.

0 Likes

I did not change the output properties, only the input.

And your last sentence describes my first approach, with which I obtained incorrect results (garbage in memory).

So the question is still open.

0 Likes

With the given width and height, I could reproduce this. A quick workaround is to use a regular stream instead of a gather stream:

kernel void func1(unsigned char src<>, unsigned char str, out unsigned char o_img<> )
{
    // Output position
    int j = instance().x; // width
    int i = instance().y; // height
  
    int rest = j % 16;

    if (rest == 0)
    {
        o_img = src;
    }
    else
    {
        o_img = str;
    }
}
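
For checking the device output against a known-good result, a host-side reference can help. The following is only a sketch in plain C++: funcRef and countMismatches are hypothetical helper names, not part of Brook+, and the buffers are assumed to be row-major with width * height elements, as in the host code posted earlier.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference for func1: keep every 16th column of src and
// fill every other position with the constant str.
// (funcRef is a hypothetical helper name, not part of Brook+.)
std::vector<unsigned char> funcRef(const std::vector<unsigned char>& src,
                                   unsigned char str,
                                   std::size_t width, std::size_t height)
{
    std::vector<unsigned char> out(width * height);
    for (std::size_t i = 0; i < height; ++i) {
        for (std::size_t j = 0; j < width; ++j) {
            const std::size_t idx = j + i * width;  // row-major offset
            out[idx] = (j % 16 == 0) ? src[idx] : str;
        }
    }
    return out;
}

// Count the positions where the buffer read back from the device
// differs from the reference. 'actual' must hold expected.size() elements.
std::size_t countMismatches(const std::vector<unsigned char>& expected,
                            const unsigned char* actual)
{
    std::size_t bad = 0;
    for (std::size_t k = 0; k < expected.size(); ++k) {
        if (expected[k] != actual[k]) {
            ++bad;
        }
    }
    return bad;
}
```

After outputStream.write(ref), a non-zero count from countMismatches would point at the garbage positions without having to read the printed output by eye.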

0 Likes

Now it is confirmed that this is a regression in Catalyst 9.4. You can try a previous version of Catalyst.

0 Likes

So, in other words, it is a driver problem. Is that right?

0 Likes

Yes.

0 Likes

Ok. Thank you very much for your support.

Today I ran into another problem: the kernel does not behave as I expect.

Kernel:

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              unsigned char str,
                              unsigned char str2,
                              out unsigned char o_img<>,
                              out double sad<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height
   
    if ((j % 4 > 0 || i % 4 > 0) || (j == 12 || i == 32))
    {
        o_img = str;
        sad = 1.0;
    }
    else
    {
        estimate_macroblock_4x4(src, ref, i, j, str2, o_img, sad);
    }
}

kernel int estimate_macroblock_4x4(unsigned char mbs[],
                                   unsigned char mbr[],
                                   int i, int j,
                                   unsigned char str,
                                   out unsigned char o_img<>,
                                   out double sad<>)
{

    int x, y;
   
    //sad = (double) (i + 0 + ((j + 0) * 33)) ;
   
    for (x = 0; x < 4; x++)
    {
        for (y = 0; y < 4; y++)
        {

// PROBLEM IS HERE
            int index = i + x + ((j + y) * 33);
            sad += (double)(mbs[index] - mbr[index]);
        }
   
    }
   
    return 0;
}

Part of the sad output is:

-71.000000  1.000000  1.000000  1.000000 -71.000000  1.000000  ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000 ...


-80.000000  1.000000  1.000000  1.000000 -80.000000  1.000000 ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  ...
 1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  ...

-71 is the correct value. It is the sum of the differences between the blocks

 *   1   1   1
 1   1   1   1
 1   1   1   1
 1   1   1   1

and

/   3   3   3
3   3   3   3
3   3   3   3
3   3   3   3

-80.0 is the difference only between * and /. I expect -71 everywhere instead of -80. It seems that the nested for loops do not work for j > 0.
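
For comparison, the per-block sum can be computed on the host. This is a sketch only: blockDiff4x4 is a hypothetical helper name, the buffers are assumed to be row-major, and the accumulator is assumed to start at zero.

```cpp
#include <cstddef>
#include <vector>

// Sum of signed differences over the 4x4 block whose top-left corner
// is at column i, row j. Assumes the block lies fully inside both buffers
// and that the accumulator starts at zero.
// (blockDiff4x4 is a hypothetical helper name.)
double blockDiff4x4(const std::vector<unsigned char>& src,
                    const std::vector<unsigned char>& ref,
                    std::size_t i, std::size_t j, std::size_t width)
{
    double sum = 0.0;
    for (std::size_t y = 0; y < 4; ++y) {
        for (std::size_t x = 0; x < 4; ++x) {
            const std::size_t index = (i + x) + (j + y) * width;
            sum += static_cast<double>(src[index]) - static_cast<double>(ref[index]);
        }
    }
    return sum;
}
```

With ASCII inputs ('*' = 42, '/' = 47, '1' = 49, '3' = 51), one '*'/'/' pair and fifteen '1'/'3' pairs give -5 + 15 * (-2) = -35 for every aligned block. That differs from the figures quoted above; note that the sketch starts from zero, whereas the kernel accumulates into sad as it is passed in.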

0 Likes

I would recommend first trying Catalyst 9.2 to see whether that resolves your problems.

0 Likes

The same problem occurs on 9.2, only with different figures. The description is the same:

4025.000000  1.000000  1.000000  1.0000
1.000000  1.000000  1.000000  1.000000
1.000000  1.000000  1.000000  1.000000
1.000000  1.000000  1.000000  1.000000


4016.000000  1.000000  1.000000  1.0000
1.000000  1.000000  1.000000  1.000000
1.000000  1.000000  1.000000  1.000000
1.000000  1.000000  1.000000  1.000000

0 Likes

If you are using 2D streams, you must use [][] for the gather streams.

0 Likes

What do you mean? Where should I use [][]? Instead of sad<>?

0 Likes

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              unsigned char str,
                              unsigned char str2,
                              out unsigned char o_img<>,
                              out double sad<>)
{
    // Output position
    int2 vPos = instance().xy;

    int i = vPos.x; // width
    int j = vPos.y; // height


If your input streams src and ref are 2D streams, use [][]; otherwise it is fine.

0 Likes

My input streams are 1D. The outputs are 2D.

0 Likes

Then it is fine. Could you post your runtime code as well?

0 Likes

Do you mean the generated .cpp code?


////////////////////////////////////////////
// Generated by BRCC 1.4
// BRCC Compiled on: Mar  2 2009 13:07:15
////////////////////////////////////////////

#include "brook/brook.h"
#include "kernel_gpu.h"
#include "kernel.h"


static __BrtInt1  __estimate_macroblock_4x4_cpu_inner(const __BrtArray<__BrtUChar1  > &mbs,
                                                const __BrtArray<__BrtUChar1  > &mbr,
                                                const __BrtInt1  &i,
                                                const __BrtInt1  &j,
                                                const __BrtUChar1  &str,
                                                __BrtUChar1  &o_img,
                                                __BrtDouble1  &sad)


{

  __BrtInt1  y, x;

  for (y = __BrtInt1((int)0); y < __BrtInt1((int)4); y++)
  {
    for (x = __BrtInt1((int)0); x < __BrtInt1((int)4); x++)
    {
      __BrtInt1  index = i + x + (j + y) * __BrtInt1((int)33);

      sad += (__BrtDouble1 ) (mbs[index] - mbr[index]);
    }

  }

  return __BrtInt1((int)0);
}
void  __estimate_macroblock_4x4_cpu(::brt::KernelC *__k, int __brt_idxstart, int __brt_idxend, bool __brt_isreduce)
{
  __BrtArray<__BrtUChar1  > *arg_mbs = (__BrtArray<__BrtUChar1  > *) __k->getVectorElement(0);


  __BrtArray<__BrtUChar1  > *arg_mbr = (__BrtArray<__BrtUChar1  > *) __k->getVectorElement(1);

  __BrtInt1 *arg_i = (__BrtInt1 *) __k->getVectorElement(2);

  __BrtInt1 *arg_j = (__BrtInt1 *) __k->getVectorElement(3);


  __BrtUChar1 *arg_str = (__BrtUChar1 *) __k->getVectorElement(4);


  ::brt::StreamInterface *arg_o_img = (::brt::StreamInterface *) __k->getVectorElement(5);


  ::brt::StreamInterface *arg_sad = (::brt::StreamInterface *) __k->getVectorElement(6);
 
    for(int __brt_idx=__brt_idxstart; __brt_idx<__brt_idxend; __brt_idx++) {
  if(!(__k->isValidAddress(__brt_idx))){ continue; }


    Addressable <__BrtUChar1  > __out_arg_o_img((__BrtUChar1 *) __k->FetchElem(arg_o_img, __brt_idx));

    Addressable <__BrtDouble1  > __out_arg_sad((__BrtDouble1 *) __k->FetchElem(arg_sad, __brt_idx));

    __estimate_macroblock_4x4_cpu_inner (

                                         *arg_mbs,

                                         *arg_mbr,

                                         *arg_i,

                                         *arg_j,

                                         *arg_str,

                                         __out_arg_o_img,

                                         __out_arg_sad);


    *reinterpret_cast<__BrtUChar1 *>(__out_arg_o_img.address) = __out_arg_o_img.castToArg(*reinterpret_cast<__BrtUChar1 *>(__out_arg_o_img.address));

    *reinterpret_cast<__BrtDouble1 *>(__out_arg_sad.address) = __out_arg_sad.castToArg(*reinterpret_cast<__BrtDouble1 *>(__out_arg_sad.address));
  }
}

void  __motion_estimation_cpu_inner(const __BrtArray<__BrtUChar1  > &src,
                                   const __BrtArray<__BrtUChar1  > &ref,
                                   const __BrtUChar1  &str,
                                   const __BrtUChar1  &str2,
                                   __BrtUChar1  &o_img,
                                   __BrtDouble1  &sad)
{

  __BrtInt2  vPos = (indexof(o_img)).swizzle2(::brt::maskX, ::brt::maskY);

  __BrtInt1  i = vPos.swizzle1(::brt::maskX);

  __BrtInt1  j = vPos.swizzle1(::brt::maskY);

  if (j % __BrtInt1((int)4) > __BrtInt1((int)0) || i % __BrtInt1((int)4) > __BrtInt1((int)0) || (j == __BrtInt1((int)12) || i == __BrtInt1((int)32)))

  {

    o_img = str;

    sad = __BrtDouble1((double)1.0);
  }

  else
  {

    o_img = src[i + j * __BrtInt1((int)33)];

    __estimate_macroblock_4x4_cpu_inner(src, ref, i, j, str2, o_img, sad);
  }

}
void  __motion_estimation_cpu(::brt::KernelC *__k, int __brt_idxstart, int __brt_idxend, bool __brt_isreduce)
{

  __BrtArray<__BrtUChar1  > *arg_src = (__BrtArray<__BrtUChar1  > *) __k->getVectorElement(0);

  __BrtArray<__BrtUChar1  > *arg_ref = (__BrtArray<__BrtUChar1  > *) __k->getVectorElement(1);


  __BrtUChar1 *arg_str = (__BrtUChar1 *) __k->getVectorElement(2);

  __BrtUChar1 *arg_str2 = (__BrtUChar1 *) __k->getVectorElement(3);

  ::brt::StreamInterface *arg_o_img = (::brt::StreamInterface *) __k->getVectorElement(4);

  ::brt::StreamInterface *arg_sad = (::brt::StreamInterface *) __k->getVectorElement(5);


    for(int __brt_idx=__brt_idxstart; __brt_idx<__brt_idxend; __brt_idx++) {
  if(!(__k->isValidAddress(__brt_idx))){ continue; }

    Addressable <__BrtUChar1  > __out_arg_o_img((__BrtUChar1 *) __k->FetchElem(arg_o_img, __brt_idx));

    Addressable <__BrtDouble1  > __out_arg_sad((__BrtDouble1 *) __k->FetchElem(arg_sad, __brt_idx));

    __motion_estimation_cpu_inner (
                                   *arg_src,
                                   *arg_ref,
                                   *arg_str,
                                   *arg_str2,
                                   __out_arg_o_img,
                                   __out_arg_sad);

    *reinterpret_cast<__BrtUChar1 *>(__out_arg_o_img.address) = __out_arg_o_img.castToArg(*reinterpret_cast<__BrtUChar1 *>(__out_arg_o_img.address));

    *reinterpret_cast<__BrtDouble1 *>(__out_arg_sad.address) = __out_arg_sad.castToArg(*reinterpret_cast<__BrtDouble1 *>(__out_arg_sad.address));
  }
}


void __motion_estimation::operator()(const ::brook::Stream< uchar >& src,  const ::brook::Stream< uchar >& ref,
        const uchar  str,
        const uchar  str2,
        const ::brook::Stream<  uchar >& o_img,
        const ::brook::Stream<  double >& sad)
{

  static const void *__motion_estimation_fp[] = {

     "cal", __motion_estimation_cal,
     "cpu", (void *) __motion_estimation_cpu,
     NULL, NULL };

  ::brook::Kernel  __k(__motion_estimation_fp, brook::KERNEL_MAP);
  ::brook::ArgumentInfo __argumentInfo;

  __k.PushGatherStream(src);

  __k.PushGatherStream(ref);


  brook::Constant<uchar > constant_2(str);
  __k.PushConstant(constant_2);

  brook::Constant<uchar > constant_3(str2);
  __k.PushConstant(constant_3);
  __k.PushOutput(o_img);

  __k.PushOutput(sad);

  __argumentInfo.startExecDomain = _domainOffset;
  __argumentInfo.domainDimension = _domainSize;


  __k.run(&__argumentInfo);
  DESTROYPARAM();

}

__THREAD__ __motion_estimation motion_estimation;


0 Likes

I mean the code where you declare the streams, call the kernel, and call the various operators on the streams.

0 Likes

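#include <cstdio>    // sprintf, fprintf
#include <cstring>   // memset
#include <iostream>
#include <windows.h> // OutputDebugString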

#include "brook/Stream.h"
#include "brook/KernelInterface.h"
#include "brookgenfiles/kernel.h"


void print(unsigned char* arr, int width, int height)
{
    for (int i = 0; i < height; i++)
    {
        for(int j = 0; j < width; j++)
        {
            char cStr[256];
            sprintf(cStr, "% 3c ", arr[j + i * width]);
            OutputDebugString(cStr);
        }

        OutputDebugString("\n");
    }
    OutputDebugString("\n\n");
}

void printd(double* arr, int width, int height)
{
    for (int i = 0; i < height; i++)
    {
        for(int j = 0; j < width; j++)
        {
            char cStr[256];
            sprintf(cStr, "% 3f ", arr[j + i * width]);
            OutputDebugString(cStr);
        }

        OutputDebugString("\n");
    }
    OutputDebugString("\n\n");
}

int
main(int argc, char* argv[])
{
// Specifying the width and height of the 2D buffer
    const unsigned int width = 33;
    const unsigned int height = 13;

    //--------------------------------------------------------------------------
    // Creating and initializing the input buffer
    //--------------------------------------------------------------------------

    // Creating an input buffer
    unsigned char* src = new unsigned char[width * height];
    unsigned char* ref = new unsigned char[width * height];
    //memset(src, 7, width * height);

    for (int i = 0; i < height; i++)
    {
        for(int j = 0; j < width; j++)
        {
            if (j % 4 == 0 && i % 4 == 0)
            {
                src[j + i * width] = '*';
                ref[j + i * width] = '/';
            }
            else
            {
                src[j + i * width] = '1';
                ref[j + i * width] = '3';
            }
        }
    }

    print(src, width, height);
   
    print(ref, width, height);

    // specifying the size of the 2D stream
    unsigned int streamSize[] = {width, height};

    // specifying the rank of the stream
    unsigned int rank = 1;

    brook::Stream<unsigned char> srcStream(rank, streamSize);
    brook::Stream<unsigned char> refStream(rank, streamSize);

    // copying data from input buffer to input stream
    srcStream.read(src);
    refStream.read(ref);

    // creating the output stream
    streamSize[0] = width;
    streamSize[1] = height;
    rank = 2;

    brook::Stream<unsigned char> outputStream(rank, streamSize);

    // creating the output stream
    streamSize[0] = width;
    streamSize[1] = height;
    rank = 2;

    brook::Stream<double> sad(rank, streamSize);

    //--------------------------------------------------------------------------
    // Executing kernel and copying back data
    //--------------------------------------------------------------------------   
    unsigned char str = '9';
    unsigned char str2 = '+';

    double ddd = src[0] - ref[0];
    double sadd = 0;

    for (int y = 0; y < 4; y++)
    {
        for (int x = 0; x < 4; x++)
        {
            int index = 0 + x + ((0 + y) * 33);
            sadd += (double)(src[index] - ref[index]);
        }
   
    }

    // Calling the kernel on the input and output streams
    motion_estimation(srcStream, refStream, str, str2, outputStream, sad);

    // Creating an output buffer
    unsigned char* out = new unsigned char[width * height];
    memset(out, 0, width * height);

    double *das = new double[width * height];
    memset(das, 0, width * height * sizeof(double));

    // Copying data from output stream to output buffer
    outputStream.write(out);
    sad.write(das);

    print(out, width, height);

    printd(das, width, height);

    // Check error on stream
    if(outputStream.error())
    {
        // Print error Log associated to stream
        fprintf(stdout, "%s\n", outputStream.errorLog());
    }

    fprintf(stdout, "Output buffer:\n");
//    printBuffer(outputBuffer, width, 0, 0, 8, 8);

    //--------------------------------------------------------------------------
    // Checking whether the result is correct or not
    //--------------------------------------------------------------------------
   

    //--------------------------------------------------------------------------
    // Cleaning up
    //--------------------------------------------------------------------------
   
    delete[] src;
    delete[] ref;
    delete[] out;
    delete[] das;
   
    return 0;
}


kernel int estimate_macroblock_4x4(unsigned char mbs[],
                                   unsigned char mbr[],
                                   int i, int j,
                                   unsigned char str,
                                   out unsigned char o_img<>,
                                   out double sad<>)
{
    //o_img = str;
    int x, y;
   
    //sad = (double) (i + 0 + ((j + 0) * 33)) ;
   
    for (y = 0; y < 4; y++)
    {
        for (x = 0; x < 4; x++)
        {
            int index = i + x + ((j + y) * 33);
            sad += (double)(mbs[index] - mbr[index]);
        }
   
    }
    //o_img = str;
   
    return 0;
}

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              unsigned char str,
                              unsigned char str2,
                              out unsigned char o_img<>,
                              out double sad<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height
   
    if ((j % 4 > 0 || i % 4 > 0) || (j == 12 || i == 32))
    {
        o_img = str;
        sad = 1.0;
    }
    else
    {
        o_img = src[i + j * 33];
        estimate_macroblock_4x4(src, ref, i, j, str2, o_img, sad);
    }
}


0 Likes

One thing that is definitely wrong with your test case is out-of-range indexing of the 1D input streams. Your input stream is 1D with size = width, not width * height:

// specifying the size of the 2D stream
    unsigned int streamSize[] = {width, height};

// specifying the rank of the stream
unsigned int rank = 1;

brook::Stream<unsigned char> srcStream(rank, streamSize);
brook::Stream<unsigned char> refStream(rank, streamSize);



 

I think you want to do the following:

 

// specifying the size of the 1D stream (it must hold all width * height elements)
    unsigned int streamSize[] = {width * height};

// specifying the rank of the stream
unsigned int rank = 1;

brook::Stream<unsigned char> srcStream(rank, streamSize);
brook::Stream<unsigned char> refStream(rank, streamSize);
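
The out-of-range access can be seen from the index arithmetic alone. The sketch below is plain C++ and independent of Brook+; flatIndex and inRange are hypothetical helper names used only for illustration.

```cpp
#include <cstddef>

// Row-major offset of the element at column x, row y
// in a buffer that is 'width' elements wide.
// (flatIndex is a hypothetical helper name.)
inline std::size_t flatIndex(std::size_t x, std::size_t y, std::size_t width)
{
    return x + y * width;
}

// True when 'index' is a valid offset into a 1D stream
// that holds 'streamLength' elements.
// (inRange is a hypothetical helper name.)
inline bool inRange(std::size_t index, std::size_t streamLength)
{
    return index < streamLength;
}
```

With width = 33 and height = 13, the first element of the second row has offset flatIndex(0, 1, 33) = 33, which is already outside a stream of length 33 but inside one of length 33 * 13 = 429.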



0 Likes

Yes, you are right. It helps. Thank you.

0 Likes

Hello again! I would like to continue this topic with another question.

The line if (testsad <= sad[ i]) produces the following error:

error C2676: binary '<=' : '__BrtDouble1' does not define this operator or a conversion to a type acceptable to the predefined operator

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad[][])


What is the problem here? Can I not read elements from an output array?

0 Likes

It seems sad is a 2D scatter stream; shouldn't you index it with 2D indices?

0 Likes

No, of course I use [] []; it is a forum problem. The second pair of brackets was removed for some unknown reason. Putting a space after '[' helps.

OK. Any other ideas?

0 Likes

Could you post the data type of testsad? These are template errors from the CPU runtime, and they do not show up with all versions of gcc.

I would suggest disabling CPU backend code generation to work around these issues. You can compile the .br file with the -p cal option to disable CPU codegen.
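
As background on the error itself: C2676 is what MSVC reports when an operator is applied to a class type that neither overloads it nor converts to a built-in type. The sketch below reproduces that situation with a stand-in type. WrappedDouble is purely hypothetical; the real __BrtDouble1 is defined by the Brook+ CPU runtime and is not reproduced here.

```cpp
// Hypothetical stand-in for a runtime wrapper type such as __BrtDouble1.
// It has no conversion to double (the constructor is explicit and there is
// no conversion operator), so built-in comparisons do not apply to it.
struct WrappedDouble {
    double value;
    explicit WrappedDouble(double v) : value(v) {}
};

// Without an overload like this one, `a <= b` on two WrappedDouble
// values does not compile.
inline bool operator<=(const WrappedDouble& a, const WrappedDouble& b)
{
    return a.value <= b.value;
}

// Alternative that needs no overload: compare the underlying doubles.
// (lessOrEqualRaw is a hypothetical helper name.)
inline bool lessOrEqualRaw(const WrappedDouble& a, const WrappedDouble& b)
{
    return a.value <= b.value;
}
```

Since the failing code here is generated by brcc rather than written by hand, skipping CPU code generation as suggested above avoids the problem without touching the generated file.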

0 Likes