
fandango
Journeyman III

Kernel function problem

Hi,

I have run into a small problem. My kernel function does not work correctly and returns only partly correct results. The function is very simple, and I believe the mistake is not mine. Could you please look at my example? What's wrong?

kernel void func1(unsigned char src[][], unsigned char str, out unsigned char o_img<>)
{
    // Output position
    int j = instance().x; // width
    int i = instance().y; // height
   
    int rest = j % 16;
   
    if (rest == 0)
    {
        o_img = src[i][j];
    }
    else
    {
        o_img = str;
    }
}

 

Input:

3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1   1   1   1   1   1   1   1

I just replace each 1 with 9.

Wrong output:

  3      9   9   9   9   9   9   9   9   9   9   9   9   9   9   3      9   9   9   9   9   9   9   9   9   9   9   9   9   9   3      9   9   9   9   9   9   9

You can see the garbage in memory. If I remove the if statement from the kernel, the output has no garbage. But I need the conditional version.
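For comparison, here is a plain C sketch of what func1 is meant to compute. It assumes a row-major width x height image and reads the garbled `src [ i]` as `src[i][j]`. The name func1_reference is hypothetical. This is a CPU reference for diffing against the kernel output, not Brook+ code.

```c
#include <stddef.h>

/* Hypothetical CPU reference for func1, assuming a row-major
   width x height image of unsigned char. Pixels in columns whose index is a
   multiple of 16 are copied from src; all other pixels become str. */
void func1_reference(const unsigned char *src, unsigned char str,
                     unsigned char *o_img, int width, int height)
{
    for (int i = 0; i < height; i++) {        /* i: row, like instance().y */
        for (int j = 0; j < width; j++) {     /* j: column, like instance().x */
            size_t idx = (size_t)i * (size_t)width + (size_t)j;
            o_img[idx] = (j % 16 == 0) ? src[idx] : str;
        }
    }
}
```

With the input row above and str = 9, this produces 3 at columns 0, 16 and 32 and 9 everywhere else, which is the expected output the post describes.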

 

 


The type is double.

I use MSVC. I will try your advice.


Do you mean like this?

mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc_d.exe" -p cal -o "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

It helps. Thanks.

Can I install the latest ATI drivers? That is, have you fixed the problem described at the beginning of the thread (the memory garbage)?

Thanks


An additional remark regarding the Brook compiler.

The expression

double sad = (double)abs(src[idx] - ref[idx]);

where src and ref are unsigned char [], causes overflow (wrap-around).

The correct variant is:

double sad = (double)abs((int)src[idx] - (int)ref[idx]);

But I believe the compiler should automatically promote the operands to int, as C does.
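The following small C sketch illustrates the difference. In C, unsigned char operands undergo integer promotion before the subtraction, so abs() sees the true negative difference. The second function is an assumption about what the generated code effectively did: it emulates the subtraction in 8-bit unsigned arithmetic. Both function names are hypothetical.

```c
#include <stdlib.h>

/* C rules: a and b are promoted to int before the subtraction,
   so 3 - 5 is -2 and abs() returns 2. */
int abs_diff_promoted(unsigned char a, unsigned char b)
{
    return abs(a - b);
}

/* Hypothetical emulation of an 8-bit unsigned subtraction: the difference
   wraps modulo 256 before abs() sees it, so 3 - 5 becomes 254. */
int abs_diff_wrapped(unsigned char a, unsigned char b)
{
    unsigned char d = (unsigned char)(a - b);
    return abs((int)d);
}
```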


Additional question.

Can I pass more than one output buffer?

As I understand it, the output stream defines the domain of execution, so a kernel can use only one output stream. Is that right?

I need an additional array with the same size as the output stream, like this:

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,

                              int mv[][], // additional buffer
                              out double sad[][])


You can use multiple regular output streams, but multiple scatter streams are not supported.


Could you give me an example, please? Does it affect performance?


// Valid. Should perform well: I would expect it to increase the kernel's compute
// intensity compared with calling two kernels with a single output stream each.
kernel void multiple_output(out float o0<>, out float4 o1<>)

// Not supported.
kernel void multiple_scatter(out float o0[], out float4 o1[])

// Supported, but computed in multiple passes, so performance is similar to
// calling two kernels with a single output stream each.
kernel void mix_output(out float o0[], out float4 o1<>)
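The following is a rough CPU analogy only. It says nothing about how the GPU actually schedules passes. A kernel with two regular output streams behaves like one loop over the shared output domain that writes both results per element. Two single-output kernels correspond to two loops that each re-read the input. The function names and the arithmetic are hypothetical, and float stands in for float4.

```c
/* Analogy for one kernel with two regular outputs: one pass over the
   output domain, input loaded once, two results written per element. */
void fused(const float *in, float *o0, float *o1, int n)
{
    for (int k = 0; k < n; k++) {
        float v = in[k];
        o0[k] = v * 2.0f;
        o1[k] = v + 1.0f;
    }
}

/* Analogy for two single-output kernels: two passes, each re-reading in. */
void split(const float *in, float *o0, float *o1, int n)
{
    for (int k = 0; k < n; k++) o0[k] = in[k] * 2.0f;
    for (int k = 0; k < n; k++) o1[k] = in[k] + 1.0f;
}
```

Both produce the same results. The fused form does more work per element for each read of the input, which is the "compute intensity" argument above.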


Ok. Thanks.

Are these two kernels equivalent?

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad[][])
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height

   if (i % 16 == 0 && j % 16 == 0)

    sad[j][i] = 1.0;

}


and


kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height

   if (i % 16 == 0 && j % 16 == 0)

    sad = 1.0;

}



Next question.

The -p cal option helps me avoid the template compile errors, but unfortunately it makes the program hard to debug: the values returned in the output stream are corrupted when I compile with -p cal. As soon as I remove -p cal and rebuild the project, the problem disappears, but the template errors come back.

What do you advise?


Yes, both kernels above are the same, and the second one should perform much better. Scatter streams are meant for random-access writes; if you always write to the instance() position, it is better to use a regular output stream.

-p cal disables CPU backend code generation, so as long as you are not running your code in CPU emulation mode, everything should be fine. Make sure you have not set the environment variable BRT_RUNTIME=cpu.
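Here is a minimal C sketch for checking which value the running process actually inherited. It assumes BRT_RUNTIME is read from the process environment. The helper names are hypothetical, and the case-sensitive comparison with "cpu" is also an assumption. A process inherits its environment when it starts, so a change made afterwards is not seen by an already-running Visual Studio or command prompt, nor by programs launched from it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the given BRT_RUNTIME value selects the
   CPU backend. The exact, case-sensitive match on "cpu" is an assumption. */
int selects_cpu_backend(const char *value)
{
    return value != NULL && strcmp(value, "cpu") == 0;
}

/* Prints the BRT_RUNTIME value this process inherited at startup. */
void report_brt_runtime(void)
{
    const char *value = getenv("BRT_RUNTIME");
    printf("BRT_RUNTIME=%s\n", value != NULL ? value : "(not set)");
    if (selects_cpu_backend(value))
        printf("warning: the CPU backend is selected\n");
}
```

Calling report_brt_runtime() at the start of main shows the value the program really sees, rather than the one set in the system settings.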


Gaurav,

I wonder what the advantage of using the CPU emulator is. I understand that the instructions are executed on the CPU, but what else? In any case, I cannot step into the kernel and debug inside it.


The CPU backend code is for debugging only. You can debug inside the kernel if you disable line generation in the generated .cpp file (use the -nl option).
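This rests on an assumption: that "line generation" refers to #line directives in the generated .cpp file. The C snippet below shows what such a directive does. Code after it is reported as coming from another file and line, so a debugger shows that source instead of the generated file. The file name kernel.br and line 100 are hypothetical.

```c
#include <string.h>

/* From here on the compiler reports positions as "kernel.br", line 100,
   which is what a debugger then shows instead of this file. */
#line 100 "kernel.br"
int reported_line(void) { return __LINE__; }

int reported_file_is(const char *name)
{
    return strcmp(__FILE__, name) == 0;
}
```

Without such directives, the debugger stays in the generated file, where the kernel body can be stepped through directly.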


Gaurav,

BRT_RUNTIME = cal

mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc_d.exe" -p cal -o "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

The output is still broken. Why?


That is strange. Are you sure it works without the -p cal option? I mean, how did you test it with the template error? Could you post the test case?


I just used a simple test in this case:

 

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad<>,
                              out int mvx<>,
                              out int mvy<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height
   
    int ix = i * 16;
    int jy = j * 16;

   sad = 2.0;

}

 

That's it, so no templates are involved. With the -p cal option the output sad contains garbage; with -p cpu everything is OK (every value is 2.0).

 


Something is wrong. It seems your code is running under the CPU backend. Make sure you close Visual Studio or the command prompt after changing the environment variable, and then reopen it so that it reads the updated value.


You are right. I tried restarting VS, but it did not help.

Restarting Windows helped.


if ((sad >= testsad) && (mvlength > abs(y) + abs(x)))
{
         sad = testsad;
         mvlength = abs(y) + abs(x);
         mvy = y;
         mvx = x;
}

ERROR--1: In Binary expression: Mismatched operands: both must have same type and same number of components
1> Statement: sad >= testsad && mvlength > abs(y) + abs(x) in sad >= testsad && mvlength > abs(y) + abs(x)
1> Expression : sad >= testsad, Type : double
1> Expression : mvlength > abs(y) + abs(x), Type : int

 

I tried this: if (((int)sad >= (int)testsad) && (mvlength > abs(y) + abs(x))), and the condition is never true.

 

 

 

 


brcc gives a comparison expression the same type as its operands. You can try this:

if ((int)(sad >= testsad) && (mvlength > abs(y) + abs(x)))


No, I think your variant is incorrect; I have checked.

The correct one is:

if (((int)sad >= (int)testsad) && (mvlength > abs(y) + abs(x)))

I just figured it out.

 

 


But that converts sad and testsad to int before the comparison, which truncates them and can produce incorrect results.
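Here is a C sketch of the three variants discussed, for illustration only. The function names are hypothetical. In C a comparison already yields an int. According to this thread, brcc instead gives it the operands' type, which is why a cast was needed there. Truncating first can flip the result when the values have fractional parts. When both values are whole numbers within int range, as sums of integer absolute differences are, the three variants agree.

```c
#include <stdbool.h>

/* Compare the doubles directly, as the CPU code does. */
bool compare_direct(double sad, double testsad)
{
    return sad >= testsad;
}

/* Compare first, then convert the 0/1 result (the moderator's suggestion). */
bool compare_then_cast(double sad, double testsad)
{
    return (int)(sad >= testsad);
}

/* Truncate each operand to int, then compare (the poster's workaround). */
bool truncate_then_compare(double sad, double testsad)
{
    return (int)sad >= (int)testsad;
}
```

For sad = 2.5 and testsad = 2.9, the first two return false while the truncating version returns true, because both operands become 2.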


Hmm, I have spent many days trying to understand what is going on with my kernel code.

Perhaps you can help me. Do these two pieces of code implement the same logic?

This one runs on the CPU:

    int xleft = 0, xright = 16;
    int ytop = 0, ybottom = 16;

    int temp = 0;
    int mvlength = 100000000;

    for (int j = 0; j < height; j += 16)
    {
        for (int i = 0; i < width; i += 16)
        {
            // set top and bottom range
            ytop = - min(j, 16);
            ybottom = min(height - 16 - j + 1, 16);

            // set left and right range
            xleft = - min(i, 16);
            xright = min(width - 16 - i + 1, 16);

            refsad = 100000000;

            for (int y = ytop; y < ybottom; y++)
            {
                for (int x = xleft; x < xright; x++)
                {
                    int srcidx = i + (j * width);

                    int index = i + x + ((j + y) * width);

                    // calculate SAD
                    //--------------------------------
                    for (m = 0; m < 16; m++)
                    {
                        for (n = 0; n < 16; n++)
                        {
                            temp += abs((src[srcidx + n] - ref[index + n]));
                        }

                        srcidx += width;
                        index += width;
                    }
                    //-------------------------------

                    if ((refsad >= temp) && (mvlength > abs(x) + abs(y)))
                    {
                        refsad = temp;
                        mvlength = abs(x) + abs(y);
                        refmvx = x;
                        refmvy = y;

                        refmvl = x;
                    }

                    temp = 0.0;
                }
            }

            l++;
            mvlength = 100000000;
        }
    }

And this as kernel

        int ytop = - min(jy, 16);
        int ybottom = min(height - 16 - jy + 1, 16);

        // set left and right range
        int xleft = - min(ix, 16);
        int xright = min(width - 16 - ix + 1, 16);
       
        int x, y;
        int m, n;
       
        int mvlength = 100000000;
       
        sad = 100000000;

        for (y = ytop; y < ybottom; y++)
        {
            for (x = xleft; x < xright; x++)
            {
                int testsad = 0;
               
                int srcidx = ix + (jy * width);
               
                int idx = ix + x + ((jy + y) * width);
               
                for (m = 0; m < 16; m++)
                {
                    for (n = 0; n < 16; n++)
                    {               
                        testsad += (abs((((int)src[srcidx + n]) - ((int)ref[idx + n]))));   
                    }
                   
                    srcidx += width;
                    idx += width;
                }
               
                if ((sad >= testsad) && (mvlength > (abs(y) + abs(x))))
                {
                    sad = testsad;
                    mvlength = (abs(y) + abs(x));
                    mvy = y;
                    mvx = x;
                    mvl = mvlength;
                }
            }
        }
    }

 

What do you think: is it the same logic? I get different results in mvx and mvy. Perhaps you can see a mistake in the kernel code, because I expect exactly the same behaviour.

I think the problem is in the last if:

if ((sad >= testsad) && (mvlength > (abs(y) + abs(x))))

If you need additional code, let me know.
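One way to narrow this down is a plain C version of a single kernel instance, so that the mvx/mvy of one block can be compared with what the kernel writes. This is a sketch under stated assumptions: row-major 8-bit frames, 16x16 blocks, and the same search range and acceptance rule as the kernel above. search_block is a hypothetical name. The operands are cast to int explicitly, as in the kernel.

```c
#include <limits.h>
#include <stdlib.h>

#define BLK 16

static int imin(int a, int b) { return a < b ? a : b; }

/* Hypothetical helper: full search for the 16x16 block whose top-left
   corner is (ix, jy). Returns the chosen SAD and writes the vector. */
int search_block(const unsigned char *src, const unsigned char *ref,
                 int width, int height, int ix, int jy,
                 int *mvx, int *mvy)
{
    int ytop = -imin(jy, BLK);
    int ybottom = imin(height - BLK - jy + 1, BLK);
    int xleft = -imin(ix, BLK);
    int xright = imin(width - BLK - ix + 1, BLK);

    int best_sad = INT_MAX;   /* plays the role of sad = 100000000 */
    int best_len = INT_MAX;   /* plays the role of mvlength = 100000000 */
    *mvx = 0;
    *mvy = 0;

    for (int y = ytop; y < ybottom; y++) {
        for (int x = xleft; x < xright; x++) {
            int sad = 0;
            for (int m = 0; m < BLK; m++) {
                const unsigned char *s = src + (jy + m) * width + ix;
                const unsigned char *r = ref + (jy + y + m) * width + ix + x;
                for (int n = 0; n < BLK; n++)
                    sad += abs((int)s[n] - (int)r[n]);
            }
            /* Same rule as the kernel: accept only if the SAD is not worse
               AND the vector is strictly shorter (|x| + |y|). */
            int len = abs(x) + abs(y);
            if (best_sad >= sad && best_len > len) {
                best_sad = sad;
                best_len = len;
                *mvx = x;
                *mvy = y;
            }
        }
    }
    return best_sad;
}
```

With identical src and ref and any block that lies fully inside the frame, this returns SAD 0 and the vector (0, 0), which gives a simple first check for the kernel output.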


I'm sure it is a driver problem again.

I debugged in CPU mode (until now I kept forgetting about debug mode) and there are no problems.

The problems occur only in CAL mode.


Could you post the complete test case to reproduce this issue?

Can I send the code by email? I am not comfortable publishing it on the forum.


Yes, you can email it to the address mentioned in my profile. I will take a look as soon as I get some free cycles.


Done. Please let me know asap.

Thank you very much!


Gaurav,

Please confirm that you have received my email.


Yes, I have received your mail, but I couldn't find any issue with your code. Perhaps it is an issue on the driver side? Which Catalyst version are you using?

Could you try it with 9.2?


What is the output of my test? Are the results the same in CPU and CAL modes?


No, the results were different. CPU mode showed all values as 0, whereas CAL showed values of 15 (except in the first column or first row, which was 0).


OK. So you agree that the problem is at the driver level.

My Catalyst version is 2009.0428.2132.36839, driver version 8.612.0.0000.

It is the latest release for the x64 platform.

Before that I used an older version and got the same problems. Unfortunately, I don't remember its version number.


I have tried Catalyst 9.2: same problem. What do you advise?


I am still waiting for your help. It is very important to me.


What is your official position?

Do you plan to fix such bugs?
