OpenCL

Journeyman III

Kernel function problem

the type is double.

I use MSVC. I will try your advice.

Journeyman III

Do you mean like this?

mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc_d.exe" -p cal -o "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"

It helps. Thanks.

Can I install the latest ATI drivers? That is, has the problem described at the beginning of the thread (memory garbage) been fixed?

Thanks

Journeyman III

An additional remark regarding the Brook compiler.

The expression

double sad = (double)(abs(((src[idx]) - (ref[idx]))))

where src and ref are unsigned char[], causes overflow.

The correct variant is

double sad = (double)(abs((((int)src[idx]) - ((int)ref[idx]))))

But I believe the compiler should convert this to an integer operation automatically.
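
For anyone who wants to see the effect outside Brook+, here is a rough sketch in plain C. It only emulates the reported behaviour: standard C promotes unsigned char operands to int before subtracting, so the explicit cast back to unsigned char is what reproduces the wrap-around. The function names are made up for illustration, and an 8-bit unsigned char is assumed.

```c
#include <stdlib.h> /* abs */

/* Hypothetical helper: emulates a subtraction carried out in 8-bit
   unsigned arithmetic. The difference wraps modulo 256 before abs()
   ever sees it, so abs() has nothing left to fix. */
int abs_diff_wrapped(unsigned char src, unsigned char ref)
{
    unsigned char d = (unsigned char)(src - ref); /* wraps when src < ref */
    return abs((int)d);
}

/* Hypothetical helper: widens both operands to int first, as in the
   corrected expression, so the difference can be negative and abs()
   yields the true distance. */
int abs_diff_widened(unsigned char src, unsigned char ref)
{
    return abs((int)src - (int)ref);
}
```

With src = 10 and ref = 200, the wrapped version gives 66 (that is, 256 - 190), while the widened version gives 190.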

Journeyman III

An additional question.

Can I pass more than one output buffer?

As I understand it, the output buffer defines the domain of execution, so a kernel can use only one output stream. Is that right?

I need an additional array of the same size as the output stream, like this:

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,

                              int mv[][], // additional buffer
                              out double sad[][])

Adept I

You can use multiple regular output streams, but multiple scatter streams are not supported.

Journeyman III

Could you give me an example, please? Does it affect performance?

Adept I

kernel void multiple_output(out float o0<>, out float4 o1<>) // valid; I would expect good performance, since it increases the compute intensity of the kernel compared with calling two kernels that each have a single output stream

kernel void multiple_scatter(out float o0[], out float4 o1[]) // not supported

kernel void mix_output(out float o0[], out float4 o1<>) // supported, but the computation is done in multiple passes, so performance is similar to calling two kernels with a single output stream each

Journeyman III

OK, thanks.

Are these two pieces of code equivalent?

kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad[][])
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height

    if (i % 16 == 0 && j % 16 == 0)
        sad[ i] = 1.0;

}


and


kernel void motion_estimation(unsigned char src[],
                              unsigned char ref[],
                              int width,
                              int height,
                              out double sad<>)
{
    // Output position
    int2 vPos = instance().xy;
   
    int i = vPos.x; // width
    int j = vPos.y; // height

    if (i % 16 == 0 && j % 16 == 0)
        sad = 1.0;

}


Journeyman III

My next question.

The -p cal switch avoids the template compile errors, but unfortunately it makes the program hard to debug: the values returned in the out stream are corrupted when I compile with -p cal. As soon as I remove -p cal and rebuild the project, that problem goes away, but the template errors come back.

What would you advise?

Adept I

Yes, the two kernels above are equivalent, and the second would perform much better. Scatter streams are meant for random writes; if you always write to the instance() position, it is better to use a regular output stream.
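
To make the equivalence concrete, here is a rough CPU-side sketch in plain C, not Brook+. It models the domain of execution as a double loop over a small W x H grid. One body computes its own output address (scatter style), the other is handed a pointer to its own element (regular-output style). All names and the 32 x 32 size are made up for illustration. Elements the bodies never write are zeroed here, which is an assumption of the sketch and says nothing about what Brook+ leaves in unwritten elements.

```c
#include <stddef.h>

#define W 32 /* illustrative domain width  */
#define H 32 /* illustrative domain height */

/* Scatter style: the body computes the output address itself. */
static void body_scatter(int i, int j, double *sad, int width)
{
    if (i % 16 == 0 && j % 16 == 0)
        sad[(size_t)j * width + i] = 1.0;
}

/* Regular-output style: the caller passes the element that belongs
   to this (i, j) position, so no address arithmetic is needed. */
static void body_stream(int i, int j, double *out)
{
    if (i % 16 == 0 && j % 16 == 0)
        *out = 1.0;
}

/* Runs both variants over the whole domain and returns 1 when the
   two result arrays are identical, 0 otherwise. */
int outputs_agree(void)
{
    static double a[W * H], b[W * H];

    for (int j = 0; j < H; ++j) {
        for (int i = 0; i < W; ++i) {
            size_t k = (size_t)j * W + i;
            a[k] = 0.0; /* sketch assumption: unwritten elements are 0 */
            b[k] = 0.0;
            body_scatter(i, j, a, W);
            body_stream(i, j, &b[k]);
        }
    }
    for (size_t k = 0; k < (size_t)(W * H); ++k)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```

Because the scatter body only ever writes to its own (i, j) position, the explicit indexing buys nothing here, which is the case where a regular output stream is the better fit.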

-p cal disables CPU backend code generation, so as long as you are not running your code in CPU emulation mode, everything should be fine. Make sure the environment variable BRT_RUNTIME is not set to cpu.
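
If it helps, a host program could check that variable at startup with something like the sketch below. It uses only the standard getenv and strcmp calls. The helper names are hypothetical, and the exact, case-sensitive comparison against "cpu" is an assumption based on the spelling used in this thread.

```c
#include <stdlib.h> /* getenv */
#include <string.h> /* strcmp */

/* Hypothetical helper: returns 1 when the given value selects the CPU
   backend. Assumes the runtime expects the exact lowercase string. */
int is_cpu_runtime_value(const char *value)
{
    return value != NULL && strcmp(value, "cpu") == 0;
}

/* Hypothetical helper: returns 1 when BRT_RUNTIME is set to "cpu" in
   the current environment, 0 when it is unset or set to anything else. */
int brt_runtime_is_cpu(void)
{
    return is_cpu_runtime_value(getenv("BRT_RUNTIME"));
}
```

A program built with -p cal could call brt_runtime_is_cpu() before launching any kernel and print a warning if it returns 1.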
