The type is double.
I use MSVC. I will try your advice.
Do you mean like this?
mkdir brookgenfiles | "$(BROOKROOT)\sdk\bin\brcc_d.exe" -p cal -o "$(ProjectDir)\brookgenfiles\$(InputName)" "$(InputPath)"
It helps. Thanks.
Can I install the latest ATI drivers? I mean, did you fix the problem that was described at the beginning of the thread (memory garbage)?
Thanks
Additional remark regarding the Brook compiler.
The expression:
double sad = (double)(abs(src[idx] - ref[idx])), where src and ref are unsigned char [], causes overflow.
The correct variant is:
double sad = (double)(abs((int)src[idx] - (int)ref[idx]))
But I believe the compiler should automatically promote the operands to int for this operation.
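For comparison, standard C applies the integer promotions to both unsigned char operands before subtracting, so the cast-free expression already yields a signed int there. Whether brcc is meant to follow the same rule is not confirmed in this thread. The sketch below is plain C, not Brook+. sad_wrapped is a hypothetical helper that imitates 8-bit arithmetic without promotion, and sad_promoted mirrors the corrected expression.

```c
#include <stdlib.h>

/* Hypothetical helper: imitates a subtraction carried out in 8-bit
 * unsigned arithmetic, i.e. without promotion to int. The result
 * wraps modulo 256 whenever a < b. */
static double sad_wrapped(unsigned char a, unsigned char b)
{
    unsigned char diff = (unsigned char)(a - b);
    return (double)diff;
}

/* Mirrors the corrected expression: widen both operands to int first,
 * then take the absolute value of the signed difference. */
static double sad_promoted(unsigned char a, unsigned char b)
{
    return (double)abs((int)a - (int)b);
}
```

With a = 10 and b = 200, sad_promoted returns 190.0, while sad_wrapped returns 66.0, because -190 wraps to 66 modulo 256.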
Additional question.
Can I pass more than one output buffer?
As I understand it, the output buffer defines the domain of execution, so a kernel can use only one output stream. Is that right?
I need an additional array with the same size as the output stream, like this:
kernel void motion_estimation(unsigned char src[],
unsigned char ref[],
int width,
int height,
int mv[][], // additional buffer
out double sad[][])
You can use multiple regular output streams, but multiple scatter streams are not supported.
Could you give me an example, please? Does it affect performance?
kernel void multiple_output(out float o0<>, out float4 o1<>) // Valid. I would expect good performance, since it increases the compute intensity of the kernel compared to calling two kernels with a single output stream each.
kernel void multiple_scatter(out float o0[], out float4 o1[]) // Not supported.
kernel void mix_output(out float o0[], out float4 o1<>) // Supported, but the computation is done in multiple passes, so performance is similar to calling two kernels with a single output stream each.
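The compute-intensity argument can be illustrated with a CPU-side analogy in plain C. This is only a sketch, not Brook+ code, and it says nothing about actual GPU timings. All function names are hypothetical. fused_pass stands in for one kernel with two regular output streams, and add_one_pass plus scale_pass stand in for two kernels with one output each.

```c
#include <stddef.h>

/* One pass, two outputs: every input element is read once and
 * both results are produced from that single read. */
static void fused_pass(const float *in, float *sum_out,
                       float *scaled_out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float x = in[i];
        sum_out[i] = x + 1.0f;
        scaled_out[i] = x * 2.0f;
    }
}

/* Two separate passes: the input is traversed twice,
 * once per output. */
static void add_one_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] + 1.0f;
}

static void scale_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * 2.0f;
}
```

Both variants produce the same results. The fused version does more arithmetic per input read, which is the effect described for the kernel with two regular output streams.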
Ok. Thanks.
Are these two chunks of code equivalent?
kernel void motion_estimation(unsigned char src[],
unsigned char ref[],
int width,
int height,
out double sad[][])
{
// Output position
int2 vPos = instance().xy;
int i = vPos.x; // width
int j = vPos.y; // height
if (i % 16 == 0 && j % 16 == 0)
sad
}
and
kernel void motion_estimation(unsigned char src[],
unsigned char ref[],
int width,
int height,
out double sad<>)
{
// Output position
int2 vPos = instance().xy;
int i = vPos.x; // width
int j = vPos.y; // height
if (i % 16 == 0 && j % 16 == 0)
sad = 1.0;
}
The next question.
The -p cal switch helps to avoid the template compile errors, but unfortunately it gets in the way of debugging the program. I mean that the values returned in the out stream are corrupted when I compile the program with -p cal. As soon as I remove -p cal and rebuild the project, this problem goes away, but the template errors return.
What do you advise?
Yes, both of the above kernels are the same, and the second kernel would have much better performance. Scatter streams are used for random writes, but if you always write to the instance() position, it is better to use a regular output stream.
-p cal disables CPU backend codegen, so as long as you are not running your code in CPU emulation mode, everything should be fine. Make sure you have not set the environment variable BRT_RUNTIME=cpu.
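One way to catch this early is to check the variable from the host program before running any kernels. The following is a minimal sketch in plain C. The helper names are hypothetical, and it assumes the value is compared exactly against the string "cpu" as written above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 when the given BRT_RUNTIME value selects the CPU backend.
 * Assumption: an exact, case-sensitive match against "cpu". */
static int runtime_value_is_cpu(const char *value)
{
    return value != NULL && strcmp(value, "cpu") == 0;
}

/* Reads BRT_RUNTIME from the process environment; an unset
 * variable is reported as "not CPU". */
static int brt_runtime_is_cpu(void)
{
    return runtime_value_is_cpu(getenv("BRT_RUNTIME"));
}
```

A host program built with -p cal could call brt_runtime_is_cpu() at startup and print a warning if it returns 1.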