Hello everyone,
I'm currently using Brook+ on my 3870x2 to write a texture synthesis program that takes three 64x64 images as input. I've implemented my first step, but I was a bit disappointed with the performance. When I checked GPU-Z, the GPU load meter was high (80+%) for only about a second and then stayed around 5% for the rest of the execution, which takes about 10-15 minutes!
This is my code to call my kernel (CompareCross):
for(int xy = 0; xy < 64; xy++)
{
    for(int xx = 0; xx < 64; xx++)
    {
        for(int yy = 0; yy < 64; yy++)
        {
            for(int yx = 0; yx < 64; yx++)
            {
                CompareCross(int2(xx, xy), stmExemplarX, int2(yx, yy), stmExemplarY, stmExemplarZ, stmOutput);
            }
        }
    }
}
I'm wondering why this is the case. Is it the nested loops? I figured that if my kernel were badly written, I'd still see a lot of GPU activity. If anyone has some advice or tips, they'd be greatly appreciated.
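For scale, here is a small standalone sketch (plain C++, no Brook+ involved) that counts how many times a loop nest shaped like the one above would call the kernel. The helper name countKernelLaunches is made up for this post and is not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Brook+): counts how many kernel calls
// a four-deep loop nest with `size` iterations per level would issue.
inline std::uint64_t countKernelLaunches(int size)
{
    std::uint64_t launches = 0;
    for (int xy = 0; xy < size; xy++)
    {
        for (int xx = 0; xx < size; xx++)
        {
            for (int yy = 0; yy < size; yy++)
            {
                for (int yx = 0; yx < size; yx++)
                {
                    launches++; // stands in for one CompareCross call
                }
            }
        }
    }
    return launches;
}
```

With size = 64, as in my loops, this returns 64^4 = 16,777,216, so the host issues that many separate CompareCross calls. I haven't measured the cost of a single call, so I can't say whether that overhead accounts for the low GPU load.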
Kind regards,
Rob