I wrote an OpenCL kernel for testing:
.....
if (A==B) { copy private memory data to global memory }
......}
I am sure that A is never equal to B for any work-item, but the if statement still seriously degrades performance.
Can anyone give me a hint?
Thank you!
Is A really not equal to B for all work-items?
I am sure that A is definitely not equal to B for all work-items.
Is this "if" inside a tight loop? Could you show a little more of your kernel?
//global_size=1000
//__global uint B;
//__global uint b;
//__private uint A;
//__private uint a;
...__kernel ......
for (i=0;i<=1000000;i++)
{
//generating __private a;
//generating __private A;
if (A==B) b=a;
//but A!=B for all i=0~1000000 and global_id=0~999;
}
-----------------------------------------
Thank you so much!
Well, my guess is that when you remove the "if (A==B) b=a;" line, the compiler throws away the code that generates a and A as useless, because it no longer affects anything in global memory. In that case the slowdown you measure would be the cost of generating a and A, not the cost of the branch itself.
You can check this by looking at the ISA code in the profiler.
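If that guess is right, a fairer comparison keeps the generation of a and A alive in both versions. Here is a minimal sketch of that idea, not your actual kernel: gen_a and gen_A are hypothetical stand-ins for the elided "generating" steps, B and b are passed as __global pointers (an assumption, since the original declarations are only shown as comments), and the #ifndef shim at the top is just a convenience I added so the same source also builds as plain C on the host.

```c
#ifndef __OPENCL_VERSION__   /* host-side shim: lets this file compile as plain C */
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid = 0;  /* stand-in for the work-item id on the host */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
#endif

#define ITERATIONS 1000u     /* the thread uses 1000000; smaller here for quick runs */

/* Hypothetical stand-ins for the elided "generating a / A" work. */
unsigned int gen_a(unsigned int gid, unsigned int i)
{
    return gid * 2654435761u + i;
}

unsigned int gen_A(unsigned int gid, unsigned int i)
{
    return ((gid ^ i) * 40503u) | 1u;   /* always odd, so it never equals an even B */
}

/* Variant 1: the pattern from the thread. The conditional store is the only
   thing that makes a and A observable, so the compiler must keep computing them. */
__kernel void with_branch(__global const unsigned int *B, __global unsigned int *b)
{
    unsigned int gid = (unsigned int)get_global_id(0);
    for (unsigned int i = 0; i <= ITERATIONS; i++) {
        unsigned int a = gen_a(gid, i);
        unsigned int A = gen_A(gid, i);
        if (A == B[0]) b[0] = a;
    }
}

/* Variant 2: no branch, but a and A are folded into a value that is stored
   to global memory once, so the generation code cannot be discarded. */
__kernel void baseline_live(__global unsigned int *sink)
{
    unsigned int gid = (unsigned int)get_global_id(0);
    unsigned int acc = 0;
    for (unsigned int i = 0; i <= ITERATIONS; i++) {
        acc ^= gen_a(gid, i) ^ gen_A(gid, i);
    }
    sink[gid] = acc;
}
```

Timing with_branch against baseline_live should say more about the cost of the if than timing it against a kernel whose loop body has no visible effect at all. The ISA dump is still the way to confirm what the compiler actually kept.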