I am currently coding part of MD5 algorithm inside a kernel code and I encountered a strange issue and even if I found some workaround for that, I would like to understand the rootcause...
I tried to isolate the problem in a simpler kernel code, but the issue "obviously" is not occuring with less code before ...
I tried also to compare assembler code (using CodeXL Kernel analyser code generator) but as I have not the Tahiti assembler instructions documented, I cannot really make any conclusion ...
So here is the full kernel code:
__kernel void digest_F_compose(__constant uint16 *X, __global uint4 *r1234)
private uint a,b,c,d;
private uint FF;
private uint16 local_X;
private uint4 local_r1234 = r1234;
local_r1234 = (uint4)(0x67452301,0xefcdab89,0x98badcfe,0x10325476);
FF = (b & c) | (~b & d);
FF = (a & b) | (~a & c);
FF = (d & a) | (~d & b);
FF = (c & d) | (~c & a);
After last RED line (FF last computing), here are the values I get for a,FF,c,d private variables:
(note that my initial FF value expectation is 0x5B02192C ....)
At the end when the error occurs, the last FF result is NOT equal to expected result from formula (c & d) | (~c & a)
I suspect the compiler is not working fine and ORing 2 incorrect registers as curiously (a | c) is = to my wrong value of FF here.....!!!!
I tried as a workaround to replace the last FF computation by the equivalent formula : FF = ~(~((c) & (d)) & ~((~c) & (a))); and surprisingly it gave me the correct expected result of initial FF formula !!!
So it tells me I could have discovered an OpenCL compiler issue because at the end GPU is able itself to compute the final values when I change expression of FF ......
For information, my GPU is Tahiti (Radeon HD7950) and my SDK is AMD APP 2.8 release. Catalyst driver version is 13.4
Please advice ....
Thanks a lot nou !
It also double confirm a test I did today using a RV710 GPU (HD4xxx)+open CL 1.0 compiler + Catalyst 12.10 => my original code worked on this GPU... so indeed new 13.4 is buggy with HD79xx ......
I had occasion to test the bitselect and it worked really fine !