Hi, I did some tests on an HD 7970 card recently, simply running the MD5 hash function in OpenCL on it. But I found that the result is wrong (the same code runs fine on an NVIDIA card).
It becomes correct if I build with the -cl-opt-disable option. Does anyone have similar experience? Is it a bug in the compiler? I have been using OpenCL 1.2.
A simplified version of the kernel code is like this:
// F, ROTATE_LEFT and FF are the standard RFC 1321 MD5 macros
// (omitted from the original post; presumably defined like this):
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define FF(a, b, c, d, x, s, ac) { \
    (a) += F((b), (c), (d)) + (x) + (unsigned int)(ac); \
    (a) = ROTATE_LEFT((a), (s)); \
    (a) += (b); }

__kernel void crack(__global unsigned char *correctPass,
                    __global unsigned char *cudaCharSet,
                    int wordLength, int charSetLen,
                    unsigned int v1, unsigned int v2,
                    unsigned int v3, unsigned int v4,
                    __constant unsigned char *cudaBrute,
                    __global unsigned int *result)
{
    const unsigned int a0 = 0x67452301;
    const unsigned int b0 = 0xEFCDAB89;
    const unsigned int c0 = 0x98BADCFE;
    const unsigned int d0 = 0x10325476;
    int length = 3;
    unsigned int a = 0;
    unsigned int b = 0;
    unsigned int c = 0;
    unsigned int d = 0;
    unsigned int vals = 0;

    // Pack the 3-byte candidate plus the 0x80 padding byte into one word
    vals |= ((unsigned int)v1) << ((0 % 4) * 8); //(97+((int)(get_group_id(0))%26)) << ((0 % 4) * 8);
    vals |= (97 + 1 % 26) << ((1 % 4) * 8);
    vals |= (97 + 2) << ((2 % 4) * 8);
    vals |= 0x80 << ((3 % 4) * 8);

    // Initialize hash value for this chunk:
    a = a0;
    b = b0;
    c = c0;
    d = d0;

    #define S11 7
    #define S12 12
    #define S13 17
    #define S14 22
    FF ( a, b, c, d, vals, S11, 3614090360u ); /* 1 */
    FF ( d, a, b, c, 0, S12, 3905402710u );    /* 2 */
    FF ( c, d, a, b, 0, S13, 606105819u );     /* 3 */
    unsigned int inter = ~c;
    //FF ( b, c, d, a, 0, S14, 3250441966u );  /* 4 */
    unsigned int test2 = ~c & a & 0x80000000;
    result[0] = (d & c) | result[0];
}
The value of result is wrong. If I use a constant value instead of "v1" in the first assignment to vals, the result is correct.
I asked someone who builds high-performance computing systems, and he found that this is a bug in the OpenCL compiler. It is triggered by code like
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
He told me to change it to
#define F(x, y, z) bitselect(z, y, x)
and it works.
This work-around does not apply in most cases. There are many transformation passes, and if any of these transformations modify your code to be of the form that triggers the bug, you will be out of luck. As it happens, transformations are a major part of what compilers do, and so this workaround only works if you perform all transformations by hand and apply the fix before passing your code to the compiler. I don't think you would like to do that.

As it turns out, the code above is some bitsliced code (typically used to break cryptographic primitives such as ciphers, hash functions, etc.) that will be transformed in so many ways before hitting the bug-inducing translation in the compiler that trying to apply that fix is probably futile. I have bumped into this same problem in multiple different kernels of mine, each implementing a different primitive. It's simply unavoidable. This needs to be fixed in the compiler; no workaround will do.
This was reported a long time ago to AMD as well.
A bug was raised, and by the end of our service duration this bug was "in progress" and had not been fixed.
AMD folks who want to know the bug ID can check with the team that maintained the forums last year.
Missed giving an update on this earlier. The issue has been fixed, and the fix should be available in the next driver. We will get back with exact details later.