Hi, I ran some tests on an HD 7970 card recently, simply running the MD5 hash function in OpenCL. I found that the result is wrong (the same code runs fine on an NVIDIA card).
It becomes correct if I build with the -cl-opt-disable option. Has anyone had a similar experience? Is this a bug in the compiler? I am using OpenCL 1.2.
A simplified version of the kernel code is like this:
__kernel void crack(__global unsigned char *correctPass, __global unsigned char *cudaCharSet, int wordLength, int charSetLen, unsigned int v1, unsigned int v2, unsigned int v3, unsigned int v4, __constant unsigned char *cudaBrute, __global unsigned int *result)
{
    const unsigned int a0 = 0x67452301;
    const unsigned int b0 = 0xEFCDAB89;
    const unsigned int c0 = 0x98BADCFE;
    const unsigned int d0 = 0x10325476;
    int length = 3;
    unsigned int a = 0;
    unsigned int b = 0;
    unsigned int c = 0;
    unsigned int d = 0;
    unsigned int vals = 0;
    vals |= ((unsigned int)v1) << ((0 % 4) * 8); //(97+((int)(get_group_id(0))%26)) << ((0 % 4) * 8);
    vals |= (97 + 1 % 26) << ((1 % 4) * 8);
    vals |= (97 + 2) << ((2 % 4) * 8);
    vals |= 0x80 << ((3 % 4) * 8);
    //Initialize hash value for this chunk:
    a = a0;
    b = b0;
    c = c0;
    d = d0;
    #define S11 7
    #define S12 12
    #define S13 17
    #define S14 22
    //Standard MD5 round macros:
    #define F(x, y, z) (((x) & (y)) | ((~(x)) & (z)))
    #define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
    #define FF(a, b, c, d, x, s, ac) { (a) += F((b), (c), (d)) + (x) + (unsigned int)(ac); (a) = ROTATE_LEFT((a), (s)); (a) += (b); }
    FF ( a, b, c, d, vals, S11, 3614090360u); /* 1 */
    FF ( d, a, b, c, 0u, S12, 3905402710u); /* 2 */
    FF ( c, d, a, b, 0u, S13, 606105819u); /* 3 */
    unsigned int inter = ~c;
    //FF ( b, c, d, a, 0u, S14, 3250441966u); /* 4 */
    unsigned int test2 = ~c & a & 0x80000000;
    result[0] = (d & c) | result[0];
}
The value of result is wrong. If I use a constant value instead of v1 in the first assignment to vals, the result is correct.
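One way to narrow a miscompare like this down is to replay the same rounds on the host and diff them against what the kernel writes back. Below is a minimal plain-C sketch (the function name is mine, not from the original kernel) of the three FF rounds above, assuming v1 = 'a' (0x61) so that vals packs to 0x80636261:

```c
#include <stdint.h>

/* Standard MD5 round primitives, matching the kernel. */
#define F(x, y, z) (((x) & (y)) | ((~(x)) & (z)))
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define FF(a, b, c, d, x, s, ac) do { \
        (a) += F((b), (c), (d)) + (x) + (uint32_t)(ac); \
        (a) = ROTATE_LEFT((a), (s)); \
        (a) += (b); \
    } while (0)

/* Replay the kernel's first three FF rounds on the host.
   out[0..3] receive a, b, c, d after round 3. */
void md5_three_rounds(uint32_t vals, uint32_t out[4])
{
    uint32_t a = 0x67452301u, b = 0xEFCDAB89u, c = 0x98BADCFEu, d = 0x10325476u;
    FF(a, b, c, d, vals, 7,  3614090360u); /* round 1 */
    FF(d, a, b, c, 0u,   12, 3905402710u); /* round 2 */
    FF(c, d, a, b, 0u,   17, 606105819u);  /* round 3 */
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}
```

Calling md5_three_rounds with the same vals the kernel built gives the expected a, c and d (b is untouched by these three rounds) to compare the GPU output against.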
Could you give the host code along with the kernel that reproduces the issue? Also, which OS and driver are you using?
I asked someone who builds high-performance computing systems, and he found that this is a bug in the OpenCL compiler. It is triggered by code like
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
He told me to change it to
#define F(x, y, z) bitselect(z, y, x)
and it works.
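For anyone wondering why that substitution is legal: OpenCL's bitselect(a, b, c) takes each result bit from a where the corresponding bit of c is 0 and from b where it is 1, i.e. (~c & a) | (c & b), so bitselect(z, y, x) is bit-for-bit the same function as F(x, y, z). A quick host-side check in plain C (bitselect modeled by hand; the helper names are mine):

```c
#include <stdint.h>

/* The original MD5 selector macro from this thread. */
#define F(x, y, z) (((x) & (y)) | ((~(x)) & (z)))

/* Host-side model of OpenCL's bitselect(a, b, c):
   each result bit comes from a where c's bit is 0, from b where it is 1. */
static uint32_t bitselect_model(uint32_t a, uint32_t b, uint32_t c)
{
    return (~c & a) | (c & b);
}

/* F(x, y, z) selects y-bits where x is 1 and z-bits where x is 0,
   which is exactly bitselect(z, y, x). */
int f_matches_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return F(x, y, z) == bitselect_model(z, y, x);
}
```

With the MD5 initial constants this also exercises the well-known identity F(b0, c0, d0) == c0, which holds because ~b0 equals d0.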
Thanks for letting us know. I am glad that you found the work-around.
This workaround does not apply in most cases. There are many transformation passes in a compiler, and if any of them rewrites your code into the form that triggers the bug, you are out of luck. Transformations are a major part of what compilers do, so this workaround only helps if you perform every transformation by hand and apply the fix before handing your code to the compiler. I don't think you would want to do that.

As it turns out, the code above is bitsliced code (of the kind typically used to attack cryptographic primitives such as ciphers and hash functions) that will be transformed in so many ways before hitting the bug-inducing translation in the compiler that trying to apply the fix is probably futile. I have run into this same problem in several different kernels of mine, each implementing a different primitive. It is simply unavoidable: this needs to be fixed in the compiler, and no workaround will do.
Yeah, I agree with you.
The person I asked has reported this bug to AMD, I think, but hasn't
gotten a fix yet.
I am looking into the bug report filed on this. Will keep you guys updated.
This was reported a long time ago, and reported to AMD as well.
A bug was raised, and by the end of our support period this bug was still "in progress" and had not been fixed.
AMD folks who want to know the bug ID can check with the team that maintained the forums last year.
Exactly, it was never fixed. I'm glad that after more than half a year we are back to where we started. This sounds like progress, AMD style.
Missed giving an update on this earlier. The issue has been fixed, and the fix should be available in the next driver. We will follow up with exact details later.
This bug is fixed in the upcoming release of the Catalyst driver, version 14.20.
We have further verified with our driver team: Catalyst driver version 14.20 will not have this bug.