Here is an OpenCL kernel which fails to compute some of its values correctly. I've tested this on Ubuntu 12.04 with Catalyst 13.4, but it fails on older versions, too. It reproduces on all the cards I tested. The problem is somewhere in the builtin rotate().
/* This kernel returns the following values (index: builtin rotate() = ROTATE_LEFT()):
0: 0x01234567 = 0x01234567
1: 0x02468ace = 0x02468ace
2: 0x048d159c = 0x048d159c
3: 0x091a2b38 = 0x091a2b38
4: 0x12345670 = 0x12345670
5: 0x2468ace0 = 0x2468ace0
6: 0x48d159c0 = 0x48d159c0
7: 0x91a2b380 = 0x91a2b380
8: 0x67012345 = 0x23456701 <- ERROR
9: 0x468ace02 = 0x468ace02
10: 0x8d159c04 = 0x8d159c04
11: 0x1a2b3809 = 0x1a2b3809
12: 0x34567012 = 0x34567012
13: 0x68ace024 = 0x68ace024
14: 0xd159c048 = 0xd159c048
15: 0xa2b38091 = 0xa2b38091
16: 0x45670123 = 0x45670123
17: 0x8ace0246 = 0x8ace0246
18: 0x159c048d = 0x159c048d
19: 0x2b38091a = 0x2b38091a
20: 0x56701234 = 0x56701234
21: 0xace02468 = 0xace02468
22: 0x59c048d1 = 0x59c048d1
23: 0xb38091a2 = 0xb38091a2
24: 0x23456701 = 0x67012345 <- ERROR
25: 0xce02468a = 0xce02468a
26: 0x9c048d15 = 0x9c048d15
27: 0x38091a2b = 0x38091a2b
28: 0x70123456 = 0x70123456
29: 0xe02468ac = 0xe02468ac
30: 0xc048d159 = 0xc048d159
31: 0x8091a2b3 = 0x8091a2b3
*/

uint ROTATE_LEFT (const uint a, const uint n)
{
  return (a << n) | (a >> (32 - n));
}

__kernel void oclHashcat_RipeMD160 (__global uint *results)
{
  const int gid = get_global_id (0);
  if (gid != 0) return; // this is just to avoid race conditions while writing results
  const uint v = 0x01234567;
  for (uint i = 0; i < 32; i++) results[ 0 + i] = rotate (v, i);
  for (uint i = 0; i < 32; i++) results[32 + i] = ROTATE_LEFT (v, i);
}
It's not entirely clear from atom's post, but 0x23456701 is the correct output for a left rotation by 8. Looking closely, I can see that the builtin's results for 8 and 24 are swapped. That is exactly what you would get from rotating right instead of left, since rotating left by 8 is the same as rotating right by 24 and vice versa. 16 comes out the same in either direction. Of course, we are rotating left and not right. I'm guessing some compiler optimization was added for rotations by multiples of 8 and the coder mistakenly put in the wrong direction.
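The swapped-direction reasoning can be checked on the host with a minimal sketch in plain C. This is not code from the thread and not OpenCL; the helper names rotl32, rotr32 and looks_like_wrong_direction are made up for illustration. The shift counts are masked because shifting a 32-bit value by 32 is undefined in plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit rotate left. Masking keeps every shift count in 0..31,
   so n == 0 is well defined in plain C. */
uint32_t rotl32(uint32_t a, unsigned n) {
    n &= 31u;
    return (a << n) | (a >> ((32u - n) & 31u));
}

/* Portable 32-bit rotate right, built the same way. */
uint32_t rotr32(uint32_t a, unsigned n) {
    n &= 31u;
    return (a >> n) | (a << ((32u - n) & 31u));
}

/* Checks the identity the reasoning relies on:
   rotating left by n equals rotating right by 32 - n. */
void check_rotate_identity(uint32_t v) {
    for (unsigned n = 0; n < 32; n++)
        assert(rotl32(v, n) == rotr32(v, 32u - n));
}

/* Hypothetical classifier: returns 1 when `observed` is what a right
   rotation by n would give but not what a left rotation would give. */
int looks_like_wrong_direction(uint32_t v, unsigned n, uint32_t observed) {
    return observed == rotr32(v, n) && observed != rotl32(v, n);
}
```

With v = 0x01234567, rotl32(v, 8) gives 0x23456701 and rotr32(v, 8) gives 0x67012345, which matches the two columns of the table at index 8. At 16 both directions agree, so the classifier cannot flag that entry.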
I tried to do something to keep the loop from unrolling (it might not be unrolled anyway), but Visual Studio crashes on me whenever I press Enter.
Thanks for reporting it. I will try to see if it happens at my end too.
So far, my beta testers report that they can reproduce it on Catalyst 13.1 and 12.8, too. They can reproduce it on Cayman and Cypress, but Tahiti is -not- affected. If you need a host program to reproduce it, I can send it to you.
Thanks for the details. If you could send a complete repro-case, it will be easy to take it up to the developers.
Thanks for your time.
Attached to this post is a package that contains everything: the example console host program and the OpenCL kernel. I've stripped this code down from a real application so that it contains only what's necessary to reproduce the error. The package contains precompiled 32-bit and 64-bit binaries for Linux and Windows as well as all the sources in case you want to build it yourself.
Once you start the program it prints just the results to stdout.
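For readers without the attachment, the comparison step of such a host program might look roughly like the sketch below. It is not the code from the package: the function name report_rotate_results and the constant ROTATE_COUNT are hypothetical, and the OpenCL setup that fills the buffer is left out. It assumes the layout used by the kernel, with the 32 builtin rotate() results first and the 32 ROTATE_LEFT results after them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define ROTATE_COUNT 32u /* rotation amounts 0..31, as in the kernel's loops */

/* Hypothetical helper, not taken from the attached package.
   `results` must hold 2 * ROTATE_COUNT values read back from the device:
   builtin rotate() results first, ROTATE_LEFT results second.
   Prints one line per rotation amount and returns the mismatch count. */
int report_rotate_results(const uint32_t *results, FILE *out) {
    assert(results != NULL && out != NULL);

    int errors = 0;
    for (unsigned i = 0; i < ROTATE_COUNT; i++) {
        uint32_t builtin = results[i];
        uint32_t manual = results[ROTATE_COUNT + i];
        int bad = (builtin != manual);

        fprintf(out, "%u: 0x%08" PRIx32 " = 0x%08" PRIx32 "%s\n",
                i, builtin, manual, bad ? " <- ERROR" : "");
        errors += bad;
    }
    return errors;
}
```

On an affected device, a call such as report_rotate_results(results, stdout) would be expected to return 2, for the entries at 8 and 24.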
Thanks for your time. Will check out and take it up with AMD engg.
The issue was reproduced on HD 6950 with 13.1 driver. I could also see that it does not happen on HD7770, as you had said. I will forward it to relevant Engg team. Thanks for your help.
We were able to reproduce this issue with the latest driver on Cayman Pro (HD 6950).
The issue is not seen with GCN Arch (Tahiti and Pitcairn). We are continuing to investigate the issue and will keep you updated.