I have a working OpenCL kernel that calculates the SHA-256 hash of a VERY long string, but a single launch takes too much time. So I split the work across several launches of the same kernel and keep the intermediate result in a global buffer. On each call, if the hash is not yet complete, the kernel loads the previous intermediate context from this buffer, hashes the next part of the input, and saves the new intermediate context back to the buffer.
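For reference, here is a minimal CPU-side sketch of the resumable scheme described above. It is written in plain C rather than OpenCL C so it can be checked on the host. It assumes the message is padded once up front and that each "launch" hashes at most a fixed number of 64-byte blocks. `sha256_ctx`, `sha256_step` and the `max_blocks` parameter are hypothetical names, not taken from the attached sample. The useful property is that the split result must equal the single-pass digest, which gives a known-good value to compare the GPU buffer against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical intermediate context that would live in the global buffer
   between launches: the eight hash words plus how many blocks are done. */
typedef struct {
    uint32_t h[8];
    uint64_t blocks_done;
} sha256_ctx;

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Standard SHA-256 compression of one 64-byte block into h. */
static void sha256_block(uint32_t h[8], const uint8_t *p) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 | (uint32_t)p[4*i+2] << 8 | p[4*i+3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ROTR(w[i-15], 7) ^ ROTR(w[i-15], 18) ^ (w[i-15] >> 3);
        uint32_t s1 = ROTR(w[i-2], 17) ^ ROTR(w[i-2], 19) ^ (w[i-2] >> 10);
        w[i] = w[i-16] + s0 + w[i-7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = hh + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

static void sha256_init(sha256_ctx *c) {
    static const uint32_t iv[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
    memcpy(c->h, iv, sizeof iv);
    c->blocks_done = 0;
}

/* Pad once up front: 0x80, zeros, then the bit length as 64-bit big-endian.
   Returns the padded length (a multiple of 64). */
static size_t sha256_pad(const uint8_t *msg, size_t len, uint8_t *out, size_t cap) {
    size_t padded = (len + 9 + 63) / 64 * 64;
    assert(padded <= cap);
    memcpy(out, msg, len);
    memset(out + len, 0, padded - len);
    out[len] = 0x80;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        out[padded - 1 - i] = (uint8_t)(bits >> (8 * i));
    return padded;
}

/* One "launch": load the saved context, hash at most max_blocks more blocks,
   store the context back. Returns 1 once every block has been consumed. */
static int sha256_step(sha256_ctx *saved, const uint8_t *padded,
                       uint64_t total_blocks, uint64_t max_blocks) {
    sha256_ctx c = *saved;                      /* load intermediate result */
    uint64_t end = c.blocks_done + max_blocks;
    if (end > total_blocks) end = total_blocks;
    for (; c.blocks_done < end; c.blocks_done++)
        sha256_block(c.h, padded + 64 * c.blocks_done);
    *saved = c;                                 /* save it back */
    return c.blocks_done == total_blocks;
}
```

Running the same padded input once with an unlimited block budget and once in small steps should leave identical `h[]` words; if the GPU buffer differs from this (or stays zero), the problem is in how the context is stored or loaded between launches rather than in the hash itself.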
Unfortunately, the new split kernel does not work: the global buffer that should hold the intermediate result always contains zeros. However:
- It fails only on the GCN architecture (I've tested on Capeverde). I've tried Catalyst versions from 12.3 up to 12.11. It works fine on VLIW5 and on NVIDIA GPUs.
- If I print the intermediate buffer with printf(), the code works correctly.
- If I comment out some lines in the code, the buffer is also not zero.
A minimal sample is attached. If there is a proper way to report such a bug, please let me know; I couldn't find one. Thanks.