Hi,
I have come across a possible compiler issue that shows up in the attached kernels. The two kernels differ only in how work is mapped to the threads. Both require a work group size of 256; the difference between the kernels is:
210,211c210,211
< int divIdx = get_local_id(0) / 8;
< int modIdx = get_local_id(0) % 8;
---
> int divIdx = get_local_id(0) % 32;
> int modIdx = get_local_id(0) / 32;
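To make the two index mappings in the diff above concrete, here is a small host-side C sketch. It is not taken from the attached kernels: the names `Mapping`, `map_v1`, `map_v2` and `covers_grid_once` are made up for illustration, and it assumes a one-dimensional work group of 256, so that `get_local_id(0)` runs from 0 to 255. It only shows that both versions assign each work-item a unique (divIdx, modIdx) pair on the same 32 x 8 grid, just in a different order.

```c
#include <assert.h>
#include <stdbool.h>

#define WG_SIZE 256 /* work group size required by both kernels */

/* Hypothetical helper type: the index pair computed at lines 210-211. */
typedef struct {
    int divIdx;
    int modIdx;
} Mapping;

/* First version: divIdx = lid / 8, modIdx = lid % 8.
 * Eight consecutive work-items share the same divIdx. */
static Mapping map_v1(int lid)
{
    assert(lid >= 0 && lid < WG_SIZE);
    Mapping m = { lid / 8, lid % 8 };
    return m;
}

/* Second version: divIdx = lid % 32, modIdx = lid / 32.
 * Thirty-two consecutive work-items share the same modIdx. */
static Mapping map_v2(int lid)
{
    assert(lid >= 0 && lid < WG_SIZE);
    Mapping m = { lid % 32, lid / 32 };
    return m;
}

/* Returns true if the mapping hits every cell of the 32 x 8
 * (divIdx, modIdx) grid exactly once over the whole work group. */
static bool covers_grid_once(Mapping (*map)(int))
{
    int hits[32][8] = { { 0 } };

    for (int lid = 0; lid < WG_SIZE; ++lid) {
        Mapping m = map(lid);
        if (m.divIdx < 0 || m.divIdx >= 32 || m.modIdx < 0 || m.modIdx >= 8)
            return false;
        hits[m.divIdx][m.modIdx]++;
    }
    for (int d = 0; d < 32; ++d)
        for (int k = 0; k < 8; ++k)
            if (hits[d][k] != 1)
                return false;
    return true;
}
```

For example, work-item 9 gets (divIdx, modIdx) = (1, 1) in the first version and (9, 0) in the second. Both mappings cover the grid exactly once, so on paper the two kernels should divide the same work among the threads.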
With the first version I get correct, reproducible results for the PowerBornBBKernel from a Radeon 6550 as well as from an R9 290X. With the second version I still get correct, reproducible results from the 6550. However, on the R9 290X I get incorrect results that differ between runs of the kernel with the same input.
Name: Hawaii
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1573.4 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1573.4)

Name: BeaverCreek
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1214.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1214.3)