Finally I made a minimal reproducing example of a bug in the OpenCL compiler for Tahiti in the Adrenalin Win10 x64 drivers (tested on two workstations with the 19.12.2, 20.1.1 and 20.5.1 drivers, with both -O0 and -O5). The kernel is attached (it is part of my implementation of an improved iterative Gauss-Seidel WDK method for finding complex polynomial roots). As is, it gives a wrong result for poly and its parts poly1 and poly2:
l=0, poly=0.0768369+0.00147968i, prod=13+6.62408e-17i, tau=-0.999259-0.0385005i
Changing 1 to 2 in the loop upper limit miraculously gives the right result:
l=0, poly=-1.11022e-16+0i, prod=13+6.62408e-17i, tau=-0.999259-0.0385005i
Other small changes in the code switch between right and wrong results in a seemingly random way. For example, commenting out the third line of output in the printf here always gives the right results, but when this line from the original code
-= cdiv(poly, prod);
is added before the printf, the result becomes consistently wrong, even without the printf.
On my old laptop with 15.200.1065.0 drivers this bug is absent.
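For anyone who wants to cross-check the GPU output on the host, here is a minimal sketch in plain C of a single correction step, assuming WDK stands for Weierstrass-Durand-Kerner. It is not the attached kernel: the cplx struct, the ref_* helper names, and the coefficient ordering (highest degree first) are assumptions made for illustration. Only the quantities poly and prod and the idea of dividing them with a cdiv-style helper come from the post above.

```c
#include <stddef.h>

/* Complex number as a pair of doubles (hypothetical host-side type). */
typedef struct { double re, im; } cplx;

cplx ref_csub(cplx a, cplx b) { return (cplx){a.re - b.re, a.im - b.im}; }

cplx ref_cmul(cplx a, cplx b) {
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

/* Textbook complex division; no protection against overflow or b == 0. */
cplx ref_cdiv(cplx a, cplx b) {
    double d = b.re * b.re + b.im * b.im;
    return (cplx){(a.re * b.re + a.im * b.im) / d,
                  (a.im * b.re - a.re * b.im) / d};
}

/*
 * One Weierstrass-Durand-Kerner correction for root estimate z[l].
 * coef holds degree + 1 coefficients, highest degree first (assumed layout).
 * z holds degree root estimates.
 * poly_out receives p(z[l]) evaluated by Horner's scheme.
 * prod_out receives coef[0] * product over j != l of (z[l] - z[j]).
 * Returns z[l] - poly / prod.
 */
cplx ref_wdk_step(const cplx *coef, size_t degree, const cplx *z, size_t l,
                  cplx *poly_out, cplx *prod_out) {
    cplx poly = coef[0];
    for (size_t k = 1; k <= degree; ++k) {
        poly = ref_cmul(poly, z[l]);
        poly.re += coef[k].re;
        poly.im += coef[k].im;
    }

    cplx prod = coef[0];
    for (size_t j = 0; j < degree; ++j) {
        if (j != l) {
            prod = ref_cmul(prod, ref_csub(z[l], z[j]));
        }
    }

    *poly_out = poly;
    *prod_out = prod;
    return ref_csub(z[l], ref_cdiv(poly, prod));
}
```

Calling ref_wdk_step with the same coefficients and root estimates that are passed to the kernel gives host values of poly and prod to compare with the printf output. When z[l] is already a root, poly should come out at rounding-error level, as in the "right" output above.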
Thank you for reporting this issue and providing the reproducible test-case. I'll try to reproduce it locally and get back to you shortly.
As far as I know, the AMDIL compiler toolchain, which was used for GCN1 cards like Tahiti, has been discontinued, and currently there is no plan to fix any issues related to this compiler. So I doubt that the above issue will get fixed unless it is also reproducible on newer cards (GCN2 and above).
I ran the attached code on a GCN2 card but could not reproduce the issue. Please let us know if you observe the issue on any newer card.
It's a pity to learn this from the support forum rather than from the driver release notes. My calculations need double precision, so I'm locked to Tahiti. What is the latest driver version that supported OpenCL on GCN1 cards, then? I can roll back to 15.200.1065.0, but maybe later drivers will be faster.
The latest Adrenalin drivers support OpenCL on GCN1 cards, but currently there is no plan to fix any OpenCL compiler-related issues for these devices. I have also edited my earlier reply to clarify this point.
In case somebody has the same problem: it seems that the latest OpenCL implementation without this bug is 2004.6, from the 16.4.2 Crimson drivers.