I found a bug in the OpenCL compiler in the latest drivers; it is present in at least Adrenalin 19.5.2 and 19.8.1. A minimal reproducing example is included in the attached file. It just multiplies several complex numbers in a loop. If I write the product to the results array inside the loop on the last iteration (the commented-out part), I get the right answer, but if I write the result immediately after the loop, I get strange numbers like 1E53.
I use OpenCL 1.2 mode on my Tahiti card in Win10 x64.
The bug was not present in Adrenalin 19.4.3. As far as I can see from the binary files, the problem arises from the instruction (in CLRX notation)
v_mov_b32 v4, v5
which is added immediately before the store instruction
tbuffer_store_format_xy v[4:5], v0, s[8:11], 0 offen format:[32_32,float].
In the 19.4.3 binary this v_mov_b32 instruction is absent.
Hi melirius,
Thank you for reporting the above issue. I will report it to the compiler team. Once I have an update on this, I'll get back to you.
Thanks.
P.S. You have been whitelisted.
Looks like the attached kernel file is incomplete. The compilation failed because of some undefined identifiers like "REAL4". Please check and attach the correct kernel file.
By the way, do you observe the same issue if you disable the optimization (i.e. set the optimization flag "-O0")?
Thanks.
Just change REAL4 to double4. The following code should suffice, unless the problem is related to the unused kernel parameters. I am checking that now, along with the -O0 switch.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
// 2-component vector to hold the real and imaginary parts of a complex number:
typedef double2 cREAL;
/* Multiply two complex numbers */
inline cREAL cmult(cREAL a, cREAL b) {
    return (cREAL)(fma(a.x, b.x, -a.y * b.y), fma(a.x, b.y, a.y * b.x));
}
__kernel void p3loop(__global double * const results) {
    const uint ind_gl = ((get_global_id(0) * get_global_size(1) + get_global_id(1)) * get_global_size(2) + get_global_id(2)); // global index
    double4 ptemp;
    cREAL alpha[6];
    alpha[0] = (cREAL)(0, -0.2);
    alpha[1] = (cREAL)(0, 0.05);
    alpha[2] = (cREAL)(0.6, 0.0005);
    alpha[3] = (cREAL)(-0.1, 0.75);
    alpha[4] = (cREAL)(0.9, -.03);
    alpha[5] = (cREAL)(0.2, 0.1);
    ptemp.lo = (cREAL)(1, 0);
    for (volatile int j = 0; j < 6; ++j) {
        ptemp.hi = (cREAL)(1, 0) - alpha[j];
        ptemp.hi = cmult(ptemp.hi, ptemp.hi);
        ptemp.lo = cmult(ptemp.hi, ptemp.lo);
        /*
        if (j == 5) {
            results[ind_gl] = ptemp.lo.y;
            return;
        }
        */
    }
    results[ind_gl] = ptemp.lo.y;
    return;
} // end of kernel
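For checking the device output, a CPU-side reference of the same loop can be sketched in plain C roughly as below. The names cplx, cmult_ref and p3loop_reference are made up for this sketch, and it assumes the loop body subtracts alpha[j]. It also uses ordinary multiply/add instead of fma(), so the last few bits may differ from a correct device result.

```c
#include <assert.h>

/* Plain-C stand-in for the kernel's cREAL (double2); hypothetical name. */
typedef struct { double x, y; } cplx;

/* Complex product. Uses ordinary multiply/add rather than fma(), so the
   last bits may differ slightly from the device result. */
static cplx cmult_ref(cplx a, cplx b) {
    cplx r;
    r.x = a.x * b.x - a.y * b.y;
    r.y = a.x * b.y + a.y * b.x;
    return r;
}

/* Imaginary part of the product of (1 - alpha[j])^2 for j < iterations.
   ASSUMPTION: the kernel's loop body subtracts alpha[j]. */
static double p3loop_reference(int iterations) {
    static const cplx alpha[6] = {
        {0.0, -0.2}, {0.0, 0.05}, {0.6, 0.0005},
        {-0.1, 0.75}, {0.9, -0.03}, {0.2, 0.1}
    };
    cplx lo = {1.0, 0.0};
    assert(iterations >= 0 && iterations <= 6);
    for (int j = 0; j < iterations; ++j) {
        cplx hi = {1.0 - alpha[j].x, 0.0 - alpha[j].y};
        hi = cmult_ref(hi, hi);   /* (1 - alpha[j])^2 */
        lo = cmult_ref(hi, lo);   /* accumulate the running product */
    }
    return lo.y;
}
```

Since the kernel's arithmetic does not depend on ind_gl, every element of results should match p3loop_reference(6) within a small relative tolerance; a value like 1E53 fails that comparison immediately.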
OK, on Adrenalin 19.4.3 all of -O{0-5} give the right results for the code in the previous post. Let me check the latest recommended driver, 19.9.2.
I confirm the bug in the 19.9.2 OpenCL compiler with -O{2-5}. The bug is absent at -O0 and -O1.
Thank you for sharing the above observations. So, it looks like an issue related to compiler optimization. I've reported it to the concerned team. Once I have an update, I'll get back to you.
Thanks.
Update:
A ticket has been created against this issue and it has been assigned to the appropriate team for detailed investigation.
I have been informed that the above compiler issue has been fixed in the internal build. The fix is expected to be released soon.
Thanks.
I have found another compiler optimization bug, but its smallest reproducing example so far is still quite large. I'll try to reduce it as much as possible and then post it here as a separate thread.