OpenCL

melirius · ‎10-10-2019

I found a bug in the OpenCL compiler in the latest drivers. It is present in at least Adrenalin 19.5.2 and 19.8.1. A minimal reproducing example is included in the attached file. It just multiplies several complex numbers in a loop. If I write the product to the results array inside the loop on the last iteration (the commented-out part), I get the right answer, whereas if I write the result immediately after the loop, I get strange numbers like 1E53.

I use OpenCL 1.2 mode on my Tahiti card under Win10 x64.

The bug was not present in Adrenalin 19.4.3. As far as I can see from the binary files, the problem arises from the instruction (in CLRX notation)

v_mov_b32 v4, v5

which is added immediately before the store instruction

tbuffer_store_format_xy v[4:5], v0, s[8:11], 0 offen format:[32_32,float]

In the 19.4.3 binary this v_mov_b32 instruction is absent.

dipak · ‎10-11-2019

Hi melirius,

Thank you for reporting the above issue. I will report it to the compiler team. Once I have an update, I'll get back to you.

Thanks.

P.S. You have been whitelisted.

dipak · ‎10-11-2019

Looks like the attached kernel file is incomplete. The compilation failed because of some undefined identifiers like "REAL4". Please check and attach the correct kernel file.

By the way, do you observe the same issue if you disable the optimization (i.e. set the optimization flag "-O0")?

Thanks.

melirius · ‎10-11-2019

Just change REAL4 to double4. The following code should suffice, provided the problem is not caused by the unused kernel parameters. I am checking that now, both cases with the -O0 switch.

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

//2 component vector to hold the real and imaginary parts of a complex number:
typedef double2 cREAL;

/* Multiply two complex numbers */
inline cREAL cmult(cREAL a, cREAL b) {
return (cREAL)(fma(a.x, b.x, -a.y * b.y), fma(a.x, b.y, a.y * b.x));
}

__kernel void p3loop(__global double * const results) {

   // global index
   const uint ind_gl = (get_global_id(0) * get_global_size(1) + get_global_id(1)) * get_global_size(2) + get_global_id(2);

   double4 ptemp;

   cREAL alpha[6];

   alpha[0] = (cREAL)(0, -0.2);
   alpha[1] = (cREAL)(0, 0.05);
   alpha[2] = (cREAL)(0.6, 0.0005);
   alpha[3] = (cREAL)(-0.1, 0.75);
   alpha[4] = (cREAL)(0.9, -.03);
   alpha[5] = (cREAL)(0.2, 0.1);

   ptemp.lo = (cREAL)(1, 0);
   for (volatile int j = 0; j < 6; ++j) {
      ptemp.hi = (cREAL)(1, 0) - alpha[j];
      ptemp.hi = cmult(ptemp.hi, ptemp.hi);
      ptemp.lo = cmult(ptemp.hi, ptemp.lo);

      /*
      if (j == 5) {
         results[ind_gl] = ptemp.lo.y;
         return;
      }
      */
   }
   results[ind_gl] = ptemp.lo.y;
   return;

} // end of kernel

melirius · ‎10-11-2019

OK, on Adrenalin 19.4.3 all of -O{0-5} give the right results for the code in the previous post. Let me check the latest recommended driver, 19.9.2.

melirius · ‎10-11-2019

I confirm the bug in the 19.9.2 OpenCL compiler with -O{2-5}. The bug is absent at -O0 and -O1.

dipak · ‎10-11-2019

Thank you for sharing the above observations. It looks like an issue related to compiler optimization. I've reported it to the team concerned. Once I have an update, I'll get back to you.

Thanks.

dipak · ‎10-15-2019

Update:

A ticket has been created against this issue and it has been assigned to the appropriate team for detailed investigation.

dipak · ‎10-18-2019

I have been informed that the above compiler issue has been fixed in the internal build. The fix is expected to be released soon.

Thanks.

melirius · ‎10-24-2019

I have found another bug in compiler optimization, but its minimal reproducing example is still quite large. I'll try to reduce it as much as possible and then post it here as a separate thread.

Bug in OpenCL compiler