17 Replies Latest reply on Apr 3, 2013 10:47 PM by vmiura

    OpenCL bug with HD 7790 (Bonaire)

    vmiura

      Hello,

       

      I am hitting an odd bug running one of my OpenCL kernels on a new HD 7790.  This is a kernel that I've verified on an HD 7770, and also on some Fermi and Kepler cards.

       

      After a lot of narrowing down, I strongly suspect it's some kind of compiler bug.  Unfortunately it doesn't look like CodeXL will disassemble Bonaire ISA yet, so I can't confirm if it's doing something odd.  I also can't debug the kernel.

       

      Are there any known issues with register clobbering or similar?  I have 'AMD APP SDK Runtime 10.0.1124.2'.

       

      I'll try to make a standalone test, but this is the gist of the problem code:

       

      __global struct MyStruct *m = (__global struct MyStruct *)(basePtr + offset);

       

      if(m->magic != 123)
      {

           ... dump debug diagnostics to global memory  // This never happens

            return;
      }

       

      if(...)
      {

        // loads + arithmetic

         // no stores, and no touching 'm'

      }

      else
      {

         // loads + arithmetic

        // no stores, and no touching 'm'
      }

       

      if(m->magic != 123)
      {

           ... dump debug diagnostics to global memory // This always happens

           return;
      }

       

      The result is that I get the dump the second time I check m->magic, not the first.  Nothing should be modifying global memory here.  There's just the one kernel running, with clFinish before and after, and it's 100% reproducible.

       

      I dumped 'basePtr', 'offset' and 'm' and I can see m is corrupt (m != basePtr + offset).

        • Re: OpenCL bug with HD 7790 (Bonaire)
          himanshu.gautam

          hi vmiura,

          What is the type of basePtr?

          Shouldn't this statement increment basePtr based on its original type rather than the type it is cast to? That might be the issue.

          struct MyStruct *m = (__global struct MyStruct *)(basePtr + offset);

           

          Anyway, please share a test case if the issue persists.

            • Re: OpenCL bug with HD 7790 (Bonaire)
              vmiura

              Hi Himanshu,

               

              basePtr is __global uchar *.  The offset in bytes is as intended.
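To illustrate why the pointee type of basePtr matters here: pointer arithmetic scales by the size of the pointed-to type, so adding to a byte pointer advances in bytes, and casting afterwards does not change the address. The following is a minimal sketch in plain host C, not the kernel code from this thread; `struct Rec`, `at_byte_offset`, `at_elem_offset` and `byte_distance` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte record, used only to show how offsets scale. */
struct Rec { uint32_t magic; uint32_t value; };
static_assert(sizeof(struct Rec) == 8, "sketch assumes an 8-byte record");

/* Add first, cast second: base is a byte pointer, so offset counts bytes. */
static struct Rec *at_byte_offset(unsigned char *base, size_t offset)
{
    return (struct Rec *)(base + offset);
}

/* Cast first, add second: the offset now counts whole records. */
static struct Rec *at_elem_offset(unsigned char *base, size_t index)
{
    return (struct Rec *)base + index;
}

/* Distance in bytes between two addresses inside the same buffer. */
static size_t byte_distance(const void *from, const void *to)
{
    return (size_t)((const unsigned char *)to - (const unsigned char *)from);
}
```

With the same numeric offset, the first form moves by that many bytes and the second by that many records, which is the distinction the question above is probing.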

               

              Do you know if I can view the ISA disassembly for Bonaire somehow?  It would help me confirm if it's odd code or if it might be something else.

               

              Regards,

              Victor

                • Re: OpenCL bug with HD 7790 (Bonaire)
                  himanshu.gautam

                  Does your structure contain doubles? I suspect an alignment issue here, or an alignment mismatch between host and GPU code.

                  What is the data-type of "magic"?

                  Can you please publish your structure? We are more interested in the data types than the actual names. So, if it is secret, just remove the field names (except magic) and let us know.

                   

                  Also, if you are accessing the structure many times, you will be better off using a structure of field arrays instead of an array of structures with fields. The former helps make sure your memory accesses are coalesced.
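The two layouts mentioned above can be sketched in plain C as follows. This is only an illustration under assumed names (`ItemAoS`, `ItemsSoA`, `N_ITEMS` and the stride helpers are hypothetical, and the fields are a small subset chosen for brevity); it shows how far apart the same field of neighbouring items ends up in each layout.

```c
#include <assert.h>
#include <stddef.h>

#define N_ITEMS 64  /* hypothetical element count */

static_assert(sizeof(unsigned int) == 4 && sizeof(short) == 2,
              "sketch assumes 4-byte int and 2-byte short");

/* Array of structures: the fields of one item sit together, so field f of
   neighbouring items is sizeof(struct ItemAoS) bytes apart. */
struct ItemAoS { unsigned int f; unsigned int e; short a, b; };

/* Structure of arrays: field f of neighbouring items is contiguous. */
struct ItemsSoA {
    unsigned int f[N_ITEMS];
    unsigned int e[N_ITEMS];
    short a[N_ITEMS];
    short b[N_ITEMS];
};

/* Byte stride between field f of item i and item i+1 in each layout. */
static size_t stride_f_aos(void) { return sizeof(struct ItemAoS); }
static size_t stride_f_soa(void) { return sizeof(((struct ItemsSoA *)0)->f[0]); }
```

When neighbouring work-items each read field f of their own item, the smaller stride of the second layout is what lets those reads fall into adjacent addresses.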

                    • Re: OpenCL bug with HD 7790 (Bonaire)
                      vmiura

                      I'm not using doubles.

                       

                      Here is the structure (with names changed).

                       

                      struct MyStruct

                      {

                                union

                                {

                                          struct

                                          {

                                                    unsigned int magic;

                                                    short a, b;

                                                    short c, d;

                                                    unsigned int e;

                                                    unsigned int f;

                                                    unsigned int g;

                                                    int h, i, j;

                                                    short k, l, m, n;

                                                    unsigned char o;

                                                    unsigned char p;

                       

                       

                                                    unsigned char q, r, s, t;

                                                    unsigned char u, v;

                                                    unsigned char w;

                       

                       

                                                    unsigned char x;

                                                    unsigned char y;

                                                    unsigned char z, a0;

                                                    unsigned char b0,

                                                    unsigned char d0;

                                                    unsigned char e0;

                                                    unsigned char f0;

                                                    unsigned char g0[2];

                                                    unsigned char h0, i0, j0;

                                                    unsigned char k0;

                                          };

                                          unsigned int raw[17];

                                };

                      };

                       

                      Actually I added "magic" so that I could track when the struct was bad.  Originally I found out that my kernel was reading invalid data for field 'f', so then I added 'magic' to the head of the struct and checks in the kernel to make sure I was reading valid initialized data.

                       

                      I suspect the pointer, rather than the value of the struct.  I think the pointer register has gotten clobbered by earlier code.  If I remove the earlier code then it works.

                        • Re: OpenCL bug with HD 7790 (Bonaire)
                          vmiura

                          All threads in my wavefront read from the same element in this struct so coalescing works fine I think.

                           

                          What would be nice is to have a __scalar decorator added to OpenCL so that we can make full use of scalars and wavefront-constant branching in GCN.

                            • Re: OpenCL bug with HD 7790 (Bonaire)
                              himanshu.gautam

                              I think I am seeing some issues in your struct. I will get back on this....

                              Especially with its size and the alignment of uints (which require 4-byte alignment)....

                              Within the structure, the alignment is fine. But your structure size seems not to be a multiple of "sizeof(unsigned int)".

                              You may need to increase the size of the "uint array". Can you just check if your code works fine if you use "20" instead of "17"?

                                • Re: OpenCL bug with HD 7790 (Bonaire)
                                  vmiura

                                  By the way it should have been:

                                  > unsigned char b0,c0;

                                   

                                  Thanks for the idea.  I have already tried that as I suspected a sizeof difference between host and GPU, but it doesn't seem to be that.  The sizeof(MyStruct) following standard alignment rules is 68, which is 17 uints.  Also the base of the struct in __global mem is 16-byte aligned.
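The size claim can be checked on the host with compile-time assertions. The sketch below is plain C11 rather than kernel code, and it assumes a host ABI with 4-byte int and 2-byte short; it restates the struct with the b0, c0 correction applied and with same-typed neighbours grouped on one line, which does not change the layout.

```c
#include <assert.h>
#include <stddef.h>

struct MyStruct
{
    union
    {
        struct
        {
            unsigned int magic;
            short a, b;
            short c, d;
            unsigned int e, f, g;
            int h, i, j;
            short k, l, m, n;
            unsigned char o, p;
            unsigned char q, r, s, t;
            unsigned char u, v, w;
            unsigned char x, y, z, a0;
            unsigned char b0, c0;      /* c0 restored per the correction above */
            unsigned char d0, e0, f0;
            unsigned char g0[2];
            unsigned char h0, i0, j0, k0;
        };
        unsigned int raw[17];
    };
};

/* 44 bytes of int/short fields plus 24 single bytes: 68 bytes, no padding. */
static_assert(sizeof(struct MyStruct) == 68, "expected 68 bytes");
static_assert(sizeof(struct MyStruct) == 17 * sizeof(unsigned int),
              "expected exactly 17 uints");
static_assert(offsetof(struct MyStruct, magic) == 0, "magic is at the head");
static_assert(offsetof(struct MyStruct, f) == 16, "f follows magic, a-d, e");
```

If the same assertions are compiled into the kernel source, a host/device layout mismatch would show up at build time instead of as bad data at run time; whether a given OpenCL compiler accepts `static_assert` is an assumption to verify.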

                                   

                                  When I hit the error check code here I grab a global atomic and dump the value of several registers to global memory which I then print on console.

                                   

                                  if(m->magic != 123)
                                  {

                                       ... dump debug diagnostics to global memory // This always happens

                                       return;
                                  }

                                   

                                  I am saving the value of 'm', 'basePtr', and 'offset'.

                                  I get something like:

                                    basePtr = 0x84000060

                                     offset = 0x0000240

                                     m = 0xffffffff  /// !!?

                                   

                                  m was correct when I first initialized it, and it's clobbered with -1 after the if() else statement in the middle.
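The "grab a global atomic and dump values" pattern described in this post can be sketched on the host with C11 atomics. This is not the kernel code from the thread: `DumpBuffer`, `DumpRecord`, `DUMP_SLOTS` and `dump_values` are hypothetical names, and in an OpenCL kernel the slot reservation would presumably be an `atomic_inc` on a counter in __global memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_SLOTS 16  /* hypothetical capacity of the debug buffer */

/* One dumped record: the three values inspected in the post. */
struct DumpRecord { uint32_t basePtr, offset, m; };

struct DumpBuffer {
    atomic_uint count;                   /* number of slots handed out */
    struct DumpRecord rec[DUMP_SLOTS];
};

/* Reserve a unique slot with an atomic increment, then fill it.
   Returns the slot index, or -1 when the buffer is already full. */
static int dump_values(struct DumpBuffer *buf,
                       uint32_t basePtr, uint32_t offset, uint32_t m)
{
    assert(buf != NULL);
    unsigned int slot = atomic_fetch_add(&buf->count, 1u);
    if (slot >= DUMP_SLOTS)
        return -1;
    buf->rec[slot] = (struct DumpRecord){ basePtr, offset, m };
    return (int)slot;
}
```

Because each caller gets a distinct slot from the increment, concurrent writers do not overwrite each other's records, and the host can read `count` afterwards to see how many records to print.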

                          • Re: OpenCL bug with HD 7790 (Bonaire)
                            himanshu.gautam

                            vmiura wrote:

                             

                            Do you know if I can view the ISA disassembly for Bonaire somehow?  It would help me confirm if it's odd code or if it might be something else.

                             

                            Regards,

                            Victor

                            Well, I guess CodeXL/Kernel Analyzer is the only way to get the kernel ISA. If that cannot help you, please share a test case; I can try to run it here and confirm whether the issue is reproducible.

                              • Re: OpenCL bug with HD 7790 (Bonaire)
                                vmiura

                                Hello,

                                 

                                I could look at the ISA using -save-temps.

                                 

                                The problem actually seems to be related to the "return" statement.  I think the compiler must have trouble with conditional return inside a do { } while loop.

                                 

                                Here's the overall control flow of my kernel.

                                 

                                __kernel void foo(__global unsigned char *basePtr, ...)

                                {

                                          if()

                                          {

                                                    do

                                                    {

                                                              __global MyStruct *myStruct = (__global MyStruct *)(basePtr + offset);

                                 

                                                              if(x)

                                                              {

                                                                        ...

                                                              }

                                                              else

                                                              {

                                                                        ...

                                                              }

                                 

                                 

                                                              if(myStruct->magic != expectedVal)

                                                              {

                                                                        // Dump vars to global buffer

                                                                        return;  // <--- this return is mucking things up

                                                              }

                                 

                                 

                                                              while()

                                                              {

                                                                        if()

                                                                        {

                                                                                  ...

                                                                        }

                                 

                                 

                                                                        if()

                                                                        {

                                                                                  switch()

                                                                                  {

                                                                                            case: ...

                                                                                                      break;

                                                                                            case: ...

                                                                                                      break;

                                                                                            case: ...

                                                                                                      break;

                                                                                            default: ...

                                                                                                      break;

                                                                                  }

                                 

                                 

                                                                                  if()

                                                                                  {

                                                                                            ...

                                                                                  }

                                                                        }

                                                              }

                                                    } while();

                                          }

                                }

                                 

                                Basically I get the debug dump if I have the "return" statement there.  If I comment out the "return" then it never hits that code.

                                 

                                Sadly, this just shows why my debug code is not working as expected, but it doesn't show why I'm getting my original bug on the 7790.

                                • Re: OpenCL bug with HD 7790 (Bonaire)
                                  vmiura

                                  Hello,

                                   

                                  I found the bug and I have a workaround.

                                   

                                  I have some code that does:

                                   

                                    dstColor = (pkColor & (~fbmask)) | (dstColor & fbmask);

                                   

                                  The ISA disassembler shows the compiler cleverly used v_bfi_b32, which implements vdst = (vsrc1 & vselect1) | (vsrc2 & ~vselect1), but it has the registers mixed up.  I get the opposite result of what it should be.

                                   

                                  If I instead use dstColor = bitselect(pkColor, dstColor, fbmask) then it works correctly.
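The equivalence between the masked expression and the bitselect call can be checked on the host. The sketch below is plain C, not kernel code: `bitselect_u32` is a hypothetical stand-in that follows the OpenCL definition of bitselect(a, b, c) bit by bit (take the bit from b where c is 1, otherwise from a), and `blend_swapped` models the "opposite result" with the two sources exchanged, which is an assumption about what the miscompiled code computes.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit stand-in for OpenCL bitselect(a, b, c) on 32-bit values. */
static uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; ++bit) {
        uint32_t mask = (uint32_t)1 << bit;
        r |= (c & mask) ? (b & mask) : (a & mask);
    }
    return r;
}

/* The masked blend as written in the kernel source. */
static uint32_t blend_masked(uint32_t pkColor, uint32_t dstColor, uint32_t fbmask)
{
    return (pkColor & ~fbmask) | (dstColor & fbmask);
}

/* The same blend with the two sources swapped. */
static uint32_t blend_swapped(uint32_t pkColor, uint32_t dstColor, uint32_t fbmask)
{
    return (dstColor & ~fbmask) | (pkColor & fbmask);
}

/* Returns 1 when the masked blend and the bitselect form agree. */
static int forms_agree(uint32_t pk, uint32_t dst, uint32_t mask)
{
    assert(bitselect_u32(pk, pk, mask) == pk);  /* sanity: equal inputs pass through */
    return blend_masked(pk, dst, mask) == bitselect_u32(pk, dst, mask);
}
```

Comparing a kernel's output against `blend_masked` and `blend_swapped` for a mask such as 0xFF00FF00 is one way to tell on the host which of the two the generated code actually produced.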

                                   

                                  So... I think there's some bug in whatever peephole optimizer is generating v_bfi_b32.  I will try to make a small test case.

                                   

                                  Thanks,

                                  Victor