11 Replies Latest reply on Oct 4, 2012 8:26 AM by smistad

    HD 7970 Compiler Segfault - Windows & Linux

    alexs.mac

      Dear All,

       

      I have a program (attached) that makes the AMD OpenCL compiler segfault every time it is built for an HD 7970. This happens both in Windows 7 using the AMD Kernel Analyzer and in Ubuntu 12.04 (just calling program.build()), with the latest versions of Catalyst (12.8) & AMD APP (2.7) - see the bottom of this post for exact details.

       

      The same program compiles, and the kernels run with no issues, on an Nvidia GTX-680 / GeForce GT 330M (CUDA 4.2), with the Intel OpenCL SDK 2012 (targeting a quad-core Intel Xeon [Nehalem]), and even with AMD APP v2.7 when targeting the same Intel Xeon machine.

       

      The program also compiles when using the Kernel Analyzer to target a different GPU architecture (e.g. Cypress), although other architectures (Capeverde, Cayman, Pitcairn [& Tahiti]) all segfault.  For the architectures that *do* compile, the Analyzer warns that FrameToPriorJointAppearanceHGMJacobianKernel spills registers, so worse performance is expected.  This is odd, as a very similar kernel (which actually uses more private memory, and the same amount of local memory) produces no such warnings in the Analyzer, and compiles and runs under Tahiti. The Nvidia analyzers also produce no such warnings for the GTX-680 for either kernel, and the GTX-680 has less local memory per core than the HD 7970.

       

      The HD 7970 on which this is being tested can compile and run the AMD sample code (and a bunch of our other kernels) so it seems likely that it's just a compiler bug rather than a hardware issue.

       

      From what I can find, this forum appears to be the (indirect) bug-reporting system for AMD. Is this correct, or is there a separate Bugzilla floating around for bug reports? Thanks!

       

      -Alex

       

      ------------------

      Test Setup (causes AMD OpenCL compiler segfault):

      - Sapphire HD 7970

      1. Windows 7

      -- AMD Catalyst 12.8 / AMD APP 2.7

      2. Ubuntu 12.04

      -- AMD Catalyst 12.8 / AMD APP 2.7

       

      ------------------

      Reference Setups (compiles & runs - no issues):

      1. Ubuntu 12.04

      -- Nvidia GTX-680 - CUDA 4.2

      -- Intel quad-core Xeon Nehalem - Intel OpenCL SDK 2012

      -- Intel quad-core Xeon Nehalem - AMD APP v2.7 (targeting x86)

      2. OS X 10.7.4

      -- Nvidia GeForce GT 330M - OS X OpenCL implementation.

        • Re: HD 7970 Compiler Segfault - Windows & Linux
          binying

          Can this kernel compile and run in a regular way, not in the analyzer?

            • Re: HD 7970 Compiler Segfault - Windows & Linux
              alexs.mac

              It compiles and runs just fine (all unit tests pass etc.) on the GTX-680 (Ubuntu 12.04) and GeForce GT 330M (OS X) cards, but I can't get it to compile at all using AMD APP under either Linux or Windows, either inside or outside of the Analyzer.

                • Re: HD 7970 Compiler Segfault - Windows & Linux
                  drallan

                  If you replace line 553

                     for ( uint fr=0; fr<CubicBSplineOrderPerDimSqrd; ++fr ) {

                  with

                      for ( uint fr=0; fr< 3 ; ++fr ) {

                  it compiles fine on Tahiti in the Kernel Analyzer.

                   

                  The constant CubicBSplineOrderPerDimSqrd is 16, so the compiler starts to choke at some loop bound above 3 and no higher than 16.

                   

                  You might wonder how I found that in 2 minutes. Older compiler versions had a problem unrolling simple loops (on Tahiti) with large trip counts (like 5000). So I replaced all the loop bounds with a small number, confirmed that it compiled, then worked backwards to find the culprit. This is the only loop with a problem.
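The "shrink the bound, then work backwards" approach can be narrowed further with a bisection over the loop bound. The following is a minimal sketch in plain C, not something posted in the thread. `builds_ok_fn`, `first_failing_bound`, and `fake_builds_ok` are hypothetical names, and the stand-in predicate uses an arbitrary threshold because the real one was never reported here. A real predicate would rebuild the kernel with the bound overridden (for example through a `-D name=value` build option) and report whether the build survived. The sketch also assumes failures are monotonic in the bound.

```c
#include <assert.h>

/* Hypothetical callback: returns nonzero if the kernel builds when the
 * suspect loop runs `bound` iterations. */
typedef int (*builds_ok_fn)(unsigned bound);

/* Returns the smallest bound in (good, bad] for which the build fails.
 * Assumes the build succeeds at `good`, fails at `bad`, and that every
 * bound above the first failing one also fails. */
unsigned first_failing_bound(unsigned good, unsigned bad, builds_ok_fn builds_ok)
{
    assert(good < bad);
    while (bad - good > 1u) {
        unsigned mid = good + (bad - good) / 2u;
        if (builds_ok(mid))
            good = mid;   /* still builds: the failure is higher up */
        else
            bad = mid;    /* fails: the failure is at or below mid */
    }
    return bad;
}

/* Stand-in for a real build attempt, for demonstration ONLY. The cut-off
 * of 8 is made up; the thread does not report the true threshold. */
int fake_builds_ok(unsigned bound)
{
    return bound <= 8u;
}
```

Calling `first_failing_bound(3, 16, fake_builds_ok)` starts from the two data points reported above (a bound of 3 builds, a bound of 16 does not) and needs at most four build attempts to pin down the threshold.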

                   

                  If you have the latest compiler version, I would be interested to know whether #pragma unroll 1 fixes it; that too had a problem in early compiler versions.

                   

                  drallan

                    • Re: HD 7970 Compiler Segfault - Windows & Linux
                      alexs.mac

                      Thanks for all your help with this - you're right, I can get the kernel to compile and run (although obviously it gets the wrong answer) by setting the termination value of the loop on line 553 to be small (<3).

                       

                      Unfortunately, adding #pragma unroll 1 (and leaving the loop termination value at 16) doesn't fix it; the compiler still segfaults.

                       

                      It's interesting that the kernel I mentioned that is very similar to this one (but works) has an outer loop wrapped around a very similar loop to the one on line 553 in this kernel - presumably that's enough to prevent whatever optimisations the compiler is attempting to perform (and screwing up) from taking place.

                       

                      It's a real shame, as on paper the HD 7970 should be much faster for this problem than the GTX-680, but this bug is something of a show-stopper for us.

                        • Re: HD 7970 Compiler Segfault - Windows & Linux
                          drallan

                           

                          alexs.mac wrote: "It's interesting that the kernel I mentioned that is very similar to this one (but works) has an outer loop wrapped around a very similar loop to the one on line 553 in this kernel - presumably that's enough to prevent whatever optimisations the compiler is attempting to perform (and screwing up) from taking place."

                          Nested loops didn't work, but splitting the loop into 2 smaller ones works fine, i.e.,

                           

                          for ( uint fr=0; fr<8; ++fr ) { /* blah */ }

                          for ( uint fr=8; fr<16; ++fr ) { /* more blah */ }

                           

                          It does not spill registers on Tahiti, but it uses something over 300 registers, so it would be interesting to hear how it performs.
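The split-loop workaround can be checked for equivalence on the host before trusting it in the kernel. The sketch below is plain C rather than OpenCL C and is not taken from the attached program. `body`, `accumulate_single`, and `accumulate_split` are hypothetical names, and the loop body is a placeholder because the real body on line 553 is not shown in this thread. It only illustrates that two back-to-back loops over [0, 8) and [8, 16) visit the same iterations in the same order as one loop over [0, 16).

```c
#include <assert.h>
#include <stddef.h>

#define TRIP_COUNT 16u  /* value of CubicBSplineOrderPerDimSqrd per the thread */
#define SPLIT_AT    8u  /* split point used in the workaround above */

/* Placeholder for the real loop body, which is not shown in the thread. */
static float body(const float *w, unsigned fr)
{
    return w[fr] * (float)(fr + 1u);
}

/* Original shape: one loop over all 16 iterations. */
float accumulate_single(const float *w)
{
    float acc = 0.0f;
    assert(w != NULL);
    for (unsigned fr = 0; fr < TRIP_COUNT; ++fr)
        acc += body(w, fr);
    return acc;
}

/* Workaround shape: the same iterations, in the same order, in two loops. */
float accumulate_split(const float *w)
{
    float acc = 0.0f;
    assert(w != NULL);
    for (unsigned fr = 0; fr < SPLIT_AT; ++fr)
        acc += body(w, fr);
    for (unsigned fr = SPLIT_AT; fr < TRIP_COUNT; ++fr)
        acc += body(w, fr);
    return acc;
}
```

Because the iteration order is unchanged, the two functions perform the same floating-point additions in the same sequence. The split is only safe in the real kernel if no iteration depends on state that the rewrite resets between the two loops.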

                           

                          It does seem to be a compiler error. I don't know the official bug reporting channel but a number of AMD technical people are fairly active in this forum.

                            • Re: HD 7970 Compiler Segfault - Windows & Linux
                              alexs.mac

                              Thanks - I was thinking of trying this too, I don't have the HD 7970 dev machine to hand at the moment, but I'll give it a try and report back.

                               

                              The apparent absence of a formal bug reporting system for the AMD OpenCL tools is a bit tedious though, especially in this case when it seems pretty evident it is a genuine bug that needs to be fixed.

                            • Re: HD 7970 Compiler Segfault - Windows & Linux
                              smistad

                              You could try disabling compiler optimizations by passing the option "-cl-opt-disable" when building the source code. That worked for me with another problem that made the compiler segfault.

                               

                              See http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clBuildProgram.html
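As a rough illustration of the suggestion above, the options string passed to clBuildProgram can be assembled like this. It is a sketch in plain C so that it compiles without an OpenCL SDK. `make_debug_build_options` is a hypothetical helper name, not part of any OpenCL API, and the actual clBuildProgram call appears only in a comment, with `program` and `device` assumed to exist already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes `base` followed by -cl-opt-disable into `out`.
 * `base` may be NULL or empty if there are no other options.
 * Returns 0 on success, -1 if `out` is too small. */
int make_debug_build_options(const char *base, char *out, size_t out_size)
{
    int n;
    assert(out != NULL);
    if (base != NULL && base[0] != '\0')
        n = snprintf(out, out_size, "%s -cl-opt-disable", base);
    else
        n = snprintf(out, out_size, "-cl-opt-disable");
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Intended use with the OpenCL C API (not compiled here; `program` and
 * `device` are assumed to have been created elsewhere):
 *
 *   char opts[256];
 *   if (make_debug_build_options("", opts, sizeof opts) == 0)
 *       err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Disabling optimizations is a diagnostic step rather than a fix: if the build succeeds with this option, that points at an optimization pass as the source of the crash, at the likely cost of slower kernels.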

                      • Re: HD 7970 Compiler Segfault - Windows & Linux
                        peakitde

                        Nice, thanks for sharing this.

                        • Re: HD 7970 Compiler Segfault - Windows & Linux
                          Marix

                          Sorry for the possibly stupid question, but how do you target the AMD Radeon HD 7970 in the APP Kernel Analyzer? Version 1.12, which still claims to be the latest, does not allow choosing Tahiti as a target.