alexs_mac
Adept I

HD 7970 Compiler Segfault - Windows & Linux

Dear All,

I have a program (attached) that makes the AMD OpenCL compiler segfault every time it is compiled for an HD 7970, either on Windows 7 using the AMD Kernel Analyzer or on Ubuntu 12.04 (just calling program.build()). This is with the latest versions of Catalyst (12.8) and AMD APP (2.7); see the bottom of this post for exact details.

The same program compiles, and the kernels run with no issues, on an Nvidia GTX-680 / GeForce GT 330M (CUDA 4.2), with the Intel OpenCL SDK 2012 (targeting a quad-core Intel Xeon [Nehalem]), and even with AMD APP v2.7 when targeting the same Intel Xeon machine.

The program also compiles when using the Kernel Analyzer to target a different GPU architecture (e.g. Cypress), although other architectures (Capeverde, Cayman, Pitcairn [& Tahiti]) all segfault.  For the architectures that *do* compile, the Analyzer outputs a warning about FrameToPriorJointAppearanceHGMJacobianKernel having register spilling, so worse performance is expected.  This is odd, as a very similar kernel (which actually uses more private memory, and the same amount of local memory) produces no such warning in the Analyzer, and compiles and runs under Tahiti. The Nvidia analyzers also produce no such warnings for the GTX-680 for either kernel, and the GTX-680 has less local memory per core than the HD 7970.

The HD 7970 on which this is being tested can compile and run the AMD sample code (and a bunch of our other kernels) so it seems likely that it's just a compiler bug rather than a hardware issue.

From what I can find, this forum appears to be the (indirect) bug-reporting system for AMD. Is this correct, or is there a separate Bugzilla somewhere for bug reports? Thanks!

-Alex

------------------

Test Setup (causes AMD OpenCL compiler segfault):

- Sapphire HD 7970

1. Windows 7

-- AMD Catalyst 12.8 / AMD APP 2.7

2. Ubuntu 12.04

-- AMD Catalyst 12.8 / AMD APP 2.7

------------------

Reference Setups (compiles & runs - no issues):

1. Ubuntu 12.04

-- Nvidia GTX-680 - CUDA 4.2

-- Intel quad-core Xeon Nehalem - Intel OpenCL SDK 2012

-- Intel quad-core Xeon Nehalem - AMD APP v2.7 (targeting x86)

2. OS X 10.7.4

-- Nvidia GeForce GT 330M - OS X OpenCL implementation.

11 Replies
binying
Challenger

Can this kernel compile and run in a regular way, not in the analyzer?


It compiles and runs just fine (all unit tests pass etc.) on the GTX-680 (Ubuntu 12.04) and GeForce GT 330M (OS X) cards, but I can't get it to compile at all using the AMD APP under either Linux or Windows, either inside or outside of the Analyzer.


If you replace line 553

   for ( uint fr=0; fr<CubicBSplineOrderPerDimSqrd; ++fr ) {

with

    for ( uint fr=0; fr< 3 ; ++fr ) {

it compiles fine on Tahiti in the Kernel Analyzer.

The constant CubicBSplineOrderPerDimSqrd is 16, so some loop count above 3 and at most 16 is causing the compiler to choke.

You might wonder how I found that in 2 minutes. The older compiler versions had a problem unrolling simple loops (on Tahiti) with large loop counters (like 5000). So, I replaced all the loop counters with a small number and it compiled, then worked backwards. This is the only loop with a problem.
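The "work backwards" step can be narrowed down systematically. Below is a minimal sketch in plain C of bisecting the loop bound between a value known to build (3) and one known to crash the compiler (16). The names `builds_ok_fn`, `first_failing_bound` and `fake_builds_ok` are hypothetical and not part of any AMD tool; in practice the callback would rebuild the kernel with the given bound, ideally in a child process so that a compiler segfault does not take the bisection down with it. It also assumes the failure is monotone, i.e. every bound at or above the first failing one also fails, which the thread does not confirm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if the kernel builds when the loop
 * bound is set to `bound`. A real version would rebuild the program. */
typedef bool (*builds_ok_fn)(unsigned bound);

/* Returns the smallest bound in (good, bad] for which builds_ok() is false.
 * Precondition: builds_ok(good) is true and builds_ok(bad) is false.
 * Assumes monotone behaviour: once a bound fails, all larger bounds fail. */
unsigned first_failing_bound(unsigned good, unsigned bad, builds_ok_fn builds_ok)
{
    assert(good < bad);
    while (bad - good > 1) {
        unsigned mid = good + (bad - good) / 2;
        if (builds_ok(mid)) {
            good = mid;   /* still builds: the failure is above mid */
        } else {
            bad = mid;    /* fails: the first failure is at or below mid */
        }
    }
    return bad;
}

/* Stand-in predicate for demonstration only. The threshold of 9 is made up;
 * the real threshold for this kernel is not stated in the thread. */
bool fake_builds_ok(unsigned bound)
{
    return bound < 9;
}
```

Calling `first_failing_bound(3, 16, fake_builds_ok)` needs four probes instead of trying each of the thirteen candidate bounds in turn.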

If you have the latest compiler version, I would be interested to know whether #pragma unroll 1 fixes it. That too had a problem in early compiler versions.

drallan

Thanks for all your help with this - you're right, I can get the kernel to compile and run (although obviously it gets the wrong answer) by setting the termination value of the loop on line 553 to be small (<3).

Unfortunately, adding the #pragma unroll 1 (and leaving the loop termination value at 16) doesn't fix it; the compiler still segfaults.

It's interesting that the kernel I mentioned that is very similar to this one (but works) has an outer loop wrapped around a very similar loop to the one on line 553 in this kernel - presumably that's enough to prevent whatever optimisations the compiler is attempting to perform (and screwing up) from taking place.

It's a real shame, as on paper the HD 7970 should be much faster for this problem than the GTX-680, but this bug is something of a show-stopper for us.


It's interesting that the kernel I mentioned that is very similar to this one (but works) has an outer loop wrapped around a very similar loop to the one on line 553 in this kernel - presumably that's enough to prevent whatever optimisations the compiler is attempting to perform (and screwing up) from taking place.

Nested loops didn't work but splitting the loop in 2 smaller ones works fine, i.e.,

   for ( fr = 0; fr < 8; ++fr ) { blah }

   for ( fr = 8; fr < 16; ++fr ) { more blah }
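To check that the split does not change the result, here is a small sketch in plain C comparing a single 16-iteration loop with the two 8-iteration halves. The function `body` is a hypothetical placeholder for the real loop body (which is not shown in this thread), and `ORDER_SQRD` just mirrors CubicBSplineOrderPerDimSqrd = 16. Because both versions visit the iterations in the same order, the accumulated value is identical, including floating-point rounding.

```c
#include <assert.h>

#define ORDER_SQRD 16u  /* mirrors CubicBSplineOrderPerDimSqrd = 16 */

/* Hypothetical loop body; stands in for the real per-iteration work. */
static float body(unsigned fr)
{
    return (float)(fr * fr);
}

/* Original form: one loop over all 16 iterations. */
float accumulate_single(void)
{
    float acc = 0.0f;
    for (unsigned fr = 0; fr < ORDER_SQRD; ++fr) {
        acc += body(fr);
    }
    return acc;
}

/* Workaround form: the same iterations, split into two consecutive loops. */
float accumulate_split(void)
{
    assert(ORDER_SQRD % 2u == 0u);  /* the halves must cover the full range */
    float acc = 0.0f;
    unsigned fr;
    for (fr = 0; fr < ORDER_SQRD / 2u; ++fr) {
        acc += body(fr);
    }
    for (fr = ORDER_SQRD / 2u; fr < ORDER_SQRD; ++fr) {
        acc += body(fr);
    }
    return acc;
}
```

In the actual kernel the two halves would carry the same private variables across the split, so only the loop header changes.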

It does not register spill on Tahiti but uses something over 300 registers, so it would be interesting to hear how it performs.

It does seem to be a compiler error. I don't know the official bug reporting channel but a number of AMD technical people are fairly active in this forum.

Thanks - I was thinking of trying this too, I don't have the HD 7970 dev machine to hand at the moment, but I'll give it a try and report back.

The apparent absence of a formal bug reporting system for the AMD OpenCL tools is a bit tedious though, especially in this case when it seems pretty evident it is a genuine bug that needs to be fixed.


You could try disabling compiler optimizations by passing the option "-cl-opt-disable" when building the source code. That worked for me with another problem that made the compiler segfault.

See http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clBuildProgram.html
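As a rough sketch of how that could be wired into host code, the helper below assembles the options string that is passed as the fourth argument of clBuildProgram. The helper name `make_build_options` is hypothetical; "-cl-opt-disable" and "-Werror" are standard OpenCL build options. The OpenCL call itself is only shown in a comment so that the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdio.h>

/* Writes "<base> -cl-opt-disable" (or just one of the two parts) into buf.
 * Returns the length of the resulting string, or -1 if it did not fit.
 * `base` may be NULL or empty when there are no other options. */
int make_build_options(char *buf, size_t size, const char *base, int disable_opt)
{
    assert(buf != NULL && size > 0);
    if (base == NULL) {
        base = "";
    }
    /* Only insert a separating space when both parts are present. */
    const char *sep = (disable_opt && base[0] != '\0') ? " " : "";
    int n = snprintf(buf, size, "%s%s%s", base, sep,
                     disable_opt ? "-cl-opt-disable" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Intended use on the host, assuming `program` and `device` already exist:
 *
 *   char opts[128];
 *   if (make_build_options(opts, sizeof opts, NULL, 1) >= 0) {
 *       cl_int err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *   }
 */
```

Keeping the flag behind a parameter makes it easy to switch optimizations back on once a fixed compiler is available, since "-cl-opt-disable" is likely to cost performance.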

peakitde
Journeyman III

Nice, thanks for sharing this.

Marix
Adept II

Sorry for the possibly stupid question, but how do you target the AMD Radeon HD 7970 in the APP Kernel Analyzer? Version 1.12, which still claims to be the latest, does not allow choosing Tahiti as a target.


It wasn't obvious to me either. You need to go into the KernelAnalyzer options (Edit > Options, from memory; I don't have my Windows 7 box available), then select 'use installed Catalyst version XX.XX' for your Catalyst version. Provided that this is recent enough to contain Tahiti (e.g. 12.8, although I think some earlier versions also worked), Tahiti should become an option under the GPU architectures.

Thank you very much. I always thought it brought its own drivers. I have also now found out that the Kernel Analyzer 2, which comes with CodeXL, does finally support Tahiti.
