Curiouscat
Journeyman III

Scratch registers = aticaldd crash

when launching the kernel repeatedly

I'm working on a relatively big OpenCL program, targeting a 5870. Host code is a loop which launches several kernels in sequence, i.e. each launch is immediately followed by clFinish().
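
The launch pattern described above can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the poster's actual host code: `launch_fn`, `finish_fn`, `launch_report`, `run_sequence` and the `demo_*` helpers are hypothetical names, and the OpenCL calls are abstracted behind callbacks so the loop compiles without an OpenCL SDK. In a real host program the launch callback would presumably wrap `clEnqueueNDRangeKernel()` and the finish callback would wrap `clFinish()`. Logging before each launch means that, if the driver crashes inside a call, the last line printed identifies the kernel and iteration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback types. In a real host program, launch_fn would wrap
   clEnqueueNDRangeKernel() and finish_fn would wrap clFinish(). Both return
   0 on success, mirroring CL_SUCCESS. */
typedef int (*launch_fn)(size_t kernel_index, void *ctx);
typedef int (*finish_fn)(void *ctx);

/* Where the sequence stopped, so a failure can be tied to one launch. */
typedef struct {
    size_t iteration;
    size_t kernel_index;
    int status;
} launch_report;

/* Run `iterations` passes over `n_kernels` kernels, blocking after every
   launch. Stops at the first nonzero status and reports where it happened. */
static launch_report run_sequence(size_t iterations, size_t n_kernels,
                                  launch_fn launch, finish_fn finish,
                                  void *ctx)
{
    launch_report r = {0, 0, 0};
    for (size_t it = 0; it < iterations; ++it) {
        for (size_t k = 0; k < n_kernels; ++k) {
            r.iteration = it;
            r.kernel_index = k;
            /* stderr is unbuffered, so this line survives a later crash. */
            fprintf(stderr, "iteration %zu: launching kernel %zu\n", it, k);
            r.status = launch(k, ctx);
            if (r.status != 0)
                return r;
            r.status = finish(ctx);
            if (r.status != 0)
                return r;
        }
    }
    return r;
}

/* Stand-in callbacks for demonstration only: they count calls and report a
   failure on the second launch of kernel 0. */
typedef struct {
    size_t launches_of_k0;
    size_t finishes;
} demo_ctx;

static int demo_launch(size_t k, void *ctx)
{
    demo_ctx *d = ctx;
    if (k == 0 && ++d->launches_of_k0 == 2)
        return -1;
    return 0;
}

static int demo_finish(void *ctx)
{
    demo_ctx *d = ctx;
    d->finishes++;
    return 0;
}
```

Calling `run_sequence(3, 74, demo_launch, demo_finish, &d)` with a zero-initialised `demo_ctx d` stops in the second pass at kernel 0, after 74 completed launches.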

Started out CUDA-style (since that's where I'm coming from) with 9 big kernels containing plenty of code. Worked fine on the CPU, not at all on the GPU; it quickly led to heap corruption on the host, followed by the display driver dying. Same story both under Windows (Vista Ultimate 64, with Catalyst 10.3, 10.4 and 10.5) and Linux (openSUSE 11.2).

Broke up the kernels into smaller parts, and a pattern emerged: when the SKA (Stream KernelAnalyzer) reports scratch registers > 0 for a kernel, launching that kernel repeatedly causes a crash. I now have 74 (!) relatively small kernels, two of them using scratch registers (10 and 8, respectively). Under Windows (Catalyst 10.5), the second launch of the first kernel invariably results in a read access violation in aticaldd:

First-chance exception at 0x6904e18c in xxxxxx.exe: 0xC0000005: Access violation reading location 0x00000018.
Unhandled exception at 0x6904e18c in xxxxxx.exe: 0xC0000005: Access violation reading location 0x00000018.

Call stack:

> aticaldd.dll!6904e18c()  
  [Frames below may be incorrect and/or missing, no symbols loaded for aticaldd.dll] 
  aticaldd.dll!6904b5f7()  
  aticaldd.dll!69044c26()  
  aticaldd.dll!69012f9d()  
  aticaldd.dll!69008b0a()  
  aticaldd.dll!69003e23()  
  atiocl.dll!0284d4df()  
  atiocl.dll!0285a105()  
  atiocl.dll!0285ab90()  
  atiocl.dll!0285ae54()  
  atiocl.dll!0285fc8b()  
  atiocl.dll!0285e606()  
  kernel32.dll!7560eccb()  
  ntdll.dll!7706d24d()  
  ntdll.dll!7706d45f()

Unfortunately, getting rid of those last scratch registers is proving hard.

Is anyone else having the same kind of problem? Any advice?

8 Replies

Curious Cat,
Can you send us this as a test case, so that we can verify that it is fixed before our next release? You can send it to streamdeveloper@amd.com, Re: me.

malcolm3141
Journeyman III

Hi Curious Cat,

I've had crashes similar to this (on second invocation of a kernel only), but unrelated to scratch registers. The only way I managed to get around this was to use a barrier in place of all memory fence operations.

Also worth checking: if you are using the C++ API, make sure the reference counting is not freeing any buffers unexpectedly (I doubt this is your problem, though).

Other than that, it's up to AMD to find and fix this bug...

Malcolm

Thanks, malcolm. I did start out using the C++ bindings, but switched to C when I saw the heap corruption, in order to rule out a bug in that layer. Didn't help. 😕

Micah, I will try to create a smaller example for you (presumably over the weekend) which reproduces the problem. Who knows, maybe doing so I will finally find something Really Stupid I've been doing all along.

Curious Cat,
We've reproduced a crash. We're not sure whether it is the same one you see, but we are investigating it.

Yay. Sent you some code today, but maybe the "Re: me" should have been "Re: MicahWillmow"... :">

Update: got back to this after almost a month's pause (waiting for a fix) and managed to get rid of all scratch registers.

Still crashes the same way.

So I guess the scratch register count was just a symptom of code complexity, and the correct problem description would be "long expressions = aticaldd crash".

Curious Cat, this is fixed in the next Catalyst release.

Excellent news, thanks! Now I'll be checking the downloads page every five minutes until that's out.