I'm working on a relatively big OpenCL program targeting a Radeon HD 5870. The host code is a loop that launches several kernels in sequence, with each launch immediately followed by clFinish().
I started out CUDA-style (that's where I'm coming from), with 9 big kernels containing a lot of code. This worked fine on the CPU but not at all on the GPU: it quickly led to heap corruption on the host, followed by the display driver crashing. The behavior was the same under both Windows (Vista Ultimate 64, with Catalyst 10.3, 10.4 and 10.5) and Linux (openSUSE 11.2).
I broke the kernels up into smaller parts, and a pattern emerged: whenever the SKA (Stream KernelAnalyzer) reports more than 0 scratch registers for a kernel, launching that kernel repeatedly causes a crash. I now have 74 (!) relatively small kernels, two of which still use scratch registers (10 and 8, respectively). Under Windows (Catalyst 10.5), the second launch of the first of these kernels invariably results in a read access violation in aticaldd.dll:
First-chance exception at 0x6904e18c in xxxxxx.exe: 0xC0000005: Access violation reading location 0x00000018.
Unhandled exception at 0x6904e18c in xxxxxx.exe: 0xC0000005: Access violation reading location 0x00000018.
[Frames below may be incorrect and/or missing, no symbols loaded for aticaldd.dll]
Unfortunately, getting rid of those last scratch registers is proving hard.
Is anyone else having the same kind of problem? Any advice?