Try the Catalyst 12.01 driver for the 7970! I've also run into problems with the latest driver when using CAL on the 7970, but 12.01 works just fine.
On Evergreen I've used
- global buffer
- pinned 2D memory
- vWincoord (fastest thread indexing)
On the 7970 that setup just produced access violations, so I changed a few things:
- compute shader
- whatever (pinned/local/remote) 1D linear memory (set the global_buffer flag when you allocate)
- vAbsTIdFlat (the fastest one, I guess; you can then split it with a fast mul24)
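To illustrate what "splitting" the flat thread id means, here is a minimal CPU-side sketch in C, assuming a row-major layout of a given width. The names `Coord2D` and `split_flat_id` are made up for this example and are not part of CAL or IL; inside the kernel, the multiply would be the place for a mul24-style instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 2D coordinate recovered from a flat index. */
typedef struct {
    uint32_t x;
    uint32_t y;
} Coord2D;

/* Split a flat thread index into (x, y), assuming a row-major layout
 * that is `width` elements wide. */
static Coord2D split_flat_id(uint32_t flat_id, uint32_t width)
{
    Coord2D c;

    assert(width > 0);
    c.y = flat_id / width;
    /* Remainder via one multiply and one subtract; on the GPU this
     * multiply is where a 24-bit multiply could be used. */
    c.x = flat_id - c.y * width;
    return c;
}
```

For example, with a width of 64, flat id 130 maps to x = 2, y = 2.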
Fortunately, the '7970 method' works perfectly on Evergreen too (with the latest Catalyst, 11.10 if I recall).
Here's a test kernel, hope it helps.
uav_raw_store_id(0) mem.x, r0.y, r0.x
cb0: pinned, CAL_FORMAT_UNORM_INT32_4, CAL_RESALLOC_GLOBAL_BUFFER
uav0: pinned, CAL_FORMAT_UNORM_INT32_1, CAL_RESALLOC_GLOBAL_BUFFER
*Note that the UAV format must be 1-component. If you specify 4, Evergreen will allocate only 1/4 of the memory for it (either it's a bug or I've misunderstood something).
Finally, run the program using RunProgramGrid() with a domain that is 64 (the wavefront size) wide!
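As a rough sketch of how the 64-wide domain could be sized for a given amount of work, the C snippet below rounds the thread count up to whole wavefronts. `Domain3D`, `GridConfig` and `make_grid` are hypothetical names for illustration only, not CAL types, and mapping "64 wide" to a 64x1x1 block is my assumption; check the CAL headers for the actual grid structure you pass to the run call.

```c
#include <assert.h>
#include <stdint.h>

enum { WAVEFRONT_SIZE = 64 };

/* Hypothetical stand-in for a 3D domain (width, height, depth). */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t depth;
} Domain3D;

/* Hypothetical launch description: threads per block, and block count. */
typedef struct {
    Domain3D block;
    Domain3D grid;
} GridConfig;

/* Build a 1D launch: blocks are one wavefront wide, and the block count
 * is rounded up so that every thread is covered. */
static GridConfig make_grid(uint32_t total_threads)
{
    GridConfig g;

    assert(total_threads > 0);
    g.block.width  = WAVEFRONT_SIZE;
    g.block.height = 1;
    g.block.depth  = 1;
    /* Ceiling division written to avoid overflow near UINT32_MAX. */
    g.grid.width   = total_threads / WAVEFRONT_SIZE
                   + (total_threads % WAVEFRONT_SIZE != 0);
    g.grid.height  = 1;
    g.grid.depth   = 1;
    return g;
}
```

When the thread count is not a multiple of 64, the last block has spare threads, so the kernel would need to skip any flat id beyond the real work size.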
Hopefully no more black magic will be needed.
Thanks a lot for your help! I just got your test kernel working. The problem was that I was calling calCtxRunProgram() instead of calCtxRunProgramGrid(), which somehow didn't compute anything. Things would be so much easier if the driver simply told me what's wrong. So now I have to tweak the domain parameters. It seems a number of changes have been made to the hardware since the 5870.