

tofic
Journeyman III

Is there a limit for the scratchpad size?

Hello all,

  I have a pretty fat (GPR-wise) kernel: it uses 948 scratch registers. I suspect that some overflow happens in this kernel, because while it executes the heavy section (where the stack workload is high), flow-control decisions are applied to the whole wavefront. For example, if there is a return under flow control, the whole wavefront returns as soon as any thread (thread #0?) in the wavefront hits that return. I don't use any local memory inside the kernel.

  This kernel works correctly on both Nvidia and Intel architectures. My GPU is from the HD 7900 series, with the latest APP SDK and drivers.

  Has anyone had similar problems with scratchpad size? I can attach the compiled assembly code from APP Profiler if necessary.

Anton

0 Likes
4 Replies
tofic
Journeyman III

Just to confirm: when I move my workload from GPRs to global memory (which is of course slower on many architectures), the bug disappears and the kernel works correctly.

It also works correctly (but even slower) if I reduce the local workgroup size to 1x1x1.

So it seems like there is some overflow in the scratchpad with ATI GPUs.
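
For anyone who wants to try the same global-memory workaround, here is a minimal host-side sketch of how a per-work-item stack could be laid out in one global buffer. It is not the code from my kernel: the function names, the interleaved layout, and the 4-byte element size in the usage note are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: the names and the layout are illustrative
 * assumptions, not taken from the kernel discussed in this thread. */

/* Bytes of global memory needed so that every work-item owns
 * `stack_depth` elements of `elem_size` bytes each. */
size_t stack_buffer_bytes(size_t global_work_items,
                          size_t stack_depth,
                          size_t elem_size)
{
    return global_work_items * stack_depth * elem_size;
}

/* Element index of stack slot `slot` for work-item `gid`.
 * Interleaved (slot-major) layout: the same slot of neighbouring
 * work-items sits at neighbouring addresses in the buffer. */
size_t stack_index(size_t gid, size_t slot,
                   size_t global_work_items, size_t stack_depth)
{
    assert(gid < global_work_items);  /* work-item must exist */
    assert(slot < stack_depth);       /* stay inside the slice */
    return slot * global_work_items + gid;
}
```

With 64 work-items, a depth of 948 slots and an assumed 4 bytes per slot, `stack_buffer_bytes(64, 948, 4)` gives the size to pass when allocating the buffer, and the kernel would then index it with the same formula as `stack_index` instead of using a private array.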

Anton

0 Likes

Hi Anton,

I think you're right. You should avoid using scratch registers; I've been told they have a bad effect on performance. I think Nvidia and Intel would also have this problem if the workload were big enough.

0 Likes

Hi dear Wenju,

  Thank you for your answer. I still believe this is incorrect behavior, though. As I mentioned, both Nvidia and Intel handle this situation (when the scratchpad is huge) gracefully. I also believe that the trade-off between performance and the maintenance burden of the code should be the developer's decision.

0 Likes

I've heard that AMD is working on this, but that the results are not good so far. Maybe it's just a rumour!

0 Likes