About execution domain dimensions

Discussion created by Raistmer on Sep 5, 2010
Latest reply on Sep 7, 2010 by rotor
How to choose them properly?

I have a very long-running kernel that should run over a big execution domain.
If I enqueue it over the whole domain, it causes a driver restart (because execution takes too long).
So I split the execution domain into blocks and call the kernel over smaller parts.
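
To make the splitting concrete, here is a minimal sketch of the idea in Python (used only for brevity; this is not the actual host code). The function name `split_domain` and the sizes in the example are made up for illustration. Each tile would be enqueued as a separate, shorter kernel call.

```python
def split_domain(width, height, tile_w, tile_h):
    """Yield (x_offset, y_offset, w, h) tiles that cover a width x height domain."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            # Edge tiles are clipped so that no thread falls outside the domain.
            yield (x, y, min(tile_w, width - x), min(tile_h, height - y))


# Hypothetical example: a 256x4 domain cut into 128x2 tiles.
tiles = list(split_domain(256, 4, 128, 2))
```

Every tile here has the same thread count (256), which matches the situation described next: the same number of threads per call, but a different shape.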

When I use a (128x2) grid size it runs OK, but with (256x1) it causes a driver restart.
What could be the reasons for such different behavior with the same number of threads?

I use an HD4870. It has (AFAIK) 10 compute units, so for better GPU usage one of the dimensions should be divisible by the number of compute units (in my case, by 10). But also, for better load, each wavefront should have 64 threads.
Do I understand right that, to meet both requirements, the first (X-axis) dimension should be divisible by 64 while the second (Y-axis) should be divisible by 10 (in my case)?
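
Expressed as code, the rule I am asking about would look roughly like the sketch below. It only encodes my assumption (X padded to the wavefront size, Y padded to the compute unit count); whether this is actually the right rule is exactly the question. The names `round_up` and `padded_dims` are made up for illustration.

```python
WAVEFRONT_SIZE = 64   # threads per wavefront, as stated above
COMPUTE_UNITS = 10    # HD4870 figure quoted above (AFAIK)


def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_dims(x, y):
    """Pad X to a multiple of the wavefront size and Y to a multiple of the CU count."""
    return round_up(x, WAVEFRONT_SIZE), round_up(y, COMPUTE_UNITS)


print(padded_dims(100, 7))   # (128, 10)
```

Threads added by the padding would have to be masked out inside the kernel with a bounds check.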

In other words, is it true that, let's say, 128x10 will work better on my GPU than 10x128?