Can somebody shed some light on best register usage? In another topic I just read that any thread can use at most 122 or so registers. That's for performance reasons, since at least two wavefronts should be active at the same time.
Now, what is a wavefront? Is it at least 16 and at most 64 threads, depending on register usage? I believe I read somewhere that on any given thread processor at most 4 threads execute in an interleaved way to hide pipeline latency.
So, does this mean that I would have to divide the 256 registers of a thread processor by 8 to get maximum performance from the ALUs?
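To make the arithmetic I have in mind concrete, here is a rough sketch. It assumes the 256-register budget per thread processor mentioned above is simply split evenly among the wavefronts resident on it; the constant and both function names are my own, hypothetical choices and not taken from any AMD documentation. It also ignores any registers the hardware may reserve, which is presumably why the limit I read about is 122 rather than 128.

```python
# Back-of-envelope sketch only. The 256-register figure is the assumption
# from this post, not a value confirmed by AMD documentation.
REGS_PER_THREAD_PROCESSOR = 256  # assumed budget shared by resident wavefronts


def resident_wavefronts(regs_per_thread: int) -> int:
    """Wavefronts that fit if every thread uses regs_per_thread registers."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    return REGS_PER_THREAD_PROCESSOR // regs_per_thread


def max_regs_per_thread(wavefronts: int) -> int:
    """Largest per-thread register count that still allows `wavefronts` wavefronts."""
    if wavefronts <= 0:
        raise ValueError("wavefronts must be positive")
    return REGS_PER_THREAD_PROCESSOR // wavefronts


if __name__ == "__main__":
    # Print the trade-off for a few register counts.
    for regs in (32, 64, 122, 128):
        print(f"{regs:3d} registers/thread -> {resident_wavefronts(regs)} wavefront(s)")
    print("budget for 8 wavefronts:", max_regs_per_thread(8), "registers/thread")
```

Under this simple model, 122 registers per thread leaves room for two wavefronts, and dividing by 8 would leave 32 registers per thread. Whether the real allocation works this way is exactly what I am asking.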
Thanks for any information.
Please read the documentation.
In particular, the new AMD OpenCL Programming Guide has a lot of hardware information, including details on wavefronts and register allocation.