The programming guide says that the first 64 threads compose a wavefront, which is executed on the first SIMD engine, and so is the following wavefront. My question is how execution switches between wavefronts. In CUDA, warps (the equivalent of wavefronts) are switched every instruction. How does this happen on AMD's GPUs?
If you search for "wavefront" in this forum, you might find some VERY useful threads about the topic.
As far as I know, they switch between clauses if needed, and they also run in parallel between instructions. For example, it seems there are two slots (odd and even) for wavefronts on each TP. My limited assumption is that two wavefronts map to a SIMD at one time. More WFs get mapped to that SIMD if registers allow; they just don't run "at the same time", but they do "run in parallel", and by "parallel" they mean "switched between clauses". The same quad from each WF gets mapped to a TP on that SIMD, and the threads in those quads get executed in parallel, odd and even. This is how it was explained to me (I think, lol), but like I said, please search for "wavefront" in this forum and you will find more.
Originally posted by: ryta1203 If you search for "wavefront" in this forum, you might find some VERY useful threads about the topic.
As far as I know they switch between clauses, if needed and also they run in parallel between instructions.
Thank you ryta. I have browsed all the topics about "wavefront", but I could not find a definitive answer.
You said the wavefronts switch between clauses, so why did you say they also run in parallel between instructions? What is the real granularity of switching between wavefronts?
Look at the "Calculate the Bottleneck" thread. There are even and odd slots, so even though 4 instructions run over 4 cycles (according to the docs, which are apparently not accurate), this only happens if there is an even number of WFs. EDIT: meaning that if you have 5 WFs total, then 2 go, 2 go, and then 1 goes... when that 1 goes, it wastes half the execution units. That is my understanding of the Bottleneck thread.
The documentation is horrid, at best.
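To make the "2 go, 2 go, then 1 goes" example above concrete, here is a minimal Python sketch of the arithmetic only. It assumes exactly two wavefront slots (even and odd) per issue, as described in this thread; the names `issue_groups` and `slot_utilization` are made up for illustration and do not correspond to any AMD API or to confirmed hardware behavior.

```python
# Assumption from this thread: one "even" and one "odd" wavefront slot.
SLOTS_PER_ISSUE = 2


def issue_groups(num_wavefronts, slots=SLOTS_PER_ISSUE):
    """Split wavefront ids into the groups that would go together."""
    ids = list(range(num_wavefronts))
    return [ids[i:i + slots] for i in range(0, len(ids), slots)]


def slot_utilization(num_wavefronts, slots=SLOTS_PER_ISSUE):
    """Fraction of available slots that are actually filled."""
    groups = issue_groups(num_wavefronts, slots)
    if not groups:
        return 0.0
    used = sum(len(group) for group in groups)
    return used / (len(groups) * slots)


if __name__ == "__main__":
    print(issue_groups(5))      # [[0, 1], [2, 3], [4]]
    print(slot_utilization(5))  # 5 of 6 slots filled
    print(slot_utilization(4))  # 1.0, an even count fills every slot
```

With 5 WFs the last group has only one wavefront, so under this model one of its two slots sits idle, which is the "wastes half the execution units" case described above.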