Perhaps there's something in the docs that I'm not seeing, so I apologize in advance.
I have 16 dwords in scalar registers s16-s31. I need to copy that data from the scalar registers to GDS, at an offset of 64 bytes from the GDS base address. The best approach I've found so far is to:
- mask off all lanes but one in the wave using EXEC;
- move the data from the scalar to the vector registers;
- issue a bunch of ds_write_bNNN gds instructions;
- re-enable all lanes by restoring EXEC.
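To make the data movement and offset arithmetic concrete, here is a minimal host-side model of those four steps. It is NOT GPU code and uses no ROCm API: GDS is modelled as a plain list of dwords, `copy_sgprs_to_gds` is a hypothetical name, and it assumes a toy GDS base of 0 and that each `ds_write_bNNN` is 4, 8 or 16 bytes wide (`ds_write_b32`, `ds_write_b64` or `ds_write_b128`). With 16-byte writes, the 64 bytes of s16-s31 take four writes at byte addresses 64, 80, 96 and 112.

```python
# Hypothetical host-side model of the four-step sequence; it is NOT GPU code
# and does not use any ROCm API. GDS is modelled as a list of dwords.

WAVE_SIZE = 64   # lanes per wavefront on GCN/Vega
DWORD = 4        # bytes per dword


def copy_sgprs_to_gds(sgprs, gds, base=0, offset=64, write_bytes=16):
    """Copy 16 dwords (standing in for s16..s31) to GDS at base + offset bytes.

    write_bytes is the assumed width of each ds_write_bNNN (4, 8 or 16 bytes).
    Returns the byte address targeted by each modelled ds_write.
    """
    assert len(sgprs) == 16 and write_bytes in (4, 8, 16)
    full_exec = (1 << WAVE_SIZE) - 1

    # Step 1: save EXEC (assumed all lanes active) and narrow it to lane 0.
    saved_exec, exec_mask = full_exec, 1

    # Step 2: model v_mov_b32 v[i], s[16+i]. A VGPR holds one value per lane,
    # and only lanes enabled in EXEC are written.
    vgprs = [[0] * WAVE_SIZE for _ in sgprs]
    for i, value in enumerate(sgprs):
        for lane in range(WAVE_SIZE):
            if (exec_mask >> lane) & 1:
                vgprs[i][lane] = value

    # Step 3: each ds_write stores write_bytes // 4 consecutive VGPRs,
    # and only active lanes perform the store.
    per_write = write_bytes // DWORD
    addresses = []
    for first in range(0, len(vgprs), per_write):
        addr = base + offset + first * DWORD
        addresses.append(addr)
        for lane in range(WAVE_SIZE):
            if (exec_mask >> lane) & 1:
                for k in range(per_write):
                    gds[addr // DWORD + k] = vgprs[first + k][lane]

    # Step 4: restore EXEC so every lane is active again.
    exec_mask = saved_exec
    return addresses


gds = [0] * 64                       # small toy GDS: 256 bytes
data = list(range(0x100, 0x110))     # 16 example dwords standing in for s16..s31
print(copy_sgprs_to_gds(data, gds))  # [64, 80, 96, 112]
```

Narrowing EXEC to a single lane in the model means exactly one lane performs each store; with all 64 lanes enabled and a uniform address, every lane would write the same values to the same location.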
This sounds cumbersome. Is there a better way to store the data from a bunch of scalar registers to GDS?
I've attached a screenshot of the concept of the code that I have (sorry for blanking out the tiny part that's under NDA; I promise it doesn't matter for this question). The code may not compile; it's only meant as a conceptual explanation of what's going on. To keep this question simple, assume that there is only one wave, wave 0, running on the entire GPU. If it matters, the platform is ROCm on Linux, and the card is a Vega 64.
Surely there's a better way to do this that I'm missing. What is it? What's the best way to copy data from scalar registers to GDS?
(Thank you in advance.)