Let's assume I have a thread group of size 64, one wavefront. That group uses the entire LDS by means of
Is it correct that I cannot issue a transaction to external memory, such as a read, without stalling execution on that SIMD unit until that read returns? In other words, the wavefront cannot be descheduled because it occupies the entire LDS?
If so, is there a way for my kernel to tell the chip that it no longer needs the LDS, so that in the above scenario other threads can run?
Memory access latency is hidden by finding another hardware thread for the ALUs to work on. You have declared that you only have a single hardware thread for the ALUs to work on. So you will get no latency hiding.
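To make the arithmetic concrete, here is a rough sketch of how LDS usage caps the number of resident wavefronts, plus a toy model of how busy the ALUs stay while waves wait on memory. The limits below (64 KiB LDS per CU, 4 SIMDs per CU, 10 wave slots per SIMD, 64-wide waves) are assumptions modeled on GCN-class hardware. Other limits that real hardware also enforces (registers, max workgroups per CU) are ignored, and the utilization formula is a simplification, not a vendor formula.

```python
# Assumed GCN-like limits; check your GPU's documentation for real values.
LDS_BYTES_PER_CU = 64 * 1024
SIMDS_PER_CU = 4
MAX_WAVES_PER_SIMD = 10
WAVE_SIZE = 64


def resident_waves_per_cu(group_size: int, lds_bytes_per_group: int) -> int:
    """Upper bound on wavefronts resident on one CU, considering only
    LDS capacity and wave slots (registers etc. are ignored)."""
    waves_per_group = -(-group_size // WAVE_SIZE)  # ceiling division
    max_waves = SIMDS_PER_CU * MAX_WAVES_PER_SIMD
    if lds_bytes_per_group > LDS_BYTES_PER_CU:
        return 0  # the group cannot be launched at all
    if lds_bytes_per_group == 0:
        groups_by_lds = max_waves  # LDS is not the limiting resource
    else:
        groups_by_lds = LDS_BYTES_PER_CU // lds_bytes_per_group
    groups_by_slots = max_waves // waves_per_group
    return min(groups_by_lds, groups_by_slots) * waves_per_group


def alu_busy_fraction(waves: int, compute_cycles: int, latency_cycles: int) -> float:
    """Toy model: each wave does compute_cycles of ALU work, then waits
    latency_cycles for memory. While one wave waits, the others can use
    the ALUs, so utilization grows with the number of waves until it saturates."""
    return min(1.0, waves * compute_cycles / (compute_cycles + latency_cycles))


if __name__ == "__main__":
    # One 64-thread group that uses all of the LDS: a single resident wave.
    n = resident_waves_per_cu(64, 64 * 1024)
    print(n, alu_busy_fraction(n, 40, 400))
    # The same group using a quarter of the LDS leaves room for four waves.
    n = resident_waves_per_cu(64, 16 * 1024)
    print(n, alu_busy_fraction(n, 40, 400))
```

With a single resident wave the model gives only about 9% ALU utilization for a 400-cycle load, because there is nothing else to switch to. Shrinking the LDS footprint is what lets more waves become resident and hide that latency.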
It's not possible to define a kernel so that it "releases" its LDS allocation. You need to end the kernel and start a new one with no LDS allocation, which means the first kernel has to write any data the second kernel needs out to global memory so that the second kernel can see it.