
Is there an elegant way to force recalculation (of values or addresses)?

Question asked by lolliedieb on May 23, 2020
Latest reply on Jun 1, 2020 by dipak

Well, the title already sums up the question.

I have a rather simple kernel that uses 20 VGPRs and the complete 32 kByte of shared memory. That is fine for running 2x 1024 threads (1024 per work group). So far, so good.

But as soon as I later read from the same addresses again, for a 2nd pass over my data structure, I can observe VGPR usage going up. The reason is not that data is stored in those registers, but that the addresses to load from are cached there.

Furthermore, I tried to extend my shared memory by parking some of the data in registers. But then, when swapping data between VGPRs and shared memory multiple times, the addresses used get cached in VGPRs too. Both of these effects cause my VGPR count to explode far beyond the 32 registers I can use without losing occupancy.
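For reference, here is a rough sketch of where that 32-register limit comes from. It assumes the usual GCN figures for Polaris and Vega (64-wide wavefronts, 4 SIMDs per compute unit, 256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, VGPRs allocated in blocks of 4, 64 KiB of LDS per compute unit). The constant and function names are only for illustration and do not come from any AMD tool.

```c
#include <assert.h>

/* GCN (Polaris/Vega) per-compute-unit limits, stated here as assumptions. */
enum {
    WAVE_SIZE          = 64,    /* threads per wavefront */
    SIMDS_PER_CU       = 4,     /* SIMD units per compute unit */
    VGPRS_PER_SIMD     = 256,   /* VGPRs available to each lane of a SIMD */
    MAX_WAVES_PER_SIMD = 10,    /* hardware cap on resident wavefronts */
    VGPR_GRANULARITY   = 4,     /* VGPRs are allocated in blocks of 4 */
    LDS_BYTES_PER_CU   = 65536  /* 64 KiB of local (shared) memory */
};

/* Wavefronts each SIMD must hold so that `groups` work groups of
 * `threads_per_group` threads are resident on one compute unit. */
int waves_per_simd(int groups, int threads_per_group)
{
    assert(groups > 0 && threads_per_group > 0);
    int waves_per_group = (threads_per_group + WAVE_SIZE - 1) / WAVE_SIZE;
    int waves_per_cu = groups * waves_per_group;
    return (waves_per_cu + SIMDS_PER_CU - 1) / SIMDS_PER_CU;
}

/* Largest per-thread VGPR count that still allows that residency,
 * or 0 if the wavefront cap alone rules it out. */
int max_vgprs_per_thread(int groups, int threads_per_group)
{
    int waves = waves_per_simd(groups, threads_per_group);
    if (waves > MAX_WAVES_PER_SIMD)
        return 0;
    int budget = VGPRS_PER_SIMD / waves;
    return budget - budget % VGPR_GRANULARITY;
}

/* How many work groups fit per compute unit given their LDS usage. */
int groups_limited_by_lds(int lds_bytes_per_group)
{
    assert(lds_bytes_per_group > 0);
    return LDS_BYTES_PER_CU / lds_bytes_per_group;
}
```

Under these assumptions, `max_vgprs_per_thread(2, 1024)` gives 32 and `groups_limited_by_lds(32768)` gives 2, so the kernel at 20 VGPRs keeps two work groups resident, while anything above 32 VGPRs leaves room for only one.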

Now my question is: is there a way to prevent the compiler from storing these addresses? The calculation for them is a very simple formula, done with just a few integer ops depending on get_group_id(0) and get_local_id(0), and should be cheap enough that recomputing it hurts far less than the lower occupancy. Sadly, I cannot use inline assembly, since I need to be able to compile this code with the just-in-time compiler on the PAL platform and on Windows for the Polaris and Vega architectures, and those drivers still fail to provide this feature, which ROCm has had for some time now.

Thanks
