Archives Discussions

ekondis · 11-21-2015

As far as I knew, each CU in GCN GPUs has 64KB of LDS. However, the respective ISA manuals state a size of 32KB per CU (e.g. section 2.3.1, page 2-4 of AMD_Sea_Islands_Instruction_Set_Architecture1.pdf). What is the actual size? Why is there such confusion?


realhet · 11-21-2015

Hi,

32KB is the maximum that a single workgroup can allocate. If at least 2 workgroups are queued on a CU, then all 64KB of physical LDS can be used.
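A minimal sketch of that arithmetic, assuming LDS is the only resource limiting how many workgroups fit on a CU (wavefront slots, VGPRs, SGPRs and barriers are ignored) and assuming a 256-byte allocation granularity (the 64-DWORD cell size mentioned for LDS_ALLOC in this thread). The function name `workgroups_per_cu_by_lds` is made up for illustration.

```python
# Hypothetical occupancy model: how many workgroups fit on one GCN CU
# when LDS is the only limiting resource (all other limits ignored).
LDS_PER_CU = 64 * 1024              # physical LDS per CU
MAX_LDS_PER_WORKGROUP = 32 * 1024   # largest allocation a single workgroup can make
ALLOC_GRANULE = 64 * 4              # assumption: 64-DWORD (256-byte) allocation units


def workgroups_per_cu_by_lds(lds_bytes_per_workgroup: int) -> int:
    """Return how many workgroups can hold their LDS allocation at once on one CU."""
    if not 0 < lds_bytes_per_workgroup <= MAX_LDS_PER_WORKGROUP:
        raise ValueError("LDS request must be between 1 byte and 32KB")
    # Round the request up to whole allocation units (ceiling division).
    units = -(-lds_bytes_per_workgroup // ALLOC_GRANULE)
    allocated = units * ALLOC_GRANULE
    return LDS_PER_CU // allocated


for request in (32 * 1024, 16 * 1024, 20000):
    print(request, "bytes/workgroup ->", workgroups_per_cu_by_lds(request), "workgroups/CU")
```

With a 32KB request only two workgroups fit, so one workgroup alone can never use more than half of the physical LDS; it takes a second workgroup on the same CU to put the other 32KB to use.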

ekondis · 11-21-2015

That is what I was aware of, but the ISA reference states: "Each compute unit has a 32 kB memory space that enables low-latency communication between work-items within a work-group, or the work-items within a wavefront; this is the local data share (LDS)." (section 2.3.1 of the Sea Islands ISA reference).

The same fact is depicted in Figure 2.1, where each CU is illustrated with a labeled 32KB LDS. The manual doesn't seem to note anywhere that the physical size is 64KB.

realhet · 11-21-2015

Evergreen had 32KB. I guess they just copied the manual and didn't check.

You can see it in all the presentations and diagrams advertising the GCN architecture; they all say 64KB. For example, the GCN white paper states: "The LDS capacity in GCN has doubled to 64KB with 16 or 32 banks (depending on the product)."

Or, if you're still not sure, check Table 5.10 (LDS_ALLOC) in the ISA manual. This is new information, not copied from the older manual. The LDS base offset is 8 bits and addresses 64-DWORD cells, so it covers 256 × 64 × 4 bytes = 64KB in total. However, according to that manual the LDS size field has 9 bits (which would mean 128KB); that is wrong, because it has only 7 bits.
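The field-width arithmetic in this post can be checked with a few lines of Python. This is a sketch assuming both fields count 64-DWORD (256-byte) cells, as the post describes for the base offset; the 7-bit size width is the poster's reading, not something the manual states.

```python
# Worked arithmetic for the LDS_ALLOC field widths discussed above.
# Assumption: both fields count 64-DWORD cells, i.e. 64 * 4 = 256 bytes per unit.
BYTES_PER_UNIT = 64 * 4


def span_bytes(field_bits: int) -> int:
    """Bytes spanned by a field with 2**field_bits values, each one 256-byte unit."""
    return (1 << field_bits) * BYTES_PER_UNIT


for label, bits in (("base offset", 8), ("size, as documented", 9), ("size, 7 bits", 7)):
    print(f"{label}: {bits} bits -> {span_bytes(bits) // 1024} KB")
# base offset: 8 bits -> 64 KB
# size, as documented: 9 bits -> 128 KB
# size, 7 bits: 7 bits -> 32 KB
```

The 7-bit width spans exactly 32KB, which lines up with the per-workgroup limit mentioned earlier in the thread.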

ekondis · 11-22-2015

AMD should put more effort into keeping their documentation up to date.

Thanks for your clarification.

realhet · 11-23-2015

Yeah, low-level documentation is just not 100% correct. Not many of us read it, but I'm happy that it exists at all.

There is also an lds_direct input operand that GCN instructions can use. (I've never used it, though.)

It takes a 16-bit byte offset from the low half of the M0 register, so it might be able to address all 64KB.

But the LDS hardware has address calculation logic that doesn't allow you to read or write outside your LDS area. (If you use 128-bit resource constants properly, there is similar 'protection' for GPU memory as well.)

So I think every wavefront queued on a CU could, in principle, access any location in the LDS; it's only the driver that restricts the area those wavefronts can access, and there's nothing you can do about it.
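A toy Python model of the two points above, under stated assumptions: the offset is the low 16 bits of M0 (as described in the post), and the per-workgroup restriction is modeled as a simple [base, base + size) window check. `LdsWindow`, `physical_address` and `lds_direct_offset` are hypothetical names, and what the real hardware does on an out-of-range access is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

LDS_BYTES = 64 * 1024  # physical LDS per CU


def lds_direct_offset(m0: int) -> int:
    """Byte offset taken from the low 16 bits of M0, as described above."""
    return m0 & 0xFFFF


@dataclass(frozen=True)
class LdsWindow:
    """Hypothetical per-workgroup LDS window, set up outside the kernel's control."""
    base: int  # physical start byte of the allocation
    size: int  # allocated bytes

    def physical_address(self, offset: int) -> Optional[int]:
        """Translate a workgroup-relative byte offset, or return None if out of range."""
        if 0 <= offset < self.size:
            return self.base + offset
        return None


# A 16-bit offset can name every byte of a 64KB LDS (0 .. 0xFFFF) ...
print(lds_direct_offset(0xFFFFFFFF) == LDS_BYTES - 1)  # True
# ... but with two 32KB workgroups, each one only reaches its own half.
wg0 = LdsWindow(base=0, size=32 * 1024)
wg1 = LdsWindow(base=32 * 1024, size=32 * 1024)
print(wg1.physical_address(16))     # 32784
print(wg0.physical_address(40000))  # None: outside wg0's window
```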

(By the way, there are more interesting things about the LDS than its size. For example, if you have an 8KB lookup table that is accessed all the time, you should serve 75% of the lookups from the LDS and 25% from RAM (the L1 cache); that gives optimum performance on GCN.)
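One way to picture the 75%/25% split is a fixed 3:1 routing rule. This Python sketch is illustrative only: the rule "every 4th lookup goes to memory" is an assumption, not realhet's stated method, and in a real kernel `lookup_id` would more likely be something like a lane index. One plausible (unconfirmed) reason for splitting at all is that the LDS path and the L1 cache path can serve requests in parallel instead of one of them becoming the bottleneck.

```python
# Hypothetical sketch of a 75% LDS / 25% memory split for a hot lookup table.
# The routing rule (lookup_id % 4 == 3 goes to memory) is an illustrative assumption.
from collections import Counter

LDS_SHARE_NUM, LDS_SHARE_DEN = 3, 4  # 3 out of every 4 lookups served from LDS


def source_for_lookup(lookup_id: int) -> str:
    """Route lookups 3:1 between the LDS copy and the global-memory (L1-cached) copy."""
    return "lds" if lookup_id % LDS_SHARE_DEN < LDS_SHARE_NUM else "global"


def split_counts(n_lookups: int) -> Counter:
    """Count how many of the first n_lookups lookups go to each source."""
    return Counter(source_for_lookup(i) for i in range(n_lookups))


print(split_counts(1000))  # Counter({'lds': 750, 'global': 250})
```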

tzachi_cohen · 11-24-2015

Each compute unit has 64KB of LDS; however, at most 32KB can be allocated to a single wavefront (or workgroup, for that matter).

That is, you can utilize the full 64KB of LDS as long as at least two wavefronts are executing concurrently on the same CU.

Tzachi


GCN LDS size