In OpenCL, local memory (the LDS) had to be emulated in global memory on 4xxx devices.
On 5xxx devices, the LDS was redesigned to conform to the properties required by the OpenCL specification, so kernels can gain performance by using it.
One of the improvements in Cayman is more efficient movement of data from memory into the LDS. On Cypress, moving data into the LDS takes a memory instruction and an ALU instruction: data must first be loaded from memory into the register file and then moved from the register file into the LDS. Cayman can fetch directly from memory into the LDS, eliminating the ALU instruction altogether.
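To make the data path concrete, here is a minimal sketch in plain C that serially emulates one OpenCL work-group staging data in local memory. It is a CPU emulation, not an OpenCL kernel: `workgroup_sum` and `WG_SIZE` are hypothetical names chosen for illustration, the `tile` array stands in for a `__local` buffer held in the LDS, and the split into a register load followed by a store to the tile mirrors the Cypress path described above. A real kernel would run the loop iterations as parallel work-items separated by `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <stddef.h>

#define WG_SIZE 8 /* hypothetical work-group size for this sketch */

/* Serially emulates one OpenCL work-group that stages a tile of global
 * memory in local memory (LDS) and then reduces it cooperatively. */
float workgroup_sum(const float *global_in, size_t group_id)
{
    float tile[WG_SIZE]; /* stands in for a __local array held in LDS */

    /* Phase 1: every work-item copies one element into the tile.
     * The two statements mirror the Cypress path: a memory fetch into a
     * register, then an ALU instruction that writes the register to LDS.
     * Per the text, Cayman can fetch straight into LDS, dropping the
     * second step. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        size_t gid = group_id * WG_SIZE + lid;
        float reg = global_in[gid]; /* fetch: global memory -> register */
        tile[lid] = reg;            /* ALU move: register -> LDS */
    }
    /* A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here; the serial
     * loop above already finishes every write before phase 2 starts. */

    /* Phase 2: tree reduction in which work-items read each other's
     * tile entries, the data sharing that makes staging in LDS pay off. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            tile[lid] += tile[lid + stride];
        }
        /* barrier(CLK_LOCAL_MEM_FENCE) would go here in a real kernel */
    }
    return tile[0];
}
```

For example, with `global_in` holding 1.0 through 16.0, group 0 sums the first eight values to 36 and group 1 sums the next eight to 100. The serial inner loop in phase 2 is safe because each round only reads entries at or above `stride`, which that round never writes.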