None of the 68 people who have viewed this thread know? Bummer.
BTW, I found the 5870 docs that talk about the LDS. It appears to work the same as on the 3870, at least for calculating latency as described in the Stream Programming Guide (not the OpenCL Programming Guide). Is that right?
In OpenCL, the LDS had to be emulated using global memory on 4xxx devices.
In 5xxx devices, the LDS was changed to conform to the properties required by the OpenCL spec, so using the LDS now gives a performance boost.
One of the improvements in Cayman is more efficiently moving data into the LDS from memory. In Cypress, moving data into the LDS takes a memory instruction and an ALU instruction. Data must first be loaded from memory into the register files and then subsequently moved from the register files into the LDS. Cayman can directly fetch from memory into the LDS, eliminating the ALU instruction altogether.
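For anyone wondering what "moving data into the LDS" looks like at the source level, here is a rough sketch rather than anything from the AMD docs. The kernel name `reverse_tiles`, the host helper `reverse_tiles_reference`, and the tile-reversal task itself are made up for illustration. As far as I understand, the OpenCL source is the same for Cypress and Cayman; the difference is only in what the hardware does for the `tile[lid] = in[gid]` line (a fetch into registers plus an ALU write to the LDS on Cypress, versus a direct fetch into the LDS on Cayman).

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C kernel source, kept as a string the way a host program would
 * pass it to clCreateProgramWithSource. The kernel name is hypothetical.
 * Each work-item stages one element from global memory into local memory
 * (the LDS), waits at a barrier, then reads an element that a different
 * work-item in the same work-group staged. */
const char *kernel_src =
    "__kernel void reverse_tiles(__global const float *in,\n"
    "                            __global float *out,\n"
    "                            __local float *tile)\n"
    "{\n"
    "    size_t lid = get_local_id(0);\n"
    "    size_t gid = get_global_id(0);\n"
    "    size_t lsz = get_local_size(0);\n"
    "    tile[lid] = in[gid];            /* global memory -> LDS */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);   /* whole tile is now visible */\n"
    "    out[gid] = tile[lsz - 1 - lid]; /* read a neighbour's element */\n"
    "}\n";

/* Host-side reference of what the kernel computes, useful for checking
 * device results. tile_size plays the role of the work-group size, and
 * n must be a multiple of it (an assumption of this sketch). */
void reverse_tiles_reference(const float *in, float *out,
                             size_t n, size_t tile_size)
{
    assert(tile_size > 0 && n % tile_size == 0);
    for (size_t base = 0; base < n; base += tile_size) {   /* one work-group */
        for (size_t lid = 0; lid < tile_size; ++lid) {      /* one work-item */
            out[base + lid] = in[base + (tile_size - 1 - lid)];
        }
    }
}
```

On the host, the `__local` argument gets its size from `clSetKernelArg` with a NULL pointer (for example `clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL)`), and the output can then be compared against `reverse_tiles_reference`.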