OK, sorry about the "Horrible 5870 performance" but this is on the same topic...
... why is performance with a 64x1 block size so horrid?
Compute shader mode might be faster, but you really need to know how to get a perfect texture fetch pattern to make it so.
Accessing textures naively (64x1) gives HORRIBLE performance... WAY worse than pixel shader mode. And if LDS (local data share) isn't any faster... I mean, how many applications out there really need LDS?
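
To make the 64x1 vs. pixel-shader-style access pattern concrete, here is a rough CPU-side sketch, not a measurement and not a model of the real 5870 cache. It assumes the texture cache works in small 2D tiles (the 4x4 tile size is purely a made-up number for illustration) and simply counts how many tiles one 64-thread group touches. The function names are hypothetical helpers, not part of any SDK.

```python
# Hypothetical cache tile size in texels; an assumption for illustration only.
TILE_W, TILE_H = 4, 4
GROUP_SIZE = 64


def linear_coords(tid, group_id, width):
    """Naive 64x1 mapping: each thread reads the next texel along a row."""
    idx = group_id * GROUP_SIZE + tid
    return idx % width, idx // width


def blocked_coords(tid, group_id, width):
    """Remap the same 64 threads onto an 8x8 texel block."""
    blocks_per_row = width // 8
    bx, by = group_id % blocks_per_row, group_id // blocks_per_row
    return bx * 8 + tid % 8, by * 8 + tid // 8


def tiles_touched(mapping, group_id=0, width=256):
    """Count distinct (assumed) cache tiles read by one 64-thread group."""
    coords = (mapping(tid, group_id, width) for tid in range(GROUP_SIZE))
    return len({(x // TILE_W, y // TILE_H) for x, y in coords})


if __name__ == "__main__":
    print("64x1 group:", tiles_touched(linear_coords), "tiles")
    print("8x8 group: ", tiles_touched(blocked_coords), "tiles")
```

Under these assumed tile dimensions the 64x1 group spreads over 16 tiles while the 8x8 remapping stays within 4, which is one plausible reason a naive linear layout could fetch worse than pixel shader mode. Whether this is what actually happens on the hardware would need profiling.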