Archives Discussions

Ceq · ‎09-01-2009

I've tried to write some test code using the LDS, as I think it is an interesting feature. However, it looks like using scatter for data output is usually slower than rewriting the algorithm to use multiple streaming kernels.

When the GroupSize attribute is used, the Brook+ compiler doesn't allow stream outputs, only scatter ones. Is this a hardware or a software limitation? Will it change in the future?

I think that even if the code scatters to sequential array positions, the write is unbuffered, so it is really slow, and if the computation is small this soon becomes a bottleneck. Also, unlike the other samples, the LDS tutorial in the Brook+ directory doesn't have a benchmark option to compare with the CPU.

gaurav_garg · ‎09-02-2009

This is a hardware limitation. LDS can be used only in compute shader mode, which only allows scatter streams.

The Brook+ implementation of scatter copies data from linear memory to tiled memory if you use a 2D scatter stream. You can avoid this copy by using 1D scatter streams.
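
To make the linear-to-tiled copy more concrete, here is a rough sketch of what such a conversion involves. It is not the Brook+ runtime code and not the actual hardware tiling format: the 4x4 tile size, the tile ordering, and the names `TILE`, `tiled_index` and `linear_to_tiled` are all assumptions chosen for illustration. It only shows that every element of a 2D buffer has to be moved to a new offset, which is extra work a 1D stream would not need.

```python
# Illustrative only: TILE and the tile ordering are assumptions,
# not the real hardware tiling format or the Brook+ implementation.
TILE = 4  # hypothetical tile edge, in elements


def tiled_index(x, y, width, tile=TILE):
    """Map (x, y) of a row-major 2D array to an offset in a tiled layout.

    Tiles are stored one after another in row-major order, and the
    elements inside each tile are also row-major. `width` must be a
    multiple of `tile`.
    """
    tiles_per_row = width // tile
    tile_x, in_x = divmod(x, tile)
    tile_y, in_y = divmod(y, tile)
    tile_id = tile_y * tiles_per_row + tile_x
    return tile_id * tile * tile + in_y * tile + in_x


def linear_to_tiled(linear, width, height, tile=TILE):
    """Copy a row-major buffer into a new buffer with the tiled layout."""
    if width % tile or height % tile:
        raise ValueError("width and height must be multiples of the tile size")
    tiled = [None] * (width * height)
    for y in range(height):
        for x in range(width):
            tiled[tiled_index(x, y, width, tile)] = linear[y * width + x]
    return tiled
```

For example, `linear_to_tiled(list(range(32)), 8, 4)` places the first 4x4 block of the 8x4 array in the first 16 slots of the result, followed by the second block.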

Does LDS force the use of scatter output?