Does using LDS force scatter output?

Discussion created by Ceq on Sep 1, 2009
Latest reply on Sep 2, 2009 by gaurav.garg

I've tried writing some test code that uses the LDS, as I think it is an interesting feature. However, it looks like using scatter for data output is usually slower than rewriting the algorithm to use multiple streaming kernels.

When the GroupSize attribute is used, the Brook+ compiler doesn't allow stream outputs, only scatter ones. Is this a hardware or a software limitation? Will it change in the future?
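
To make the distinction I mean concrete, here is a minimal CPU-side sketch in plain C++. It is not Brook+ code and says nothing about how the hardware behaves; the function names run_stream_style and run_scatter_style are made up for illustration. It only shows the two output models: in the stream case the write position is implied by the element index, while in the scatter case the kernel body computes the destination itself, even when that destination happens to be sequential.

```cpp
#include <cstddef>
#include <vector>

// Stream-style output (hypothetical name): the "kernel body" only produces
// a value, and the destination is implicitly the element's own index.
std::vector<float> run_stream_style(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 2.0f;  // write position fixed by the loop index
    }
    return out;
}

// Scatter-style output (hypothetical name): the "kernel body" computes an
// explicit destination index and writes through it.
std::vector<float> run_scatter_style(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::size_t dst = i;       // sequential here, but could be any index
        out[dst] = in[i] * 2.0f;   // explicit, address-computed write
    }
    return out;
}
```

Both functions return the same data for the same input; the difference is only in who decides where each result is written, which is the part I suspect matters for write performance.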

I think that even if the code scatters to sequential array positions, the write is unbuffered, so it is really slow, and if the computation is small this soon becomes a bottleneck. Also, unlike the other samples, the LDS tutorial in the Brook+ directory doesn't have a benchmark option to compare against the CPU.