1 Reply Latest reply on Sep 2, 2009 4:42 AM by gaurav.garg

    Does LDS force to use scatter output?

    Ceq

      I've tried writing some test code using the LDS, as I think it is an interesting feature. However, it looks like using scatter for data output is usually slower than rewriting the algorithm to use multiple streaming kernels.

      When the GroupSize attribute is used, the Brook+ compiler doesn't allow stream outputs, only scatter ones. Is this a hardware or a software limitation? Will it change in the future?

      I think that even if the code scatters to sequential array positions, the write is unbuffered, so it is really slow, and if the computation is small it soon becomes a bottleneck. Also, unlike the other samples, the LDS tutorial in the Brook+ directory doesn't have a benchmark option to compare against the CPU.

        • Does LDS force to use scatter output?
          gaurav.garg

          This is a hardware limitation. LDS can be used only in compute shader mode, which allows only scatter streams.

          The Brook+ implementation of scatter copies data from linear memory to tiled memory when you use a 2D scatter stream. You can avoid this copy by using 1D scatter streams.
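
          If the data is logically 2D, moving to a 1D scatter stream mostly means flattening the 2D coordinate into a linear index yourself. The following is only a CPU-side sketch in plain C, not Brook+ kernel code: flat_index and scatter_rows_1d are hypothetical helper names, row-major layout is an assumption, and the doubling is a placeholder for the real computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a 2D coordinate (x, y) in a row-major
   domain of the given width to a linear index in a 1D buffer. */
static size_t flat_index(size_t x, size_t y, size_t width)
{
    assert(x < width);  /* x must lie inside the row */
    return y * width + x;
}

/* CPU stand-in for a scatter write: every (x, y) position writes its
   result to the flattened position in a 1D output array, which is how
   a kernel would address a 1D scatter stream instead of a 2D one. */
static void scatter_rows_1d(const float *in, float *out,
                            size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t i = flat_index(x, y, width);
            out[i] = 2.0f * in[i];  /* placeholder computation */
        }
    }
}
```

          In an actual kernel the same arithmetic would be applied to whatever thread or element index the compute shader mode provides, so that the output can be declared as a 1D scatter stream.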