2 Replies Latest reply on Mar 19, 2017 9:14 AM by boxerab

    Order of buffer allocation affects performance

    boxerab

      I don't know if anyone else has experienced this, but I just found that the order in which I allocate
      OpenCL buffers has a dramatic effect on performance. I have one group of buffers that holds uncompressed data
      and another group that holds compressed data. If I allocate all of the uncompressed buffers first and then
      all of the compressed buffers, performance is dramatically higher than if I interleave the allocations, i.e.
      allocate one uncompressed buffer, then one compressed, then one uncompressed, and so on.
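To make the two allocation orders concrete, here is a minimal sketch in C. It is not the poster's actual code: the `alloc_fn` callback, `order_log`, and `record_alloc` are hypothetical names introduced for illustration. In a real host program the callback would wrap `clCreateBuffer`; here it only records the call order, so the ordering logic can be checked without an OpenCL device.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BUF_UNCOMPRESSED, BUF_COMPRESSED } buf_kind;

/* Hypothetical allocation hook. In a real host program this would wrap
 * clCreateBuffer(); it is a callback here so the ordering can be tested
 * without an OpenCL device. */
typedef void *(*alloc_fn)(buf_kind kind, size_t index, void *user);

/* Grouped order: all uncompressed buffers first, then all compressed ones. */
static void alloc_grouped(size_t n, alloc_fn alloc, void *user,
                          void **uncompressed, void **compressed)
{
    assert(alloc != NULL);
    for (size_t i = 0; i < n; ++i)
        uncompressed[i] = alloc(BUF_UNCOMPRESSED, i, user);
    for (size_t i = 0; i < n; ++i)
        compressed[i] = alloc(BUF_COMPRESSED, i, user);
}

/* Interleaved order: uncompressed, compressed, uncompressed, ... */
static void alloc_interleaved(size_t n, alloc_fn alloc, void *user,
                              void **uncompressed, void **compressed)
{
    assert(alloc != NULL);
    for (size_t i = 0; i < n; ++i) {
        uncompressed[i] = alloc(BUF_UNCOMPRESSED, i, user);
        compressed[i]   = alloc(BUF_COMPRESSED, i, user);
    }
}

/* Stand-in allocator (assumption, not an OpenCL API): records the call
 * order as a string of 'U' and 'C' characters. */
typedef struct {
    char   order[64];
    size_t count;
} order_log;

static void *record_alloc(buf_kind kind, size_t index, void *user)
{
    order_log *log = (order_log *)user;
    (void)index;
    assert(log->count + 1 < sizeof log->order);
    log->order[log->count++] = (kind == BUF_UNCOMPRESSED) ? 'U' : 'C';
    log->order[log->count] = '\0';
    return log; /* dummy non-NULL handle */
}
```

With three buffer pairs, the grouped variant issues the allocations as `UUUCCC` and the interleaved variant as `UCUCUC`. Whether and how that order changes physical placement is up to the driver, which this sketch does not model.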

       

      This is on the Ellesmere architecture. I saw a similar issue on Cape Verde, where performance
      went way up if I allocated a small dummy buffer.

       

      In the Cape Verde case, I was told this was related to the memory channels on the card. I suppose the same holds
      for Ellesmere: if an uncompressed buffer and its corresponding compressed buffer are assigned to the same channel,
      performance is better. My app is very memory-intensive.
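As a rough illustration of why a small dummy buffer could change which channel a later buffer starts on, here is a sketch in C under loudly stated assumptions. The mapping (fixed-size blocks rotating across channels), the 256-byte interleave and 8-channel figures used in the usage note, and the back-to-back layout are all hypothetical; the real Cape Verde and Ellesmere mappings are not given in this thread and may hash more address bits.

```c
#include <stdint.h>

/* Hypothetical address-to-channel mapping: consecutive blocks of
 * `interleave_bytes` rotate across `num_channels` channels. Real hardware
 * may use a different scheme; these parameters are assumptions. */
static unsigned channel_of(uint64_t addr, uint64_t interleave_bytes,
                           unsigned num_channels)
{
    return (unsigned)((addr / interleave_bytes) % num_channels);
}

/* Base address of a buffer placed right after `dummy_bytes` of padding,
 * assuming (hypothetically) that allocations are laid out back to back. */
static uint64_t base_after_dummy(uint64_t base, uint64_t dummy_bytes)
{
    return base + dummy_bytes;
}
```

Under these assumptions, with a 256-byte interleave and 8 channels, a buffer at address 0 starts on channel 0, while the same buffer placed after a 256-byte dummy allocation starts on channel 1. That is the kind of shift that could move two heavily used buffers onto, or off, the same channel.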

        • Re: Order of buffer allocation affects performance
          boxerab

          It would be nice to have a deep dive into how the memory controllers are designed on Polaris. It would also be nice
          to be able to give the compiler hints to place certain buffers on the same memory controller.

           

          For my app, when I compress an image, a number of buffers are assigned to that image, and having these buffers
          assigned to the same controller (if that is what is going on) seems to give a huge performance boost
          (around 100% faster).

            • Re: Order of buffer allocation affects performance
              boxerab

              Any advice on how buffer allocation order can affect performance?

               

              I also read in the AMD OpenCL best practices guide that there are two DMA engines on the cards, and that command
              queues are assigned to one engine or the other based on the order in which they are created: the first queue
              goes to engine 1, the second queue to engine 2, the third queue to engine 1, and so on.
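The alternating assignment described above can be written as a one-line rule. This is only a sketch of the scheme as summarized in this post, not driver code: `dma_engine_for_queue` and `NUM_DMA_ENGINES` are hypothetical names, and the actual runtime behavior may differ.

```c
#include <stddef.h>

enum { NUM_DMA_ENGINES = 2 }; /* per the post's reading of the guide */

/* Engine number (1-based) for the queue created at 0-based position
 * `creation_index`, under the alternating scheme described in the post. */
static unsigned dma_engine_for_queue(size_t creation_index)
{
    return (unsigned)(creation_index % NUM_DMA_ENGINES) + 1u;
}
```

Under this rule the first, third, and fifth queues share engine 1, so an application that wants two transfer queues on different engines would create them back to back.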

               

              I suppose the same logic applies to the memory controllers on the card?