5 Replies Latest reply on Feb 3, 2011 1:38 PM by galmok

    Extreme kernel slowdown with large allocations

    galmok

      I have been running into a memory problem when allocating large buffers for my kernel: past a certain size, the kernel runs extremely slowly.

      I can reproduce this issue using AMD's MatrixMulDouble sample program when using matrices of size 3632 or larger. This will work fine:

      N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3616 -y 3616 -z 3616 -q -t
      Platform :Advanced Micro Devices, Inc.
      Device 0 : Cypress

      Executing kernel for 1 iterations
      -------------------------------------------
      MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
      3648x3616                3616x3648                1.41411                  0.761971

      While this causes extreme time consumption:

      N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3632 -y 3632 -z 3632 -q -t
      Platform :Advanced Micro Devices, Inc.
      Device 0 : Cypress

      Executing kernel for 1 iterations
      -------------------------------------------
      MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
      3648x3632                3632x3648                11.441                   10.7991


      I have tried setting GPU_MAX_HEAP_SIZE to a value larger than 50, but this didn't change anything.

      3632*3632*8 = 105531392 = 100.64MB (too large)

      3616*3616*8 = 104603648 = 99.76MB (works fine)

      It seems buffers over 100MB cause severe kernel slowdowns.

      In my own kernel I upload 3 buffers of the size mentioned above. When it works fine, all 3 buffers are uploaded right after each other; with buffers that are too large, there is a 1.4 second delay between the second and third buffer, and the kernel also takes far more time than it should.

      Is this 100MB buffer limit a known issue, and is there any work-around?

        • Extreme kernel slowdown
          nou

          I have seen similar behavior with slightly higher values, like 3800.

          But I tried experimenting with GPU_INITIAL_HEAP_SIZE.

          I ran ./MatrixMulDouble -x 4096 -y 4096 -z 4096 -q -t with these results:

          MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)         
          4096x4096                4096x4096                10.498                   9.506

          With GPU_MAX_HEAP_SIZE set to 400 I got:

          MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)         
          4096x4096                4096x4096                2.441                    1.432