
Extreme kernel slowdown with large allocations

Discussion created by galmok on Feb 3, 2011
Latest reply on Feb 3, 2011 by galmok

I have run into a problem when allocating large buffers for my kernel: beyond a certain buffer size, the kernel runs extremely slowly.

I can reproduce this issue using AMD's MatrixMulDouble sample program when using matrices of size 3632 or larger. This will work fine:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3616 -y 3616 -z 3616 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
3648x3616                3616x3648                1.41411                  0.761971

This run, however, takes far longer:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3632 -y 3632 -z 3632 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
3648x3632                3632x3648                11.441                   10.7991

 

I have tried setting GPU_MAX_HEAP_SIZE to a value larger than 50, but this did not change anything.

3632*3632*8 = 105531392 = 100.64MB (too large)

3616*3616*8 = 104603648 = 99.76MB (works fine)
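The arithmetic above can be reproduced with a short Python sketch. It assumes one N x N matrix of 8-byte doubles per buffer and that "MB" here means 1024 * 1024 bytes, which is what the figures above imply. The helper names `buffer_bytes` and `largest_n_under` are made up for illustration and are not part of the sample program. The 100 MB threshold is only my guess from the two runs, not a documented limit.

```python
import math

BYTES_PER_DOUBLE = 8
MIB = 1024 * 1024  # the "MB" figures in this post are 1024 * 1024 bytes


def buffer_bytes(n: int) -> int:
    """Size in bytes of one n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def largest_n_under(limit_bytes: int) -> int:
    """Largest n whose n x n double matrix still fits within limit_bytes."""
    return math.isqrt(limit_bytes // BYTES_PER_DOUBLE)


for n in (3616, 3632):
    size = buffer_bytes(n)
    print(f"N={n}: {size} bytes = {size / MIB:.2f} MB")

# If the threshold really were exactly 100 MB (an assumption), this is
# the largest square matrix size that would stay below it.
print("largest N under 100 MB:", largest_n_under(100 * MIB))
```

If the limit really sits at 100 MB, sizes between 3616 and 3632 would be worth testing to narrow down where the slowdown starts.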

It seems that buffers over 100MB cause severe kernel slowdowns.

In my own kernel I upload three buffers of the size mentioned above. When everything works, all three uploads run back to back. With buffers that are too large, there is a 1.4 second delay between the second and third upload, and the kernel also takes far longer than it should.

Is this 100MB buffer limit a known issue, and is there a workaround?
