I have run into a memory problem when allocating large buffers for my kernel: beyond a certain size limit, the kernel runs extremely slowly.
I can reproduce the issue with AMD's MatrixMulDouble sample program when using matrices of size 3632 or larger. This works fine:
N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3616 -y 3616 -z 3616 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress
Executing kernel for 1 iterations
-------------------------------------------
MatrixA MatrixB Time(sec) KernelTime(sec)
3648x3616 3616x3648 1.41411 0.761971
While this one is extremely slow:
N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3632 -y 3632 -z 3632 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress
Executing kernel for 1 iterations
-------------------------------------------
MatrixA MatrixB Time(sec) KernelTime(sec)
3648x3632 3632x3648 11.441 10.7991
I have tried setting GPU_MAX_HEAP_SIZE to a value larger than 50, but it didn't change anything.
3632*3632*8 = 105531392 = 100.64MB (too large)
3616*3616*8 = 104603648 = 99.76MB (works fine)
It seems that buffers over 100 MB cause severe kernel slowdowns.
In my own kernel I upload three buffers of the size mentioned above. When it works fine, all three buffers are uploaded right after one another; with buffers that are too large, there is a 1.4-second delay between the second and the third buffer, and the kernel also takes far more time than it should.
Is this 100 MB buffer limit a known issue, and is there any workaround?
I have seen similar behavior with slightly higher values like 3800.
But try experimenting with GPU_INITIAL_HEAP_SIZE.
I ran ./MatrixMulDouble -x 4096 -y 4096 -z 4096 -q -t with these results:
MatrixA MatrixB Time(sec) KernelTime(sec)
4096x4096 4096x4096 10.498 9.506
With GPU_MAX_HEAP_SIZE set to 400 I got:
MatrixA MatrixB Time(sec) KernelTime(sec)
4096x4096 4096x4096 2.441 1.432
GPU_MAX_HEAP_SIZE has a range of 0 to 100. What would setting it to 400 mean?
And does GPU_INITIAL_HEAP_SIZE still have any influence? (If so, what do the numbers mean?)
OK, I tried with GPU_INITIAL_HEAP_SIZE set to 400, and that gave a significant speed-up. Now there are no unexplained pauses.
But what does GPU_INITIAL_HEAP_SIZE signify? The initial heap size measured in MB? We tried 100 and that didn't work well (i.e. it is not a percentage).
Originally these variables took a number in MB, so 400 means 400 MB. AMD later changed GPU_MAX_HEAP_SIZE to a percentage.
And 4096*4096*8*3 is 384 MiB, which is why 400 works.
OK. I tried setting it to 700 MB, but then the slowdown reappeared. At 600 MB it worked fine. I can see why AMD calls this unsupported, but still, some details would be nice.