Archives Discussions

galmok · ‎02-03-2011

Extreme kernel slowdown with large allocations

I have run into a problem when allocating large buffers for my kernel: above a certain buffer size, the kernel runs extremely slowly.

I can reproduce this issue using AMD's MatrixMulDouble sample program when using matrices of size 3632 or larger. This will work fine:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3616 -y 3616 -z 3616 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA MatrixB Time(sec) KernelTime(sec)
3648x3616 3616x3648 1.41411 0.761971

While this takes roughly eight times as long in total, and about 14 times as long in the kernel:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3632 -y 3632 -z 3632 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA MatrixB Time(sec) KernelTime(sec)
3648x3632 3632x3648 11.441 10.7991

I have tried to set GPU_MAX_HEAP_SIZE to a value larger than 50 but this didn't change anything.

3632*3632*8 = 105531392 = 100.64MB (too large)

3616*3616*8 = 104603648 = 99.76MB (works fine)
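The arithmetic above can be reproduced with a short script. This is only a sketch: the helper names `matrix_bytes` and `to_mib` are made up for illustration, and it uses the requested dimensions, as the post does, ignoring any padding the sample applies (the output reports 3648 for one dimension).

```python
BYTES_PER_DOUBLE = 8      # size of one double-precision element
MIB = 1024 * 1024         # the "MB" figures in the post are binary megabytes


def matrix_bytes(n, elem_size=BYTES_PER_DOUBLE):
    """Size in bytes of one dense n x n matrix (hypothetical helper)."""
    return n * n * elem_size


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / MIB


for n in (3616, 3632):
    size = matrix_bytes(n)
    print(f"{n}: {size} bytes = {to_mib(size):.2f} MiB")
```

The two sizes land just below and just above 100 MiB, which is where the slowdown starts in the runs above.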

It seems buffers over 100MB cause severe kernel slowdowns.

In my own kernel I upload 3 buffers the same size as mentioned and when it works fine, all 3 buffers are uploaded right after each other. With buffers that are too large, there is a 1.4 second delay between the second and third buffer. The kernel also uses way more time than it should.

Is this 100MB buffer limit a known issue, and is there any workaround?

nou · ‎02-03-2011

I have seen similar behavior with slightly higher values, like 3800.

I tried experimenting with GPU_INITIAL_HEAP_SIZE, though.

I ran ./MatrixMulDouble -x 4096 -y 4096 -z 4096 -q -t with these results:

MatrixA MatrixB Time(sec) KernelTime(sec)
4096x4096 4096x4096 10.498 9.506

With GPU_MAX_HEAP_SIZE set to 400 I got:

MatrixA MatrixB Time(sec) KernelTime(sec)
4096x4096 4096x4096 2.441 1.432

galmok · ‎02-03-2011

GPU_MAX_HEAP_SIZE has a range of 0 to 100. What would setting it to 400 mean?

And does GPU_INITIAL_HEAP_SIZE have any influence anymore? (If so, what does the number mean?)

galmok · ‎02-03-2011

Ok, I tried with GPU_INITIAL_HEAP_SIZE set to 400 and that caused a significant speed-up. Now there are no unexplained pauses.

But what does GPU_INITIAL_HEAP_SIZE signify? Initial heap size measured in MB? We tried with 100 and that didn't work well (so the value is apparently not a percentage, i.e. 100 does not mean 100%).

nou · ‎02-03-2011

Originally these variables took a number in MB, so 400 means 400MB. Later AMD changed GPU_MAX_HEAP_SIZE to a percentage.

And 4096*4096*8*3 bytes is 384MiB, so that's why 400.
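A minimal sketch of that reasoning: compute the working set for three 4096x4096 double matrices, round it up, and build an environment with GPU_INITIAL_HEAP_SIZE set. The helper names and the round-up-to-the-next-100 step are assumptions for illustration only; the thread establishes just that 400 was chosen for a 384MiB working set, and the variable is unsupported by AMD.

```python
import math
import os

MIB = 1024 * 1024


def working_set_mib(n, num_buffers=3, elem_size=8):
    """Total MiB for num_buffers dense n x n matrices of doubles."""
    return num_buffers * n * n * elem_size / MIB


def round_up(value, step=100):
    """Round up to the next multiple of step (assumed rule, not AMD's)."""
    return int(math.ceil(value / step) * step)


def heap_env(heap_mb, base_env=None):
    """Copy the environment and set GPU_INITIAL_HEAP_SIZE (value in MB)."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_INITIAL_HEAP_SIZE"] = str(heap_mb)
    return env


needed = working_set_mib(4096)   # three buffers: A, B and the result
heap_mb = round_up(needed)
print(f"working set: {needed:.0f} MiB, heap setting: {heap_mb}")
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the sample, so the setting applies to that one process only.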

galmok · ‎02-03-2011

Ok. I tried setting it to 700MB but then the slowdown reappeared. At 600MB it worked fine. I can see why AMD claims this to be unsupported, but still, some details would be nice.
