
galmok
Journeyman III

Extreme kernel slowdown with large allocations

I have been running into a memory problem when allocating large buffers for my kernel: past a certain size, the kernel runs extremely slowly.

I can reproduce this issue with AMD's MatrixMulDouble sample program using matrices of size 3632 or larger. This works fine:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3616 -y 3616 -z 3616 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
3648x3616                3616x3648                1.41411                  0.761971

While this takes an extremely long time:

N:\work\ATI Stream\samples\opencl\bin\debug\x86>MatrixMulDouble.exe -x 3632 -y 3632 -z 3632 -q -t
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

Executing kernel for 1 iterations
-------------------------------------------
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
3648x3632                3632x3648                11.441                   10.7991

 

I have tried setting GPU_MAX_HEAP_SIZE to a value larger than 50, but this didn't change anything.

3632*3632*8 = 105531392 bytes = 100.64 MiB (too large)

3616*3616*8 = 104603648 bytes = 99.76 MiB (works fine)

It seems buffers over 100 MiB cause severe kernel slowdowns.
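For reference, a quick size check of the two cases above (illustrative only; MiB here means 2^20 bytes, matching the figures):

#include <stdio.h>

int main(void)
{
    /* Buffer sizes for the two matrix dimensions above, as doubles. */
    int dims[] = { 3616, 3632 };
    for (int i = 0; i < 2; ++i) {
        size_t bytes = (size_t)dims[i] * dims[i] * sizeof(double);
        printf("%d x %d doubles: %zu bytes = %.2f MiB\n",
               dims[i], dims[i], bytes, bytes / (1024.0 * 1024.0));
    }
    return 0;
}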

In my own kernel I upload 3 buffers of the size mentioned above. When things work fine, all 3 buffers are uploaded right after each other; with buffers that are too large, there is a 1.4 second delay between the second and third buffer, and the kernel also takes far more time than it should.
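For illustration, a stripped-down host-side sketch of this upload pattern (not the actual application code; the blocking writes and the clock()-based timing are just one simple way to make the pause visible):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <CL/cl.h>

#define N 3632  /* smallest dimension that shows the slowdown */

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    size_t bytes = (size_t)N * N * sizeof(double);  /* ~100.64 MiB */
    double *src = calloc((size_t)N * N, sizeof(double));
    cl_mem buf[3];
    for (int i = 0; i < 3; ++i)
        buf[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, NULL);

    /* Time each blocking upload separately; with over-large buffers
       the unexplained gap appears between the second and third write. */
    for (int i = 0; i < 3; ++i) {
        clock_t t0 = clock();
        clEnqueueWriteBuffer(q, buf[i], CL_TRUE, 0, bytes, src, 0, NULL, NULL);
        printf("upload %d: %.3f s\n", i,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
    }
    /* Cleanup omitted for brevity. */
    return 0;
}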

Is this 100 MiB buffer limit a known issue, and is there any workaround?

5 Replies
nou
Exemplar

I have seen similar behavior with slightly higher values, like 3800.

But try experimenting with GPU_INITIAL_HEAP_SIZE.

I ran ./MatrixMulDouble -x 4096 -y 4096 -z 4096 -q -t with these results:

MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)         
4096x4096                4096x4096                10.498                   9.506

With GPU_MAX_HEAP_SIZE set to 400 I got:

MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)         
4096x4096                4096x4096                2.441                    1.432
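If you want to set the variable from inside the host program instead of the shell, something like this should work. This assumes the runtime reads the variable when it first initializes, which is my understanding rather than documented behavior:

#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    /* Must happen before the first OpenCL call in this process,
       since the runtime appears to read the variable at init. */
#ifdef _WIN32
    _putenv("GPU_MAX_HEAP_SIZE=400");
#else
    setenv("GPU_MAX_HEAP_SIZE", "400", 1);
#endif

    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, NULL);
    /* ...usual context/queue/buffer setup from here on... */
    return 0;
}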

galmok
Journeyman III

GPU_MAX_HEAP_SIZE has a range of 0 to 100. What would setting it to 400 mean?

And does GPU_INITIAL_HEAP_SIZE still have any influence? (And if so, what do the numbers mean?)

galmok
Journeyman III

Ok, I tried with GPU_INITIAL_HEAP_SIZE set to 400 and that caused a significant speed-up. Now there are no unexplained pauses.

But what does GPU_INITIAL_HEAP_SIZE signify? The initial heap size measured in MB? We tried 100 and that didn't work well (i.e. it is apparently not a percentage).

nou
Exemplar

Originally these variables took a number in MB, so 400 means 400 MB. Later AMD changed GPU_MAX_HEAP_SIZE to a percentage.

And 4096*4096*8*3 bytes is 384 MiB, so that's why 400.

galmok
Journeyman III

Ok. I tried setting it to 700 MB, but then the slowdown reappeared. At 600 MB it worked fine. I can see why AMD considers this unsupported, but still, some details would be nice.
