
dominik_g
Journeyman III

Huge jump in performance when increasing buffer size on Llano

Hi everyone,

I noticed a very strange behaviour on a Llano device (A8-3850) under Linux. When the buffer size grows beyond a certain threshold, the kernel run time on the iGPU suddenly jumps disproportionately.

To investigate what's going on I wrote a simple program: Create a single buffer of size N using clCreateBuffer with only the CL_MEM_READ_WRITE flag set. Launch a kernel with a one-dimensional NDRange of size N which writes a constant to the buffer:

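__kernel void foo (__global float * y, const int N) {
  int i = get_global_id(0);

  /* Skip work-items that fall outside the buffer. */
  if (i >= N) return;

  /* Write a constant to this work-item's element. */
  y[i] = 12;
}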

When I increase the buffer size and measure the kernel execution time (using OpenCL events) I get the following behaviour:

[Figure: plot.gif (kernel execution time against buffer size)]

Between buffer sizes of 240 and 250 MB the run time jumps from around 7 ms to 35 ms. At 470 MB it drops from 64 ms to 24 ms. The results are reproducible, and the jumps always occur at the same buffer sizes.

Has anyone else noticed something similar? Any ideas what might cause it?

Cheers

Dominik

11 Replies
binying
Challenger

1. Can you reproduce this pattern across multiple runs?

2. If you remove

        if (i >= N) return;

    do you still see this pattern?

3. If it is reproducible, could it be explained by the memory hierarchy of the A8-3850?

Yes, I've repeated the experiment a number of times and the result is always the same. Removing the check against N doesn't change it.

It could be to do with the memory hierarchy although I don't understand how this can happen. I'm only accessing each element once so there's no re-use. And it also doesn't make any sense that the run-times go down again to "normal" levels after a while...

I am trying to reproduce it on my end. Do you mind posting the host code you are using?

I've uploaded the host code here: http://pastebin.com/QackTJ7n

Let me know if you have any problems with it.

Your code compiles and runs fine when N is small, such as 512, but I cannot use a larger value such as 1000. N is the size of the memory in bytes, right?

I get the following error when N is large:

C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 512
platform name: AMD Accelerated Parallel Processing
gws: 512
time: 0.024ms
time: 0.009ms
time: 0.009ms
time: 0.008ms
time: 0.008ms
time: 0.009ms
time: 0.009ms
time: 0.008ms
time: 0.009ms
time: 0.010ms

C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 1024
Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at clGetPlatformIDs(UInt32 , _cl_platform_id** , UInt32* )
   at initPlatform(SByte* platform, _cl_platform_id** platform_id) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 101
   at main(Int32 argc, SByte** argv) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 30
   at _mainCRTStartup()

C:\Users\binying\Desktop\helloCL_hg\Release>

So I used the following kernel and wrote simple host-side code myself, but I did not see the performance jump.

---------------------------------------

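__kernel void foo (__global float* input, __global float * out, const int n)
{
  uint i = get_global_id(0);

  /* Skip work-items that fall outside the buffers. */
  if (i >= n) return;

  /* Add a constant to each input element. */
  out[i] = -12.03894358 + input[i];
}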

-----------------------------

The APP Profiler gives the following kernel execution times:

memory size (MB)    time (ms ?)
500                 10.1
400                 7.92
300                 5.97
200                 4.3
100                 2.17

---------------------------------------

Thanks for trying!

Which OS are you using? Maybe it's a problem with the Linux driver...

You're welcome.

Windows 7.

yurtesen
Miniboss

What type of buffer are you using? Unfortunately I have also had some problems with buffers on Linux, though not with kernel run time (yet): map/unmap became very slow beyond a buffer size of about 200 MB.

Just a normal OpenCL buffer with only the CL_MEM_READ_WRITE flag set.
