Archives Discussions

dominik_g · ‎08-15-2012

Hi everyone,

I noticed some very strange behaviour on a Llano device (A8-3850) under Linux. When the buffer size increases beyond a certain threshold, the kernel run-time on the iGPU suddenly jumps disproportionately.

To investigate what's going on I wrote a simple program: Create a single buffer of size N using clCreateBuffer with only the CL_MEM_READ_WRITE flag set. Launch a kernel with a one-dimensional NDRange of size N which writes a constant to the buffer:

__kernel void foo (__global float * y, const int N) {
  int i = get_global_id(0);
  if (i >= N) return;  /* skip work-items past the end of the buffer */
  y[i] = 12;           /* each work-item writes one element */
}
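For readers wondering why the kernel checks i against N: a common reason is that the host rounds the global work size up to a multiple of the work-group size, so a few trailing work-items fall outside the buffer. The host code in this thread may or may not do this. The following is only a minimal sketch, and the helper names and the 1 MB = 1024 * 1024 bytes convention are assumptions for illustration, not taken from the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the element count n up to the next
   multiple of the work-group size `local`. Work-items with an index
   >= n are the ones the kernel's bounds check filters out. */
size_t round_up_global_size(size_t n, size_t local)
{
    assert(local > 0);
    size_t remainder = n % local;
    return remainder == 0 ? n : n + (local - remainder);
}

/* Hypothetical helper: number of float elements in a buffer of
   `megabytes` MB, assuming 1 MB = 1024 * 1024 bytes. */
size_t floats_in_megabytes(size_t megabytes)
{
    return megabytes * 1024u * 1024u / sizeof(float);
}
```

With a work-group size of 256 (an assumed value), a request for 1000 elements would be launched with 1024 work-items, and the last 24 return early because of the check.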

When I increase the buffer size and measure the kernel execution time (using OpenCL events) I get the following behaviour:

Between buffer sizes of 240 and 250 MB the run-time jumps from around 7ms to 35ms. At 470 MB it drops from 64ms to 24ms. The results are reproducible and it always occurs at the same place.
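To put the jump in perspective, the reported times can be converted into an effective write bandwidth. This is a rough sketch only: it assumes the kernel writes each byte of the buffer exactly once, that 1 MB = 10^6 bytes, and the function name is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: effective bandwidth in GB/s for `megabytes` MB
   written in `millis` milliseconds.
   Assumes 1 MB = 10^6 bytes and 1 GB = 10^9 bytes. */
double effective_bandwidth_gbs(double megabytes, double millis)
{
    assert(millis > 0.0);
    double bytes = megabytes * 1e6;
    double seconds = millis * 1e-3;
    return bytes / seconds / 1e9;
}
```

Under these assumptions, 240 MB in 7 ms works out to roughly 34 GB/s, while 250 MB in 35 ms is roughly 7 GB/s. In other words, about 4% more data takes about five times as long, which is why the jump cannot be explained by the extra data alone.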

Has anyone else noticed something similar? Any ideas what might cause it?

Cheers

Dominik

binying · ‎08-15-2012

1. Can you reproduce this pattern consistently across multiple runs?

2. If you remove the line

if (i >= N) return;

do you still see this pattern?

3. If it is reproducible, could it be explained by the memory hierarchy of the A8-3850?

dominik_g · ‎08-16-2012

Yes, I've repeated the experiment a number of times and the result is always the same. Removing the check against N doesn't change it.

It could be to do with the memory hierarchy although I don't understand how this can happen. I'm only accessing each element once so there's no re-use. And it also doesn't make any sense that the run-times go down again to "normal" levels after a while...

binying · ‎08-16-2012

I am trying to reproduce it on my end. Do you mind posting the host code you are using?

dominik_g · ‎08-16-2012

I've uploaded the host code here: http://pastebin.com/QackTJ7n

Let me know if you have any problems with it.

binying · ‎08-17-2012

Your code compiles and runs fine when N is small, such as 512. But I cannot use a larger value for N, such as 1000. N is the size of the memory in bytes, right?

I get the following error when N is large.

binying · ‎08-17-2012

C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 512

platform name: AMD Accelerated Parallel Processing

gws: 512

time: 0.024ms

time: 0.009ms

time: 0.008ms

time: 0.009ms

time: 0.008ms

time: 0.009ms

time: 0.010ms

C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 1024

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

at clGetPlatformIDs(UInt32 , _cl_platform_id** , UInt32* )

at initPlatform(SByte* platform, _cl_platform_id** platform_id) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 101

at main(Int32 argc, SByte** argv) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 30

at _mainCRTStartup()

C:\Users\binying\Desktop\helloCL_hg\Release>

binying · ‎08-17-2012

So I used the following kernel and wrote some simple host-side code myself. But I did not see the performance jump.

---------------------------------------

__kernel void foo (__global float* input, __global float * out, const int n)
{
  uint i = get_global_id(0);
  if (i >= n) return;  /* skip work-items past the end of the buffers */
  out[i] = -12.03894358f + input[i];
}

-----------------------------

The app profiler gives the following kernel execution times:

memory size (MB)    time (ms ?)
500                 10.1
400                 7.92
300                 5.97
200                 4.3
100                 2.17

---------------------------------------

dominik_g · ‎08-18-2012

Thanks for trying!

Which OS are you using? Maybe it's a problem with the Linux driver...

binying · ‎08-18-2012

You're welcome.

Windows 7.

yurtesen · ‎08-20-2012

What type of buffer are you using? Unfortunately I have also had some problems with buffers on Linux, though not with kernel run-times (yet). It was just map/unmap that became very slow beyond a buffer size of about 200 MB.

dominik_g · ‎08-21-2012

Just a normal OpenCL buffer with only the CL_MEM_READ_WRITE flag set.


Huge jump in performance when increasing buffer size on Llano