Hi everyone,
I noticed a very strange behaviour on a Llano device (A8-3850) under Linux. When I increase the buffer size beyond a certain threshold, the kernel run time on the iGPU suddenly jumps disproportionately.
To investigate what's going on I wrote a simple program: Create a single buffer of size N using clCreateBuffer with only the CL_MEM_READ_WRITE flag set. Launch a kernel with a one-dimensional NDRange of size N which writes a constant to the buffer:
__kernel void foo (__global float * y, const int N) {
    int i = get_global_id(0);
    if (i >= N) return;
    y[i] = 12;
}
When I increase the buffer size and measure the kernel execution time (using OpenCL events) I get the following behaviour:
Between buffer sizes of 240 and 250 MB the run-time jumps from around 7ms to 35ms. At 470 MB it drops from 64ms to 24ms. The results are reproducible and it always occurs at the same place.
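To put the numbers above in perspective, they can be converted into an effective write bandwidth. The following is only a rough sketch: `effective_gbps` is a hypothetical helper, not part of the posted host code, and it assumes that the kernel writes every byte of the buffer exactly once, that the reported "MB" means 10^6 bytes, and that the quoted timings are exact rather than approximate.

```c
#include <assert.h>

/* Hypothetical helper: effective bandwidth in GB/s for a kernel that
   touches each byte of the buffer exactly once.
   Assumes 1 MB = 1e6 bytes and 1 GB = 1e9 bytes, so MB per ms equals
   GB per s. */
double effective_gbps(double buffer_mb, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return buffer_mb / elapsed_ms;
}
```

With the figures quoted above, 240 MB in 7 ms works out to roughly 34.3 GB/s, 250 MB in 35 ms to roughly 7.1 GB/s, and 470 MB in 24 ms to roughly 19.6 GB/s. The 64 ms point is left out because the exact buffer size for that measurement is not stated.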
Has anyone else noticed something similar? Any ideas what might cause it?
Cheers
Dominik
1. Can you reproduce this pattern over multiple runs?
2. If you remove
if (i >= N) return;
do you still see this pattern?
3. If it is reproducible, could it be explained by the memory hierarchy of the A8-3850?
Yes, I've repeated the experiment a number of times and the result is always the same. Removing the check against N doesn't change it.
It could be to do with the memory hierarchy although I don't understand how this can happen. I'm only accessing each element once so there's no re-use. And it also doesn't make any sense that the run-times go down again to "normal" levels after a while...
I am trying to reproduce it on my end. Do you mind posting the host code you are using?
I've uploaded the host code here: http://pastebin.com/QackTJ7n
Let me know if you have any problems with it.
Your code compiles and runs fine when N is small, such as 512. But I cannot use a larger number for N, such as 1000. N is the size of the memory in bytes, right?
I get the following error when N is large.
C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 512
platform name: AMD Accelerated Parallel Processing
gws: 512
time: 0.024ms
time: 0.009ms
time: 0.009ms
time: 0.008ms
time: 0.008ms
time: 0.009ms
time: 0.009ms
time: 0.008ms
time: 0.009ms
time: 0.010ms
C:\Users\binying\Desktop\helloCL_hg\Release>helloCL.exe 1024
Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other
memory is corrupt.
at clGetPlatformIDs(UInt32 , _cl_platform_id** , UInt32* )
at initPlatform(SByte* platform, _cl_platform_id** platform_id) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 101
at main(Int32 argc, SByte** argv) in c:\users\binying\desktop\hellocl_hg\hellocl.cpp:line 30
at _mainCRTStartup()
C:\Users\binying\Desktop\helloCL_hg\Release>
So I used the following kernel and wrote some simple host-side code myself. But I did not see the performance jump.
---------------------------------------
__kernel void foo (__global float* input, __global float * out, const int n)
{
    uint i = get_global_id(0);
    if (i >= n) return;
    out[i] = -12.03894358 + input[i];
}
-----------------------------
The APP Profiler gives the following kernel execution times:
memory size (MB)    time (ms?)
500                 10.1
400                 7.92
300                 5.97
200                 4.3
100                 2.17
---------------------------------------
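One way to check whether a set of measurements like the one above scales linearly is to compare the time per MB across all rows. The sketch below is only an illustration: `time_per_mb_spread` is a hypothetical helper, and it assumes all sizes and times are positive and given in consistent units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ratio of the largest to the smallest time-per-MB
   over n measurements. A value close to 1 means the run time scales
   roughly linearly with buffer size; a sudden jump in run time shows up
   as a value well above 1. */
double time_per_mb_spread(const double *size_mb, const double *time_ms, size_t n)
{
    assert(n > 0);
    assert(size_mb[0] > 0.0 && time_ms[0] > 0.0);
    double lo = time_ms[0] / size_mb[0];
    double hi = lo;
    for (size_t i = 1; i < n; ++i) {
        assert(size_mb[i] > 0.0 && time_ms[i] > 0.0);
        double r = time_ms[i] / size_mb[i];
        if (r < lo) lo = r;
        if (r > hi) hi = r;
    }
    return hi / lo;
}
```

For the five rows above the spread comes out at about 1.10, so the time grows nearly linearly with size. For comparison, the two Linux measurements around the jump (240 MB at 7 ms and 250 MB at 35 ms) give a spread of about 4.8.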
Thanks for trying!
Which OS are you using? Maybe it's a problem with the Linux driver...
You're welcome.
Win7.
What type of buffer are you using? Unfortunately, I have also had some problems with buffers on Linux, though not with the run time of the kernels (yet): map/unmap became very slow above a buffer size of about 200 MB.
Just a normal OpenCL buffer with only the CL_MEM_READ_WRITE flag set.