You may be seeing the profiler creating extra copies of buffers and images. The profiler does this in order to run the test program repeatedly. To minimize the number of copies, please only mark buffers and images with the read and write flag (this is the default) where necessary. Flagging additional buffers or images will cause unnecessary extra copies to be made by the profiler.
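As a rough sketch of this advice, assuming the application uses OpenCL, the snippet below picks the narrowest access flag for each buffer from the way the kernels use it. The helper `narrowest_flags`, the kernel names, and the buffer names are hypothetical and are not part of the profiler or of the OpenCL API. Only the flag names `CL_MEM_READ_ONLY`, `CL_MEM_WRITE_ONLY`, and `CL_MEM_READ_WRITE` are real OpenCL constants, and they describe access from inside kernels, not from the host.

```python
def narrowest_flags(kernels):
    """Map each buffer name to the narrowest OpenCL access flag.

    kernels: list of (kernel_name, buffers_read, buffers_written) tuples
    describing how device kernels (not the host) touch each buffer.
    """
    read, written = set(), set()
    for _name, reads, writes in kernels:
        read.update(reads)
        written.update(writes)

    flags = {}
    for buf in sorted(read | written):
        if buf in read and buf in written:
            # Written by one kernel and read by another (or the same one).
            flags[buf] = "CL_MEM_READ_WRITE"
        elif buf in read:
            flags[buf] = "CL_MEM_READ_ONLY"
        else:
            flags[buf] = "CL_MEM_WRITE_ONLY"
    return flags


# Hypothetical two-kernel pipeline: stage1 feeds stage2 through "intermediate".
pipeline = [
    ("stage1", ["input"], ["intermediate"]),
    ("stage2", ["intermediate"], ["result"]),
]

for buf, flag in narrowest_flags(pipeline).items():
    print(buf, flag)
```

In this made-up pipeline only `intermediate` needs the read-write flag, so under the behaviour described above it is the only buffer for which the profiler should need extra copies.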
If we can have a copy of your test program, we'll look and see if we have a problem that we can fix. Please send it to firstname.lastname@example.org, if you can.
Thanks, I will send the whole app along with the data file needed to run it.
The longest run I have managed so far reached ~4300 in execution order. I closed every other program I could and enabled a 16 GB swap file (such a big swap file is not actually needed: the app stays at around 2 GB of allocated memory until the final "out of memory" crash).
I just sent the test case; please take a look at it.
Thank you. We received the test case. We can confirm that when running under the profiler, the test application uses about 1.8 GB of RAM. It ran to completion, though (on a 4 GB machine, Win7 64-bit), and didn't crash (the execution order went up to 10000, which is the profiler's limit). We will investigate whether this memory usage is normal (due to the extra copies the profiler requires for the application) or not.
Which OS and GPU are you using?
I use 32-bit Vista (x86); the GPU is an HD4870.
4 GB of RAM is installed (but the 32-bit OS doesn't see all of it, of course).
I implemented your suggestion about using read-only and write-only buffers. Some of the buffers can indeed be read-only or write-only; others are used as output for one kernel and input for another, so read-write is required for them.
This new version still crashes at an execution number below 4k.