The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high-performance fashion with our current implementation. They are there so that OpenCL plays nicely with other applications that use the graphics card. This will be improved in future releases. That being said, there are environment variables that can be used to override the default values, as described below:
If you set GPU_MAX_HEAP_SIZE, you can override the default value. If you set GPU_INITIAL_HEAP_SIZE equal to GPU_MAX_HEAP_SIZE and the initial allocation succeeds, then you get guaranteed access to that much memory in your OpenCL program. If the two values are not equal, then an allocation can fail at some point in your program as the heap gets resized. You can also hit performance issues if the heap no longer fits in device memory and spills into host memory.
However, playing with these environment variables is unsupported and you use them at your own risk. They are not future-proof and could disappear in a future release, so I strongly recommend using them only for testing and not basing an application on them.
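For testing only, the variables described above might be set like this before launching an application (values are in megabytes; the application name below is a placeholder, not from this thread):

```shell
# Unsupported, at-your-own-risk overrides; values are in MB.
# Setting both to the same value gives guaranteed access to that
# much memory, provided the initial allocation succeeds.
export GPU_INITIAL_HEAP_SIZE=256
export GPU_MAX_HEAP_SIZE=256
# ./my_opencl_app    # hypothetical application name
echo "heap: $GPU_INITIAL_HEAP_SIZE/$GPU_MAX_HEAP_SIZE MB"
```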
Originally posted by: MicahVillmow The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high performance fashion with our current implementation.
That hard-coded 128 MB limit can be very problematic...
Imagine Photoshop wants to use OpenCL to apply a filter to a 4K x 4K x 4-channel float buffer = 256 MB.
Now imagine an artist also has 3ds Max open with a GPGPU renderer like V-Ray RT and a medium-polygon model (say a 2M-quad sculpted model = 192 MB). The artist wants to apply the filter to the texture in Photoshop and then preview it in V-Ray RT without closing either application. A pretty common case, btw.
With the current 128 MB policy BOTH applications will fail! That means the artist will be able to use neither V-Ray RT nor Photoshop.
On the contrary, if you remove the 128 MB limitation, at least ONE could run... and the other could simply show "Sorry, out of VRAM. Please close any other GPGPU app so some memory is released".
Which do you think is better: to run none of the apps, or to run one? I bet on running one.
Please, let us monopolise the VRAM. The users actually know what they are doing; OpenCL itself does not.
And btw... is the VRAM virtualised like system memory in x86 protected mode?
I see, thanks for the explanation.
I hope the next release will enable more GPU memory by default; for now I can live with only 128 MB, I was just surprised to get allocation errors for sizes far smaller than 256 MB...
Perhaps it does not strictly fit this topic, but is there any way to measure how much memory can be allocated on a given GPU? Obviously the OS needs to allocate some memory, and perhaps a few other apps do too.
Thanks to this topic I know that the default (max) memory size can be overridden, but what is the actual maximum value that will not crash my X window system, and how can it be measured?
Exceeding memory requirements should never crash X; instead, clCreateBuffer should return an error saying it is out of memory.
You can query the maximum amount of memory available with the CL_DEVICE_GLOBAL_MEM_SIZE enum in clGetDeviceInfo.
I used this OpenGL extension, http://www.opengl.org/registry/specs/ATI/meminfo.txt, to create a small program that monitors free memory on the GPU.
It returns four values, in kB.
In my experiment with GPU_MAX_HEAP_SIZE I increased it to 1024 MB. Then, as I allocated 128 MB buffers, they repeatedly moved from the memory pool to the auxiliary memory and back. The maximum it allocated on my 1 GB 5850 was 768 MB; after that it returned out of resources.
Can anyone explain this to me:
On Linux there is a file called mem located in /proc/ati/0 (that is probably system-specific). It shows some useful information which I cannot fully understand. It looks like the following:
# cat /proc/ati/0/mem
               total counts            |      outstanding
type         alloc  fail       bytes   |  allocs      bytes
system           0     0  2123517952   |       0          0
locked      354349     0  1455151528   |    8748   39565736
sareas           1     0        8192   |       1       8192
driver       51579     0     7245854   |     118    4122674
magic            5     0          60   |       0          0
maplist       6529     0      391740   |      15        900
vmalist       6651     0       79812   |      16        192
buflist         40     0      117968   |      16     116432
files           11     0        7552   |       3       1536
contexts         3     0          56   |       1         32
hwcontext        5     0       40960   |       4      32768
# cat /proc/ati/0/mem1
               total counts            |      outstanding
type         alloc  fail       bytes   |  allocs      bytes
mappings         5     0     2224128   |       5    2224128
textures         0     0           0   |       0          0
agplist          0     0           0   |       0          0
agpmem           0     0           0   |       0          0
boundagp         0     0           0   |       0          0
aperture         0     0           0   |       0          0
dmabufs          0     0           0   |       0          0
memlocks         0     0           0   |       0          0
mutex           65     0        1040   |       4         64
drawables        1     0          16   |       0          0
mempages    815121     0  3338735616   |    8721   35721216
pcielist       178     0     6554444   |       9      69804
pcie             0     0           0   |       0          0
I'm asking because, when I try to set more than 255 for both GPU_INITIAL_HEAP_SIZE and GPU_MAX_HEAP_SIZE, the kernel simply fails. I know these flags are currently unsupported, but why is 255 the maximum value when the card has 512 MB of memory? What is the rest of the memory used for?
I thought the X server or even the session manager might be the problem. I've tested many configurations, such as kdm or slim with KDE, Xfce, and twm. The most lightweight configuration seems to be simply slim+twm. Perhaps it is, but it doesn't help in this case. Does anyone have an idea why that is?
When only GPU_MAX_HEAP_SIZE is set above 255, it works but probably uses host memory, which dramatically decreases computation performance. It is actually only slightly faster than on the CPU :-( .
Why in SDK 2.1 was this limit (only 128 MB available for OpenCL) not changed?
Without GPU_MAX_HEAP_SIZE, OpenCL reports only 128 MB available.
With GPU_MAX_HEAP_SIZE=256 it reports 512 MB (the full memory of my GPU).
That is, it's buggy in both cases!
As Micah mentioned above, GPU_MAX_HEAP_SIZE is not supported, and may or may not work properly.
Hm, actually I don't wonder much why that hackish env variable doesn't work in the new SDK. I wonder why the new SDK still has a 128 MB limit on memory usage for a 4870 GPU!
I just found this thread: http://www.khronos.org/message_boards/viewtopic.php?p=7321#p7321
The device memory can be viewed as just a cache, where only the memory objects needed by commands executing on a device need to be allocated. This way, the actual amount of physical memory available on the device does not limit how many memory objects you can create - it only limits the amount of memory needed by the memory objects used by a command.
This is true when you have a CPU-only context: I successfully allocated more memory objects than CL_DEVICE_GLOBAL_MEM_SIZE reports.
But it fails with a GPU-only or mixed context, where allocation fails when I try to allocate more than the device memory. So it is a BUG.
Hm, this feature can be viewed as memory virtualization. One would swap memory objects out of device memory into host memory (and, ultimately, even into a swap file). It's good for memory-demanding apps whose algorithms cannot tolerate a smaller memory amount.
But for other apps it will surely be a performance killer if such swapping is done at the runtime's discretion, without any control by the app author. In that case some additional attribute may be needed, much like page-locking with VirtualAlloc on Windows: some data should stay in GPU memory while other data could be swapped out. This becomes especially important when several GPGPU apps execute simultaneously; improper memory swapping can kill performance for all of them.