Originally posted by: MicahVillmow The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high performance fashion with our current implementation.
That hard-coded 128 MB limit can be very problematic...
Example:
Imagine Photoshop wants to use OpenCL to run a filter on a 4K x 4K x 4-channel float buffer: 4096 x 4096 x 4 x 4 bytes = 256 MB.
Now imagine the artist also has 3ds Max open with a GPGPU renderer like VRayRT, loaded with a medium-polygon model (say a 2M-quad sculpted model = 192 MB). The artist wants to apply the filter to the texture in Photoshop and then preview it in VRayRT without closing either application. A pretty common case, btw.
With the current 128 MB policy BOTH applications will fail! That means the artist won't be able to use VRayRT or Photoshop.
On the contrary, if you remove the 128 MB limitation, at least ONE could run... and the other could simply show a "Sorry, out of VRAM. Please close any other GPGPU app so some memory is released" message.
Which do you think is better: running neither app or running one? I bet on running one.
Please, allow us to monopolise the VRAM. Users actually know what they are doing; OpenCL itself does not.
And btw... is VRAM virtualised, like system memory is in x86 protected mode?
Perhaps it does not strictly fit this topic, but is there any way to measure how much memory can be allocated on a given GPU? Obviously the OS, and perhaps a few other apps, need to allocate some memory as well.
Thanks to this topic I know the default (max) memory size can be overridden, but what is the actual maximum value that will not crash my X window system, and how can it be measured?
Exceeding memory requirements should never crash X; instead, clCreateBuffer should return an error saying it is out of memory.
You can query the maximum amount of memory available with clGetDeviceInfo using the CL_DEVICE_GLOBAL_MEM_SIZE enum.
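For anyone searching later, a minimal sketch of that query (first GPU of the first platform, error checking omitted for brevity):

/* Minimal sketch: report the global memory size of the first GPU
   device of the first platform. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong global_mem;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* CL_DEVICE_GLOBAL_MEM_SIZE is returned as a cl_ulong, in bytes. */
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);

    printf("global memory: %llu MB\n",
           (unsigned long long)(global_mem / (1024 * 1024)));
    return 0;
}

Note that a single buffer is further capped by CL_DEVICE_MAX_MEM_ALLOC_SIZE, which the spec only requires to be at least a quarter of the global memory size.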
I use the GL_ATI_meminfo OpenGL extension http://www.opengl.org/registry/specs/ATI/meminfo.txt in a small program that monitors free memory on the GPU.
It returns four values, in kB.
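Something like this, assuming a current OpenGL context on a driver that exposes the extension (the enum value is taken from the spec linked above):

/* Query free video memory via GL_ATI_meminfo. */
#include <stdio.h>
#include <GL/gl.h>

#ifndef GL_VBO_FREE_MEMORY_ATI
#define GL_VBO_FREE_MEMORY_ATI 0x87FB
#endif

static void print_free_vram(void)
{
    GLint info[4]; /* all four values are reported in kB */

    glGetIntegerv(GL_VBO_FREE_MEMORY_ATI, info);

    printf("pool free total:         %d kB\n", info[0]);
    printf("pool largest free block: %d kB\n", info[1]);
    printf("aux free total:          %d kB\n", info[2]);
    printf("aux largest free block:  %d kB\n", info[3]);
}

The same query also works with GL_TEXTURE_FREE_MEMORY_ATI (0x87FC) and GL_RENDERBUFFER_FREE_MEMORY_ATI (0x87FD).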
In my experiment I increased GPU_MAX_HEAP_SIZE to 1024 MB. Then, as I allocated 128 MB buffers, they were repeatedly moved from the memory pool to auxiliary memory and back. The maximum I could allocate on my 1 GB 5850 was 768 MB; after that it returned out of resources.
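A rough sketch of that kind of probing, assuming an already-created context and command queue; note that clCreateBuffer may defer the real allocation, so each buffer is touched with a tiny blocking write to force it:

/* Probe how many 128 MB buffers can actually be backed by memory. */
#include <stdio.h>
#include <CL/cl.h>

#define CHUNK (128 * 1024 * 1024)
#define MAX_CHUNKS 64

static void probe_vram(cl_context ctx, cl_command_queue queue)
{
    cl_mem bufs[MAX_CHUNKS];
    char byte = 0;
    cl_int err = CL_SUCCESS;
    int i, n = 0;

    for (i = 0; i < MAX_CHUNKS; ++i) {
        bufs[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, CHUNK, NULL, &err);
        if (err != CL_SUCCESS)
            break;
        /* Touch the buffer so the runtime has to back it with memory. */
        err = clEnqueueWriteBuffer(queue, bufs[i], CL_TRUE, 0, 1, &byte,
                                   0, NULL, NULL);
        if (err != CL_SUCCESS) {
            clReleaseMemObject(bufs[i]);
            break;
        }
        n++;
    }

    printf("allocated %d x 128 MB, last error %d\n", n, err);
    while (n-- > 0)
        clReleaseMemObject(bufs[n]);
}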
Can anyone explain this to me?
On Linux there is a file called mem located in /proc/ati/0 (that is probably system specific). It shows some useful information which I cannot fully understand. It looks like this:
# cat /proc/ati/0/mem
                 total counts         |     outstanding
type        alloc  fail         bytes |  allocs      bytes
system          0     0    2123517952 |       0          0
locked     354349     0    1455151528 |    8748   39565736
sareas          1     0          8192 |       1       8192
driver      51579     0       7245854 |     118    4122674
magic           5     0            60 |       0          0
maplist      6529     0        391740 |      15        900
vmalist      6651     0         79812 |      16        192
buflist        40     0        117968 |      16     116432
files          11     0          7552 |       3       1536
contexts        3     0            56 |       1         32
hwcontext       5     0         40960 |       4      32768
# cat /proc/ati/0/mem1
                 total counts         |     outstanding
type        alloc  fail         bytes |  allocs      bytes
mappings        5     0       2224128 |       5    2224128
textures        0     0             0 |       0          0
agplist         0     0             0 |       0          0
agpmem          0     0             0 |       0          0
boundagp        0     0             0 |       0          0
aperture        0     0             0 |       0          0
dmabufs         0     0             0 |       0          0
memlocks        0     0             0 |       0          0
mutex          65     0          1040 |       4         64
drawables       1     0            16 |       0          0
mempages   815121     0    3338735616 |    8721   35721216
pcielist      178     0       6554444 |       9      69804
pcie            0     0             0 |       0          0
I'm asking because when I try to set more than 255 for both GPU_INITIAL_HEAP_SIZE and GPU_MAX_HEAP_SIZE, the kernel simply fails. I know these flags are currently unsupported, but why is 255 the maximum value when the card has 512 MB of memory? What is the rest of the memory spent on?
I thought the X server or even the session manager might be the problem. I've tested many configurations, such as kdm or slim with kde, xfce and twm. The most lightweight combination is probably slim+twm, but it doesn't help in this case. Does anyone have an idea why that is?
When only GPU_MAX_HEAP_SIZE is set above 255 it works, but it probably falls back to host memory, which dramatically decreases computation performance. Actually it is just slightly faster than on the CPU 😞.
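For what it's worth, these are plain environment variables, so a quick (and entirely unsupported) experiment can set them before the runtime initializes; going by the numbers in this thread, the values are interpreted in MB:

/* Hypothetical quick test only: GPU_INITIAL_HEAP_SIZE and
   GPU_MAX_HEAP_SIZE are unsupported, undocumented environment
   variables and must be set before the OpenCL runtime initializes. */
#include <stdlib.h>

int main(void)
{
    setenv("GPU_INITIAL_HEAP_SIZE", "255", 1); /* >255 reportedly fails */
    setenv("GPU_MAX_HEAP_SIZE", "512", 1);
    /* ... create the OpenCL context and buffers as usual ... */
    return 0;
}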
Raistmer,
As Micah mentioned above, GPU_MAX_HEAP_SIZE is not supported and might or might not work properly.
I just found this thread: http://www.khronos.org/message_boards/viewtopic.php?p=7321#p7321
The device memory can be viewed as just a cache where only memory objects needed by a command(s) executing on a device need to be allocated. This way, the actual amount of physical memory available on the device does not limit how many memory objects you can create - it only limits the amount of memory needed by memory objects used by a command.
This is true when you have a CPU-only context: I successfully allocated more memory objects than CL_DEVICE_GLOBAL_MEM_SIZE reports.
But it fails with a GPU-only or mixed context, where allocation fails as soon as I try to allocate more than the device memory. So it is a BUG.
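A minimal repro sketch of that difference; the device type is a parameter so the same code can be tried with CL_DEVICE_TYPE_CPU and CL_DEVICE_TYPE_GPU (the buffers are intentionally leaked for the test, and error checking is minimal):

/* Keep creating 64 MB buffers until their total passes
   CL_DEVICE_GLOBAL_MEM_SIZE, or until creation fails. Per the spec
   text quoted above, creation may legitimately succeed beyond device
   memory; only commands that use the objects have to fail. */
#include <stdio.h>
#include <CL/cl.h>

static void over_allocate(cl_device_type type, const char *label)
{
    cl_platform_id platform;
    cl_device_id dev;
    cl_context ctx;
    cl_ulong mem, created = 0;
    cl_int err = CL_SUCCESS;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, type, 1, &dev, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(mem), &mem, NULL);
    ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

    while (created <= mem && err == CL_SUCCESS) {
        /* Handles are deliberately not kept or released; the objects
           stay alive until the context is destroyed. */
        clCreateBuffer(ctx, CL_MEM_READ_WRITE, 64 << 20, NULL, &err);
        if (err == CL_SUCCESS)
            created += 64 << 20;
    }

    printf("%s: created %llu of %llu MB, last error %d\n", label,
           (unsigned long long)(created >> 20),
           (unsigned long long)(mem >> 20), err);
    clReleaseContext(ctx);
}

int main(void)
{
    over_allocate(CL_DEVICE_TYPE_CPU, "CPU");
    over_allocate(CL_DEVICE_TYPE_GPU, "GPU");
    return 0;
}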