The runtime allocates a limited amount of pinned host memory that is accessible by the GPU without using the CPU cache coherency protocol.
A limited portion of discrete GPU device memory is configured to be directly accessible by the CPU.
There is a platform-dependent limit on the maximum size of each [zero-copy] buffer, as well as on the total size of all buffers of that type (a good working assumption is 64 MB for the per-buffer limit and 128 MB for the total).
With pinned memory, the RAM backing the buffer cannot be swapped out while the application runs. Large pinned buffers can therefore severely limit the RAM available for other purposes. Most operating systems have their own threshold on how much RAM they will allot to pinned buffers (which may depend on many other parameters of the system).
Refer to the BufferBandwidth sample and check the feasible buffer sizes on your system by supplying different buffer sizes as input.
Originally posted by: omion Thanks for the answers! Does anybody know about quote 2? That seems a bit different from OS-allocated RAM. I'd think that it would be up to the driver since the allocated memory is on the card.
The memory mentioned is device memory that is mapped directly into the host address space. CPU writes to this memory are extremely fast because the write-combine technique is used (CPU reads, by contrast, are uncached and slow).
Refer to the OpenCL Programming Guide for details.