How should buffers be configured when calling clCreateBuffer to make sure that they are never mirrored back and forth between the host and the GPU in the background, without an explicit call to clEnqueueReadBuffer / clEnqueueWriteBuffer?
I keep reading about "driver will cache buffer in GPU memory if seemed appropriate". For example, I have two buffers that are the input/output buffers for the program, and about ten others that are GPU-only and never accessed by the host.
Is there a difference in behaviour between using one buffer in parts (via offsets) and using multiple distinct buffers?
The simplest answer is to refer to Tables 4.2 and 4.3 of the OpenCL Programming Guide. And yes, a buffer created with no flags goes directly into device memory. Other methods are explained in the same guide.