I have a question that I could not resolve using the specification or other sources.
When I create buffers, I have three main options:
CL_MEM_COPY_HOST_PTR: memory is allocated on the device and implicitly initialized with the data in host memory pointed to by the host_ptr argument.
CL_MEM_USE_HOST_PTR: this does not allocate memory on the device; it only creates a pointer into host memory, and every access by the device (be it a read or a write) travels over PCIe (except on APUs). Screamingly slow.
CL_MEM_ALLOC_HOST_PTR: this flag allocates memory on the device but leaves it uninitialized, which allows a NULL pointer to be passed as host_ptr. It involves no implicit data copy, but leaves it to the programmer to initialize the memory on the device before use.
Now, when I map buffer objects into host memory:
Mapping: when a buffer is mapped into host memory, its contents are copied back from the device. If CL_MAP_READ is specified, nothing special is done: once the host thread is finished with the data, the buffer is unmapped and is ready for use on the device again. If CL_MAP_WRITE is specified, the host thread can read the data as before, but any modifications it makes to the mapped memory become visible on the device once the buffer is unmapped.
What will happen if the device uses a memory object while it is mapped into host memory? Will it crash the program, result in undefined behaviour, or just be sluggish?
Please tell me if I am making wrong assumptions at any point.
You are wrong. The specification says this about CL_MEM_USE_HOST_PTR:
OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.
CL_MEM_ALLOC_HOST_PTR only says that the OpenCL implementation should allocate the memory from host-accessible memory.
As for mapping a buffer, the specification says:
If the buffer object is created with CL_MEM_USE_HOST_PTR set in mem_flags, the following will be true:
- The host_ptr specified in clCreateBuffer is guaranteed to contain the latest bits in the region being mapped when the clEnqueueMapBuffer command has completed.
- The pointer value returned by clEnqueueMapBuffer will be derived from the host_ptr specified when the buffer object is created.
And as far as I know, using a buffer on the device while it is mapped leads to undefined behaviour, so the program may even crash.
A while back I asked how I can allocate memory on the device without having to allocate the same amount on the host. USE_HOST_PTR and COPY_HOST_PTR both initialize the data from the given host_ptr when the buffer is created.
If I want to create a 1 GB buffer, I do not want to allocate the same amount on the host just to ensure that the pointer I pass can be read by the application and that the OS will not segfault the program.
Someone told me the way to do this is to use ALLOC_HOST_PTR and pass a NULL pointer to clCreateBuffer. Is this not true?
Also I am unsure about what you meant by "guaranteed to contain the latest bits in the region being mapped". Does this mean that it contains the most up-to-date information? If so, why doesn't a buffer created with COPY_HOST_PTR do the same when mapped to host memory?
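You don't need to pass any of the *_HOST_PTR flags, just a read/write flag (CL_MEM_READ_WRITE, CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY) with host_ptr set to NULL. Then the buffer is created on the device without allocating memory on the host side.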
And "guaranteed to contain the latest bits in the region being mapped" means this: suppose you execute a kernel that writes into a buffer created with USE_HOST_PTR. That kernel may write into (cached) device memory, and mapping the buffer synchronizes the contents of device and host memory, so once the map has completed, host_ptr holds what the kernel wrote.
The answer to that question might be implementation-dependent. As far as I know, the OpenCL spec does not specify when a copy is necessary; it just states the behaviour expected from the various flags.
One technique I tried was to track the system memory in use and note how it changes when I create buffers with different flags. That was very simple to do with Task Manager on Windows. I hope there is something similar on Linux too.
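On Linux, tools such as top or free give a system-wide view similar to Task Manager. For a per-process measurement from inside the program, a minimal sketch (read_vmrss_kb is a hypothetical helper) reads the VmRSS line from /proc/self/status; calling it before and after clCreateBuffer gives a rough idea of how much host memory the process gained.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return this process's resident set size in kB, as
   reported by the "VmRSS:" line of /proc/self/status, or -1 on failure. */
long read_vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL) return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* %ld skips the whitespace between "VmRSS:" and the number. */
            if (sscanf(line + 6, "%ld", &kb) != 1) kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

RSS only counts pages that are actually resident in this process, so memory a driver reserves but has not touched yet, or memory held outside the process, may not show up. Treat the difference as a rough indication rather than an exact measurement.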