Today I wanted to test how a global memory buffer is allocated and stored in OpenCL, but the results confused me. I created a global memory buffer with the CL_MEM_ALLOC_HOST_PTR flag, so the buffer is visible to both the CPU and the GPU, and then passed it to a kernel. The kernel is built for both the CPU and the GPU device, but I assign the first half of the buffer to the CPU and the second half to the GPU, using different offsets to give each device its own starting point. That way the CPU and GPU can work concurrently on the same buffer, just in different regions. Inside the kernel, I take the address of each element with &buffer[offset+tid] and store that address back into the element itself:
__kernel void foo(__global uint *buffer, const uint offset)
{
    uint tid = get_global_id(0);
    // Store each element's own global-memory address back into the element.
    buffer[offset + tid] = (uint)&buffer[offset + tid];
}
After I read the buffer back to the host and print the values, I find that the addresses returned by the CPU kernel are consecutive, and the addresses returned by the GPU kernel are also consecutive, but the two address ranges are not contiguous with each other. Since the CPU and GPU are working on the same buffer, why are the addresses of the second half of the buffer not contiguous with those of the first half? If I use only one device (CPU or GPU), all the addresses are consecutive, as expected. Can anyone help me with this? I suspect I'm misunderstanding some basic concepts of GPU memory, so I would appreciate a detailed explanation.
PS: Can I understand it this way: although there is physically only one buffer, global memory is effectively an opaque structure, and the address values returned by different devices differ because the address translation and mapping between global memory and the CPU, and between global memory and the GPU, are implementation-dependent and independent of each other?