Archives Discussions

acekiller
Adept I

Confused by inconsistent address-space mapping between global memory and multiple devices

Today I wanted to test how a global memory buffer is allocated and stored in OpenCL, but the results confused me. I created a buffer with the CL_MEM_ALLOC_HOST_PTR flag so that it would be visible to both the CPU and the GPU, then passed it to a kernel built for both devices. I assigned the first half of the buffer to the CPU and the second half to the GPU, passing each device a different offset argument as its starting point, so that the CPU and GPU could work concurrently on the same buffer in different places. Inside the kernel, I take the address of each element with &buffer[offset+tid] and store that address back into the element itself.

__kernel void foo(__global uint *buffer, const uint offset)
{
    uint tid = get_global_id(0);

    /* store the address of this element in the element itself;
       the pointer-to-uint conversion needs an explicit cast and
       truncates the address on a 64-bit device */
    buffer[offset + tid] = (uint)&buffer[offset + tid];

    /* ... */
}

After I copy the buffer back to the host and print the values, I find that the addresses returned by the CPU kernel are consecutive, and the addresses returned by the GPU kernel are also consecutive, but the two ranges are not consecutive with each other. I thought the CPU and GPU were working on the same buffer, so why is the second half of the buffer not consecutive with the first half? If I use only one device (CPU or GPU), all the addresses are consecutive. Can anyone help me with this? I suspect I am misunderstanding some basic concept of GPU memory, so I would appreciate a detailed explanation.

PS: Can I understand it this way? Although there is only one buffer physically, global memory is actually an opaque structure, and the address values returned by different devices differ because the address translation and mapping between global memory and the CPU, and between global memory and the GPU, are implementation-dependent?

LeeHowes
Staff

There isn't only one buffer physically. OpenCL's memory model only guarantees that buffers are consistent at the end of kernel execution. Since the runtime has no way to know what range of the buffer each kernel is using, it cannot split the buffer in two automatically and decide what was updated. It may therefore copy the entire buffer to the device temporarily for efficient access and copy it back afterwards, overwriting any changes you made on the other device; it also means that when you check the address, you get the address of the copy, not of the original. Even if both devices really do access exactly the same data and the runtime maps a single copy to both, there is no requirement to map it into the same address space on each device, only that both have access to the same data. In that case you are right: there is a single physical copy, but each device may have its own virtual range.

OpenCL does not assume that memory is shared, and the runtime specifically optimises under the assumption that it is not shared.

binying
Challenger

I created a global memory buffer with CL_MEM_ALLOC_HOST_PTR flag,  ...After I send buffer back to CPU and print out the values, I find that, the address values returned by CPU kernel is consecutive and what returned by GPU is also consecutive, but these two address spaces are not consecutive with each other.

--I would say it has something to do with how OpenCL is implemented.


Sorry, I am not sure what you mean by "how OpenCL is implemented". As far as I know, the fused memory on today's APUs is not fully shared between the CPU and GPU, but in future generations it can be expected to be fully shared. Are you hinting that this problem occurs because of hardware support, so that there are two different address-space mappings?
