UPDATE (answering my own question):
I managed to implement it as follows:
1. On a "Remote" GPU, create an "Addressable" buffer:
a. clCreateBuffer with CL_MEM_BUS_ADDRESSABLE_AMD
b. clEnqueueMakeBuffersResidentAMD on that buffer, which outputs its "Physical address" (cl_bus_address_amd)
2. On a "Local" GPU, create an "External" buffer:
a. clCreateBuffer with CL_MEM_EXTERNAL_PHYSICAL_AMD, passing the "Physical address" obtained in step 1b
b. clEnqueueMigrateMemObjects on that buffer
Then I call clEnqueueCopyBuffer on the "Local" GPU, copying from the "External" buffer to any local buffer. This copies the content of the "Addressable" buffer, which actually resides on the other ("Remote") GPU.
I guess I could even run a kernel on the "Local" GPU with the "External" buffer as an argument, and it would implicitly read from and write to the "Remote" GPU's "Addressable" buffer. Right?
If so, is every memory access literally external, i.e. does each read/write go to the "Remote" GPU?
Or is there a DMA block transfer when needed?
Am I missing something?