I'm testing some stencil code on a heterogeneous system with 2 GPUs.
To keep the data on the two GPUs in sync, I use the functions clEnqueueReadBufferRect and clEnqueueWriteBufferRect to transfer 1000 bytes of data from table A on GPU_1 to table A' on GPU_2.
Then I noticed something odd: the data transfer overhead increases linearly with the size of table A, even though we always transfer exactly 1000 bytes from table A. Has anyone else seen this? Is there a solution?
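
For concreteness, here is a host-side C sketch of how a rectangular transfer selects its bytes. It is not OpenCL code and not the code I actually run: `rect_copy` and `rect_span` are hypothetical helpers that only loosely mirror the origin / region / row-pitch arguments of clEnqueueReadBufferRect, and the layout used in the note after it (1000 bytes as 250 rows of 4 bytes) is an assumed example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side model of a 2D rectangular copy (hypothetical helper, not an
 * OpenCL call). Copies `rows` rows of `width_bytes` bytes each, starting at
 * byte column src_x of row src_y in a source whose rows are src_row_pitch
 * bytes apart. Returns the number of bytes actually copied. */
static size_t rect_copy(const unsigned char *src, size_t src_row_pitch,
                        size_t src_x, size_t src_y,
                        unsigned char *dst, size_t dst_row_pitch,
                        size_t width_bytes, size_t rows)
{
    size_t copied = 0;
    for (size_t r = 0; r < rows; ++r) {
        const unsigned char *s = src + (src_y + r) * src_row_pitch + src_x;
        unsigned char *d = dst + r * dst_row_pitch;
        memcpy(d, s, width_bytes);   /* one contiguous piece per row */
        copied += width_bytes;
    }
    return copied;
}

/* Distance in bytes from the first to one past the last source byte touched
 * (hypothetical helper). Depends on the row pitch, i.e. on the table width. */
static size_t rect_span(size_t src_row_pitch, size_t width_bytes, size_t rows)
{
    assert(rows > 0 && width_bytes > 0 && width_bytes <= src_row_pitch);
    return (rows - 1) * src_row_pitch + width_bytes;
}
```

With the assumed layout, `rect_span(4, 4, 250)` is 1000 bytes, while `rect_span(1000, 4, 250)` is 249004 bytes. So the payload stays at 1000 bytes, but the region of table A that the copy walks over grows with the row pitch. Whether that is what causes the overhead I measured is not something this sketch establishes.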