I wrote a program that uses a single context and multiple queues/devices (one queue per device). I use OpenMP to create one thread per queue.
I have several read_only memory objects shared across the queues, plus a set of read_write objects for each queue.
I noticed that the kernels (the same kernel on multiple devices) seem to run sequentially even though the threads enqueue them in parallel. This happens when the read_only objects are shared between devices. Is this the correct behaviour? It would make sense for read_write objects, since two kernels shouldn't update them at the same time. But shouldn't OpenCL be able to tell that this is not a problem for read_only objects?
I have changed the program so that each queue also has its own read_only objects. Now the kernels appear to run concurrently. However, a side effect seems to be that all of the read_only objects are copied to both devices, even when they won't be used on that card.