Which API are you using: Brook+ or CAL?
CAL supports DMA transfers from one card to another, but they are too slow. I am not sure how this is implemented; it is possible that the implementation also goes through system memory.
Brook+ exposes a method, Stream::assign(), to copy data from one GPU to another, but this also goes through system memory because the CAL implementation is too slow. In the future this can be changed if calMemCopy gives better performance.
Thanks for your answer. I'm using CAL.
There is a nice diagram on Tom's Hardware, and the "sideport" shown there was what I had in mind for fast copies between the memories on the same board. However, the article also states that this feature was apparently not available at that time. It would help tremendously with my bandwidth problems.
Other than that: is there a way to use the path through the PCIe bridge from one GPU to the other? That would be the second-best solution.
How did you manage to copy data to the GPUs separately? From your
question I assume that you have some data on one GPU and you
want to copy it to the other GPU without going through system memory,
which implies that one GPU holds data that the other one
does not. According to my observations that cannot be the case with
the 4870x2. I also have a 4870x2, and even though CAL/Brook+ reports two GPUs,
you cannot allocate resources on just one of them. It looks like
when a stream allocation is requested, the allocation occurs in both
GPUs' memories, since the available local memory decreases on both.
That is bad because although there is 2 GB of physical memory,
you are able to use just 1 GB.
I suggest you add some calDeviceGetStatus calls before and after resource
allocations and check for yourself.
And just in case I am wrong, can you explain how you managed to
allocate a resource on just one GPU?
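
A minimal sketch of the before/after check suggested above. It only covers the comparison step and does not call CAL itself; devices_charged is a hypothetical helper name, and the arrays are assumed to hold the free local memory of each device as reported by calDeviceGetStatus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CAL): compares per-device free local
 * memory sampled before and after one allocation. Returns a bitmask with
 * bit i set if device i has less free memory afterwards. */
static unsigned int devices_charged(const unsigned int *before,
                                    const unsigned int *after,
                                    unsigned int count)
{
    unsigned int mask = 0;
    unsigned int i;

    assert(before != NULL && after != NULL);
    assert(count <= 32); /* one bit per device in the mask */

    for (i = 0; i < count; ++i) {
        if (after[i] < before[i]) {
            mask |= 1u << i; /* device i was charged for the allocation */
        }
    }
    return mask;
}
```

In a real program, before[i] and after[i] would be filled from the availLocalRAM field of the CALdevicestatus that calDeviceGetStatus returns for device i (field name as I remember it from cal.h, so check your header). On a two-GPU board, a result of 0x3 would mean both GPUs were charged for the allocation, while 0x1 would mean only the first one was.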