3 Replies Latest reply on Apr 27, 2009 5:35 AM by attilagenc

    Direct Video Memory <-> Video Memory Copy


      Hi everybody,

      I have a question.

      I have a PC with two HD4870X2 cards, for a total of 4 RV770 GPUs in the machine. I have written a kernel for parallel soft tissue simulation, and as expected, the required frequent data exchange can be a bottleneck.

      On the software side, I use an architecture very close to diagram 3.11 in the Stream Computing User Guide (rev. 1.4.0), with one CPU-thread per GPU.

      So the question is: is there a way to copy a memory resource resident in one video memory directly to the other video memory on the same card, without ever going across the PCIe-bus?

      Likewise, is there a way to copy a resource on one card directly to the other card via the PCIe bus, without going through system memory?

      Thanks a lot

        • Direct Video Memory <-> Video Memory Copy

          Which API are you using, Brook+ or CAL?

          CAL supports DMA transfers from one card to another, but they are too slow. I am not sure how this is implemented; it is possible that it also goes through system memory.

          Brook+ exposes a method, Stream::assign(), to copy data from one GPU to another, but this also goes through system memory because the CAL implementation is too slow. This may change in the future if calMemCopy gives better performance.

            • Direct Video Memory <-> Video Memory Copy


              Thanks for your answer. I'm using CAL.

              There is a nice diagram on Tom's Hardware, and the "sideport" shown there was what I had in mind for fast copies between the memories on the same board. The article also states that this feature was apparently not available at that time, but it would help tremendously with my bandwidth problems.

              Other than that: is there a way to use the path through the PCIe bridge from one GPU to the other? That would be the second-best solution.


                • Direct Video Memory <-> Video Memory Copy


                  How did you manage to copy data to the GPUs separately? From your question I assume that you have some data on one GPU and want to copy it to the other GPU without going through system memory, which implies that one GPU holds data that the other does not. According to my observations, that cannot be the case with current drivers.


                  I also have a 4870X2, and even though CAL/Brook+ reports two GPUs, you cannot allocate resources on just one of them. It looks like when a stream allocation is requested, the allocation occurs in the memory of both GPUs, since the available local memory decreases on both. This is bad because, although there is 2 GB of physical memory, you can use only 1 GB.


                  I suggest you add some calDeviceGetStatus calls before and after resource allocations and check for yourself.
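
                  A rough sketch of how that check could be evaluated, in plain C so it runs without the SDK. The LocalRamSample struct and both helper functions are hypothetical names made up for illustration; the before/after numbers would be filled from the availLocalRAM field that calDeviceGetStatus reports for each device (assumed here to be in MB). It only encodes the reasoning above: if one allocation makes free local memory drop on more than one GPU, the allocation looks mirrored.

```c
#include <stddef.h>

/* Hypothetical record (not part of CAL): free local memory of one GPU,
 * sampled before and after a single resource allocation. In a real test
 * the values would be copied from availLocalRAM as filled in by
 * calDeviceGetStatus; the unit is assumed to be MB. */
typedef struct {
    unsigned int before_mb;
    unsigned int after_mb;
} LocalRamSample;

/* Count the devices whose free local memory dropped by at least
 * min_drop_mb between the two samples. */
static size_t count_devices_charged(const LocalRamSample *samples,
                                    size_t device_count,
                                    unsigned int min_drop_mb)
{
    size_t charged = 0;
    for (size_t i = 0; i < device_count; ++i) {
        if (samples[i].before_mb >= samples[i].after_mb &&
            samples[i].before_mb - samples[i].after_mb >= min_drop_mb) {
            ++charged;
        }
    }
    return charged;
}

/* Returns 1 if a single allocation appears to have consumed memory on
 * more than one device, 0 otherwise. */
static int allocation_looks_mirrored(const LocalRamSample *samples,
                                     size_t device_count,
                                     unsigned int min_drop_mb)
{
    return count_devices_charged(samples, device_count, min_drop_mb) > 1;
}
```

                  Choosing min_drop_mb somewhat below the size of the test allocation should filter out small fluctuations unrelated to the allocation itself.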


                  And just in case I am wrong, can you explain how you managed to allocate a resource on just one GPU?