1 Reply Latest reply on Sep 14, 2011 5:06 AM by realhet

    CAL Mem Transfer, calResAlloc


      AMD Forum,

      ====== part 1 ======

      I want to do a quick process in CAL/IL and then bring the results back to the CPU.  What is the best way to do this other than:


      memmove(...); // or do a loop with pointer indexing


       => This becomes very expensive at runtime.

      ====== part 2 ======

      What is the difference between calResAllocLocal and calResAllocRemote?  I understand that Local uses GPU memory and Remote uses CPU memory, but to put the results back into a CPU array I can use the calResMap, memmove, calResUnmap sequence on either kind of resource. So why would I ever want to use the slower calResAllocRemote?

      ====== part 3 ======

      What's the point of calMemCopy, other than being asynchronous?  When would I ever want to move data from a local resource to a global resource if I can access both from the CPU via calResMap?


      THANKS! I greatly appreciate it!


        • CAL Mem Transfer, calResAlloc


          part1: You can use SSE instructions to copy on the CPU side, which is much faster, but there is a better way anyway: pinned memory.
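
A minimal sketch of what an SSE copy on the CPU side could look like, assuming an x86 CPU with SSE2, 16-byte-aligned pointers, and a size that is a multiple of 16. The helper name `sse_copy_aligned` is made up for illustration and is not part of CAL. Whether this actually beats the C library's memmove depends on the platform and on where the mapped memory lives, so measure before relying on it.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CAL function): copy `bytes` bytes from src to
   dst using 128-bit SSE2 moves. Assumes both pointers are 16-byte aligned
   and that `bytes` is a multiple of 16. */
void sse_copy_aligned(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; ++i) {
        __m128i v = _mm_load_si128(s + i);  /* aligned 16-byte load */
        _mm_stream_si128(d + i, v);         /* non-temporal store, skips the cache */
    }
    _mm_sfence();  /* order the streaming stores before later stores */
}
```

In the map/copy/unmap pattern from the question, a call like `sse_copy_aligned(cpu_array, mapped_ptr, size)` would take the place of the memmove, provided the alignment assumptions hold for both pointers.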

          part2: Memory allocated with ResAllocLocal resides in GPU RAM, and memory allocated with ResAllocRemote resides in system RAM. If the CPU wants to use Local memory, it has to be transferred over the PCIe bus first. The same goes for the GPU and Remote (CPU-side) memory.

          There is a third kind of allocation, which I guess is the best one: pinned memory.

          - Allocated with calResCreate1D or 2D (it's an extension)

          - You have to provide the System memory for it (watch out for 4K alignment!!!)

          - No need to use calResMap. This memory is synchronised transparently between GPU and CPU. (I guess the AMD driver actively cooperates with the OS's memory paging.)

          - And surprisingly, this is the fastest way of transferring memory that I have experienced.
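
A minimal sketch of the host side of the pinned-memory approach: getting a 4K-aligned system-memory buffer that could then be handed to the calResCreate1D/2D extension. It assumes a C11 library that provides `aligned_alloc`. The helper names `round_up_4k` and `alloc_4k_aligned` are made up for illustration, and the CAL call itself is left out on purpose, since its exact arguments should be taken from the CAL headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_ALIGN 4096u  /* the 4K alignment mentioned above */

/* Hypothetical helper: round n up to the next multiple of 4096. */
size_t round_up_4k(size_t n)
{
    return (n + PAGE_ALIGN - 1) & ~(size_t)(PAGE_ALIGN - 1);
}

/* Hypothetical helper: allocate at least `bytes` bytes starting on a 4K
   boundary. C11 aligned_alloc wants the size to be a multiple of the
   alignment, so the request is padded first. Release with free(). */
void *alloc_4k_aligned(size_t bytes)
{
    if (bytes == 0) {
        return NULL;
    }
    void *p = aligned_alloc(PAGE_ALIGN, round_up_4k(bytes));
    assert(p == NULL || ((uintptr_t)p % PAGE_ALIGN) == 0);
    return p;
}
```

The buffer has to stay alive for as long as the resource created on top of it is in use, so call free() only after the resource has been released.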

          part3: You can copy memory from Local to Local while the CPU does whatever you want. Also, when copying between Local and Remote, the copy is done asynchronously by the DMA controller. You can also queue memory-transfer and run-program events.

          Hope it helps.