2 Replies Latest reply on Mar 28, 2017 10:29 AM by glupescu

    AMD HSA - hUMA with extra dedicated GPU?

    apfel

      Hi,

      I have an AMD A10-7850K and wanted to test and play with

      GitHub - HSAFoundation/CLOC: CL Offline Compiler : Compile OpenCL kernels to HSAIL

       

      I also thought about getting a dedicated GPU, such as an

      RX 480 or RX 460, to improve the potential performance. Is

      this possible? I mean, does the IOMMUv2 enable the dedicated

      GPU to use the system memory as real shared memory, like

      the integrated GPU in the APU?

       

      PS: I was not able to find an official document about IOMMUv2;

      maybe I just missed it. Also, information about how cache

      coherence is implemented in the APU between the CPU and GPU

      cores is somewhat limited. I only found third-party websites with

      information on that. Maybe somebody can point me to a good

      source.

       

      Thanks!

       

      BR

      Simon

        • Re: AMD HSA - hUMA with extra dedicated GPU?
          apfel

          OK,

           

          I got a dedicated GPU, but so far I have had no success using it with HSA.

           

          I guess it's not supposed to work because cache coherence is only guaranteed

          for the APU.

           

          BR

          Simon

            • Re: AMD HSA - hUMA with extra dedicated GPU?
              glupescu

              The role of the IOMMU IP block is to act as an MMU for PCI devices. In other words, it sits between the PCIe device (in your case the GPU) and the RC (Root Complex [1]) and translates addresses according to the tables it has been given. For example, the GPU instructs its DMA engines to write to memory region 0x1000; the request is sent over the PCIe lanes, and the IOMMU block translates 0x1000 to another address, such as 0x2E000000. The GPU still thinks it is writing to 0x1000 through the DMA. By the way, the GPU only writes to RAM (CPU/system memory) through the DMA engines, and to VRAM (its own memory) through its own memory controller.
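
              As a rough illustration of that translation step, here is a toy model in Python. It is only a sketch: the ToyIommu class and its method names are made up for this post, 4 KiB pages are an assumption, and real IOMMU hardware uses multi-level tables set up by the OS driver rather than a dictionary.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, for illustration only


class ToyIommu:
    """Hypothetical single-level table mapping device pages to physical pages."""

    def __init__(self):
        self.table = {}  # device page number -> physical page number

    def map(self, io_addr, phys_addr):
        # Mappings are page-granular, so both addresses must be page-aligned.
        if io_addr % PAGE_SIZE or phys_addr % PAGE_SIZE:
            raise ValueError("addresses must be page-aligned")
        self.table[io_addr // PAGE_SIZE] = phys_addr // PAGE_SIZE

    def translate(self, io_addr):
        # Split the device-visible address into page number and offset.
        page, offset = divmod(io_addr, PAGE_SIZE)
        if page not in self.table:
            # An unmapped access is rejected instead of reaching memory.
            raise PermissionError(f"IOMMU fault: no mapping for {io_addr:#x}")
        return self.table[page] * PAGE_SIZE + offset


iommu = ToyIommu()
iommu.map(0x1000, 0x2E000000)  # the example addresses from above

# The device issues a DMA write to 0x1010; memory sees the translated address.
print(hex(iommu.translate(0x1010)))
```

              The device only ever sees its own address (0x1000 and up); where the data actually lands in system memory is decided by the table. Note that this says nothing about cache coherency, it is only address translation.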

               

              Coherency would be a big problem given that the memory systems are different. Even if that were solved, it would provide no real performance benefit that I can think of, since the PCIe transfers need to take place anyway.

               

              [1] Root complex - Wikipedia