i got an dedicated GPU but so far i had no success using it with HSA.
I guess its not supposed to work because the cache coherence is only garantied
for the APU.
The role of the IOMMU IP block is to act as a MMU for the PCI devices. In other words it lies between the PCIe device (in your case GPU) and the RC (Root Complex  and does the translations as per what tables it has. For example the GPU instructs the DMA engines to write to memory region 0x1000, the request gets sent via the PCIe lanes and the IOMMU block would translate 0x1000 to another address like 0x2E000 000. The GPU thinks it is writing at 0x1000 through the DMA - the GPU only writes btw using the DMA to RAM (CPU/system memory) and using its memory controller to VRAM (its memory).
Coherency would be a big problem given the memory systems are different - and even if that would be solved, it would provide no real performance benefits that I can think of (given the PCIe transfers need to take place anyway).