I have to process a huge amount of independent, similar data in a future product and would like to get rid of the PCI bottleneck, which means most of the computation has to be done before the data is sent over to the hardware. The R-series or Zen-series AMD APU products seem to be the best candidates. However, I am not sure what the architecture of a CU (compute unit) in the GPU looks like. It probably does not look like NVIDIA's Fermi architecture. Could you briefly describe the architecture of the fused GPU? The key question is how many threads or ALUs there are per CU.
In addition, is it possible to do general-purpose programming on the fused GPU, e.g. via CUDA, OpenCL, C++ AMP, OpenACC and so on?
Thanks