Archives Discussions

leisha · 10-23-2015

I have to process a huge amount of independent, similar data in a future product and would like to get rid of the PCI bottleneck, which means most of the computation has to be done before the data is sent to the HW. The R-series or Zen-series AMD APU products seem to be the best candidates. However, I am not sure what the architecture of a CU (GPU) is. It probably does not look like the NVIDIA Fermi architecture. Could you briefly describe the architecture of the fused GPU? The key question is how many threads or ALUs there are per CU.

In addition, is it possible to do general-purpose programming on the fused GPU, via CUDA, OpenCL, C++ ACC/AMP, and so on?

Thanks

bridgman · 07-17-2016

Here is an overview of the GCN core including compute units. The first few pages talk about GCN relative to earlier AMD GPU cores so you might want to fast-forward through those pages if you haven't worked with our VLIW shader cores:

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

TL;DR version: each CU contains 4 16-way SIMDs, referred to as Vector ALUs, along with a shared scalar unit. Each SIMD/VALU works on 64-item wavefronts, performing 64 operations (1 VALU instruction) in 4 clocks along with optional scalar, branch, and other instructions. Each SIMD/VALU has 10 associated program counters and can switch to a different thread on every VALU (4 clock) boundary to hide latency.
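To put numbers on the "how many threads or ALUs per CU" question, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The function names (`cu_summary`, `wavefronts_needed`) are made up for illustration, and the sketch assumes each of the 10 program counters per SIMD corresponds to one resident wavefront. In practice, register and local memory usage can reduce how many wavefronts actually fit.

```python
# Figures taken from the description above (one GCN compute unit).
SIMDS_PER_CU = 4       # 16-way SIMDs (vector ALUs) per CU
LANES_PER_SIMD = 16    # ALU lanes per SIMD
WAVEFRONT_SIZE = 64    # work-items per wavefront
PCS_PER_SIMD = 10      # program counters per SIMD
                       # (assumption: one resident wavefront per counter)


def cu_summary():
    """Derive per-CU totals from the figures above (illustrative helper)."""
    alu_lanes_per_cu = SIMDS_PER_CU * LANES_PER_SIMD
    # A 64-item wavefront on a 16-lane SIMD needs 64 / 16 passes.
    clocks_per_valu_instruction = WAVEFRONT_SIZE // LANES_PER_SIMD
    max_wavefronts_per_cu = SIMDS_PER_CU * PCS_PER_SIMD
    max_work_items_per_cu = max_wavefronts_per_cu * WAVEFRONT_SIZE
    return {
        "alu_lanes_per_cu": alu_lanes_per_cu,
        "clocks_per_valu_instruction": clocks_per_valu_instruction,
        "max_wavefronts_per_cu": max_wavefronts_per_cu,
        "max_work_items_per_cu": max_work_items_per_cu,
    }


def wavefronts_needed(n_items):
    """Wavefronts required for n_items independent work-items (rounded up)."""
    return -(-n_items // WAVEFRONT_SIZE)
```

Under these assumptions, a CU has 64 vector ALU lanes and can keep up to 40 wavefronts (2560 work-items) in flight, which is what lets it hide memory latency by switching between them. `wavefronts_needed(n)` just rounds a problem size up to whole 64-item wavefronts.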

The primary programming models for our GPUs are OpenCL and HCC/HIP:

http://gpuopen.com/compute-product/hcc-heterogeneous-compute-compiler/

For R-series, What is the architecture for a CU (GPU)?