2 Replies Latest reply on Mar 10, 2017 12:07 PM by tugrul_512bit

    GCN: 4 work-items per compute unit, each computing on float16


      Normally I launch 256 work-items per work-group, and the 64 cores of a compute unit execute them. What if I launch only 4 work-items per work-group instead, each doing float16 math? Does the compiler map each work-item onto a whole 16-wide SIMD, in any driver or GCN version?
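
      To make the numbers behind the question concrete, here is a rough sketch of the lane arithmetic. It assumes the commonly documented GCN figures (64 work-items per wavefront, 16 lanes per SIMD) and assumes each work-item occupies exactly one lane, i.e. that a float16 is NOT spread across the lanes of a SIMD. Whether any driver or GCN version does otherwise is exactly what is being asked, so treat this as the baseline case only. The function names are made up for illustration.

```c
#include <assert.h>

/* Assumed GCN figures, not queried from a device:
 * 64 work-items per wavefront, 16 lanes per SIMD. */
enum { WAVEFRONT_SIZE = 64, SIMD_WIDTH = 16 };

/* Wavefronts needed to hold one work-group (rounded up). */
static int wavefronts_per_group(int local_size) {
    assert(local_size > 0);
    return (local_size + WAVEFRONT_SIZE - 1) / WAVEFRONT_SIZE;
}

/* Percentage of wavefront lanes that carry a real work-item
 * (integer division, so the result is rounded down).
 * Assumption: one work-item per lane, float16 kept inside that lane. */
static int lane_utilization_percent(int local_size) {
    int slots = wavefronts_per_group(local_size) * WAVEFRONT_SIZE;
    return 100 * local_size / slots;
}

/* Cycles a 16-wide SIMD needs to issue one instruction
 * for a full 64-wide wavefront. */
static int cycles_per_wavefront_instruction(void) {
    return WAVEFRONT_SIZE / SIMD_WIDTH;
}
```

      Under those assumptions, 256 work-items fill 4 wavefronts completely, while 4 work-items still take 1 whole wavefront and leave 60 of its 64 lanes idle, regardless of how wide the per-item vector type is.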