These are optimized half-precision GEMM assembly kernels for AMD Fiji that use native GCN assembly to achieve much better performance than clBLAS.
Link: GitHub - hyln9/GCNGEMM: Optimized half precision gemm assembly kernels on AMD Fiji
I would love to test this. Could you say more about your environment? How have you been running it?
Hi! I'm using Ubuntu 16.04, and all the other requirements are listed in the README.