I was looking at GEMM_Shaders.h in acmlgpu1-1, which is used to build the libCALBLAS library in the "./src/libCALBLAS" subdirectory.
The "szDGEMM_Mult" kernel has 8 inputs (4 for A and 4 for B) and 8 outputs (8 'o' registers for C). Why does the declaration section declare only 4 'o' registers, and why do compiling and running it cause no problems? Also, when I change it to declare all 8 outputs, compiling and running still work correctly. However, when I change the kernel to "il_cs_2_0", compilation fails.
I'm confused now. Thanks in advance for any reply.