Can someone explain, in as much detail as possible and from a hardware-architecture perspective, how INT32/INT64 computation is processed in an MI300 XCD?
Unlike NVIDIA's Ampere architecture, which has dedicated INT32 cores, MI300 appears to prioritize ML-friendly TF32/BF16 and INT8 operations while deprioritizing classical INT32/INT64 operations, at least according to the white paper.
I want to understand how efficient my kernels, which consist only of INT32/INT64 computations (a non-ML workload), would be on MI300.
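For context, here is a minimal sketch of the kind of kernel I mean: purely integer arithmetic (32-bit operands widened into a 64-bit result), no floating point and no matrix/ML instructions. All names and sizes are just illustrative, not my actual workload.

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>
#include <cstdio>
#include <vector>

// Purely integer kernel: 32-bit multiply widened into a 64-bit result.
__global__ void int_mad(const int32_t* a, const int32_t* b, int64_t* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // 32-bit loads, 64-bit multiply-accumulate; no FP or MFMA involved.
        int64_t prod = static_cast<int64_t>(a[idx]) * static_cast<int64_t>(b[idx]);
        out[idx] = prod + static_cast<int64_t>(idx);
    }
}

int main()
{
    const int n = 1 << 20;
    std::vector<int32_t> ha(n, 3), hb(n, 7);
    std::vector<int64_t> hout(n, 0);

    int32_t *da, *db;
    int64_t *dout;
    hipMalloc(&da, n * sizeof(int32_t));
    hipMalloc(&db, n * sizeof(int32_t));
    hipMalloc(&dout, n * sizeof(int64_t));
    hipMemcpy(da, ha.data(), n * sizeof(int32_t), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(int32_t), hipMemcpyHostToDevice);

    int_mad<<<(n + 255) / 256, 256>>>(da, db, dout, n);
    hipMemcpy(hout.data(), dout, n * sizeof(int64_t), hipMemcpyDeviceToHost);

    printf("out[0] = %lld\n", (long long)hout[0]);

    hipFree(da);
    hipFree(db);
    hipFree(dout);
    return 0;
}
```

What I would like to know is how the VALU in an MI300 XCD handles this kind of code: whether INT32 ops run at the same rate as FP32, and how INT64 operations are decomposed or slowed down relative to INT32.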