Most machine learning (ML) engineers develop their models in the single-precision (FP32) datatype. TensorFloat32 (TF32) has recently become popular as a drop-in replacement for these FP32-based models. However, there is a pressing need to deliver additional performance gains for these models by using faster datatypes, such as BFloat16 (BF16), without requiring additional code changes.
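To make the contrast concrete, the sketch below (a minimal example assuming PyTorch on a CUDA device; the flags and `autocast` context shown are standard PyTorch APIs, not the mechanism proposed here) illustrates that TF32 can be enabled globally with no model changes, whereas BF16 today typically requires at least a small code change such as an autocast region.

```python
import torch
import torch.nn as nn

# Illustrative model and input; assumes a CUDA-capable GPU is available.
model = nn.Linear(1024, 1024).cuda()
x = torch.randn(8, 1024, device="cuda")

# TF32 acts as a drop-in replacement for FP32 matmuls/convolutions:
# flipping these global flags requires no changes to the model code.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
y_tf32 = model(x)

# BF16, by contrast, usually requires wrapping compute in an autocast
# region, i.e. an explicit (if small) code change.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y_bf16 = model(x)
```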