FPU performance on scalar, non-FMA code

What is the highest theoretical FLOP/(cycle*core) rate on EPYC processors for code that uses neither vectorization nor FMA, for example x87 instructions or scalar SSE instructions that operate only on the first element of the register?
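
To make the metric concrete, here is a rough sketch of how such a peak could be counted. The symbols are assumptions introduced for illustration and are not claims about any particular EPYC generation: $N_{\text{add}}$ and $N_{\text{mul}}$ are the numbers of scalar floating-point add and multiply instructions a core can start per cycle, $N_{\text{fma}}$ is the number of FMA instructions it can start per cycle, $w$ is the vector width in bits, and $b$ is the element size in bits.

$$
R_{\text{scalar}} = (N_{\text{add}} + N_{\text{mul}}) \cdot 1 \ \frac{\text{FLOP}}{\text{cycle} \cdot \text{core}}
$$

$$
R_{\text{vector,FMA}} = N_{\text{fma}} \cdot 2 \cdot \frac{w}{b} \ \frac{\text{FLOP}}{\text{cycle} \cdot \text{core}}
$$

Under this counting, each scalar non-FMA instruction contributes exactly one FLOP, so the answer reduces to how many such instructions the core can start per cycle. As a purely hypothetical comparison, $N_{\text{fma}} = 2$, $w = 256$ and $b = 64$ would give $2 \cdot 2 \cdot 4 = 16$ FLOP/(cycle*core) for vectorized FMA code, and the scalar non-FMA figure lacks both the factor of 2 from FMA and the factor $w/b$ from vectorization.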