There's no such latency on AMD. In every cycle it can read 3 registers and write 1 register.
The only penalty I know of is when a vector instruction that writes to a scalar register is immediately followed by a scalar instruction. That could cost 1 cycle, but the compiler will avoid this anyway.
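To make that rule concrete, here is a toy Python model of it. This is only a sketch of the rule as I described it, not a simulator of the real hardware: the `Instr` type, the `penalty_cycles` function and the two example streams are all made up for illustration, and the 1 cycle figure is just the "could be" number from above.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    unit: str                    # "vector" or "scalar" (hypothetical labels)
    writes_scalar: bool = False  # True if the instruction writes a scalar register


def penalty_cycles(stream):
    """Add 1 cycle for each vector instruction that writes a scalar register
    and is directly followed by a scalar instruction (assumed rule)."""
    penalty = 0
    for prev, cur in zip(stream, stream[1:]):
        if prev.unit == "vector" and prev.writes_scalar and cur.unit == "scalar":
            penalty += 1
    return penalty


# Vector write to a scalar register directly followed by a scalar instruction.
back_to_back = [Instr("vector", writes_scalar=True), Instr("scalar")]

# Same pair with an unrelated vector instruction scheduled in between,
# which is the kind of reordering a compiler can do to hide the penalty.
separated = [Instr("vector", writes_scalar=True), Instr("vector"), Instr("scalar")]

print(penalty_cycles(back_to_back), penalty_cycles(separated))
```

In this model the penalty only shows up when the two instructions are adjacent, which is why scheduling anything else between them makes it go away.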