Exact meaning of ALU latency measurements?

Question asked by maxdz8 on Apr 17, 2015
Latest reply on Apr 20, 2015 by maxdz8

In section 6.6.1 of the APP guide, "Hiding ALU and Memory Latency", I read:

The read-after-write latency for most arithmetic operations (a floating-point add, for example) is only four cycles.

Read-after-write: since SI devices take four cycles to execute an instruction, my understanding is that they WRITE the result (that is, the first 16-WI slice of it) after four clocks, and that the register is somehow marked "being written" so it cannot be read in the meantime.

Or, in other terms, I cannot use a value immediately after computing it and must put at least one intervening instruction in between.
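
To make the arithmetic behind the two readings concrete, here is a toy timing model in Python. It is only a sketch under assumptions: a 64-WI wavefront is fed through a 16-lane SIMD one slice per cycle, and a slice's result becomes readable a fixed number of cycles after that slice starts. The function names and the pipeline behaviour are hypothetical and are not taken from the APP guide; whether real hardware behaves like this is exactly what is being asked.

```python
# Toy model only: the pipeline behaviour and all names are illustrative assumptions.
WAVEFRONT_SIZE = 64   # work-items per wavefront on SI/GCN
SIMD_LANES = 16       # work-items one SIMD processes per cycle


def issue_cycles(wavefront_size=WAVEFRONT_SIZE, lanes=SIMD_LANES):
    """Cycles needed to push every 16-WI slice of one instruction through the SIMD."""
    return wavefront_size // lanes


def dependent_stall(raw_latency, wavefront_size=WAVEFRONT_SIZE, lanes=SIMD_LANES):
    """Extra cycles a back-to-back dependent instruction would wait in this model.

    Slice k of the producer starts at cycle k, and its result is assumed to be
    readable at cycle k + raw_latency. Without a stall, slice k of the consumer
    would start at cycle issue_cycles() + k.
    """
    return max(0, raw_latency - issue_cycles(wavefront_size, lanes))


print(issue_cycles())       # 4
print(dependent_stall(4))   # 0
print(dependent_stall(8))   # 4
```

In this model a read-after-write latency of four cycles is fully covered by the four cycles it takes to issue the producing instruction, so a dependent instruction could follow directly; an intervening instruction would only be needed if the latency exceeded the issue time.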


Is this correct?


I am not very familiar with the GCN ISA, but I had some kernels which seemed to use wait instructions for no apparent reason.

I've also measured some performance increase in a case where I manually merged two WIs... but again I have no real evidence.
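
To show what "merging two WIs" means here, the following Python sketch simulates a kernel body once with one element per work-item and once with two elements per work-item. The kernel body is made up for illustration and is not the actual kernel; `run`, `kernel_one_per_wi` and `kernel_two_per_wi` are hypothetical names. The point is only that the merged version contains two independent dependency chains, so each instruction has an unrelated neighbour that could be scheduled between dependent operations.

```python
def kernel_one_per_wi(gid, src, dst):
    """One element per work-item: every line depends on the one before it."""
    x = src[gid]
    x = x * 2.0 + 1.0   # depends on the load
    x = x * x           # depends on the previous line
    dst[gid] = x


def kernel_two_per_wi(gid, src, dst):
    """Two elements per work-item: chains a and b are independent of each other."""
    a = src[2 * gid]
    b = src[2 * gid + 1]
    a = a * 2.0 + 1.0
    b = b * 2.0 + 1.0   # does not depend on the line above
    a = a * a
    b = b * b           # does not depend on the line above
    dst[2 * gid] = a
    dst[2 * gid + 1] = b


def run(kernel, global_size, src):
    """Serially emulate launching `kernel` over `global_size` work-items."""
    dst = [0.0] * len(src)
    for gid in range(global_size):
        kernel(gid, src, dst)
    return dst


src = [float(i) for i in range(8)]
# Half as many work-items, same results.
assert run(kernel_one_per_wi, 8, src) == run(kernel_two_per_wi, 4, src)
```

This only demonstrates that the two forms compute the same thing; it says nothing about whether the merged form is actually faster on a given device.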


I'll have to write a few things next week, so I figured it was a good time to ask. Thank you for any input.