According to the AMD64 manuals:
"The size of the count register used (CX, ECX, or RCX) depends on the address-size attribute of the LOOP
What does this mean? In 32-bit environments, is this always ECX?
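AFAIK, no. In 32-bit environments you can choose between the native 32-bit form, which counts in ECX, and the legacy 16-bit form, which counts in CX and is selected by the 67h address-size prefix:
32-bit form: 0E2h, <byte offset>
16-bit form: 67h, 0E2h, <byte offset>
(Opcode 0E3h is JCXZ/JECXZ, which selects its count register in the same way.)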
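To make the rule concrete, here is a minimal Python sketch of how the count register follows from the CPU mode and the presence of the 67h prefix. The function name `loop_count_register` and the lookup tables are hypothetical, made up for illustration. It also assumes a simplified encoding in which 67h, if present, is the only prefix and comes first.

```python
# Illustrative sketch only: the names below are hypothetical, not from any manual or library.
# Short-branch opcodes whose count register depends on the address-size attribute.
COUNTED_OPCODES = {0xE0: "LOOPNE", 0xE1: "LOOPE", 0xE2: "LOOP", 0xE3: "JCXZ/JECXZ/JRCXZ"}

# CPU mode -> (default address size, address size when the 67h prefix is present)
ADDRESS_SIZES = {16: (16, 32), 32: (32, 16), 64: (64, 32)}

# Address size -> count register
COUNT_REGISTERS = {16: "CX", 32: "ECX", 64: "RCX"}


def loop_count_register(code: bytes, mode: int) -> str:
    """Return the count register used by the LOOP-family instruction in `code`.

    Simplifying assumption: at most one prefix, 67h, and it comes first.
    """
    default_size, overridden_size = ADDRESS_SIZES[mode]
    has_prefix = len(code) > 0 and code[0] == 0x67
    opcode_index = 1 if has_prefix else 0
    if len(code) <= opcode_index or code[opcode_index] not in COUNTED_OPCODES:
        raise ValueError("not a LOOP/LOOPE/LOOPNE/JCXZ encoding")
    size = overridden_size if has_prefix else default_size
    return COUNT_REGISTERS[size]


if __name__ == "__main__":
    for mode in (16, 32, 64):
        plain = loop_count_register(bytes([0xE2, 0xFE]), mode)
        prefixed = loop_count_register(bytes([0x67, 0xE2, 0xFD]), mode)
        print(f"{mode}-bit mode: LOOP uses {plain}, 67h LOOP uses {prefixed}")
```

Note that under this rule CX is not reachable in 64-bit mode, because the 67h prefix there selects a 32-bit address size, and therefore ECX.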