A simple question: why are there only 16 general-purpose registers (GPRs) in modern x86-64 CPU designs, when these same systems have 32 ZMM registers (2048 bytes of space!)? (Assuming AVX-512.)
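A quick check of the sizes being compared. This is a minimal sketch that counts only the architectural GPRs (RAX..R15) and ZMM registers. It ignores the AVX-512 mask registers, RIP, and flags.

```python
# Architectural register-file sizes on an x86-64 core with AVX-512.
# Counts are programmer-visible registers; sizes are in bytes.
GPR_COUNT, GPR_BYTES = 16, 8   # RAX..R15, 64 bits each
ZMM_COUNT, ZMM_BYTES = 32, 64  # ZMM0..ZMM31, 512 bits each

gpr_total = GPR_COUNT * GPR_BYTES
zmm_total = ZMM_COUNT * ZMM_BYTES

print(f"GPRs: {GPR_COUNT} x {GPR_BYTES} B = {gpr_total} B")
print(f"ZMMs: {ZMM_COUNT} x {ZMM_BYTES} B = {zmm_total} B")
print(f"ratio: {zmm_total // gpr_total}x")
```

The vector state is 16 times larger than the GPR state, which is what makes the gap look odd.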
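Looking at how ZMM/YMM/XMM registers are currently encoded (the VEX and EVEX prefixes select the vector width, and EVEX supplies the extra register-number bits needed to reach 32 registers), extending the GPRs the same way seems entirely possible with current means, though it might require some redesign in CPUs that support it.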
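To see where the 16 vs 32 limit comes from, here is a simplified sketch of how a register number is assembled from instruction bits. The 3-bit ModRM.reg field gives 8 registers. REX.R adds a fourth bit (16 registers), and EVEX adds an R' bit that AVX-512 uses to reach ZMM16-ZMM31. The helper `reg_index` is hypothetical. It assumes the bits have already been extracted and un-inverted (EVEX stores R and R' inverted) and ignores every other prefix detail.

```python
def reg_index(modrm_reg: int, high_bit: int = 0, top_bit: int = 0) -> int:
    """Combine register-number bits into a register index (hypothetical helper).

    modrm_reg: the 3-bit ModRM.reg field (0-7)
    high_bit:  REX.R / EVEX.R, already un-inverted; supplies bit 3
    top_bit:   EVEX.R', already un-inverted; supplies bit 4 (vector regs)
    """
    assert 0 <= modrm_reg <= 7 and high_bit in (0, 1) and top_bit in (0, 1)
    return (top_bit << 4) | (high_bit << 3) | modrm_reg


# With a REX prefix, the largest encodable index is 15: 16 GPRs.
max_rex = reg_index(0b111, high_bit=1)
# EVEX's extra R' bit reaches index 31: 32 ZMM registers.
max_evex = reg_index(0b111, high_bit=1, top_bit=1)
print(max_rex + 1, max_evex + 1)  # 16 32
```

Each extra bit doubles the addressable register count, so giving GPRs 32 registers means finding one more encoding bit per register operand in every GPR instruction form.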
The number of architectural registers (i.e., registers visible in the programming model) needs to stay stable. Otherwise every application would need many different binaries, one for each generation of the programming model. The number of physical registers used at runtime, however, has been growing steadily.
Take first-generation Ryzen (Zen) processors as an example: there are only 16 architectural GPRs for integer operations, but the hardware has 168 physical integer registers, which it uses through register renaming. On the floating-point side there are more architectural registers (because that part of the ISA was developed more recently), yet only 160 physical registers are used at execution time.
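The idea of mapping a few architectural registers onto many physical ones can be illustrated with a toy renamer. This is a minimal sketch, not how any real core is built. The class `RenameTable` and its methods are hypothetical, and it ignores flags, in-order retirement bookkeeping, and branch-misprediction recovery. The 16/168 sizes are borrowed from the Zen integer figures above.

```python
from collections import deque


class RenameTable:
    """Toy register renamer mapping architectural to physical registers."""

    def __init__(self, num_arch: int, num_phys: int):
        assert num_phys >= num_arch
        # Initially, architectural register i lives in physical register i.
        self.map = list(range(num_arch))
        self.free = deque(range(num_arch, num_phys))

    def rename(self, dst: int, srcs: list[int]) -> tuple[int, list[int], int]:
        # Sources read the current mapping, before dst is reassigned.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical registers: rename would stall")
        old = self.map[dst]
        new = self.free.popleft()
        self.map[dst] = new
        return new, phys_srcs, old

    def retire(self, old_phys: int) -> None:
        # Once the overwriting instruction retires, nothing can read the
        # previous copy any more, so its physical register is recycled.
        self.free.append(old_phys)


RAX, RBX = 0, 1
rt = RenameTable(num_arch=16, num_phys=168)

p1, _, _ = rt.rename(RAX, [])          # mov rax, [mem1]
p2, s2, _ = rt.rename(RBX, [RBX, RAX])  # add rbx, rax   (reads p1)
p3, _, _ = rt.rename(RAX, [])          # mov rax, [mem2]
print(p1, p2, s2, p3)  # 16 17 [1, 16] 18
```

The second write to RAX lands in a different physical register than the first, so it does not have to wait for `add rbx, rax` to read the old value. This is how a small architectural register set can still keep a wide out-of-order core busy.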