I have watched the video comparing Bulldozer with Hyper-Threading, but I still cannot understand the reasons for rejecting per-core multithreading. Any explanations?
My main objection is "serialisation": each core (each separate half-core in the Bulldozer core diagram) is forced to execute originally parallel threads one after another. So the more threads a programmer creates, the more he is penalised.
What I mean is that, with one thread per half-core, the half-core cannot overlap the following five separate time slots:
- end of the previous thread's execution;
- CPU MMU context switch;
- OS scheduler's execution;
- next CPU MMU context switch;
- start of the next user thread's execution.
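To make the five slots above concrete, here is a toy timing model. The durations, the `swap_cost` parameter and both function names are my own assumptions for illustration, not measured values for any real CPU. It only shows the arithmetic: if the scheduler and the two MMU switches could be prepared in a second hardware context while the current thread is still running, only the thread's own end/start work and a short swap would remain visible.

```python
# Hypothetical slot durations in arbitrary time units (not measurements).
SLOTS = {
    "end_of_previous_thread": 2,
    "mmu_switch_out": 3,
    "os_scheduler": 5,
    "mmu_switch_in": 3,
    "start_of_next_thread": 2,
}


def serialized_switch_cost(slots):
    """One thread per half-core: the five slots run strictly one after another."""
    return sum(slots.values())


def overlapped_switch_cost(slots, swap_cost=1):
    """Assumed two contexts per half-core: the scheduler and both MMU switches
    are hidden behind the running thread, so only the remaining slots plus a
    short swap (swap_cost, an assumption) stay on the critical path."""
    hidden = ("mmu_switch_out", "os_scheduler", "mmu_switch_in")
    visible = sum(cost for name, cost in slots.items() if name not in hidden)
    return visible + swap_cost


serialized = serialized_switch_cost(SLOTS)
overlapped = overlapped_switch_cost(SLOTS)
print(serialized, overlapped)
```

With these made-up numbers the serialized switch costs 15 units and the overlapped one 5; the real ratio depends entirely on the hardware and the OS.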
There are also asynchronous messages (events, interrupts) that the OS scheduler has to service, but most of the time the service routine only places the message into the program's message queue.
Duplicating the lightweight or frequently used CPU units into separate half-cores is a very interesting and useful hardware idea. But the OS scheduler activity just mentioned (the ISR) is even simpler: in fact we only need one very simple CPU unit that recognises the ISR request and starts servicing it with a trivial opcode set, just enough to decide whether the event needs a full, complete ISR or only needs the message to be queued.
The same simple actions are also enough to perform a prepared thread switch or a thread's task-side IPC.
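The interrupt triage described above can be sketched in a few lines. This is only an illustration of the decision, written in software; the event kinds in `NEEDS_FULL_ISR`, the `triage` function and the `full_isr` callback are hypothetical names of mine, not part of any real OS or CPU interface.

```python
from collections import deque

# Hypothetical event kinds that need the full ISR; everything else is only queued.
NEEDS_FULL_ISR = {"page_fault", "timer_tick"}


def triage(event, message_queue, full_isr):
    """Decide cheaply whether an event needs the complete ISR or only queuing."""
    if event["kind"] in NEEDS_FULL_ISR:
        full_isr(event)              # heavy path: run the complete service routine
        return "full_isr"
    message_queue.append(event)      # light path: just put the message in the queue
    return "queued"


queue = deque()
handled = []
print(triage({"kind": "key_press"}, queue, handled.append))
print(triage({"kind": "page_fault"}, queue, handled.append))
```

The point is that the light path touches nothing but the queue, so a very small unit could run it without disturbing the thread that is executing.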
But I think it is quite important to overlap the preparation of the next thread with the execution of the previous (current) thread, in order to hide the program's internal logical structure during execution and to let the programmer feel free to write parallel code.
It looks like fairly easy hardware to keep a double context (for two threads) in each half-core and to switch the active thread in the time of a nop opcode.
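As a rough picture of what I mean by a double context, here is a toy software model. The `HalfCore` class and its methods are hypothetical and say nothing about how real hardware is built; they only show that switching becomes a flip of an index when both contexts are already resident.

```python
class HalfCore:
    """Toy model of a half-core holding two thread contexts (an assumption)."""

    def __init__(self, context_a, context_b):
        self.contexts = [context_a, context_b]
        self.active = 0  # index of the context that is currently executing

    def switch(self):
        """Flip the active context; nothing is saved to or restored from memory."""
        self.active ^= 1
        return self.contexts[self.active]

    def preload_passive(self, new_context):
        """Replace the passive context while the active one keeps running."""
        self.contexts[self.active ^ 1] = new_context


core = HalfCore({"pc": 100}, {"pc": 200})
core.switch()                      # thread B becomes active
core.preload_passive({"pc": 300})  # prepare thread C in the now passive slot
print(core.switch()["pc"])
```

Here the expensive part (loading the next thread's context) happens in `preload_passive`, off the critical path, and `switch` itself is trivial.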
And Intel says it is even possible to execute both threads (active and passive) concurrently. Whether concurrent mode is worthwhile is something for hardware designers to work out; for the purpose described here it would be enough for the second thread to execute passively, or even for the passive thread to be stopped.
So why is AMD ignoring per-core multithreading?