Hey AMD, there is a muddle in some people's heads. Of course, it'd be nice for you to market your 4-module / 8-thread Bulldozer as an 8-core CPU, but I believe this will confuse a lot of people. They are already used to comparing CPUs core to core, so if you keep calling your 4-module / 8-thread Bulldozer an 8-core CPU, there will be a lot of disappointment like this:
"Heck, my 8-core Bulldozer is slower than a 6-core Gulftown. WTF?!"
My point is: please stop calling the Bulldozer cores "cores." Yes, they are almost real cores, so yes, they deserve the term, but using it would be a very bad marketing mistake on your part, AMD. Instead, for Bulldozer, replace the term "core" with "thread," like this:
"4 modules / 8 threads" instead of "4 modules / 8 cores"
If you do this, AMD, people will compare Bulldozer with other CPUs using their usual "apples-to-apples" method (i.e. "cores-to-cores" or, more precisely, "threads-to-threads"), and from that point of view Bulldozer will look more successful.
BTW, if you want to coin a name for your "1 module = 2 cor.. er.. threads" technology, how about "Double Threading"?
OK, here is another suggestion: if you don't want to replace "cores" with "threads," why not use "integer cores" or even "ALUs"? Check it out:
"AMD FX-8xxx CPU. 8 integer cores (ALUs) and 4 flex floating point cores (FPUs)."
Let me remind you, AMD, that the marketing name you want to use ("cores" instead of "threads" or "integer cores") can easily disappoint your customers if Bulldozer turns out to be slower than Sandy Bridge in even one test. I repeat: it would be a big mistake! If you don't believe me, ask the people on your own forums before it's too late.
I think this issue is confusing enough already, even without AMD's naming convention. Since a dual-core Hyper-Threaded processor shows up in the operating system as four CPUs (and that is what most users see), the line between a "core" and a "real core" is very thin.
Also, many IT sites test and compare Intel Core iX and Phenom II YX processors on a thread-for-core basis, which is very misleading, but somewhat rational. If a 4-thread Intel is on par with a 4-core Phenom II, why not compare them and add: "hey, Intel also has an IGP inside."
Performance is all that matters; the underlying hardware really doesn't (if you truly think about it). Naming conventions will always serve marketing, and it is always up to the experts to know what the words really mean. As long as there is competition, companies will shift the meaning of words in whatever direction makes them look better, without lying too much.
Bulldozer is very similar to Hyper-Threading (although it works differently); in naming, the only difference is that it doesn't distinguish real cores from virtual ones. You might say that is unfair, but it is the same with Stream Processors vs. Stream Cores. For shaders, the Stream Processor count is always the one advertised, when in reality Stream Processors are NOT independent of each other: they need independent operations in the shader binary to be kept busy. So in practice, everyone knows that CUDA Cores are more flexible, and their number reflects actual performance more reliably than the Stream Processor count.
These are things we have to get used to. Competition is tough.
Yes, I'm aware of the difference between the Radeon and GeForce graphics architectures. But I still believe AMD should be as modest as possible with Bulldozer. People don't like being deceived.
What did I say, AMD?! You made a big mistake by calling Bulldozer CPUs 4/6/8-"core" instead of "thread," as I had suggested. Look at the reviews around the Net: almost all of them say "Bulldozer is crap." Is that what you wanted for your new CPU architecture?
There are many more questions about this BD stuff than just the module/core issue.
Like the number of modules, for example. While AMD managed to cram as many as six Thuban cores onto one 346 mm² die at 45 nm, you get only 4 BD modules at 32 nm.
At 32 nm one should get about 2x the logic budget per unit area, and with one BD module (per AMD's statement) requiring only 18% more logic than one K10 core, they should easily be able to put 2X AS MANY MODULES ON BD AS THERE ARE CORES ON THUBAN.
Even more if they had sacrificed some cache, which would make perfect sense for a core-heavy architecture.
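To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes ideal scaling (logic area shrinks with the square of the feature size), takes the "18% more logic than a K10 core" figure above at face value, and ignores uncore, I/O and any cache changes, so treat it as an illustration, not AMD data:

```python
# Back-of-the-envelope die budget; all inputs are assumptions, not AMD figures.
THUBAN_CORES = 6            # K10 cores on the 45 nm Thuban die
OLD_NODE_NM = 45.0
NEW_NODE_NM = 32.0
MODULE_VS_K10_CORE = 1.18   # assumed: one BD module ~ 18% more logic than one K10 core


def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal logic-density gain if area scales with the square of feature size."""
    return (old_nm / new_nm) ** 2


def modules_in_same_area(cores: int, old_nm: float, new_nm: float, module_ratio: float) -> float:
    """BD modules that fit in the area the K10 cores used (uncore and cache ignored)."""
    return cores * density_gain(old_nm, new_nm) / module_ratio


gain = density_gain(OLD_NODE_NM, NEW_NODE_NM)
modules = modules_in_same_area(THUBAN_CORES, OLD_NODE_NM, NEW_NODE_NM, MODULE_VS_K10_CORE)
print(f"density gain: {gain:.2f}x")                 # density gain: 1.98x
print(f"modules in the same area: {modules:.1f}")   # modules in the same area: 10.1
```

Under these assumptions the estimate lands at about 10 modules rather than a full 12, but that is still far more than the 4 that actually shipped.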
I would LOVE to see a chip with 12 or more modules, even if lean on cache, say 512K of L2 per module and perhaps 1 or 2 MB of L3...
BD, as it is, makes no sense. First they go for multithreaded performance, and then halfway through they change their mind and put on a shitload of cache to catch up in single-thread performance.
As the old proverb goes, it's neither cat nor mouse....
It seems that some software (mostly games) runs slightly faster on Bulldozer with "1 thread per module" (The Method) instead of "2 threads per module": just look at this and this. You, AMD, can use this method to improve game performance on Bulldozer right now, in Windows XP/Vista/7 rather than waiting for Windows 8, in one of these ways:
I have read the developers' thread for a Linux kernel 3.2 patch, and from what I understand it is a matter of cache aliasing in the L1i cache.
A different scheduling approach was proposed, but it would only lessen the problem, not solve it, so a different solution had to be implemented.
I'm not sure that in the cases I mentioned the problem is L1i cache aliasing. Think about it: with the "1 thread per module" method there is no competition between two threads in one module, so many module resources belong to a single thread: L2, FPU, scheduler.
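For anyone who wants to try "1 thread per module" by hand, here is a minimal sketch that builds a CPU affinity mask. It assumes (this is an assumption, check your own system's topology) that logical CPUs 2k and 2k+1 are the two "cores" of module k; the helper names are hypothetical:

```python
# Hypothetical helpers. Assumption: logical CPUs 2k and 2k+1 share module k.
def one_thread_per_module_cpus(modules: int) -> list[int]:
    """Pick the first logical CPU of every module."""
    return [2 * module for module in range(modules)]


def one_thread_per_module_mask(modules: int) -> int:
    """Same selection as an affinity bitmask (bit n set = logical CPU n allowed)."""
    mask = 0
    for cpu in one_thread_per_module_cpus(modules):
        mask |= 1 << cpu
    return mask


# A 4-module FX-8xxx: logical CPUs 0, 2, 4, 6.
print(one_thread_per_module_cpus(4))       # [0, 2, 4, 6]
print(hex(one_thread_per_module_mask(4)))  # 0x55
```

On Windows, a bitmask like this is what `SetProcessAffinityMask` takes (or `start /affinity 55 game.exe` from a command prompt); on Linux, the CPU list can be passed to `os.sched_setaffinity(0, cpus)`. Whether it actually helps a given game is exactly the open question here.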
True, but it goes deeper than that.
If they execute different programs, each "core" in the module can knock out L1i content that the other core needs, which aggravates the effect.
Cache aliasing problems are neither new nor unique to AMD or BD, but they are much more visible here, since the L1i cache is shared between the two cores of a module.
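To see why a shared L1i is touchy, here is a small sketch of how addresses map to cache sets. The geometry (64 KiB, 2-way, 64-byte lines, 4 KiB pages) is an assumption on my part, so check it against AMD's Family 15h docs before relying on it:

```python
# Assumed Family 15h L1i geometry; verify against AMD documentation.
L1I_SIZE = 64 * 1024   # bytes
WAYS = 2
LINE = 64              # bytes per cache line
PAGE = 4096            # 4 KiB pages

SETS = L1I_SIZE // (WAYS * LINE)   # 512 sets
SET_STRIDE = SETS * LINE           # 32 KiB: lines this far apart land in the same set


def set_index(vaddr: int) -> int:
    """Set an address maps to, using virtual-address bits [6, 15)."""
    return (vaddr // LINE) % SETS


def index_bits_above_page_offset() -> int:
    """How many set-index bits lie above the 4 KiB page offset."""
    index_top = (LINE * SETS).bit_length() - 1   # 15
    page_bits = PAGE.bit_length() - 1            # 12
    return max(0, index_top - page_bits)


a = 0x400000
b = a + SET_STRIDE
c = a + 2 * SET_STRIDE
print(SETS, SET_STRIDE)                               # 512 32768
print(set_index(a) == set_index(b) == set_index(c))   # True
print(index_bits_above_page_offset())                 # 3
```

Three hot code lines that map to one set cannot all stay in a 2-way cache, and with two cores running different programs there are simply more hot lines competing. The 3 index bits above the page offset also mean the same code mapped at different virtual addresses (say, a shared library in two processes) can land in different sets, which, as far as I can tell, is the kind of aliasing the kernel discussion is about.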
It's just a shame that I had to read about it in the Linux kernel developers' thread and not in the "Fam15h SW Optimisation guide," which AMD published relatively recently and which does mention cache aliasing, but not for the L1 instruction cache.
All in all, BD is a nice idea with a lot of potential, and I would really like AMD to take it a step further, so that one would have, e.g., four integer execution units and four float/SSE units that could be shared across more threads, or something like that.
Also, it would be nice if AMD were less skimpy with modules: I want an 8-module chip, even if I have to give up some of the L2 and L3 to get it.
Furthermore, it is not clear why they insist on a BIG L3 on the same die instead of going for two dies, with the cores and L1/L2 on one and the memory controller/L3 on the other. They could use fat, fast HT links within the package, so bandwidth and latency wouldn't be an issue, especially given current L3 latencies...