I found a quite... interesting thing, and I'm not sure what to think about it. I have several theories, but none of them fit.
Not long ago I decided to try my FX-8320 with XMR_STACK_CPU, mining some altcoins with the CryptoNight algorithm. I have owned this CPU since Q4 2012. It is mildly overclocked, as my mobo's VRM is... not great.
As far as I'm aware, this algorithm is fairly heavy on the FPU (floating point unit).
While tuning for the caches, I found that performance differs depending on which integer cluster (aka core) of each module in my Piledriver CPU I use.
Using 4 mining threads with affinity set to the even cores (0, 2, 4, 6) gave me an average of 55 H/s per core. The peak was 65.2 H/s.
Using 4 threads with affinity set to the odd cores (1, 3, 5, 7) gave me an average of 80 H/s per core. The peak was 91.25 H/s.
This is a bit confusing to me. Using random cores without affinity gives me only a slightly better score than using the even cores.
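To make the two test setups concrete, here is a minimal Python sketch. It assumes an 8-core CPU with logical cores numbered 0 to 7, and the helper name `affinity_mask` is made up for illustration. It builds the hex affinity bitmask for each core set (the kind of mask that Windows' `start /affinity` accepts) and computes the relative difference from the averages quoted above.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per logical core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

# Assumption: 8 logical cores numbered 0..7, as on an FX-8320.
even_cores = [0, 2, 4, 6]
odd_cores = [1, 3, 5, 7]

even_mask = affinity_mask(even_cores)
odd_mask = affinity_mask(odd_cores)

# Average per-thread hash rates quoted in the post (H/s).
even_avg_hs = 55.0
odd_avg_hs = 80.0
speedup = odd_avg_hs / even_avg_hs

print(f"even mask: {even_mask:#04x}, odd mask: {odd_mask:#04x}")
print(f"odd vs even: {speedup:.2f}x")
```

So by these numbers the odd cores come out roughly 45% faster on average than the even ones.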
From what I found on Google, one of the images showing a die shot suggests that Core 2 (Core 1 in Windows) is placed closer to the cache unit, next to the shared L2 block. However, the block diagram doesn't show any "better" or closer pipeline connection between the cache block and the second core, so I'm not happy with this theory and don't think it's related.
Now a few more details about the machine.
FX-8320 "Vishera", 3500 MHz @ 4000 MHz (20x200), 1.35 V
ASRock 970 Extreme4, overclocked: NB 2600 MHz, HT 2600 MHz, NB 1.3 V, HT 1.3 V, SB 1.2 V
DDR3 2x4 GB 1866 MHz CL9, GYS1866D364L9AS/8GDC
Kingston V300 120 GB SATA 3 (the backstabbingly crippled one)
OCZ ModXtream Pro 700
Idle CPU temp is ~40 °C; with 4 cores loaded (50% load) it is 52-55 °C. Full 100% load never exceeds 62 °C.
I'm not sure what exactly this means. I recall that after the first two years I found the odd cores were locked in Windows, which I discovered by installing Parking Core Manager, so my other theory is: have the cores degraded? But the CPU never actually overheated or caused issues, and during those first two years I was not loading it much.
Thanks in advance for all responses and ideas.
I have exactly the same question.
I am mining on cores 1 and 3 of an A8-6600K, as they are faster for this than cores 0 and 2.
Mining is an FPU-heavy task, and the CryptoNote algorithm is specifically designed to rely heavily on the L2 and/or L3 caches.
As each module consists of 2 integer units (IUs) and 1 shared FPU, one would expect no difference between using the FPUs with IUs 0 and 2 or with IUs 1 and 3.
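To spell that expectation out, here is a small Python sketch. It assumes that logical cores 2m and 2m+1 belong to module m, which is my assumption about the numbering and not something I have confirmed, and the function names are made up for illustration. It counts how many mining threads end up on each module for a given core selection.

```python
from collections import Counter

def module_of(core):
    """Assumed mapping: cores 2m and 2m+1 share module m (and its FPU)."""
    return core // 2

def threads_per_module(cores):
    """Count how many pinned threads land on each module."""
    return dict(Counter(module_of(c) for c in cores))

# A8-6600K: 2 modules, 4 cores (under the assumed numbering).
print(threads_per_module([0, 2]))  # one thread on each module
print(threads_per_module([1, 3]))  # also one thread on each module
print(threads_per_module([0, 1]))  # both threads share module 0
```

Under that assumption, both the 0, 2 and the 1, 3 selections put exactly one thread on each module, so FPU sharing alone would not explain the gap.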
The pictures of the cores show the L2 cache closer to the odd-numbered IU cores.
Could that account for the performance difference? Is there lower latency between the L2 cache and the closer IU?
https://regmedia.co.uk/2015/11/06/amd_diag_2.jpg?x=648&y=327&infer_y=1
Mining also relies heavily on instruction set extensions such as AES and AVX.
I have no idea where those are located in the CPU.