7 Replies Latest reply on Apr 26, 2017 5:48 AM by logic

    Question about Ryzen's L3 cache


      With the newest version of AIDA64, I saw a huge increase in L3 cache bandwidth when I increased the memory speed from 2400 to 2966.

      I saw a correlation between L3 and RAM bandwidth (roughly 10 to 1).

      The question is:

      Does the L3 run at the same speed as the Data Fabric, the RAM, or the cores?

        • Re: Question about Ryzen's L3 cache

          Nope. AIDA64 recently released a new version with better Ryzen support, including bandwidth and latency tests for all cache levels, so those readings are now more optimised.


          Optimized 64-bit benchmarks for AMD Ryzen “Summit Ridge” processors

          AIDA64 CPUID Panel, Cache & Memory Benchmark panel, GPGPU Benchmark panel, System Stability Test, and all cache, memory and processor benchmarks are fully optimized for AMD Ryzen “Summit Ridge” high-performance desktop processors, utilizing AVX2, FMA3, AES-NI and SHA instructions. Detailed chipset information for AMD Ryzen “Summit Ridge” integrated memory controller. Preliminary support for AMD Zen server and workstation processors.



          • Re: Question about Ryzen's L3 cache

            Aha, I assumed you saw the results with a newer version of AIDA.


            I don't actually know; I've never seen anything on this. The only relation I know of is between the IMC and the Infinity Fabric.


            If the AIDA64 test involves both L3s (which is of course the case), then yes: as the memory speed increases, the IF speed increases too, which raises the measured L3 bandwidth. In fact, it can affect all cache levels, depending on how AIDA tests them.


            The memory-to-IF relation is 1:1. If the memory is running at 3200 MT/s (DDR4-3200), the memory clock is 1600 MHz, and so is the IF clock. But the IF is a 256-bit bus, so even at 1600 MHz it only provides 32 bytes × 1.6 GHz = 51.2 GB/s (about 48 GiB/s), which is nowhere near the difference you saw.
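            The arithmetic above can be sketched as a small calculator. This is only a rough estimate under the assumptions stated in this reply: the IF clock equals the memory clock (1:1), the memory clock is half the DDR transfer rate, and the bus is 256 bits wide. The function names are hypothetical and not from AIDA64 or AMD.

```python
# Rough Infinity Fabric bandwidth estimate for first-generation Ryzen.
# ASSUMPTIONS (from the post above, not an AMD spec): IF clock = memory clock (1:1),
# and the fabric moves 256 bits (32 bytes) per clock.

def memclk_mhz(ddr_rate_mts):
    """DDR transfers twice per clock, so DDR4-3200 (3200 MT/s) means a 1600 MHz clock."""
    return ddr_rate_mts / 2


def if_bandwidth_gbs(ddr_rate_mts, bus_bits=256):
    """Bytes per fabric clock times fabric clock, in decimal GB/s (10**9 bytes/s)."""
    bytes_per_clock = bus_bits // 8            # 256 bits -> 32 bytes
    fclk_hz = memclk_mhz(ddr_rate_mts) * 1e6   # IF clock assumed equal to the memory clock
    return bytes_per_clock * fclk_hz / 1e9


for rate in (2400, 2966, 3200):
    gbs = if_bandwidth_gbs(rate)
    gibs = gbs * 1e9 / 2**30                   # same figure in binary GiB/s
    print(f"DDR4-{rate}: memory clock {memclk_mhz(rate):.0f} MHz, "
          f"IF ~{gbs:.1f} GB/s ({gibs:.1f} GiB/s)")
```

            Going from 2400 to 2966 raises this estimate by the same ratio as the memory speed (about 24%), which is why a 1:1 fabric alone does not explain a much larger jump in measured L3 bandwidth.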


            The relationship between the L3 speed and the other clocks is not known yet. Locking the IMC and cache clocks together would be tricky.

            • Re: Question about Ryzen's L3 cache

              Tested using the Asus Prime B350-Plus (beta BIOS 0605), Windows 10 Pro version 1703, build 15063.14.



              • Re: Question about Ryzen's L3 cache

                There is a separate 8 MB L3 cache per four-core module, or CCX (Core Complex).
                That L3 cache is faster than RAM, but the two L3 caches are joined via the Infinity Fabric, which runs at the RAM's clock frequency. (That looks like half the RAM speed because DDR RAM transfers data twice per clock.)

                So whenever a thread hops from one CCX to the other, it loses its cached data and has to wait for that data to be moved across, 'at the speed of RAM'.


                Hence, the faster the RAM runs, the faster the benchmark, because Windoze 10 loves to move threads from one CCX to the other willy-nilly!

                (This seems to be less of a problem in Win 7, apparently because its scheduler was more optimised for the Intel Core 2 Quads, and thus NUMA-aware, so it does not move threads around like a hyped-up game of 'pass the parcel'!)

                A better option is to prevent threads from being moved from one CCX to the other as much as possible.
                This should be built into the Windoze 10 scheduler, but isn't yet!?
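                Keeping a process on one CCX comes down to giving it an affinity mask that covers only that CCX's logical CPUs. Below is a minimal sketch, assuming a hypothetical 8-core/16-thread part (2 CCXs × 4 cores with SMT) whose logical CPUs are numbered core by core with SMT siblings adjacent and CCX 0 first. That numbering is an assumption, so check your own system's topology before using the result. The helper names are made up for illustration.

```python
# Hypothetical helper: build a CPU affinity mask that keeps a process on one CCX.
# ASSUMPTION: 2 CCXs x 4 cores, SMT on, logical CPUs numbered core by core
# (SMT siblings adjacent), CCX 0 first. Verify against your own system.

CORES_PER_CCX = 4
THREADS_PER_CORE = 2  # set to 1 if SMT is disabled


def ccx_cpus(ccx):
    """Logical CPU numbers belonging to the given CCX under the assumed numbering."""
    per_ccx = CORES_PER_CCX * THREADS_PER_CORE
    first = ccx * per_ccx
    return set(range(first, first + per_ccx))


def to_mask(cpus):
    """Bitmask form of a CPU set: bit n set means logical CPU n is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


for ccx in (0, 1):
    cpus = ccx_cpus(ccx)
    print(f"CCX {ccx}: CPUs {sorted(cpus)}, mask {hex(to_mask(cpus))}")
```

                On Windows, a hex mask like this is what `start /affinity` and affinity tools accept; on Linux, `os.sched_setaffinity(0, cpus)` applies the set to the current process.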


                The next thing to avoid is SMT as much as possible:

                As I understand it, Windows and apps/games don't properly see 'one core with its cache, capable of running two threads', but two complete cores with their own caches.

                Hence it's a good idea to avoid SMT until an app or game needs more than four threads on a CCX.
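                Avoiding SMT without disabling it in the BIOS means choosing only one logical CPU from each physical core. Below is a minimal sketch, assuming SMT siblings are numbered in adjacent pairs (0/1, 2/3, ...). That numbering is an assumption, not something guaranteed by Windows or AMD, and the helper names are hypothetical.

```python
# Hypothetical helper: pick one logical CPU per physical core so that no two
# threads share an SMT core. ASSUMPTION: SMT siblings are numbered adjacently
# (0/1, 2/3, ...). Check your own system's topology first.

def one_thread_per_core(logical_cpus, threads_per_core=2):
    """Keep only the first SMT sibling of each core (the even numbers for 2-way SMT)."""
    return sorted(cpu for cpu in logical_cpus if cpu % threads_per_core == 0)


def to_mask(cpus):
    """Bitmask form of a CPU list: bit n set means logical CPU n is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Example: the first CCX (logical CPUs 0-7) of an assumed 8-core/16-thread part.
cores = one_thread_per_core(range(0, 8))
print(cores, hex(to_mask(cores)))
```

                Restricting the set to one CCX as well keeps a four-thread game on four physical cores that share a single L3.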


                Here are three apps that will do that for you; Project Mercury seems the most automated and lightweight.
                I have seen rumours of 50 fps increases in certain older games from using Project Mercury, but that needs testing and verification.


                Project Mercury: thread affinities to CCXs, SMT and other optimizations. Very lightweight/efficient.




                AMD Ryzen Processor Optimization added to Cacheman 10.10:



                Bitsum's Process Lasso: Optimize and automate process CPU affinities: