9 Replies Latest reply on Jan 7, 2018 10:28 AM by binder87

    Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:

    logic

      With Ryzen there is a separate 8 MB L3 cache per 4-core CCX (Core Complex).
      That L3 cache is faster than RAM, but the two L3 caches are joined together at the speed of RAM. (It looks like half the rated speed of the RAM, because the RAM is DDR.)

      So whenever a thread hops from one CCX to the other, it loses its cached data and has to wait for it to be moved at the speed of RAM...

       

      Hence the faster the RAM is running, the faster the benchmark, because Windoze 10 loves to move threads from one CCX to the other willy-nilly..!

      (This seems to be less of a problem in Win 7, apparently because its scheduler is more optimised for the old Intel Core 2 Quads and thus NUMA aware, so it does not move threads around like a hyped-up game of 'pass the parcel'!)

      A better option is to keep threads from being moved from one CCX to the other as much as possible.
      This should be built into the Windoze 10 scheduler, but isn't yet..!?

       

      The next thing to avoid is SMT as much as possible:

      As I understand it, Windoze and apps/games don't properly see 'one core and cache, capable of two threads', but two complete cores/caches.

      Hence it's a good idea to avoid SMT until an app/game on a CCX needs more than 4 threads.

       

      The rules seem to be as follows:

       

      1. Keep threads from hopping from one CCX to the other, and try to keep Windows/OS on one CCX and the app or game on the other/s.
      2. Keep to one thread per physical core until you run out of cores on a CCX, i.e. avoid SMT until you need to run more than 4 threads per CCX/app.
      3. Disable core parking. (Part of AMD's balanced power plan?)
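
Rules 1 and 2 can be sketched as a small Python helper that picks which logical CPUs to hand to an app. This is only an illustration, not how any of the tools below work internally. It assumes an 8-core/16-thread Ryzen 7 on which logical CPUs 2k and 2k+1 are the two SMT threads of physical core k, and cores 0-3 sit on the first CCX; check the layout on your own machine before relying on it. The names `logical_cpus_for` and `affinity_mask` are made up for this example.

```python
CORES_PER_CCX = 4      # assumption: Ryzen 7 layout, two CCXs of four cores each
THREADS_PER_CORE = 2   # SMT: two logical CPUs per physical core


def logical_cpus_for(ccx, n_threads):
    """Pick logical CPUs on one CCX: one per physical core first, SMT siblings last."""
    first_core = ccx * CORES_PER_CCX
    cores = range(first_core, first_core + CORES_PER_CCX)
    primary = [core * THREADS_PER_CORE for core in cores]        # first thread of each core
    siblings = [core * THREADS_PER_CORE + 1 for core in cores]   # SMT sibling of each core
    candidates = primary + siblings
    if not 1 <= n_threads <= len(candidates):
        raise ValueError("a single CCX offers 1 to 8 logical CPUs")
    return sorted(candidates[:n_threads])


def affinity_mask(cpus):
    """Convert a list of logical CPU numbers to an affinity bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


game_cpus = logical_cpus_for(ccx=1, n_threads=4)   # game on the second CCX, no SMT
print(game_cpus, hex(affinity_mask(game_cpus)))    # [8, 10, 12, 14] 0x5500
print(logical_cpus_for(ccx=0, n_threads=6))        # [0, 1, 2, 3, 4, 6]
```

With 4 threads or fewer every thread gets its own physical core; SMT siblings are only used from the fifth thread onwards.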

       

      Here are 3 apps that will do that for you, of which Project Mercury seems the most automated and lightweight.
      I have seen rumours of 50 fps increases in certain older games by using Project Mercury,  but that needs testing and verifying.

       

      Project Mercury: Thread affinities to CCXs, SMT etc. optimizations. Very lightweight/efficient.

      http://www.techcenter.dk/

      https://hardforum.com/threads/amd-ryzen-game-performance-fix.1926435/

       

      AMD Ryzen Processor Optimization added to Cacheman 10.10:

      https://www.techpowerup.com/232096/outertech-adds-amd-ryzen-processor-optimization-to-cacheman-10-10

       

      Bitsum's Process Lasso: Optimize and automate process CPU affinities:

      https://bitsum.com/

       

      If I were AMD, I would be having a good look at these apps, benchmarking like hell with them, and perhaps speaking to the devs to get their heads together and get an 'official' app onto every Ryzen computer out there, as well as to all the review sites!

      I want AMD to succeed and stick it to Intel!  Especially with the 12 and 16 core X399 machines.

        • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
          kingfish

          Is core parking no longer set by the power plan? Setting the power plan to 'Performance' used to un-park all cores.

            • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
              logic

              I believe it is disabled in the Performance Power Plan and also in AMD's new Balanced Power Plan, but I am not 100% sure of that. Hence the '?'

                • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                  svenbent

                  To my knowledge, the AMD Balanced power plan has core parking enabled. However, they changed it to the Windows 7 behavior, where it parks every other logical core instead of a pair of logical cores.

                   

                  Parking every other logical core reduces SMT conflicts and keeps all physical cores available for use.

                  Core parking is still a big issue for affinity control, though, since MS does not take affinity into account when figuring out whether a core should come out of parking.

                  For example, say you have a 2-threaded CPU-heavy process (let's say 7-Zip) and you set its affinity to cores 1 and 3 (two different physical cores), and Windows has already decided that, let's say, cores 3 and 5 should be parked. It will not unpark core 3, because it still sees a lot of unused logical cores. However, your 2-threaded process is now actually being executed on only one of the 2 logical cores it was assigned to, and your performance drops.

                   

                  My general advice is to always disable core parking. Wall power-usage measurements have shown no difference in power usage between enabled and disabled core parking, so the effect on power/heat is minimal.
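
The parking example above can be put into numbers with a few lines of Python. This is a deliberately crude model, not a description of the real Windows scheduler: it simply assumes a process can only run on the logical cores that are both in its affinity set and not parked. The function names are made up for this illustration.

```python
def runnable_cpus(affinity, parked):
    """Logical CPUs a process can actually run on: its affinity set minus parked CPUs."""
    return sorted(set(affinity) - set(parked))


def relative_throughput(n_threads, affinity, parked):
    """Crude estimate: CPU-bound threads share whatever CPUs remain runnable."""
    usable = len(runnable_cpus(affinity, parked))
    return min(n_threads, usable) / n_threads


affinity = {1, 3}   # the 2-threaded job is pinned to logical cores 1 and 3
parked = {3, 5}     # the scheduler has parked logical cores 3 and 5

print(runnable_cpus(affinity, parked))            # [1]
print(relative_throughput(2, affinity, parked))   # 0.5
```

Under this model the pinned job gets roughly half the throughput it would have with parking disabled.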

              • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                logic

                "...Cacheman 10.10 updates the optimization profiles for AMD Ryzen 7 1700, 1700X, 1800X and Ryzen 5 1400, 1500X, 1600, 1600X processors. A range of computer performance profiles is available including gaming, graphics workstation, digital audio recording, notebook, and server.

                 

                Outertech has discovered that tying some Windows applications to the first Ryzen 7 CPU CCX group (4 physical + 4 virtual processor cores) can increase the performance by a significant factor, as thread switching between two CCX groups is avoided. This will work well only with applications that do not make full use of all 16 CPU cores, particularly computer games. Windows 10 appears to not be aware that the Ryzen processor consists of two individual CPU core groups. Switching program threads from one CCX group to another can cause performance degradation on an otherwise very fast processor.."

                 

                A free test version of Cacheman can be downloaded at:

                https://www.outertech.com/en/how-to-speed-up-your-computer

                  • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                    svenbent

                    I have not tested Cacheman, so I can't say anything about it. I will look into it just to see what it does, though.

                     

                    However, before you shell out money for it: my tool "Project Mercury" has the same feature built in. It is called "No CCX Switching", which locks the active process to the first CCX unit and thereby avoids the dreaded CCX-to-CCX thread communication when you are playing your favorite game.
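
For anyone curious what "locking a process to the first CCX" looks like in code, here is a rough Python sketch. It is not Project Mercury's implementation. It assumes logical CPUs 0-7 make up the first CCX (4 cores x 2 SMT threads on a Ryzen 7), and it only pins the calling process, not the foreground game. On Windows it calls the Win32 `SetProcessAffinityMask` function through `ctypes`; on Linux it falls back to `os.sched_setaffinity`. The helper names are made up for this example.

```python
import ctypes
import os
import sys

# Assumption: logical CPUs 0-7 belong to the first CCX on an 8-core/16-thread Ryzen 7.
FIRST_CCX_CPUS = list(range(8))


def cpus_to_mask(cpus):
    """Convert a list of logical CPU numbers to an affinity bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Restrict the calling process to the given logical CPUs."""
    if sys.platform == "win32":
        from ctypes import wintypes
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        handle = kernel32.GetCurrentProcess()
        if not kernel32.SetProcessAffinityMask(handle, cpus_to_mask(cpus)):
            raise ctypes.WinError(ctypes.get_last_error())
    elif hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)   # pid 0 means the calling process
    else:
        raise NotImplementedError("no affinity API known for this platform")


if __name__ == "__main__":
    print(hex(cpus_to_mask(FIRST_CCX_CPUS)))   # 0xff
    # pin_current_process(FIRST_CCX_CPUS)      # uncomment to actually apply the mask
```

The same mask can also be set by hand in Task Manager's "Set affinity" dialog, which is a quick way to test whether a given game benefits before automating anything.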

                     

                    I don't have any real-world benchmarks on this, since I don't have a Ryzen-based system and there is currently no money to upgrade my old i7 3770K system.

                    I did contact AMD in a hopeless attempt to see if they would provide me with a press kit to run testing and development on. However, AMD did not respond to me, for which they are in no way to be blamed.

                    In a future version there will be hotkeys for quickly changing the thread distribution while you are in game, and in an upcoming PRO version you can set profiles for different processes.

                  • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                    logic

                    Edit:
                    I need to clarify:

                    "...That L3 cache is faster than RAM, but the two L3 caches are joined together at the speed of RAM. (It looks like half the speed of RAM due to RAM being DDR)

                    So whenever a thread hops from one CCX to the other it loses its cached data and has to wait for it to be moved, at the speed of RAM..."

                     

                    * ...the two L3 caches are joined together via the Infinity Fabric bus, which runs at the frequency of the RAM. (It looks like half the rated speed of the RAM, because the RAM is DDR.)

                    So whenever a thread hops from one CCX to the other it loses its cached data and has to wait for it to be moved at the speed of the Infinity Fabric bus...
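
The "half the speed" remark is just the DDR arithmetic: a DDR4 rating counts transfers per second, and there are two transfers per memory clock cycle. A tiny Python sketch, assuming (as described above) that the Infinity Fabric runs at the memory clock; the function name is made up for this illustration.

```python
def fabric_clock_mhz(ddr_rating_mts):
    """Memory clock in MHz for a DDR rating in MT/s; assumed equal to the fabric clock."""
    return ddr_rating_mts / 2   # DDR moves data twice per clock cycle


for rating in (2133, 2666, 3200):
    print(f"DDR4-{rating}: fabric at about {fabric_clock_mhz(rating):.1f} MHz")
```

So going from DDR4-2133 to DDR4-3200 would raise the assumed fabric clock by about half, which fits the observation that faster RAM helps Ryzen benchmarks more than the RAM bandwidth alone would suggest.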

                     

                    Wish I had my hardware already so I could test/bench all this!
                    You would think users would be all over it, testing and posting etc...?

                    • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                      logic

                      Want to see which games are faster with SMT off:

                      AMD Ryzen 5 1600X CPU Review

                       

                      The review has interesting info on Game Mode in Creators Update:

                      Most notably, the Creators Update includes a new Game Mode that optimizes GPU and CPU performance. The mode improves resource utilization by employing optimized scheduling in concert with isolating and reducing background tasks. This also confers performance consistency advantages. Windows enables Game Mode by default for a pre-selected whitelist, but it can also be turned on in any UWP or Win32 title.

                      Game Mode is easy to use. While in your game of choice, toggle the Game Bar with the Win + G key combination and select Game Mode from the settings menu. After that, the game continues to operate in Game Mode for any subsequent session until you disable the feature. We measured gains of several FPS in most titles and recorded improved frame time consistency.

                       

                      Now if only there were similar benchmarks showing what happened if the games were kept on one CCX and Windows on the other..!
                      Perhaps I can bend the reviewer's ear a bit to test Project Mercury.

                       

                      Sisoft's benchmarks of Ryzen L1, 2, 3, Caches and RAM:

                      April 2017 – SiSoftware

                      • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                        svenbent

                        Thanks for the link to my little tool (Project Mercury).

                         

                        "I have seen rumours of 50 fps increases in certain older games by using Project Mercury, but that needs testing and verifying."

                        This number is hard to confirm because of its static nature. The typical performance increase from Project Mercury works in a percentage way.

                        Improvements from disabling core parking and better thread handling could give up to a 20% boost in heavily CPU-dependent programs (Cinebench), but only in very specific situations where the number of threads is equal to or lower than the number of physical cores, but more than 1.

                         

                        However, an increase of 50 fps is more likely to stem from the core functionality of the program, which is to adjust CPU priority. Adjusting CPU priority on a heavily multitasking system can have a big effect on the perceived performance, because what you are working with gets more CPU focus. Of course this extra performance does not come out of thin air; it comes at the cost of any background tasks now running slower.

                        The typical performance improvement from this can roughly be measured as regaining 60% of the FPS lost to background tasks.

                        So if you are running 200 FPS with the game alone, but it drops to 100 because you are running some CPU-heavy background processes, Project Mercury will give you 60 of the lost 100 FPS back. So instead of dropping to 100 FPS, you will only drop to 160 FPS.
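
The rule of thumb above is simple enough to write down as a formula. A minimal Python sketch, where the 60% figure is only the rough estimate quoted above and the function name is made up for this illustration:

```python
def fps_with_priority_boost(unloaded_fps, loaded_fps, recovery=0.60):
    """Estimate FPS under background load once a share of the lost FPS is regained."""
    lost = unloaded_fps - loaded_fps
    return loaded_fps + recovery * lost


print(fps_with_priority_boost(200, 100))   # 160.0
```

Because the gain scales with the FPS that was lost in the first place, a system with no background load gains nothing, which is why a fixed "50 fps" figure is hard to confirm.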

                         

                        Here are some of the benchmarks I have on hand from back when I tested this function. I ran my tests on more than 16 different systems, both AMD and Intel, laptops and desktops. This is just a small snippet.

                         

                        Copy paste to notepad might be better

                         

                        FFX XIV
                        unloaded    7z+cipher    +PM
                        18304        6033        17360
                        18144        5712        17375
                        18204        5486        17184
                        18297        5279        17258
                        18221        6158        17185

                         


                        CatZilla
                        unloaded        7z+cipher        +PM
                        28181/878/15877ms    6967/116/54949ms    27372/656/24730ms
                        27586/736/10036ms    6882/101/30184ms    27644/702/9433ms
                        28284/871/9653ms    6300/107/36245ms    27880/616/9448ms
                        27645/722/10072ms    6201/109/35299ms    27291/694/1740ms
                        27481/715/10045ms    6326/104/36200ms    27861/625/9466ms

                         


                        Resident evil 5
                        Unloaded    7zip+cipher    +PM
                        166.9        132.5        161.7
                        166.6        130.6        162.6
                        167.1        135.5        162.5
                        166.5        136.1        162.7
                        166.7        135.2        162.0
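
As a rough cross-check of the 60% rule of thumb, the share of lost performance that Project Mercury regains can be computed from the FFX XIV and Resident Evil 5 runs above. This Python sketch just averages the five runs per column; the numbers are copied from the tables, while the function name and the way of summarising them are choices made for this illustration.

```python
from statistics import mean

# Columns per benchmark: (unloaded, 7z+cipher load, load + Project Mercury)
runs = {
    "FFX XIV": (
        [18304, 18144, 18204, 18297, 18221],
        [6033, 5712, 5486, 5279, 6158],
        [17360, 17375, 17184, 17258, 17185],
    ),
    "Resident Evil 5": (
        [166.9, 166.6, 167.1, 166.5, 166.7],
        [132.5, 130.6, 135.5, 136.1, 135.2],
        [161.7, 162.6, 162.5, 162.7, 162.0],
    ),
}


def recovered_fraction(unloaded, loaded, with_pm):
    """Share of the load-induced drop that is regained with Project Mercury running."""
    lost = mean(unloaded) - mean(loaded)
    regained = mean(with_pm) - mean(loaded)
    return regained / lost


for name, (unloaded, loaded, with_pm) in runs.items():
    share = recovered_fraction(unloaded, loaded, with_pm)
    print(f"{name}: {share:.0%} of the lost performance regained")
```

On this particular system both benchmarks come out well above 60% (roughly 92% and 86%), so the rule of thumb looks conservative here, though five runs on one machine are not much to go on.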

                         

                         

                        AMD Athlon NEO x2  L335 1.6GHz - 4GB - AMD 3200 HD

                         

                        Benchmark         No load           7-zip load        7-zip+PM
                        3Dmark2001se      6236              4854              5889
                                          6231              4888              5848
                                          6242              4891              5829

                        Tropics           2.9 (1.6)         2.8 (1.6)         2.8 (1.6)
                                          2.9 (1.6)         2.8 (1.6)         2.8 (1.5)
                                          2.8 (1.6)         2.8 (1.6)         2.8 (1.6)

                        Sanctuary         3.7 (2.3)         3.7 (2.1)         3.7 (2.1)
                                          3.7 (2.1)         3.7 (2.2)         3.7 (2.1)
                                          3.8 (2.1)         3.7 (2.1)         3.7 (2.1)

                        Lightsmark 2008   17.4              15.6              16.8
                                          17.4              15.8              17.0
                                          17.4              16.2              16.9

                        DroneZmark        161.11 (117.76)   119.60 (0.00)     126.40 (83.75)
                                          158.46 (103.90)   124.75 (82.10)    127.03 (85.66)
                                          162.30 (118.00)   124.72 (86.00)    123.21 (88.82)

                        Quake3 demo1      156.3             42.1              187.7
                                          203.7             32.6              192.8
                                          203.5             100.6             185.4

                        Quake3 demo2      204.2             58.1              170.8
                                          205.7             53.9              191.0
                                          202.3             53.7              201.9

                         

                         

                         

                        Core I7 M620 - 4GB - Intel HD graphics

                         

                        Benchmark         No load           7-zip load        7-zip+PM
                        3Dmark2001se      10146             7504              8458
                                          10214             7299              8472
                                          10217             7314              8388

                        Tropics           3.4FPS (2.2)      3.3FPS (2.2)      3.3FPS (2.2)
                                          3.4FPS (2.2)      3.4FPS (2.2)      3.3FPS (2.2)
                                          3.5FPS (2.2)      3.3FPS (2.2)      3.3FPS (2.2)

                        Sanctuary         5.6FPS (3.3)      5.5 (3.3)         5.5 (3.3)
                                          5.6FPS (2.8)      5.6 (3.3)         5.6 (3.3)
                                          5.6FPS (3.3)      5.6 (2.7)         5.5 (3.3)

                        Lightsmark 2008   13.1FPS           8.4FPS            10.8
                                          13.0FPS           8.9FPS            10.6
                                          13.1FPS           8.6FPS            10.6

                        Quake3 Demo 1     185.7FPS          111.1FPS          151.3
                                          186.2FPS          134.6FPS          167.7
                                          187.4FPS          114.1FPS          146.2

                         

                         

                        I got a bit off topic here. I apologize.

                        Just to clarify: the 50 FPS increase is probably due more to CPU priority adjustments than to thread handling optimization.

                         

                         

                        1. Keep threads from hopping from one CCX to the other and try to keep windows/OS on one CCX and the app or game on the other/s.
                        2. Keep to one thread per physical core, until you run out of cores on a CCX. ie: Avoid SMT until you need to run more than 4 threads per CCX/app.

                        This is the exact reason why I wanted to see if AMD would donate a review kit to me. I am highly curious whether it's better to use 2 CCX units with no SMT, or one CCX unit with SMT. Without measuring this I can't make Project Mercury fully automatic.

                         

                         

                        I hope my feedback was useful.

                        • Re: Stop thread hopping between CCXs and unnecessary SMT for Ryzen gaming and app performance:
                          binder87

                          Bump! That's an awesome thread!

                          I'm currently experimenting with SMT off on my 1600, as I mostly game with it...

                          What's the status with that now? It's been a while since the last post in this thread.

                          Is turning off SMT still recommended for gaming? Are there any updates regarding manual software optimizations in Windows to maximize Ryzen and get around the Windows scheduler / CCX latency issues?