I am using a Windows 10 Server with a dual-socket AMD EPYC 7702 processor set, so I have access to 128 cores or 256 threads.
When I run a multi-threaded application, the CPU speed reported on the Task Manager's front page fluctuates between 2.35 GHz and 2.42 GHz. However, there are instances in which the speed drops to 0 (zero) GHz.
Is this correct? I noticed this because one of my software runs started crashing randomly.
I made the same run on another machine, which uses the AMD Ryzen Threadripper 2990WX CPU instead, and there the speed remains constant at 3.39 GHz during the run. My software run never crashes on that machine, although I fully use the 64 threads available on the processor.
Please let me know what is right and whether I need to sort out the dual-socket processor with the supplier.
Please ensure that your board vendor has validated this OS and processor configuration. They would then be able to assist further. I would also always ensure that the latest vendor BIOS and Windows Updates are installed.
There have been issues with how Windows reports processor frequency. However, these are cosmetic; see, e.g., https://docs.microsoft.com/en-us/troubleshoot/windows-server/performance/cpu-frequencies-dont-match. Meanwhile, have you considered changing the Windows Power Scheme to High Performance? Or perhaps reviewing the BIOS settings related to power?
Beyond this, and without having much further detail, some areas to consider to help you debug could relate to NUMA:
Looking at Task Manager, are all cores used?
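As a complement to the Task Manager check, here is a minimal Python sketch that lists how many logical processors the OS exposes. It rests on one piece of background: Windows splits systems with more than 64 logical processors into processor groups, so a 256-thread machine is presented as several groups. The function name `logical_processors_per_group` is hypothetical and only for illustration. The Windows branch calls the `kernel32` functions `GetActiveProcessorGroupCount` and `GetActiveProcessorCount` through `ctypes`. On other systems, which have no processor groups, it falls back to a single entry.

```python
import ctypes
import os
import sys


def logical_processors_per_group():
    """Return the number of active logical processors in each processor group.

    Hypothetical helper for illustration only. On non-Windows systems there
    are no processor groups, so a single-entry list is returned.
    """
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        # WORD GetActiveProcessorGroupCount(void)
        kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
        # DWORD GetActiveProcessorCount(WORD GroupNumber)
        kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
        kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong
        group_count = kernel32.GetActiveProcessorGroupCount()
        return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]
    # Fallback: treat the whole machine as one group.
    return [os.cpu_count() or 1]


if __name__ == "__main__":
    counts = logical_processors_per_group()
    for group, count in enumerate(counts):
        print(f"group {group}: {count} logical processors")
    print(f"total: {sum(counts)} logical processors")
```

On the dual-socket system the total should come to 256. If it does not, or if the application only keeps one group busy, that would be worth raising with the board vendor and the software supplier.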
Hope this helps. We will also look into a few more things on our side and will come back to you if we discover anything.
Thanks for your response.
I checked the Windows Power Scheme and indeed it is set to High Performance. However, I have not checked the BIOS settings or its version yet as the software does not crash for smaller size models, although these models run with 250 threads too.
I choose how many cores to use and all the tests for small and large models are using 250 threads (125/128 cores).
Is it possible that the problem is related to memory? The large model tests use roughly 500 GB of RAM on a machine that has 1 TB of RAM.
Is it possible that this may be due to faulty processor or faulty RAM?
I have looked at the Power scheme and indeed it is set to High performance.
I have not checked the BIOS as this is harder to check for a novice.
I run all the tests (small and large) on 250 threads out of 256. The 0 GHz speed occurs only on the large models, just before the failure.
They require 500GB of RAM out of 1TB RAM availability.
Could the problem be related to overheating? Could it be a memory issue?
Could it be a faulty CPU, as this was one of the earliest CPUs available on the market?
Look forward to your response.