4 Replies Latest reply on Feb 23, 2018 6:13 AM by admmedlifer

    Are Xeon and Opteron CPU Caches protected with ECC also  ?

    waltersulivan

      Hello!

      I wonder whether CPUs have any error protection of their own, rather than only the memory having ECC.

      ECC is there to ensure stability and error-free operation while computing/working 24/7.

      If the memory is ECC protected but the CPU is not, might there be errors on the CPU side?

      Or doesn't the CPU need any ECC?

          • Re: Are Xeon and Opteron CPU Caches protected with ECC also  ?
            waltersulivan

            "Many processors use error correction codes in the on-chip cache, including the Intel Itanium processor, the AMD Athlon and Opteron processors, and the DEC Alpha 21264."

            So, do Ryzen 7 CPUs have ECC in cache also?

              • Re: Are Xeon and Opteron CPU Caches protected with ECC also  ?
                juanpc

                #1. L2 cache can be disabled in the BIOS.

                Disable CPU prefetch.

                Prefetch is related to cache: it loads data into the cache before it is needed...

                Windows has SuperFetch, which preloads the software you use most when you start Windows.

                That's why the HDD/SSD LED blinks so much when you log in, while you're doing nothing.

                #2. CPU world does not say....

                http://www.cpu-world.com/CPUs/Zen/AMD-Ryzen%207%201800X.html

                 

                But it also does not say for the Opteron 6386

                http://www.cpu-world.com/CPUs/Bulldozer/AMD-Opteron%206386%20SE%20-%20OS6386YETGGHK.html

                 

                If you want ECC, usually you must buy server parts...

                Basically that's the main difference between gaming & server... & overclocking.

                 

                At least on Intel.

                 

                Caches are too small, too fast, and too close to the CPU, and the CPU is shielded by a metallic lid.

                 

                ECC memory is usually there to avoid errors caused by radiation such as cosmic rays, which can flip a bit from 1 to 0.

                One of the Top500 supercomputers, built entirely from Power Mac G5s, had that problem...

                "Big Fail"

                Virginia Tech Apple G5 super computer cluster - YouTube

                Since then, all Mac Pros have ECC memory.
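To make the "flipped bit" idea concrete, here is a minimal sketch of how an error-correcting code finds and repairs a single flipped bit. It uses the small textbook Hamming(7,4) code purely for illustration; real ECC DIMMs and CPU caches use wider codes, and the function names here are made up for the example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused, so index == position
    code[3], code[5], code[6], code[7] = d     # data bits sit at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions with bit 2 set
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome). A non-zero syndrome is the position that was repaired."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1                    # flip the faulty bit back
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome


# Usage: store 0b1011, flip one bit "in memory", then read it back.
word = hamming74_encode(0b1011)
word[4] ^= 1                                   # simulate a single-bit upset at position 5
value, fixed_at = hamming74_decode(word)
print(value == 0b1011, fixed_at)
```

This simple code can only repair one flipped bit per word; two flips in the same word would be "corrected" to the wrong value, which is why practical ECC adds an extra parity bit to at least detect double errors.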

                 

                There are different kinds of ECC memory: low voltage, buffered (registered) and unbuffered...

                With unbuffered memory, the CPU's memory controller drives the memory chips directly.

                But if you have a lot of memory installed, more than 32GB,

                the electrical load on the controller gets too high and errors can happen.

                That's where buffered memory comes in: a register on the module buffers the command and address signals between the controller and the memory chips.

                The extra buffer chips make buffered memory more expensive and may increase latency a tiny bit.

                 

                Some gaming boards support unbuffered ECC memory,

                as do some CPUs, but not all.

                 

                Boards with ECC memory will automatically scrub all the memory in 8 hours.

                Like a screen saver.

                SSDs also have a big cache, but it's not for increasing speed like on standard HDDs;

                the big cache in SSDs is there to scrub all the memory, not to increase speed...

                But Samsung SSDs come with software, Samsung Magician, that will use 1GB of system RAM as a fast cache for the SSD.

                If power fails you lose 1GB of data, and if a memory error occurs it could write 1GB of faulty data to the SSD, but because data does not stay long in the cache, you usually won't have any problems.

                 

                Let me tell you my story...

                I had a Dataram RAMDisk on a gaming board, the best gaming board of that year, with 8GB or 12GB.

                I also had Windows BitLocker on that drive. An error occurred, and I lost all the data.

                 

                Memory errors usually happen when memory holds information for too long.

            • Re: Are Xeon and Opteron CPU Caches protected with ECC also  ?
              admmedlifer

              Yes, they do. Since cache is extremely fast, errors there are not entirely uncommon.  Below you can see a failure in L2 from one of my servers.

              For reference, if you search for parts of this error message along with "L3", you will likely find examples of L3 failures, or of the same kind of error in main memory.

               

              Feb 17 09:05:38 x kernel: [152569.582323] mce: [Hardware Error]: Machine check events logged

              Feb 17 09:05:38 x kernel: [152569.588304] [Hardware Error]: Corrected error, no action required.

              Feb 17 09:05:38 x kernel: [152569.594612] [Hardware Error]: CPU:54 (15:2:0) MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc50c01000040136

              Feb 17 09:05:38 x kernel: [152569.604629] [Hardware Error]: Error Addr: 0x00000035e35589e0

              Feb 17 09:05:38 x kernel: [152569.610480] [Hardware Error]: MC2 Error: Fill ECC error on data fills.

              Feb 17 09:05:38 x kernel: [152569.618511] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
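If you want to read such a status value yourself, here is a rough sketch that pulls the flags out of the raw MC2_STATUS number from the log above. The bit positions and the name tables are my assumption, based on the x86 machine-check status layout and on how the Linux kernel's AMD decoder prints these fields; `decode_mc_status` is a made-up helper, so check the documentation for your CPU family before relying on it.

```python
# Assumed field layout (not taken from this thread): flags in the top bits,
# ECC type in bits 46:45, and a 16-bit error code in the low bits.
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]


def decode_mc_status(status):
    """Split a raw 64-bit MCi_STATUS value into named fields."""
    ecc = (status >> 45) & 0x3
    result = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),      # "Over" in the log
        "uncorrected": bool((status >> 61) & 1),   # False is printed as "CE"
        "misc_valid": bool((status >> 59) & 1),    # "MiscV"
        "addr_valid": bool((status >> 58) & 1),    # "AddrV"
        "pcc": bool((status >> 57) & 1),           # processor context corrupt
        "ecc": {0: "-", 2: "CECC"}.get(ecc, "UECC"),
    }
    code = status & 0xFFFF
    if (code & 0xFF00) == 0x0100:                  # memory/cache error code format
        rrrr = (code >> 4) & 0xF
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV"
        result["tx"] = TX_TYPE[(code >> 2) & 0x3]
        result["cache_level"] = CACHE_LEVEL[code & 0x3]
    return result


# Usage: the value from the MC2_STATUS line in the log.
fields = decode_mc_status(0xDC50C01000040136)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Under these assumptions the value decodes to a corrected ECC error (CECC) with cache level L2, tx DATA and mem-tx DRD, which agrees with what the kernel printed in the log.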