
    PCIe lanes stepping on each other?

    mavericken

      I was testing my NVMe drives separately and simultaneously to see why my NVMe RAID setup was not giving me the expected results.

       

      In this test case, I tried to start them all as close to the same time as possible:

       

      These are 4 Samsung 960 EVO drives on an Asus Hyper M.2 x16 card.  Each drive does about the same when tested individually, NOT at the same time:

       

      I bought the Threadripper because of the 64 dedicated PCIe lanes to the processor.  It seems like there is a bottleneck making these lanes feel less than dedicated.  The 4KiB Q32T1 result is especially suspect, as the per-drive numbers roughly add up to a fixed total, despite this being 4 different processes doing single-threaded tests on a 16-core processor.
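
      In case anyone wants to reproduce the simultaneous run without racing to click four Start buttons, here is a rough sketch of the idea in Python. This is not what I actually used (that was CrystalDiskMark); the file paths are placeholders for a test file on each NVMe volume, and plain buffered reads like this will be inflated by the Windows file cache, so treat it purely as an illustration of the synchronized start:

```python
import time
from multiprocessing import Barrier, Process

# Hypothetical test files, one per NVMe volume under test
TEST_FILES = [r"E:\bench.bin", r"F:\bench.bin", r"G:\bench.bin", r"H:\bench.bin"]
BLOCK_SIZE = 1024 * 1024   # 1 MiB sequential reads
DURATION = 10              # seconds per drive


def read_worker(path, barrier):
    """Wait until every worker is ready, then read `path` for DURATION seconds."""
    barrier.wait()                        # all drives start at (almost) the same instant
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while time.perf_counter() - start < DURATION:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:                 # hit end of file, wrap around
                f.seek(0)
                continue
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    print(f"{path}: {total_bytes / elapsed / 1e6:.0f} MB/s")


if __name__ == "__main__":
    barrier = Barrier(len(TEST_FILES))
    workers = [Process(target=read_worker, args=(p, barrier)) for p in TEST_FILES]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```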

       

      I also did a 3-way simultaneous test on other M.2 slots on my board besides the Asus Hyper M.2 x16:

       

      Again, it seems like there is only so much I/O capacity to go around, and it is being shared among the simultaneous tests.

       

      Here are some of the RAID tests that got me investigating this:

      AMD RAID0, 7 disks, no cache:

 

      AMD RAID0, 3 disks, no cache:

 

      AMD RAID0, 4 disks, no cache:

       

      On the RAID tangent, I don't seem to be getting anywhere close to the kind of results shown in material like this, which prompted me to dig deeper (though I am using CrystalDiskMark instead of IOMeter):

      Super fast NVMe RAID comes to Threadripper

       

       

      The RAID numbers aside, my expectation was that I should see these individual drives performing just as well simultaneously as separately, considering I have 16 cores and 64 CPU PCIe lanes.  Why is my expectation invalid?  Is this a bottleneck in the Infinity Fabric?  It seems weird that the platform can push such high sequential numbers, yet the 4K tests are fighting each other for resources.
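
      For what it's worth, the lanes themselves should have plenty of headroom. Here is a back-of-the-envelope check using the PCIe 3.0 line rate (8 GT/s per lane with 128b/130b encoding) and an assumed ~3200 MB/s rated sequential read per 960 EVO; neither figure comes from my screenshots:

```python
# Back-of-the-envelope numbers, not measurements from this thread
PCIE3_MB_PER_LANE = 8e9 * (128 / 130) / 8 / 1e6   # ~985 MB/s of raw bandwidth per lane

drives = 4
lanes_per_drive = 4                 # each slot on the Hyper M.2 card is electrically x4
rated_seq_read = 3200               # MB/s, approximate 960 EVO spec (assumption)

link_bandwidth = drives * lanes_per_drive * PCIE3_MB_PER_LANE
drive_total = drives * rated_seq_read

print(f"x16 link bandwidth:         ~{link_bandwidth:,.0f} MB/s")   # ~15,754 MB/s
print(f"4 drives at rated seq read: ~{drive_total:,.0f} MB/s")      # ~12,800 MB/s
```

      Both of those totals are far above the shared ceiling the simultaneous tests seem to hit, which is why the bottleneck feels like it is behind the lanes rather than in the lanes themselves.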

       

      I think I may have to give up on AMD RAID for now anyway because of other issues (Installed 17.50 NVME Raid drivers for Threadripper, now computer cannot power down normally), but considering how much time I sunk into it (it took me several hours to get RaidXpert online), I wanted to get my results out there.

        • Re: PCIe lanes stepping on each other?
          black_zion

          Can't compare IOMeter to CrystalDiskMark. Also, you're using Evo drives, not Pro drives, and random operations are where Evo drives suffer.

           

          Which movies are you editing that require greater than 9.8GB/s?

            • Re: PCIe lanes stepping on each other?
              mavericken

              Evo vs Pro shouldn't matter for what I am asking, as I am comparing Evo to Evo, not Evo to Pro.  This is more algebraic:

              Case 1:   1evo = 548MB/s

              Case 2:   4evo = 183MB/s + 183MB/s + 205MB/s + 302.1MB/s + 302.1MB/s

                              4evo = 1175MB/s

                              1evo = 293MB/s

               

              It is a pretty huge difference, and probably if I could press the buttons closer to the same time, they would all come back around 183.  Also, this is without using RAID, just 4 SSDs that happen to be in use at the same time by 4 single-core processes, and somehow they are stepping all over each other despite being on separate CPU PCIe lanes on a 16-core chip.
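
              Restated as a quick script, with the per-drive figures quoted exactly as they came out of the runs above:

```python
isolated = 548.0                                # MB/s, one 960 EVO tested by itself
simultaneous = [183, 183, 205, 302.1, 302.1]    # MB/s, figures from the simultaneous run

total = sum(simultaneous)
per_drive = total / 4

print(f"aggregate: {total:.0f} MB/s")
print(f"per drive: {per_drive:.0f} MB/s "
      f"({per_drive / isolated:.0%} of the isolated result)")
# aggregate: 1175 MB/s
# per drive: 294 MB/s (54% of the isolated result)
```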

            • Re: PCIe lanes stepping on each other?
              mavericken

              Here is the best illustration of the problem I have come up with so far, and what I now take as my conclusion:

               

              AMD RAID 3x benchmarked alone:

              AMD RAID 4x benchmarked alone:

 

              AMD RAID 4x and 3x benchmarked at the same time:

               

              Basically my system has magic numbers of around:

              Seq Q32T1 Read: Around 11000MB/s

              Seq Q32T1 Write: Around 11000MB/s (I remember seeing this number in an 8x RAID benchmark I did)

              4KiB Q8T8 Read: Around 700MB/s

              4KiB Q8T8 Write: Around 700MB/s

              4KiB Q32T1 Read: Around 600MB/s

              4KiB Q32T1 Write: Around 600MB/s

               

               

              Whatever crazy strategy you can think of on this platform, everything ultimately adds up to these numbers.  So if your drives' individual performance numbers add up to more than any of these magic numbers, you will probably hit the same bottleneck and end up around the same totals.  AMD Threadripper NVMe RAID seems to scale a little better than Windows RAID, but ultimately there exists some bottleneck that affects both (which I presume is in the Threadripper processor).

              So if you mainly care about sequential read speed, 4 Samsung drives is pretty much all you need to max out this platform.  If you mainly care about write speed, you can stack them to the non-advertised 8-drive limit if you are using 960 EVOs; I would guess that 960 Pros would cap out around 6.  If you are looking for 4K random performance, pretty much don't bother, as RAID only seems to help marginally there, which was actually unexpected for me.  I would have expected queue depth 1 to only be hurt by RAID, but it appears it can give the drives a bit of a rest between requests.

              The advertising materials pump these Threadripper NVMe RAID capabilities up quite a bit too far, which resulted in a lot of wasted time for me, but hopefully some people can read this and save their own time.
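
              In case it saves someone else the planning time, here is the toy model I have ended up with. The ceilings are the rough magic numbers above from my own system, and the 3200 MB/s per-drive figure is just the approximate rated sequential read of a 960 EVO, so this is a sketch, not a spec:

```python
# Rough ceilings observed on my system (MB/s); your platform may differ
PLATFORM_CAP = {
    "seq_q32t1_read":  11000,
    "seq_q32t1_write": 11000,
    "4k_q8t8":           700,
    "4k_q32t1":          600,
}


def estimated_array_throughput(workload, per_drive_mb_s, n_drives):
    """Best case is linear scaling with drive count; in practice the platform ceiling wins."""
    return min(per_drive_mb_s * n_drives, PLATFORM_CAP[workload])


# Example: sequential reads stop scaling at roughly four fast drives
for n in (2, 4, 6, 8):
    print(n, "drives:", estimated_array_throughput("seq_q32t1_read", 3200, n), "MB/s")
# 2 drives: 6400 MB/s
# 4 drives: 11000 MB/s
# 6 drives: 11000 MB/s
# 8 drives: 11000 MB/s
```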

               

               

              I am including a UserBenchmark so you can better estimate how your results might compare to mine:

              http://www.userbenchmark.com/UserRun/5930809

              Despite better benchmark scores, my system visibly performs worse when I try to OC the CPU, so I don't bother; the memory is OCed at 3466 quad channel.

              • Re: PCIe lanes stepping on each other?
                mavericken

                Another thing I noticed: ntoskrnl.exe saturates exactly 1 core of the processor when I run my tests.  Looks like there is a single-threaded bottleneck here.  Is the RAID driver multithreaded?
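
                In case anyone wants to check the same thing on their box while a benchmark runs, here is a quick per-core utilization watcher (it uses the third-party psutil package, nothing AMD-specific, and the 90% threshold is arbitrary):

```python
# pip install psutil
import psutil

# Sample per-core utilization once a second for ~15 seconds while the benchmark runs
for _ in range(15):
    per_core = psutil.cpu_percent(interval=1, percpu=True)
    hot = [(core, pct) for core, pct in enumerate(per_core) if pct > 90]
    print("cores above 90%:", hot if hot else "none")

# If the same core index keeps showing up near 100% while the rest sit idle,
# the I/O path is effectively single threaded on this workload.
```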