
10GbE poor performance in Windows

Discussion in 'Networks & Internet Connectivity' started by Kei, 15 Jun 2019.

  1. Kei

    Mobster

    Joined: 24 Oct 2008

    Posts: 2,605

    Location: South Wales

    I've run some basic testing on the three machines that are going to be linked together via 10GbE and found some slightly strange performance issues.

    This is the kit that is linked together:

    Switch - Juniper EX3300 with Juniper SR transceivers, no VLANs or L3 features enabled, 1500 MTU
    PC1 - Windows 10 1803 - Threadripper 1920X / Intel X710 with Intel SR transceiver, Tx/Rx buffers adjusted to 4096
    PC2 - Windows 10 1803 - i7 4820K / Intel X710 with Intel SR transceiver, Tx/Rx buffers adjusted to 4096
    NAS - Fedora 30 - i5 9600K / Solarflare SFN7122F with Avago SR transceiver, no driver adjustments

    Running the iperf server on PC1 and the client on the NAS nets ~7.5Gbit/s. Running the iperf server on the NAS with PC1 as the client nets ~3Gbit/s. Running iperf between PC1 and PC2 results in ~1.5Gbit/s no matter what I try.

    I'm not seeing any dropped packets or errors in the switch monitoring for the 10-gig ports. I've tried disabling flow control and interrupt moderation in the Intel driver, and that makes zero difference. Task Manager doesn't show the CPU getting hammered either, so I'm not sure what's going on here.

    Since there is a suggestion that iperf is not optimised for Windows, I gave NTttcp a go too.

    Code:
    PS I:\Downloads\NTtcp> ./ntttcp.exe -s -m 8,*,192.168.0.160 -l 128k -a 2 -t 15
    Copyright Version 5.33
    Network activity progressing...
    
    
    Thread  Time(s) Throughput(KB/s) Avg B / Compl
    ======  ======= ================ =============
         0   15.016        95863.612    131072.000
         1   15.454        34596.609    131072.000
         2   15.047        43018.276    131072.000
         3   14.750        13797.966    131072.000
         4   15.016        56984.550    131072.000
         5   15.016        55688.865    131072.000
         6   15.016        54248.269    131072.000
         7   15.016        56686.201    131072.000
    
    
    #####  Totals:  #####
    
    
       Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
    ================ =========== ============== ================
         6037.750000      15.015       1460.054          402.115
    
    
    Throughput(Buffers/s) Cycles/Byte       Buffers
    ===================== =========== =============
                 3216.916       2.353     48302.000
    
    
    DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
    ============= ============= =============== ==============
        28524.875         0.627       52573.427          0.340
    
    
    Packets Sent Packets Received Retransmits Errors Avg. CPU %
    ============ ================ =========== ====== ==========
         4336168           268517           0      0      1.184
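    To compare the NTttcp totals with the iperf numbers, the Throughput(MB/s) figure can be converted to Gbit/s; a quick sanity-check calculation (not part of either tool's output):

```python
def mb_per_s_to_gbit(mb_per_s: float) -> float:
    """Convert NTttcp's Throughput(MB/s) figure to Gbit/s (decimal units)."""
    return mb_per_s * 8 / 1000

# The totals above report 402.115 MB/s for this send test
print(round(mb_per_s_to_gbit(402.115), 2))  # ~3.2 Gbit/s
```

    That ~3.2 Gbit/s is consistent with the ~3Gbit/s iperf figure in the same direction, so both tools agree on the shortfall.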

    In order to rule out the network infrastructure itself, I've also tried running iperf on the local machine against the loopback address, which sees an average of 4.5Gbit/s. Setting the TCP window size to 2048000 results in some heavy variability, but the best I've seen is 8.9Gbit/s and the average is around 6.5. This suggests to me that it's a Windows issue.
    [Image: iperf loopback results on the Windows machine]
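    The sensitivity to window size fits the bandwidth-delay product: TCP keeps at most one window's worth of rate × RTT bytes in flight. A rough sketch, where the 1 ms RTT is purely an illustrative assumption, not a measurement:

```python
def min_window_bytes(rate_gbit: float, rtt_ms: float) -> float:
    """Minimum TCP window needed to sustain rate_gbit over a path with rtt_ms RTT."""
    bytes_per_s = rate_gbit * 1e9 / 8
    return bytes_per_s * rtt_ms / 1000

# With an assumed 1 ms of effective latency, 10 Gbit/s needs ~1.25 MB in flight,
# far above the 65535-byte ceiling of a TCP window without window scaling.
print(round(min_window_bytes(10, 1.0)))
```

    The 2048000-byte window used in the loopback test comfortably covers that, which fits the jump from 4.5 to a best of 8.9Gbit/s once it was set.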

    Doing the same thing on the 9600K Linux server results in something like I'd expect.
    [Image: iperf loopback results on the Linux server]

    Running similar tests in NTttcp gives the same results.
    Code:
    PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 8,*,192.168.0.2 -l 128k -a 2 -t 15
    Copyright Version 5.33
    Network activity progressing...
    
    
    Thread  Time(s) Throughput(KB/s) Avg B / Compl
    ======  ======= ================ =============
         0   15.000        76458.280     48480.812
         1   15.000        76458.755     48927.487
         2   14.999        76464.043     53761.016
         3   14.999        76463.758     48379.198
         4   15.000        76459.896     54396.665
         5   15.001        76454.229     64123.136
         6   15.000        76458.660     55378.178
         7   15.000        76458.755     51974.087
    
    
    #####  Totals:  #####
    
    
       Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
    ================ =========== ============== ================
         8960.028477      15.000       1359.733          597.335
    
    
    Throughput(Buffers/s) Cycles/Byte       Buffers
    ===================== =========== =============
                 4778.682       9.155     71680.228
    
    
    DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
    ============= ============= =============== ==============
         1103.533       417.425       30830.067         14.941
    
    
    Packets Sent Packets Received Retransmits Errors Avg. CPU %
    ============ ================ =========== ====== ==========
         6909840          6909643           0      0      6.840
    PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 24,*,192.168.0.2 -l 256k -a 2 -t 15
    Copyright Version 5.33
    Network activity progressing...
    
    
    Thread  Time(s) Throughput(KB/s) Avg B / Compl
    ======  ======= ================ =============
         0   15.000        24166.232    112041.449
         1   15.001        24164.716    107655.099
         2   14.993        24161.063    104284.717
         3   15.001        24164.716    111003.224
         4   15.008        24169.690    111544.877
         5   14.991        24164.381    104051.108
         6   14.994        24158.976    102780.116
         7   15.002        24163.105    105483.029
         8   14.999        24168.698    108318.197
         9   15.001        24164.716    112688.154
        10   14.992        24162.865    108146.840
        11   14.999        24167.938    117802.215
        12   15.000        24166.327    110474.637
        13   15.000        24166.327    108251.613
        14   15.001        24164.811    105483.444
        15   14.993        24160.778    102017.701
        16   14.994        24160.973    107432.407
        17   15.001        24164.716    105393.180
        18   15.003        24161.495    103224.355
        19   15.001        24163.955    105901.027
        20   15.001        24164.716    108409.690
        21   14.993        24162.394    102930.405
        22   14.992        24162.770    106807.429
        23   15.001        24163.480    109459.098
    
    
    #####  Totals:  #####
    
    
       Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
    ================ =========== ============== ================
         8494.292297      15.000       1309.137          566.286
    
    
    Throughput(Buffers/s) Cycles/Byte       Buffers
    ===================== =========== =============
                 2265.145       7.807     33977.169
    
    
    DPCs(count/s) Pkts(num/DPC)   Intr(count/s) Pkts(num/intr)
    ============= ============= =============== ==============
         1336.133       339.470      404661.400          1.121
    
    
    Packets Sent Packets Received Retransmits Errors Avg. CPU %
    ============ ================ =========== ====== ==========
         6803737          6803653           0      0      5.530
    PS I:\Downloads\NTtcp> ./ntttcp.exe -r -m 12,*,192.168.0.2 -l 32M -a 2 -t 15
    Copyright Version 5.33
    Network activity progressing...
    
    
    Thread  Time(s) Throughput(KB/s) Avg B / Compl
    ======  ======= ================ =============
         0   14.877        61672.682  22369633.333
         1   14.880        61660.248  22369633.333
         2   14.880        61660.248  22369633.333
         3   14.882        61651.962  22369633.333
         4   14.895        61598.153  18067780.769
         5   14.883        61647.819  22369633.333
         6   14.893        61606.425  22369633.333
         7   14.888        61627.115  22369633.333
         8   14.885        61639.536  22369633.333
         9   14.897        61589.883  22369633.333
        10   14.880        61660.248  22369633.333
        11   14.883        61647.819  22369633.333
    
    
    #####  Totals:  #####
    
    
       Bytes(MEG)    realtime(s) Avg Frame Size Throughput(MB/s)
    ================ =========== ============== ================
        10752.005768      15.000       1381.202          716.800
    
    
    Throughput(Buffers/s) Cycles/Byte       Buffers
    ===================== =========== =============
                   22.400       5.567       336.000

    Running the same test on the Threadripper system in a live boot of Fedora 30, all is well. It definitely seems to be a Windows issue.

    [Image: iperf loopback results on the Threadripper system under Fedora 30]

    The question is, what's wrong in Windows? I'm seeing the same problem on both my Threadripper system and my X79 system, but I have no trouble at all if I use Linux instead.
     
  2. Avalon

    Soldato

    Joined: 29 Dec 2002

    Posts: 5,845

    Windows simply isn't optimised for 10Gb, and this has been discussed for several years. While I commend you on testing it, 30 seconds on Google would have told you the same; SNB has a long-running discussion on this with pretty much all the required tweaks covered, from memory.
     
  3. Kei

    Mobster

    Joined: 24 Oct 2008

    Posts: 2,605

    Location: South Wales

    What baffles me is that Windows itself isn't necessarily the issue, as I've seen plenty of other people have no trouble achieving near 10Gbit/s transfers using Windows 10 with similar or even older hardware. The performance I'm seeing on my Threadripper system isn't actually too bad, but it's not great either. The X79 system, on the other hand, is frankly awful; that thing can barely push past gigabit in Windows. I suspect the OS on that system might be "borked", as I have the feeling it was upgraded from an FX-8320/990FX without reinstalling.

    Showing the network adapter info in PowerShell makes me suspicious of the reported PCIe link width.
    Code:
    PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo
    
    Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
    ----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
    WiFi                                 0   4      0        0    1        0      2.5 GT/s             1 1.1
    10Gbe 1                              0   8      0        0             0      8.0 GT/s             2 1.1
    Gigabit Lan                          0   5      0        0    1        0      2.5 GT/s             1 1.1
    10Gbe 2                              0   8      0        1             0      8.0 GT/s             2 1.1

    This is what I get for the X79 system. Both systems are definitely operating at PCIe 3.0, but the link width is clearly wrong on the Threadripper system.
    Code:
    PS C:\WINDOWS\system32> Get-NetAdapterHardwareInfo
    
    Name                           Segment Bus Device Function Slot NumaNode PcieLinkSpeed PcieLinkWidth Version
    ----                           ------- --- ------ -------- ---- -------- ------------- ------------- -------
    Ethernet 2                           0   5      0        1                    8.0 GT/s             8 1.1
    Ethernet 3                           0   5      0        0                    8.0 GT/s             8 1.1
    Onboard 1Gbe                         0   0     25        0                     Unknown
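    Whether a misreported x2 link could even explain the shortfall can be sanity-checked from PCIe 3.0's line rate (8.0 GT/s per lane with 128b/130b encoding); a quick back-of-the-envelope calculation:

```python
def pcie3_gbyte_per_s(lanes: int) -> float:
    """Approximate usable PCIe 3.0 bandwidth in GB/s for a given lane count."""
    gt_per_s = 8.0          # raw line rate per lane, in GT/s
    encoding = 128 / 130    # 128b/130b encoding efficiency
    return gt_per_s * encoding / 8 * lanes

print(round(pcie3_gbyte_per_s(2), 2))  # x2 link
print(round(pcie3_gbyte_per_s(8), 2))  # x8 link
```

    Even at x2 that is roughly 1.97 GB/s, still above the ~1.25 GB/s a single saturated 10GbE port needs, so the width alone shouldn't cap one link at 1.5Gbit/s.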
     
    Last edited: 16 Jun 2019
  4. Kei

    Mobster

    Joined: 24 Oct 2008

    Posts: 2,605

    Location: South Wales

    Having done significant reading on this matter, that should only be true for Server 2008 / Vista and earlier in respect of the default TCP window size. The only tweaks left open to the end user that seem worth tinkering with are RSS queues and Rx/Tx buffers. I've read through what I could find on SNB and I've pretty much covered all of it except the MTU, which I'd prefer to keep at 1500 as jumbo frames shouldn't be necessary to achieve 10Gbit/s. Either way, testing iperf on the loopback address should take the network out of the equation and show what the system bus is capable of handling, which is why I was expecting 30+Gbit/s rates; I do get those on Linux, but not on Windows.

    In respect of the PCIe link width issue I spotted, I've now fixed it by swapping the NIC and audio interface around, as both were in x8 slots. Get-NetAdapterHardwareInfo now reports the correct link width and speed, but it didn't make one iota of difference to the throughput. Something else I've just tested is running three separate server/client instances of iperf3, which is suggested for 40G/100G network testing. At work on my old Z800 workstation (dual X5650), one instance on the loopback address tops out at 9.5Gbit/s, and three instances net similar results per instance, meaning ~30Gbit/s. I'll need to try this once I'm back at home and see how it fares.
     
    Last edited: 17 Jun 2019
  5. Jez

    Caporegime

    Joined: 18 Oct 2002

    Posts: 32,412

    I don't wish to contradict this as such, but in my (I'd say fairly extensive!) experience any remotely recent Windows release works just fine with a 10Gbit NIC. 10Gbit is the default bandwidth assigned to a regular vmxnet3 NIC, which nearly everyone will have been running for years. Windows servers utilise all of this bandwidth just fine assuming the other baseline hardware is in line; I have never had to tweak the OS to do so.

    I don't know why the guy's performance is low (although I would question why he feels the need to have 10Gbit connections in his house), but I don't think blaming Windows is the answer personally.
     
  6. Firestar_3x

    Caporegime

    Joined: 11 Mar 2005

    Posts: 30,301

    Location: Leafy Cheshire

    My 10-gig home network works fine with W10 on 3 desktops and Server 2016 on 2 servers; the only change is jumbo frames and a 9000+ MTU.
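    For what the jumbo-frame change actually buys, the wire efficiency at each MTU can be estimated from the fixed per-frame overheads (TCP/IP headers inside the frame, plus Ethernet framing, preamble and inter-frame gap on the wire); a rough sketch:

```python
def wire_efficiency(mtu: int) -> float:
    """Fraction of line rate left for TCP payload at a given MTU."""
    payload = mtu - 40        # minus 20B IPv4 + 20B TCP headers
    on_wire = mtu + 38        # plus 14B Ethernet + 4B FCS + 8B preamble + 12B gap
    return payload / on_wire

print(round(wire_efficiency(1500), 3))  # standard frames
print(round(wire_efficiency(9000), 3))  # jumbo frames
```

    So jumbo frames recover only about 4% of line rate; the bigger win in practice is far fewer packets per second for the CPU and NIC to process.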

    However, when it was first set up I had terrible transfer speeds. The issue was an Intel AT2 adapter in one server that had a loopback error and borked the whole network; changing it for an X550-T2 fixed everything.
     
  7. Kei

    Mobster

    Joined: 24 Oct 2008

    Posts: 2,605

    Location: South Wales

    The main reason for upgrading is that I run out of bandwidth at 1Gbit to my server. If I run one copy from my PC, it saturates the link, which affects others in the house trying to use it. Rather than using multiple aggregated 1Gbit links, it seemed easier and only a little more expensive to go the whole hog and get 10Gbit. I don't expect to be able to saturate the 10Gbit link from either of my PCs, as their storage is limited to 6Gbit/s SATA; my only NVMe drive is for the OS. The server, on the other hand, should be able to saturate the link with all 12 disks in the array, as it manages 700MB/s with 7 disks.
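    The storage side of that reasoning checks out arithmetically; a quick sketch using the figures above (the per-disk scaling to 12 disks is a rough linear assumption):

```python
# A saturated 10GbE link carries about 1250 MB/s of data
link_mb_per_s = 10 * 1000 / 8          # 1250 MB/s

# A single SATA 6Gbit/s device tops out below that, even before protocol overhead
sata_mb_per_s = 6 * 1000 / 8           # 750 MB/s raw

# Array: 700 MB/s from 7 disks, assume roughly linear scaling to 12
per_disk = 700 / 7                     # 100 MB/s per disk
array_12 = per_disk * 12               # ~1200 MB/s

print(link_mb_per_s, sata_mb_per_s, array_12)
```

    So a 12-disk array at ~1200 MB/s gets close to, though not quite up to, the ~1250 MB/s a 10Gbit link can carry.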
     
  8. Jez

    Caporegime

    Joined: 18 Oct 2002

    Posts: 32,412

    Fair :)
     
  9. jaybee

    Mobster

    Joined: 10 Jul 2008

    Posts: 4,972

    I've got a similar issue on my home network, whereby my main PC gets 950Mbit/s outbound when connecting to other machines running iperf as the server. But when I put the main PC in iperf server mode and the other machines connect back to it, they can only get 650Mbit/s. The main PC has a Realtek NIC; the other client machines have Realtek and Intel NICs, but both get the same speeds.
     
  10. Kei

    Mobster

    Joined: 24 Oct 2008

    Posts: 2,605

    Location: South Wales

    Just pushed the 1903 update onto my Threadripper machine and iperf performance is now fixed on this machine. Across 5 instances I saw just shy of 40Gbit/s, and a single instance gave just over 16Gbit/s. I'll look at doing the X79 system tomorrow and see if it shows the same improvement.