iperf3 with multithreading to get the most out of 10GbE: a complete guide

Last update: 08/10/2025
Author: Isaac
  • Upgrade to iperf3 ≥ 3.16 and use 4–8 threads (-P) to saturate 10GbE.
  • Check RSS/offloads and that the NIC is not limited by degraded PCIe.
  • Test bidirectional mode (--bidir) and adjust window/zero-copy if CPU is tight.

Testing with iperf3 multithreading for 10GbE

In 10GbE networks, increasingly common in home and professional equipment, the bottleneck usually moves from the card to the CPU. Tuning iperf3 to exploit multiple threads can therefore make the difference between staying at 6–8 Gbps and nearly saturating the link. In this guide you'll see, with real-life examples, how to take advantage of multithreading mode, which versions you need, and which hardware pitfalls to avoid so you don't lose performance.

Beyond executing a command, success depends on the iperf3 version, the system configuration, the NIC drivers, and even the PCIe link width. You'll see practical cases (from a mini PC with an Intel N100 to a server with TrueNAS and Proxmox), key iperf3 parameters, best practices for interpreting results, and a diagnostic checklist for when the counter gets stuck.

What is iperf3 and why does it matter in 10GbE?

iperf3 is an open source tool for measuring the bandwidth between two computers using TCP, UDP, or SCTP. It is multiplatform (Windows, Linux, macOS, BSD, among others), and its code was rewritten from scratch relative to iperf2 to simplify it, add JSON output, and provide a modular base that is easier to integrate into other programs.

Compared to other utilities, it lets you adjust the TCP window size, define data amounts, select the protocol, and generate multiple simultaneous connections. In UDP, it measures jitter and loss, supports multicast, and can set a target throughput; in TCP, it offers options such as zero-copy, congestion control selection, and detailed interval output.

On links of 10 Gbps or higher, a single TCP connection may not be enough to saturate the link; distributing the load across multiple flows (and, since iperf3 3.16, across real threads) helps fill the pipe when the CPU or the network stack is the limit.

Versions: minimum requirements and compatibility

First of all, check the version on both ends with iperf3 -v. Since 3.16, iperf3 runs parallel streams in separate threads, reducing the need for the tricks and extra commands that used to be common in environments of 40 Gbps and above. If your distro is behind (typical in hypervisors or appliances), compiling the latest version is usually the fastest route; also review the network types in Hyper‑V, VirtualBox, and VMware in your environment.
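If you script your tests, you can gate them on the detected version. A minimal sketch, assuming the usual "iperf X.Y ..." format of iperf3 --version (the sample line below is hard-coded for illustration; on a real host capture it with ver_line=$(iperf3 --version | head -n 1)):

```shell
# Hypothetical captured line; replace with: ver_line=$(iperf3 --version | head -n 1)
ver_line="iperf 3.16 (cJSON 1.7.15)"
ver=$(echo "$ver_line" | awk '{print $2}')   # second field: "3.16"
major=${ver%%.*}                             # part before the first dot
minor=${ver#*.}; minor=${minor%%.*}          # part between first and second dot
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 16 ]; }; then
  multithread="yes"                          # 3.16+ runs streams in threads
else
  multithread="no"
fi
echo "multithreaded streams: $multithread"
```

Run the same check on both ends, since an old binary on either side loses you the improvements.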

A real example: in an environment with OpenWrt/QWRT and Proxmox VE 8.3, OpenWrt already had a recent version, but Proxmox was outdated. It was resolved by compiling iperf3 3.18 from source on the Proxmox host, and from then on multithreading worked normally.

Saturating 10GbE with multiple flows in iperf3

Preparing your 10GbE environment: CPU, NIC, drivers, and PCIe

To reach peaks close to 9.8–9.9 Gbps, Cat6A cables and a 10G switch are not enough. You need to look further down: driver version, RSS (Receive Side Scaling), interrupt coalescing, offloads (TSO/GSO/GRO/LRO), and, most importantly, the card's PCIe link.

On Windows 10/11, especially with 10G NICs like the ASUS XG-C100C, update the driver, enable RSS, check that SMB Multichannel is active, and audit the network connection in Windows. In file transfers with a NAS (e.g., QNAP with QuTS hero), the speed will also depend on the performance of the storage and the NAS CPU; it is normal to see ~6–8 Gbps if RAID or ZFS can't keep up, while iperf3 (memory to memory) can approach 10GbE.


On Linux and BSD, verify that the card negotiates the correct PCIe width. A typical case: a client was stuck at ~3.5 Gbps despite having 10GbE because the slot was running degraded at x1 and 5 GT/s. The command lspci -vv showed “LnkSta: Speed 5GT/s, Width x1 (degraded)”. After adjusting the BIOS to force the port to x16, the speed jumped to ~9.84 Gbps.
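A quick way to automate that check is to parse the LnkSta line. The sketch below works on a hard-coded sample string; on a real system feed it from sudo lspci -vv -s <slot> instead:

```shell
# Sample LnkSta line for illustration; on real hardware use:
#   lnksta=$(sudo lspci -vv -s <slot> | grep 'LnkSta:')
lnksta="LnkSta: Speed 5GT/s, Width x1 (degraded)"
width=$(echo "$lnksta" | grep -o 'Width x[0-9]*' | cut -d'x' -f2)
case "$lnksta" in
  *degraded*) degraded=1 ;;   # the kernel flags the link as degraded
  *)          degraded=0 ;;
esac
echo "PCIe width: x$width (degraded=$degraded)"
```

As a rough rule, a 10G NIC needs about x4 at 5 GT/s (or x2 at 8 GT/s) of usable PCIe bandwidth, so a width of x1 here fully explains a ~3.5 Gbps ceiling.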

Multithreading mode in iperf3: essential commands

The basic test pattern is simple: launch the server with iperf3 -s at one end and run the client from the other. From here, add parallelism:

  • Multiple TCP streams: iperf3 -c SERVER_IP -P 4 generates 4 parallel connections. Adjust -P to your CPU/NIC; on 10GbE, 4–8 is usually enough.
  • Simultaneous bidirectional: iperf3 -c SERVER_IP -P 4 --bidir measures upload and download at the same time, useful for validating full duplex.
  • Reverse direction (server-to-client traffic): -R.
  • Duration: -t 60 for 60 s tests; -O 3 omits the first 3 seconds to keep TCP slow start out of the average.
  • Zero-copy: -Z reduces CPU load on Linux.
  • CPU affinity: -A sets the affinity (useful for pinning iperf3 to specific cores).
  • Binding to an interface: -B 10.0.0.2 explicitly chooses the outgoing IP/interface (see How to list network interfaces in CMD).

If you work with UDP, control the rate with -b and check loss/jitter; although for saturating 10GbE and measuring the system stack, TCP is usually the starting point.
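Putting those flags together, a typical client invocation looks like the one assembled below. It is a sketch with placeholder values (the script only builds and prints the command line, so you can review it before actually running it):

```shell
server_ip="10.0.0.1"   # placeholder: your iperf3 server address
streams=4              # -P: parallel streams; try 4-8 on 10GbE
duration=60            # -t: test length in seconds
omit=3                 # -O: seconds to skip so slow start doesn't skew the average
cmd="iperf3 -c $server_ip -P $streams -t $duration -O $omit -Z"
echo "$cmd"
```

Run the printed command on the client while iperf3 -s listens on the server; drop -Z on platforms where zero-copy is not supported.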

Case Study 1: Intel N100 Mini PC with Two 10GbE

Scenario: two iKOOCORE R2 Max units (one fanless with OpenWrt/QWRT as server, the other with Proxmox VE as client), both with dual 10GbE and a 4-core Intel N100 CPU. In the first measurement, ~9.41 Gbps was observed upstream, but the download stayed at ~8.6 Gbps, and the bidirectional test was worse.

htop gave the clue: a single core at 100% while the rest sat idle. Although total CPU usage was around 30%, one saturated thread was capping the effective throughput. To resolve it, the server was started as usual (iperf3 -s) and -P 4 was added on the client to open 4 streams.

The output, with 10 s intervals and four threads, is more verbose, but the important thing is that the target figure was reached: ~9.41 Gbps. In the full-duplex test with --bidir, the sums averaged ~9.40 Gbps in both directions over one minute, confirming that the CPU was now spreading the load across multiple threads and the NIC was saturating the link.

As a practical note: if the client version is old (as was the case with Proxmox), compiling iperf3 3.18 closed the performance gap and brought support for recent options to the side that needed it most.

Case Study 2: TrueNAS + Proxmox and the PCIe Bottleneck

Configuration: server with TrueNAS-13.0-U6 (FreeBSD 13.1) on a Supermicro X8/X9 board, dual Intel Xeon E5645 (Westmere) CPUs and a Chelsio T420-CR NIC; client with Proxmox VE 8.1.3 (Debian 12, kernel 6.5.11), dual E5-2420 v2 CPUs and a Chelsio T440-CR NIC. Direct cabling via twinax DAC between hosts.

The first test flatlined at ~3.55 Gbps on average, with zero retransmissions and a stable congestion window. The CPU wasn't even breaking a sweat: ~13–20% usage by iperf3. The decisive clue came when looking at the PCIe link status: “LnkSta: Speed 5GT/s, Width x1 (degraded)”. That is, the 10G NIC was tied to a single PCIe lane at 5 GT/s, capping the throughput.


After diving into the BIOS, a critical option was found on the Supermicro X9DBU-3F: Advanced → Chipset Configuration → North Bridge → Integrated IO Configuration → IIO 1 IOU3 – PCIe Port. Switching it from “Auto” to “x16” restored the link bandwidth. The result: iperf3 clocked a sustained ~9.84 Gbps, with zero retransmissions and a stable cwnd.

Also, by opening two clients against two different ports (-p 5000 and -p 5001), ~19.58 Gbps aggregate was reached. In that dual-port scenario, two threads on the Westmere ran at 70–80% and the Ivy Bridge at 30–40%, consistent with multi-threaded load when pushing traffic through two 10G interfaces at once.
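That dual-port run can be reproduced with a recipe like the following; treat it as a config-style sketch (all addresses are hypothetical, and each client binds to a different 10G interface):

```shell
# Server host: one listener per port
iperf3 -s -p 5000 &
iperf3 -s -p 5001 &

# Client host: one client per port, each bound to a different 10G interface
iperf3 -c 10.0.0.1 -p 5000 -B 10.0.0.2 -P 4 -t 60 &
iperf3 -c 10.0.1.1 -p 5001 -B 10.0.1.2 -P 4 -t 60 &
wait   # the aggregate is roughly the sum of both final averages
```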

Interpreting results: what to look for and how to adjust

In the output of iperf3, look at the final average rate and the interval lines (1 s by default; some examples use 10 s). If you multiply the transferred MBytes by 8 and divide by the duration of the interval in seconds, the number should match the reported Gbits/s.
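You can verify that arithmetic with a one-liner; the figures below are hypothetical, and note that iperf3's MBytes are binary (2^20 bytes), so real output may differ by a few percent from this decimal approximation:

```shell
mbytes=1177   # hypothetical MBytes transferred in the interval
secs=1        # interval duration in seconds
# MBytes * 8 bits / 1000 = approximate Gbits, divided by the interval length
gbps=$(awk -v mb="$mbytes" -v s="$secs" 'BEGIN { printf "%.2f", mb * 8 / 1000 / s }')
echo "${gbps} Gbits/sec"
```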

If you see a single saturated thread and the total does not increase, raise -P gradually (4, 6, 8) until the throughput stops growing. If it increases slightly but does not reach 9.8–9.9 Gbps, check offloads and RSS. When there are losses or retransmissions in TCP, evaluate -w (window), -O (skip the start), and the quality of the physical link.

In bidirectional tests (--bidir), look at the sums and the one-minute average to validate that both directions are close to the theoretical maximum. Large differences between upload and download usually point to affinity, IRQs, or poorly distributed RSS.

When it's not the CPU: Chronic “6 Gbps” diagnosis

If you are stuck at ~6–7 Gbps (very typical on new setups), rule out these points one by one:

  • Old version: a 3.7 client against a 3.16 server works, but upgrade both ends to benefit from the multithreading improvements and fixes.
  • Degraded PCIe: check width and speed with lspci -vv (Linux) or vendor tools; an x1 link at 5 GT/s throttles a 10G NIC.
  • RSS disabled: without spreading interrupts, a single core pegs at 100%. On Windows, enable RSS in the driver; on Linux, check ethtool -l and RPS/RFS.
  • Offloads and coalescing: TSO/GSO/GRO/LRO and interrupt coalescing reduce CPU load; adjust them with ethtool.
  • MTU/jumbo frames: they don't always increase throughput, but they help with weak CPUs. Enable them on both ends and on the switch if it supports them.
  • Storage: in NAS tests, the bottleneck may be RAID/ZFS; use iperf3 to isolate the network (memory to memory) and leave the disks out.
  • SMB and Multichannel: on Windows, SMB Multichannel speeds up transfers; if it's not enabled, file copies will run slower even if iperf3 clocks 9–10 Gbps.
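For the RSS point, you can sanity-check that the NIC exposes enough queues for your planned stream count. The sketch parses a hard-coded sample line (on a real box take it from the “Current hardware settings” section of ethtool -l eth0; the capture command in the comment is an assumption about that output format):

```shell
# Hypothetical line from "ethtool -l eth0"; capture it for real with e.g.:
#   rss_line=$(ethtool -l eth0 | awk '/Current/,0' | grep 'Combined:')
rss_line="Combined: 8"
queues=$(echo "$rss_line" | awk '{print $2}')
streams=4                    # the -P value you plan to use
if [ "$queues" -ge "$streams" ]; then
  rss_ok="yes"               # each stream can land on its own queue/core
else
  rss_ok="no"                # some streams will share queues and cores
fi
echo "queues=$queues rss_ok=$rss_ok"
```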

It is also a good idea to adjust system buffers on Linux when chasing sustained peaks: raising sysctl net.core.rmem_max and net.core.wmem_max prevents socket buffer bottlenecks; they are not mandatory for 10GbE on low-latency LANs, but they help in demanding scenarios.
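As a config-style sketch (the 64 MB ceilings are illustrative values, not a recommendation; both commands need root, and you would persist them in /etc/sysctl.d to survive reboots):

```shell
# Raise the maximum socket buffer sizes (illustrative 64 MB values)
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
```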

Quick installation on Windows, Linux, and macOS

On Windows, download the trusted binaries, unzip, and run from CMD or PowerShell; if Windows does not detect the network card, see the solutions guide. Open a console, go to the folder, and run iperf3.exe -s or iperf3.exe -c IP. Make sure to allow port 5201/TCP on your firewall if you're testing across subnets.
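From an elevated prompt, a one-off inbound rule for the default port might look like this (the rule name is arbitrary; shown as a sketch, adjust localport if you use -p):

```shell
netsh advfirewall firewall add rule name="iperf3" dir=in action=allow protocol=TCP localport=5201
```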


On Linux/macOS, installing is usually one command: on Debian/Ubuntu sudo apt-get install iperf3, on RHEL/CentOS sudo yum install iperf3, on macOS with Homebrew brew install iperf3. If your system's package is outdated, compiling from source gives you the latest version.

Useful iperf3 parameters (client and server)

To control the session in detail, these flags save you time:

  • General: -p port, --cport client port (>=3.1), -f format (k/K/m/M/g/G), -i interval, -F file, -B bind address, -V verbose, -J JSON output, --logfile (>=3.1), -h help, -v version.
  • Server: -s server mode, -D daemon, -I PID file.
  • Client: -c IP, --sctp SCTP, -u UDP, -b bandwidth, -t duration, -n bytes, -k packets, -l buffer length, -P parallel streams, -R reverse, -w TCP window, -M MSS, -N TCP no-delay, -4/-6 IPv4/IPv6, -S TOS, -L IPv6 flow label, -Z zero-copy, -O skip start, -T title, -C congestion algorithm.

Remember that -b applies per stream: with -P and UDP, the total requested rate is -b multiplied by the number of streams. If the system imposes socket limits, each thread may hit its own ceiling; experiment with fewer but better-tuned streams.
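For UDP, the aggregate you are requesting is simply the per-stream rate times the stream count; a quick check with hypothetical values:

```shell
b_gbps=2.5   # per-stream target rate passed via -b (hypothetical)
streams=4    # -P value
total=$(awk -v b="$b_gbps" -v p="$streams" 'BEGIN { printf "%.1f", b * p }')
echo "requested aggregate: ${total} Gbps"
```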

Impact and Limitations: What iperf3 Measures (and What It Doesn't)

iperf3 is excellent for stressing the network, but it can flood your LAN if you run it in production. Avoid long tests during peak hours, and coordinate with other teams if you share backbones. Its traffic patterns don't always replicate real-world application loads.

Among its drawbacks: it has no graphical interface out of the box, it requires parameter tweaking, and it doesn't keep a history by itself. While it supports several protocols, there are scenarios (e.g., certain IPv6 features or specific multiplexing) where other tools may be a better fit.

Alternatives when you need something else

If iperf3 doesn't fit your case, these options can help:

  • iperf2: legacy but useful for old compatibility; there is a classic GUI (jperf).
  • netperf: very powerful for TCP/UDP and latency, more technical to handle.
  • nuttcp: lightweight for measuring bandwidth/loss without too much complication.
  • bwping/bwping6: ICMP to estimate bandwidth when you can't open TCP/UDP (less accurate).
  • Speedtest CLI / Fast.com CLI: for public Internet; do not replace iperf3 on LANs.

If your goal is to squeeze out 10GbE, start with multithreaded iperf3: confirm version ≥ 3.16, use 4–8 streams, verify RSS and the PCIe link, and review drivers/offloads. With these basics, modest systems like an Intel N100 have reached 10GbE speeds (9.40–9.85 Gbps), and older configurations have gone from 3.5 to 9.8 Gbps after correcting a simple x1 PCIe link. The key is to distribute the work across cores, avoid hidden bottlenecks, and read the right metrics to know when to adjust each parameter.

Related article: VirtualBox: Complete Guide to Network Types, Uses, and Tricks