- irqbalance distributes hardware interrupts across cores to avoid bottlenecks in SMP systems, and is especially useful on high-traffic servers.
- Actual performance depends on combining irqbalance with NAPI, network queues, well-sized TCP buffers, a modern qdisc, and congestion control algorithms such as BBR.
- Advanced NIC settings (ring buffers, RSS/RPS, offloads) and manual IRQ affinity assignment allow for further fine-tuning of load distribution between CPUs.
- Overall Linux optimization is rounded out with changes to memory, swap, zram, temporary file systems, and desktop parameters and applications such as Firefox.

When you start tinkering with the performance of a GNU/Linux system, sooner or later a recurring protagonist appears: irqbalance and the distribution of hardware interrupts between CPUs. It comes up in forums, in distribution documentation, and in tutorials on "tuning" the system to make it fly... but it is rarely well explained what it actually does, in which scenarios it helps, and in which you won't notice anything at all.
Furthermore, IRQ balancing gets tangled up with other advanced concepts: NAPI, kernel receive queues, NIC buffers, TCP congestion control algorithms, RSS/RPS, BBR, sysctl settings, performance daemons like preload, zram-compressed swap, GTK optimization, temporary files in RAM, and so on. It is very easy to get lost among parameters, commands, and configuration files without a clear understanding of what each one does.
What is irqbalance and what is it actually used for in Linux?
irqbalance is a user-space daemon responsible for distributing hardware interrupts (IRQs) among the different CPU cores in SMP (multiprocessor or multicore) systems. Its goal is not magical: it simply tries to prevent all the work of servicing devices (network, disks, USB, etc.) from always falling on the same core.
When a device generates an IRQ, the kernel executes an interrupt handler. If many of these interrupts are concentrated on a single CPU, that CPU can become overloaded while the others remain idle. irqbalance analyzes the volume of interrupts per device and assigns each IRQ to a "suitable" core to distribute the load, while also trying to minimize cache misses and respect logical affinities (for example, keeping the IRQs of the same network interface together).
In systems with a single CPU, or with cores that fully share the L2 cache, irqbalance itself detects that it has nothing useful to do and exits. This isn't an error; there is simply no room for improvement. However, on servers with multiple physical CPUs or many cores, especially under heavy network traffic or high I/O, it can make a difference in latency and stability.
The daemon can run in the background (service mode) or on demand with the --oneshot option. It also allows excluding specific IRQs with --banirq and preventing it from using certain cores via the CPU mask defined in the IRQBALANCE_BANNED_CPUS environment variable. All of this is normally controlled from its configuration file, which in many distributions lives in /etc/default/irqbalance or /etc/irqbalance.env.
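As an illustrative sketch (variable names follow the Debian-style /etc/default/irqbalance file; exact names and paths vary by distribution), such a configuration file might look like this:

```shell
# /etc/default/irqbalance -- illustrative example, not shipped defaults

# Hex CPU mask of cores irqbalance must never place IRQs on
# (0x3 = CPUs 0 and 1 left free for latency-sensitive work):
IRQBALANCE_BANNED_CPUS=00000003

# Uncomment to run a single balancing pass at boot instead of a daemon:
#IRQBALANCE_ONESHOT=1
```

After editing the file, the service is restarted (for example with systemctl restart irqbalance) for the new mask to take effect.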
IRQ balancing: kernel vs irqbalance and when it's noticeable
Linux already has its own internal mechanism to decide which CPU serves each IRQ, without the need for irqbalance: the kernel can set interrupt affinities and distribute them following simple heuristics. On many desktop machines that is more than enough; the average user won't notice any difference with irqbalance on or off.
That's why it's relatively common for someone to try irqbalance on their desktop distro and say: "I don't notice any improvement, but nothing bad either." That's perfectly normal. On a laptop with a quad-core processor, a single network card, and no heavy I/O load, the Linux scheduler and the kernel's internal mechanisms (NAPI, network queues, etc.) already keep things reasonably balanced.
Where it does make sense is on servers with many cores and intensive traffic: large databases, reverse proxies, high-traffic web servers, backup storage, heavily loaded virtual machines, etc. Having many network or disk IRQs pinned to a single core can become a bottleneck; distributing them properly reduces queues, service times, and latency spikes.
Some users prefer, instead of irqbalance, to parameterize the kernel with options such as acpi_irq_balance in GRUB. This parameter influences how ACPI and the kernel allocate IRQs, but it doesn't offer the same dynamic flexibility as irqbalance, which re-evaluates the allocation based on the actual load. They are different approaches: the first is more static and low-level; the second, more adaptive.
In ultra-low latency environments (for example, certain trading platforms or networks using DPDK), the opposite occurs: irqbalance is usually disabled and IRQs are manually pinned to specific cores, along with a careful mapping between NIC queues and CPUs. In these scenarios absolute control is sought, and some automation is sacrificed.
Interrupts, NAPI, and network queues: how irqbalance fits in
To better understand the role of irqbalance, we need to go down one more layer and look at how Linux handles network interrupts and packet reception. The kernel's network subsystem combines several key components: NAPI, receive queues (DMA buffers), the net.core.* parameters, queue management algorithms (qdisc), and receive-side scaling (RSS/RPS).
NAPI (New API) is the mechanism by which the kernel reduces the interrupt storm when a lot of traffic arrives. Instead of firing an IRQ for each packet, the NIC generates an interrupt indicating "there is work pending," and the kernel polls the receive queue until it is empty or a time/packet budget is exhausted. This reduces jitter and improves throughput, although it may also introduce some latency variation.
The queue where packets land before being processed by the network stack is what is usually called the kernel receive queue or DMA buffer. Its capacity is limited by parameters such as:
- net.core.netdev_max_backlog: maximum number of packets in the software receive queue when the kernel cannot process them at the rate they arrive.
- net.core.netdev_budget_usecs: time "budget" in microseconds that NAPI has to drain queues in each cycle.
- net.core.dev_weight: number of packets processed per interface in each round within that budget.
If netdev_max_backlog is very low and the NIC delivers more packets than the kernel can handle, we'll start seeing dropped packets in /proc/net/softnet_stat. A typical starting point for fine-tuning on high-traffic servers is in the range of 4000 packets, configured in /etc/sysctl.conf with something like:
net.core.netdev_max_backlog = 4000
After modifying it, it is applied with sysctl -p or with a one-off sysctl -w. This way the queue absorbs arrival bursts better without dropping packets, as long as the rest of the processing path keeps up.
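To check whether the backlog is actually overflowing, the per-CPU "dropped" counters (the second column of /proc/net/softnet_stat, in hexadecimal, one row per CPU) can be summed with a few lines of shell. A monitoring sketch; on a healthy host the total should stay near zero:

```shell
#!/bin/sh
# Sum the per-CPU "dropped" counters from /proc/net/softnet_stat.
# Each row is one CPU; column 2 counts packets discarded because the
# software backlog queue (netdev_max_backlog) was full.
total=0
while read -r _processed dropped _rest; do
  total=$((total + 0x$dropped))
done < /proc/net/softnet_stat
echo "softnet dropped total: $total"
```

If the total keeps growing under load, raising netdev_max_backlog (or spreading IRQs across more CPUs) is a reasonable next step.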
Along this path, irqbalance decides which CPU handles the IRQs associated with the network interface. If we concentrate all NIC interrupts on a single core, that core will run the NAPI routines and drain the queue on its own. If we distribute the IRQs properly and use mechanisms like RSS or RPS, several CPUs can collaborate to process packets, reducing queues and drops.
Fine-tune queues, buffers, and TCP window for heavy traffic
When a server moves a lot of data (for example, backups, FTP traffic, large files, or database replication), simply "having irqbalance enabled" is not enough. Several levels need to be harmonized (kernel queues, card buffers, TCP window sizes, and congestion parameters) so that everything pulls in the same direction.
The first block of adjustments concerns the packet receive buffers and the TCP window. Linux exposes parameters such as:
- net.ipv4.tcp_rmem: triplet (minimum, default, maximum) of TCP receive memory per socket.
- net.ipv4.tcp_wmem: the equivalent for sending.
- net.core.rmem_max and net.core.wmem_max: hard maximum buffer that an application can request.
A practical way to adjust them for a low-latency Gigabit link is to calculate the BDP (bandwidth-delay product) and apply a scaling factor (thanks to TCP window scaling, enabled by default in modern kernels). In practice, many administrators end up with maximum values of several megabytes, for example:
net.ipv4.tcp_rmem = 4096 16384 10880000
net.ipv4.tcp_wmem = 4096 16384 10880000
That maximum of ~10.8 MB comes from calculations specific to 1 Gbps links with very low RTT (~0.00017 s). The higher the throughput or the latency, the larger the buffer needs to be to exploit the bandwidth without being limited by the window.
It is important to relate this maximum to the size of the packet receive queue. If netdev_max_backlog = 4000 and each effective packet is around 1480 bytes, that queue represents about ~5.9 MB. The TCP receive buffer must be larger than what fits in the queue, or we will drop packets by saturating the buffer before the queue.
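That comparison can be sanity-checked with a couple of lines of shell arithmetic (the numbers are the article's example values, not universal recommendations):

```shell
#!/bin/sh
# Compare the memory the software receive queue can hold against the
# proposed TCP receive buffer ceiling (example values from this article).
BACKLOG=4000          # net.core.netdev_max_backlog
PKT_BYTES=1480        # effective bytes per packet (MTU 1500 minus headers)
RMEM_MAX=10880000     # proposed net.ipv4.tcp_rmem maximum

QUEUE_BYTES=$((BACKLOG * PKT_BYTES))
echo "queue can hold $QUEUE_BYTES bytes"
if [ "$RMEM_MAX" -gt "$QUEUE_BYTES" ]; then
    echo "OK: tcp_rmem maximum exceeds queue capacity"
else
    echo "WARNING: buffer smaller than queue, expect drops"
fi
```

With these values the queue holds 5,920,000 bytes (~5.9 MB), comfortably below the ~10.8 MB buffer ceiling.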
To monitor for drops, commands such as the following can be used:
- cat /sys/class/net/eth0/statistics/rx_dropped: packets dropped by the NIC.
- watch -n 1 -t -d cat /proc/net/softnet_stat: per-CPU processed, dropped, time_squeeze columns, etc.
- watch -n 1 -t -d "netstat -s | grep err": network stack errors.
If netstat reports errors but /proc/net/softnet_stat does not, the losses are most likely occurring outside our host (along the path, in an intermediate firewall, etc.). There will always be some loss due to congestion control algorithms, but it should remain under control and correlate with traffic spikes.
Queue management (qdisc), QoS and TCP congestion control
Although the kernel's internal queues are critical, so is the queuing discipline (qdisc) attached to each network interface. It determines how packets are ordered, grouped, and discarded on egress, and it can make a difference against phenomena such as bufferbloat.
Linux offers several relevant qdiscs:
- pfifo_fast: the old default discipline, a FIFO with three priority bands.
- fq_codel: combination of fair queuing with CoDel to combat bufferbloat, highly recommended for routers and general use.
- fq: simple fair queuing, very useful on high-load servers.
- cake (sch_cake): the most advanced today, but it requires compiling or having the module in the kernel.
The currently configured default qdisc can be queried with:
sysctl net.core.default_qdisc
And it can be hot-swapped with:
sysctl -w net.core.default_qdisc=fq_codel
To ensure that a specific interface uses a particular discipline, one resorts to tc:
sudo tc qdisc replace dev eth0 root fq_codel
Related to this, the TCP congestion control algorithm also makes a difference in WAN environments or under massive traffic. The kernel offers several (cubic, reno, etc.), and modern versions include BBR, developed by Google, which typically improves sustained throughput without increasing latency. It is activated with:
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
The list of available algorithms can be inspected by looking at the tcp_* modules in /lib/modules/<version>/kernel/net/ipv4/. Combining BBR with a suitable qdisc (fq/fq_codel) and a good buffer configuration results in much more stable data flows.
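To make the combination survive reboots, both keys can go into a sysctl fragment; a sketch (the file name under /etc/sysctl.d/ is an arbitrary example, and availability should be checked first with sysctl net.ipv4.tcp_available_congestion_control):

```
# /etc/sysctl.d/90-network.conf -- illustrative; apply with `sudo sysctl --system`
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```

If bbr is not listed as available, loading the tcp_bbr module (modprobe tcp_bbr) is usually the missing step.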
Advanced NIC settings: ring buffers, RSS, RPS, and offloads
The network card itself also has its own set of queues and buffers: the ring buffers for receive (RX) and transmit (TX). Their maximum size depends on the hardware and is displayed with:
ethtool -g ethX
The “Pre-set maximums” fields indicate how far they can be raised. If the card allows it, RX and TX can be increased to values like 4096, 8192, or 16384 with:
ethtool -G ethX rx 4096 tx 4096
If an error such as “Cannot set device ring parameters: Invalid argument” appears, that value exceeds the NIC's capabilities and smaller values will have to be tried.
To prevent a single CPU from handling all the receive work, many modern NICs implement RSS (Receive Side Scaling), creating several hardware queues associated with different cores. In Linux, this distribution can be inspected with:
cat /proc/interrupts | grep
On cards without RSS, something similar can be “emulated” with RPS (Receive Packet Steering), assigning CPUs to the interface's software queues:
echo f > /sys/class/net/enp4s0f0/queues/rx-0/rps_cpus
The value is a hexadecimal mask. For example, f in binary (1111) indicates using the first four cores. The kernel must be built with CONFIG_RPS for this to work.
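Building these masks by hand is error-prone once core numbers grow; a tiny POSIX-shell helper (a sketch; the CPU list below is just an example) can compute the hex value:

```shell
#!/bin/sh
# Compute the hex CPU mask that rps_cpus (or smp_affinity) expects
# from a list of core numbers. Cores 0-3 give mask "f".
mask=0
for cpu in 0 1 2 3; do
    mask=$((mask | (1 << cpu)))
done
printf '%x\n' "$mask"   # prints "f"
```

The printed value is what gets echoed into /sys/class/net/<interface>/queues/rx-0/rps_cpus, as in the enp4s0f0 example above.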
You can also play with checksum and segmentation offloads to shift work from the CPU to the NIC's processor. With ethtool -k the enabled capabilities are listed, and things like the following can be turned on:
ethtool -K ethX rx on (checksum verification at reception)
ethtool -K ethX tso on (TCP segmentation offload)
For these changes to persist, they are usually placed in network scripts (/etc/network/interfaces in Debian/Ubuntu, /etc/sysconfig/network-scripts/ifcfg-ethX in Red Hat) or in udev rules.
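As an illustration of the Debian-style route, ifupdown can reapply the ring and offload commands every time the interface comes up via post-up hooks (interface name and values here are examples, not recommendations):

```
# Fragment of /etc/network/interfaces (Debian/Ubuntu, ifupdown)
auto eth0
iface eth0 inet dhcp
    # Reapply NIC tuning on every ifup:
    post-up ethtool -G eth0 rx 4096 tx 4096
    post-up ethtool -K eth0 tso on
```

On systems managed by NetworkManager or systemd-networkd the equivalent is a dispatcher script or a .link file, respectively.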
Manually fine-tune IRQ affinity and live with irqbalance
In older kernels, or in cases where we want fine control, it is possible to manually assign which CPU handles a specific IRQ. All the information is in /proc/irq and /proc/interrupts. A typical flow would be:
- View interrupts: cat /proc/interrupts and locate the network interface line (e.g., IRQ 25 for enp0s8).
- Check the current affinity: cat /proc/irq/25/smp_affinity (hex mask: 01 = CPU0, 02 = CPU1, 04 = CPU2, etc.).
- Change it: echo 1 > /proc/irq/25/smp_affinity to move it to CPU0.
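To double-check which cores a given mask actually selects, the bits can be decoded with a few lines of shell (a standalone helper sketch; the mask below is an example):

```shell
#!/bin/sh
# Decode an smp_affinity-style hex mask into the CPU numbers it selects.
mask_hex=05              # example: bits 0 and 2 set -> CPUs 0 and 2
m=$((0x$mask_hex))
cpus=""
i=0
while [ "$m" -ne 0 ]; do
    # If the lowest bit is set, CPU $i is part of the mask.
    [ $((m & 1)) -eq 1 ] && cpus="$cpus $i"
    m=$((m >> 1))
    i=$((i + 1))
done
echo "mask 0x$mask_hex selects CPUs:$cpus"
```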
In this way you can, for example, relieve an overloaded CPU by moving an IRQ to a less busy core. If you have irqbalance installed and enabled, it is important to tell it not to touch that specific interrupt by adding an option like:
OPTIONS="--banirq=25"
in /etc/default/irqbalance, or by passing --banirq=25 when the daemon starts. That way the IRQ is left out of irqbalance's automatic distribution logic and your manual assignment is respected.
For changes like echo 1 > /proc/irq/25/smp_affinity to survive reboots, they are usually added to the classic /etc/rc.local (if enabled) or to a dedicated systemd unit.
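For the systemd route, a minimal oneshot unit can rewrite the affinity at each boot. A sketch (the unit name, IRQ number, and mask are illustrative):

```
# /etc/systemd/system/irq25-affinity.service -- illustrative oneshot unit
[Unit]
Description=Pin IRQ 25 to CPU0

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 1 > /proc/irq/25/smp_affinity'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

It is enabled with sudo systemctl daemon-reload followed by sudo systemctl enable --now irq25-affinity.service.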
Other related performance settings: swap, zram, temporary files, preload
All this work to improve network response and IRQ distribution usually goes hand in hand with other system tuning which, although not directly tied to irqbalance, completes the performance picture.
On the memory side, many administrators reduce aggressive swap usage by modifying /etc/sysctl.conf with parameters such as:
- vm.swappiness: how much the kernel prefers to use swap (0-100, default 60). Low values (1-10) prioritize RAM.
- vm.vfs_cache_pressure: pressure on the inode and dentry caches. Reducing it helps keep metadata in RAM.
- vm.dirty_writeback_centisecs and vm.dirty_expire_centisecs: frequency and expiry of writing dirty pages to disk.
- vm.dirty_ratio and vm.dirty_background_ratio: percentage of memory that can hold dirty data before writes are forced.
With appropriate values (for example vm.swappiness=1, vm.vfs_cache_pressure=50, etc.) the system will use more RAM before starting to page, which is desirable on servers with plenty of memory.
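Collected as a sysctl fragment, this might look as follows (the swappiness and cache-pressure values are the article's examples; the dirty-page ratios are illustrative additions, not universal recommendations):

```
# /etc/sysctl.d/85-memory.conf -- illustrative values
vm.swappiness = 1
vm.vfs_cache_pressure = 50
# Start background writeback earlier, force synchronous writes later:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
```

As with the network parameters, it is applied with sysctl -p (or sysctl --system for files under /etc/sysctl.d/).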
Another classic measure is to move temporary directories to RAM by mounting them as tmpfs in /etc/fstab:
tmpfs /tmp tmpfs noatime,nodiratime,nodev,nosuid,mode=1777,defaults 0 0
tmpfs /var/tmp tmpfs noatime,nodiratime,nodev,nosuid,mode=1777,defaults 0 0
With this, temporary accesses (compilations, application working files, etc.) run at RAM speed, and wear on SSDs is reduced.
It is also common in low-resource environments to enable zram swap: a compressed swap device in RAM. It can be installed from a repository (for example, cloning a zram-swap project with Git and running its install.sh) and creates a compressed block device where the system swaps before resorting, if available, to disk swap. You gain effective memory at the cost of slightly more CPU, a trade-off that pays off for many workloads.
Finally, daemons like preload analyze which applications run most often and preload their binaries and libraries into RAM, speeding up startup at the cost of memory consumption. This makes sense on desktops with ample RAM; on servers with limited resources it is usually omitted.
GTK, GRUB, Firefox and other “quality of life” tweaks
Beyond pure server performance, many tutorials include very long sections on optimizing the desktop experience: GTK menu response times, fonts, dark themes, sound, mouse behavior, etc. While these don't directly affect irqbalance or network traffic, they do contribute to a system that feels more responsive.
In GTK2, GTK3 and GTK4, dozens of parameters can be adjusted in files like ~/.gtkrc-2.0, ~/.config/gtk-3.0/settings.ini or ~/.config/gtk-4.0/settings.ini: animations, double-click timings, cursor size, title bar design, dark themes, tooltip behavior, font antialiasing, etc. These files are edited by hand (with nano) or by appending lines with echo >> from the terminal.
There are also usually “global” versions of these parameters in /etc/gtk-2.0/gtkrc, /etc/gtk-3.0/settings.ini and /etc/gtk-4.0/settings.ini, which affect all users; in those cases you need to be root (via su - or sudo).
The GRUB boot menu timeout is shortened by editing /etc/default/grub and modifying GRUB_TIMEOUT (for example, from 10 to 3 seconds), and kernel parameters such as noresume or acpi_irq_balance can be added to GRUB_CMDLINE_LINUX_DEFAULT. Afterwards, run update-grub (Debian/Ubuntu) or grub-mkconfig -o /boot/grub/grub.cfg (Arch and derivatives).
In the case of the browser, files such as user.js for Firefox allow applying batches of preferences aimed at speeding up browsing: number of connections, cache behavior, compression, WebSockets, etc. The usual procedure is to back up (or reset) the profile, open the profile directory from “Troubleshooting Information”, close Firefox, drop the user.js into the profile folder, and reopen the browser.
Although all of this runs parallel to irqbalance, it illustrates an important idea: Linux optimization is a holistic process. It isn't about a single daemon or a magic parameter, but about fine-tuning CPU, memory, disk, network, desktop, and applications until everything fits the actual use of the machine.
Looking at the whole package (IRQ allocation with irqbalance or by hand, NAPI and optimized queues, properly sized TCP buffers, modern qdiscs like fq_codel or fq, BBR congestion control, expanded ring buffers, RSS/RPS enabled, swap and zram under control, temporary files in RAM, and a finely tuned desktop), it becomes clear that irqbalance is just one piece of a rather large puzzle. Its role is vital on multi-core, high-traffic servers, irrelevant on many desktops, and counterproductive in extreme low-latency systems where manual affinities are preferred; understanding this context is key to deciding whether to let it work, limit it, or simply disable it.