Cm5 macb network hang
Silent network hang on Raspberry Pi CM5 / Pi 5 (macb / RP1 ethernet)
A technical writeup of a silent ethernet TX stall on the Raspberry Pi Compute Module 5, what it looks like, how to tell it apart from superficially similar bugs, and which mitigations did and did not work in our testing. Written for others hitting the same failure. Claims here are limited to what we directly observed or can cite; inference and open questions are marked as such.
Summary
On Raspberry Pi 5 / CM5 hardware (Cadence GEM MAC, macb driver, RP1
southbridge), the on-board gigabit ethernet can stop transmitting while the link
layer continues to report the link as up. The receive path keeps working, the
driver logs nothing, all ethtool error counters stay at zero, and the
kernel's own TX watchdog never fires. The host remains fully alive locally; it is
simply unreachable on the network until the NIC is reset (a link down/up usually
suffices; a reboot always does).
We observed this on three CM5 nodes running Ubuntu 26.04 with the
linux-raspi 7.0.0-1011 kernel. Based on a captured counter
time-series (below), we attribute it to the silent TX-ring stall described and
fixed upstream in the Raspberry Pi Foundation kernel
(raspberrypi/linux #7339 /
PR #7340). That fix is not yet in
Ubuntu's linux-raspi ([https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877 Launchpad #2133877],
still "Confirmed" as of 2026-06-29).
Environment where we observed it
| Item | Value |
|---|---|
| Board | Raspberry Pi Compute Module 5 (CM5), arm64 |
| OS / kernel | Ubuntu 26.04 LTS, linux-raspi 7.0.0-1011-raspi |
| NIC (from dmesg) | macb 1f00100000.ethernet eth0: Cadence GEM rev 0x00070109, PHY Broadcom BCM54213PE |
| Topology | On Pi 5 / CM5 the ethernet MAC is reached over PCIe via the RP1 southbridge (unlike Pi 4, where the MAC is on the SoC). The RP1↔SoC DMA path is relevant to the root cause. |
| Root filesystem | NVMe (not SD). Relevant only because it makes the driver swap below low-risk. |
We did not observe the failure on an x86-64 node with a different NIC in the
same cluster, consistent with this being specific to the macb/RP1 path
rather than anything in the surrounding software.
Symptom and signature
The defining property is that carrier never drops. A cable, switch, or PHY
fault produces a Link is Down event; this failure does not.
Observed, every time:
- ip link shows the interface LOWER_UP; the last carrier event in the log is the Link is Up - 1Gbps/Full from boot. No carrier loss is ever logged.
- All off-host traffic stops simultaneously: gateway, peers, NFS, DNS. ARP for the gateway goes INCOMPLETE.
- The macb driver emits nothing: no TX-timeout, no DMA/IRQ error. ethtool -S eth0 shows zero on every error and drop counter.
- The kernel netdev TX watchdog (dev_watchdog) does not fire, because trans_start keeps being updated. This is why the stall is "silent."
- The host is otherwise healthy: the kernel is responsive on the local console, journald keeps writing, no panic, no OOM, no thermal event.
- No self-recovery. The interface stays in this state indefinitely until reset.
If you see link-up + all-egress-dead + no carrier event + driver silent + zero error counters, you are very likely looking at this bug rather than a cable, PHY, congestion, or buffer-exhaustion problem.
Why it is easy to miss, and why we caught it quickly
This failure self-erases on reboot: the only evidence is in the previous boot's log, and that log shows nothing wrong with the link. On a desktop or a stateless worker it tends to read as a one-off "the machine dropped off the network," and a reboot makes it disappear. (Our first encounter, on an uninstrumented node, was exactly this — unreachable, no captured logs, reboot fixed it, cause unknown at the time.)
It became unmissable because the affected hosts were etcd voting members of a k3s control plane. When the NIC stalls on such a node, the local etcd is partitioned, the local API server can no longer serve quorum reads, the controller-manager loses its lease, and the k3s process exits and is restarted — a loud, logged, repeating failure rather than a silent drop. That is an artifact of our test setup, not of the bug, but it is a useful detail: if you run any quorum service (etcd, Consul, etc.) on CM5 hardware, this bug will surface as a control-plane meltdown, not as a quiet link blip. The underlying event is still just one stalled TX ring.
Diagnostic evidence we collected
Because none of the standard mechanisms detect this (carrier stays up, so
systemd-networkd sees nothing; dev_watchdog never fires; the
hardware watchdog keeps being petted because the host is alive), we ran a
reachability probe that, on N consecutive failures, captured a read-only
forensic snapshot before resetting the interface. The snapshots are what let us
classify the failure rather than guess at it.
What the snapshots showed, consistently:
- Link up, 1000/Full, Link detected: yes at the moment of the hang.
- Every error and drop counter at 0 (rx_resource_errors, rx_overruns, FCS errors, tx_carrier_sense_errors, q0_{tx,rx}_dropped). This rules out cable, congestion, buffer exhaustion, and PHY faults.
- All four EEE/LPI counters ({rx,tx}_lpi_{transitions,time}) at 0: the PHY had not entered Low Power Idle.
- No RCU stalls in the kernel log preceding any hang.
- A single TX queue (only q0_* counters exist) with the eth0 IRQ servicing entirely on CPU0.
The decisive capture was a 3-sample, ~1-second-apart time-series of the tx/rx frame counters taken at the moment of one hang:
| Sample | tx_frames | rx_frames |
|---|---|---|
| t0 | 23553102 | 26633386 |
| t1 | 23553102 | 26633400 |
| t2 | 23553102 | 26633423 |
tx_frames is frozen across all three samples while rx_frames
continues to climb, and the eth0 IRQ count keeps advancing. This is a direct
observation of the TX path being stalled while RX and interrupts are still live — not
an inference from symptoms. It is this datapoint, more than the symptom description,
that drives the root-cause attribution below.
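The rule applied to that capture can be written down as a small filter. This is a minimal sketch, not the probe we ran: the function names counter_value and classify_samples are made up for illustration, and it assumes the counters are named tx_frames and rx_frames as in the ethtool -S output quoted above.

```bash
#!/bin/sh
# Sketch: decide whether a series of (tx_frames, rx_frames) samples shows
# the silent-stall signature: TX frozen while RX keeps advancing.
# Function names are illustrative, not part of any existing tool.

# Pull one counter out of `ethtool -S`-style text on stdin.
# Matches lines of the form "     tx_frames: 23553102".
counter_value() {
    awk -v name="$1" '$1 == (name ":") { print $2; exit }'
}

# Read "tx rx" pairs from stdin, one sample per line, oldest first.
# Prints tx-stalled when TX never changed and RX grew, tx-moving when
# TX advanced at least once, idle when neither counter moved.
classify_samples() {
    awk '
        NR == 1 { tx0 = $1 + 0; rx0 = $2 + 0 }
        { rx = $2 + 0; if ($1 + 0 != tx0) tx_moved = 1 }
        END {
            if (NR < 2)        print "insufficient-data"
            else if (tx_moved) print "tx-moving"
            else if (rx > rx0) print "tx-stalled"
            else               print "idle"
        }'
}

# Possible live use (needs ethtool and a real interface):
#   for i in 1 2 3; do
#       s=$(ethtool -S eth0)
#       echo "$(echo "$s" | counter_value tx_frames) $(echo "$s" | counter_value rx_frames)"
#       sleep 1
#   done | classify_samples
```

Fed the three samples from the table, classify_samples prints tx-stalled. A frozen TX counter on a host that is simply sending nothing looks the same to this filter, so sample while something is trying to transmit (for example a ping to the gateway).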
Root-cause analysis: which bug is this?
There are several public reports that look alike at the symptom level. They are not all the same bug, and getting the mechanism right determines which mitigation is worth applying. This section lays out the candidates and the evidence for our attribution.
The candidate reports
| Report | Proposed mechanism | Same silicon? |
|---|---|---|
| Ubuntu Launchpad #2133877: "Complete network hang on Raspberry Pi 5 … possibly related to CPU frequency scaling" | CPU frequency-scaling transitions corrupting the RP1/macb DMA path; the report notes RCU stalls as a precursor and that the performance governor stopped those RCU stalls. | Yes (Pi 5, linux-raspi) |
| raspberrypi/linux #7339 / PR #7340: "candidate fixes for silent TX stall on BCM2712/RP1" | A silent TX stall in the macb driver: tx stops advancing, RX keeps working, link stays up. Three driver patches. | Yes (BCM2712/RP1, macb) |
| siderolabs/sbc-raspberrypi #91: "silent network death on Talos" | Decomposes the failure into an EEE LPI-wake race, a macb TSO/GSO TX-ring hang (single TX queue, softirqs on CPU0, small rings), and a Talos-specific socket issue. Prevention bundle: EEE off + TSO/GSO off + larger rings. | Yes (Pi 5, BCM2712 + Cadence GEM + BCM54213PE) |
| Various "ethtool -K eth0 tso off gso off fixes it" posts | Trace originally to Intel e1000e "Detected Hardware Unit Hang", a different NIC and driver. | No |
What our evidence supports, and what it argues against
The symptom matches #2133877 essentially line-for-line — same NIC, complete
network hang, host alive locally, clean ethtool stats, no macb
log lines, recovers only on reset. So at the symptom level we are clearly in the
#2133877 family.
But #2133877's proposed mechanism (frequency scaling) does not fit our observations, on three independent points:
- No RCU stalls. #2133877's headline precursor is RCU stalls (the reporter saw them at a steady rate, and pinning the governor stopped them). We logged zero RCU stalls before any hang, across every event. The precursor that report hangs its causal story on is absent on our hardware.
- The hang occurred at maximum, pinned frequency. We trialed the performance governor (which holds the cores at a fixed maximum frequency and therefore eliminates frequency transitions) on the worst-affected node. It hung again ~30 minutes later, and our counter capture from a separate hang shows the cores already at the maximum 2400 MHz at the time of the stall. If frequency transitions were necessary to trigger the hang, pinning the frequency should have prevented it. It did not. (This is a single negative trial, n=1, but it is a direct one.)
- The captured time-series matches #7340's description exactly. #7340 describes the failure as "tx_packets stops incrementing, RX still works, link stays up." Our capture shows precisely that: frozen tx_frames, climbing rx_frames, live IRQ. #2133877 does not characterize the failure at the ring-counter level; #7340 does, and our data fits it.
Our interpretation: #2133877 and #7339 are most likely the same underlying defect, and the "frequency scaling" wording in #2133877's title is a hypothesis from before the TX-ring stall was isolated, not an established mechanism. The root-cause work that produced an actual fix is #7340, whose described signature is the one we captured. We treat the frequency-scaling angle as not supported on our hardware: we cannot rule out a frequency transition acting as one possible trigger, but our evidence shows transitions are not necessary (the hang happens at pinned max frequency) and that the RCU-stall precursor central to that report is absent for us.
On the EEE and TSO/GSO theories (siderolabs #91)
siderolabs #91 is the closest same-silicon analysis and is the source of the popular prevention bundle (EEE off, TSO/GSO off, larger rings). Two of its three triggers are worth separating against our data:
- EEE / LPI-wake race: not our mechanism. All four LPI counters read zero at every hang; the PHY never entered Low Power Idle. Disabling EEE addresses a state our hardware was never in. (On our setup EEE was advertised but inactive because the switch did not negotiate it, so disabling it also had no measurable power cost, but that is incidental.)
- macb TSO/GSO TX-ring hang: this is the trigger that does line up structurally with what we see (a single TX queue, all NET softirqs on CPU0, and the silent-because-trans_start-still-ticks behavior). This is the same family as #7340. However (see the mitigations section), disabling TSO/GSO and enlarging the rings did not prevent the hang on our kernel, so while the structural description fits, the offload-disable remedy did not hold for us.
The e1000e-derived "just turn off TSO" advice does not transfer: it originates from a different NIC and driver, and a lone TSO-off was tried and walked back in the
#7340 discussion. We mention it only because it is widely repeated.
What we are NOT claiming
- We have not bisected the kernel or independently proven the PCIe-posted-write mechanism that PR #7340's patches address; we are relying on the upstream commits for the mechanism and on our counter capture for the match.
- We have not (yet) demonstrated that the backported fix eliminates the hang — that trial is in progress and its result is pending (see Status).
- We cannot speak to whether other Pi 5 / CM5 configurations (different PHY link speeds, RPiOS vs Ubuntu, SD vs NVMe) behave identically.
The upstream fix
raspberrypi/linux PR #7340 was merged
into the Raspberry Pi Foundation kernel branch rpi-6.18.y on 2026-05-08,
and the series subsequently went to mainline netdev (v2, 2026-05-14). It is three
patches to the Cadence/macb driver:
| # | Upstream commit | Change (per the commit) | Role |
|---|---|---|---|
| 1 | 63d230184da3 | Flush the PCIe posted write after the TSTART doorbell, so the doorbell that tells the MAC to begin transmission is not left sitting in the PCIe fabric. | Identified as the core fix |
| 2 | 3ccf780ce058 | Re-check the interrupt status register after re-enabling interrupts in macb_tx_poll. | Closes a lost-interrupt race |
| 3 | 8ea87c96f1a6 | Add a TX-stall watchdog inside the driver. | Defence-in-depth |
The mechanism implied by patch 1 — a posted MMIO write that can be reordered/delayed in the PCIe fabric so the MAC never sees the "start TX" doorbell — is consistent with everything we observed: TX stops, RX and interrupts continue, no error is raised, and the link is unaffected. We present this as the upstream-identified mechanism, not as something we independently verified at the silicon level.
Caveats from the upstream discussion (worth knowing if you deploy the fix):
- Upstream labels them "candidate fixes." Contributors reported that a software watchdog could still occasionally trigger even with the patches applied, and that "both patches + watchdog are necessary to keep things stable."
- Recovery experience varied: an ip link down/up recovered the interface for some reporters but not all; at least one needed a driver bind/unbind. This matches our finding that a link bounce usually but not always clears it.
State of the fix in Ubuntu
As of 2026-06-29, the fix is not in Ubuntu's linux-raspi. Our
nodes are on the newest available build (7.0.0-1011) and its changelog
contains no trace of the patches. Launchpad bug
#2133877 remains
in state "Confirmed" (it has not advanced to Fix Committed / Fix Released). A
Canonical kernel engineer on that bug indicated a 7.0-based linux-raspi
would "take some time" and gave no committed date. So the realistic horizon for an
Ubuntu kernel that fixes this at the source is indeterminate (weeks to months).
A way to check for the fix landing without external web access (the package archive is reachable even when Launchpad may not be):
<syntaxhighlight lang="bash"> apt update && apt-get changelog linux-image-raspi | grep -iE 'TX stall|2133877|TSTART' </syntaxhighlight>
An empty result means it has not shipped; a match means the SRU is in and you can fix this by simply updating the kernel.
Mitigations and their measured results
We tried four things. Only two are worth the effort, and the only confirmed recovery is the watchdog; the only plausible cure is the backported driver (result pending). We report what each one actually did, including the ones that failed.
1. Reachability watchdog (recovery) — works; this is the load-bearing mitigation
A small periodic probe that pings two independent off-host targets (we use the gateway and a NAS). Requiring both to fail in a cycle, and requiring N consecutive failed cycles, avoids false positives from ordinary blips. On trip it:
- Captures a read-only forensic snapshot (the data that made this writeup possible).
- Resets the interface: ip link set eth0 down && ip link set eth0 up. This re-initialises the MAC and, in our experience, usually clears the stall without a reboot.
- If still dead after the bounce, reboots as a last resort.
Notes that matter:
- It must test reachability, not link/carrier state — carrier stays up, so a link-state monitor will never fire.
- The recovery script and its logs must live on local disk, not on any network filesystem — the network is exactly what is gone when it runs.
- Timing has to beat your stack's failure threshold. In our case a slow bounce let the node stay partitioned long enough for the k3s control plane to give up on etcd and restart. We tightened detection (5 s probe cadence, bounce after 2 failed cycles, roughly 10–15 s to first reset) while keeping the reboot a patient last resort (~75 s). If you run quorum services, tune the bounce to land inside their lease/election timeout.
- A reboot is an acceptable recovery for a node that is one member of a fault-tolerant quorum; it is more disruptive for a singleton. The bounce-first ordering exists to avoid rebooting when a link reset would do.
The kernel's dev_watchdog and the on-board hardware watchdog cannot
substitute for this — the former never fires (the stall is silent by definition) and
the latter is happy because the host itself is not hung.
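The probe-and-decide logic described above can be sketched as follows. This is an illustration under stated assumptions, not the script we run: the target addresses are documentation placeholders, the function names are hypothetical, and REBOOT_AFTER=10 is simply a value that lands in the neighbourhood of the ~75 s reboot horizon once ping timeouts are added to the 5 s cadence.

```bash
#!/bin/sh
# Sketch of the probe/decide logic only. Targets, thresholds and function
# names are assumptions for illustration.

TARGET_A="${TARGET_A:-192.0.2.1}"      # placeholder: gateway
TARGET_B="${TARGET_B:-192.0.2.10}"     # placeholder: second off-host target
IFACE="${IFACE:-eth0}"
PROBE_INTERVAL="${PROBE_INTERVAL:-5}"  # seconds between cycles
BOUNCE_AFTER="${BOUNCE_AFTER:-2}"      # consecutive failed cycles before the bounce
REBOOT_AFTER="${REBOOT_AFTER:-10}"     # consecutive failed cycles before a reboot

# One ICMP probe with a 1-second timeout; succeeds if the target answered.
probe_ok() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# A cycle counts as failed only when BOTH targets are unreachable.
cycle_failed() {
    ! probe_ok "$TARGET_A" && ! probe_ok "$TARGET_B"
}

# Map the consecutive-failure count to an action name.
decide_action() {
    if [ "$1" -ge "$REBOOT_AFTER" ]; then
        echo reboot
    elif [ "$1" -eq "$BOUNCE_AFTER" ]; then
        echo bounce
    else
        echo wait
    fi
}

watchdog_loop() {
    failures=0
    while :; do
        if cycle_failed; then
            failures=$((failures + 1))
        else
            failures=0
        fi
        case "$(decide_action "$failures")" in
            bounce)
                # A read-only snapshot to LOCAL disk would be taken here,
                # before the reset destroys the evidence.
                ip link set "$IFACE" down && ip link set "$IFACE" up
                ;;
            reboot)
                systemctl reboot
                ;;
        esac
        sleep "$PROBE_INTERVAL"
    done
}
```

Nothing runs until watchdog_loop is called, for example from a systemd service. Keeping decide_action separate from the loop makes the thresholds easy to check without touching the network, and the bounce fires exactly once per outage so a stall that survives it falls through to the reboot.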
2. performance CPU governor — did NOT prevent it
Tested specifically to evaluate the frequency-scaling hypothesis. Pinning the cores to maximum frequency eliminates frequency transitions. The node hung again ~30 minutes after pinning. We reverted it. Besides being ineffective here, it carries a small but continuous idle power cost (we measured roughly +0.5–0.75 W on a fanless ~3 W-class node). Recommendation: do not rely on this, and treat it as evidence against the frequency-transition mechanism rather than a mitigation.
3. EEE / TSO / GSO off + larger rings — did NOT prevent it on our kernel
This is the siderolabs #91 prevention bundle:
<syntaxhighlight lang="bash">
ethtool --set-eee eth0 eee off ; ethtool --set-eee eth0 advertise 0x0
ethtool -K eth0 tso off gso off
ethtool -G eth0 rx 4096 tx 2048
</syntaxhighlight>
We applied and verified all of it (confirmed active at the time of a subsequent hang:
EEE disabled, both offloads off, rings 4096/2048). The node hung anyway. On our
kernel this bundle did not prevent the stall. We left it applied (it is close to free
on our hardware — see the EEE note above) but we cannot report it as effective
prevention. Reporters on other kernels (notably pre-#7340 builds) found it stable on
a fleet, so results appear to be kernel-dependent; do not assume it will hold on
linux-raspi 7.0.
If you do apply it, note that a link down/up resets these settings, so any watchdog that bounces the interface must re-assert them afterward.
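One way to keep the bounce and the re-assertion together is to wrap them in a single function. A minimal sketch, assuming the same four commands as the bundle above; the function name bounce_and_reassert and the IFACE variable are illustrative, and the return code is informational only.

```bash
#!/bin/sh
# Sketch: bounce the interface, then re-apply the settings from the bundle.
# The function name and IFACE variable are illustrative.

IFACE="${IFACE:-eth0}"

bounce_and_reassert() {
    ip link set "$IFACE" down || return 1
    ip link set "$IFACE" up   || return 1

    # Re-assert after the bounce. Keep going if one command is rejected
    # so a single failure does not block the remaining settings.
    rc=0
    ethtool --set-eee "$IFACE" eee off       || rc=1
    ethtool --set-eee "$IFACE" advertise 0x0 || rc=1
    ethtool -K "$IFACE" tso off gso off      || rc=1
    ethtool -G "$IFACE" rx 4096 tx 2048      || rc=1
    return "$rc"
}
```

A watchdog would call bounce_and_reassert in place of the bare down/up pair, so the settings cannot be forgotten on the recovery path.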
4. Backported macb.ko (the actual fix) — deployed, result pending
Because the upstream fix is a driver change and macb is a loadable module
(CONFIG_MACB=m) on the Ubuntu kernel, the three #7340 patches can be
backported and the driver rebuilt without replacing the kernel. The patches
applied cleanly to the linux-raspi 7.0 source. This is the only mitigation
that targets the root cause rather than recovering from or trying to avoid it.
We load it live and non-persistently: the rebuilt module lives outside
/lib/modules and the initramfs and is inserted at runtime, so every
reboot returns to the stock driver automatically. With root on NVMe (so the NIC
driver is not boot-critical) this is low-risk — a bad module cannot wedge boot, because
boot never loads it. A loader script does the swap (verify the module's vermagic
matches the running kernel, rmmod stock → insmod patched →
bring the link up → wait for a real off-host ping → roll back to stock on any failure),
with the watchdog as the final backstop.
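The swap sequence described above can be sketched roughly as follows. This is not our loader script: the module path, the probe target and every function name are hypothetical, and it assumes the existing network manager reconfigures addresses once the recreated interface comes up.

```bash
#!/bin/sh
# Sketch of the swap-with-rollback sequence. Paths, the probe target and
# all function names are hypothetical.

PATCHED_KO="${PATCHED_KO:-/opt/macb-patched/macb.ko}"  # hypothetical path outside /lib/modules
IFACE="${IFACE:-eth0}"
PROBE_TARGET="${PROBE_TARGET:-192.0.2.1}"              # placeholder off-host address

# The first word of a module's vermagic string is the kernel release it
# was built for; it must equal the running kernel's release.
vermagic_matches() {
    built_for=$(modinfo -F vermagic "$1" | awk '{ print $1 }')
    [ -n "$built_for" ] && [ "$built_for" = "$(uname -r)" ]
}

# Try up to 30 times to get a real off-host reply.
wait_for_network() {
    tries=0
    while [ "$tries" -lt 30 ]; do
        ping -c 1 -W 1 "$PROBE_TARGET" >/dev/null 2>&1 && return 0
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}

# Return to the stock driver shipped with the running kernel.
rollback_to_stock() {
    rmmod macb 2>/dev/null
    modprobe macb && ip link set "$IFACE" up
}

swap_to_patched() {
    vermagic_matches "$PATCHED_KO" || {
        echo "vermagic mismatch, not swapping" >&2
        return 1
    }
    rmmod macb || return 1
    if insmod "$PATCHED_KO" && ip link set "$IFACE" up && wait_for_network; then
        return 0
    fi
    echo "patched driver failed, rolling back" >&2
    rollback_to_stock
    return 1
}
```

The vermagic check comes first so that a module built for another kernel is rejected before the stock driver is unloaded. Run it from a local console or a detached session, since the SSH connection drops while the driver is out.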
Two operational warnings if you try this:
- A live macb reload destroys and recreates eth0. If you run an overlay network (we run flannel/VXLAN), the overlay device is bound to the old interface and may not rebuild itself, leaving the host with full LAN connectivity but no overlay (cross-node traffic silently fails). The fix is to restart the networking layer that owns the overlay after the swap. This bites both the swap and a manual rmmod; modprobe revert.
- The module is bound to one exact kernel version (vermagic). Any kernel upgrade obsoletes it and it must be rebuilt and re-swapped. A reboot after a kernel upgrade but before a re-swap simply runs the stock driver: safe, just unprotected. If you make this permanent, DKMS is the right vehicle because it auto-rebuilds the module on each kernel bump. We have not done this yet; it is gated on the soak result below.
The module loads tainted (out-of-tree + unsigned, taint mask 12288), as
expected.
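The mask decomposes as 12288 = 4096 + 8192, that is bit 12 (out-of-tree module loaded) plus bit 13 (unsigned module loaded) in the kernel's taint flags. A small helper, with a name invented for this example, lists the set bits of any mask:

```bash
#!/bin/sh
# Sketch: list which bits are set in a kernel taint mask.
# 12288 = 4096 + 8192 = bit 12 (out-of-tree module) + bit 13 (unsigned module).

taint_bits() {
    mask=$1
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    echo "$out"
}

# On a live system:
#   taint_bits "$(cat /proc/sys/kernel/tainted)"
```

taint_bits 12288 prints "12 13". Any additional bit after the swap would point at something other than the rebuilt module.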
Early result (preliminary, not a conclusion)
We deployed the patched driver to two nodes first and left a third on the stock driver as a control. In the following window the stock node hung twice overnight while the two patched nodes accumulated roughly 25 node-hours with no hang. That is a real differential and is encouraging, but it is a small sample and the control node was subsequently patched as well (it had started hanging, removing its value as a "never-hung" control). We are not yet claiming the patch works. The standing test is a multi-week soak across all nodes against the pre-patch hang rate.
How to confirm you have this specific bug
After a recovery, examine the boot that failed (e.g. journalctl -b -1):
<syntaxhighlight lang="bash">
ethtool -i eth0                                                       # driver should be 'macb'
journalctl -b -1 | grep -iE 'Link is Down|carrier'                    # expect NOTHING for eth0 (link never dropped)
journalctl -b -1 | grep -iE 'i/o timeout|not responding, timed out'   # all egress dead at once
journalctl -b -1 | grep -icE 'oom-kill|killed process'                # expect 0 (rules out memory)
journalctl -b -1 | grep -icE 'rcu.*stall'                             # we saw 0 (the #2133877 precursor)
ethtool -S eth0 | grep -vE ': 0$'                                     # at hang time, error/drop counters are 0
</syntaxhighlight>
If you can catch it live (before resetting the interface), the conclusive check is to sample the frame counters a second apart and confirm TX is frozen while RX moves:
<syntaxhighlight lang="bash"> for i in 1 2 3; do
ethtool -S eth0 | grep -E 'tx_frames|rx_frames'; echo ---; sleep 1
done
# tx_frames identical across samples + rx_frames increasing = the silent TX stall
</syntaxhighlight>
Status and open questions (2026-06-29)
- Confirmed: the failure is a silent TX stall (captured: TX frozen, RX live), it is not memory/thermal/PHY/cable, and it recovers on interface reset.
- Confirmed not the cause for us: EEE/LPI (counters zero); frequency transitions as a necessary trigger (hang occurs at pinned max frequency); the RCU-stall precursor from #2133877 (never observed).
- Did not prevent it (our kernel): performance governor; EEE/TSO/GSO-off + larger rings.
- Works: the reachability watchdog (recovery, not prevention).
- Pending: whether the backported #7340 driver eliminates the hang. Multi-week soak in progress; persistence via DKMS deferred until that passes.
- The real fix remains an Ubuntu SRU of #7340 into linux-raspi; track Launchpad #2133877 for Fix Released.
References
- raspberrypi/linux #7339, silent TX stall on Pi 5 / CM5 (root-cause issue): github.com/raspberrypi/linux/issues/7339
- raspberrypi/linux PR #7340, "net: macb: candidate fixes for silent TX stall on BCM2712/RP1" (merged to rpi-6.18.y 2026-05-08): github.com/raspberrypi/linux/pull/7340 · netdev series: lore.kernel.org
- Ubuntu linux-raspi bug #2133877, Complete network hang on Pi 5 (watch for Fix Released): bugs.launchpad.net/…/2133877
- siderolabs/sbc-raspberrypi #91, silent network death on Pi 5 (Talos), 3-trigger decomposition + the EEE/TSO/GSO/rings bundle: github.com/siderolabs/sbc-raspberrypi/issues/91
- raspberrypi/linux #6420, Pi 5 sometimes has no LAN after boot (a different, boot-time variant): github.com/raspberrypi/linux/issues/6420
- Intel e1000e "Detected Hardware Unit Hang", origin of the unrelated "disable TSO" advice (different NIC/driver, does not transfer): forum.proxmox.com/…/e1000-driver-hang