Network Stability Issues with tg3 Driver - Hardware Replacement or Further Testing?

robalees@lemmy.world · 10 months ago


Sorcaeden@lemmy.world · 10 months ago

I seem to recall a VMware complaint similar to this too, and there was a ring buffer tuning to do to fix it… But that error message doesn’t seem quite right to match it.

TX queue timeouts can be caused by several things, but I wonder if you’re seeing the end result of a spammy Ethernet flow control implementation, where the device can’t transmit because the link is continuously paused.

If so, there may be rx_xoff counters viewable from within Proxmox, or “ethtool -S enp1s0f0” (capital S for statistics; lowercase -s changes settings instead) would tell you whether the device is seeing pause frames from the switch on a regular Linux host.
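
If you want to check whether those pause counters are actually climbing rather than eyeballing one dump, here is a rough Python sketch that diffs two “ethtool -S” snapshots. The helper names (parse_ethtool_stats, pause_counter_deltas, read_stats) are made up for illustration, and the keyword list is an assumption: counter names vary by driver, so adjust it to whatever your NIC actually reports. The sample values are invented, not real output from this machine.

```python
import re
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_xoff_pause_rcvd: 10".
        # The "NIC statistics:" header has no number, so it is skipped.
        m = re.match(r"^\s*([^:]+):\s*(-?\d+)\s*$", line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats


def pause_counter_deltas(before, after,
                         keywords=("pause", "xoff", "xon", "flow_control")):
    """Return how much each flow-control-looking counter grew between snapshots.

    The keyword list is an assumption; counter names differ between drivers.
    """
    deltas = {}
    for name, value in after.items():
        if any(k in name.lower() for k in keywords):
            deltas[name] = value - before.get(name, 0)
    return deltas


def read_stats(iface):
    """Run `ethtool -S` on a live interface (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-S", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_stats(out)


# Invented sample snapshots, taken some seconds apart in this scenario.
SAMPLE_T0 = """NIC statistics:
     rx_octets: 1000
     rx_xoff_pause_rcvd: 10
     tx_xoff_sent: 0
"""
SAMPLE_T1 = """NIC statistics:
     rx_octets: 2000
     rx_xoff_pause_rcvd: 510
     tx_xoff_sent: 0
"""

deltas = pause_counter_deltas(parse_ethtool_stats(SAMPLE_T0),
                              parse_ethtool_stats(SAMPLE_T1))
print(deltas)
```

On a live box you would call read_stats("enp1s0f0") twice with a sleep in between instead of using the sample strings. A received-pause counter that keeps growing quickly while the TX timeouts happen would point toward the flow control theory; flat counters would point away from it.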

The link down tends to be a reaction by the driver to recover from a hung queue, so if it’s not flow control, there could be a driver/firmware upgrade possible, or a series of tunables if there’s a bug somewhere in packet handling land resulting in the NIC itself hanging.