Bug #210

setting ring buffer size below 64 via ethtool on the ar71xx hangs the driver at gigE speeds

Added by Dave Täht on Jul 16, 2011. Updated on Apr 21, 2012.
Closed High Jim Gettys

Description

If the wan is on gigE and IF you are getting serious network traffic before you run ethtool, the ring buffer doesn’t get updated correctly, and you stop getting traffic at 7 packets. Worse, ethtool runs AFTER dhcp, so you do get a valid IP address, then ethtool -G tx 4 runs and locks up the card.

This showed up in our testing in the last couple days. You unplug and replug, it sets up the driver right that time, and life is good, and we achieve 100+Mbit/sec speeds with VERY low latency.

I can duplicate it every time now, but I was faarrrr too tired after debugging the problem for 2 days straight to figure out where it was. I think it should be easy to find.

Basically, the debloat package in cerowrt would trigger it - but only under those circumstances. Saw it, maybe, rarely, on 100Mbit on the wan port. It explains a lot - reports of dhcp problems, etc, etc.

History

Updated by Dave Täht on Jul 16, 2011.
Trying just reducing txqueuelen for now, will fix ethtool another day.
Updated by Dave Täht on Jul 17, 2011.
After removing the debloating script entirely, which calls ethtool and changes txqueuelen, we were able to get from a reliably reproducible test case (driver died after 7 packets) to where the wan port seemed to actually work, for one router, in limited testing.

In doing far more extensive testing, I was able to not only crash the wan port in a little over 500 packets, but actually get a kernel oops, even without fiddling with these parameters. I will upload the oops later (it was not very revealing regardless).

These routers do have a patch to their mac address creation routine, but I don’t think that is the problem. What I am thinking is that there is a real & SUBTLE problem in the reset routines (and/or the ring buffer) that happens when there is lots of other traffic on the wan.

I will look at this MUCH harder after I get caught up on sleep and back to my own lab.

Updated by Dave Täht on Jul 22, 2011.
As pointed out by jow (thx!), nbd (double thx!) made a string of commits on Wednesday that look very promising for fixing this bug. I will attempt a new build tomorrow morning.
Updated by Dave Täht on Jul 26, 2011.
Nope.

Although you can ifconfig down and up and get back in business, this is the oops I get

ADDRCONF (NETDEV_UP): ge00: link is not ready
ar71xx: pll_reg 0xb8050014: 0x11110000
ge00: link up (1000Mbps/Full duplex)
ADDRCONF (NETDEV_CHANGE): ge00: link becomes ready
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x16c/0x274()
NETDEV WATCHDOG: ge00 (ag71xx): transmit queue 0 timed out
Modules linked in: gpio_buttons xt_hashlimit ip6t_REJECT ip6t_LOG ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6table_raw ip6_queue ip6table_mangle ip6table_filter ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 ipt_SET ipt_set ip_set_setlist ip_set_portmap ip_set_nethash ip_set_macipmap ip_set_iptreemap ip_set_iptree ip_set_ipportnethash ip_set_ipportiphash ip_set_ipporthash ip_set_ipmap ip_set_iphash ip_set nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_HL xt_hl ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark xt_length ipt_ecn xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype xt_physdev xt_owner ipt_MASQUERADE iptable_nat nf_nat xt_recent xt_helper xt_connmark xt_connbytes xt_conntrack xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ifb sit tunnel4 tun ppp_async ppp_generic slhc vfat fat autofs4 ath9k ath9k_common ath9k_hw ath nls_utf8 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211 ts_fsm ts_bm ts_kmp crc_ccitt cfg80211 compat arc4 aes_generic crypto_algapi ipv6 usb_storage ohci_hcd ehci_hcd sd_mod ext4 jbd2 usbcore scsi_mod nls_base mbcache crc16 leds_gpio button_hotplug gpio_keys_polled input_polldev input_core
Call Trace:
[<8026ce24>] dump_stack+0x8/0x34
[<80075238>] warn_slowpath_common+0x78/0xa4
[<800752ec>] warn_slowpath_fmt+0x2c/0x38
[<801ef7d0>] dev_watchdog+0x16c/0x274
[<8007f468>] run_timer_softirq+0x14c/0x1ec
[<8007a9f4>] __do_softirq+0xac/0x15c
[<8007abfc>] do_softirq+0x48/0x68
[<800610e0>] plat_irq_dispatch+0x4c/0x17c
[<8006258c>] ret_from_irq+0x0/0x4
[<80062780>] r4k_wait+0x20/0x40
[<800640fc>] cpu_idle+0x24/0x44
[<802fc8d8>] start_kernel+0x36c/0x38c

---[ end trace d0fa80935a954c41 ]---
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: tx timeout
ge00: link down
ADDRCONF (NETDEV_UP): ge00: link is not ready
ar71xx: pll_reg 0xb8050014: 0x11110000
ge00: link up (1000Mbps/Full duplex)
ADDRCONF (NETDEV_CHANGE): ge00: link becomes ready
ge00: no IPv6 routers present
ge00: link down
ar71xx: pll_reg 0xb8050014: 0x11110000
ge00: link up (1000Mbps/Full duplex)

Updated by Dave Täht on Jul 26, 2011.
The build for this is currently living at:

http://huchra.bufferbloat.net/~cero1/cerowrt-wndr3700-1.0rc2/

Although it fixes nearly all the other outstanding priority 1 bugs, this one is a showstopper, so I’m cancelling the rc2 release and going on to rc3 this week.

Updated by Dave Täht on Jul 26, 2011.
Data point: On a freshly flashed router with this build, connected via its switched port to the wan port of the previously flashed router (same build), the new router comes up ‘green’ for its connection light, whilst the old router is orange.

Either this means that GigE/100Mbit is not correctly being detected, or that the switch configuration for the blinkenlights is wrong, or….

Updated by Dave Täht on Jul 26, 2011.
Updated by Dave Täht on Jul 26, 2011.
Interestingly, I changed the debloat script to run ethtool before changing txqueuelen, rather than the opposite order. No oops, but the driver is just as hung. I also noticed that I can actually run 02-debloat in advance of bringing the device up (00-netstate)… which I hope will fix it.

case $devtype in
0) ethtool -G $DEV tx 4 ;
ip link set $DEV txqueuelen 8;;

ge00 Link encap:Ethernet HWaddr C4:3D:C7:98:69:15
inet addr:172.30.42.45 Bcast:172.30.42.63 Mask:255.255.255.224
inet6 addr: fe80::c63d:c7ff:fe98:6915/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:8
RX bytes:872 (872.0 B) TX bytes:1146 (1.1 KiB)
Interrupt:5

Updated by Dave Täht on Jul 26, 2011.
So with the debloating script running before netstate, AND a sleep 1 between the events, I get an oops again…

WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x16c/0x274()
NETDEV WATCHDOG: ge00 (ag71xx): transmit queue 0 timed out
Modules linked in: gpio_buttons xt_hashlimit ip6t_REJECT ip6t_LOG ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6table_raw ip6_queue ip6table_mangle ip6table_filter ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 ipt_SET ipt_set ip_set_setlist ip_set_portmap ip_set_nethash ip_set_macipmap ip_set_iptreemap ip_set_iptree ip_set_ipportnethash ip_set_ipportiphash ip_set_ipporthash ip_set_ipmap ip_set_iphash ip_set nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_HL xt_hl ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark xt_length ipt_ecn xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype xt_physdev xt_owner ipt_MASQUERADE iptable_nat nf_nat xt_recent xt_helper xt_connmark xt_connbytes xt_conntrack xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ifb sit tunnel4 tun ppp_async ppp_generic slhc vfat fat autofs4 ath9k ath9k_common ath9k_hw ath nls_utf8 nls_iso8859_2 nls_iso8859_15 nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211 ts_fsm ts_bm ts_kmp crc_ccitt cfg80211 compat arc4 aes_generic crypto_algapi ipv6 usb_storage ohci_hcd ehci_hcd sd_mod ext4 jbd2 usbcore scsi_mod nls_base mbcache crc16 leds_gpio button_hotplug gpio_keys_polled input_polldev input_core
Call Trace:
[<8026ce24>] dump_stack+0x8/0x34
[<80075238>] warn_slowpath_common+0x78/0xa4
[<800752ec>] warn_slowpath_fmt+0x2c/0x38
[<801ef7d0>] dev_watchdog+0x16c/0x274
[<8007f468>] run_timer_softirq+0x14c/0x1ec
[<8007a9f4>] __do_softirq+0xac/0x15c
[<8007abfc>] do_softirq+0x48/0x68
[<800610e0>] plat_irq_dispatch+0x4c/0x17c
[<8006258c>] ret_from_irq+0x0/0x4
[<80062780>] r4k_wait+0x20/0x40
[<800640fc>] cpu_idle+0x24/0x44
[<802fc8d8>] start_kernel+0x36c/0x38c

---[ end trace 99006a3a445e09e2 ]---
ge00: tx timeout

Updated by Dave Täht on Jul 26, 2011.
So I can consistently lock it up by running ethtool -G ge00 tx 4 while there is traffic…

but changing txqueuelen seems to work.

I note that I rename devices in part because I can never remember which is the lan/wan ports and in part to make firewall rules easier. ge00 is the wan port. Which also has the patch to give it a unique mac…

Moving on to attempting this change much earlier in the boot.

Updated by Dave Täht on Jul 26, 2011.
I feel compelled to point out that driver buffering should probably be sized automatically according to whether the detected link speed is gigE, 100 or 10Mbit, while still noticing when it has been changed via ethtool…

All Linux network drivers should do that, actually. The problems we are having with the ar71xx are just the first step on a long road towards getting there.

I ran into similar problems attempting to debloat the common laptop ‘e1000’ driver, where it worked at 100Mbit, but failed to do TSO offload properly at gigE speeds, with buffers below 64.

Updated by Dave Täht on Jul 26, 2011.
success, I think.

I moved resetting ethtool to VERY early in the boot sequence in the S10boot script… and got a working boot. Need to do some load testing…

killall -q hotplug2
# Change device buffers
ethtool -G eth0 tx 4
ethtool -G eth1 tx 4
# change device names
/sbin/fixeth

root@OpenWrt:~# ifconfig se00
se00 Link encap:Ethernet HWaddr C6:3D:C7:98:69:14
inet addr:172.30.42.33 Bcast:172.30.42.63 Mask:255.255.255.224
inet6 addr: fe80::c43d:c7ff:fe98:6914/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1116 errors:0 dropped:7 overruns:12 frame:0
TX packets:1059 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:8
RX bytes:105475 (103.0 KiB) TX bytes:113162 (110.5 KiB)
Interrupt:4

root@OpenWrt:~# ifconfig ge00
ge00 Link encap:Ethernet HWaddr C4:3D:C7:98:69:15
inet addr:172.30.42.45 Bcast:172.30.42.63 Mask:255.255.255.224
inet6 addr: fe80::c63d:c7ff:fe98:6915/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:86 errors:0 dropped:0 overruns:0 frame:0
TX packets:77 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:8
RX bytes:11302 (11.0 KiB) TX bytes:11454 (11.1 KiB)
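
A minimal sketch of the idea behind this workaround: only resize the TX ring while the interface is still down, and refuse otherwise. The helper name set_tx_ring_if_down and the SYSFS_NET and ETHTOOL override variables are hypothetical, added so the function can be dry-run without touching real hardware. It also assumes that an operstate of "down" is a good enough proxy for "no traffic yet", which this report does not confirm.

```shell
#!/bin/sh
# Resize the TX ring of a device only while it is still down.
# SYSFS_NET and ETHTOOL are hypothetical overrides for dry runs and testing.
# Returns 0 on success, 1 if the device is not down, 2 if it does not exist.
set_tx_ring_if_down() {
    dev=$1
    size=$2
    state_file="${SYSFS_NET:-/sys/class/net}/$dev/operstate"

    # No operstate file means there is no such interface.
    [ -r "$state_file" ] || return 2

    state=$(cat "$state_file")
    if [ "$state" != "down" ]; then
        echo "$dev is '$state'; not touching the TX ring" >&2
        return 1
    fi

    # Same command as in the boot script above.
    ${ETHTOOL:-ethtool} -G "$dev" tx "$size"
}

# Example use early in boot (needs root, so it is left commented out):
#   set_tx_ring_if_down eth0 4
#   set_tx_ring_if_down eth1 4
```

Setting ETHTOOL=echo prints the command instead of running it, which is a cheap way to check the logic on a machine without this hardware.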

Updated by Jim Gettys on Jul 26, 2011.
The ring buffer is completely independent of the transmit queue in Linux today.

This is a major bug in the design of Linux (and other OSes), and we’ll be discussing this at LPC, I hope.

And yes, the default txqueuelen the driver returns should depend on the link speed; this is a “safe” change. Cutting to 100 packets at 100Mbps, and (maybe) 10 at 10Mbps, should have the same effect as we have now without any chance of introducing problems. It’s certainly the short-term “hack” until we have more intelligent buffer management across the rings and transmit queue (and queue disciplines).
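
Until drivers do this themselves, the speed-dependent default could be approximated from userspace. This is a rough sketch, not what any driver does: the function name txqueuelen_for_speed is hypothetical, the 100 and 10 packet values are the ones suggested above, and 1000 is the stock Linux Ethernet txqueuelen, kept for gigE and for any speed that cannot be read.

```shell
#!/bin/sh
# Map a link speed in Mbit/s to a transmit queue length in packets.
# 100 and 10 come from the comment above; 1000 is the stock Linux default.
txqueuelen_for_speed() {
    case "$1" in
        10)  echo 10 ;;
        100) echo 100 ;;
        *)   echo 1000 ;;   # gigE or unknown speed: leave the default alone
    esac
}

# Example use (needs root and a real interface, so it is left commented out):
#   speed=$(cat /sys/class/net/ge00/speed)
#   ip link set ge00 txqueuelen "$(txqueuelen_for_speed "$speed")"
```

Falling back to 1000 for anything unrecognised keeps the change “safe” in the sense used above: the queue is only ever shortened on links known to be slow.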

Updated by Dave Täht on Jul 27, 2011.
I have proven to my satisfaction that 4 driver buffers + 8 txqueuelen buffers are enough to sustain full throughput on this hardware at 100Mbit. We can even go lower…
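
To put those buffer counts in perspective, here is a back-of-the-envelope calculation of how long a completely full queue takes to drain. It assumes 1500-byte packets and ignores Ethernet framing overhead. The 64 + 1000 comparison case (a 64-entry ring plus the stock txqueuelen of 1000) is an illustrative assumption, not a measurement from this router.

```shell
#!/bin/sh
# Worst-case time to drain a full queue, in microseconds.
# Args: number of packets, link rate in Mbit/s.
# Assumes 1500-byte packets; bits divided by Mbit/s gives microseconds.
queue_delay_us() {
    echo $(( $1 * 1500 * 8 / $2 ))
}

queue_delay_us 12 100      # 4 ring + 8 txqueuelen at 100Mbit: prints 1440
queue_delay_us 1064 100    # 64 ring + 1000 txqueuelen at 100Mbit: prints 127680
```

That is about 1.4 ms of worst-case queueing for the debloated settings against roughly 128 ms for the assumed defaults.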
Updated by Dave Täht on Jul 27, 2011.
Updated by Dave Täht on Jul 31, 2011.
Updated by Dave Täht on Sep 1, 2011.
I do not have this fixed ‘right’. Even with all the patches that flew by on it, the only way that I can successfully change the ethernet tx ring is in /etc/init.d/boot before it comes up at all.

Which is what I’m doing in cerowrt 1.0.

Updated by Dave Täht on Apr 20, 2012.
I have been able to dynamically change the tx rings for a while…

but BQL (byte queue limits) eliminates the need, and the default is now 64 anyway.

Updated by Dave Täht on Apr 21, 2012.

This is a static export of the original bufferbloat.net issue database. As such, no further commenting is possible; the information is solely here for archival purposes.