Bug #418

TFO crashes cerowrt 3.7.1-1

Added by David Taht on Jan 4, 2013. Updated on May 14, 2014.
Closed Normal Dave Täht

Description

I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.

Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.

This is strange as 3.6 was working for SYN+Data cases.

However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.

On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.

(see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)

Attachments

  • tfo_to_x86_64_web.cap (application/vnd.tcpdump.pcap; 206.6 kiB) httping -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg David Taht Jan 4, 2013
  • tfo_to_x86_64_web_via_proxy.cap (application/vnd.tcpdump.pcap; 153.4 kiB) httping -x 172.26.3.4:8123 -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg David Taht Jan 4, 2013
  • tfo_crash_fixed.patch (text/x-patch; 533 bytes) This patch by Eric D should fix the crash Ketan Kulkarni Jan 14, 2013

History

Updated by David Taht on Jan 4, 2013.
I had been building up a 3.7.1-X based testbed anyway, which looks (in part) like this:

3.6.8 laptop (presently) with iwl wifi driver

traceroute to 172.26.3.4 (172.26.3.4), 30 hops max, 60 byte packets
 1  172.26.2.1  2.978 ms  2.890 ms  10.955 ms   picostation 2HP (wireless)
 2  172.26.1.4  10.934 ms  10.903 ms  10.868 ms nanostation m5 (wired)
 3  172.26.1.1  13.826 ms  13.802 ms  13.785 ms nanostation m5 (wired)
 4  172.26.3.224  16.374 ms  16.366 ms  16.361 ms cerowrt 3.7.1-1 adhoc mesh
 5  172.26.3.4  16.355 ms  16.327 ms  16.296 ms x86_64 box with 3.7.1-1 on it

so future tests will be against that for a while.

Updated by David Taht on Jan 4, 2013.
In an effort to duplicate ketan’s bug, I first setup a x86_64 box as a web and polipo (with tfo patch) server on the other side of the testbed. And ran three tests, using httping to issue a full GET request (as opposed to HEAD), once with tfo disabled, another time via polipo with tfo off in polipo, and the last with it on.

The commands:

 httping -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg
 httping -x 172.26.3.4:8123 -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg
 httping -x 172.26.3.4:8123 -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg

The capture script (note you’d have to change this to port 80 for proxyless and your own server ip)

tcpdump -i wlan1 -w tfo_to_x86_64_web_via_proxy_tfo_enabled.cap port 8123 and host 172.26.3.4

Going 3.6.8 to 3.7.1 over x86_64 over this network… worked! The performance improvement of TFO on vs off over the proxy OVER this multi-hope WIFI connection was remarkable:

root@ida:~/src/httping-1.5.6# httping -x 172.26.3.4:8123 -G -Fw http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg
Using proxyserver: 172.26.3.4:8123
PING 172.26.3.4:80 (http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg):
connected to 172.26.3.4:80 (273 bytes), seq=0 time=11.47 ms 
connected to 172.26.3.4:80 (273 bytes), seq=1 time=12.82 ms 
connected to 172.26.3.4:80 (273 bytes), seq=2 time=10.79 ms 
connected to 172.26.3.4:80 (273 bytes), seq=3 time=9.92 ms 
connected to 172.26.3.4:80 (274 bytes), seq=4 time=10.76 ms 
connected to 172.26.3.4:80 (274 bytes), seq=5 time=12.01 ms 
connected to 172.26.3.4:80 (274 bytes), seq=6 time=13.64 ms 
connected to 172.26.3.4:80 (274 bytes), seq=7 time=11.10 ms 
Got signal 2
--- http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg ping statistics ---
8 connects, 8 ok, 0.00% failed, time 7695ms
round-trip min/avg/max = 9.9/11.6/13.6 ms

root@ida:~/src/httping-1.5.6# httping -x 172.26.3.4:8123 -G -F http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg
Using proxyserver: 172.26.3.4:8123
PING 172.26.3.4:80 (http://172.26.3.4/~d/rrul/vyatta-fq_codel_bql32k.svg):
connected to 172.26.3.4:80 (274 bytes), seq=0 time=9.69 ms 
connected to 172.26.3.4:80 (274 bytes), seq=1 time=5.57 ms 
connected to 172.26.3.4:80 (274 bytes), seq=2 time=10.11 ms 
connected to 172.26.3.4:80 (274 bytes), seq=3 time=8.11 ms 
connected to 172.26.3.4:80 (274 bytes), seq=4 time=6.61 ms 
connected to 172.26.3.4:80 (274 bytes), seq=5 time=5.71 ms 
connected to 172.26.3.4:80 (274 bytes), seq=6 time=5.78 ms 

If this stuff can actually be made secure and deployable, it’s going to be a big win on web servers and proxies and a variety of other connect time sensitive TCP based applications.

(I included the actual rrul file I was transferring for completeness, which is showing this odd periodic spike in loss being incurred somewhere on the path elsewhere…. and has nothing to do with the TFO test…)

Updated by David Taht on Jan 4, 2013.
Heh. Spoke too soon on the performance front. The packet capture shows the 3.6.11 middlebox shipping a RST on the first big packet…. it appears to have conntracking on but little else.

root@closet:~# uname -a
Linux closet.wifi.armory.com 3.6.11 #2 Mon Dec 24 05:24:18 PST 2012 mips GNU/Linux
root@closet:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
input_rule  all  --  anywhere             anywhere            
input      all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
forwarding_rule  all  --  anywhere             anywhere            
forward    all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
output_rule  all  --  anywhere             anywhere            
output     all  --  anywhere             anywhere            

Chain forward (1 references)
target     prot opt source               destination         

Chain forwarding_rule (1 references)
target     prot opt source               destination         

Chain input (1 references)
target     prot opt source               destination         

Chain input_rule (1 references)
target     prot opt source               destination         

Chain output (1 references)
target     prot opt source               destination         

Chain output_rule (1 references)
target     prot opt source               destination         

Chain reject (0 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             anywhere             reject-with tcp-reset
REJECT     all  --  anywhere             anywhere             reject-with icmp-port-unreachable
Updated by David Taht on Jan 4, 2013.
flushed iptables rules, similar result, capture attached…

Need to simplify the test a bit, a few too many machines in the way right now

Updated by Ketan Kulkarni on Jan 5, 2013.
Dave, the RST you are getting is probably the behavior of httping. By default httping accepts only 4096 bytes of data and after than it simply closes the socket. (closing a socket when data is pending to read will send the RST).
Even I could see the same behavior when I tried -G (using GET for a larger file) on localhost.

Try adding these options to your httping cmd if you are using -G
-b -X -L 100000

L> configures the size of the data that you want to receive after which httping will just close the connection.
-b to show transfer speed
-X to show how much data was transferred in a single ping.

Updated by Ketan Kulkarni on Jan 5, 2013.
BTW, I am just using the HEAD request by default which then
automatically limits the response to a single packet.
Don’t you think this is better way to test the http latency than to
completely download the file using GET?

Thanks,

On Sun, Jan 6, 2013 at 12:01 AM, cerowrt@lists.bufferbloat.net wrote:
>
> Issue #418 has been updated by Ketan Kulkarni.
>
>
> Dave, the RST you are getting is probably the behavior of httping. > By default httping accepts only 4096 bytes of data and after than it simply closes the socket. (closing a socket when data is pending to read will send the RST).
> Even I could see the same behavior when I tried -G (using GET for a larger file) on localhost.
>
> Try adding these options to your httping cmd if you are using -G
> -b -X -L 100000
>
> L> configures the size of the data that you want to receive after which httping will just close the connection.
> -b to show transfer speed
> -X to show how much data was transferred in a single ping.
> —————————————-
> Bug #418: TFO crashes cerowrt 3.7.1-1
> https://www.bufferbloat.net/issues/418 >
> Author: David Taht
> Status: New
> Priority: Normal
> Assignee: Dave Täht
> Category: Networking
> Target version:
>
>
> I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.
>
> Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.
>
> This is strange as 3.6 was working for SYN+Data cases.
>
> However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.
>
> On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.
>
> (see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)
>
>

Updated by Ketan Kulkarni on Jan 5, 2013.
Disabling ECN on cero and TFO ON has no effect.
The crash is still seen on cero.

This is the trace I get (similar to lo_capture.txt with ECN disabled)

root@OpenWrt:~# httping -F -g http://127.0.0.1:8123/ PING 127.0.0.1:8123 (http://127.0.0.1:8123/): 18:15:37.736001 IP localhost.39443 > localhost.8123: Flags [S], seq 4217980186, win 43690, options [mss 65495,sackOK,TS val 343024 ecr 0,nop,wscale 6,Unknown Option 254f989], length 0
18:15:37.736121 IP localhost.8123 > localhost.39443: Flags [S.], seq 1926633800, ack 4217980187, win 43690, options [mss 65495,sackOK,TS val 343024 ecr 343024,nop,wscale 6,Unknown Option 254f989df087214939732ef], length 0
18:15:37.736174 IP localhost.39443 > localhost.8123: Flags [.], ack 1, win 683, options [nop,nop,TS val 343024 ecr 343024], length 0
connected to 127.0.0.1:8123 (183 bytes), seq=0 time=3.21 ms
18:15:37.737372 IP localhost.39443 > localhost.8123: Flags [P.], seq 1:65, ack 1, win 683, options [nop,nop,TS val 343024 ecr 343024], length 64
18:15:37.737538 IP localhost.8123 > localhost.39443: Flags [.], ack 65, win 683, options [nop,nop,TS val 343024 ecr 343024], length 0
18:15:37.738395 IP localhost.8123 > localhost.39443: Flags [P.], seq 1:184, ack 65, win 683, options [nop,nop,TS val 343025 ecr 343024], length 183
18:15:37.738565 IP localhost.8123 > localhost.39443: Flags [F.], seq 184, ack 65, win 683, options [nop,nop,TS val 343025 ecr 343024], length 0
18:15:37.738682 IP localhost.39443 > localhost.8123: Flags [.], ack 184, win 700, options [nop,nop,TS val 343025 ecr 343025], length 0
18:15:37.738953 IP localhost.39443 > localhost.8123: Flags [F.], seq 65, ack 185, win 700, options [nop,nop,TS val 343025 ecr 343025], length 0
18:15:37.739063 IP localhost.8123 > localhost.39443: Flags [.], ack 66, win 683, options [nop,nop,TS val 343025 ecr 343025], length 0

Updated by David Taht on Jan 5, 2013.
On Sat, Jan 5, 2013 at 1:34 PM, cerowrt@lists.bufferbloat.net wrote:
>
> Issue #418 has been updated by Ketan Kulkarni.
>
>
> BTW, I am just using the HEAD request by default which then
> automatically limits the response to a single packet.
> Don’t you think this is better way to test the http latency than to
> completely download the file using GET?

Depends on what you were testing for. ME, I was working on looking at
wifi aggregate behavior over the mesh network and doing a GET to grab
a file with 40K in it (less than one aggregate) seemed sane at the
time.

Didn’t know that httping cut off at 4k with the RST, so my test was
invalid. Thank you for the modified version!

Am also glad it wasn’t the middle box!

certainly if all we want to do is test the time of a tcp setup, a head
request is saner.


Thanks,

On Sun, Jan 6, 2013 at 12:01 AM, cerowrt@lists.bufferbloat.net wrote:
>
> Issue #418 has been updated by Ketan Kulkarni.
>
>
> Dave, the RST you are getting is probably the behavior of httping. > By default httping accepts only 4096 bytes of data and after than it simply closes the socket. (closing a socket when data is pending to read will send the RST).
> Even I could see the same behavior when I tried -G (using GET for a larger file) on localhost.
>
> Try adding these options to your httping cmd if you are using -G
> -b -X -L 100000
>
> L> configures the size of the data that you want to receive after which httping will just close the connection.
> -b to show transfer speed
> -X to show how much data was transferred in a single ping.
> —————————————-
> Bug #418: TFO crashes cerowrt 3.7.1-1
> https://www.bufferbloat.net/issues/418 >
> Author: David Taht
> Status: New
> Priority: Normal
> Assignee: Dave Täht
> Category: Networking
> Target version:
>
>
> I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.
>
> Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.
>
> This is strange as 3.6 was working for SYN+Data cases.
>
> However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.
>
> On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.
>
> (see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)
>
>
—————————————-
Bug #418: TFO crashes cerowrt 3.7.1-1
https://www.bufferbloat.net/issues/418
Author: David Taht
Status: New
Priority: Normal
Assignee: Dave Täht
Category: Networking
Target version:


I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.

Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.

This is strange as 3.6 was working for SYN+Data cases.

However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.

On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.

(see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)

Updated by David Taht on Jan 8, 2013.
While I acquired a serial adaptor for the 3800 I did not gain the time
to look at this issue over the weekend.

I’m buried this week and may not be able to get to this til the weekend.

Updated by Ketan Kulkarni on Jan 13, 2013.
I could get a chance to get the backtrace from serial port.
I didnt do the kgdb session yet.

[ 1024.530000] Kernel bug detected[#1]:
[ 1024.530000] Cpu 0
[ 1024.530000] \$ 0 : 00000000 00000000 00000001 00000014
[ 1024.530000] \$ 4 : 8600a9a0 85446cc0 00000000 377f8397
[ 1024.530000] \$ 8 : 00000000 00000000 00000000 204e7f80
[ 1024.530000] \$12 : 39808080 00000007 00000000 00000000
[ 1024.530000] \$16 : 85446cc0 854fb280 8600a500 00000000
[ 1024.530000] \$20 : 85446cc0 80340000 00000006 803aac70
[ 1024.530000] \$24 : 00000000 c0aa90d8
[ 1024.530000] \$28 : 854c2000 854c3ad8 8033c050 8024a36c
[ 1024.530000] Hi : 00000000
[ 1024.530000] Lo : 00000000
[ 1024.530000] epc : 801fc7f4 reqsk_fastopen_remove+0x30/0x17c
[ 1024.530000] Tainted: G O
[ 1024.530000] ra : 8024a36c tcp_rcv_state_process+0x7b4/0xc28
[ 1024.530000] Status: 1000fc03 KERNEL EXL IE
[ 1024.530000] Cause : 10800034
[ 1024.530000] PrId : 00019374 (MIPS 24Kc)
[ 1024.530000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red
sch_qfq sch_prio sch_ns2_codel sch_nfq_codel sch_netem sch_htb
sch_gred sch_efq_codel sch_dsmark em_text em_nbyte em_meta em_cmp
cls_basic act_police act_ipt act_connmark act_skbedit act_mirred
em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc
sch_fq_codel sch_codel sch_ingress ath79_wdt xt_hashlimit xt_set(O)
ip_set_list_set(O) ip_set_hash_netport(O) ip_set_hash_netiface(O)
ip_set_hash_net(O) ip_set_hash_ipportnet(O) ip_set_hash_ipportip(O)
ip_set_hash_ipport(O) ip_set_hash_ip(O) ip_set_bitmap_port(O)
ip_set_bitmap_ipmac(O) ip_set_bitmap_ip(O) ip_set(O) ip6t_REJECT
ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah
ip6table_raw ip6table_mangle ip6table_filter ip6_tables
nf_conntrack_ipv6 nf_defrag_ipv6 nfnetlink nf_nat_irc nf_nat_ftp
nf_conntrack_irc nf_conntrack_ftp xt_policy xt_esp ipt_ah xt_HL xt_hl
xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark
xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype
xt_physdev xt_owner ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4
nf_nat xt_recent xt_helper xt_connmark xt_connbytes pptp pppoe
xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack pppox ipt_REJECT xt_TCPMSS xt_LOG
xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter
ip_tables xt_tcpudp x_tables ip_gre gre ifb sit xfrm4_tunnel
xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4
tunnel4 tun tcp_ledbat(O) ppp_async ppp_generic slhc xfrm_user
xfrm_algo vfat fat autofs4 button_hotplug(O) ath9k(O) ath9k_common(O)
ath9k_hw(O) ath(O) nls_utf8 nls_iso8859_2 nls_iso8859_15
nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211(O) ts_fsm ts_bm ts_kmp
crc_ccitt ipv6 input_polldev cfg80211(O) compat(O) input_core
sha1_generic md5 hmac des_generic deflate cbc authenc arc4 usb_storage
ohci_hcd ehci_hcd sd_mod ext4 jbd2 mbcache usbcore usb_common scsi_mod
nls_base crc16 zlib_inflate zlib_deflate ledtrig_timer
ledtrig_default_on leds_gpio gpio_button_hotplug(O)
[ 1024.530000] Process seq (pid: 4023, threadinfo=854c2000,
task=85461df8, tls=7735b440)
[ 1024.530000] Stack : 00000000 8600a9a0 87287d80 8607bb58 85446cc0
80330000 8600a9a0 87287d80
8607bb58 8024a36c c0aa9720 c2337090 86b39000 00000000 00000001 00000014
00000001 02000004 862b6d24 c2337000 00000001 8547bf00 00000000 00000000
854c3bcc 854c3bcc 87287d80 854c3bcc 87287d80 80325dd8 87287d80 8600a9a0
8600a9a0 8607bb58 8607bb44 802516ec 87287d80 803261dc 80340000 80340000

[ 1024.530000] Call Trace:
[ 1024.530000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
[ 1024.530000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
[ 1024.530000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
[ 1024.530000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
[ 1024.530000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
[ 1024.530000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
[ 1024.530000] [] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
[ 1024.530000] []
ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
[ 1024.530000] []
ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
[ 1024.530000] [] ieee80211_rx+0x810/0x8d8 [mac80211]
[ 1024.530000] [] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
[ 1024.530000] [] ath9k_tasklet+0x104/0x174 [ath9k]
[ 1024.530000] [<800793b8>] tasklet_action+0x78/0xc8
[ 1024.530000] [<80078c08>] __do_softirq+0xb0/0x184
[ 1024.530000] [<80078d8c>] do_softirq+0x48/0x68
[ 1024.530000] [<80078fa8>] irq_exit+0x4c/0x7c
[ 1024.530000] [<8006330c>] ret_from_irq+0x0/0x4
[ 1024.530000]
[ 1024.530000]
Code: 8e510208 30d300ff 2c420001 <00028036> 0c01e2a7 ac80048c
8e220008 2442ffff ae220008
[ 1024.940000] –[ end trace a47ff22dd20a96c1 ]–
[ 1024.950000] Kernel panic - not syncing: Fatal exception in interrupt
[ 1024.950000] ———–[ cut here ]———–
[ 1024.950000] WARNING: at lib/vsprintf.c:1376 vsnprintf+0x6c/0x39c()
[ 1024.950000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red
sch_qfq sch_prio sch_ns2_codel sch_nfq_codel sch_netem sch_htb
sch_gred sch_efq_codel sch_dsmark em_text em_nbyte em_meta em_cmp
cls_basic act_police act_ipt act_connmark act_skbedit act_mirred
em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc
sch_fq_codel sch_codel sch_ingress ath79_wdt xt_hashlimit xt_set(O)
ip_set_list_set(O) ip_set_hash_netport(O) ip_set_hash_netiface(O)
ip_set_hash_net(O) ip_set_hash_ipportnet(O) ip_set_hash_ipportip(O)
ip_set_hash_ipport(O) ip_set_hash_ip(O) ip_set_bitmap_port(O)
ip_set_bitmap_ipmac(O) ip_set_bitmap_ip(O) ip_set(O) ip6t_REJECT
ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah
ip6table_raw ip6table_mangle ip6table_filter ip6_tables
nf_conntrack_ipv6 nf_defrag_ipv6 nfnetlink nf_nat_irc nf_nat_ftp
nf_conntrack_irc nf_conntrack_ftp xt_policy xt_esp ipt_ah xt_HL xt_hl
xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark
xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype
xt_physdev xt_owner ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4
nf_nat xt_recent xt_helper xt_connmark xt_connbytes pptp pppoe
xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack pppox ipt_REJECT xt_TCPMSS xt_LOG
xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter
ip_tables xt_tcpudp x_tables ip_gre gre ifb sit xfrm4_tunnel
xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4
tunnel4 tun tcp_ledbat(O) ppp_async ppp_generic slhc xfrm_user
xfrm_algo vfat fat autofs4 button_hotplug(O) ath9k(O) ath9k_common(O)
ath9k_hw(O) ath(O) nls_utf8 nls_iso8859_2 nls_iso8859_15
nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211(O) ts_fsm ts_bm ts_kmp
crc_ccitt ipv6 input_polldev cfg80211(O) compat(O) input_core
sha1_generic md5 hmac des_generic deflate cbc authenc arc4 usb_storage
ohci_hcd ehci_hcd sd_mod ext4 jbd2 mbcache usbcore usb_common scsi_mod
nls_base crc16 zlib_inflate zlib_deflate ledtrig_timer
ledtrig_default_on leds_gpio gpio_button_hotplug(O)
[ 1024.950000] Call Trace:
[ 1024.950000] [<802a25e8>] dump_stack+0x8/0x34
[ 1024.950000] [<80071f84>] warn_slowpath_common+0x78/0xa4
[ 1024.950000] [<80071fc8>] warn_slowpath_null+0x18/0x24
[ 1024.950000] [<80183d5c>] vsnprintf+0x6c/0x39c
[ 1024.950000] [<800bc99c>] crashlog_printf+0x4c/0x68
[ 1024.950000] [<800bca0c>] crashlog_do_dump+0x54/0x138
[ 1024.950000] [<800752bc>] kmsg_dump+0xe0/0x11c
[ 1024.950000] [<802a26b4>] panic+0x8c/0x1b4
[ 1024.950000] [<80067b68>] die+0x104/0x10c
[ 1024.950000] [<80067cd0>] do_trap_or_bp+0x118/0x18c
[ 1024.950000] [<80063300>] ret_from_exception+0x0/0xc
[ 1024.950000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
[ 1024.950000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
[ 1024.950000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
[ 1024.950000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
[ 1024.950000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
[ 1024.950000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
[ 1024.950000] [] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
[ 1024.950000] []
ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
[ 1024.950000] []
ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
[ 1024.950000] [] ieee80211_rx+0x810/0x8d8 [mac80211]
[ 1024.950000] [] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
[ 1024.950000] [] ath9k_tasklet+0x104/0x174 [ath9k]
[ 1024.950000] [<800793b8>] tasklet_action+0x78/0xc8
[ 1024.950000] [<80078c08>] __do_softirq+0xb0/0x184
[ 1024.950000] [<80078d8c>] do_softirq+0x48/0x68
[ 1024.950000] [<80078fa8>] irq_exit+0x4c/0x7c
[ 1024.950000] [<8006330c>] ret_from_irq+0x0/0x4
[ 1024.950000]
[ 1024.950000] –[ end trace a47ff22dd20a96c2 ]–
[ 1024.950000] Rebooting in 3 seconds..þþ

U-Boot 1.1.4 (May 27 2011 - 14:58:01)


On Tue, Jan 8, 2013 at 9:45 PM, cerowrt@lists.bufferbloat.net wrote:
>
> Issue #418 has been updated by David Taht.
>
>
> While I acquired a serial adaptor for the 3800 I did not gain the time
> to look at this issue over the weekend.
>
> I’m buried this week and may not be able to get to this til the weekend.
> —————————————-
> Bug #418: TFO crashes cerowrt 3.7.1-1
> https://www.bufferbloat.net/issues/418 >
> Author: David Taht
> Status: New
> Priority: Normal
> Assignee: Dave Täht
> Category: Networking
> Target version:
>
>
> I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.
>
> Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.
>
> This is strange as 3.6 was working for SYN+Data cases.
>
> However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.
>
> On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.
>
> (see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)
>
>

Updated by Ketan Kulkarni on Jan 13, 2013.
I do not know if this is the faulty line -

void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req,
bool reset)
{
struct sock *lsk = tcp_rsk(req)->listener;
struct fastopen_queue *fastopenq =
inet_csk(lsk)->icsk_accept_queue.fastopenq;

BUG_ON(!spin_is_locked(&sk->sk_lock.slock) && !sock_owned_by_user(sk));

tcp_sk(sk)->fastopen_rsk = NULL;
spin_lock_bh(&fastopenq->lock);
fastopenq->qlen–;
tcp_rsk(req)->listener = NULL;

On Sun, Jan 13, 2013 at 4:44 PM, cerowrt@lists.bufferbloat.net wrote:
>
> Issue #418 has been updated by Ketan Kulkarni.
>
>
> I could get a chance to get the backtrace from serial port.
> I didnt do the kgdb session yet.
>
> [ 1024.530000] Kernel bug detected[#1]:
> [ 1024.530000] Cpu 0
> [ 1024.530000] \$ 0 : 00000000 00000000 00000001 00000014
> [ 1024.530000] \$ 4 : 8600a9a0 85446cc0 00000000 377f8397
> [ 1024.530000] \$ 8 : 00000000 00000000 00000000 204e7f80
> [ 1024.530000] \$12 : 39808080 00000007 00000000 00000000
> [ 1024.530000] \$16 : 85446cc0 854fb280 8600a500 00000000
> [ 1024.530000] \$20 : 85446cc0 80340000 00000006 803aac70
> [ 1024.530000] \$24 : 00000000 c0aa90d8
> [ 1024.530000] \$28 : 854c2000 854c3ad8 8033c050 8024a36c
> [ 1024.530000] Hi : 00000000
> [ 1024.530000] Lo : 00000000
> [ 1024.530000] epc : 801fc7f4 reqsk_fastopen_remove+0x30/0x17c
> [ 1024.530000] Tainted: G O
> [ 1024.530000] ra : 8024a36c tcp_rcv_state_process+0x7b4/0xc28
> [ 1024.530000] Status: 1000fc03 KERNEL EXL IE
> [ 1024.530000] Cause : 10800034
> [ 1024.530000] PrId : 00019374 (MIPS 24Kc)
> [ 1024.530000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red
> sch_qfq sch_prio sch_ns2_codel sch_nfq_codel sch_netem sch_htb
> sch_gred sch_efq_codel sch_dsmark em_text em_nbyte em_meta em_cmp
> cls_basic act_police act_ipt act_connmark act_skbedit act_mirred
> em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc
> sch_fq_codel sch_codel sch_ingress ath79_wdt xt_hashlimit xt_set(O)
> ip_set_list_set(O) ip_set_hash_netport(O) ip_set_hash_netiface(O)
> ip_set_hash_net(O) ip_set_hash_ipportnet(O) ip_set_hash_ipportip(O)
> ip_set_hash_ipport(O) ip_set_hash_ip(O) ip_set_bitmap_port(O)
> ip_set_bitmap_ipmac(O) ip_set_bitmap_ip(O) ip_set(O) ip6t_REJECT
> ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah
> ip6table_raw ip6table_mangle ip6table_filter ip6_tables
> nf_conntrack_ipv6 nf_defrag_ipv6 nfnetlink nf_nat_irc nf_nat_ftp
> nf_conntrack_irc nf_conntrack_ftp xt_policy xt_esp ipt_ah xt_HL xt_hl
> xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark
> xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype
> xt_physdev xt_owner ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4
> nf_nat xt_recent xt_helper xt_connmark xt_connbytes pptp pppoe
> xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4
> nf_defrag_ipv4 nf_conntrack pppox ipt_REJECT xt_TCPMSS xt_LOG
> xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter
> ip_tables xt_tcpudp x_tables ip_gre gre ifb sit xfrm4_tunnel
> xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4
> tunnel4 tun tcp_ledbat(O) ppp_async ppp_generic slhc xfrm_user
> xfrm_algo vfat fat autofs4 button_hotplug(O) ath9k(O) ath9k_common(O)
> ath9k_hw(O) ath(O) nls_utf8 nls_iso8859_2 nls_iso8859_15
> nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211(O) ts_fsm ts_bm ts_kmp
> crc_ccitt ipv6 input_polldev cfg80211(O) compat(O) input_core
> sha1_generic md5 hmac des_generic deflate cbc authenc arc4 usb_storage
> ohci_hcd ehci_hcd sd_mod ext4 jbd2 mbcache usbcore usb_common scsi_mod
> nls_base crc16 zlib_inflate zlib_deflate ledtrig_timer
> ledtrig_default_on leds_gpio gpio_button_hotplug(O)
> [ 1024.530000] Process seq (pid: 4023, threadinfo=854c2000,
> task=85461df8, tls=7735b440)
> [ 1024.530000] Stack : 00000000 8600a9a0 87287d80 8607bb58 85446cc0
> 80330000 8600a9a0 87287d80
> 8607bb58 8024a36c c0aa9720 c2337090 86b39000 00000000 00000001 00000014
> 00000001 02000004 862b6d24 c2337000 00000001 8547bf00 00000000 00000000
> 854c3bcc 854c3bcc 87287d80 854c3bcc 87287d80 80325dd8 87287d80 8600a9a0
> 8600a9a0 8607bb58 8607bb44 802516ec 87287d80 803261dc 80340000 80340000
> …
> [ 1024.530000] Call Trace:
> [ 1024.530000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
> [ 1024.530000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
> [ 1024.530000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
> [ 1024.530000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
> [ 1024.530000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
> [ 1024.530000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
> [ 1024.530000] [] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
> [ 1024.530000] []
> ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
> [ 1024.530000] []
> ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
> [ 1024.530000] [] ieee80211_rx+0x810/0x8d8 [mac80211]
> [ 1024.530000] [] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
> [ 1024.530000] [] ath9k_tasklet+0x104/0x174 [ath9k]
> [ 1024.530000] [<800793b8>] tasklet_action+0x78/0xc8
> [ 1024.530000] [<80078c08>] __do_softirq+0xb0/0x184
> [ 1024.530000] [<80078d8c>] do_softirq+0x48/0x68
> [ 1024.530000] [<80078fa8>] irq_exit+0x4c/0x7c
> [ 1024.530000] [<8006330c>] ret_from_irq+0x0/0x4
> [ 1024.530000]
> [ 1024.530000]
> Code: 8e510208 30d300ff 2c420001 <00028036> 0c01e2a7 ac80048c
> 8e220008 2442ffff ae220008
> [ 1024.940000] –[ end trace a47ff22dd20a96c1 ]–
> [ 1024.950000] Kernel panic - not syncing: Fatal exception in interrupt
> [ 1024.950000] ———–[ cut here ]———–
> [ 1024.950000] WARNING: at lib/vsprintf.c:1376 vsnprintf+0x6c/0x39c()
> [ 1024.950000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red
> sch_qfq sch_prio sch_ns2_codel sch_nfq_codel sch_netem sch_htb
> sch_gred sch_efq_codel sch_dsmark em_text em_nbyte em_meta em_cmp
> cls_basic act_police act_ipt act_connmark act_skbedit act_mirred
> em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc
> sch_fq_codel sch_codel sch_ingress ath79_wdt xt_hashlimit xt_set(O)
> ip_set_list_set(O) ip_set_hash_netport(O) ip_set_hash_netiface(O)
> ip_set_hash_net(O) ip_set_hash_ipportnet(O) ip_set_hash_ipportip(O)
> ip_set_hash_ipport(O) ip_set_hash_ip(O) ip_set_bitmap_port(O)
> ip_set_bitmap_ipmac(O) ip_set_bitmap_ip(O) ip_set(O) ip6t_REJECT
> ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah
> ip6table_raw ip6table_mangle ip6table_filter ip6_tables
> nf_conntrack_ipv6 nf_defrag_ipv6 nfnetlink nf_nat_irc nf_nat_ftp
> nf_conntrack_irc nf_conntrack_ftp xt_policy xt_esp ipt_ah xt_HL xt_hl
> xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark
> xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype
> xt_physdev xt_owner ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4
> nf_nat xt_recent xt_helper xt_connmark xt_connbytes pptp pppoe
> xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4
> nf_defrag_ipv4 nf_conntrack pppox ipt_REJECT xt_TCPMSS xt_LOG
> xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter
> ip_tables xt_tcpudp x_tables ip_gre gre ifb sit xfrm4_tunnel
> xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4
> tunnel4 tun tcp_ledbat(O) ppp_async ppp_generic slhc xfrm_user
> xfrm_algo vfat fat autofs4 button_hotplug(O) ath9k(O) ath9k_common(O)
> ath9k_hw(O) ath(O) nls_utf8 nls_iso8859_2 nls_iso8859_15
> nls_iso8859_13 nls_iso8859_1 nls_cp437 mac80211(O) ts_fsm ts_bm ts_kmp
> crc_ccitt ipv6 input_polldev cfg80211(O) compat(O) input_core
> sha1_generic md5 hmac des_generic deflate cbc authenc arc4 usb_storage
> ohci_hcd ehci_hcd sd_mod ext4 jbd2 mbcache usbcore usb_common scsi_mod
> nls_base crc16 zlib_inflate zlib_deflate ledtrig_timer
> ledtrig_default_on leds_gpio gpio_button_hotplug(O)
> [ 1024.950000] Call Trace:
> [ 1024.950000] [<802a25e8>] dump_stack+0x8/0x34
> [ 1024.950000] [<80071f84>] warn_slowpath_common+0x78/0xa4
> [ 1024.950000] [<80071fc8>] warn_slowpath_null+0x18/0x24
> [ 1024.950000] [<80183d5c>] vsnprintf+0x6c/0x39c
> [ 1024.950000] [<800bc99c>] crashlog_printf+0x4c/0x68
> [ 1024.950000] [<800bca0c>] crashlog_do_dump+0x54/0x138
> [ 1024.950000] [<800752bc>] kmsg_dump+0xe0/0x11c
> [ 1024.950000] [<802a26b4>] panic+0x8c/0x1b4
> [ 1024.950000] [<80067b68>] die+0x104/0x10c
> [ 1024.950000] [<80067cd0>] do_trap_or_bp+0x118/0x18c
> [ 1024.950000] [<80063300>] ret_from_exception+0x0/0xc
> [ 1024.950000] [<801fc7f4>] reqsk_fastopen_remove+0x30/0x17c
> [ 1024.950000] [<8024a36c>] tcp_rcv_state_process+0x7b4/0xc28
> [ 1024.950000] [<802516ec>] tcp_v4_do_rcv+0x21c/0x274
> [ 1024.950000] [<80253c74>] tcp_v4_rcv+0x5b4/0x974
> [ 1024.950000] [<802320f0>] ip_local_deliver_finish+0x168/0x29c
> [ 1024.950000] [<80207100>] __netif_receive_skb+0x63c/0x6c0
> [ 1024.950000] [] ieee80211_deliver_skb+0x1b8/0x220 [mac80211]
> [ 1024.950000] []
> ieee80211_rx_handlers.part.12+0x1654/0x23e0 [mac80211]
> [ 1024.950000] []
> ieee80211_prepare_and_rx_handle+0xa6c/0xaf0 [mac80211]
> [ 1024.950000] [] ieee80211_rx+0x810/0x8d8 [mac80211]
> [ 1024.950000] [] ath_rx_tasklet+0xf4c/0x10a4 [ath9k]
> [ 1024.950000] [] ath9k_tasklet+0x104/0x174 [ath9k]
> [ 1024.950000] [<800793b8>] tasklet_action+0x78/0xc8
> [ 1024.950000] [<80078c08>] __do_softirq+0xb0/0x184
> [ 1024.950000] [<80078d8c>] do_softirq+0x48/0x68
> [ 1024.950000] [<80078fa8>] irq_exit+0x4c/0x7c
> [ 1024.950000] [<8006330c>] ret_from_irq+0x0/0x4
> [ 1024.950000]
> [ 1024.950000] –[ end trace a47ff22dd20a96c2 ]–
> [ 1024.950000] Rebooting in 3 seconds..þþ
>
> U-Boot 1.1.4 (May 27 2011 - 14:58:01)
>
> ————–
>
> On Tue, Jan 8, 2013 at 9:45 PM, cerowrt@lists.bufferbloat.net wrote:
>>
>> Issue #418 has been updated by David Taht.
>>
>>
>> While I acquired a serial adaptor for the 3800 I did not gain the time
>> to look at this issue over the weekend.
>>
>> I’m buried this week and may not be able to get to this til the weekend.
>> —————————————-
>> Bug #418: TFO crashes cerowrt 3.7.1-1
>> https://www.bufferbloat.net/issues/418 >>
>> Author: David Taht
>> Status: New
>> Priority: Normal
>> Assignee: Dave Täht
>> Category: Networking
>> Target version:
>>
>>
>> I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.
>>
>> Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.
>>
>> This is strange as 3.6 was working for SYN+Data cases.
>>
>> However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.
>>
>> On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.
>>
>> (see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)
>>
>>
> —————————————-
> Bug #418: TFO crashes cerowrt 3.7.1-1
> https://www.bufferbloat.net/issues/418 >
> Author: David Taht
> Status: New
> Priority: Normal
> Assignee: Dave Täht
> Category: Networking
> Target version:
>
>
> I am not able to get through even a single TCP TFO connection. The router restarts as soon as it sees the TFO connection.
>
> Looks like SYN+Data is crashing the box (see attached trace captured on lo iface of cero). logread, dmesg did not show anything. I don’t know whether its kernel panic.
>
> This is strange as 3.6 was working for SYN+Data cases.
>
> However the difference from previous instance is the polipo server with TFO running on cero box. Client may run on same cero or on my laptop which in either case crashes the box.
>
> On 3.7, if I run the TFO client on cero box and TFO server on the laptop, it still works but not the reverse.
>
> (see email thread at: https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000844.html for more details)
>
>

Updated by Ketan Kulkarni on Jan 14, 2013.
Eric suggested a patch to fix the issue -
More details of the analysis and fix here -

https://lists.bufferbloat.net/pipermail/cerowrt-devel/2013-January/000897.html

Updated by Ketan Kulkarni on Jan 14, 2013.
Updated by Dave Täht on May 14, 2014.

This is a static export of the original bufferbloat.net issue database. As such, no further commenting is possible; the information is solely here for archival purposes.
RSS feed

Recent Updates

Oct 20, 2023 Wiki page
What Can I Do About Bufferbloat?
Dec 3, 2022 Wiki page
Codel Wiki
Jun 11, 2022 Wiki page
More about Bufferbloat
Jun 11, 2022 Wiki page
Tests for Bufferbloat
Dec 7, 2021 Wiki page
Getting SQM Running Right

Find us elsewhere

Bufferbloat Mailing Lists
#bufferbloat on Twitter
Google+ group
Archived Bufferbloat pages from the Wayback Machine

Sponsors

Comcast Research Innovation Fund
Nlnet Foundation
Shuttleworth Foundation
GoFundMe

Bufferbloat Related Projects

OpenWrt Project
Congestion Control Blog
Flent Network Test Suite
Sqm-Scripts
The Cake shaper
AQMs in BSD
IETF AQM WG
CeroWrt (where it all started)

Network Performance Related Resources


Jim Gettys' Blog - The chairman of the Fjord
Toke's Blog - Karlstad University's work on bloat
Voip Users Conference - Weekly Videoconference mostly about voip
Candelatech - A wifi testing company that "gets it".