Bug #340

massive packet loss on wireless crypto in 3.3-rc4-3

Added by David Taht about 1 year ago. Updated 10 months ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: Dave Täht
% Done: 0%
Category: Networking
Spent time: 22.00 hours
Target version: 1st Public Cerowrt release

Description

I am getting 80% packet loss while trying to get 3.3-rc4-3 working at all
on encrypted wireless. The icmpv6 errors are, I'd expect, another problem
entirely.

[ 72.040000] ADDRCONF: gw11: link becomes ready
[ 318.320000] icmpv6_send: no reply to icmp error
[ 318.320000] icmpv6_send: no reply to icmp error
[ 650.040000] icmpv6_send: no reply to icmp error
[ 650.040000] icmpv6_send: no reply to icmp error
[ 1291.750000] ath: Failed to stop TX DMA, queues=0x004!
[ 1295.080000] ath: Failed to stop TX DMA, queues=0x00c!
[ 1809.500000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
[ 1923.970000] icmpv6_send: no reply to icmp error
root@quintaped:/etc/config#

History

Updated by Hector Ordorica about 1 year ago

Perhaps related, but I noticed this change recently in openwrt trunk for ICMPv6:

https://dev.openwrt.org/changeset/30727/trunk

In wireless testing in trunk, there was also a patch to ath9k that reduced tx bufs to 32 I think. I can't seem to find it at the moment though:

https://dev.openwrt.org/changeset/30746/trunk

Updated by Dave Täht about 1 year ago

  • Category set to Networking
  • Target version set to 13

Yeah, I wondered where the txqueuelen change came from...

I've gone way beyond that (running sfq by default)... but hmm... I'll try ripping that out.

As for the icmp change, I tried stripping that out, to no real effect:

    # Allow essential incoming IPv6 ICMP traffic
    config rule
        option name Allow-ICMPv6-Input
        option src *
        option proto icmp
        option limit 1000/sec
        option family ipv6
        option target ACCEPT

    # Allow essential forwarded IPv6 ICMP traffic
    config rule
        option name Allow-ICMPv6-Forward
        option src *
        # option dest *
        option proto icmp
        option limit 1000/sec
        option family ipv6
        option target ACCEPT

Updated by Dave Täht about 1 year ago

For reference, I am way beyond fiddling with txqueuelen:

root@quintaped:/etc/config# tc -s qdisc show dev sw10
qdisc mq 1: root
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 20: parent 1:2 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 30: parent 1:3 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 40: parent 1:4 limit 200p quantum 3028b depth 24 headdrop divisor 16384 perturb 600sec
ewma 3 min 4500b max 18000b probability 0.2 ecn
prob_mark 0 prob_mark_head 0 prob_drop 0
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0

Updated by Dave Täht about 1 year ago

This is actually against the rc4-3 release, not the rc5

I am aware that powersave mode does weird things to ping, but the phone does not access the internet worth beans and pinging it has results like:

root@quintaped:/etc/dibbler# ping 172.29.42.85
PING 172.29.42.85 (172.29.42.85): 56 data bytes
64 bytes from 172.29.42.85: seq=0 ttl=64 time=7198.999 ms
64 bytes from 172.29.42.85: seq=1 ttl=64 time=6506.462 ms
64 bytes from 172.29.42.85: seq=2 ttl=64 time=5700.414 ms
64 bytes from 172.29.42.85: seq=3 ttl=64 time=4838.906 ms
64 bytes from 172.29.42.85: seq=4 ttl=64 time=10448.104 ms
64 bytes from 172.29.42.85: seq=5 ttl=64 time=12287.972 ms
64 bytes from 172.29.42.85: seq=6 ttl=64 time=12414.146 ms
64 bytes from 172.29.42.85: seq=11 ttl=64 time=10277.950 ms
64 bytes from 172.29.42.85: seq=24 ttl=64 time=17193.064 ms
64 bytes from 172.29.42.85: seq=31 ttl=64 time=10291.945 ms
64 bytes from 172.29.42.85: seq=32 ttl=64 time=12313.409 ms
64 bytes from 172.29.42.85: seq=33 ttl=64 time=12334.132 ms
64 bytes from 172.29.42.85: seq=34 ttl=64 time=14613.955 ms
64 bytes from 172.29.42.85: seq=35 ttl=64 time=14074.534 ms
64 bytes from 172.29.42.85: seq=41 ttl=64 time=11444.921 ms
64 bytes from 172.29.42.85: seq=42 ttl=64 time=11989.482 ms
64 bytes from 172.29.42.85: seq=43 ttl=64 time=11527.725 ms
64 bytes from 172.29.42.85: seq=44 ttl=64 time=13893.153 ms
64 bytes from 172.29.42.85: seq=45 ttl=64 time=14567.878 ms
64 bytes from 172.29.42.85: seq=51 ttl=64 time=11443.464 ms
64 bytes from 172.29.42.85: seq=52 ttl=64 time=10613.561 ms
64 bytes from 172.29.42.85: seq=53 ttl=64 time=9648.801 ms
64 bytes from 172.29.42.85: seq=54 ttl=64 time=8652.338 ms
64 bytes from 172.29.42.85: seq=55 ttl=64 time=10004.857 ms
64 bytes from 172.29.42.85: seq=56 ttl=64 time=9575.878 ms
64 bytes from 172.29.42.85: seq=57 ttl=64 time=9810.736 ms
64 bytes from 172.29.42.85: seq=58 ttl=64 time=8888.394 ms
64 bytes from 172.29.42.85: seq=59 ttl=64 time=10075.711 ms
64 bytes from 172.29.42.85: seq=60 ttl=64 time=11003.248 ms
64 bytes from 172.29.42.85: seq=61 ttl=64 time=13171.309 ms
64 bytes from 172.29.42.85: seq=62 ttl=64 time=13581.010 ms
64 bytes from 172.29.42.85: seq=63 ttl=64 time=16758.824 ms
64 bytes from 172.29.42.85: seq=71 ttl=64 time=9913.159 ms
64 bytes from 172.29.42.85: seq=72 ttl=64 time=9179.238 ms
64 bytes from 172.29.42.85: seq=73 ttl=64 time=8977.600 ms
64 bytes from 172.29.42.85: seq=74 ttl=64 time=8550.328 ms
64 bytes from 172.29.42.85: seq=75 ttl=64 time=8268.744 ms
64 bytes from 172.29.42.85: seq=76 ttl=64 time=8868.098 ms
64 bytes from 172.29.42.85: seq=77 ttl=64 time=8484.207 ms
64 bytes from 172.29.42.85: seq=78 ttl=64 time=7750.523 ms
64 bytes from 172.29.42.85: seq=79 ttl=64 time=6929.604 ms
64 bytes from 172.29.42.85: seq=80 ttl=64 time=6032.321 ms
64 bytes from 172.29.42.85: seq=81 ttl=64 time=5195.933 ms
64 bytes from 172.29.42.85: seq=82 ttl=64 time=4568.045 ms
64 bytes from 172.29.42.85: seq=83 ttl=64 time=5821.851 ms
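
The gaps in the seq numbers above are the lost probes. As a rough way to put a number on that, here is a minimal Python sketch (not part of the original report) that estimates loss from a pasted transcript. It assumes busybox-style reply lines of the form shown above; the function name `summarize_ping` is made up for illustration, and probes lost before the first or after the last reply in the excerpt are not counted.

```python
import re

# Matches reply lines such as
# "64 bytes from 172.29.42.85: seq=11 ttl=64 time=10277.950 ms"
REPLY = re.compile(r"seq=(\d+)\s+ttl=\d+\s+time=([\d.]+) ms")

def summarize_ping(text):
    """Return (sent, received, loss_fraction, mean_rtt_ms) for a ping transcript.

    'sent' is estimated from the span of sequence numbers seen, so it only
    covers the part of the run that the excerpt actually shows.
    """
    replies = [(int(m.group(1)), float(m.group(2))) for m in REPLY.finditer(text)]
    if not replies:
        return 0, 0, 0.0, 0.0
    seqs = {seq for seq, _ in replies}      # a set ignores duplicate replies
    sent = max(seqs) - min(seqs) + 1
    received = len(seqs)
    loss = 1.0 - received / sent
    mean_rtt = sum(rtt for _, rtt in replies) / len(replies)
    return sent, received, loss, mean_rtt

# Four lines copied from the transcript above.
sample = """\
64 bytes from 172.29.42.85: seq=6 ttl=64 time=12414.146 ms
64 bytes from 172.29.42.85: seq=11 ttl=64 time=10277.950 ms
64 bytes from 172.29.42.85: seq=24 ttl=64 time=17193.064 ms
64 bytes from 172.29.42.85: seq=31 ttl=64 time=10291.945 ms
"""
sent, received, loss, mean_rtt = summarize_ping(sample)
print(f"{received}/{sent} replies, {loss:.0%} loss, mean RTT {mean_rtt:.0f} ms")
```

Feeding it the whole transcript instead of the four-line sample gives the loss over the full run shown here.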

Updated by Dave Täht about 1 year ago

Similarly, I see this in the logs against this rc4-3

[35035.940000] ath: Failed to stop TX DMA, queues=0x004!
[38649.010000] ath: Failed to stop TX DMA, queues=0x004!
[39778.210000] ath: Failed to stop TX DMA, queues=0x004!
[41006.390000] ath: Failed to stop TX DMA, queues=0x004!
[41288.800000] ath: Failed to stop TX DMA, queues=0x004!
[42141.720000] ath: Failed to stop TX DMA, queues=0x004!
[45895.820000] ath: Failed to stop TX DMA, queues=0x004!
[53289.030000] ath: Failed to stop TX DMA, queues=0x004!
[53858.920000] ath: Failed to stop TX DMA, queues=0x001!
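
To compare which queues are named across these messages, the hex value can be decoded bit by bit. The Python sketch below is not from the original report. It assumes the `queues=` value is a bitmask with one bit per hardware TX queue, and it only lists the bit indices; it does not try to map them to traffic classes. The helper name `pending_queues` is hypothetical.

```python
import re

# Pulls the hex mask out of lines such as
# "[ 1295.080000] ath: Failed to stop TX DMA, queues=0x00c!"
QUEUES = re.compile(r"queues=0x([0-9a-fA-F]+)")

def pending_queues(line):
    """Return the indices of the bits set in the queues= mask of a log line.

    Returns an empty list when the line carries no queues= field.
    """
    match = QUEUES.search(line)
    if match is None:
        return []
    mask = int(match.group(1), 16)
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

# The three distinct masks that appear in this report.
for line in (
    "[35035.940000] ath: Failed to stop TX DMA, queues=0x004!",
    "[ 1295.080000] ath: Failed to stop TX DMA, queues=0x00c!",
    "[53858.920000] ath: Failed to stop TX DMA, queues=0x001!",
):
    print(line.split("queues=")[1].rstrip("!"), "->", pending_queues(line))
```

Under that assumption, nearly every message in these logs names the same single bit, with 0x00c and 0x001 as the only exceptions.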

Updated by Dave Täht about 1 year ago

Mar 1 09:58:57 quintaped daemon.info dnsmasq-dhcp3888: DHCPACK 192.168.1.102 00:27:13:64:89:67 cruithne
Mar 1 09:58:57 quintaped daemon.info hostapd: sw00: STA 00:13:e8:58:2a:c5 WPA: group key handshake completed (RSN)
Mar 1 09:58:58 quintaped daemon.info hostapd: sw00: STA 90:21:55:67:8d:cb WPA: group key handshake completed (RSN)
Mar 1 09:58:58 quintaped daemon.info hostapd: sw00: STA 90:21:55:67:8d:cb WPA: received EAPOL-Key 2/2 Group with unexpected replay counter

Updated by Dave Täht about 1 year ago

Well, the good news is that 3.3rc5-1 is not exhibiting the problem at 2.4 GHz with crypto.
Pings are respectable, and performance on the phone itself is good...

The bad news is that this is a highly simplified test setup rather than an attempted deployment,
and I'm not going to have time to attempt recreating the deployment.

64 bytes from 172.30.42.85: seq=46 ttl=64 time=137.776 ms
64 bytes from 172.30.42.85: seq=47 ttl=64 time=160.643 ms
64 bytes from 172.30.42.85: seq=48 ttl=64 time=184.652 ms
64 bytes from 172.30.42.85: seq=49 ttl=64 time=3.000 ms
64 bytes from 172.30.42.85: seq=50 ttl=64 time=64.671 ms
64 bytes from 172.30.42.85: seq=51 ttl=64 time=50.848 ms
64 bytes from 172.30.42.85: seq=52 ttl=64 time=4.502 ms
64 bytes from 172.30.42.85: seq=53 ttl=64 time=99.418 ms
64 bytes from 172.30.42.85: seq=54 ttl=64 time=122.181 ms

Updated by Dave Täht about 1 year ago

  • Status changed from New to Closed
  • Assignee set to Dave Täht

As far as I know, this was only a bug in 3.3rc4.

Updated by Dave Täht about 1 year ago

  • Target version changed from 13 to 1st Public Cerowrt release
