Bug #402
Tail drop behavior on fq_codel on ack-mostly streams
| Status: | New | Start date: | 07/12/2012 | |
|---|---|---|---|---|
| Priority: | Urgent | Due date: | ||
| Assignee: | % Done: | 0% |
||
| Category: | Networking | Spent time: | 6.00 hours | |
| Target version: | 1st Public Cerowrt release |
Description
In order to reduce memory requirements I reduced the packet limits for the 24 fq_codel queues to anywhere between 100 and 600 packets depending on type. This, I reasoned, would be enough for most traffic loads.
I missed the case of ack mostly traffic, which as it has a 20x1 ratio vs big packets, requires that the queues be
much larger. On an ACK-heavy load, at high bandwidths, we end up with fq_codel in an "interesting" drop tail state, where it still admits packets but ends up optimizing for "new" flows that aren't actually "new".
I do not know the right values. The fq_codel default of 10k packets is sufficient to run the router out of memory in a variety of scenarios, resulting in a crash.
History
Updated by Dave Täht 10 months ago
So to add clarity here (see also #401) when I do non-ack mostly streams I run the box out of memory (can't observe this, not necessarily a valid conclusion) and queue depth, after I went back to pure codel I can still crash...
netperf -Y CS5,CS5 -l 60 -H 172.20.1.1 -t TCP_MAERTS & netperf -Y CS1,CS1 -l 60 -H 172.20.1.1 -t TCP_MAERTS & #netperf -Y EF,EF -l 60 -H 172.20.1.1 -t TCP_MAERTS & netperf -Y EF,EF -l 60 -H 172.20.1.1 -t TCP_MAERTS & #netperf -Y EF,EF -l 60 -H 172.20.1.1 -t TCP_MAERTS & #netperf -Y EF,EF -l 60 -H 172.20.1.1 -t TCP_MAERTS & netperf -Y CS0,CS0 -l 60 -H 172.20.1.1 -t TCP_MAERTS &
I may well have an interaction with netperf running on the box too.
root@testbox:~# tc -s qdisc show dev gw11 qdisc mq 1: root Sent 349384011 bytes 231149 pkt (dropped 491, overlimits 0 requeues 26191) backlog 56210b 40p requeues 26191 qdisc codel 10: parent 1:1 limit 1000p target 5.0ms interval 100.0ms Sent 344597768 bytes 227688 pkt (dropped 457, overlimits 0 requeues 25833) backlog 7570b 5p requeues 25833 count 1 lastcount 1 ldelay 354us drop_next 0us maxpacket 1514 ecn_mark 0 drop_overlimit 0 qdisc codel 20: parent 1:2 limit 1000p target 10.0ms interval 100.0ms Sent 4260478 bytes 2827 pkt (dropped 0, overlimits 0 requeues 271) backlog 45420b 30p requeues 271 count 0 lastcount 0 ldelay 19.6ms drop_next 0us maxpacket 1514 ecn_mark 0 drop_overlimit 0 qdisc codel 30: parent 1:3 limit 1000p target 20.0ms interval 100.0ms Sent 525617 bytes 632 pkt (dropped 34, overlimits 0 requeues 87) backlog 3220b 5p requeues 87 count 32 lastcount 23 ldelay 16.7ms drop_next 0us maxpacket 1514 ecn_mark 0 drop_overlimit 0 qdisc codel 40: parent 1:4 limit 1000p target 5.0ms interval 100.0ms Sent 148 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 count 0 lastcount 0 ldelay 0us drop_next 0us maxpacket 256 ecn_mark 0 drop_overlimit 0 root@testbox:~# Write failed: Broken pipe d@ida:~$ ~~~
Updated by Dave Täht 10 months ago
partially a heisenbug, when I drive this this hard with netserver, that consumes a ton more ram than usual.
Usually I have a test box and not netperf running on the other side of the router.
Updated by Dave Täht 10 months ago
At the observed wireless bandwidths with packet limit 1200 I see fq_codel peak at around 700 packets and not enter drop tail.
Figuring out what those bandwidths actually are and the interrelationship between the device queuing, EDCA, and fq_codel is going to be the subject of a long month or three. Or longer.