Bug #255

cerowrt-1.0-rc5 rebooting passing ~80Mbps traffic when QoS is enabled

Added by Aidan Williams almost 2 years ago. Updated 8 months ago.

Status: Closed
Start date: 08/28/2011
Priority: Urgent
Due date:
Assignee: -
% Done: 0%
Category: -
Spent time: 0.50 hour
Target version: 1st Public Cerowrt release

Description

On the weekend I finally got around to loading Cerowrt onto a Wndr3700v2 (which was bought for the purpose of messing around with de-bloating my home network!)

Playing around with the QoS settings trying various numbers to see the effect was a bit confusing because sometimes the router rebooted and sometimes it didn't. Each time I would change a parameter and then fire up a speed test to see what happened. After changing the parameter, it would take some time to be applied and confusingly, sometimes the router would subsequently reboot.

The router reliably crashes if QoS is enabled and then a high rate speedtest is run (~80Mbps). The router appears to hang within seconds of the speedtest beginning and then reboots. If QoS is disabled, the router does not reboot and I can reliably run the speedtest many times. Running "ordinary" web browsing through the router is fine.

Other info:
  • Internet connection is 100Mbps DOCSIS download, ~1Mbps upload.
  • The speedtester was speedtest.syd.optusnet.com.au
  • Speedtests were done with gigabit Ethernet connected to a MacOSX laptop.
  • Screenshot of the QoS configuration is attached

Happy to follow instructions and provide more debug info as required....

certowrt-qos-config.png - QoS config (176.8 kB) Aidan Williams, 08/28/2011 07:35 pm

History

Updated by Dave Täht almost 2 years ago

Thx for taking a look at this! I am aware of several bugs in rc5 that may induce a crash like this, notably the txqueuelen is too low, and there was a bug in the netfilter code.

If you are feeling ambitious, the current development branch of cerowrt rc6 is up at:

http://huchra.bufferbloat.net/~cero1/rc6-smoketest/

So far not a lot of smoke has been emitted. It's easy to upgrade (although you will need to tell it NOT to keep your existing config, as multiple important files in /etc have changed).

(I will get some time personally to look into this tomorrow, but as you are in a different time zone...)

Updated by Dave Täht almost 2 years ago

Also, looking over your screenshot, 90Mbit down 900KB up can't possibly ever work, as TCP data packets are clocked by ack packets, which have a maximum ratio of about 21 to 1 and a sane one of not more than about 11 to 1. Is that what your provider is actually selling?

so 90Mbit vs 9Mbit would be reasonable.

But do give smoketest rc6 a shot. I'll try duplicating your results here.
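
The ACK-clocking concern above can be put into rough numbers. The following is only a back-of-envelope sketch: the 1500-byte data packet, the 66-byte bare ACK, and the number of data packets acknowledged per ACK are assumed figures, not measurements from this report, and DOCSIS framing overhead is ignored.

```python
def ack_upstream_bps(down_bps, data_bytes=1500, ack_bytes=66, data_per_ack=2):
    """Estimate the upstream bits/s consumed by bare ACKs.

    All packet sizes and the data-per-ACK value are assumptions
    chosen for illustration, not values taken from the bug report.
    """
    data_pps = down_bps / (data_bytes * 8)   # downstream data packets per second
    ack_pps = data_pps / data_per_ack        # upstream ACKs per second
    return ack_pps * ack_bytes * 8


if __name__ == "__main__":
    for n in (1, 2, 4):
        bps = ack_upstream_bps(90e6, data_per_ack=n)
        print(f"{n} data packet(s) per ACK: {bps / 1e6:.2f} Mbit/s of ACKs")
```

Under these assumptions the answer depends heavily on how many data packets each ACK covers, which is why a packet trace is the only way to settle what a given link really needs.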

Updated by Dave Täht almost 2 years ago

Passing 16Mbit of traffic (via the wireless) worked. I will have to get more rc6 boxes up to try and pass more... in the morning.

Updated by Dave Täht almost 2 years ago

  • Priority changed from High to Normal

I just tried 9/90Mbit and got 76Mbit out of it. No logged errors, no problems, no crash.

This was with netperf. It's a little harder at present to try another speedtest.

Updated by Aidan Williams almost 2 years ago

Hi Dave,

Thanks for looking at this. The rc6-smoketest image does not crash for me either.

I tried the same speedtest as before, with 100Mbit / 1Mbit and got this result:
http://speedtest.ookla.com/result/1345874780.png
93261kbps / 951kbps
Reliable, no crashes.

Netalyzr reports my upstream storage as dropping from 1400ms to about 160ms - which is nice!

Wireless was a lot slower (circa 10Mbps) on the 5GHz band, although I didn't try wireless with the rc5 software.

- aidan

Updated by Dave Täht almost 2 years ago

Aidan Williams wrote:

Hi Dave,

Thanks for looking at this. The rc6-smoketest image does not crash for me either.

I tried the same speedtest as before, with 100Mbit / 1Mbit and got this result: http://speedtest.ookla.com/result/1345874780.png 93261kbps / 951kbps Reliable, no crashes.

Awesome.

Netalyzr reports my upstream storage as dropping from 1400ms to about 160ms - which is nice!

This is over wired? with qos on? To a site, how many ms away? This is still kind of bad.

Wireless was a lot slower (circa 10Mbps) on the 5GHz band, although I didn't try wireless with the rc5 software.

mmm.... in my case I'm getting 45Mbps on 5ghz, so I worry. Can you open a new bug with more details on your scenario? Are you n or g?

Also, try running netperf (version 2.5.0) to the router from your hardware, if that's not what you did..

- aidan

Updated by Dave Täht almost 2 years ago

and again 100Mbit/1Mbit is just plain not doable. 10Mbit? Your numbers are odd looking....

Updated by Aidan Williams almost 2 years ago

Dave Täht wrote:

Netalyzr reports my upstream storage as dropping from 1400ms to about 160ms - which is nice!

This is over wired? with qos on? To a site, how many ms away? This is still kind of bad.

Gigabit Ethernet, DOCSIS3, to a local Ookla speed tester (speedtest.syd.optusnet.com.au). Optus run a local speedtester box in Sydney, Melbourne, Brisbane.

QoS on, with the config I mentioned. QoS was working - I dialled the numbers up and down and got different measurements with the speedtester - it was very satisfying to see tweaks in the config directly reflected in the measurements..

The PNG link shows 15ms ping latency.

Further reducing the uplink bandwidth limit didn't seem to have a strong effect on Netalyzr's view of the storage in the uplink. I reduced it by 100kbps a couple of times and it stuck at 160-170ms. I'm wondering if this has more to do with how the DOCSIS MAC parameters are configured.

Wireless was a lot slower (circa 10Mbps) on the 5GHz band, although I didn't try wireless with the rc5 software.

mmm.... in my case I'm getting 45Mbps on 5ghz, so I worry. Can you open a new bug with more details on your scenario? Are you n or g?

Also, try running netperf (version 2.5.0) to the router from your hardware, if that's not what you did..

Pretty sure it was n. I'll be a bit more systematic and open another bug if necessary.

- aidan

Updated by Aidan Williams almost 2 years ago

Dave Täht wrote:

and again 100Mbit/1Mbit is just plain not doable. 10Mbit? Your numbers are odd looking....

Yeah, I get what you're saying. OTOH, I didn't photoshop that PNG from the speedtester... ;-)

It may be that there is rate limiting only on TCP payloads with data in them and bare ACKs are not limited the same way. I have not yet done a tcpdump to get to the bottom of it.

For completeness.. I was using MacOSX 10.6.8, which supports window scaling:
bash-3.2# sysctl -a | egrep 1323\|win_scale
net.inet.tcp.rfc1323: 1
net.inet.tcp.win_scale_factor: 3
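
Whether a scale factor of 3 is enough can be checked against the bandwidth-delay product. This is a minimal sketch, assuming that win_scale_factor is the RFC 1323 shift count and that the 15ms ping from the speedtest result approximates the round-trip time. It treats the whole 93261kbps as a single flow, although the speedtester was later seen to open 6 connections in parallel.

```python
def max_window_bytes(scale_factor, base_window=65535):
    """Largest advertisable receive window for a given window-scale shift."""
    return base_window << scale_factor


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    window = max_window_bytes(3)             # assumed meaning of win_scale_factor
    bdp = bdp_bytes(93_261_000, 0.015)       # rate and ping taken from the thread
    print(f"max window: {window} bytes, BDP: {bdp:.0f} bytes")
    print("window covers BDP" if window >= bdp else "window is the bottleneck")
```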

Updated by Dave Täht almost 2 years ago

jim has some info regarding docsis 3 that I don't....

but it looks like maybe a little different classification might help. Try tossing ping into the highest bucket on the qos system.

If that doesn't work, then we can point a few fingers at the modem.

I am very strongly encouraged, that we are finally WINNING!!

Updated by Dave Täht almost 2 years ago

In /etc/config/qos, try changing target to 'Express' and/or 'Priority' in this rule:

config 'reclassify'
    option 'target' 'Bulk'
    option 'proto' 'icmp'

Updated by Aidan Williams almost 2 years ago

Speed tester observations:
  • Upon closer inspection, there is no ICMP - the speedtester thingy is dissembling.. ;-)
  • The flash application running in a browser opens 6 TCP connections in parallel.
  • Quite a bit of laptop CPU gets consumed running the flash application.
A long ICMP echo trace shows quite a bit of variability in ping time. I have put some traces, scripts and results summaries into:

Wireless files have 24g or 5g in the filename. Using 802.11n capable laptop.
Everything else is gig Ethernet.

A 100:1 ratio of data to ACK bandwidth seems quite feasible according to the traces/results in the link above. If I have made some dumb mistake, maybe someone else can spot it.

Presumably 100:1 works because window scaling is working between the sender and receiver and the advertised receiver window is large..
Counting packets, my traces show about 4.5 data packets for every ACK using Ethernet. I guess that is testable by messing around with sysctl.

result-speedtest-2.txt seems pretty typical:

--------------------------------------------
    Statistics
--------------------------------------------
Downstream TCP data packets:  33815
  Upstream TCP ACK packets:   7341
  Upstream TCP SACKs:         370

  Upstream TCP data packets:  1222
Downstream TCP ACK packets:   945
Downstream TCP SACKs:         128

Downstream Data/ACK ratio:    4.606
  Upstream Data/ACK ratio:    1.293
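
The two ratios can be reproduced from the packet counts, and the packet ratio can be turned into an approximate bandwidth ratio. This is a sketch only: the 1500-byte data packet and 66-byte ACK sizes are assumptions for illustration, not values read from the traces.

```python
def data_ack_ratio(data_packets, ack_packets):
    """Data packets sent per ACK packet seen in the reverse direction."""
    return data_packets / ack_packets


def bandwidth_ratio(packet_ratio, data_bytes=1500, ack_bytes=66):
    """Approximate data:ACK bandwidth ratio; packet sizes are assumed."""
    return packet_ratio * data_bytes / ack_bytes


if __name__ == "__main__":
    down = data_ack_ratio(33815, 7341)   # counts from result-speedtest-2.txt
    up = data_ack_ratio(1222, 945)
    print(f"Downstream Data/ACK ratio: {down:.3f}")
    print(f"Upstream Data/ACK ratio:   {up:.3f}")
    print(f"Approx. downstream data:ACK bandwidth ratio: {bandwidth_ratio(down):.0f}:1")
```

With these assumed packet sizes, roughly 4.6 data packets per ACK works out to a bandwidth ratio on the order of 100:1.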

Googling around, I also came across this:

It describes Data/ACK bandwidth ratios of 50:1 or so for DOCSIS 3 cable networks.

Updated by Dave Täht almost 2 years ago

Thx for updating me on the state of the art in ack optimization and qos!

Updated by Dave Täht almost 2 years ago

wow... looked over your data...

thought: Try your levels of qos with sack and dsack disabled (see /etc/sysctl.conf)

Updated by Dave Täht almost 2 years ago

did you fiddle with what I note in comment 11 above for optimizing ping (as a test)?

Updated by Dave Täht almost 2 years ago

And I'd love a wireshark capture from your laptop, to look at, using the

tcptrace -G
xplot.org

tools. Let me know if you need a big place to upload a .cap file to, please don't upload something that large to the bug tracker. :)

Lastly, I'm very interested in you doing a

netperf -l 60 europa.lab.bufferbloat.net

from the router.

- and capturing that one via wireshark, too? We may have inadequate buffering for a connection to Australia... which will show in a trace.

Updated by Aidan Williams almost 2 years ago

You want fries with that?? ;-)

Updated by Dave Täht almost 2 years ago

I am simply delighted to have a tester producing documented results on the other side of the world. Only having one on a moonbase would make me happier! With your help rc6 will be the best cerowrt yet!

Updated by Dave Täht almost 2 years ago

  • Status changed from New to Closed
  • Priority changed from Normal to Urgent
  • Target version set to 1st Public Cerowrt release

so no mo crashes, we can close this one
