Bug #258
rc6-smoketest2 doesn't route through ge00 right
| Status: | Closed | Start date: | 08/31/2011 |
|---|---|---|---|
| Priority: | Immediate | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | - | Spent time: | 4.00 hours |
| Target version: | 1st Public Cerowrt release | | |
Description
My first two attempts at a rc6-smoketest2 failed, and I'm not sure why.
All that changed was an upgrade from Linux 3.0.3 to 3.0.4, plus the latest openwrt head, which contained a multitude of ath9k fixes.
The symptoms are weird: local routing works to other destinations nearby, but once traffic goes through a NATted connection, it stops.
UDP is often fine. TCP is not.
So I updated again to openwrt head, which had a multitude of additional ath9k fixes...
With babel enabled, rc6-smoketest3 does successfully get packets out se00 through multiple hops. Without babel, with only a default route, clients fail.
I have not got to checking ge00 yet. I'm pretty sure that will fail, too, right now.
On gw11 or gw01 (the ahcp derived, ad-hoc nets), packets often stop at the first hop, even after flushing the firewall rules entirely.
Perhaps this latter is a firewall rule problem. Or babeld is messing up.
I do have multiple gateways to the internet now, and perhaps I also have route flapping, but that doesn't account for the first hop behavior.
I need to stop propagating default routes everywhere throughout the lab (if I am; I thought I wasn't).
I'm observing what feels like multiple machines on a single ip address from a halting transit perspective...
I had assumed that what I was encountering was a local configuration error, but two testers reported issues on ge00 routing...
Lastly...
I had tried, with one smoketest, to increase the clock rate, which failed in the MII Ethernet detection routine on the PHY. I retain high hopes that with a higher clock, buffering throughout and the responsiveness of things like minstrel would improve, but lacking the ability to find that bug, I have gone back to what was used in rc6-smoketest 1, which was merely NO_HZ operation.
Far too much has changed in the lab to grok what is going on, and I am simplifying as fast as I can. I think the sanest thing is to step back from the kernel release I'm on.
Weirdly, I CAN reach certain rc6-smoketest boxes from the rc6-smoketest3 over the ad-hoc channels.
172.29.29.161 via 172.29.30.132 dev gw01 proto babel onlink - unreachable
172.29.30.126 via 172.29.30.126 dev gw01 proto babel onlink - unreachable
172.29.30.130 via 172.29.30.130 dev gw01 proto babel onlink - reachable
172.29.30.131 via 172.29.30.131 dev gw01 proto babel onlink - reachable
172.29.30.132 via 172.29.30.132 dev gw01 proto babel onlink - unreachable
172.29.30.133 via 172.29.30.133 dev gw01 proto babel onlink - unreachable
172.29.30.134 via 172.29.30.134 dev gw01 proto babel onlink - reachable
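For anyone wanting to tally a list like the one above, here is a minimal Python sketch. It assumes the trailing "- reachable" / "- unreachable" on each line is a hand-added annotation rather than anything a routing tool printed. The names `ROUTES`, `parse_routes` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Route lines copied from this report; the trailing reachable/unreachable
# label is assumed to be a manual annotation, not tool output.
ROUTES = """\
172.29.29.161 via 172.29.30.132 dev gw01 proto babel onlink - unreachable
172.29.30.126 via 172.29.30.126 dev gw01 proto babel onlink - unreachable
172.29.30.130 via 172.29.30.130 dev gw01 proto babel onlink - reachable
172.29.30.131 via 172.29.30.131 dev gw01 proto babel onlink - reachable
172.29.30.132 via 172.29.30.132 dev gw01 proto babel onlink - unreachable
172.29.30.133 via 172.29.30.133 dev gw01 proto babel onlink - unreachable
172.29.30.134 via 172.29.30.134 dev gw01 proto babel onlink - reachable
"""

LINE_RE = re.compile(
    r"^(?P<dst>\S+) via (?P<via>\S+) dev (?P<dev>\S+) "
    r"proto (?P<proto>\S+) onlink - (?P<state>reachable|unreachable)$"
)


def parse_routes(text):
    """Return one dict per annotated route line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append(m.groupdict())
    return rows


def summarize(rows):
    """Count states and list next hops with at least one unreachable destination."""
    counts = Counter(r["state"] for r in rows)
    bad_hops = sorted({r["via"] for r in rows if r["state"] == "unreachable"})
    return counts, bad_hops


if __name__ == "__main__":
    counts, bad_hops = summarize(parse_routes(ROUTES))
    print(dict(counts))
    print(bad_hops)
```

Grouping by next hop makes it easier to see whether the failures cluster behind one neighbour or are spread across several.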
Related issues
| related to Cerowrt - Bug #260: WAN port won't DHCP successfully (it would not receive pa... | Closed | 09/01/2011 |
History
Updated by Dave Täht over 1 year ago
To say this is really weird is an understatement. SOME packets are getting through the wireless ad-hoc interfaces, in both directions.
Off to test the ge00 interface, and then revert to 3.0.3 or earlier, keeping openwrt head. Perhaps the netfilter optimizations are causing difficulties?
21:10:17.595068 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:18.333572 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [SEW], seq 623326897, win 14600, options [mss 1460,sackOK,TS val 7010560 ecr 0,nop,wscale 7], length 0
21:10:18.335591 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:18.335639 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, length 0
21:10:20.604384 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:21.555540 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:21.555698 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, options [nop,nop,sack 1 {0:1}], length 0
21:10:26.624427 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:27.955495 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:27.955673 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, options [nop,nop,sack 1 {0:1}], length 0
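The capture above shows the same SYN-ACK and the same FIN being sent repeatedly. A rough Python sketch for picking that out of tcpdump text follows. It assumes the default one-line tcpdump format seen here and only looks at lines that carry a `seq` field. The names `TRACE`, `PKT_RE` and `find_retransmits` are made up for illustration.

```python
import re
from collections import defaultdict

# Capture lines copied from this report.
TRACE = """\
21:10:17.595068 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:18.333572 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [SEW], seq 623326897, win 14600, options [mss 1460,sackOK,TS val 7010560 ecr 0,nop,wscale 7], length 0
21:10:18.335591 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:18.335639 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, length 0
21:10:20.604384 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:21.555540 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:21.555698 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, options [nop,nop,sack 1 {0:1}], length 0
21:10:26.624427 IP cruithne.local.57824 > 172.29.5.129.ssh: Flags [F.], seq 1, ack 1, win 115, length 0
21:10:27.955495 IP 172.29.5.129.ssh > cruithne.local.34152: Flags [S.E], seq 2972513861, ack 623326898, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 1], length 0
21:10:27.955673 IP cruithne.local.34152 > 172.29.5.129.ssh: Flags [.], ack 1, win 115, options [nop,nop,sack 1 {0:1}], length 0
"""

# timestamp, source, destination, flags, sequence number
PKT_RE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+) > (\S+): Flags \[([^\]]+)\], seq (\d+)"
)


def find_retransmits(text):
    """Map (src, dst, flags, seq) to its send times, for packets seen more than once."""
    seen = defaultdict(list)
    for line in text.splitlines():
        m = PKT_RE.match(line.strip())
        if not m:
            continue  # pure ACKs without a seq field are ignored
        h, mi, s, src, dst, flags, seq = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + float(s)
        seen[(src, dst, flags, int(seq))].append(t)
    return {k: v for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    for (src, dst, flags, seq), times in find_retransmits(TRACE).items():
        gaps = [round(b - a, 3) for a, b in zip(times, times[1:])]
        print(f"{src} > {dst} [{flags}] seq {seq}: sent {len(times)}x, gaps {gaps}")
```

In this trace the gaps between repeats roughly double, which looks like ordinary retransmission backoff. That would be consistent with cruithne's ACKs never arriving at 172.29.5.129, though the capture alone does not prove where they are lost.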
Updated by Dave Täht over 1 year ago
- Status changed from New to Closed
- Target version set to 1st Public Cerowrt release
I had introduced a really bad route to the network and ALSO introduced #260 to the mix on one box, or so I think now.
Things are happier overall as I write.