Bug #266

Fwd: routing table cache bug for mtu/advmss attributes in Linux 3.0.4

Added by David Taht over 1 year ago. Updated about 1 year ago.

Status:Closed Start date:
Priority:Immediate Due date:
Assignee:- % Done:

100%

Category:Linux Kernel Spent time: 4.00 hours
Target version:1st Public Cerowrt release Estimated time:4.00 hours

Description

---------- Forwarded message ----------
From: Yan, Zheng <>
Date: Thu, Sep 8, 2011 at 7:42 AM
Subject: Re: possible routing table cache bug for mtu/advmss
attributes in Linux 3.0.4
To: David Madore <>
Cc: Linux Netdev Mailing-List <>

On Fri, Sep 2, 2011 at 5:46 AM, David Madore <> wrote:

Hi,

I believe I've found a bug in the routing cache (perhaps only for IPv6) in Linux 3.0.4.  Since I'm uncertain about the diagnosis, I'm posting here for guidance as to how to proceed.

In a nutshell, the problem is this: my (manually entered) routing tables specify an explicit MTU/MSS for certain IPv6 routes, and after upgrading from 2.6.38.7 to 3.0.4, I found that this MTU/MSS setting is sometimes ignored.  More precisely, it seems that connecting to a host through the route in question causes a routing cache entry to be created, and it is when this cache entry expires that the route forgets the MTU/MSS setting.  (It may also be relevant that in my case the route belongs to a routing table other than "main".)

In more details: my routing tables contains the following:

vega david ~ $ sudo ip -6 rule show 0:      from all lookup local 32766:  from all lookup main 40000:  from 2002::/16 lookup 2002 40100:  from all lookup 2001 vega david ~ $ sudo ip -6 route show table 2001 2002::/16 via ::192.88.99.1 dev tun6to4  metric 512  mtu 1466 advmss 1406 2000::/3 dev ppp0  metric 64  mtu 1466 advmss 1406

Initially, trying to get the route to 2001:41d0:1:a431::1 correctly returns route with the mtu and advmss values I entered:

vega david ~ $ sudo ip route get 2001:41d0:1:a431::1 2001:41d0:1:a431::1 from :: via 2001:41d0:1:a431::1 dev ppp0  src 2001:7a8:78ae::1  metric 0    cache  mtu 1466 advmss 1406

Now I start an ssh connection to 2001:41d0:1:a431::1, close it immediately, and fetch the route again:

vega david ~ $ sudo ip route get 2001:41d0:1:a431::1 2001:41d0:1:a431::1 from :: via 2001:41d0:1:a431::1 dev ppp0  src 2001:7a8:78ae::1  metric 0    cache  mtu 1466 rtt 37ms rttvar 37ms cwnd 10 advmss 1406

This is still correct.  But if I wait a few seconds (doing nothing) for these data to expire, they seems to take the mtu and advmss values away with them:

vega david ~ $ sudo ip route get 2001:41d0:1:a431::1 2001:41d0:1:a431::1 from :: via 2001:41d0:1:a431::1 dev ppp0  src 2001:7a8:78ae::1  metric 0    cache

(note the empty "cache" line and the absence of mtu/advmss settings).

This is not just wrongly reported: the route really is wrong, and opening a TCP connection at this stage will use a wrong MSS (and freeze on large packets, because my Internet provider is worthless, but that's another story).  At some point the route seems to return to normal, but I couldn't figure out what causes this exactly.

In case it's uesful, my kernel config is on <URL: http://www.madore.org/~david/.tmp/config-3.0.4-vega  >.

Would you please try patch http://marc.info/?l=linux-netdev&m=131529445904858

Thanks

Any thoughts?


Related issues

related to Cerowrt - Bug #265: Fwd: [BUG?] tcp: potential bug in tcp_is_sackblock_valid() Closed
related to Cerowrt - Bug #252: addresses distributed with AHCP appear to be firewalled o... New 08/27/2011

History

Updated by Dave Täht over 1 year ago

  • Priority changed from Normal to Urgent
  • Target version set to 1st Public Cerowrt release

I have seen a great deal of possibly routing related weirdness of late, as the networks and tests has scaled up past 10+ users on a variety of devices. Up until recently I was blaming STA association weirdnesses, but this report opened up new possibilities to test against at the routing layer...

Although this bug is specific to ipv6 I would like to check against ipv4, too.

Updated by David Taht over 1 year ago

Actually, it was this bug that was worrying me about babel
specifically, not the original forward

(the other set me off to potential trouble in the route caches)

---------- Forwarded message ----------
From: Yan, Zheng <>
Date: Tue, Sep 6, 2011 at 12:34 AM
Subject: [PATCH] ipv6: don't use inetpeer to store metrics for routes.
To: "" <>
Cc: "" <>,
"" <>

Current IPv6 implementation uses inetpeer to store metrics for
routes. The problem of inetpeer is that it doesn't take subnet
prefix length in to consideration. If two routes have the same
address but different prefix length, they share same inetpeer.
So changing metrics of one route also affects the other. The
fix is to allocate separate metrics storage for each route.

Signed-off-by: Zheng Yan <>
---
 net/ipv6/route.c |   33 ++++++++++++++++++++-----------
 1 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9e69eb0..1250f90 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@ -104,6 +104,9 @ static u32 *ipv6_cow_metrics(struct dst_entry
*dst, unsigned long old)
       struct inet_peer *peer;
       u32 *p = NULL;

+       if (!(rt->dst.flags & DST_HOST))
+               return NULL;
+
       if (!rt->rt6i_peer)
               rt6_bind_peer(rt, 1);

@ -252,6 +255,9 @ static void ip6_dst_destroy(struct dst_entry *dst)
       struct inet6_dev *idev = rt->rt6i_idev;
       struct inet_peer *peer = rt->rt6i_peer;

+       if (!(rt->dst.flags & DST_HOST))
+               dst_destroy_metrics_generic(dst);
+
       if (idev != NULL) {
               rt->rt6i_idev = NULL;
               in6_dev_put(idev);
@ -723,9 +729,7 @ static struct rt6_info *rt6_alloc_cow(const struct
rt6_info *ort,
                       ipv6_addr_copy(&rt->rt6i_gateway, daddr);
               }

-               rt->rt6i_dst.plen = 128;
               rt->rt6i_flags |= RTF_CACHE;
-               rt->dst.flags |= DST_HOST;

 #ifdef CONFIG_IPV6_SUBTREES
               if (rt->rt6i_src.plen && saddr) {
@ -775,9 +779,7 @ static struct rt6_info *rt6_alloc_clone(struct
rt6_info *ort,
       struct rt6_info *rt = ip6_rt_copy(ort, daddr);

       if (rt) {
-               rt->rt6i_dst.plen = 128;
               rt->rt6i_flags |= RTF_CACHE;
-               rt->dst.flags |= DST_HOST;
               dst_set_neighbour(&rt->dst,
neigh_clone(dst_get_neighbour_raw(&ort->dst)));
       }
       return rt;
@ -1078,12 +1080,15 @ struct dst_entry *icmp6_dst_alloc(struct
net_device *dev,
                       neigh = NULL;
       }

-       rt->rt6i_idev     = idev;
+       rt->dst.flags |= DST_HOST;
+       rt->dst.output  = ip6_output;
       dst_set_neighbour(&rt->dst, neigh);
       atomic_set(&rt->dst.__refcnt, 1);
-       ipv6_addr_copy(&rt->rt6i_dst.addr, addr);
       dst_metric_set(&rt->dst, RTAX_HOPLIMIT, 255);
-       rt->dst.output  = ip6_output;

      ipv6_addr_copy(&rt->rt6i_dst.addr, addr);
+       rt->rt6i_dst.plen = 128;
+       rt->rt6i_idev     = idev;

       spin_lock_bh(&icmp6_dst_lock);
       rt->dst.next = icmp6_dst_gc_list;
@ -1261,6 +1266,14 @ int ip6_route_add(struct fib6_config *cfg)
       if (rt->rt6i_dst.plen == 128)
              rt->dst.flags |= DST_HOST;

+       if (!(rt->dst.flags & DST_HOST) && cfg->fc_mx) {
+               u32 *metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+               if (!metrics) {
+                       err = ENOMEM;
+                       goto out;
+               }
+               dst_init_metrics(&rt
>dst, metrics, 0);
+       }
 #ifdef CONFIG_IPV6_SUBTREES
       ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
       rt->rt6i_src.plen = cfg->fc_src_len;
@ -1607,9 +1620,6 @ void rt6_redirect(const struct in6_addr *dest,
const struct in6_addr *src,
       if (on_link)
               nrt->rt6i_flags &= ~RTF_GATEWAY;

-       nrt->rt6i_dst.plen = 128;
-       nrt->dst.flags |= DST_HOST;

       ipv6_addr_copy(&nrt
>rt6i_gateway, (struct in6_addr*)neigh->primary_key);
       dst_set_neighbour(&nrt->dst, neigh_clone(neigh));

@ -1754,9 +1764,10 @ static struct rt6_info *ip6_rt_copy(const
struct rt6_info *ort,
       if (rt) {
               rt->dst.input = ort->dst.input;
               rt->dst.output = ort->dst.output;
+               rt->dst.flags |= DST_HOST;

               ipv6_addr_copy(&rt->rt6i_dst.addr, dest);
-               rt->rt6i_dst.plen = ort->rt6i_dst.plen;
+               rt->rt6i_dst.plen = 128;
               dst_copy_metrics(&rt->dst, &ort->dst);
               rt->dst.error = ort->dst.error;
               rt->rt6i_idev = ort->rt6i_idev;

Updated by Dave Täht over 1 year ago

  • Status changed from New to In Progress
  • Priority changed from Urgent to Immediate

rc6-smoketest7 will contain this patch

Updated by Dave Täht over 1 year ago

well it would contain this patch if it applied against 3.0.4... looking it over now...

Updated by Dave Täht over 1 year ago

given the chaos in the linux kernel trees due to the linux.org server problem, I felt that fixing this bug is going to have to wait until life settles down over there some.

Updated by Dave Täht over 1 year ago

  • Target version changed from 1st Public Cerowrt release to 14

Updated by David Taht over 1 year ago

This patch will need to go into rc7

---------- Forwarded message ----------
From: David Miller <>
Date: Fri, Sep 16, 2011 at 9:57 PM
Subject: Re: [PATCH] ipv6: don't use inetpeer to store metrics for routes.
To:
Cc: ,

From: "Yan, Zheng" <>
Date: Tue, 06 Sep 2011 15:34:30 +0800

Current IPv6 implementation uses inetpeer to store metrics for routes. The problem of inetpeer is that it doesn't take subnet prefix length in to consideration. If two routes have the same address but different prefix length, they share same inetpeer. So changing metrics of one route also affects the other. The fix is to allocate separate metrics storage for each route.

Signed-off-by: Zheng Yan <>

Applied, thanks for fixing this bug.

Updated by Dave Täht over 1 year ago

  • Category set to Linux Kernel
  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100
  • Estimated time set to 4.00

fixed in 3.0.something and in 3.1

Updated by Dave Täht about 1 year ago

  • Target version changed from 14 to 1st Public Cerowrt release

Also available in: Atom PDF