Re: [RFC net-next] ipv6: Use destination address determined by IPVS

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [RFC net-next] ipv6: Use destination address determined by IPVS
Cc: Simon Horman <horms@xxxxxxxxxxxx>, YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@xxxxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, Mark Brooks <mark@xxxxxxxxxxxxxxxx>, Phil Oester <kernel@xxxxxxxxxxxx>
From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
Date: Sun, 20 Oct 2013 09:33:08 +0200
On Sun, Oct 20, 2013 at 10:11:16AM +0300, Julian Anastasov wrote:
> 
>       Hello,
> 
> On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
> 
> > > Hm, maybe. I don't have much insight into the netfilter stack and
> > > the differences between the OUTPUT and FORWARD paths, but I plan to
> > > investigate. ;)
> > 
> > It seems tables are processed with bh disabled, so there is no preemption
> > while recursing. So I guess the use of tee_active to break the recursion
> > here is safe.
> 
>       Maybe; I'll check it again. For now I see only
> rcu_read_lock() in nf_hook_slow(), which is preemptible.
> Looking at rcu_preempt_note_context_switch(), nested RCU
> read-side sections are preemptible too.

The caller I found was ip6t_do_table(), which does disable bottom halves.
There may be others I did not see, so double-checking is better.
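For reference, the guard being discussed works roughly like this: a flag is set while the cloned packet is re-sent, so when the same rule fires again on the clone it bails out instead of recursing. Assuming tee_active is a per-CPU flag, it is only sound if nothing can preempt and migrate the task between setting and clearing it, hence the question of whether bottom halves are disabled. The userspace sketch below uses a C11 thread-local flag instead, which side-steps migration; tee_like_target(), reinject() and the counters are hypothetical names, not netfilter code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag per thread stands in for a per-CPU flag.
 * A per-CPU flag is only safe if the task cannot be preempted and
 * migrated to another CPU while the flag is set. */
static _Thread_local bool tee_active_model;

static int duplicates_sent;    /* how many clones were "transmitted" */

static void tee_like_target(int depth);

/* Re-injecting the clone runs the rule set again, which calls the
 * target recursively; only the flag stops the recursion. */
static void reinject(int depth)
{
    tee_like_target(depth + 1);
}

static void tee_like_target(int depth)
{
    assert(depth < 8);          /* the guard must stop recursion early */
    if (tee_active_model)
        return;                 /* already duplicating on this thread: skip */

    tee_active_model = true;
    duplicates_sent++;
    reinject(depth);            /* would recurse without the flag */
    tee_active_model = false;
}
```

Calling tee_like_target(0) sends exactly one duplicate and leaves the flag cleared afterwards; without the early return the call chain would keep recursing until the depth assertion fires.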

>       In my test I used a link route to the local subnet and --gateway
> pointing to an IP that is not present. I'll try other variants.

Is your kernel compiled with CONFIG_IPV6_ROUTER_PREF?

> > The more I review the patch, the more I think it is ok. But we could actually
> > try to just always return rt6i_gateway, as we should always be handed a
> > cloned rt6_info where the gateway is already filled in, no?
> 
>       Yes, this patch is ok, and after spending the whole
> Saturday I'm preparing a new patch that will convert
> rt6_nexthop() to return just rt6i_gateway, without daddr.
> This can happen once rt6i_gateway is filled in everywhere.
> 
>       As for your concern about loopback, I don't see a problem:
> local/anycast routes will have rt6i_gateway=IP, since they are
> simple DST_HOST routes. I'm preparing the patches now and
> will post them in the following hours.

Ok, that's a nice simplification. I'll have a look tomorrow.
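A toy model of why the simplification works: if every cloned route carries a filled-in gateway, with the destination itself used for on-link, local and anycast host routes, then "gateway if RTF_GATEWAY, else daddr" and "always the gateway" pick the same next hop. The types and helpers here (route_model, clone_for_dest(), nexthop_old(), nexthop_new()) are hypothetical and much simpler than rt6_info and rt6_nexthop().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical, heavily simplified stand-in for a cloned IPv6 route. */
struct route_model {
    bool has_gateway;          /* models RTF_GATEWAY */
    struct in6_addr gateway;   /* models rt6i_gateway */
};

/* Clone a route for one destination. Assumption under discussion: the
 * clone always gets a usable gateway, and for on-link/local/anycast
 * host routes that "gateway" is the destination address itself. */
static struct route_model clone_for_dest(const struct route_model *parent,
                                         const struct in6_addr *daddr)
{
    struct route_model clone;

    assert(parent != NULL && daddr != NULL);
    clone = *parent;
    if (!parent->has_gateway)
        clone.gateway = *daddr;
    return clone;
}

/* Old style: choose between the gateway and the destination. */
static const struct in6_addr *nexthop_old(const struct route_model *rt,
                                          const struct in6_addr *daddr)
{
    return rt->has_gateway ? &rt->gateway : daddr;
}

/* Proposed style: the clone already carries the next hop. */
static const struct in6_addr *nexthop_new(const struct route_model *rt)
{
    return &rt->gateway;
}

static bool same_addr(const struct in6_addr *a, const struct in6_addr *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With that invariant in place, the daddr argument becomes redundant, which is the point of the proposed conversion.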

I can't test my patch any more today, so I'll just leave it here. It is only
compile-tested. Maybe you can make use of it:

Btw: I cannot put a reference to the rt6_info into __rt6_probe_work, because we
are not supposed to use rt6_info reference counters outside of ip6_fib;
otherwise deletion from the fib would break.

Maybe we should also create a separate IPv6 workqueue. I will check later.
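To illustrate why the work item holds a copy of the gateway and a device reference rather than a pointer to the rt6_info: the item has to stay valid even if the route is removed from the fib before the work runs. The sketch below models that ownership in userspace; dev_model, probe_work_model, schedule_probe() and run_pending() are hypothetical, a linked list stands in for the workqueue, and "sending" the NS just records the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical device with a reference count (models dev_hold/dev_put). */
struct dev_model {
    int refcnt;
};

/* The work item owns its data: a copy of the target address and one
 * device reference, never a pointer back into the route. */
struct probe_work_model {
    struct in6_addr target;
    struct dev_model *dev;
    struct probe_work_model *next;
};

static struct probe_work_model *queue_head;  /* stand-in for the workqueue */
static struct in6_addr last_probed;          /* target of the last "probe" */

/* Called from the locked context: allocate, copy, take a reference.
 * Returns false if nothing was scheduled. */
static bool schedule_probe(const struct in6_addr *gateway, struct dev_model *dev)
{
    struct probe_work_model *work = malloc(sizeof(*work));

    if (!work)
        return false;          /* caller should not mark the probe as done */
    work->target = *gateway;   /* copy: the route may be gone before we run */
    dev->refcnt++;             /* models dev_hold(): device outlives the work */
    work->dev = dev;
    work->next = queue_head;
    queue_head = work;
    return true;
}

/* Runs later, outside the lock: use the copied data, then release it.
 * Returns the number of work items processed. */
static int run_pending(void)
{
    int ran = 0;

    while (queue_head) {
        struct probe_work_model *work = queue_head;

        queue_head = work->next;
        last_probed = work->target;   /* stands in for sending the NS */
        work->dev->refcnt--;          /* models dev_put() */
        assert(work->dev->refcnt >= 0);
        free(work);
        ran++;
    }
    return ran;
}
```

Overwriting the caller's address after schedule_probe() returns does not affect what gets probed, because the item carries its own copy.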

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c3130ff..6c539bc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -476,6 +476,40 @@ out:
 }
 
 #ifdef CONFIG_IPV6_ROUTER_PREF
+struct __rt6_probe_work {
+       struct work_struct work;
+       struct in6_addr target;
+       struct net_device *dev;
+};
+
+static void rt6_probe_deferred(struct work_struct *w)
+{
+       struct in6_addr mcaddr;
+       struct __rt6_probe_work *work =
+               container_of(w, struct __rt6_probe_work, work);
+
+       addrconf_addr_solict_mult(&work->target, &mcaddr);
+       ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
+       dev_put(work->dev);
+       kfree(w);
+}
+
+static bool rt6_probe_later(struct rt6_info *rt)
+{
+       struct __rt6_probe_work *work;
+
+       work = kmalloc(sizeof(*work), GFP_ATOMIC);
+       if (!work)
+               return false;
+
+       INIT_WORK(&work->work, rt6_probe_deferred);
+       work->target = rt->rt6i_gateway;
+       dev_hold(rt->dst.dev);
+       work->dev = rt->dst.dev;
+       schedule_work(&work->work);
+       return true;
+}
+
 static void rt6_probe(struct rt6_info *rt)
 {
        struct neighbour *neigh;
@@ -499,17 +533,10 @@ static void rt6_probe(struct rt6_info *rt)
 
        if (!neigh ||
            time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-               struct in6_addr mcaddr;
-               struct in6_addr *target;
-
-               if (neigh) {
-                       neigh->updated = jiffies;
+               if (neigh)
                        write_unlock(&neigh->lock);
-               }
-
-               target = (struct in6_addr *)&rt->rt6i_gateway;
-               addrconf_addr_solict_mult(target, &mcaddr);
-               ndisc_send_ns(rt->dst.dev, NULL, target, &mcaddr, NULL);
+               if (rt6_probe_later(rt) && neigh)
+                       neigh->updated = jiffies;
        } else {
 out:
                write_unlock(&neigh->lock);
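Two pieces of the probe path can be checked in isolation. The deferred function sends the NS to the target's solicited-node multicast address, which RFC 4291 defines as ff02::1:ff00:0/104 followed by the low 24 bits of the address (assumed here to be what addrconf_addr_solict_mult() produces). The probe is also rate-limited with time_after(), and with the patch neigh->updated only moves forward when a probe was actually scheduled, so a failed GFP_ATOMIC allocation does not suppress the next attempt. This is a userspace sketch with hypothetical names (solicited_node_mcast(), time_after_model(), neigh_model, maybe_probe()), not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Solicited-node multicast address (RFC 4291, section 2.7.1):
 * ff02:0:0:0:0:1:ffXX:XXXX, where XX:XXXX are the low 24 bits of target. */
static void solicited_node_mcast(const struct in6_addr *target,
                                 struct in6_addr *mcaddr)
{
    memset(mcaddr, 0, sizeof(*mcaddr));
    mcaddr->s6_addr[0] = 0xff;
    mcaddr->s6_addr[1] = 0x02;
    mcaddr->s6_addr[11] = 0x01;
    mcaddr->s6_addr[12] = 0xff;
    memcpy(&mcaddr->s6_addr[13], &target->s6_addr[13], 3);
}

/* Same idea as the kernel's time_after(a, b): true if a is later than b,
 * even after the unsigned counter wrapped (relies on two's complement). */
static bool time_after_model(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical neighbour state. */
struct neigh_model {
    bool present;            /* models neigh != NULL */
    unsigned long updated;   /* models neigh->updated in jiffies */
};

/* Probe if there is no neighbour entry or the interval has elapsed;
 * only advance 'updated' if the probe was actually scheduled. */
static bool maybe_probe(struct neigh_model *n, unsigned long now,
                        unsigned long interval, bool schedule_ok)
{
    assert(n != NULL);
    if (n->present && !time_after_model(now, n->updated + interval))
        return false;        /* probed recently enough */
    if (!schedule_ok)
        return false;        /* nothing queued: leave 'updated' alone */
    if (n->present)
        n->updated = now;
    return true;
}
```

For a target ending in ab:cdef the sketch yields ff02::1:ffab:cdef, and a failed schedule leaves the timestamp untouched so the next call probes again.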

Greetings,

  Hannes
