Re: [RFC net-next] ipv6: Use destination address determined by IPVS

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [RFC net-next] ipv6: Use destination address determined by IPVS
Cc: Simon Horman <horms@xxxxxxxxxxxx>, YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@xxxxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, Mark Brooks <mark@xxxxxxxxxxxxxxxx>, Phil Oester <kernel@xxxxxxxxxxxx>
From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
Date: Sun, 20 Oct 2013 09:33:08 +0200
On Sun, Oct 20, 2013 at 10:11:16AM +0300, Julian Anastasov wrote:
>       Hello,
> On Sun, 20 Oct 2013, Hannes Frederic Sowa wrote:
> > > Hm, maybe. I don't have too much insight into netfilter stack and
> > > what are the differences between OUTPUT and FORWARD path but plan to
> > > investigate. ;)
> > 
> > It seems tables are processed with bh disabled, so no preemption while
> > recursing. So I guess the use of tee_active is safe for breaking the
> > tie here.
>       May be, I'll check it again, for now I see only
> rcu_read_lock() in nf_hook_slow() which is preemptable.
> Looking at rcu_preempt_note_context_switch, many levels of
> RCU locks are preemptable too.

The caller I found was ip6t_do_table, which disables bottom halves.
Maybe there are other callers I did not see, so double-checking is better.
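
To make the tie-breaking idea concrete, here is a rough userspace sketch, not kernel code. The names hook_active, run_hook and count_duplicates are made up for illustration, and a thread-local flag only stands in for a per-CPU flag like tee_active. The flag is trustworthy only if the context cannot be preempted or migrated between setting and clearing it, which is exactly the property being double-checked above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU "tee_active"-style flag. */
static _Thread_local bool hook_active;
static int duplicates_sent;

/* Hypothetical hook: it duplicates a packet, and the duplicate
 * traverses the same hook again (modelled here as direct recursion). */
static void run_hook(void)
{
	if (hook_active)
		return;		/* already inside the hook: break the recursion */

	hook_active = true;
	duplicates_sent++;
	run_hook();		/* the duplicate re-enters the hook */
	hook_active = false;
}

/* Returns how many duplicates a single top-level traversal produced. */
static int count_duplicates(void)
{
	duplicates_sent = 0;
	run_hook();
	assert(!hook_active);	/* the guard must be released on exit */
	return duplicates_sent;
}
```

With the guard in place one traversal yields exactly one duplicate; without it the recursion would not terminate.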

>       In my test I used link route to local subnet, --gateway to IP
> that is not present. I'll try other variants.

Is your kernel compiled with CONFIG_IPV6_ROUTER_PREF?

> > The more I review the patch the more I think it is ok. But we could actually
> > try to just always return rt6i_gateway, as we should always be handed a cloned
> > rt6_info where the gateway is already filled in, no?
>       Yes, this patch is ok and after spending the whole
> saturday I'm preparing a new patch that will convert
> rt6_nexthop() to return just rt6i_gateway, without daddr.
> This can happen after filling rt6i_gateway in all places.
>       For your concern for loopback, I don't see problem,
> local/anycast route will have rt6i_gateway=IP, they are
> simple DST_HOST routes. I'm preparing now the patches and
> will post them in following hours.

Ok, that's a nice simplification. I'll have a look tomorrow.

I cannot test my patch any more today, so I'll just leave it here. It is only
compile-tested. Maybe you can make use of it:

Btw: I cannot put a reference to the rt6_info into __rt6_probe_work. We are
not supposed to use rt6_info reference counters outside of ip6_fib, because
deletion from the fib would break otherwise.

Maybe we should also create a separate ipv6 workqueue. Will check later.
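
The ownership rule behind the patch can be sketched in plain userspace C. This is only an illustration under assumptions: work_item, device, probe_work, probe_later and run_queue are hypothetical stand-ins, not kernel APIs. The deferred item copies the target address and pins the device with its own reference, so it never needs a reference on the route itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for work_struct and net_device. */
struct work_item {
	void (*fn)(struct work_item *w);
	struct work_item *next;
};

struct device {
	int refcnt;
	int probes_sent;
};

struct probe_work {
	struct work_item work;
	unsigned char target[16];	/* copied, not a pointer into the route */
	struct device *dev;		/* pinned by its own reference */
};

static struct work_item *queue_head;

/* Runs later: recover the outer struct, "send" the probe, release. */
static void probe_deferred(struct work_item *w)
{
	struct probe_work *pw = container_of(w, struct probe_work, work);

	pw->dev->probes_sent++;	/* stand-in for sending the solicitation */
	pw->dev->refcnt--;	/* drop the reference taken at queue time */
	free(pw);
}

/* Queue a probe; returns 1 on success, 0 if allocation failed. */
static int probe_later(const unsigned char target[16], struct device *dev)
{
	struct probe_work *pw = malloc(sizeof(*pw));

	if (!pw)
		return 0;

	assert(dev->refcnt > 0);	/* caller must already hold the device */
	pw->work.fn = probe_deferred;
	memcpy(pw->target, target, sizeof(pw->target));
	dev->refcnt++;			/* keep the device alive until the work runs */
	pw->dev = dev;
	pw->work.next = queue_head;
	queue_head = &pw->work;
	return 1;
}

/* Stand-in for the worker: run and drain everything queued so far. */
static void run_queue(void)
{
	while (queue_head) {
		struct work_item *w = queue_head;

		queue_head = w->next;
		w->fn(w);
	}
}
```

Because the caller learns whether queueing succeeded, it can update its "last probed" timestamp only when a probe was really scheduled, which is what the rt6_probe_later() return value is used for.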

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c3130ff..6c539bc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -476,6 +476,40 @@ out:
+struct __rt6_probe_work {
+       struct work_struct work;
+       struct in6_addr target;
+       struct net_device *dev;
+};
+
+static void rt6_probe_deferred(struct work_struct *w)
+{
+       struct in6_addr mcaddr;
+       struct __rt6_probe_work *work =
+               container_of(w, struct __rt6_probe_work, work);
+
+       addrconf_addr_solict_mult(&work->target, &mcaddr);
+       ndisc_send_ns(work->dev, NULL, &work->target, &mcaddr, NULL);
+       dev_put(work->dev);
+       kfree(w);
+}
+
+static bool rt6_probe_later(struct rt6_info *rt)
+{
+       struct __rt6_probe_work *work;
+
+       work = kmalloc(sizeof(*work), GFP_ATOMIC);
+       if (!work)
+               return false;
+
+       INIT_WORK(&work->work, rt6_probe_deferred);
+       work->target = rt->rt6i_gateway;
+       dev_hold(rt->;
+       work->dev = rt->;
+       schedule_work(&work->work);
+
+       return true;
+}
+
 static void rt6_probe(struct rt6_info *rt)
 {
        struct neighbour *neigh;
@@ -499,17 +533,10 @@ static void rt6_probe(struct rt6_info *rt)
        if (!neigh ||
            time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-               struct in6_addr mcaddr;
-               struct in6_addr *target;
-               if (neigh) {
-                       neigh->updated = jiffies;
+               if (neigh)
-               }
-               target = (struct in6_addr *)&rt->rt6i_gateway;
-               addrconf_addr_solict_mult(target, &mcaddr);
-               ndisc_send_ns(rt->, NULL, target, &mcaddr, NULL);
+               if (rt6_probe_later(rt) && neigh)
+                       neigh->updated = jiffies;
        } else {



