LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH] netfilter/ipvs: immediately expire UDP connections matching unavailable destination if expire_nodest_conn=1
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>, Alexey Kuznetsov <kuznet@xxxxxxxxxxxxx>, Hideaki YOSHIFUJI <yoshfuji@xxxxxxxxxxxxxx>, Wensong Zhang <wensong@xxxxxxxxxxxx>, Simon Horman <horms@xxxxxxxxxxxx>, Jakub Kicinski <kuba@xxxxxxxxxx>, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>, Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx>, Florian Westphal <fw@xxxxxxxxx>, "open list:IPVS" <netdev@xxxxxxxxxxxxxxx>, "open list:IPVS" <lvs-devel@xxxxxxxxxxxxxxx>, open list <linux-kernel@xxxxxxxxxxxxxxx>, "open list:NETFILTER" <netfilter-devel@xxxxxxxxxxxxxxx>, "open list:NETFILTER" <coreteam@xxxxxxxxxxxxx>
From: Andrew Kim <kim.andrewsy@xxxxxxxxx>
Date: Mon, 18 May 2020 15:54:30 -0400
Hi Julian,

Thank you for getting back to me. I will update the patch based on
your feedback shortly.

Regards,

Andrew

On Mon, May 18, 2020 at 3:10 PM Julian Anastasov <ja@xxxxxx> wrote:
>
>
>         Hello,
>
> On Sun, 17 May 2020, Andrew Sy Kim wrote:
>
> > If expire_nodest_conn=1 and a UDP destination is deleted, IPVS should
> > also expire all matching connections immiediately instead of waiting for
> > the next matching packet. This is particulary useful when there are a
> > lot of packets coming from a few number of clients. Those clients are
> > likely to match against existing entries if a source port in the
> > connection hash is reused. When the number of entries in the connection
> > tracker is large, we can significantly reduce the number of dropped
> > packets by expiring all connections upon deletion.
> >
> > Signed-off-by: Andrew Sy Kim <kim.andrewsy@xxxxxxxxx>
> > ---
> >  include/net/ip_vs.h             |  7 ++++++
> >  net/netfilter/ipvs/ip_vs_conn.c | 38 +++++++++++++++++++++++++++++++++
> >  net/netfilter/ipvs/ip_vs_core.c |  5 -----
> >  net/netfilter/ipvs/ip_vs_ctl.c  |  9 ++++++++
> >  4 files changed, 54 insertions(+), 5 deletions(-)
> >
>
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c 
> > b/net/netfilter/ipvs/ip_vs_conn.c
> > index 02f2f636798d..c69dfbbc3416 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -1366,6 +1366,44 @@ static void ip_vs_conn_flush(struct netns_ipvs *ipvs)
> >               goto flush_again;
> >       }
> >  }
> > +
> > +/*   Flush all the connection entries in the ip_vs_conn_tab with a
> > + *   matching destination.
> > + */
> > +void ip_vs_conn_flush_dest(struct netns_ipvs *ipvs, struct ip_vs_dest 
> > *dest)
> > +{
> > +     int idx;
> > +     struct ip_vs_conn *cp, *cp_c;
> > +
> > +     rcu_read_lock();
> > +     for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
> > +             hlist_for_each_entry_rcu(cp, &ip_vs_conn_tab[idx], c_list) {
> > +                     if (cp->ipvs != ipvs)
> > +                             continue;
> > +
> > +                     if (cp->dest != dest)
> > +                             continue;
> > +
> > +                     /* As timers are expired in LIFO order, restart
> > +                      * the timer of controlling connection first, so
> > +                      * that it is expired after us.
> > +                      */
> > +                     cp_c = cp->control;
> > +                     /* cp->control is valid only with reference to cp */
> > +                     if (cp_c && __ip_vs_conn_get(cp)) {
> > +                             IP_VS_DBG(4, "del controlling connection\n");
> > +                             ip_vs_conn_expire_now(cp_c);
> > +                             __ip_vs_conn_put(cp);
> > +                     }
> > +                     IP_VS_DBG(4, "del connection\n");
> > +                     ip_vs_conn_expire_now(cp);
> > +             }
> > +             cond_resched_rcu();
>
>         Such kind of loop is correct if done in another context:
>
> 1. kthread
> or
> 2. delayed work: mod_delayed_work(system_long_wq, ...)
>
>         Otherwise cond_resched_rcu() can schedule() while holding
> __ip_vs_mutex. Also, it will add long delay if many dests are
> removed.
>
>         If such loop analyzes instead all cp->dest for
> IP_VS_DEST_F_AVAILABLE, it should be done after calling
> __ip_vs_conn_get().
>
> >  static int sysctl_snat_reroute(struct netns_ipvs *ipvs) { return 0; }
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index 8d14a1acbc37..f87c03622874 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > @@ -1225,6 +1225,15 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct 
> > ip_vs_dest_user_kern *udest)
> >        */
> >       __ip_vs_del_dest(svc->ipvs, dest, false);
> >
> > +     /*      If expire_nodest_conn is enabled and protocol is UDP,
> > +      *      attempt best effort flush of all connections with this
> > +      *      destination.
> > +      */
> > +     if (sysctl_expire_nodest_conn(svc->ipvs) &&
> > +         dest->protocol == IPPROTO_UDP) {
> > +             ip_vs_conn_flush_dest(svc->ipvs, dest);
>
>         Above work should be scheduled from __ip_vs_del_dest().
> Check for UDP is not needed, sysctl_expire_nodest_conn() is for
> all protocols.
>
>         If the flushing is complex to implement, we can still allow
> rescheduling for unavailable dests:
>
> - first we should move this block above the ip_vs_try_to_schedule()
> block because:
>
>         1. the scheduling does not return unavailabel dests, even
>         for persistence, so no need to check new connections for
>         the flag
>
>         2. it will allow to create new connection if dest for
>         existing connection is unavailable
>
>         if (cp && cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
>                 /* the destination server is not available */
>
>                 if (sysctl_expire_nodest_conn(ipvs)) {
>                         bool uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
>
>                         ip_vs_conn_expire_now(cp);
>                         __ip_vs_conn_put(cp);
>                         if (uses_ct)
>                                 return NF_DROP;
>                         cp = NULL;
>                 } else {
>                         __ip_vs_conn_put(cp);
>                         return NF_DROP;
>                 }
>         }
>
>         if (unlikely(!cp)) {
>                 int v;
>
>                 if (!ip_vs_try_to_schedule(ipvs, af, skb, pd, &v, &cp, &iph))
>                         return v;
>         }
>
>         Before now, we always waited one jiffie connection to expire,
> now one packet will:
>
> - schedule expiration for existing connection with unavailable dest,
> as before
>
> - create new connection to available destination that will be found
> first in lists. But it can work only when sysctl var "conntrack" is 0,
> we do not want to create two netfilter conntracks to different
> real servers.
>
>         Note that we intentionally removed the timer_pending() check
> because we can not see existing ONE_PACKET connections in table.
>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>

<Prev in Thread] Current Thread [Next in Thread>