LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

Re: [PATCH] ipvs: drop templates for never established TCP connections

To: Michal Koutný <mkoutny@xxxxxxxx>
Subject: Re: [PATCH] ipvs: drop templates for never established TCP connections
Cc: wensong@xxxxxxxxxxxx, horms@xxxxxxxxxxxx, mkubecek@xxxxxxxx, netdev@xxxxxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Tue, 22 May 2018 10:51:47 +0300 (EEST)
        Hello,

On Mon, 21 May 2018, Michal Koutný wrote:

> IPVS includes protection against filling the ip_vs_conn_tab by dropping 1/32 
> of
> feasible entries every second. The template entries (for persistent services)
> are never directly deleted by this mechanism but when a picked TCP connection
> entry is being dropped (1), the respective template entry is dropped too
> (realized by expiring 60 seconds after the connection entry being dropped).

        We try to drop the template in ip_vs_random_dropentry()
but I guess kernel/time/timer.c:enqueue_timer() puts both timers
in reverse order for expiration by using hlist_add_head().

> There is another mechanism that removes connection entries when they
> time out (2), in this case the associated template entry is not deleted.
> Under SYN flood template entries would accumulate (due to their entry
> longer timeout).

        There is also ip_vs_todrop() called in tcp_conn_schedule().
It just drops specific part from the SYNs on memory pressure.

> The accumulation takes place also with drop_entry being enabled. Roughly
> 15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
> (1) and are removed by the timeout mechanism (2)(defaults to 60 seconds
> for SYN_RECV), thus template entries would still accumulate.
> 
> The patch ensures that when a connection entry times out, we also remove the
> template entry from the table. To prevent breaking persistent services (since
> the connection may time out in already established state) we add a new entry
> flag to protect templates what spawned at least one established TCP 
> connection.
> 
> Cc: Michal Kubeček <mkubecek@xxxxxxxx>
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
>  include/uapi/linux/ip_vs.h           | 33 +++++++++++++++++----------------
>  net/netfilter/ipvs/ip_vs_conn.c      | 10 +++++++++-
>  net/netfilter/ipvs/ip_vs_core.c      | 15 ++++++++++++++-
>  net/netfilter/ipvs/ip_vs_proto_tcp.c |  6 ++++++
>  4 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> index 1c916b2f89dc..ef3bbc001fcd 100644
> --- a/include/uapi/linux/ip_vs.h
> +++ b/include/uapi/linux/ip_vs.h
> @@ -79,22 +79,23 @@
>   *      IPVS Connection Flags
>   *      Only flags 0..15 are sent to backup server
>   */
> -#define IP_VS_CONN_F_FWD_MASK        0x0007          /* mask for the fwd 
> methods */
> -#define IP_VS_CONN_F_MASQ    0x0000          /* masquerading/NAT */
> -#define IP_VS_CONN_F_LOCALNODE       0x0001          /* local node */
> -#define IP_VS_CONN_F_TUNNEL  0x0002          /* tunneling */
> -#define IP_VS_CONN_F_DROUTE  0x0003          /* direct routing */
> -#define IP_VS_CONN_F_BYPASS  0x0004          /* cache bypass */
> -#define IP_VS_CONN_F_SYNC    0x0020          /* entry created by sync */
> -#define IP_VS_CONN_F_HASHED  0x0040          /* hashed entry */
> -#define IP_VS_CONN_F_NOOUTPUT        0x0080          /* no output packets */
> -#define IP_VS_CONN_F_INACTIVE        0x0100          /* not established */
> -#define IP_VS_CONN_F_OUT_SEQ 0x0200          /* must do output seq adjust */
> -#define IP_VS_CONN_F_IN_SEQ  0x0400          /* must do input seq adjust */
> -#define IP_VS_CONN_F_SEQ_MASK        0x0600          /* in/out sequence mask 
> */
> -#define IP_VS_CONN_F_NO_CPORT        0x0800          /* no client port set 
> yet */
> -#define IP_VS_CONN_F_TEMPLATE        0x1000          /* template, not 
> connection */
> -#define IP_VS_CONN_F_ONE_PACKET      0x2000          /* forward only one 
> packet */
> +#define IP_VS_CONN_F_FWD_MASK                0x0007          /* mask for the 
> fwd methods */
> +#define IP_VS_CONN_F_MASQ            0x0000          /* masquerading/NAT */
> +#define IP_VS_CONN_F_LOCALNODE               0x0001          /* local node */
> +#define IP_VS_CONN_F_TUNNEL          0x0002          /* tunneling */
> +#define IP_VS_CONN_F_DROUTE          0x0003          /* direct routing */
> +#define IP_VS_CONN_F_BYPASS          0x0004          /* cache bypass */
> +#define IP_VS_CONN_F_SYNC            0x0020          /* entry created by 
> sync */
> +#define IP_VS_CONN_F_HASHED          0x0040          /* hashed entry */
> +#define IP_VS_CONN_F_NOOUTPUT                0x0080          /* no output 
> packets */
> +#define IP_VS_CONN_F_INACTIVE                0x0100          /* not 
> established */
> +#define IP_VS_CONN_F_OUT_SEQ         0x0200          /* must do output seq 
> adjust */
> +#define IP_VS_CONN_F_IN_SEQ          0x0400          /* must do input seq 
> adjust */
> +#define IP_VS_CONN_F_SEQ_MASK                0x0600          /* in/out 
> sequence mask */
> +#define IP_VS_CONN_F_NO_CPORT                0x0800          /* no client 
> port set yet */
> +#define IP_VS_CONN_F_TEMPLATE                0x1000          /* template, 
> not connection */
> +#define IP_VS_CONN_F_ONE_PACKET              0x2000          /* forward only 
> one packet */
> +#define IP_VS_CONN_F_TMPL_PERSISTED  0x4000          /* template, confirmed 
> persistent */
>  
>  /* Initial bits allowed in backup server */
>  #define IP_VS_CONN_F_BACKUP_MASK (IP_VS_CONN_F_FWD_MASK | \
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index 370abbf6f421..6afc606a388c 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -820,6 +820,7 @@ static void ip_vs_conn_rcu_free(struct rcu_head *head)
>  static void ip_vs_conn_expire(struct timer_list *t)
>  {
>       struct ip_vs_conn *cp = from_timer(cp, t, timer);
> +     struct ip_vs_conn *cp_c;
>       struct netns_ipvs *ipvs = cp->ipvs;
>  
>       /*
> @@ -834,8 +835,15 @@ static void ip_vs_conn_expire(struct timer_list *t)
>               del_timer(&cp->timer);
>  
>               /* does anybody control me? */
> -             if (cp->control)
> +             cp_c = cp->control;
> +             if (cp_c) {
>                       ip_vs_control_del(cp);
> +                     if (cp_c->flags & IP_VS_CONN_F_TEMPLATE &&
> +                         !(cp_c->flags & IP_VS_CONN_F_TMPL_PERSISTED)) {
> +                             IP_VS_DBG(4, "del conn template\n");
> +                             ip_vs_conn_expire_now(cp_c);

        So, we have current conn expired after 60 seconds
in IP_VS_TCP_S_SYN_RECV state and possibly other conns
in same state that are not expired yet.

        Another option is just to use something like:

        if (cp_c) {
                ip_vs_control_del(cp);
                /* Restart cp_c timer only for last conn */
                if (!atomic_read(&cp_c->n_control) &&
                    (cp_c->flags & IP_VS_CONN_F_TEMPLATE) &&
                    /* Some func to decide when to drop cp_c:
                     * - it can be for SYN state
                     * - it can be when cp was dropped on load
                     */
                    cp->state == IP_VS_TCP_S_SYN_RECV) {
                        IP_VS_DBG(4, "del conn template\n");
                        ip_vs_conn_expire_now(cp_c);
                }
        }

        It is not perfect, i.e. it does not know if there was
some conn that was established in the past:

- CONN1: SYN, SYN+ACK, ESTABLISH, FIN, FIN+ACK, expire
- CONN2: expire in SYN state, drop tpl before persistent timeout

        But it should work in the general case.
Anyways, give me some days to think more on this issue.

Regards

--
Julian Anastasov <ja@xxxxxx>
<Prev in Thread] Current Thread [Next in Thread>