Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
Cc: lvs-devel@xxxxxxxxxxxxxxx, hannes@xxxxxxxxxx, jbrouer@xxxxxxxxxx
From: Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx>
Date: Thu, 11 Dec 2014 10:44:31 -0200
On 10-12-2014 19:27, Julian Anastasov wrote:


On Wed, 10 Dec 2014, Marcelo Ricardo Leitner wrote:

     return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
            (cp->state == IP_VS_TCP_S_FIN_WAIT &&
         (cp->flags & IP_VS_CONN_F_NOOUTPUT) &&
         time_after_eq(jiffies + cp->timeout,
                   cp->timer.expires + 1 * HZ));

not expired because only one side sent FIN. For

And when the other side send it too, it will be in TIME_WAIT then. Ok.

        Yes, for IPVS-NAT TIME_WAIT means both sides closed.

     If such checks look dangerous we can try to expire

Or only enable it within some sysctl, like we have expire_nodest_conn? We
could have a "(tcp_)rebalance_on_port_reuse" or something like that.

Actually, I'm more for this tunable than a timeout, as it would allow one to
easily rebalance the cluster when one add new nodes. More like:

      return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
            (sysctl_tcp_reschedule_on_syn &&
             cp->state == IP_VS_TCP_S_FIN_WAIT &&
             (cp->flags & IP_VS_CONN_F_NOOUTPUT));

Then we play safe, leave that sysctl_tcp_reschedule_on_syn off by default, but
one may use it when applicable. WDYT?

        Agreed, a sysctl var is ok. Only that my preference
is to have sysctl var that is checked even before
is_new_conn() call to save some CPU cycles for a rare packet
such as SYN :)

        In theory, skb_header_pointer() is fast and we will
get pointer to header in the common case (without copy) but
checks here and there cost some cycles...

        One option would be to implement "conn_reuse_mode"
as a bitmask (for now 2 bits), values 0, 1 or 3:

0: do not call is_new_conn(), even for sysctl_expire_nodest_conn()
purposes. Fastest but without any expiation (reuse always).

1: default: TIME_WAIT-only check for TCP,
allow the sysctl_expire_nodest_conn check for TCP/SCTP.
In fact, check for this state is the global conn_reuse_mode != 0.

3 (&2 if treated as bitmask): like 1 but also
allows DR/TUN in FIN_WAIT to expire (int conn_reuse_mode can
come as argument):

        return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
               ((conn_reuse_mode & 2) &&
                cp->state == IP_VS_TCP_S_FIN_WAIT &&
                (cp->flags & IP_VS_CONN_F_NOOUTPUT));

        In fact, 2 and 3 will do the same.

        Then we will have:

        int conn_reuse_mode;

        conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
        if (conn_reuse_mode && !iph.fragoffs &&
            is_new_conn(skb, &iph) && cp &&
            unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {

Looks great! Thanks for all the directions.

One question: at this if(), shouldn't we also check for &cp->n_control or you considered it in the '...'? Because as you said, if this connection controls another one we won't expire it and AFAICT we may end up with two duplicate entries for the given src host/port/service.

Currently I'm with:
        if (conn_reuse_mode && !iph.fragoffs &&
            is_new_conn(skb, &iph) && cp &&
            !atomic_read(&cp->n_control) &&
            ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
              unlikely(!atomic_read(&cp->dest->weight))) ||
             unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {


        conn_reuse_mode=0 will work as global disable for reuse.
But 1 will be the default value.

        You can grep in following way to see the places that
define sysctl vars, "backup_only" is a good example to use
as reference:
# grep -r backup_only include/net/ip_vs.h net/netfilter/ipvs/

        Order of vars in struct, vs_vars and
ip_vs_control_net_init_sysctl() should be same, we
have per-netns vars. You can add the new var after backup_only.


Julian Anastasov <ja@xxxxxx>

To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

<Prev in Thread] Current Thread [Next in Thread>