[PATCH nf 1/3] netfilter: ipvs: drop first packet to redirect conntrack

To: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
Subject: [PATCH nf 1/3] netfilter: ipvs: drop first packet to redirect conntrack
Cc: lvs-devel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx, Wensong Zhang <wensong@xxxxxxxxxxxx>, Julian Anastasov <ja@xxxxxx>, Jiri Bohac <jbohac@xxxxxxx>, Simon Horman <horms@xxxxxxxxxxxx>
From: Simon Horman <horms@xxxxxxxxxxxx>
Date: Thu, 18 Feb 2016 09:40:59 +0900
From: Julian Anastasov <ja@xxxxxx>

Jiri Bohac is reporting for a problem where the attempt
to reschedule existing connection to another real server
needs proper redirect for the conntrack used by the IPVS
connection. For example, when IPVS connection is created
to NAT-ed real server we alter the reply direction of
conntrack. If we later decide to select different real
server we can not alter again the conntrack. And if we
expire the old connection, the new connection is left
without conntrack.

So, the only way to redirect both the IPVS connection and
the Netfilter's conntrack is to drop the SYN packet that
hits existing connection, to wait for the next jiffie
to expire the old connection and its conntrack and to rely
on client's retransmission to create new connection as

Jiri Bohac provided a fix that drops all SYNs on rescheduling,
I extended his patch to do such drops only for connections
that use conntrack. Here is the original report from Jiri Bohac:

Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server
is dead"), new connections to dead servers are redistributed
immediately to new servers.  The old connection is expired using
ip_vs_conn_expire_now() which sets the connection timer to expire

However, before the timer callback, ip_vs_conn_expire(), is run
to clean the connection's conntrack entry, the new redistributed
connection may already be established and its conntrack removed

Fix this by dropping the first packet of the new connection
instead, like we do when the destination server is not available.
The timer will have deleted the old conntrack entry long before
the first packet of the new connection is retransmitted.

Fixes: dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead")
Signed-off-by: Jiri Bohac <jbohac@xxxxxxx>
Signed-off-by: Julian Anastasov <ja@xxxxxx>
Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx>
 include/net/ip_vs.h             | 17 +++++++++++++++++
 net/netfilter/ipvs/ip_vs_core.c | 37 ++++++++++++++++++++++++++++---------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 0816c872b689..a6cc576fd467 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1588,6 +1588,23 @@ static inline void ip_vs_conn_drop_conntrack(struct 
ip_vs_conn *cp)
 #endif /* CONFIG_IP_VS_NFCT */
+/* Really using conntrack? */
+static inline bool ip_vs_conn_uses_conntrack(struct ip_vs_conn *cp,
+                                            struct sk_buff *skb)
+       enum ip_conntrack_info ctinfo;
+       struct nf_conn *ct;
+       if (!(cp->flags & IP_VS_CONN_F_NFCT))
+               return false;
+       ct = nf_ct_get(skb, &ctinfo);
+       if (ct && !nf_ct_is_untracked(ct))
+               return true;
+       return false;
 static inline int
 ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index f57b4dcdb233..4da560005b0e 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1757,15 +1757,34 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, int
        cp = pp->conn_in_get(ipvs, af, skb, &iph);
        conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
-       if (conn_reuse_mode && !iph.fragoffs &&
-           is_new_conn(skb, &iph) && cp &&
-           ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
-             unlikely(!atomic_read(&cp->dest->weight))) ||
-            unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
-               if (!atomic_read(&cp->n_control))
-                       ip_vs_conn_expire_now(cp);
-               __ip_vs_conn_put(cp);
-               cp = NULL;
+       if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) {
+               bool uses_ct = false, resched = false;
+               if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
+                   unlikely(!atomic_read(&cp->dest->weight))) {
+                       resched = true;
+                       uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+               } else if (is_new_conn_expected(cp, conn_reuse_mode)) {
+                       uses_ct = ip_vs_conn_uses_conntrack(cp, skb);
+                       if (!atomic_read(&cp->n_control)) {
+                               resched = true;
+                       } else {
+                               /* Do not reschedule controlling connection
+                                * that uses conntrack while it is still
+                                * referenced by controlled connection(s).
+                                */
+                               resched = !uses_ct;
+                       }
+               }
+               if (resched) {
+                       if (!atomic_read(&cp->n_control))
+                               ip_vs_conn_expire_now(cp);
+                       __ip_vs_conn_put(cp);
+                       if (uses_ct)
+                               return NF_DROP;
+                       cp = NULL;
+               }
        if (unlikely(!cp)) {

To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

<Prev in Thread] Current Thread [Next in Thread>