Re: [lvs-users] One second connection delay in masquerading mode

To: Sergey Urbanovich <surbanovich@xxxxxxxxxxxxx>
Subject: Re: [lvs-users] One second connection delay in masquerading mode
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Sat, 27 Jan 2018 17:16:33 +0200 (EET)
        Hello,

On Wed, 24 Jan 2018, Sergey Urbanovich wrote:

> Hi all,
> 
> I encountered an issue with IPVS load balancing in case of short-lived
> connections. I've seen it in masquerading mode on CentOS 7 (kernel
> 3.10.0-693.11.6) and CoreOS 1235.12.0 (4.7.3-coreos-r3). After opening and
> closing thousands of TCP connections, new connections are delayed for
> 1 second.
> 
> Please see a short example [4], which has steps to reproduce the issue. It
> starts nginx on port 8080 and creates a virtual service 127.0.0.1:80 ->
> 127.0.0.1:8080. After that, an HTTP load generator (rakyll/hey) sends 30k
> queries with keep-alive disabled. All records in the ip_vs_conn table are
> in TIME_WAIT state. Then the IPVS debug level is enabled and strace starts
> curl(1) against the same virtual service. Curl encounters the 1 second
> delay as shown. Attached you can find the full versions of strace.log and
> dmesg.log as well as their short versions [1] [2].
> 
> Setting conn_reuse_mode to 0 resolves the issue, but doesn't fit our needs and
> doesn't work well when the list of real servers changes.
> 
> What could be causing the delay? How can we get rid of it?

        The delay most likely comes from this code:

        if (uses_ct)
                return NF_DROP;

        What happens is that we drop a SYN packet that hits an IPVS
connection in TIME_WAIT state if that connection uses
Netfilter connection tracking (conntrack=1).

        conn_reuse_mode=1 relies on selecting a different
real server, but since we cannot alter the Netfilter conntrack
tuple after it is confirmed, we drop the conntrack, the IPVS
connection and the current packet. We then expect the next SYN
(retransmitted after 1 second, as you observe) to create a new
IPVS connection and a corresponding conntrack to some available
real server.
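
        One rough way to confirm that a dropped SYN is behind the
delay is to time the TCP connect and check whether it takes about
1 second or more. The sketch below assumes curl is installed and
uses the 127.0.0.1:80 virtual service from the reproduction steps.
The helper names measure_connect and classify_connect_time are
made up for illustration, and the 1.0 second threshold is simply
the SYN retransmit interval mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of IPVS or curl.

# Print the TCP connect time (seconds) for a URL, using curl's
# --write-out variable time_connect.
measure_connect() {
    curl -s -o /dev/null -w '%{time_connect}' "$1"
}

# Classify a connect time: "delayed" if it is at least 1 second
# (the first SYN was probably dropped and retransmitted), "ok"
# otherwise, "unknown" if no value was given.
classify_connect_time() {
    if [ -z "$1" ]; then
        echo "unknown"
        return
    fi
    awk -v t="$1" 'BEGIN { if (t + 0 >= 1.0) print "delayed"; else print "ok" }'
}

# Example (not run automatically):
#   t=$(measure_connect http://127.0.0.1:80/)
#   echo "$t $(classify_connect_time "$t")"
```

Running "ipvsadm -Lnc" next to it lists the IPVS connection
entries with their state, so you can see whether the client
address and port still have an entry in TIME_WAIT.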

        To get rid of this delay you have the following options:

1. Do not enable IPVS conntrack mode: use conntrack=0 for this
(it can be slower, because a conntrack is created and dropped
on every packet). This allows IPVS to ignore the TIME_WAIT
connection and to create a new one.

2. Use NOTRACK for IPVS connections. It should be faster
because conntracks are not created/removed:

iptables -t raw -A PREROUTING -p tcp -d VIP --dport VPORT -j CT --notrack

For local clients use -A OUTPUT -o lo instead of -A PREROUTING.

If needed, such traffic can be matched with -m state --state UNTRACKED

3. Reduce the TIME_WAIT timeout in the IPVS source, table
tcp_timeouts[]. This does not solve the problem but makes it
less frequent.
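
        As a rough sketch of options 1 and 2, the script below
shows the current IPVS sysctl values and prints the NOTRACK rules
for a virtual service. It is a dry run: nothing is changed. The
helper names show_ipvs_sysctl and notrack_rules are made up for
illustration, and 127.0.0.1:80 is the virtual service from the
reproduction steps. It assumes the IPVS sysctls are exposed under
/proc/sys/net/ipv4/vs/ (they are only present when ip_vs is
loaded, and conntrack only when IPVS is built with conntrack
support).

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they only print, nothing is modified.

# Option 1: show the current value of an IPVS sysctl, if available.
show_ipvs_sysctl() {
    f="/proc/sys/net/ipv4/vs/$1"
    if [ -r "$f" ]; then
        echo "$1 = $(cat "$f")"
    else
        echo "$1 = unavailable"
    fi
}

# Option 2: print the NOTRACK rules for a virtual service (VIP, VPORT).
notrack_rules() {
    vip=$1
    vport=$2
    # Remote clients reach the VIP through PREROUTING.
    echo "iptables -t raw -A PREROUTING -p tcp -d $vip --dport $vport -j CT --notrack"
    # Clients on the same host go through OUTPUT on the loopback device.
    echo "iptables -t raw -A OUTPUT -o lo -p tcp -d $vip --dport $vport -j CT --notrack"
}

show_ipvs_sysctl conntrack
show_ipvs_sysctl conn_reuse_mode
notrack_rules 127.0.0.1 80
```

To apply option 1, run "sysctl -w net.ipv4.vs.conntrack=0" as
root. To apply option 2, review the printed rules and then run
them as root.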

Regards

--
Julian Anastasov <ja@xxxxxx>
