Hello,
We have detected a problem with the IPVS module. Here is a quick summary
of what triggers it:
- IPVS has a hardcoded TIME_WAIT timeout of 120s
- the kernel's TCP/IP stack has a hardcoded TIME_WAIT timeout of 60s
- the connection rescheduling mechanism in IPVS works by dropping the
first SYN it receives and letting the client retransmit the SYN after
the (also hardcoded) RTO timeout, which in practice seems to be 1s
Here is a scenario that triggers this problem:
- we have some backend servers balanced by IPVS
- we have an external load balancer that balances requests from real
clients to IPVS and does SNAT
Here is what happens in the previous scenario under high throughput:
- because of SNAT, the external load balancer behaves as a single
source IP for the requests it forwards to IPVS
- IPVS receives the connections and forwards them to the internal
servers, but once they have been served, the connections remain in
TIME_WAIT in the IPVS connection table for 120s
- the external load balancer has a TIME_WAIT of 60s, so after this
time (or earlier, if it reuses connections in TIME_WAIT) it recycles
the same ephemeral ports to send requests to IPVS
- in the window between those 60s (when the external LB starts reusing
ports) and those 120s (while IPVS still has the connection in
TIME_WAIT), the rescheduling mechanism in IPVS adds a 1s delay to
connection establishment (due to the SYN drop and the RTO timeout on
the LB)
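To make the timing window above concrete, here is a small hypothetical model, assuming exactly the three constants quoted in this email (IPVS TIME_WAIT 120s, LB TIME_WAIT 60s, SYN RTO 1s); the function name extra_setup_delay is made up for illustration and the exact behaviour at the 120s boundary is an approximation.

```python
# Hypothetical model of the timeline described above. The constants are the
# values quoted in this email, not values read from a running kernel.
IPVS_TIME_WAIT_S = 120  # IPVS keeps the finished connection in TIME_WAIT
LB_TIME_WAIT_S = 60     # the LB's own TCP stack TIME_WAIT
SYN_RTO_S = 1           # observed SYN retransmission timeout on the LB


def extra_setup_delay(reuse_after_s: float) -> int:
    """Extra connect delay (in seconds) for a SYN that reuses a source port
    `reuse_after_s` seconds after the previous connection on it closed."""
    # Normally the LB only reuses a port after LB_TIME_WAIT_S; reuse before
    # that (TIME_WAIT reuse on the LB) still finds the IPVS entry, too.
    if reuse_after_s < IPVS_TIME_WAIT_S:
        # IPVS still holds the old entry: the first SYN is dropped and the
        # LB retransmits it after one RTO.
        return SYN_RTO_S
    # The IPVS entry has expired, so the SYN is scheduled immediately.
    return 0


for gap in (LB_TIME_WAIT_S, 90, 119, 150):
    print(gap, extra_setup_delay(gap))
```

Under these assumptions, any port reuse in the 60s to 120s window costs one full RTO.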
This implies that when the external LB is under moderate load, approx.
250 req/s (calculated from [net.ipv4.ip_local_port_range on the LB]
divided by [TW timeout on the LB = 60s]), the rescheduling mechanism
in IPVS adds a 1s delay to the establishment of TCP connections to the
internal servers.
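A back-of-the-envelope sketch of the port-recycling arithmetic for a single SNAT source IP follows. It assumes the default ip_local_port_range of recent Linux kernels ("32768 60999"); the LB's actual range is not stated here, so the numbers are only illustrative. It also assumes the LB consumes its ephemeral ports roughly uniformly. It prints both ports/60s (the LB's own limit) and ports/120s (the rate above which a port comes back while IPVS still holds it in TIME_WAIT).

```python
# Assumption: default ip_local_port_range on recent Linux kernels; the LB's
# real range may differ.
PORT_LOW, PORT_HIGH = 32768, 60999
LB_TIME_WAIT_S = 60
IPVS_TIME_WAIT_S = 120

ports = PORT_HIGH - PORT_LOW + 1  # number of usable source ports

# Highest rate at which the LB can open connections without reusing a port
# that is still in its own TIME_WAIT.
max_rate_lb_tw = ports / LB_TIME_WAIT_S

# Rate above which, with roughly uniform port consumption, a port is reused
# before IPVS has expired its TIME_WAIT entry for it.
reuse_rate_ipvs_tw = ports / IPVS_TIME_WAIT_S

print(f"ports      = {ports}")
print(f"ports/60s  = {max_rate_lb_tw:.0f} req/s")
print(f"ports/120s = {reuse_rate_ipvs_tw:.0f} req/s")
```

With this assumed range, ports/120s comes out at about 235 req/s, in the same ballpark as the ~250 req/s quoted above. Which divisor matches the observed onset in practice depends on the LB's real port range and allocation pattern.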
This 1s delay seems to be caused by either:
- a mismatch between the hardcoded TW timeouts in IPVS (120s) and in
the standard kernel TCP stack (60s)
- the rescheduling algorithm in IPVS, which forces the client (the LB)
to wait a full RTO before retransmitting the SYN packet
I'm not saying that IPVS is badly parametrized, nor that the
rescheduling algorithm is badly designed. You guys are awesome and have
done really great work with IPVS.
The question, then, is: what can we do to avoid that 1s delay when
connections are rescheduled?
If needed, I can elaborate on any of the previous details, or even
provide a link to a GitHub issue (for the Docker project) with the
details of how we ended up sending an email to this list.
Thanks in advance,
Toni
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users