Hi all,
I encountered an issue with IPVS load balancing in case of short-lived
connections. I've seen it in masqurading mode on CentOS 7 (kernel
3.10.0-693.11.6) and CoreOS 1235.12.0 (4.7.3-coreos-r3). After opening and
closing of thousands of TCP connctions, new connections are being delayed for 1
second.
Please see a short example [4], there are steps to reproduce the issue. It
starts nginx on port 8080 and creates a virtual service 127.0.0.1:80 ->
127.0.0.1:8080. After that an HTTP load generator (rakyll/hey) sends 30k queries
with disabled keep-alive option. All records in ip_vs_conn table are in
TIME_WAIT state. Then ipvs debug level is enabled and strace starts curl(1) with
the same virtual service. Curl encounters the 1 second delay as shown. Attached
you can find the full versions of strace.log and dmesg.log as well as their
short
versions [1] [2].
Setting conn_reuse_mode to 0 resolves the issue, but doesn't fit our needs and
doesn't work well in case of changing the list of real servers.
What could be causing the delay? How can we get rid of it?
Thank you.
[1] strace.log
1516823661.570756 connect(3, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in
progress)
1516823661.571578 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
1516823661.572153 poll([{fd=3, events=POLLOUT}], 1, 1000) = 1 ([{fd=3,
revents=POLLOUT}])
1516823662.573304 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3,
revents=POLLOUT|POLLWRNORM}])
1516823662.573556 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
1516823662.573947 getpeername(3, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
1516823662.574139 getsockname(3, {sa_family=AF_INET, sin_port=htons(34258),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
1516823662.574641 sendto(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 73,
MSG_NOSIGNAL, NULL, 0) = 73
[2] dmesg.log
706.699647 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
706.699656 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit
706.699662 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 hit
706.700092 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with
tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn
127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5
706.700097 IPVS: ip_vs_conn_drop_conntrack: no conntrack for
tuple=127.0.0.1:34258->127.0.0.1:80/6
706.700102 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with
tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn
127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5
706.700105 IPVS: ip_vs_conn_drop_conntrack: no conntrack for
tuple=127.0.0.1:34258->127.0.0.1:80/6
706.700110 IPVS: Unbind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80
d:127.0.0.1:8080 fwd:M s:5 conn->flags:10100 conn->refcnt:0 dest->refcnt:15505
707.701103 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
707.701128 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit
707.701143 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 not hit
707.701156 IPVS: lookup service: fwm 0 TCP 127.0.0.1:80 hit
707.701169 IPVS: lookup/in TCP 127.0.0.1:80->127.0.0.1:34258 not hit
707.701180 IPVS: ip_vs_wlc_schedule(): Scheduling...
707.701193 IPVS: WLC: server 127.0.0.1:8080 activeconns 0 refcnt 15504 weight 1
overhead 15503
707.701210 IPVS: Bind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80
d:127.0.0.1:8080 fwd:M s:5 conn->flags:100 conn->refcnt:1 dest->refcnt:15505
707.701225 IPVS: Schedule fwd:M c:127.0.0.1:34258 v:127.0.0.1:80
d:127.0.0.1:8080 conn->flags:10140 conn->refcnt:2
707.701239 IPVS: TCP input [S...] 127.0.0.1:8080->127.0.0.1:34258 state:
NONE->SYN_RECV conn->refcnt:2
707.701252 IPVS: Enter: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 625
707.701269 IPVS: ip_vs_update_conntrack: Updating conntrack
ct=ffff8800ce14e5d0, status=0x100, ctinfo=2, old
reply=127.0.0.1:80->127.0.0.1:34258/6, new
reply=127.0.0.1:8080->127.0.0.1:34258/6,
cp=127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:3
707.701282 IPVS: Leave: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 698
707.701330 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
707.701374 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
[3] sysctl
# uname -r
3.10.0-693.11.6.el7.x86_64.debug
# sysctl net.ipv4.vs
net.ipv4.vs.am_droprate = 10
net.ipv4.vs.amemthresh = 1024
net.ipv4.vs.backup_only = 0
net.ipv4.vs.cache_bypass = 0
net.ipv4.vs.conn_reuse_mode = 1
net.ipv4.vs.conntrack = 1
net.ipv4.vs.debug_level = 0
net.ipv4.vs.drop_entry = 0
net.ipv4.vs.drop_packet = 0
net.ipv4.vs.expire_nodest_conn = 0
net.ipv4.vs.expire_quiescent_template = 0
net.ipv4.vs.nat_icmp_send = 0
net.ipv4.vs.pmtu_disc = 1
net.ipv4.vs.secure_tcp = 0
net.ipv4.vs.snat_reroute = 1
net.ipv4.vs.sync_ports = 1
net.ipv4.vs.sync_qlen_max = 61432
net.ipv4.vs.sync_refresh_period = 0
net.ipv4.vs.sync_retries = 0
net.ipv4.vs.sync_sock_size = 0
net.ipv4.vs.sync_threshold = 3 50
net.ipv4.vs.sync_version = 1
# cat /sys/module/ip_vs/parameters/conn_tab_bits
18
[4] Steps to reproduce
# modprobe ip_vs_wlc
# sysctl -w net.ipv4.vs.conntrack=1
# sysctl -w net.ipv4.ip_local_port_range='32768 60999'
# docker run --rm --net=host --detach --name=nginx nginx sh -c 'echo "daemon
off; worker_processes 2; events { worker_connections 16384; } http { server {
listen 8080 backlog=16384; access_log off; server_name localhost; location / {
return 200; } } }" > /etc/nginx/ping.conf && nginx -c /etc/nginx/ping.conf'
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -A -t
127.0.0.1:80
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -a -t
127.0.0.1:80 -r 127.0.0.1:8080 -m
# docker run --rm --net=host mesosphere/hey -n 30000 -c 1000 -disable-keepalive
-t 3 http://127.0.0.1/
# awk '{print $8}' /proc/net/ip_vs_conn | sort | uniq -c
# sysctl -w net.ipv4.vs.debug_level=15; strace -ttt curl --silent
http://127.0.0.1/ 2>&1 | tee strace.log; sysctl -w net.ipv4.vs.debug_level=0
# dmesg | sed -re 's!^\[ *([0-9.]+)\]!\1!' | tee dmesg.log
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C
# docker kill nginx
dmesg.log
Description: Binary data
strace.log
Description: Binary data
--
Yours,
Sergey Urbanovich
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
|