On 23/08/2017 16:32, Julian Anastasov wrote:
> When you add more real servers you will need to use -p to enable
> persistence, ipvsadm man page explains this for FTP.
Yeah, I have persistence enabled on the HTTP / HTTPS and other services,
but I didn't bother here simply because there's only a single FTP server
at the moment.
> What does 'ipvsadm -Lnc' show when the connection gets stuck? What is shown for
> the command (:21) and data (20xxx data port) connection? IPVS can see
> only the packets from client, so any FIN bit we see is the client's
> wish to half-close the TCP connection. While there is existing
> connection entry, IPVS should not stop the traffic. But if the
> transfer is very long and short TCP EST timeout is used (ipvsadm --set
> TCP ...) the command connection can expire. If persistence is used
> this can not happen because the data connections bump a reference
> count that keeps the command connection. In any case, tcpdump -lnnnv
> 'host CLIENT_IP' output on director would be useful, even if only for
> the last packets before the connection gets stuck. You can run it on both
> the incoming (from client) and outgoing (to real server) interfaces, so
> that we can see if some packets are not forwarded. Also, what is the
> kernel version?: uname -a
It's kernel version 4.4.0-92-generic on Ubuntu Server 16.04.
(And in "/etc/modules", I've got "bonding" - as all the inter-server
links are bonded for HA purposes - and the whole list of ip_vs modules,
which "lsmod" confirms are loaded.)
I've tried "watch ipvsadm -Lnc" while it was stuck and everything looks
normal: an "ESTABLISHED" TCP connection from the router to port 21 of
the virtual service, forwarded to port 21 on the FTP server, and a
second "ESTABLISHED" TCP connection to a port in my passive port range.
That's what I'd expect to see.
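For anyone wanting to check this without eyeballing "watch", here is a
rough Python sketch that splits saved "ipvsadm -Lnc" output into the
command (:21) entry and the passive data entries. It assumes the usual
six-column layout (pro, expire, state, source, virtual, destination);
the sample text, the realserver address 192.168.0.10 and the client port
51300 are made up for illustration, not taken from my director.

```python
# Hypothetical sample in the usual `ipvsadm -Lnc` column layout.
SAMPLE = """\
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:52  ESTABLISHED 192.168.0.1:51300  192.168.0.99:21    192.168.0.10:21
TCP 14:58  ESTABLISHED 192.168.0.1:51311  192.168.0.99:20103 192.168.0.10:20103
"""


def parse_ipvs_connections(text):
    """Turn connection-table text into a list of dicts, skipping headers."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly six columns and start with the protocol.
        if len(fields) != 6 or fields[0] not in ("TCP", "UDP"):
            continue
        keys = ("pro", "expire", "state", "source", "virtual", "destination")
        entries.append(dict(zip(keys, fields)))
    return entries


def split_ftp_entries(entries, vip, passive_range=(20000, 21000)):
    """Separate entries for the VIP into command (:21) and passive data."""
    low, high = passive_range
    command, data = [], []
    for entry in entries:
        host, _, port = entry["virtual"].rpartition(":")
        if host != vip:
            continue
        port = int(port)
        if port == 21:
            command.append(entry)
        elif low <= port <= high:
            data.append(entry)
    return command, data


entries = parse_ipvs_connections(SAMPLE)
command, data = split_ftp_entries(entries, "192.168.0.99")
for entry in command + data:
    print(entry["state"], entry["expire"], entry["source"], "->", entry["virtual"])
```

If either list comes back empty while the transfer is hung, that would
point at an expired entry rather than a forwarding problem.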
I did try using tshark on the director, filtering for the VIP and ports
20000 to 21000 (my passive port range). When it gets stuck, what I'm
seeing is:
192.168.0.1 [router IP] -> 192.168.0.99 [VIP] TCP 85 [TCP
Retransmission] -> 20103 [ FIN, PSH, ACK ]
And this is repeating over and over.
I'm interpreting this as the client signalling to the VIP that it wants
to close the connection (FIN). Being TCP, the server should ACKnowledge
that it received this, but as that ACK isn't arriving, the client keeps
retransmitting its "close connection" on the passive port to a server
that doesn't appear to be responding.
And, looking on the FTP server, when it's stuck, I'm seeing this
repeating over and over:
192.168.0.99 (VIP) -> 192.168.0.1 (router) TCP 85 [TCP Retransmission]
20197 -> 51311 [ FIN, PSH, ACK ]
So the server is getting it and trying to respond, but it's not getting
back to the client.
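To make the "repeating over and over" a bit more measurable, here is a
small Python sketch that counts retransmitted FIN segments per flow in
saved tshark summary text. The line layout it expects (addresses, then
"[TCP Retransmission]", then ports and flags) is an assumption modelled
on the lines quoted above, and the sample text is invented; real tshark
output has extra leading columns and may use a different arrow.

```python
import re
from collections import Counter

# Hypothetical capture text, modelled on the lines seen on the FTP server.
SAMPLE = """\
192.168.0.99 -> 192.168.0.1 TCP 85 [TCP Retransmission] 20197 -> 51311 [FIN, PSH, ACK]
192.168.0.1 -> 192.168.0.99 TCP 66 51311 -> 20197 [ACK]
192.168.0.99 -> 192.168.0.1 TCP 85 [TCP Retransmission] 20197 -> 51311 [FIN, PSH, ACK]
192.168.0.99 -> 192.168.0.1 TCP 85 [TCP Retransmission] 20197 -> 51311 [FIN, PSH, ACK]
"""

ARROW = r"\s*(?:->|\u2192)\s*"  # accept either ASCII or Unicode arrow
LINE_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)" + ARROW + r"(\d+\.\d+\.\d+\.\d+)"
    r".*?\[TCP Retransmission\]\s*"
    r"(\d+)" + ARROW + r"(\d+)\s*\[([^\]]*)\]"
)


def count_fin_retransmissions(text):
    """Count retransmitted segments carrying FIN, keyed by flow 4-tuple."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a retransmission line in the assumed layout
        src, dst, sport, dport, flags = match.groups()
        flag_set = {flag.strip() for flag in flags.split(",")}
        if "FIN" in flag_set:
            counts[(src, int(sport), dst, int(dport))] += 1
    return counts


counts = count_fin_retransmissions(SAMPLE)
for flow, n in counts.items():
    print(flow, n)
```

Running it over captures from both the director and the realserver
should show which direction's FIN is never being acknowledged.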
Hmm, on the realserver, I've got a dummy interface with the VIP and, to
combat the ARP problem, have used the following rules in my
/etc/network/interfaces:
---- 8< ----
auto dummy0
iface dummy0 inet static
address 192.168.0.99
netmask 255.255.255.0
pre-up arptables -F
pre-up arptables -A INPUT -d 192.168.0.99 -j DROP
pre-up arptables -A OUTPUT -s 192.168.0.99 -j mangle --mangle-ip-s [external IP address]
---- >8 ----
This way the realserver ignores any ARP requests for the VIP, and any
ARP packet it sends with the VIP as its source address has that source
rewritten to the external IP address, to redirect it to the right entry
point into the cluster.
I preferred to use "arptables" to explicitly control the ARP problem. It
just felt less messy to do it that way to me.
Looking at these results, I think LVS is actually delivering the close
connection to the FTP server but the response is not getting back to the
client. So my configuration of "dummy0" that carries the VIP is possibly
wrong somehow?
Regards,
Owain