
Re: [lvs-users] FTP data port connection not closing?

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [lvs-users] FTP data port connection not closing?
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Owain Jones <Owain@xxxxxxxxxxxxxxxxxx>
Date: Thu, 24 Aug 2017 10:04:49 +0100
On 23/08/2017 16:32, Julian Anastasov wrote:
> When you add more real servers you will need to use -p to enable 
> persistence, ipvsadm man page explains this for FTP.
Yeah, I have persistence enabled on the HTTP / HTTPS and other services, 
but I didn't bother here simply because there's only a single FTP server 
at the moment.

> What shows 'ipvsadm -Lnc' when connection stucks ? What is shown for 
> the command (:21) and data (20xxx data port) connection? IPVS can see 
> only the packets from client, so any FIN bit we see is the client's 
> wish to half-close the TCP connection. While there is existing 
> connection entry, IPVS should not stop the traffic. But if the 
> transfer is very long and short TCP EST timeout is used (ipvsadm --set 
> TCP ...) the command connection can expire. If persistence is used 
> this can not happen because the data connections bump a reference 
> count that keeps the command connection. In any case, tcpdump -lnnnv 
> 'host CLIENT_IP' output on director would be useful, even if only for 
> the last packets before connections stucks. You can run it both on 
> incoming (from client) and outgoing (to real server) interface, so 
> that we can see if some packets are not forwarded. Also, what is the 
> kernel version?: uname -a

It's kernel version 4.4.0-92-generic on Ubuntu Server 16.04.

(And in "/etc/modules", I've got "bonding" - as all the inter-server 
links are bonded for HA purposes - and the whole list of ip_vs modules, 
which "lsmod" confirms are loaded.)

I've tried "watch ipvsadm -Lnc" while it was stuck and everything looks 
normal: an "ESTABLISHED" TCP connection from the router to port 21 of 
the virtual service, routed to port 21 on the FTP server, and a second 
"ESTABLISHED" TCP connection to a port in my passive port range. That's 
what I'd expect to see.
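
To narrow the "ipvsadm -Lnc" listing down to one client, a small filter
like the following could be used. It is only a sketch: the function name
conns_for_client is made up for illustration, and it assumes the usual
six-column layout (pro, expire, state, source, virtual, destination).

```shell
# Hypothetical helper: keep only the connection entries whose source
# column starts with the given client IP.
conns_for_client() {
    # $1 = client IP. Reads `ipvsadm -Lnc` output on stdin and prints
    # expire, state, source, virtual and destination for that client.
    awk -v ip="$1" 'index($4, ip ":") == 1 { print $2, $3, $4, $5, $6 }'
}

# Example (as root on the director; 192.168.0.1 is the router IP above):
#   ipvsadm -Lnc | conns_for_client 192.168.0.1
```

Both the command (:21) and the data (20xxx) entries should show up if
IPVS still holds the connections.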

I did try using tshark on the director, filtering for the VIP and ports 
20000 to 21000 (my passive port range). When it gets stuck, what I'm 
seeing is:

192.168.0.1 [router IP] -> 192.168.0.99 [VIP] TCP 85 [TCP 
Retransmission]  -> 20103 [ FIN, PSH, ACK ]

And this is repeating over and over.
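
A capture of this kind could be set up roughly as follows. This is a
sketch, not the exact command that was run: the helper name
passive_filter and the interface name eth0 are assumptions, while the
VIP and the port range are the ones given above.

```shell
# Hypothetical helper: build a pcap capture filter for traffic to or
# from the VIP on the passive port range.
passive_filter() {
    # $1 = VIP, $2 = first passive port, $3 = last passive port
    printf 'host %s and tcp portrange %s-%s\n' "$1" "$2" "$3"
}

# Example (needs root and tshark; eth0 is an assumed interface name):
#   tshark -n -i eth0 -f "$(passive_filter 192.168.0.99 20000 21000)"
```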

My interpretation is that the client is signalling to the VIP that it 
wants to close the connection (FIN) and, this being TCP, the server 
should ACKnowledge that it received this. But as that ACK isn't coming, 
the client keeps retransmitting the "close connection" on the passive 
port to a server that isn't responding.

And, looking on the FTP server, when it's stuck, I'm seeing this 
repeating over and over:

192.168.0.99 (VIP) -> 192.168.0.1 (router) TCP 85 [TCP Retransmission] 
20197 -> 51311 [ FIN, PSH, ACK ]

So the server is getting it and trying to respond, but it's not getting 
back to the client.
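
To put a number on "repeating over and over" on each box, the
retransmitted FINs in a saved capture can be counted. A minimal sketch,
assuming the capture was saved as text summary lines like the ones
quoted above; the function name count_fin_retrans is made up.

```shell
# Hypothetical helper: count capture summary lines that are flagged as
# retransmissions and also carry the FIN flag.
count_fin_retrans() {
    # Reads summary lines on stdin, prints a single number.
    grep 'TCP Retransmission' | grep -c 'FIN'
}

# Example (director.txt is an assumed file holding saved tshark output):
#   count_fin_retrans < director.txt
```

Running it against a capture from the director and one from the
realserver gives a quick side-by-side of how often each end is retrying.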

Hmm, on the realserver, I've got a dummy interface with the VIP and, to 
combat the ARP problem, have used the following rules in my 
/etc/network/interfaces:

---- 8< ----

auto dummy0
iface dummy0 inet static
     address        192.168.0.99
     netmask        255.255.255.0
     pre-up        arptables -F
     pre-up        arptables -A INPUT -d 192.168.0.99 -j DROP
     pre-up        arptables -A OUTPUT -s 192.168.0.99 -j mangle --mangle-ip-s [external IP address]

---- >8 ----

The idea is that the realserver ignores any ARP requests for the VIP, 
and if it sends ARP itself with the VIP as the source address, the 
source is changed to the external IP address, so that the VIP keeps 
pointing at the right entry point into the cluster.
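
For testing outside of /etc/network/interfaces, the same three rules
can be generated and reviewed before they are applied. This is a sketch
only: the function name arp_rules and the EXTERNAL_IP variable are
placeholders, and the rules are the ones from the stanza above.

```shell
# Hypothetical helper: print the arptables commands instead of running
# them, so they can be checked first.
arp_rules() {
    # $1 = VIP, $2 = external IP address of the realserver
    printf 'arptables -F\n'
    printf 'arptables -A INPUT -d %s -j DROP\n' "$1"
    printf 'arptables -A OUTPUT -s %s -j mangle --mangle-ip-s %s\n' "$1" "$2"
}

# Review:          arp_rules 192.168.0.99 "$EXTERNAL_IP"
# Apply (as root): arp_rules 192.168.0.99 "$EXTERNAL_IP" | sh
# Then list what is actually loaded with: arptables -L -n
```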

I preferred to use "arptables" to control the ARP problem explicitly. It 
just felt less messy to me.

Looking at these results, I think LVS is actually delivering the close 
connection to the FTP server but the response is not getting back to the 
client. So my configuration of "dummy0" that carries the VIP is possibly 
wrong somehow?

Regards,
Owain

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
