Yo
Kernel 2.6.22 + Julian's NFCT patch, with these settings in
/proc/sys/net/ipv4/vs/:
snat_reroute=1
conntrack=1
I have a server behind LVS-NAT that sends all its data quite fast,
followed by a FIN. After that, it retransmits lost packets as needed.
The problem is that, for some reason, the connection-terminating FIN
(with the last ACK) from CLIENT is in some cases not delivered to the
RS, which keeps on sending the last packet until it gives up.
The FORWARD chain has the following rules:
ACCEPT 0 -- 0.0.0.0/0 0.0.0.0/0 ctstate ESTABLISHED
DROP 0 -- 0.0.0.0/0 0.0.0.0/0 ctstate INVALID
LOG 0 -- 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix `forward: '
Netfilter seems to be matching a lot of ESTABLISHED and some INVALID
packets. All those retransmissions from RS to CLIENT end up in the LOG
rule and get dropped, so apparently they matched neither ESTABLISHED
nor INVALID?
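
To illustrate why a packet can reach the LOG rule at all, here is a minimal Python sketch of first-match traversal over the three rules listed above. The RULES table and the traverse() helper are made up for illustration. NEW is a standard conntrack state, but which state the retransmissions actually had is not known from the logs; the sketch only shows that anything other than ESTABLISHED or INVALID falls through to LOG.

```python
# Hypothetical model of the three FORWARD rules quoted above.
# None means "no ctstate match", so the rule applies to every packet.
RULES = [
    ("ACCEPT", {"ESTABLISHED"}),
    ("DROP", {"INVALID"}),
    ("LOG", None),
]

def traverse(ctstate, rules=RULES):
    """Return the targets a packet with the given ctstate hits, in order."""
    hit = []
    for target, states in rules:
        if states is None or ctstate in states:
            hit.append(target)
            if target in ("ACCEPT", "DROP"):
                break  # terminating targets end traversal; LOG does not
    return hit

print(traverse("ESTABLISHED"))
print(traverse("INVALID"))
print(traverse("NEW"))  # assumed state, for illustration only
```

A packet that only hits LOG continues to whatever follows in the chain (not shown in this post), which is where it gets dropped.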
Packet traces (from external and internal interfaces: 1.2.3.4 VIP,
10.0.0.1 RIP, 4.3.2.1 CIP):
external:
13:34:04 4.3.2.1.9876 > 1.2.3.4.8888: S 3015053360:3015053360(0)
13:34:04 1.2.3.4.8888 > 4.3.2.1.9876: S 3950144430:3950144430(0) ack 3015053361
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 1:6(5) ack 1
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 6
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 6
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 6:216(210) ack 6
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 216
13:34:06 4.3.2.1.9876 > 1.2.3.4.8888: P 216:323(107) ack 6
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . ack 323
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:07 4.3.2.1.9876 > 1.2.3.4.8888: . ack 22
13:34:07 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:09 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:10 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1462
13:34:10 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:11 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:11 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:15 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
...skip some...
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 7222
13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:43 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:44 4.3.2.1.9876 > 1.2.3.4.8888: . ack 8662
13:34:44 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:21 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 10102
13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 11542
13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
13:35:23 4.3.2.1.9876 > 1.2.3.4.8888: . ack 12982
13:35:23 1.2.3.4.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
13:35:24 4.3.2.1.9876 > 1.2.3.4.8888: . ack 14422
13:35:24 1.2.3.4.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 15862
13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 17302
13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:33 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:45 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:38:09 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
internal:
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 1:6(5) ack 1
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 6
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 6
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 6:216(210) ack 6
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 216
13:34:06 4.3.2.1.9876 > 10.0.0.1.8888: P 216:323(107) ack 6
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . ack 323
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:07 4.3.2.1.9876 > 10.0.0.1.8888: . ack 22
13:34:07 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:09 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:10 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1462
13:34:10 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:11 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:11 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:15 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
...skip some...
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 7222
13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:43 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:44 4.3.2.1.9876 > 10.0.0.1.8888: . ack 8662
13:34:44 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:21 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 10102
13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 11542
13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
13:35:23 4.3.2.1.9876 > 10.0.0.1.8888: . ack 12982
13:35:23 10.0.0.1.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
13:35:24 4.3.2.1.9876 > 10.0.0.1.8888: . ack 14422
13:35:24 10.0.0.1.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 15862
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 17302
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:36:38 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
As seen, the RS keeps on trying to send the last packet while CLIENT
keeps on trying to send the FIN.
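
With lots of connections it is tedious to spot this by eye, so here is a rough Python sketch that picks repeated segments out of traces in the text format shown above. The find_retransmissions() helper and its regex are my own and only assume that format (time, src > dst, flags, start:end(len)); they are not part of any tcpdump tooling.

```python
import re
from collections import Counter

# Matches trace lines in the format used above, e.g.
# "13:34:09 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323"
SEGMENT = re.compile(
    r"^(?P<time>\S+) (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\)"
)

def find_retransmissions(lines):
    """Return {(src, flags, start, end): count} for segments seen more than once."""
    seen = Counter()
    for line in lines:
        m = SEGMENT.match(line.strip())
        if m is None:
            continue  # pure ACKs and "...skip some..." carry no sequence range
        # Zero-length segments only matter if they carry SYN or FIN.
        if int(m["len"]) == 0 and not set(m["flags"]) & {"S", "F"}:
            continue
        seen[(m["src"], m["flags"], int(m["start"]), int(m["end"]))] += 1
    return {key: n for key, n in seen.items() if n > 1}

# Last lines of the internal trace above.
trace = """\
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 17302
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:36:38 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
""".splitlines()

retrans = find_retransmissions(trace)
print(retrans)
```

Run over the external trace, the same helper also counts the repeated FIN 323:323(0) from CLIENT.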
I'm not entirely sure I was able to read this information fast enough
(lots of connections, big tables), but it seems that at that time
ipvsadm -L --connection shows the connection in "FIN_WAIT" while
/proc/net/ip_conntrack has no entry for it at all.
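
Since the tables are big, the conntrack check is easier to script. Below is a minimal Python sketch that filters conntrack lines by the client side of the original-direction tuple. The conntrack_entries() helper is hypothetical, and the sample line is made up and simplified to show the key=value layout; it is not copied from the affected director.

```python
import re

def conntrack_entries(lines, client_ip, client_port, server_port):
    """Yield conntrack lines whose original-direction tuple matches the client."""
    wanted = {"src": client_ip, "sport": str(client_port), "dport": str(server_port)}
    for line in lines:
        # Each key appears twice (original and reply tuple); keep the first.
        fields = {}
        for key, value in re.findall(r"(\w+)=(\S+)", line):
            fields.setdefault(key, value)
        if all(fields.get(k) == v for k, v in wanted.items()):
            yield line

# Made-up, simplified sample entry for illustration only.
sample = (
    "tcp      6 431999 ESTABLISHED "
    "src=4.3.2.1 dst=1.2.3.4 sport=9876 dport=8888 "
    "src=10.0.0.1 dst=4.3.2.1 sport=8888 dport=9876 [ASSURED] use=1"
)

found = list(conntrack_entries([sample], "4.3.2.1", 9876, 8888))
missing = list(conntrack_entries([sample], "4.3.2.1", 9877, 8888))
print(len(found), len(missing))
```

On the director, the lines argument would be the open /proc/net/ip_conntrack file instead of the sample list. An empty result for a connection that ipvsadm still lists is the situation described above.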
There is also a variation of this issue, where the final FIN is
delivered from CLIENT to RS, but the RS's ACK isn't delivered to the
CLIENT, so the client still keeps on sending FINs. In that case, ipvsadm
shows the connection in "TIME_WAIT" state (still nothing in conntrack).
Altogether, a few percent of connections are affected by this. My
interpretation is that, for some reason, the LVS code seems to remove
the conntrack entry immediately when a final FIN is seen and stops
forwarding packets after that. My iptables rules then stop the answers
going out, because the connection is no longer ESTABLISHED.
Siim