I started writing this as a problem report, but since I have solved it,
I thought others might benefit from the time I wasted.
I had a simple LVS-NAT configuration where LVS lives on a gateway:
CIP - VIP/LVS - RIP
The LVS box does run iptables, but with a fairly open ruleset: INPUT
allows all incoming traffic for the VIP, OUTPUT allows the NEW,
ESTABLISHED and RELATED states, and FORWARD is open (for what it's
worth, I think ipvs does not go through there at all). So that worked
fine.
Until I noticed that the real server had many connections in FIN_WAIT2
state. They have the same timeout as TIME_WAIT so I was gonna let it go,
but then I looked at the client and all of them were in LAST_ACK state
there. The client kept resending FIN-ACKs, none of which made it to the
server at all. On the LVS, ipvsadm -Lc showed the connections in
TIME_WAIT state, so it did get them.
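A quick way to spot this kind of pile-up is to tally TCP connections by state on the real server and on the client. The following is only a sketch: count_states is a made-up helper name, and it assumes the usual Linux "netstat -tan" column layout with the state in the sixth field.

```shell
#!/bin/sh
# count_states (hypothetical helper): reads "netstat -tan" style output on
# stdin and prints one "STATE COUNT" line per TCP state, sorted by state name.
# Assumption: the state is the 6th whitespace-separated field, as on Linux.
count_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the real server or the client:
#   netstat -tan | count_states
```

Many FIN_WAIT2 entries on the real server together with matching LAST_ACK entries on the client is the pattern described above.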
Well, long story short, the OUTPUT chain blocked *only* that FIN-ACK
packet for some odd reason. I was sure that ipvs short-circuits
iptables and bypasses OUTPUT, but I guess I misinterpreted the little
map in the HOWTO. All the other packets matched the "NEW" rule; this
one would probably have ended up as INVALID. I am now adding stateless
rules to allow all OUTPUT traffic towards the RIPs.
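As a rough illustration of what stateless rules towards the RIPs could look like, here is a minimal sketch. The addresses are made-up placeholders, print_rip_rules is a hypothetical helper, and the script only prints the iptables commands instead of running them, so it is not necessarily the exact ruleset used here.

```shell
#!/bin/sh
# Hypothetical real-server addresses; replace with your own RIPs.
RIPS="192.168.1.10 192.168.1.11"

# Print (do not run) one stateless ACCEPT rule per RIP. "-I OUTPUT" inserts
# at the top of the chain, so the rule is evaluated before any rule that
# matches on connection state.
print_rip_rules() {
    for rip in $1; do
        echo "iptables -I OUTPUT -d $rip -j ACCEPT"
    done
}

print_rip_rules "$RIPS"
```

Review the printed commands first, then run them as root (or pipe them to sh) once they look right for your setup.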
This is how it looked from the network side:
Incoming traffic to LVS (CIP->VIP)
0.016862 CIP -> VIP HTTP HEAD / HTTP/1.1
0.017193 VIP -> CIP [ACK] Seq=1 Ack=117 Win=5888 Len=0
0.021949 VIP -> CIP [TCP segment of a reassembled PDU]
0.022173 VIP -> CIP [FIN, ACK] Seq=195 Ack=117 Win=5888 Len=0
0.034046 CIP -> VIP [ACK] Seq=117 Ack=195 Win=6912 Len=0
*0.046042 CIP -> VIP [FIN, ACK] Seq=117 Ack=196 Win=6912 Len=0*
(the packet above never makes it through; from here on there are only retransmits)
0.250217 VIP -> CIP [FIN, ACK] Seq=195 Ack=117 Win=5888 Len=0
0.260110 CIP -> VIP [FIN, ACK] Seq=117 Ack=196 Win=6912 Len=0
0.267333 CIP -> VIP [TCP Dup ACK 11#1] [ACK] Seq=118 Ack=196 Win=6912 Len=0 SLE=195 SRE=196
0.705855 CIP -> VIP [FIN, ACK] Seq=117 Ack=196 Win=6912 Len=0
Coming out the other end towards RIP:
0.016847 CIP -> RIP HTTP HEAD / HTTP/1.1
0.017119 RIP -> CIP [ACK] Seq=1 Ack=117 Win=5888 Len=0
0.021873 RIP -> CIP [TCP segment of a reassembled PDU]
0.022115 RIP -> CIP [FIN, ACK] Seq=195 Ack=117 Win=5888 Len=0
0.034021 CIP -> RIP [ACK] Seq=117 Ack=195 Win=6912 Len=0
only two retransmits seen:
0.250147 RIP -> CIP [FIN, ACK] Seq=195 Ack=117 Win=5888 Len=0
0.267312 CIP -> RIP [TCP Previous segment lost] [ACK] Seq=118 Ack=196 Win=6912 Len=0 SLE=195 SRE=196
The kernel is 2.6.20 with ipvsadm 1.24 (Fedora 5).
--
Laurentiu