Hello,
I have an LVS director listening on about 20 IPs and forwarding the
requests for HTTP/HTTPS/SSH/FTP/POP3 etc. to 7 different real servers,
both Linux and Microsoft.
What schedulers do you use? wrr? wlc?
I'm not quite sure I'm right, but I really think it is load-related,
because it mostly happens only when my websites take quite a knock.
Interesting.
Here is a normal TCP connection, captured on the director with
tcpdump -i any host [client ip] port 80:
Next time, could you please either add -n to tcpdump or sed the output
for me? :) I get lost in the hostnames in tcpdump output; there are too
many characters to read it fluently.
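If re-capturing with -n is not an option, the names can be shortened after the fact, which is what a sed one-liner would do. A minimal sketch in Python, assuming a hand-made alias table (the aliases such as CLIENT or VIP-www2 are hypothetical labels, not anything tcpdump knows about):

```python
import re
from typing import Dict

def shorten_hosts(line: str, aliases: Dict[str, str]) -> str:
    """Replace known hostnames in one tcpdump output line with short aliases."""
    if not aliases:
        return line
    # Try longer names first so a short name never matches inside a longer one.
    names = sorted(aliases, key=len, reverse=True)
    # Match a name only at the start of an address token (not preceded by a
    # letter, digit, dot or hyphen) and only where tcpdump appends ".port".
    pattern = re.compile(
        r"(?<![\w.-])(" + "|".join(map(re.escape, names)) + r")(?=\.)"
    )
    return pattern.sub(lambda m: aliases[m.group(1)], line)

# Hypothetical alias table: fill in your own hosts.
ALIASES = {
    "pc-2178249.unisa.ac.za": "CLIENT",
    "www2.unisa.ac.za": "VIP-www2",
    "ulweb4.unisa.ac.za": "VIP-ulweb4",
    "umweb2.cluster.unisa.ac.za": "RS-umweb2",
}
```

Wrapped in a loop over sys.stdin, this can sit at the end of the tcpdump pipeline; -n remains the simpler option since it avoids the DNS lookups altogether.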
[correct tcpdump interpretation]
And here is what happens every now and again when things go wrong:
16:55:35.406321 pc-2178249.unisa.ac.za.48704 > www2.unisa.ac.za.http: S 244216559:244216559(0) win 5840 <mss 1460,sackOK,timestamp 15490025 0,nop,wscale 7> (DF)
# Client tries to connect to www2.unisa.ac.za by sending a SYN packet
16:55:35.406340 pc-2178249.unisa.ac.za.48704 > umweb2.cluster.unisa.ac.za.http: S 244216559:244216559(0) win 5840 <mss 1460,sackOK,timestamp 15490025 0,nop,wscale 7> (DF)
# IPVS rewrites the packet destination to the real server, umweb2.cluster.unisa.ac.za
16:55:35.406424 umweb2.cluster.unisa.ac.za.http > pc-2178249.unisa.ac.za.48704: S 424874716:424874716(0) ack 244216560 win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0,nop,nop,sackOK> (DF)
# The real server responds correctly with a SYN accompanied by an ACK for the original SYN
16:55:35.406521 ulweb4.unisa.ac.za.http > pc-2178249.unisa.ac.za.48704: S 424874716:424874716(0) ack 244216560 win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0,nop,nop,sackOK> (DF)
# But IPVS rewrites the packet incorrectly, and now it seems to come from
a different host! The VIP it uses here is a valid VIP on the director,
but there is no reason why it should use this VIP rather than
www2.unisa.ac.za, which was the originally requested VIP in the first
place...
Odd! Looks almost like a bucket lookup bug.
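To make the "bucket lookup" suspicion concrete, here is a toy model in Python; it is not the IPVS source. It assumes, as far as I remember the NAT code path, that a reply from a real server is matched only on the real server and client address:port, because the VIP is simply not in that packet. If two entries for the same client address:port point at the same real server through different VIPs (for example a stale entry left over after the client reused source port 48704), the reply gets rewritten with whichever VIP the lookup finds first. This is a hypothesis to check against /proc/net/ip_vs_conn, not a diagnosis; the class and function names are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (address, port)

@dataclass
class Conn:
    client: Addr  # client address:port
    vip: Addr     # virtual address:port the client connected to
    real: Addr    # real server address:port the director chose

class ToyConnTable:
    """Toy NAT connection table, bucketed by the client address:port."""

    def __init__(self) -> None:
        self.buckets: Dict[Addr, List[Conn]] = {}

    def add(self, conn: Conn) -> None:
        # The toy appends; which end a real table inserts at is not modelled.
        self.buckets.setdefault(conn.client, []).append(conn)

    def in_lookup(self, client: Addr, vip: Addr) -> Optional[Conn]:
        """Client -> VIP packet: the VIP is part of the match."""
        for conn in self.buckets.get(client, []):
            if conn.vip == vip:
                return conn
        return None

    def out_lookup(self, real: Addr, client: Addr) -> Optional[Conn]:
        """Real server -> client reply: only real server and client are known."""
        for conn in self.buckets.get(client, []):
            if conn.real == real:
                return conn  # first hit wins; its VIP becomes the reply source
        return None

def reply_source(table: ToyConnTable, real: Addr, client: Addr) -> Optional[Addr]:
    """Source address:port the director would write into the reply."""
    conn = table.out_lookup(real, client)
    return conn.vip if conn is not None else None
```

With an older entry client:48704 -> ulweb4 -> umweb2 ahead of the new client:48704 -> www2 -> umweb2 in the same bucket, the inbound SYN is handled correctly, yet reply_source() returns ulweb4, which is exactly the shape of the trace.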
Would it be possible for you to capture this trace again, but this time
with vs_debug enabled in procfs? Could you also send the output of
/proc/net/ip_vs_conn and /proc/net/ip_vs for the IPs in question?
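To pick the relevant lines out of /proc/net/ip_vs_conn, something like the following could help. It is a sketch that assumes the IPv4 layout in which addresses and ports are printed as hex (columns Pro FromIP FPrt ToIP TPrt DestIP DPrt State Expires); please compare it with your own file before trusting it. The function and field names are my own.

```python
import ipaddress
from typing import List, NamedTuple

class IpvsConn(NamedTuple):
    proto: str
    client: str   # FromIP:FPrt
    vip: str      # ToIP:TPrt
    real: str     # DestIP:DPrt
    state: str
    expires: int

def hex_endpoint(addr: str, port: str) -> str:
    """Convert hex fields such as ('C0A80001', '0050') to '192.168.0.1:80'."""
    return f"{ipaddress.IPv4Address(int(addr, 16))}:{int(port, 16)}"

def conns_for_client(text: str, client_ip: str) -> List[IpvsConn]:
    """Return the /proc/net/ip_vs_conn entries whose client address is client_ip."""
    result: List[IpvsConn] = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 9:
            continue  # blank or unexpected line
        conn = IpvsConn(
            proto=fields[0],
            client=hex_endpoint(fields[1], fields[2]),
            vip=hex_endpoint(fields[3], fields[4]),
            real=hex_endpoint(fields[5], fields[6]),
            state=fields[7],
            expires=int(fields[8]),
        )
        if conn.client.rsplit(":", 1)[0] == client_ip:
            result.append(conn)
    return result
```

Usage would be along the lines of reading /proc/net/ip_vs_conn into a string and passing it together with the client IP; two entries for the same client address:port with different VIPs but the same real server would be the interesting case.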
16:55:35.406731 pc-2178249.unisa.ac.za.48704 > ulweb4.unisa.ac.za.http: R 244216560:244216560(0) win 0 (DF)
# The client gets a SYN/ACK from an unknown host, unrelated to the original request, and therefore sends a RST back to the sending host
Exactly.
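The client's reaction is plain TCP. The stray SYN/ACK from ulweb4 belongs to no connection the client has open (its address/port pair differs from the one being opened to www2), so the stack answers with a reset. RFC 793's reset-generation rule says that when the offending segment carries an ACK, the RST takes its sequence number from that ACK field. A minimal sketch of the rule (the Segment type is made up for illustration) reproduces the 244216560 seen in the trace:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set
    syn: bool = False
    fin: bool = False
    rst: bool = False
    payload_len: int = 0

def reset_for(seg: Segment) -> Optional[Segment]:
    """RST sent in reply to a segment that matches no connection (RFC 793)."""
    if seg.rst:
        return None  # never answer a reset with a reset
    if seg.ack is not None:
        # <SEQ=SEG.ACK><CTL=RST>
        return Segment(seq=seg.ack, ack=None, rst=True)
    # <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>; SYN and FIN each count as one.
    seg_len = seg.payload_len + int(seg.syn) + int(seg.fin)
    return Segment(seq=0, ack=(seg.seq + seg_len) % 2**32, rst=True)
```

Feeding it the stray SYN/ACK (seq 424874716, ack 244216560) yields a RST with sequence number 244216560 and no ACK, matching "R 244216560:244216560(0)" above.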
16:55:35.406760 pc-2178249.unisa.ac.za.48704 > umweb2.cluster.unisa.ac.za.http: R 244216560:244216560(0) win 0 (DF)
# I'm guessing that iptables NAT rewrites the RST so that it goes back to the original sender.
Hmm, I don't know whether netfilter is involved in that part. But maybe
you're right: since the lookup of a service template for ulweb4 is
definitely going to fail for an established connection, conntrack would
have to take care of it and send it to umweb2. To be honest, though, I'm
not sure here.
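Until the debug output is in, the symptom can at least be spotted mechanically in a capture. A rough sketch, assuming the old tcpdump line format shown in the trace above (one packet per line, with or without -n): it flags every SYN/ACK sent to a client address:port from an address:port that client never sent a SYN to. On the director with -i any, the client's SYN appears twice (to the VIP and, rewritten, to the real server), so both count as legitimate sources. The function name is hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# "TIME SRC.PORT > DST.PORT: FLAGS rest"; the port is the last dot-separated
# field (numeric with -n, a service name such as "http" without it).
LINE_RE = re.compile(r"^(\S+) (\S+)\.([\w-]+) > (\S+)\.([\w-]+): (\S+)(.*)$")

Endpoint = Tuple[str, str]

def find_stray_synacks(lines: List[str]) -> List[str]:
    """Return SYN/ACK lines whose source the client never sent a SYN to."""
    syn_targets: Dict[Endpoint, Set[Endpoint]] = {}
    stray: List[str] = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a packet line in the expected format
        _, saddr, sport, daddr, dport, flags, rest = m.groups()
        src, dst = (saddr, sport), (daddr, dport)
        is_syn = "S" in flags
        has_ack = " ack " in rest
        if is_syn and not has_ack:
            # Client SYN: remember every address:port it was sent to.
            syn_targets.setdefault(src, set()).add(dst)
        elif is_syn and has_ack:
            # SYN/ACK towards the client: its source must be one of those.
            if src not in syn_targets.get(dst, set()):
                stray.append(raw.strip())
    return stray
```

Run over the failing trace, only the 16:55:35.406521 SYN/ACK from ulweb4 is reported; counting such hits per minute might also show whether they really track the load.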
This is causing hangs and timeouts for the clients (the whole world).
It's a really urgent issue.
So you have persistence enabled on your services? That of course adds to the hangs.
Does anyone have any advice?
Not yet, but something looks extremely fishy.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc