Re: Director resets existing tcp connection

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Director resets existing tcp connection
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Thu, 19 Jan 2006 21:39:25 +0100
From time to time, my director sends a TCP reset packet into an established connection, thus terminating the connection on the initiator's side. The balanced realserver doesn't know anything about the reset packet and tries to resend its last unacknowledged TCP packet until it times out.

Let's see.

Below are the dumps from the database server, the database load balancer and the webserver, respectively.

Handshake and initial communication is normal until 12:17:57.493527 where the database server sends a large packet with 2896 bytes.

It is fragmented and arrives at the webserver at timecode 12:17:57.727493 / 12:17:57.727494.

An acknowledgement is sent at 12:17:57.729146, which arrives at the loadbalancer at 12:18:05.873445. But instead of forwarding the ack packet to the database server, the loadbalancer issues a TCP reset at 12:18:05.873464!!!

Hmm, haven't looked too deep into your dumps, but how do you explain an 8s delay between webserver and load balancer? Are your servers time synchronised?
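To put a number on that delay, here is a minimal Python sketch that subtracts the two capture timestamps quoted above. It assumes both timestamps fall on the same day and says nothing about whether the clocks agree; the helper name `capture_gap` is made up for illustration.

```python
from datetime import datetime

# tcpdump's default wall-clock format: HH:MM:SS.microseconds
FMT = "%H:%M:%S.%f"

def capture_gap(sent: str, seen: str) -> float:
    """Seconds between two tcpdump timestamps, assuming the same day."""
    delta = datetime.strptime(seen, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds()

# The ACK as captured on the webserver, then as captured on the load balancer.
gap = capture_gap("12:17:57.729146", "12:18:05.873445")
print(f"{gap:.6f} s")
```

If the gap for other packets on the same path comes out about the same, unsynchronised clocks are a likelier explanation than a real network delay.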

The reset immediately closes the connection on the webserver, but it's still open on the database server, which now tries to resend the packet for a while. This does not happen on every connection with 2896-byte packets, but when it does, the director always sends the TCP reset in reply to the ack packet that should acknowledge the arrival of this 2896-byte packet.

Any ideas?

Would it be possible to use at least a 2.6.15 kernel, so that the IPVS-related bugs can be ruled out and we can send you off to netdev? :)

DATABASE SERVER
12:17:49.549014 IP webserver.57289 > dbv1.mysql: P 86:148(62) ack 77 win 1460 <nop,nop,timestamp 667148057 50890978>
12:17:49.586863 IP dbv1.mysql > webserver.57289: . ack 148 win 1448 <nop,nop,timestamp 50890982 667148057>
12:17:57.493527 IP dbv1.mysql > webserver.57289: . 77:2973(2896) ack 148 win 1448 <nop,nop,timestamp 50891772 667148057>
12:17:57.697127 IP dbv1.mysql > webserver.57289: . 77:1525(1448) ack 148 win

TCP/RST: none seen (different collision domain?)

LOADBALANCER
12:17:57.926982 IP webserver.57289 > dbv1.mysql: P 85:147(62) ack 77 win 1460 <nop,nop,timestamp 667148057 50890978>
12:18:05.873445 IP webserver.57289 > dbv1.mysql: . ack 2973 win 2908 <nop,nop,timestamp 667150044 50891772>
12:18:05.873464 IP dbv1.mysql > webserver.57289: R 2926802478:2926802478(0)

TCP/RST: 12:18:05.873464

WEBSERVER
12:17:49.819829 IP dbv1.mysql > webserver.57289: . ack 148 win 1448 <nop,nop,timestamp 50890982 667148057>
12:17:57.727493 IP dbv1.mysql > webserver.57289: . 77:1525(1448) ack 148 win 1448 <nop,nop,timestamp 50891772 667148057>
12:17:57.727494 IP dbv1.mysql > webserver.57289: . 1525:2973(1448) ack 148 win 1448 <nop,nop,timestamp 50891772 667148057>
12:17:57.729146 IP webserver.57289 > dbv1.mysql: . ack 2973 win 2908 <nop,nop,timestamp 667150044 50891772>
12:17:57.729366 IP dbv1.mysql > webserver.57289: R 2926802478:2926802478(0)

TCP/RST: 12:17:57.729366

To me it looks like the web server is sending the RST and not the director. I need to think about your setup again; it's a bit unconventional. Please also consider replying to others' requests. Also dump using -e to get information on the MAC addresses.
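For sifting through longer captures, a rough Python sketch like the following could pull out every RST line from tcpdump text output. It assumes the plain `HH:MM:SS.usec IP src > dst: FLAGS ...` layout seen in the dumps above; output produced with -e carries an extra link-level header and would need a different pattern. The function name `find_resets` is invented for this example, and the sample input is the load balancer excerpt from this thread.

```python
import re

# Assumed layout: "HH:MM:SS.usec IP src > dst: FLAGS ..."
LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
)

def find_resets(dump: str):
    """Return (timestamp, src, dst) for each line whose flag field contains R."""
    hits = []
    for line in dump.splitlines():
        m = LINE.match(line.strip())
        if m and "R" in m.group("flags"):
            hits.append((m.group("ts"), m.group("src"), m.group("dst")))
    return hits

loadbalancer = """\
12:18:05.873445 IP webserver.57289 > dbv1.mysql: . ack 2973 win 2908 <nop,nop,timestamp 667150044 50891772>
12:18:05.873464 IP dbv1.mysql > webserver.57289: R 2926802478:2926802478(0)
"""

resets = find_resets(loadbalancer)
for ts, src, dst in resets:
    print(ts, src, "->", dst)
```

This only shows where and when a reset was captured, not which machine generated it; the source address on a balanced connection can be rewritten, which is why the MAC addresses are the more reliable clue.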

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
