LVS losing connection info of client/server after some time?

To:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject:	LVS losing connection info of client/server after some time?
Cc:	rp@xxxxxxxxxxxxxxxxxx
From:	Ray Pitmon <rp@xxxxxxxxxxxxxxxxxx>
Date:	Tue, 14 Nov 2000 18:36:20 -0600 (CST)

Hi,


I am using LVS-DR to load balance a TCP-based application, and it appears that
the LVS is losing connection information (perhaps the entries are timing out
and being removed from the connection hash table?). I've included some
tcpdump/snoop output at the end.

Is there any way to dump the connection table to see if that's really the case?

We have a client app that makes a connection to the server app, running on port
12000.

Everything works great except that over time, when I do a netstat on the server,
I see a lot of connections in "ESTABLISHED", but when I do a netstat on the
client, it only shows the connections that are really there (only 1
per client).

I think that the server app could use TCP_KEEPALIVE to remedy this situation,
but my developers say that it isn't supported in the version of Java they are
using (1.2.2, I believe).

Our client app is designed to hold a connection open to the server indefinitely,
even though it may not transmit data over that connection for a very long time.

A couple of possible resolutions..?

1.  If it's an LVS timeout issue, crank up the right timeout value ( > 1 day)
    (not sure which one it is??) 

2.  Modify the client to send a "hello there" to the server every so often.
    (developers not happy about this one)

3.  Upgrade JAVA, use TCP_KEEPALIVE on the server app.
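
As a rough sketch of what options 2 and 3 amount to at the socket level
(written in Python rather than Java, purely for illustration): the function
names, the "HELLO" payload, and the 600/60/5 timing values are all made up
here and not taken from our app. The only real requirement is that the idle
time be shorter than whatever timeout the LVS applies to an idle connection.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Option 3: ask the kernel to probe an idle TCP connection.

    idle, interval and count are illustrative values (assumptions).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform-specific: these constants exist on
    # Linux, but other systems may only offer system-wide settings.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


def send_heartbeat(sock, payload=b"HELLO\n"):
    """Option 2: application-level "hello there" (hypothetical payload)."""
    sock.sendall(payload)
```

Either approach puts a packet on the wire before the idle period gets too
long; keepalive does it in the kernel, while the heartbeat needs both client
and server code to agree on the message.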


FYI: the client is Linux, the LVS is Red Hat (2.2.16-3 kernel), and the
realserver is Solaris.

Any suggestions?  comments?

Thanks,

-Ray


Other Details:

First some IPVSADM info: (It's a test env, that's why there's only 1 server.)

[root@lvs1 rayp]# ipvsadm
IP Virtual Server version 0.9.12 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  dc84.digitalcyclone.com:12000 wlc
  -> dc29.digitalcyclone.com:12000 Route   1      0          0


For a test, I killed one of the clients off:

**STEP 1: determine which port(1939) to monitor and which PID(1899) to kill.

[dc@sdcs ~]$ netstat -ap | grep 12000
tcp        0      0 sdcs:1939 gridsvr:12000 ESTABLISHED 1899/dcs            
[dc@sdcs ~]$ kill 1899
**NOW IT'S DEAD (or dying anyway)
[dc@sdcs ~]$ netstat -ap | grep 12000
tcp        0     62 sdcs:1939 gridsvr:12000 FIN_WAIT1   -                   


**STEP 2: netstat on gridsvr (notice that it's still there in ESTABLISHED, will
be there forever)

gridsvr# netstat -a | grep 1939
gridsvr.12000 sdcs.1939 32120      0 10136      0 ESTABLISHED


**STEP 3: Here are the TCPDUMPS running on the 3 machines involved (while I
issue the kill of the client)

** ON THE CLIENT.  packets go out to gridsvr
[root@sdcs /root]# tcpdump -i eth0 tcp port 1939 
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on eth0
16:42:02.130022 > sdcs.1939 > gridsvr.12000: P 483508496:483508557(61) ack
1618010192 win 32120 <no)

16:44:02.130022 > sdcs.1939 > gridsvr.12000: P 0:61(61) ack 1 win 32120
<nop,nop,timestamp 71571933 249897)

16:46:02.130022 > sdcs.1939 > gridsvr.12000: P 0:61(61) ack 1 win 32120
<nop,nop,timestamp 71583933 249897)

** ON THE LVS.  packets come into LVS.
[root@lvs1 rayp]# tcpdump -i eth0 tcp port 1939 and host 216.245.140.89
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on eth0
05:43:36.735041 < sdcs.1939 > gridsvr.com.12000: P 483508496:483508557(61) ack
1618010192 win 32120 <nop,nop,timestamp 71559933 249897356> (DF)

05:45:36.741938 < sdcs.1939 > gridsvr.12000: P 0:61(61) ack 1 win 32120
<nop,nop,timestamp 71571933 249897356> (DF)

05:47:36.748837 < sdcs.1939 > gridsvr.12000: P 0:61(61) ack 1 win 32120
<nop,nop,timestamp 71583933 249897356> (DF)

** ON THE SERVER.  (anyone home...?)
gridsvr# snoop tcp port 1939
Using device /dev/hme (promiscuous mode)

(nothing)
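
To put a number on the spacing of those packets, here is a small Python
sketch that computes the gaps between the three timestamps in the LVS capture
above. The helper name is made up for illustration; the timestamps are copied
from the tcpdump output.

```python
from datetime import datetime

# Arrival times of the three segments seen on the LVS (from the capture above).
LVS_TIMESTAMPS = ["05:43:36.735041", "05:45:36.741938", "05:47:36.748837"]


def gaps_in_seconds(stamps):
    """Return the time between consecutive tcpdump timestamps, in seconds."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_in_seconds(LVS_TIMESTAMPS):
        print(f"{gap:.3f} s")
```

The gaps come out at roughly two minutes each, so the LVS keeps receiving the
same 61-byte segment from the client at a steady interval while snoop on the
realserver shows nothing arriving.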
