On Wed, 27 Feb 2002, Roberto Nibali wrote:
> Hi,
>
> Hayden Myers wrote:
> > We've been using the lvs project for loadbalancing for quite some time
> > now. We've experienced minimal problems until yesterday. For some reason
> > our director starts routing traffic to only one of the real servers in our
> > pool. ipvsadm output show almost all 0's for all servers. Normal numbers
>
> Zero's for what? Active connections? Could you please send an ipvsadm -L
> -n before and after this problem occurs?
Active connections and Inactive connections show all zeros with the
exception of one real server that shows up to 4 connections. Here's the
ipvsadm output when working correctly after a reboot.
IP Virtual Server version 1.0.8 (size=524288)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 216.151.100.246:443 rr persistent 600
-> 216.151.100.253:443 Route 1 0 0
TCP 216.151.100.246:5555 rr
-> 216.151.100.251:5555 Route 1 5 0
TCP 216.151.100.246:80 rr
-> 216.151.100.253:80 Route 1 300 7984
-> 216.151.100.252:80 Route 1 324 8218
When the phones start ringing, the output is the same except activeconn
and inactconn report zero with the exception of 216.151.100.252 which
reports 0-3 activeconn and 0 inactconn.
>
> > are around 400 active connections for each active real server. We were
> > using an older 2.2.16 lvs kernel, and updated the system to a 2.2.19
> > kernel with ipvsadm 1.0.8 yesterday. We did recently move our realservers
> > to a 2.4 kernel which may be something to look at. We use the direct
> > routing method and the 2.4 kernels are patched with julians hidden patch.
>
> Is it enabled? (Sorry to ask this, but it sounds like you've an arp problem)
>
/proc/sys/net/ipv4/conf/all/hidden = 1
> > Nothing but rebooting the director seems to get rid of the problem.
>
> That I doubt, but lets have a clean analysis.
Nothing is a very strong word. I doubt this as well but it's all I've
seen that fixes the problem.
>
> > Clearing the ipvsadm table and reloading doesn't do much. I just turned
> > up the debug_level to see if anything pops up. Has anybody seen this
> > before? Any suggestions on where else to look?
>
> Can you send a tcpdump trace of a session, please? Maybe if you already
> enabled debugging, you could send a part of the output (the one, when a
> session is initiated)
>
> Regards,
> Roberto Nibali, ratz
>
Here's a dump from the director taking a request for the farm on
216.151.100.246:80
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
16:30:48.336599 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: S
1546816788:1546816788(0) win 5840 <mss 1460,sackOK,timestamp 379374974
0,nop,wscale 0> (DF)
16:30:48.336696 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: S
1546816788:1546816788(0) win 5840 <mss 1460,sackOK,timestamp 379374974
0,nop,wscale 0> (DF)
16:30:48.391330 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
1546816789:1546816789(0) ack 889182055 win 5840 <nop,nop,timestamp
379374980 56808216> (DF)
16:30:48.391376 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: . 0:0(0)
ack 1 win 5840 <nop,nop,timestamp 379374980 56808216> (DF)
16:30:48.404614 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: P
0:384(384) ack 1 win 5840 <nop,nop,timestamp 379374980 56808216> (DF)
16:30:48.404660 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: P
0:384(384) ack 1 win 5840 <nop,nop,timestamp 379374980 56808216> (DF)
16:30:48.460961 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 16 win 5840 <nop,nop,timestamp 379374987 56808223> (DF)
16:30:48.461005 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 16 win 5840 <nop,nop,timestamp 379374987 56808223> (DF)
16:30:48.475328 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 1464 win 8688 <nop,nop,timestamp 379374988 56808224> (DF)
16:30:48.475375 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 1464 win 8688 <nop,nop,timestamp 379374988 56808224> (DF)
16:30:48.524091 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 2912 win 11584 <nop,nop,timestamp 379374993 56808229> (DF)
16:30:48.524149 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 2912 win 11584 <nop,nop,timestamp 379374993 56808229> (DF)
16:30:48.527235 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 4360 win 14480 <nop,nop,timestamp 379374993 56808229> (DF)
16:30:48.527279 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 4360 win 14480 <nop,nop,timestamp 379374993 56808229> (DF)
16:30:48.537414 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 5808 win 17376 <nop,nop,timestamp 379374994 56808230> (DF)
16:30:48.537458 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 5808 win 17376 <nop,nop,timestamp 379374994 56808230> (DF)
16:30:48.541646 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 7256 win 20272 <nop,nop,timestamp 379374994 56808230> (DF)
16:30:48.541689 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 7256 win 20272 <nop,nop,timestamp 379374994 56808230> (DF)
16:30:48.586804 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 8704 win 23168 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.586848 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 8704 win 23168 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.591695 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: F
384:384(0) ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.591800 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: F
384:384(0) ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)
Here's a dump from the real server 216.151.100.253:80 following the above
request.
tcpdump: listening on eth0
14:12:27.309782 64.42.115.38.42710 > 216.151.100.246.http: S
1546816788:1546816788(0) win 5840 <mss 1460,sackOK,timestamp 379374974
0,nop,wscale 0> (DF)
14:12:27.309813 216.151.100.246.http > 64.42.115.38.42710: S
889182054:889182054(0) ack 1546816789 win 5792 <mss 1460,sackOK,timestamp
56808216 379374974,nop,wscale 0> (DF)
14:12:27.364463 64.42.115.38.42710 > 216.151.100.246.http: . ack 1 win
5840 <nop,nop,timestamp 379374980 56808216> (DF)
14:12:27.377807 64.42.115.38.42710 > 216.151.100.246.http: P 1:385(384)
ack 1 win 5840 <nop,nop,timestamp 379374980 56808216> (DF)
14:12:27.377840 216.151.100.246.http > 64.42.115.38.42710: . ack 385 win
6432 <nop,nop,timestamp 56808223 379374980> (DF)
14:12:27.378621 216.151.100.246.http > 64.42.115.38.42710: P 1:16(15) ack
385 win 6432 <nop,nop,timestamp 56808223 379374980> (DF)
14:12:27.382864 216.151.100.246.http > 64.42.115.38.42710: . 16:1464(1448)
ack 385 win 6432 <nop,nop,timestamp 56808224 379374980> (DF)
14:12:27.434084 64.42.115.38.42710 > 216.151.100.246.http: . ack 16 win
5840 <nop,nop,timestamp 379374987 56808223> (DF)
14:12:27.434108 216.151.100.246.http > 64.42.115.38.42710: .
1464:2912(1448) ack 385 win 6432 <nop,nop,timestamp 56808229 379374987>
(DF)
14:12:27.434113 216.151.100.246.http > 64.42.115.38.42710: .
2912:4360(1448) ack 385 win 6432 <nop,nop,timestamp 56808229 379374987>
(DF)
14:12:27.448461 64.42.115.38.42710 > 216.151.100.246.http: . ack 1464 win
8688 <nop,nop,timestamp 379374988 56808224> (DF)
14:12:27.448496 216.151.100.246.http > 64.42.115.38.42710: .
4360:5808(1448) ack 385 win 6432 <nop,nop,timestamp 56808230 379374988>
(DF)
14:12:27.448501 216.151.100.246.http > 64.42.115.38.42710: .
5808:7256(1448) ack 385 win 6432 <nop,nop,timestamp 56808230 379374988>
(DF)
14:12:27.500357 64.42.115.38.42710 > 216.151.100.246.http: . ack 4360 win
14480 <nop,nop,timestamp 379374993 56808229> (DF)
14:12:27.510547 64.42.115.38.42710 > 216.151.100.246.http: . ack 5808 win
17376 <nop,nop,timestamp 379374994 56808230> (DF)
14:12:27.514784 64.42.115.38.42710 > 216.151.100.246.http: . ack 7256 win
20272 <nop,nop,timestamp 379374994 56808230> (DF)
14:12:27.559938 64.42.115.38.42710 > 216.151.100.246.http: . ack 8704 win
23168 <nop,nop,timestamp 379374999 56808235> (DF)
14:12:27.564873 64.42.115.38.42710 > 216.151.100.246.http: F 385:385(0)
ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)
14:12:27.564891 216.151.100.246.http > 64.42.115.38.42710: . ack 386 win
6432 <nop,nop,timestamp 56808242 379374999> (DF)
I'm not familiar with the ipvs debug output. I set debug_level to 7. A
lot of the output looks truncated for some reason. I don't know what
corresponds to my connection so here's a chunk of what's currently going
on.
Feb 27 16:56:51 lb kernel: 7>IPVS: look/in TCP
205.162.233.107:1424->216.151.100.246:80 hit
Feb 27 16:56:54 lb kernel: hit
Feb 27 16:57:01 lb kernel: m 0 TCP 216.151.100.246:80 hit
Feb 27 16:57:01 lb kernel: service fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:57:03 lb kernel: .246:80 hit
Feb 27 16:57:03 lb kernel: 00.253:80 activeconns 280 refcnt 9459 weight 1
Feb 27 16:57:05 lb kernel:
Feb 27 16:57:06 lb kernel: VS: look/in TCP
208.188.43.195:2200->216.151.100.246:80 hit
Feb 27 16:57:07 lb kernel: IPVS: lookup_service fwm 0 TCP
216.151.100.246:80 hit
Feb 27 16:57:16 lb kernel: S: lookup_service fwm 0 TCP 216.151.100.246:80
hit
Feb 27 16:57:18 lb kernel: 253.253:25717->216.151.100.246:80 hit
Feb 27 16:57:19 lb kernel: okup_service fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:57:20 lb kernel: 0.246:80 hit
Feb 27 16:57:22 lb kernel: conns 304 refcnt 9306 weight 1
Feb 27 16:57:23 lb kernel: 3.102:16224->216.151.100.246:80 hit
Feb 27 16:57:28 lb kernel: :80 activeconns 345 refcnt 9285 weight 1
Feb 27 16:57:31 lb kernel: 151.100.246:80 hit
Feb 27 16:57:40 lb kernel: onns 334 refcnt 9315 weight 1
Feb 27 16:57:55 lb kernel: .151.100.246:80 hit
Feb 27 16:57:56 lb kernel:
Feb 27 16:57:57 lb kernel: ervice fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:57:58 lb kernel: hit
Feb 27 16:57:59 lb kernel: fcnt 9402 weight 1
Feb 27 16:58:01 lb kernel: .100.246:80 hit
Feb 27 16:58:01 lb kernel: hit
Feb 27 16:58:02 lb kernel: : lookup_service fwm 0 TCP 216.151.100.246:80
hit
Feb 27 16:58:02 lb kernel: ng...
Feb 27 16:58:03 lb kernel: 216.151.100.246:80 hit
Feb 27 16:58:05 lb kernel: 16.151.100.246:80 hit
Feb 27 16:58:06 lb kernel: >IPVS: lookup_service fwm 0 TCP
216.151.100.2>IPVS: lookup_service fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:07 lb kernel: ice fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:08 lb kernel: : look/in TCP
129.7.47.118:1700->216.151.100.246:80 not hit
Feb 27 16:58:09 lb kernel: 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:09 lb kernel: <729:1359->216.151.100.246:80 hit
Feb 27 16:58:10 lb kernel: 16.151.100.252:80 activeconns 343 refcnt 9439
weight 1
Feb 27 16:58:11 lb kernel: TCP 65.58.153.251:1255->216.151.100.246:80 hit
Feb 27 16:58:13 lb kernel: .151.100.253:80 activeconns 316 refcnt 9447
weight 1 Feb 27 16:58:18 lb kernel: ip_vs_rr_schedule(): Scheduling...
Feb 27 16:58:19 lb kernel: _service fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:21 lb kernel: 0.252:80 activeconns 333 refcnt 9371 weight 1
Feb 27 16:58:22 lb kernel: 6.151.100.246:80 hit
Feb 27 16:58:23 lb kernel: : look/in TCP
66.99.108.21:1285->216.151.100.246:80 hit
Feb 27 16:58:26 lb kernel: 80 hit
Feb 27 16:58:27 lb kernel: dule(): Scheduling...
Feb 27 16:58:29 lb kernel: 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:34 lb kernel: : Scheduling...
Feb 27 16:58:38 lb kernel: .100.246:80 hit
Feb 27 16:58:41 lb kernel: 6.151.100.246:80 hit
Feb 27 16:58:43 lb kernel: 16.151.100.246:80 hit
Feb 27 16:58:44 lb kernel: PVS: lookup_service fwm 0 TCP
216.151.100.246:80 hit Feb 27 16:58:45 lb kernel: ook/in TCP
80.13.202.169:3953->216.151.100.246:80 hit
Feb 27 16:58:47 lb kernel: 151.76.71:1836->216.151.100.246:80 hit
Feb 27 16:58:49 lb kernel: .111.90:3465->216.151.100.246:80 hit
Feb 27 16:58:50 lb kernel: 0 hit
Feb 27 16:58:53 lb kernel: fwm 0 TCP 216.151.100.246:80 hit
Feb 27 16:58:56 lb kernel: 46:80 hit
Feb 27 16:59:02 lb kernel: wm 0 TCP 216.151.100.246:80 hit
Feb 27 16:59:07 lb kernel: 0 TCP 216.151.100.246:80 hit
Feb 27 16:59:09 lb kernel: 151.100.246:80 hit
I hope this helps. I don't have any output from when the problem is
occuring, but I'm sure I will see it again. I'll be sure to note the
debugging information and record some tcpdump info. Let me know if you
need anything else.
Hayden Myers
Support Manager
Skyline Network Technologies
hayden@xxxxxxxxxxx
(888)917-1600x120
|