Re: Very strange behavior

To:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject:	Re: Very strange behavior
From:	Roberto Nibali <ratz@xxxxxxxxxxxx>
Date:	Thu, 28 Feb 2002 11:21:46 +0100

Hi,

Active connections and Inactive connections show all zeros with the
exception of one real server that shows up to 4 connections.  Here's the
ipvsadm output when working correctly after a reboot.

IP Virtual Server version 1.0.8 (size=524288)

                                        ^^^^^^
This is a rather high hash table size, you don't need to set it so high.

Prot LocalAddress:Port Scheduler Flags-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  216.151.100.246:443 rr persistent 600
-> 216.151.100.253:443 Route 1 0 0TCP 216.151.100.246:5555 rr-> 216.151.100.251:5555 Route 1 5 0TCP 216.151.100.246:80 rr-> 216.151.100.253:80 Route 1 300 7984-> 216.151.100.252:80 Route 1 324 8218
When the phones start ringing, the output is the same except activeconn

           ^^^^^^^^^^^^^^^^^^^^
      How can I understand/interpret that?

and inactconn report zero with the exception of 216.151.100.252 which
reports 0-3 activeconn and 0 inactconn.

Just to make sure: We only and always talk about the service defined by216.151.100.246:80?


I don't get it. Do you understand you correctly, if I say:

o everything works after a reboot for a while with service VIP:80
o after a undefined amount of time your picture looks like follows:

 TCP  216.151.100.246:80 rr
   -> 216.151.100.253:80             Route   1      0          0
   -> 216.151.100.252:80             Route   1      2          0

o no request will be forwarded to ~.253 anymore from that point on
o flushing doesn't help (this is clear, actually)
o we have only a trace of a working setup

Nothing but rebooting the director seems to get rid of the problem.

That I doubt, but lets have a clean analysis.

Nothing is a very strong word.  I doubt this as well but it's all I've

seen that fixes the problem.

Ok, how long does it take until this situation occurs normally? Do yousend requests during the reboot?

Here's a dump from the director taking a request for the farm on
216.151.100.246:80

With what do you send your requests? They all seem to have the sameCIP:CIPPort.

384:384(0) ack 8704 win 23168 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.586848 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: .
384:384(0) ack 8704 win 23168 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.591695 eth0 < 64.42.115.38.42710 > 216.151.100.246.www: F
384:384(0) ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)
16:30:48.591800 eth0 > 64.42.115.38.42710 > 216.151.100.246.www: F
384:384(0) ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)


Could you send us following infos, please?

o From the director:
    ip link show dev eth0
    ip neigh show dev eth0

o From the RIPs:
    ip link show dev eth0
    ip neigh show dev eth0

Here's a dump from the real server 216.151.100.253:80 following the above
request.

Thank you. Can you check if ~.252 does something, when the request isbeing forwarded to ~.253 like in that case? Just check if tcpdump -lenon ~.252 jumps, when it shouldn't.

tcpdump: listening on eth0
14:12:27.309782 64.42.115.38.42710 > 216.151.100.246.http: S
1546816788:1546816788(0) win 5840 <mss 1460,sackOK,timestamp 379374974
0,nop,wscale 0> (DF)
14:12:27.309813 216.151.100.246.http > 64.42.115.38.42710: S
889182054:889182054(0) ack 1546816789 win 5792 <mss 1460,sackOK,timestamp
56808216 379374974,nop,wscale 0> (DF)
14:12:27.364463 64.42.115.38.42710 > 216.151.100.246.http: . ack 1 win
5840 <nop,nop,timestamp 379374980 56808216> (DF)


Good.

14:12:27.377807 64.42.115.38.42710 > 216.151.100.246.http: P 1:385(384)
ack 1 win 5840 <nop,nop,timestamp 379374980 56808216> (DF)
14:12:27.377840 216.151.100.246.http > 64.42.115.38.42710: . ack 385 win
6432 <nop,nop,timestamp 56808223 379374980> (DF)
14:12:27.378621 216.151.100.246.http > 64.42.115.38.42710: P 1:16(15) ack
385 win 6432 <nop,nop,timestamp 56808223 379374980> (DF)
14:12:27.382864 216.151.100.246.http > 64.42.115.38.42710: . 16:1464(1448)
ack 385 win 6432 <nop,nop,timestamp 56808224 379374980> (DF)
14:12:27.434084 64.42.115.38.42710 > 216.151.100.246.http: . ack 16 win
5840 <nop,nop,timestamp 379374987 56808223> (DF)
14:12:27.434108 216.151.100.246.http > 64.42.115.38.42710: .
1464:2912(1448) ack 385 win 6432 <nop,nop,timestamp 56808229 379374987>
(DF)
14:12:27.434113 216.151.100.246.http > 64.42.115.38.42710: .
2912:4360(1448) ack 385 win 6432 <nop,nop,timestamp 56808229 379374987>
(DF)
14:12:27.448461 64.42.115.38.42710 > 216.151.100.246.http: . ack 1464 win
8688 <nop,nop,timestamp 379374988 56808224> (DF)
14:12:27.448496 216.151.100.246.http > 64.42.115.38.42710: .
4360:5808(1448) ack 385 win 6432 <nop,nop,timestamp 56808230 379374988>
(DF)
14:12:27.448501 216.151.100.246.http > 64.42.115.38.42710: .
5808:7256(1448) ack 385 win 6432 <nop,nop,timestamp 56808230 379374988>
(DF)
14:12:27.500357 64.42.115.38.42710 > 216.151.100.246.http: . ack 4360 win
14480 <nop,nop,timestamp 379374993 56808229> (DF)
14:12:27.510547 64.42.115.38.42710 > 216.151.100.246.http: . ack 5808 win
17376 <nop,nop,timestamp 379374994 56808230> (DF)
14:12:27.514784 64.42.115.38.42710 > 216.151.100.246.http: . ack 7256 win
20272 <nop,nop,timestamp 379374994 56808230> (DF)
14:12:27.559938 64.42.115.38.42710 > 216.151.100.246.http: . ack 8704 win
23168 <nop,nop,timestamp 379374999 56808235> (DF)

Just out of curiousity, is CIP a Windows client or do you have bandwidthlimitation somewhere on a router?

14:12:27.564873 64.42.115.38.42710 > 216.151.100.246.http: F 385:385(0)
ack 9449 win 26064 <nop,nop,timestamp 379374999 56808235> (DF)
14:12:27.564891 216.151.100.246.http > 64.42.115.38.42710: . ack 386 win
6432 <nop,nop,timestamp 56808242 379374999> (DF)

I'm not familiar with the ipvs debug output.  I set debug_level to 7.  A
lot of the output looks truncated for some reason.  I don't know what
corresponds to my connection so here's a chunk of what's currently going
on.

Your ringbuffer is full and the kernel cannot log the stuff anymore.Either you do have a lot of incoming connections/sec or you logsomething else like hell or you have a 80486 as a director. Try loweringdebug_level to 4 or so and see if it's still truncated. Do you runsyslogd/klogd or syslog-ng?

Feb 27 16:58:11 lb kernel: TCP 65.58.153.251:1255->216.151.100.246:80 hitFeb 27 16:58:13 lb kernel: .151.100.253:80 activeconns 316 refcnt 9447weight 1 Feb 27 16:58:18 lb kernel: ip_vs_rr_schedule(): Scheduling...Feb 27 16:58:19 lb kernel: _service fwm 0 TCP 216.151.100.246:80 hitFeb 27 16:58:21 lb kernel: 0.252:80 activeconns 333 refcnt 9371 weight 1Feb 27 16:58:22 lb kernel: 6.151.100.246:80 hitFeb 27 16:58:23 lb kernel: : look/in TCP66.99.108.21:1285->216.151.100.246:80 hit

Sorry, this doesn't help too much like that, I see that connections arebeing forwarded correctly. And the other problem is, that this is a casewhere everything works. I need this output when your problem shows up.

I hope this helps.  I don't have any output from when the problem is
occuring, but I'm sure I will see it again.  I'll be sure to note the
debugging information and record some tcpdump info.  Let me know if you
need anything else.

The arp's I think to verify your setup. Not that we don't trust you, butmore eyes are always better ;)


Best regards,
Roberto Nibali, ratz

<Prev in Thread]	Current Thread	[Next in Thread>
Very strange behavior, Hayden Myers Re: Very strange behavior, Roberto Nibali Re: Very strange behavior, Joseph Mack Re: Very strange behavior, Hayden Myers Re: Very strange behavior, Hayden Myers Re: Very strange behavior, Julian Anastasov Re: Very strange behavior, Roberto Nibali <= Re: Very strange behavior, Wayne Re: Very strange behavior, Hayden Myers

Previous by Date:	Re: Gateway between CIP and VIP on LVS-NAT., Carlos J. Ramos
Next by Date:	Re: Gateway between CIP and VIP on LVS-NAT., Joseph Mack
Previous by Thread:	Re: Very strange behavior, Julian Anastasov
Next by Thread:	Re: Very strange behavior, Wayne
Indexes:	[Date] [Thread] [Top] [All Lists]