Hi,
My fresh LVS installation starts to malfunction after a few hours.
I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx servers
(static files only, no persistence).
When I start the service everything works as expected. After a few hours one of
the real servers becomes unavailable through the VIP (seemingly at random), even
though ipvsadm shows it as OK.
The real server that stops answering is still polled by its nanny process every
6 seconds and is reachable directly through its real IP.
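For example, a by-hand approximation of nanny's check (a plain GET /, matching
the send/expect strings in lvs.cf below), run from the director against the real
IP, answers normally:
# curl -s -D - -o /dev/null http://82.81.215.139/ | head -1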
The only clue I have found is that instead of 2 nanny processes there are 4
(2 for each real server).
The logs show nothing of interest while pulse is running.
When I stop pulse all the related processes are terminated except the 4
nannies, which I have to kill by hand.
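Cleaning them up is nothing more exotic than this, once pulse has exited and
every nanny still running is therefore a leftover:
# pkill nanny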
Here is a snip from the logs:
Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!
Here is the data I gathered while the setup was malfunctioning:
####### LVS server (no backup server)
CentOS 5.4 - piranha-0.8.4-13.el5 - ipvsadm-1.24-10
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
# cat /etc/sysconfig/ha/lvs.cf
serial_no = 37
primary = 82.81.215.137
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
nat_nmask = 255.255.255.255
debug_level = NONE
virtual Nginx {
active = 1
address = 82.81.215.141 eth0:1
vip_nmask = 255.255.255.224
port = 80
send = "GET / HTTP/1.0\r\n\r\n"
expect = "HTTP"
use_regex = 0
load_monitor = none
scheduler = wlc # supposed to be rr; changed only to test whether the scheduler is the problem, same effect
protocol = tcp
timeout = 6
reentry = 15
quiesce_server = 1
server bweb1.my-domain.com {
address = 82.81.215.138
active = 1
weight = 1
}
server bweb2.my-domain.com {
address = 82.81.215.139
active = 1
weight = 1
}
server bweb3.my-domain.com {
address = 82.81.215.140
active = 0
weight = 1
}
}
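(If I read the config right, quiesce_server = 1 corresponds to the -q that lvsd
passes to nanny, and should make nanny set a failed real server's weight to 0
rather than delete the entry, i.e. roughly:
# ipvsadm -e -t 82.81.215.141:80 -r 82.81.215.139:80 -g -w 0
so a detected failure would still show up in the listing below as weight 0.)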
# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 82.81.215.141:80 wlc
-> 82.81.215.139:80 Route 1 0 0
-> 82.81.215.138:80 Route 1 0 0
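Next time it breaks I will also grab the per-real-server packet counters, to see
whether the director is still sending packets towards the failing server:
# ipvsadm -L -n --stats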
# ps auxw|egrep "nanny|ipv|lvs|pulse"
root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00
/usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00
/usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse
root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00
/usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00
/usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00
/usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
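The two nannies started on Jan 16 (PIDs 2668 and 2670) look like leftovers of an
earlier pulse/lvsd run, while 5812 and 5813 belong to the current lvsd (PID 5798).
If it helps, their parentage can be checked with something like:
# ps -C nanny -o pid,ppid,lstart,args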
####### One of the real servers (the one that stops answering, even though it is
identical to the other)
# arptables -L -n
Chain IN (policy ACCEPT)
target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
DROP    0.0.0.0/0  82.81.215.141   00/00      00/00           any   0000/0000  0000/0000  0000/0000

Chain OUT (policy ACCEPT)
target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
mangle  0.0.0.0/0  82.81.215.141   00/00      00/00           any   0000/0000  0000/0000  0000/0000  --mangle-ip-s 82.81.215.139

Chain FORWARD (policy ACCEPT)
target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
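These are the usual arptables rules for LVS-DR (drop ARP for the VIP on the way
in, rewrite the ARP source back to the RIP on the way out); from memory they were
created with something along these lines:
# arptables -A IN -d 82.81.215.141 -j DROP
# arptables -A OUT -d 82.81.215.141 -j mangle --mangle-ip-s 82.81.215.139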
# ifconfig
eth0 Link encap:Ethernet HWaddr 00:11:25:41:69:A4
inet addr:82.81.215.139 Bcast:82.81.215.159 Mask:255.255.255.224
inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:51144864 (48.7 MiB) TX bytes:251901147 (240.2 MiB)
Interrupt:169 Memory:dcff0000-dd000000
eth0:1 Link encap:Ethernet HWaddr 00:11:25:41:69:A4
inet addr:82.81.215.141 Bcast:82.81.215.159 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:dcff0000-dd000000
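Next time it misbehaves I also plan to run something like this on the failing
real server, to check whether the frames forwarded by the director for the VIP
actually reach eth0:
# tcpdump -ni eth0 host 82.81.215.141 and tcp port 80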
Thanks for any idea that might shed some light on this :)
Best,
Miki
--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------