Re: System crash

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: System crash
From: Sébastien BONNET <sebastien.bonnet@xxxxxxxxxxx>
Date: Mon, 18 Jul 2005 11:09:34 +0200
Hi folks,

Here is the latest news about this (still) unsolved problem. I reactivated IPVS a few days ago, and the machines are crashing every 3 days.

Since Sept. 2004, a few changes have been made: now running 2.6.10, UP instead of SMP. Still using the bcm5700 NIC driver. Kernel args: acpi=off. I've just added pci=noapic (as initially proposed), but it's not in effect yet (the server has not been rebooted).

The serial console is now operational. Unfortunately, Alt-SysRq was also unresponsive after this morning's panic, so I was only able to save one screen height of the kernel dump:

8<--------
si_meminfo+0x1f/0x3b
update_defense_level+0x10/0x372 [ip_vs]
ip_rcv+0x36d/0x3a1
defense_timer_handler+0x0/0x29 [ip_vs]
defense_timer_handler+0x5/0x29 [ip_vs]
run_timer_softirq+0x1ff/0x312
__do_softirq+0x35/0x79
do_softirq+0x38/0x3f
==========
do_IRQ+0x70/0x7a
common_interrupt+0x1a/0x20
do_generic_mapping_read+0x1bc/0x357
nr_blockdev_pages+0xb2/0x135
si_meminfo+0x1f/0x3b
meminfo_read_proc+0x43/0x1cb
buffered_rmqueue+0x1e0/0x203
__alloc_pages+0x2c5/0x2d1
proc_file_read+0xcd/0x1e5
vfs_read+0xb8/0xe4
sys_read+0x3c/0x62
syscall_call+0x7/0xb
Kernel panic - not syncing: fs/block_dev.c:396: spin_lock(fs/block_dev.c:c035d880) already locked by fs/block_dev.c/396
8<--------
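
One possible reading of the dump, offered as a hypothesis rather than a confirmed diagnosis: the frames below the ===== line look like the process context that was interrupted (a read of /proc/meminfo), and the frames above it look like the timer softirq that ran on top of it. The small Python sketch below only parses the frames from the dump (hex prefixes written in lowercase) and lists the functions that appear on both sides of the separator. The names FRAME, split_contexts and reentered are made up for this illustration.

```python
import re

# Frames from the dump above; the ===== line separates the two contexts.
TRACE = """\
si_meminfo+0x1f/0x3b
update_defense_level+0x10/0x372 [ip_vs]
ip_rcv+0x36d/0x3a1
defense_timer_handler+0x0/0x29 [ip_vs]
defense_timer_handler+0x5/0x29 [ip_vs]
run_timer_softirq+0x1ff/0x312
__do_softirq+0x35/0x79
do_softirq+0x38/0x3f
==========
do_IRQ+0x70/0x7a
common_interrupt+0x1a/0x20
do_generic_mapping_read+0x1bc/0x357
nr_blockdev_pages+0xb2/0x135
si_meminfo+0x1f/0x3b
meminfo_read_proc+0x43/0x1cb
buffered_rmqueue+0x1e0/0x203
__alloc_pages+0x2c5/0x2d1
proc_file_read+0xcd/0x1e5
vfs_read+0xb8/0xe4
sys_read+0x3c/0x62
syscall_call+0x7/0xb
"""

# symbol+0xOFFSET/0xSIZE, optionally followed by [module].
# Case-insensitive so that a hand-copied "0X" prefix is accepted too.
FRAME = re.compile(
    r"^(?P<sym>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?$",
    re.IGNORECASE,
)


def split_contexts(trace):
    """Return a list of frame lists, one per block separated by ===== lines."""
    contexts = [[]]
    for raw in trace.splitlines():
        line = raw.strip()
        if line and set(line) == {"="}:
            contexts.append([])          # separator: start a new context
            continue
        match = FRAME.match(line)
        if match:
            contexts[-1].append((match.group("sym"), match.group("mod")))
    return contexts


def reentered(trace):
    """Functions present both above and below the first separator."""
    contexts = split_contexts(trace)
    above = {sym for sym, _ in contexts[0]}
    below = {sym for ctx in contexts[1:] for sym, _ in ctx}
    return sorted(above & below)


if __name__ == "__main__":
    print("functions on both sides:", reentered(TRACE))
    print("ip_vs frames:", [s for s, m in split_contexts(TRACE)[0] if m == "ip_vs"])
```

If this reading is right, si_meminfo was entered a second time from the ip_vs defense timer while the interrupted /proc/meminfo read was still inside it, which would be consistent with the "already locked" message in the panic line. This is only an interpretation of a partial dump; the full dump is still needed to confirm it.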

As one can see, it still points at the ip_vs defense code. Running ksymoops on the above gives nothing useful:

8<--------
[root@martinique ksymoops-2.4.11]# ./ksymoops -m /boot/System.map-2.6.10-1.771_FC2 < ~/ksymoops.log

ksymoops 2.4.11 on i686 2.6.10-1.771_FC2.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.10-1.771_FC2/ (default)
     -m /boot/System.map-2.6.10-1.771_FC2 (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
./ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Kernel panic - not syncing: fs/block_dev.c:396: spin_lock(fs/block_dev.c:c035d880) already locked by fs/block_dev.c/396

1 error issued.  Results may not be reliable.
8<--------

I'm setting up a permanent serial console that records every kernel message. I hope to get a full kernel panic dump in less than 3 days.

If any of you have ideas in the meantime...

Regards,

Sébastien BONNET wrote:
Hi list,

I've been using IPVS 1.0.x for years now and never had any problem.

I recently built a new cluster running Fedora Core 2 and IPVS 1.2.0. Since then, I've been having big problems. Even though the servers are not loaded, the director crashes periodically.

No more network, no more interactive console access... the only thing left is Alt-SysRq! Not a single character is logged (to disk or to the console) that could help :(

Alt-SysRq-p recently showed something related to the ipvs TCP defense in the stack trace. Unfortunately, I wasn't able to read enough of the information, as it scrolls by way too fast.

Right now, I don't know what else to try or test. I've changed the NIC driver, I've stopped the backup director, I've just stopped the sync daemon... and I'm still hoping it won't crash again. If it does crash again, I'll run without ipvs for a few days to see if it's really the cause.

Has anybody already faced such a situation? Any help appreciated.

Kind regards.

[root@frioul root]# uname -a
Linux frioul 2.6.5-1.358smp #1 SMP Sat May 8 09:25:36 EDT 2004 i686 i686 i386 GNU/Linux

[root@hawai root]# ipvsadm -ln
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  a.b.c.78:80 wrr
  -> a.b.c.104:80             Local   981    0          0
TCP  a.b.c.79:80 wrr
  -> a.b.c.104:80             Local   981    0          0
TCP  a.b.c.74:80 wrr
  -> a.b.c.104:80             Local   1000   0          0
TCP  a.b.c.78:443 wrr
  -> a.b.c.104:443            Local   1000   0          0
TCP  a.b.c.79:443 wrr
  -> a.b.c.104:443            Local   1000   0          0
TCP  a.b.c.74:443 wrr
  -> a.b.c.104:443            Local   1000   0          70

Setup: 2 servers with one NIC each, both acting as director and realserver. The forwarding method is Direct Routing, and the MAC (ARP) problem is solved using net.ipv4.conf.{lo,all}.arp_{ignore,announce}. The above ipvsadm output shows the current situation, with only one active server while I investigate the other one.


--
Sébastien BONNET     --    Ingénieur système
Tel: 04.42.25.15.40      GSM: 06.64.44.58.98
