
Re: Kernel sync daemon causing lockups?

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Kernel sync daemon causing lockups?
From: Wensong Zhang <wensong@xxxxxxxxxxxx>
Date: Mon, 29 Jul 2002 22:02:20 +0800 (CST)

Hi,

I think I made a mistake in assuming that the ipvs sync kernel thread cannot
be preempted by the softirqd kernel thread. Actually, after reading some
of the kernel code, I see that it is possible: the ipvs sync kernel thread
may be interrupted, and some interrupt handlers may call wakeup_softirqd(cpu)
to wake up the softirq daemon.

Please try the attached diff (just apply it to ip_vs_sync.c). I hope it
will fix the lockup problem in the sync daemon. Please let me know whether
it works.

Thanks,

Wensong


On Mon, 29 Jul 2002, Bradley McLean wrote:

> We've just upgraded our LBs from running lvs 1.02 to 1.04, and
> keepalived 0.5.6 to 0.6.8.
> 
> All addresses below omit the non-routable 192.168 prefix
> (e.g. .100.4 is 192.168.100.4).
> 
> As part of the upgrade, we enabled the sync daemon again.  We'd
> disabled it in the past because it was suspected of causing the
> systems to hang.
> 
> Hardware:  Dell 2450, 600 MHz P3, 128 MB, DE570TX quad NIC.
> OS:  RH7.2 w/ kernel.org 2.4.18 kernel.
> 
> Old configuration:  LVS-NAT, 4096 connections, 1800 persistence.
> LB1: eth0: .100.4  eth1: .110.4  eth2: .120.4  eth3: .130.4
> LB2: eth0: .100.5  eth1: .110.5  eth2: .120.5  eth3: .130.5
> (eth0 admin, eth1 outside, eth2 inside, eth3 syncdaemon).
> VIP: .110.10  RIPs in the .120.x net.  Sync daemon alone on .130
> via crossover cable.
> 
> New configuration:  LVS-DR, 4096 connections, 900 persistence.
> LB1: eth0: .100.4  eth1: .110.4  eth2: .120.4  eth3: .130.4
> LB2: eth0: .100.5  eth1: .110.5  eth2: .120.5  eth3: .130.5
> (eth0 admin, eth1 outside/inside, eth2 unused, eth3 syncdaemon)
> VIP: .110.10  RIPs in the .110.x net.  Sync daemon alone on .130
> via crossover cable.
> 
> Symptom:  After running well for 24-72 hours, primary load balancer
> locks hard.  No kernel messages, no keyboard, no mouse, no ping.
> System reset required.  Secondary load balancer runs for an additional
> 4-12 hours, then fails as well.  Shut off the sync daemon, and they
> run well forever (or at least months at a time).
> 
> We've shut off the sync daemon again.
> 
> Anybody else see this?  What other information can I provide?
> 
> regards,
> 
> -Brad
> 

Attachment: ip_vs_sync.c.diff
Description: Text document
