We've just upgraded our load balancers (LBs) from LVS 1.02 to 1.04,
and keepalived from 0.5.6 to 0.6.8.
All addresses below are in the non-routable 192.168.x.x range; only the
last two octets are shown.
As part of the upgrade, we enabled the sync daemon again. We'd
disabled it in the past because it was suspected of causing the
systems to hang.
Hardware: Dell 2450, 600 MHz P3, 128 MB RAM, DE570TX quad NIC.
OS: RH7.2 w/ kernel.org 2.4.18 kernel.
Old configuration: LVS-NAT, 4096 connections, 1800 persistence.
LB1: eth0: .100.4 eth1: .110.4 eth2: .120.4 eth3: .130.4
LB2: eth0: .100.5 eth1: .110.5 eth2: .120.5 eth3: .130.5
(eth0 admin, eth1 outside, eth2 inside, eth3 syncdaemon).
VIP: .110.10; RIPs in the .120.x net. Sync daemon alone on .130
via crossover cable.
New configuration: LVS-DR, 4096 connections, 900 persistence.
LB1: eth0: .100.4 eth1: .110.4 eth2: .120.4 eth3: .130.4
LB2: eth0: .100.5 eth1: .110.5 eth2: .120.5 eth3: .130.5
(eth0 admin, eth1 outside/inside, eth2 unused, eth3 syncdaemon)
VIP: .110.10; RIPs in the .110.x net. Sync daemon alone on .130
via crossover cable.
Symptom: After running well for 24-72 hours, the primary load balancer
locks up hard. No kernel messages, no keyboard, no mouse, no ping.
System reset required. Secondary load balancer runs for an additional
4-12 hours, then fails as well. Shut off the sync daemon, and they
run well forever (or at least months at a time).
We've shut off the sync daemon again.
Anybody else see this? What other information can I provide?
regards,
-Brad