To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Please help - load balancers fail back and forth for no apparent reason
From: mike <mike503@xxxxxxxxx>
Date: Fri, 9 Mar 2007 18:04:31 -0800
Thanks Joe - I don't think the issue is with ldirectord though.

Unless I'm wrong, ldirectord is independent of heartbeat. ldirectord is
just directing the traffic to the virtual servers. Heartbeat is what
seems to be the issue. I don't think ldirectord tells heartbeat "yo,
no virtual services are up, we need to fail over to another load
balancer" - or does it?

Mar  9 10:17:08 lvs02 heartbeat: [31332]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 570 ms (> 510 ms) before being called (GSource: 0x5e74f8)
Mar  9 10:17:08 lvs02 heartbeat: [31332]: info: Gmain_timeout_dispatch: started at 2077565087 should have started at 2077565030
Mar  9 10:17:08 lvs02 heartbeat: [31332]: WARN: Late heartbeat: Node lvs01: interval 12380 ms
Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: node lvs01: is dead
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: Dead node lvs01 gave up resources.
Mar  9 10:17:09 lvs02 ipfail: [31351]: info: Status update: Node lvs01 now has status dead
Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 930 ms (> 510 ms) before being called (GSource: 0x5e6f48)
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: Gmain_timeout_dispatch: started at 2077568611 should have started at 2077568518
Mar  9 10:17:09 lvs02 heartbeat: [31332]: CRIT: Cluster node lvs01 returning after partition.
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: Deadtime value may be too small.
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: See FAQ for information on tuning deadtime.
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: URL: http://linux-ha.org/FAQ#heavy_load
Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: Late heartbeat: Node lvs01: interval 33790 ms
Mar  9 10:17:09 lvs02 heartbeat: [31332]: info: Status update for node lvs01: status active

So it detects that lvs01 is down, then sees it come back up within the
same second, then it restarts heartbeat...

see the log from lvs02:
http://mikehost.com/~mike/tmp/lvs/lvs02-daemon.log
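
A rough sketch of how the late-heartbeat intervals and dispatch delays
could be pulled out of a log like this, so they can be compared against
the deadtime setting. It assumes each log entry sits on a single line.
The function name and the regexes are my own, written only against the
message formats in the excerpt above; they are not part of heartbeat.

```python
import re

# Patterns based only on the two message formats seen in the excerpt.
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")
DELAY_RE = re.compile(r"was delayed (\d+) ms \(> (\d+) ms\)")


def summarize(lines):
    """Return (late, delays): late-heartbeat intervals per node, and
    Gmain_timeout_dispatch delays, all in milliseconds."""
    late = {}
    delays = []
    for line in lines:
        m = LATE_RE.search(line)
        if m:
            late.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = DELAY_RE.search(line)
        if m:
            delays.append(int(m.group(1)))
    return late, delays


# Four entries copied from the excerpt, one entry per line.
sample = [
    "Mar  9 10:17:08 lvs02 heartbeat: [31332]: WARN: Gmain_timeout_dispatch: "
    "Dispatch function for check for signals was delayed 570 ms (> 510 ms) "
    "before being called (GSource: 0x5e74f8)",
    "Mar  9 10:17:08 lvs02 heartbeat: [31332]: WARN: Late heartbeat: "
    "Node lvs01: interval 12380 ms",
    "Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: Gmain_timeout_dispatch: "
    "Dispatch function for send local status was delayed 930 ms (> 510 ms) "
    "before being called (GSource: 0x5e6f48)",
    "Mar  9 10:17:09 lvs02 heartbeat: [31332]: WARN: Late heartbeat: "
    "Node lvs01: interval 33790 ms",
]

late, delays = summarize(sample)
print(late)    # late-heartbeat intervals grouped by node
print(delays)  # dispatch delays in log order
```

To run it over a whole log, pass an open file object to summarize()
instead of the sample list. If the late intervals regularly come close
to or exceed deadtime, that would fit the "Deadtime value may be too
small" warning.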

All those Gmain_timeout_dispatch warnings may be a concern too; I'm
going to google them right now. Again, I don't want to take the easy
way out. I want the most stable way out, and I'd be willing to pay for
someone's time to adjust my configuration.

Thanks,
mike

On 3/9/07, Joseph Mack NA3T <jmack@xxxxxxxx> wrote:
there have been postings with similar symptoms recently
(last month or so), due to some distros packaging an old
version of ldirectord with LVS. Anyhow look in the archives
for problems with ldirectord
