Re: heartbeat node taking over resources upon reboot

To:	"LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject:	Re: heartbeat node taking over resources upon reboot
From:	Roberto Nibali <ratz@xxxxxxxxxxxx>
Date:	Fri, 10 Nov 2006 19:55:00 +0100

Hello,

Every time I reboot the active node, it comes back as the backup as normal,
but then it suddenly declares itself dead, says it has no local heartbeat (???) and restarts. While it's restarting it happily declares the other node
dead as well and (I guess) starts taking over the resources. As a result,
every connected client gets disconnected.

Sounds like timing issues. This is also a typical question for the linux-ha mailing list, where people can normally give you appropriate answers in a shorter time than here.

I also see that it says somewhere "Deadtime value may be too small", but in
normal production I don't see any 'late heartbeats' or such, so I did
not change them. My ha.cf:

udpport 694
logfacility local0
keepalive 75ms
deadtime 300ms
warntime 200ms

Your timings are absolutely crazy. This will only work in the lab. Also, there's no point in having such a snappy system when deploying LVS, especially if you configure template synchronisation.


http://www.linux-ha.org/ha.cf/DeadtimeDirective
http://www.linux-ha.org/FAQ#heavy_load

initdead 60
mcast eth1 224.1.2.3 694 1 0
auto_failback off
node rpzlvs05 rpzlvs06

My question is, should I really go experiment with the *time values again,
or is it something else?

In my opinion you should set those values to something saner. Also note that even though the kernel ticks at between 100Hz and 1000Hz, there is no guarantee that user space gets assigned 10ms to 1ms slices, especially during boot-up, where we have a fork-bomb situation with all the daemons starting and writing their shit on the platter. Unless you run a hard RT-enabled kernel, you get blocking I/O peaks in the high 100ms range.

I would be surprised if setting a higher deadtime does not fix your issues; then again, the experts are next door on the linux-ha mailing list.
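
To make that concrete: with keepalive 75ms and deadtime 300ms, roughly four missed heartbeats in a row are enough to declare a node dead, and a single I/O stall in the high 100ms range already eats most of that budget. Below is a rough sketch of more relaxed timings for your ha.cf. The numbers are only an illustration in the range I would start from, not values tuned or tested for your cluster, so treat every one of them as an assumption and check them against the linux-ha documentation.

```conf
# Illustrative timings only; adjust to your own load and failover requirements.
# A bare number is interpreted as seconds, an "ms" suffix as milliseconds.

# Send a heartbeat every 2 seconds.
keepalive 2

# Log a "late heartbeat" warning after 10 seconds of silence.
warntime 10

# Declare the peer dead after 30 seconds of silence.
deadtime 30

# Grace period after boot before deadtime is enforced
# (assumed here to be well above deadtime).
initdead 120
```

If the "late heartbeat" warnings stay away under these settings, you can lower warntime and deadtime step by step and watch the logs; the warnings tell you how much headroom you really have.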


HTH and best regards,
Roberto Nibali, ratz
--

echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
