
heartbeat not able to start local resources

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx, ultramonkey-users@xxxxxxxxxxxxxxx
Subject: heartbeat not able to start local resources
From: Jiang <bearie66@xxxxxxxxx>
Date: Thu, 29 Jun 2006 12:23:03 +0800
Hi,

I recently upgraded heartbeat to version 2.0.2, with these packages installed:
heartbeat-stonith-2.0.2-1
heartbeat-pils-2.0.2-1
heartbeat-2.0.2-1
heartbeat-ldirectord-1.2.3-2.rh.el.3.0

In haresources (without crm), I have the following configuration:

****************************************
SMSCONV11 \
       172.16.1.80 \
       172.16.1.81 \
       ldirectord::/etc/ha.d/ldirectord.cf

SMSCONV11 interopOB1

SMSCONV12 interopOB2
****************************************
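For readers unfamiliar with the haresources format: each logical line names the preferred node followed by the resources it owns, a trailing backslash continues the line, a bare IP address is handled by the IPaddr resource script, and `script::arg` passes an argument to a resource script. The Python sketch below parses the configuration above into resource groups. The function and class names are hypothetical (not part of heartbeat), and it assumes heartbeat identifies a group by its first resource, which is consistent with the `takegroup interopOB2` seen in the ps output.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    node: str                               # preferred node for this group
    key: str                                # first resource, assumed to name the group
    resources: list[tuple[str, list[str]]]  # (script, args) in start order


def parse_resource(token: str) -> tuple[str, list[str]]:
    """Split 'script::arg1::arg2' into (script, [args]); a bare IP maps to IPaddr."""
    try:
        ipaddress.ip_address(token)
        return ("IPaddr", [token])
    except ValueError:
        pass
    script, *args = token.split("::")
    return (script, args)


def parse_haresources(text: str) -> list[ResourceGroup]:
    """Parse haresources text into resource groups, one per logical line."""
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing blanks
        if line.endswith("\\"):
            pending += line[:-1] + " "        # continuation: keep accumulating
            continue
        logical_lines.append(pending + line)
        pending = ""
    if pending:
        logical_lines.append(pending)         # text ended mid-continuation
    groups = []
    for line in logical_lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue                          # blank line or node without resources
        node, *resources = tokens
        groups.append(ResourceGroup(node=node, key=resources[0],
                                    resources=[parse_resource(t) for t in resources]))
    return groups


# The configuration from this post; a raw string keeps each "\" before the newline.
HARESOURCES = r"""SMSCONV11 \
       172.16.1.80 \
       172.16.1.81 \
       ldirectord::/etc/ha.d/ldirectord.cf

SMSCONV11 interopOB1

SMSCONV12 interopOB2
"""

if __name__ == "__main__":
    for group in parse_haresources(HARESOURCES):
        print(group.node, group.key, group.resources)
```

Read this way, SMSCONV11 is the preferred node for two independent groups (the IP/ldirectord group and interopOB1), while interopOB2 belongs to SMSCONV12 and is only taken over when that node is dead.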

With heartbeat not running on SMSCONV12, I tried to start heartbeat on
SMSCONV11, but /var/log/messages shows the process getting stuck after
it brings up interopOB2:

****************************************
Jun 29 12:03:45 SMSCONV11 heartbeat: [1202]: WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: node smsconv12: is dead
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Local status now set to: 'active'
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Starting child client "/usr/lib/heartbeat/ipfail" (1001,104)
Jun 29 12:04:00 SMSCONV11 heartbeat: [1209]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 1001  gid 104 (pid 1209)
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: No STONITH device configured.
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: Shared disks are not protected.
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Resources being acquired from smsconv12.
Jun 29 12:04:00 SMSCONV11 harc[1210]: info: Running /etc/ha.d/rc.d/status status
Jun 29 12:04:00 SMSCONV11 mach_down[1239]: info: Taking over resource group interopOB2
Jun 29 12:04:00 SMSCONV11 heartbeat: [1211]: info: Local Resource acquisition completed.
Jun 29 12:04:00 SMSCONV11 ResourceManager[1299]: info: Acquiring resource group: smsconv12 interopOB2
Jun 29 12:04:00 SMSCONV11 ResourceManager[1299]: info: Running /etc/init.d/interopOB2  start
Jun 29 12:04:10 SMSCONV11 heartbeat: [1198]: info: Local Resource acquisition completed. (none)
Jun 29 12:04:10 SMSCONV11 heartbeat: [1198]: info: local resource transition completed.
****************************************

ps -e shows ResourceManager still waiting for something:
****************************************
1117 pts/10   00:00:00 ha_logd
1118 pts/10   00:00:00 ha_logd
1198 ?        00:00:00 heartbeat
1201 ?        00:00:00 heartbeat
1202 ?        00:00:00 heartbeat
1203 ?        00:00:00 heartbeat
1204 ?        00:00:00 heartbeat
1205 ?        00:00:00 heartbeat
1206 ?        00:00:00 heartbeat
1207 ?        00:00:00 heartbeat
1209 ?        00:00:00 ipfail
1210 ?        00:00:00 status
1239 ?        00:00:00 mach_down
1299 ?        00:00:00 ResourceManager
1343 ?        00:00:00 ResourceManager
****************************************

[root@SMSCONV11 init.d]# ps -ef | grep Re
root      1299  1239  0 12:04 ?        00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
root      1343  1299  0 12:04 ?        00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
****************************************
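The `ps -ef` output above shows that process 1343 is a child of 1299, which in turn is a child of 1239 (`mach_down` in the `ps -e` listing), suggesting the outer `takegroup interopOB2` is still waiting on the inner one. The Python sketch below rebuilds that parent chain from `ps -ef` text. The helper names are hypothetical, and it assumes the standard eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) with each process on a single line.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> dict[int, Proc]:
    """Map PID -> Proc from `ps -ef` output (UID PID PPID C STIME TTY TIME CMD)."""
    procs = {}
    for line in text.splitlines():
        fields = line.split(None, 7)  # CMD may contain spaces, so split at most 7 times
        if len(fields) < 8 or not fields[1].isdigit():
            continue                  # header, blank, or otherwise malformed line
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        procs[pid] = Proc(pid, ppid, cmd)
    return procs


def ancestry(procs: dict[int, Proc], pid: int) -> list[Proc]:
    """Follow PPID links upward while the parent is present in the listing."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        chain.append(procs[pid])
        pid = procs[pid].ppid
    return chain


# The two lines from the `ps -ef | grep Re` output in this post.
PS_EF = """\
root      1299  1239  0 12:04 ?        00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
root      1343  1299  0 12:04 ?        00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
"""

if __name__ == "__main__":
    for proc in ancestry(parse_ps_ef(PS_EF), 1343):
        print(proc.pid, "parent", proc.ppid, proc.cmd)
```

Feeding it the full, unfiltered `ps -ef` output instead of the grep result would extend the chain up through `mach_down` to the heartbeat master process.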

It then never carries on to start interopOB1 and ldirectord on
SMSCONV11. I had the same problem previously with heartbeat version
1.2.3, but someone told me it had been fixed in 1.2.4. Since I can't get
1.2.4 RPMs for RH EL3, I upgraded to heartbeat 2.0.2 instead.

Is this a known bug in heartbeat 2.0.2 as well? How can I fix it, or
have I configured something wrongly?

Your help is highly appreciated!!

- Jiang -
