Hi,
I recently upgraded heartbeat to version 2.0.2, with the following
packages installed:
heartbeat-stonith-2.0.2-1
heartbeat-pils-2.0.2-1
heartbeat-2.0.2-1
heartbeat-ldirectord-1.2.3-2.rh.el.3.0
In haresources (without crm), I have the following configuration:
****************************************
SMSCONV11 \
172.16.1.80 \
172.16.1.81 \
ldirectord::/etc/ha.d/ldirectord.cf
SMSCONV11 interopOB1
SMSCONV12 interopOB2
****************************************
With heartbeat not running on SMSCONV12, I tried to start heartbeat on
SMSCONV11, but according to /var/log/messages the process seems to get
stuck after bringing up interopOB2:
****************************************
Jun 29 12:03:45 SMSCONV11 heartbeat: [1202]: WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: node smsconv12: is dead
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Local status now set to: 'active'
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Starting child client "/usr/lib/heartbeat/ipfail" (1001,104)
Jun 29 12:04:00 SMSCONV11 heartbeat: [1209]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 1001 gid 104 (pid 1209)
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: No STONITH device configured.
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: WARN: Shared disks are not protected.
Jun 29 12:04:00 SMSCONV11 heartbeat: [1198]: info: Resources being acquired from smsconv12.
Jun 29 12:04:00 SMSCONV11 harc[1210]: info: Running /etc/ha.d/rc.d/status status
Jun 29 12:04:00 SMSCONV11 mach_down[1239]: info: Taking over resource group interopOB2
Jun 29 12:04:00 SMSCONV11 heartbeat: [1211]: info: Local Resource acquisition completed.
Jun 29 12:04:00 SMSCONV11 ResourceManager[1299]: info: Acquiring resource group: smsconv12 interopOB2
Jun 29 12:04:00 SMSCONV11 ResourceManager[1299]: info: Running /etc/init.d/interopOB2 start
Jun 29 12:04:10 SMSCONV11 heartbeat: [1198]: info: Local Resource acquisition completed. (none)
Jun 29 12:04:10 SMSCONV11 heartbeat: [1198]: info: local resource transition completed.
****************************************
ps -e shows ResourceManager still waiting for something:
****************************************
1117 pts/10 00:00:00 ha_logd
1118 pts/10 00:00:00 ha_logd
1198 ? 00:00:00 heartbeat
1201 ? 00:00:00 heartbeat
1202 ? 00:00:00 heartbeat
1203 ? 00:00:00 heartbeat
1204 ? 00:00:00 heartbeat
1205 ? 00:00:00 heartbeat
1206 ? 00:00:00 heartbeat
1207 ? 00:00:00 heartbeat
1209 ? 00:00:00 ipfail
1210 ? 00:00:00 status
1239 ? 00:00:00 mach_down
1299 ? 00:00:00 ResourceManager
1343 ? 00:00:00 ResourceManager
****************************************
[root@SMSCONV11 init.d]# ps -ef | grep Re
root 1299 1239 0 12:04 ? 00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
root 1343 1299 0 12:04 ? 00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
****************************************
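To make the process relationship explicit, here is a small Python
sketch that walks the parent chain from the ps -ef rows above. The
names parse_ps_ef and parent_chain are my own and not part of
heartbeat, and the script assumes the usual "UID PID PPID C STIME TTY
TIME CMD" column order. It is only meant to illustrate that 1343 is a
child of 1299, which in turn is a child of 1239 (mach_down in the
ps -e listing).

```python
# Sketch only: trace child -> parent relationships from `ps -ef` rows.
# parse_ps_ef and parent_chain are illustrative names, not heartbeat tools.

def parse_ps_ef(text):
    """Map pid -> (ppid, command), assuming UID PID PPID C STIME TTY TIME CMD."""
    procs = {}
    for line in text.strip().splitlines():
        fields = line.split(None, 7)  # keep the whole command as one field
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip a header row or anything malformed
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def parent_chain(procs, pid):
    """Return [pid, parent, grandparent, ...] for as long as parents are known."""
    chain = [pid]
    while pid in procs:
        pid = procs[pid][0]
        if pid in chain:  # guard against a malformed, cyclic table
            break
        chain.append(pid)
    return chain


# The two rows from the ps -ef output above, with the wrapped lines joined.
PS_EF = """\
root 1299 1239 0 12:04 ? 00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
root 1343 1299 0 12:04 ? 00:00:00 /bin/sh /usr/lib/heartbeat/ResourceManager takegroup interopOB2
"""

procs = parse_ps_ef(PS_EF)
chain = parent_chain(procs, 1343)
print(" -> ".join(str(p) for p in chain))  # child -> parent -> grandparent
```

The chain stops at 1239 because that is the last parent PID visible in
the rows I captured.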
After that, it never carries on to start interopOB1 and ldirectord on
SMSCONV11. I previously had the same problem with heartbeat version
1.2.3, and someone told me it had been fixed in 1.2.4. Since I couldn't
get 1.2.4 RPMs for RH EL3, I upgraded to heartbeat 2.0.2 instead.
Is this a known bug in heartbeat 2.0.2 as well? How can I fix it, or
have I configured something wrongly?
Your help is highly appreciated!!
- Jiang -