Re: LVS + DRBD

To:	"LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject:	Re: LVS + DRBD
From:	Roberto Nibali <ratz@xxxxxxxxxxxx>
Date:	Mon, 20 Nov 2006 22:50:03 +0100

I've got a pair of fileservers running DRBD using heartbeat and NFS
(active/passive).  Seeing as how these two NFS servers serve the entire
office, I figured I'd also put Nagios onto this pair (not on the NFS
filesystem) since if they were both down, then everything would be out of
service (so there wouldn't be much point monitoring everything being down ;)
).

Unless the monitoring caches local health status for traceability, but installing executables on an NFS partition is suicide anyway :).

The problem I am having seems to be that now that I've got the LVS setup and
running, DRBD will no longer start.  Looking through the /var/log/ha-debug

Those have absolutely nothing in common, so I suspect either a software configuration problem or a heartbeat problem. Since heartbeat is rock solid, I'll jump right at your configuration.

Sidenote: The linux-ha mailing list is full of experts regarding Linux-HA issues. I understand why you've posted this here, but we might need a bit longer to help you out.

logs, it appears that none of the DRBD commands ever get a proper start
command.  My haresources file looks like this:

mimir.yggdrasil \
        drbddisk::clients0 \
        Filesystem::/dev/drbd0::/opt/mnt/data::ext3 \
        killnfsd \
        nfs \
        nfslock \
        Delay::3::0 \
        IPaddr2::10.0.0.3/24/eth2/10.0.0.255
mimir.yggdrasil \
        ldirectord \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.0.3/24/eth1/192.168.0.255

The second mimir.yggdrasil does not make sense to me at first sight. What's the purpose of it? Why is killnfsd needed and what does it look like? Also, I'd put the IPaddr2 resource first, because other daemons might need it, so they don't need to lazy bind.
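As a rough illustration of that reordering, the first section with IPaddr2 moved to the front could look like the fragment below. This is only a sketch and not a tested configuration: every resource and argument is copied from the haresources quoted above, and only the position of IPaddr2 changes.

```
mimir.yggdrasil \
        IPaddr2::10.0.0.3/24/eth2/10.0.0.255 \
        drbddisk::clients0 \
        Filesystem::/dev/drbd0::/opt/mnt/data::ext3 \
        killnfsd \
        nfs \
        nfslock \
        Delay::3::0
```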

After the Delay command, everything in the first section shuts down again.


What's the first section?

Is it not possible to have multiple sections in haresources?  Should
everything be combined into the one section?


It's possible to have multiple node configurations, like so:

node-A \
  res-1 \
  res-2 \
  res-3

node-B \
  res-4 \
  res-5

 My ha-debug log looks like this:
heartbeat: 2006/11/12_01:21:23 debug: StartNextRemoteRscReq(): child count 1
heartbeat: 2006/11/12_01:21:23 debug: Starting /etc/ha.d/resource.d/drbddisk
clients0 start
heartbeat: 2006/11/12_01:21:23 debug: /etc/ha.d/resource.d/drbddisk clients0
start done. RC=0


Very good.

heartbeat: 2006/11/12_01:21:23 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt/mnt/data ext3 start
heartbeat: 2006/11/12_01:21:23 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /opt/mnt/data ext3 start done. RC=0


Very good.

nfsd: no process killed
heartbeat: 2006/11/12_01:21:23 debug: Starting /etc/ha.d/resource.d/killnfsd
start
nfsd: no process killed
heartbeat: 2006/11/12_01:21:23 debug: /etc/ha.d/resource.d/killnfsd  start
done. RC=1

Not so good, wrong return code. So heartbeat has a resource problem and will shut down the node ... in reverse order of the last semantically correct resource configuration item, which is:

heartbeat: 2006/11/12_01:21:24 debug: Starting /etc/ha.d/resource.d/IPaddr
10.0.0.3/24/eth2 stop
heartbeat: 2006/11/12_01:21:24 debug: /etc/ha.d/resource.d/IPaddr
10.0.0.3/24/eth2 stop done. RC=0
heartbeat: 2006/11/12_01:21:24 debug: Starting /etc/ha.d/resource.d/Delay 3
0 stop
Delay already stopped
heartbeat: 2006/11/12_01:21:24 debug: /etc/ha.d/resource.d/Delay 3 0 stop
done. RC=0


So far so good (maybe not for you, but for heartbeat)

heartbeat: 2006/11/12_01:21:24 debug: Starting /etc/init.d/nfslock  stop
Stopping NFS locking: [FAILED]
Stopping NFS statd: [FAILED]
heartbeat: 2006/11/12_01:21:24 debug: /etc/init.d/nfslock  stop done. RC=0

I'm not yet sure why you need this nfslock stuff and especially the killnfsd.
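If killnfsd is only meant to kill stray nfsd processes, a minimal sketch of a version that cannot fail with RC=1 might look like this. The real script was not posted, so the function name and structure here are hypothetical; the only point is that "no process killed" must not turn into a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/ha.d/resource.d/killnfsd (a sketch only;
# the original script was not shown in this thread).

killnfsd_resource() {
    case "$1" in
        start|stop)
            # killall exits non-zero when no nfsd process exists. That is
            # not an error for this resource, so the status is discarded.
            killall -9 nfsd >/dev/null 2>&1 || true
            return 0
            ;;
        *)
            # Any other action is treated as a no-op in this sketch.
            return 0
            ;;
    esac
}

# When installed as a resource script, the file would end with:
#   killnfsd_resource "$1"; exit $?
```

Running the script by hand with start and then stop, followed by `echo $?` each time, shows the same return code heartbeat would see.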

heartbeat: 2006/11/12_01:21:24 debug: Starting /etc/init.d/nfs  stop
Shutting down NFS mountd: [FAILED]
Shutting down NFS daemon: [FAILED]
Shutting down NFS quotas: [FAILED]
Shutting down NFS services:  [  OK  ]
heartbeat: 2006/11/12_01:21:24 debug: /etc/init.d/nfs  stop done. RC=0


Seems to have worked wonderfully.

heartbeat: 2006/11/12_01:21:24 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:24 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1

Wrong return value again, but this time in the resource release state of heartbeat. This means we will try again ...

heartbeat: 2006/11/12_01:21:25 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed


... and again ...

heartbeat: 2006/11/12_01:21:25 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:26 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:26 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:27 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:27 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:28 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:28 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:29 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:29 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:30 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:30 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:31 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:31 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:32 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:32 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:33 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:33 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
heartbeat: 2006/11/12_01:21:34 debug: Starting /etc/ha.d/resource.d/killnfsd
stop
nfsd: no process killed
heartbeat: 2006/11/12_01:21:34 debug: /etc/ha.d/resource.d/killnfsd  stop
done. RC=1
nfsd: no process killed

Boah, heartbeat got tired of being messed around with and tries something new to irritate the user :)

heartbeat: 2006/11/12_01:21:34 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt/mnt/data ext3 stop


Fine, next in list to stop.

heartbeat: 2006/11/12_01:21:35 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /opt/mnt/data ext3 stop done. RC=0
heartbeat: 2006/11/12_01:21:35 debug: Starting /etc/ha.d/resource.d/drbddisk
clients0 stop
heartbeat: 2006/11/12_01:21:35 debug: /etc/ha.d/resource.d/drbddisk clients0
stop done. RC=0

Ok, all resources are now released. Now we enter the undefined area of haresources parsing, the second node configuration, which is actually the same node, but with new resources, such as:

ldirectord is stopped for /etc/ha.d/ldirectord.cf
heartbeat: 2006/11/12_01:21:35 debug: Starting /etc/init.d/ldirectord  start
Starting ldirectord [  OK  ]
heartbeat: 2006/11/12_01:21:36 debug: /etc/init.d/ldirectord  start done.
RC=0


Which works fine.

heartbeat: 2006/11/12_01:21:36 debug: Starting
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start
heartbeat: 2006/11/12_01:21:36 debug: /etc/ha.d/resource.d/LVSSyncDaemonSwap
master start done. RC=0


Which works fine also.

heartbeat: 2006/11/12_01:21:36 debug: Starting /etc/ha.d/resource.d/IPaddr2
192.168.0.3/24/eth1/192.168.0.255 start

Which works fine and is also the last resource to be started in the second mimir node configuration. The node mimir has now successfully released its resources and the node mimir (same machine) has taken over with its resources.

heartbeat: 2006/11/12_01:22:17 debug: Received standby message done from
mimir.yggdrasil in state 0
heartbeat: 2006/11/12_01:22:17 debug: RscMgmtProc 'go_standby' exited code 0

You've just shown a startup, shutdown, standby and resource acquisition performed on a single node. I've never tried that before but I doubt it's what you intended to do.

Can you show your ha.cf? What are the names of your two nodes? Right now they are mimir.yggdrasil and mimir.yggdrasil.
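For comparison, a two-node ha.cf normally carries one node line per machine, each matching that machine's `uname -n`. The fragment below is only a sketch of that shape: the timing values, the heartbeat interface and the second node name are placeholders, not taken from your setup.

```
# /etc/ha.d/ha.cf (sketch; all values are placeholders)
keepalive 2
deadtime 30
# hypothetical heartbeat interface
bcast eth0
auto_failback on
node mimir.yggdrasil
# hypothetical name of the second node
node other.yggdrasil
```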


Best regards,
Roberto Nibali, ratz
--

echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
