Re: problems after failover

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: problems after failover
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Sun, 05 Feb 2006 22:39:12 +0100
Hello,

 This Friday the director node went down and the backup director took over
 the resources, including drbd.

Linux-HA is nice, isn't it?

 The problem is that now, after restarting the primary directord, the shared disk
 does not seem to be synchronized.

What do you mean by "restarted primary directord"? Did you fail back to the master, or did you start the directord resource on the master?

 I get disklessclient....Inconsistent in the old primary director. And
 ServerforDless....Consistent in the old backup director (now the primary
 after failover) What is happening?

Please show the actual output of /proc/drbd (or whichever entry it is, as I don't know it by heart) showing the inconsistent behaviour. Also, reading this, I get the impression that you might be better served on the linux-ha-users mailing list.
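
A minimal sketch of how that output could be collected for a list post, assuming the status file is /proc/drbd as guessed above. The function name read_drbd_status is hypothetical, and the "cs:" filter is only an assumption based on the status lines quoted later in this thread:

```python
from pathlib import Path


def read_drbd_status(path="/proc/drbd"):
    """Return the per-device status lines (those containing 'cs:').

    The default path is an assumption; pass another path if your DRBD
    version exposes its status elsewhere.
    """
    status_file = Path(path)
    if not status_file.exists():
        # DRBD module not loaded, or this is not a DRBD node.
        return []
    lines = status_file.read_text().splitlines()
    return [line.strip() for line in lines if "cs:" in line]


if __name__ == "__main__":
    for status_line in read_drbd_status():
        print(status_line)
```

Running this on both directors and posting the unedited result gives the list the same view of the connection state, roles, and disk consistency that you have.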

 I need to get back to the old
 configuration. How can I synchronize both disks?

Please share your linux-ha configuration and explain what you mean by "old configuration".

I've read the list, and this is what I did:
I stopped drbd on the old director node and then started it again.

Why? Doing that you've probably disabled a crucial service during runtime. Since you've not told us what exactly you share over your DRBD it's difficult to tell.

Then, watching the status, drbd noticed that there was some MB to resync and started to

This does not make much sense.

synchronize, but suddenly the sync process stopped, and what I get now is:
  in old director: cs:WFConnection   st:Primary/Unknown   ld:Consistent
  in new director: cs:NetworkFailure st:Secondary/Unknown ld:Inconsistent

Check your heartbeat and your interface configuration and your linux-ha log files. It looks like your heartbeat network is broken; possibly the reason for the failover.
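
To make the reading of those two status lines explicit, here is a rough sketch that splits out the cs/st/ld fields and flags anything suspicious. The helper names parse_status and diagnose are hypothetical, and treating "Connected" as the only healthy connection state is an assumption rather than something stated in this thread:

```python
import re


def parse_status(line):
    """Extract the cs, st and ld fields from one DRBD status line."""
    return dict(re.findall(r"\b(cs|st|ld):(\S+)", line))


def diagnose(line):
    """Return a list of human-readable problems found in a status line."""
    fields = parse_status(line)
    problems = []
    # Assumption: "Connected" is the healthy connection state.
    if fields.get("cs") != "Connected":
        problems.append("peer link not established (cs:%s)" % fields.get("cs"))
    if fields.get("st", "").endswith("/Unknown"):
        problems.append("peer role unknown, the nodes cannot see each other")
    if fields.get("ld") == "Inconsistent":
        problems.append("local disk is inconsistent and needs a full resync")
    return problems


if __name__ == "__main__":
    samples = [
        "in old director: cs:WFConnection   st:Primary/Unknown   ld:Consistent",
        "in new director: cs:NetworkFailure st:Secondary/Unknown ld:Inconsistent",
    ]
    for sample in samples:
        print(sample)
        for problem in diagnose(sample):
            print("  - " + problem)
```

Both lines report the peer as Unknown, which is why the network between the nodes is the first thing to check.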

So it seems to be a problem with the net between both nodes, doesn't it?

Yes.

I tried to change the network that drbd uses to synchronize the disks (by changing /etc/drbd.conf), but if I change it on the new director, should I restart drbd?

Yes.

How could this affect the data?

Depends how you restart drbd.

The cluster nodes mount a directory that is on the shared disks and is heavily used. Could this be a problem?

Local disks?

Please, I need some help with this problem (this cluster is in production).

Debug your network and check the log file entries. Heartbeat has certainly logged interesting information regarding this incident.

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.

Please drop such email statements, since this is legally difficult.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
