Hello Alexandre,
> La pêche ? :)
Qui pêche avec ratz pêche honorablement!
> a :) nice, and exactly what is needed for Linux! When developing
> networking code such as zebra, VRRP, ... link state monitoring is really
> needed and must be included in the state machine code so as not to
> perturb the network protocol's native functionality.
The problem is that not all NICs support this phy->state checking.
> This means that without NIC link state/hardware state monitoring it can
> introduce a side effect in many protocols. For example in VRRP, say the
> instance is in BACKUP state <=> waiting for remote MASTER adverts. If we
> unplug the BACKUP NIC while the MASTER is still active, then BACKUP will
> not receive MASTER adverts and will deduce that MASTER is down... so it
> will transition to MASTER state ....
This is a common problem with HA frameworks. Either make use of a
STOMITH capability or try to write intelligent server/client
software :). By intelligent I mean that you implement this phy->state
checking in the MASTER thread and in the BACKUP state. If you unplug the
network cable and the situation you described occurs, then make it the
policy that the BACKUP always goes up if it can (provided its phy->state
is not down as well) and that the MASTER always shuts down, independent
of the phy->state. The MASTER becomes active again once it gets at least
two solicitation link beats from the BACKUP (now MASTER) telling it that
everything is back ok (the idiot that removed the cable has had his
coffee now and decided that the cable was vital to get the $2M
e-commerce project online).
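To make that policy concrete, here is a rough sketch in C. Everything named here (`ha_node`, `ha_step`, `BEATS_TO_RESUME`) is made up for illustration and comes from no existing vrrpd or keepalived code. I also assume that the MASTER decides to shut down when it loses contact with its peer, which the description above leaves open.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not keepalived code. */
enum ha_role { HA_ACTIVE, HA_STANDBY };

#define BEATS_TO_RESUME 2   /* "at least two" link beats, as described above */

struct ha_node {
    enum ha_role role;
    bool configured_master;  /* true on the box that is MASTER by config */
    int  beats_seen;         /* link beats received since we shut down */
};

/*
 * One evaluation of the policy.
 *   link_up      our own phy->state (only consulted on the BACKUP side)
 *   peer_silent  nothing heard from the peer within the timeout (assumption)
 *   beat_rx      a solicitation link beat arrived from the peer
 */
static void ha_step(struct ha_node *n, bool link_up, bool peer_silent,
                    bool beat_rx)
{
    if (n->configured_master) {
        if (n->role == HA_ACTIVE && peer_silent) {
            /* MASTER always shuts down, independent of its phy->state. */
            n->role = HA_STANDBY;
            n->beats_seen = 0;
        } else if (n->role == HA_STANDBY) {
            if (beat_rx)
                n->beats_seen++;
            if (n->beats_seen >= BEATS_TO_RESUME)
                n->role = HA_ACTIVE;
        }
    } else {
        /* BACKUP goes up whenever it can, i.e. its own link is not down. */
        if (n->role == HA_STANDBY && peer_silent && link_up)
            n->role = HA_ACTIVE;
    }
}
```

The BACKUP side only ever consults its own link state, while the MASTER side ignores it, which is the asymmetry the policy asks for. How the promoted BACKUP steps down again is not covered by this sketch.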
> This side effect is very noisy and can sometimes be worked around with
> some logic in the algorithm, but some side effects remain (for me
> "sync_instance" without link state reporting introduces a noisy loop
> into the state machine :/).
What do you mean by noisy loop? For me this is a nop in defined time
intervals.
> Link state monitoring such as MII-register monitoring is really needed and
> must be used in routing software (like zebra :) as Stefan mentions in his
> LKML mail).
Yes. But as you can read in the following emails of that thread, people
tend to disagree to a certain extent. I suggest that Stefan adjust the
rest of the patch to make it completely independent (2 ifdef parts are
missing) so it doesn't change the semantics if one chooses not to enable
CONFIG_LINKWATCH. The parts are:
============================
--- linux-2.4.18ac2/net/core/dev.c Wed Mar 27 00:06:54 2002
+++ linux-2.4.18ac2-stefan/net/core/dev.c Wed Mar 27 00:32:17 2002
@@ -812,7 +818,7 @@
      * Device is now down.
      */
-    dev->flags &= ~IFF_UP;
+    dev->flags &= ~(IFF_UP | IFF_RUNNING);
 #ifdef CONFIG_NET_FASTROUTE
     dev_clear_fastroute(dev);
 #endif
============================
--- linux-2.4.18ac2/net/core/Makefile Wed Mar 27 00:06:54 2002
+++ linux-2.4.18ac2-stefan/net/core/Makefile Mon Mar 25 21:54:26 2002
@@ -27,4 +27,6 @@
 obj-$(CONFIG_NET_DIVERT) += dv.o
 obj-$(CONFIG_NET_PROFILE) += profile.o
+obj-$(CONFIG_LINKWATCH) += link_watch.o
+
 include $(TOPDIR)/Rules.make
=============================
> => The VRRP RFC spec must be completed with a FAULT state driven by
> NIC availability.
Ugh, how is this possible? Do I understand you correctly that you would
like to put in a policy for handling FAULT state that every NIC driver
then must be able to handle?
> => This FAULT_STATE is really needed in the VRRP FSM => It places the VRRP
> instance in a "waiting for advert" state without the timeout handling of
> BACKUP_STATE. That way we are clean and effective :)
Ok.
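A small sketch of how such a FAULT state could sit in the FSM, as I understand the proposal. The enum and function names are hypothetical, and returning to BACKUP once the link comes back is my assumption; the proposal above does not say where FAULT exits to.

```c
#include <stdbool.h>

/* FAULT is the proposed extra state; it is not part of the VRRP RFC. */
enum vrrp_state { VRRP_BACKUP, VRRP_MASTER, VRRP_FAULT };

/*
 * Hypothetical transition function, evaluated once per timer tick.
 *   link_up         result of the NIC link probe for this instance
 *   master_timeout  no advert from the remote MASTER within the down interval
 */
static enum vrrp_state vrrp_next_state(enum vrrp_state cur, bool link_up,
                                       bool master_timeout)
{
    /* Our own NIC is the problem: park in FAULT and ignore the timeout. */
    if (!link_up)
        return VRRP_FAULT;

    switch (cur) {
    case VRRP_FAULT:
        return VRRP_BACKUP;     /* link is back: rejoin as BACKUP (assumed) */
    case VRRP_BACKUP:
        return master_timeout ? VRRP_MASTER : VRRP_BACKUP;
    case VRRP_MASTER:
    default:
        return VRRP_MASTER;
    }
}
```

With this shape an unplugged BACKUP never promotes itself, because the advert timeout is only looked at while the local link is up.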
> When we develop userspace network software like BGP, VRRP, ... (zebra in
> short) what is the best way to handle NIC state activity and state
> transitions?
I have the same question, especially since `ip link set dev eth0 down`
should trigger the failover, but of course it doesn't, since the PHY
register isn't updated, only the routing.
> For me the ideal solution would be a netlink broadcast message on
> IFF_RUNNING validity. Nice, but it needs a kernel patch/rebuild. And we
> need to wait for the patch to be integrated officially.... But for me it
> is the functionality I ultimately want for NIC state notification.
Well, patching the kernel is not as bad as it sounds; LVS has done it
with success for years now. No one is complaining, except maybe Joe and
me ;). That's why I would like to see Stefan's patch clean and
completely independent.
> If we look at the MII code, MII is present on most NICs so monitoring the
> MII registers is the right way IMHO to handle NIC state notification. The
> MII transceiver can be probed from userspace using a specific ioctl,
> SIOCGMIIPHY. This userspace tool can be generic and portable to other
> kernels, to allow support of this for kernel 2.2 users.
This is not at all implemented on all NICs, but you could make a tradeoff
which would probably cover 95% of the people who would deploy
keepalived/vrrpd: take the 3-5 most common NICs and add support there.
You might want to check with Jeff Garzik about the status of pollable
SIOCGMIIPHY and about getting the right information from the various NICs.
> IMHO MII polling (in the VRRP code) can be done through a MII probe before
> each thread that sends a VRRP advert. That way the software will monitor
> the MII transceiver every second (since in MASTER state adverts are sent
> every second). And in BACKUP state, if the VRRP state machine wants to
> transition to MASTER state (no remote MASTER adverts received), the MII
> probe will detect whether the remote MASTER is down or our own link was
> lost. => So the MII states will condition the new VRRP FAULT_STATE. =>
> That way takeover will be quicker, even with sync_instance, because the
> probe will be done in both states, MASTER & BACKUP.
You have to make sure that if the MASTER detects link failure you shut
it down, since the BACKUP is about to come up.
> For the MII I need to export the mii-diag code into a layer1.c lib in
> keepalived. Basically functions probing the MII transceiver. The functions
> will be:
>
> o int mii_tranceiver_present(int ifindex); => checking MII availability
> through a SIOCGMIIPHY ioctl call
The problem is that the SIOCGMIIPHY is not supported by all devices. You
need to add all the structures in 'struct mii_if_info mii' as
mdio_read() calls to the driver. Example:
laphish:~ # ./iftest eth0
Interface [eth0] is up
ioctl(SIOCGMIIPHY): Operation not supported
ioctl(SIOCGMIIREG): Operation not supported
laphish:~ # cat iftest.c
#include <stdio.h>
#include <stdlib.h>             /* exit() */
#include <string.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/sockios.h>      /* SIOCGMIIPHY, SIOCGMIIREG */
#include <sys/ioctl.h>

int main(int argc, char *argv[])
{
    struct ifreq ifr;
    char *device = "lo";
    int s;

    /* Without an argument we complain and fall back to "lo". */
    if (argc > 1) {
        device = argv[1];
    } else {
        printf("Please set a device.\n");
    }

    if ((s = socket(PF_PACKET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, device, IFNAMSIZ - 1);

    /* Make sure the interface exists before asking anything else. */
    if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {
        printf("Unknown interface %s\n", device);
        exit(2);
    }

    /* Administrative state as seen by the stack (IFF_UP and friends). */
    if (ioctl(s, SIOCGIFFLAGS, &ifr)) {
        perror("ioctl(SIOCGIFFLAGS)");
        exit(2);
    }
    if (!(ifr.ifr_flags & IFF_UP)) {
        printf("Interface [%s] is down\n", device);
    } else if (ifr.ifr_flags & (IFF_NOARP | IFF_LOOPBACK)) {
        printf("Loopback interface which is not ARPable\n");
    } else {
        printf("Interface [%s] is up\n", device);
    }

    /* The MII ioctls; a driver without MII support rejects both. */
    if (ioctl(s, SIOCGMIIPHY, &ifr) < 0) {
        perror("ioctl(SIOCGMIIPHY)");
    }
    if (ioctl(s, SIOCGMIIREG, &ifr) < 0) {
        perror("ioctl(SIOCGMIIREG)");
    }
    return 0;
}
laphish:~ # ip link set dev eth0 down
laphish:~ # ./iftest eth0
Interface [eth0] is down
ioctl(SIOCGMIIPHY): Operation not supported
ioctl(SIOCGMIIREG): Operation not supported
laphish:~ # ip link set dev eth0 up
laphish:~ # ./iftest eth0
Interface [eth0] is up
ioctl(SIOCGMIIPHY): Operation not supported
ioctl(SIOCGMIIREG): Operation not supported
laphish:~ #
> o struct MII *mii_probe(int ifindex); => probing and fetching MII infos
> => during VRRP bootstrap
And these are not present on a wide range of NICs.
> o int mii_linkup(int ifindex); => does MII report a properly functional
> link beat ?
On certain NICs I think so, but I'm not sure there.
> that is all :)
Well, you've got Easter time now. Send your wife to some nice holiday
trip and start coding.
> => Will try today :) or this week end
You can contact me offline about the status of your development.
> :) I need to obtain agreements from my employer before... Is there a sign
Give me his phone number and I'll talk to him.
> up deadline ? signup is needed for OLS passport ?
There is no explicit deadline, but OLS is (was?) _the_ Linux kernel hacker
event. It is not a tradeshow: pure technical talks and BOFs. You certainly
don't wanna miss it, especially since in 2000 Jerome Etienne was a speaker
there and talked about ARPsec and VRRP. You could show up and tell
people how much you've improved the code and the framework :)
Best regards,
Roberto Nibali, ratz