At 08:52 AM 3/16/00 -0500, Kyle Sparger wrote:
> > By the time the LVS box needs to fail over, the state of the failing kernel
> > cannot be trusted. So sending the state to the other LVS box may not deliver
> > correct data, and may cause more problems than it solves. The kernel should
> > not be involved at all. Just my 2 cents.
>
>Right, but that assumes that it's the kernel itself that failed here.
>There are other situations, such as cut power, failed network cables, etc.,
>in which this would be a perfectly reasonable approach.
Having been a kernel programmer for too many years, I did not
explain everything clearly. Basically, the kernel is a piece of code that
is exercised millions of times every minute. If the kernel has a bug, it
will show up quickly. However, if something is wrong outside the kernel
(cable, NIC, etc., as you pointed out), it will affect the kernel code and
may lead it to make wrong decisions. This does not mean there is anything
wrong with the kernel code -- it depends on the drivers for all the other
parts to provide its information.

Most of the time, a human being dies not because the brain died, but
because other body parts failed. The brain may still be in perfect
condition, or maybe even smarter than many young, energetic brains. But
the other failing body parts cause the brain to make decisions that are
no longer normal. The same applies to our computers. I would not trust
any decision made by a critically failing computer. You may
argue that most of the time the failing computer has not critically
failed. But what about the small percentage of the time when the computer
has critically failed? The method has to be able to handle both.

In addition, if the logic is in the kernel, how does the kernel
communicate with the other computers? It must go through the LAN, which
may mean going through the failing NIC or broken cable. What if someone
wants to send the configuration to the other LVS boxes over an encrypted
channel? Would the encryption code be added to the kernel as well?

Not to mention that the extra burden on the kernel will introduce a
performance issue. Maybe today that is not so obvious to most of us, but
the speed of communication is going up. I just read something
yesterday saying that Schwab moved everything to a Gigabit network.
I love the wonderful LVS code that Wensong wrote, but I have to say
that the current LVS code has a hard time getting close to Gigabit speed
in NAT mode -- even without the LVS code, filling up a Gigabit pipe
takes an extraordinary computer.

Best regards,
Wayne
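
For readers following the thread, the state-transfer scheme being debated (Wensong's proposal, quoted further down) can be pictured with a small user-space sketch. This is only an illustration in Python, not kernel code and not the IPVS implementation: the wire format, the state codes, and the names Primary, Backup, ENTRY, and send_udp are all assumptions made up for the example. The primary queues state changes, a periodic flush (every HZ or HZ/2 jiffies in the proposal, i.e. every second or half second) packs the queue into a UDP payload and clears it, and the backup applies each change to its own table, rejecting payloads that break the format rules, in the spirit of Kyle's suggestion.

```python
import socket
import struct

# Hypothetical wire format for one state change (NOT the real IPVS format):
# client IP, client port, real-server IP, real-server port, state code.
ENTRY = struct.Struct("!4sH4sHB")

# Hypothetical state codes used only in this sketch.
CLOSED, SYN_RECV, ESTABLISHED = 0, 1, 2


class Primary:
    """Collects state changes and turns them into a datagram payload."""

    def __init__(self):
        self.queue = []

    def record(self, caddr, cport, raddr, rport, state):
        # Called whenever packet handling changes a connection's state.
        self.queue.append((caddr, cport, raddr, rport, state))

    def flush(self):
        # Pack every queued change, then clear the queue.
        payload = b"".join(
            ENTRY.pack(socket.inet_aton(ca), cp, socket.inet_aton(ra), rp, st)
            for ca, cp, ra, rp, st in self.queue
        )
        self.queue.clear()
        return payload


class Backup:
    """Applies received state changes to its own table."""

    def __init__(self):
        self.table = {}

    def receive(self, payload):
        # Minimal sanity check: refuse input that breaks the format rules.
        if len(payload) % ENTRY.size != 0:
            raise ValueError("truncated or malformed state payload")
        for offset in range(0, len(payload), ENTRY.size):
            ca, cp, ra, rp, st = ENTRY.unpack_from(payload, offset)
            if st not in (CLOSED, SYN_RECV, ESTABLISHED):
                raise ValueError("unknown state code")
            key = (socket.inet_ntoa(ca), cp)
            if st == CLOSED:
                self.table.pop(key, None)
            else:
                self.table[key] = (socket.inet_ntoa(ra), rp, st)


def send_udp(payload, address):
    # Sends one payload to the backup at (host, port); no retry, no ordering.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

A caller would invoke flush() on a timer and pass the result to send_udp(). The sketch deliberately ignores datagram size limits, lost packets, and authentication, which are exactly the kinds of concerns raised in this discussion.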
>I don't see how you're going to get around the fact that the kernel could
>be lying to the backup. You have to trust it to give you good states, or
>the whole exercise is pointless, right? About all I can imagine you can
>do is make sure what input you get doesn't break any rules.
>
>Actually, that's not all, come to think of it. You could change the
>kernel oops code (and/or whatever other code may be more appropriate) to
>shut down the state transfer thread immediately, and notify the backup
>server that it just detected an internal fault, and have it take over the
>operation. This may be overly paranoid, but since part of what we're
>aiming for is HA, it pays to be paranoid.
>
>Also, even if the program is in user space, the kernel's going to be
>involved, since it has to go out over the wire to the backup director
>(through the serial drivers, through the tcp/ip stack, etc), so I don't
>think that's a very good argument against putting this stuff in the
>kernel.
>
>Kyle Sparger
>
>On Wed, 15 Mar 2000, Wayne wrote:
>
> > At 10:07 AM 3/16/00 +0800, Wensong Zhang wrote:
> >
> >
> >
> > >On Wed, 15 Mar 2000, Ratz wrote:
> > >
> > > > I cannot see the point of your new ip_vs_random_drop_syn function.
> > > > At which point could such a function be important? Or what exactly has
> > > > to occur for you to want to drop a connection?
> > > > I mean, standard (without ISN prediction) SYN flooding is IMHO not
> > > > possible against a 2.2.14 kernel unless you set
> > > > /proc/sys/net/ipv4/tcp_max_syn_backlog to too high a value.
> > > >
> > > > Please, could you enlighten me once more?
> > > >
> > >
> > >Yeah, syncookies in kernel 2.2.x can help TCP connections avoid SYN
> > >flooding attacks; I mean that they work at the TCP layer. However, IPVS
> > >works at the IP layer, and each entry (marking connection state) needs
> > >128 bytes of effective memory. Random SYN-drop randomly drops some SYN
> > >entries before memory runs out. It may help the IPVS box survive even
> > >under a big distributed SYN-flooding attack, but the real servers still
> > >need to set up syncookies to protect themselves from SYN-flooding attacks.
> > >
> > > >
> > > > BTW: What are the plans for transferring the ip_vs_masq_table from one
> > > > kernel to another in case of a failover of the load balancer? Is
> > > > there already some idea or whatever?
> > > >
> > >
> > >I just thought of an idea for transferring the state table; it might be
> > >good. We run a SendingState and a ReceivingState kernel_thread (daemons
> > >inside the kernel like kflushd and kswapd) on the primary IPVS and the
> > >backup respectively. Every time the primary handles packets, it will put
> > >the change of state in a sending queue. The SendingState kernel_thread
> > >will wake up every HZ or HZ/2 to send the changes in the queue to the
> > >ReceivingState kernel_thread through UDP packets, and finally clear the
> > >queue. The ReceivingState receives the packets and updates its own state
> > >table.
> > >
> > >Since all is inside the kernel, it should be efficient, because the
> > >switching overhead between the kernel and the user space (both for the UDP
> > >communications and the read & write of those entries) can be avoided.
> > >
> > >Any comments?
> >
> > By the time the LVS box needs to fail over, the state of the failing kernel
> > cannot be trusted. So sending the state to the other LVS box may not deliver
> > correct data, and may cause more problems than it solves. The kernel should
> > not be involved at all. Just my 2 cents.
> >
> >
> > >Thanks,
> > >
> > >Wensong
> > >
> > >
> >
> >
> >
>
>
>
>
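
The random SYN-drop idea quoted above (evict a randomly chosen half-open entry before the connection table exhausts memory) can be sketched as follows. This is a rough Python illustration under stated assumptions, not the actual ip_vs_random_drop_syn code: only the 128-byte entry size comes from the thread, while the ConnTable class, the memory_budget and drop_fraction parameters, and the policy of refusing a SYN when nothing can be evicted are hypothetical choices made for the example.

```python
import random

ENTRY_BYTES = 128  # per-entry memory cost quoted in the thread


class ConnTable:
    """Toy connection table that evicts random half-open entries under load."""

    def __init__(self, memory_budget, drop_fraction=0.9, rng=None):
        # capacity: how many entries fit in the (hypothetical) memory budget.
        self.capacity = memory_budget // ENTRY_BYTES
        # threshold: start dropping before memory is actually exhausted.
        self.threshold = int(self.capacity * drop_fraction)
        self.syn = []              # keys of half-open (SYN) entries
        self.established = set()   # keys of established entries
        self.rng = rng or random.Random()

    def size(self):
        return len(self.syn) + len(self.established)

    def on_syn(self, key):
        """Record a new half-open entry; returns False if it was refused."""
        if self.size() >= self.threshold:
            if not self.syn:
                return False  # nothing half-open to evict, refuse the SYN
            # Evict one randomly chosen half-open entry (swap-remove).
            victim = self.rng.randrange(len(self.syn))
            self.syn[victim] = self.syn[-1]
            self.syn.pop()
        self.syn.append(key)
        return True

    def on_established(self, key):
        """Promote a half-open entry once the handshake completes."""
        if key in self.syn:
            self.syn.remove(key)
            self.established.add(key)
```

Because victims are chosen uniformly among half-open entries, a flood of bogus SYNs mostly evicts other bogus SYNs, while established connections are never dropped by this path. As the quoted message notes, this protects only the director's table; the real servers still need their own SYN-flood protection.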