Re: Ipvs 0.9.3 : panic on heavy load.

To: Lionel Bringuier <lb@xxxxxxxxxxxxxxxxx>
Subject: Re: Ipvs 0.9.3 : panic on heavy load.
Cc: Wensong Zhang <wensong@xxxxxxxxxxxx>, <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Mon, 3 Dec 2001 19:40:00 +0200 (EET)
        Hello,

On Mon, 3 Dec 2001, Lionel Bringuier wrote:

> Hi.
>
> On Sat, Dec 01, 2001 at 10:22:09 +0000, Julian Anastasov wrote:
> > > And I could see l L u U l L u U l L u U l u (lock). I repeat, that happens
> > > only on a UP machine with kernel configured as SMP.
> >     I hope that, if you use IPVS as modules, you fixed your compile
> > options in ipvs/Makefile?
> Yes, the compile options are right. I use the IPVS patch BTW.

        When hunting for bugs in IPVS, it is strongly recommended
to use the compiled-in version of IPVS; it is more difficult to
find bugs in modules.

> >     Maybe we have to reorder the include files and use
> > something like:
> > #include <linux/config.h>
> > #if defined(CONFIG_SMP) && !defined(__SMP__)
> > #define     __SMP__
> > #endif
> That is probably it... I tried on a dual-CPU box (with CONFIG_SMP, naturally),
> and it went through perfectly.

        But it seems __SMP__ is for 2.2 only, my mistake (I recently
compiled eth drivers for 2.2 ...).
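For readers following the thread, here is a minimal user-space sketch in C of why such a guard can matter. It assumes, as a simplification, that lock macros compile to nothing when the SMP symbol is missing. The `#define CONFIG_SMP 1` line stands in for `<linux/config.h>`, and `toy_lock`, `toy_spin_lock`, `toy_spin_unlock` and `toy_lock_is_real` are hypothetical names, not kernel APIs.

```c
#include <assert.h>

/* Stand-in for <linux/config.h>, which exists only inside a kernel tree. */
#define CONFIG_SMP 1

/* The guard quoted in the thread: derive __SMP__ from CONFIG_SMP. */
#if defined(CONFIG_SMP) && !defined(__SMP__)
#define __SMP__
#endif

/* Hypothetical toy lock; it only records whether it is held. */
struct toy_lock { int held; };

#ifdef __SMP__
/* "SMP" build: the lock operations really change state. */
#define toy_spin_lock(l)   do { assert(!(l)->held); (l)->held = 1; } while (0)
#define toy_spin_unlock(l) do { assert((l)->held);  (l)->held = 0; } while (0)
#else
/* "UP" build: the lock operations compile to nothing. */
#define toy_spin_lock(l)   do { (void)(l); } while (0)
#define toy_spin_unlock(l) do { (void)(l); } while (0)
#endif

/* Returns 1 if this translation unit was built with real locking, else 0. */
int toy_lock_is_real(void)
{
    struct toy_lock l = { 0 };
    int held_inside;

    toy_spin_lock(&l);
    held_inside = l.held;   /* 1 only when the SMP variant was selected */
    toy_spin_unlock(&l);
    return held_inside;
}
```

If the first define is removed, the same function returns 0. That is the kind of mismatch a module built with the wrong options could have against an SMP kernel.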

>
> > - locking in user space does not use _bh functions and the current
> > user context is interrupted from the same CPU between lock and unlock
> > But in the case with mod_sltimer I don't see how user space will deal
> > with connection states. But there should be something we miss.
> OK, that's a good point. In fact, I am using an extension over IPVS that
> provides userland queuing, adapted from netfilter's userland queuing. I

        Oops. It is still not clear to me: does the plain IPVS have
problems near mod_sltimer, or only the patched IPVS?

> tried to replace all the 'spin_lock' and 'spin_unlock' with
> 'spin_(un)lock_bh', but I still have the crash. Or maybe I did not catch it
> right, and I was too brutal while changing all the spin_(un)lock calls in
> ipvs code... Of course, I can provide you the code if that is of any help.

        The IPVS locking is not designed for such things. I hope
you know what you are doing. I don't know netfilter's queueing
at all. Maybe you already know that IPVS does not like the new
netfilter design and that we handle packets differently at some
chains.

> Furthermore, when I stuck to 'plain' kernel panics, I noticed that in all my
> call stacks I was interrupted by irq0 during sltimer manipulations
> (sltimer_handler / del_sltimer). And, as I did not mention that earlier, my
> routing policy is NAT (masq.).

        The networking code avoids disabling the hardware IRQs. What
you have to watch for is user context being interrupted by a softirq/bh
on the same CPU. sltimer_handler is started from the timer bh. While it
runs, the packet handling (the other caller of mod_sltimer, a softirq)
can run only on another CPU. The plain *_lock functions are enough for
that situation.
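The hazard discussed above (user context holding a lock and being interrupted by a bottom half on the same CPU) can be sketched with a toy single-CPU model in user-space C. This is not kernel code, and it does not reproduce the reported panic. All names (`toy_cpu`, `toy_lock_bh`, `toy_raise_bh`, and so on) are hypothetical, and the "deadlock" is only counted, not actually entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; none of these names are kernel APIs. */
struct toy_cpu {
    bool lock_held;    /* models a non-recursive spinlock */
    bool bh_disabled;  /* models bottom halves disabled on this CPU */
    bool bh_pending;   /* a bottom half was raised while disabled */
    int  bh_runs;      /* times the bottom half completed its work */
    int  deadlocks;    /* times the bottom half found the lock taken */
};

/* The bottom half needs the same lock as the user context. */
static void toy_run_bh(struct toy_cpu *c)
{
    if (c->lock_held) {
        /* A real spinlock would spin here forever on the same CPU. */
        c->deadlocks++;
        return;
    }
    c->lock_held = true;
    c->bh_runs++;
    c->lock_held = false;
}

/* An interrupt arrives and wants to run the bottom half immediately. */
static void toy_raise_bh(struct toy_cpu *c)
{
    if (c->bh_disabled)
        c->bh_pending = true;   /* deferred until bottom halves are enabled */
    else
        toy_run_bh(c);
}

/* Plain variant: takes the lock, bottom halves stay enabled. */
static void toy_lock(struct toy_cpu *c)   { c->lock_held = true; }
static void toy_unlock(struct toy_cpu *c) { assert(c->lock_held); c->lock_held = false; }

/* _bh variant: disables bottom halves before taking the lock. */
static void toy_lock_bh(struct toy_cpu *c)
{
    c->bh_disabled = true;
    c->lock_held = true;
}

/* Releases the lock, re-enables bottom halves, runs any deferred one. */
static void toy_unlock_bh(struct toy_cpu *c)
{
    assert(c->lock_held);
    c->lock_held = false;
    c->bh_disabled = false;
    if (c->bh_pending) {
        c->bh_pending = false;
        toy_run_bh(c);
    }
}

/* User context locks, is interrupted, unlocks; returns the deadlock count. */
int toy_user_context(struct toy_cpu *c, bool use_bh_variant)
{
    if (use_bh_variant) toy_lock_bh(c); else toy_lock(c);
    toy_raise_bh(c);            /* the interrupt lands inside the critical section */
    if (use_bh_variant) toy_unlock_bh(c); else toy_unlock(c);
    return c->deadlocks;
}
```

In this model, the plain variant records one deadlock and the bottom half never completes. The _bh variant defers the bottom half until after the unlock, so it completes once with no deadlock.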

> I did not try with 2.4.16/ipvs 0.9.7 yet, but I haven't seen anything in the
> changelog that could deal with that. However, I'll do that this week

        IPVS 0.9.7 has more places where your patch can fail. We
recently added many optimizations in preparation for receiving fragmented
packets, once the kernel feeds us with them. You have to be very careful.

> Regards,
>
> LB.

Regards

--
Julian Anastasov <ja@xxxxxx>


