Re: Ipvs 0.9.3 : panic on heavy load.

To: Lionel Bringuier <lb@xxxxxxxxxxxxxxxxx>
Subject: Re: Ipvs 0.9.3 : panic on heavy load.
Cc: <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Fri, 30 Nov 2001 16:55:20 +0200 (EET)
        Hello,

On Fri, 30 Nov 2001, Lionel Bringuier wrote:

> On ven, nov 30, 2001 at 02:35:04 +0200, Julian Anastasov wrote:
> > > 1. On a single CPU machine, with a kernel compiled with SMP support, I get a
> > > kernel freeze in mod_sltimer (ip_vs_timer.c). I get locked on a concurrent
> > > write_lock/write_unlock(&__ip_vs_sltimerlist_lock) access in mod_sltimer.
> > > That problem disappears if I disable CONFIG_SMP (on a single CPU machine).
> > > Notice that I did not reproduce that with a dual-CPU machine.
> >
> >     Can you reproduce it with 0.9.7? It seems it will need a fresh
> > kernel.
> I did not try yet. The validation process started some time ago and was
> based on 2.4.5 (which all in all worked quite well), and I was very
> suspicious about all the buzz about VM and stability in recent kernels. I'll
> give 2.4.16 a try, as it seems to be usable again.

        Yes, the results with 2.4.16 will be very interesting.

>
> > BTW, how did you find that it is in mod_sltimer?
> With an old technique: I added a (dirty) function that prints
> characters directly into video memory (because I suspected that printk
> was not as accurate as I expected):

        :)

>
> #include <asm/io.h>
> #define OFFSMAX (80*20) /* 20 lines of chars */
> int ncxoffset;
> static inline void printncx (const char c, char color) {
>     int j;
>     char * video_mem_v = phys_to_virt(0xb8000); /* VGA text buffer */
>     video_mem_v += 2*(ncxoffset++); /* 2 bytes per cell: char + attribute */
>     if (ncxoffset >= OFFSMAX) ncxoffset = 0;
>     *(video_mem_v++)=c;
>     *(video_mem_v++)=color;
>     for (j=0; j<4; j++) *(video_mem_v+j) = '#'; /* where we are */
> }
>
> Then in ip_vs_timer.c :
> void mod_sltimer(struct timer_list *timer, unsigned long expires)
> {
>     int ret;
>   printncx('l','B'); /* B : green on red */
>     write_lock(&__ip_vs_sltimerlist_lock);
>   printncx('L','B');
>     timer->expires = expires;
>     ret = detach_sltimer(timer);
>     internal_add_sltimer(timer);
>   printncx('u','B');
>     write_unlock(&__ip_vs_sltimerlist_lock);
>   printncx('U','B');
> }
>
> And I could see l L u U l L u U l L u U l u (lock). I repeat, that happens
> only on a UP machine with kernel configured as SMP.
>
> > Can you find which ip_vs_conn_put call causes this problem?
> No... how can I (easily)?

        If you compile the kernel with SysRq support (Kernel debugging),
you can use Alt-SysRq-p to see at what PC each CPU loops, and then
gdb vmlinux to check at which place in LVS the code loops. But it is
a difficult process and I can't explain it.

        Usually, if some lock blocks, there are two possibilities:

- lock and unlock don't match

- locking in user context does not use the _bh functions and the current
user context is interrupted on the same CPU between lock and unlock

But in the case of mod_sltimer I don't see how user context would deal
with connection states. There must be something we are missing.

> >     I don't remember any problems with mod_sltimer being fixed after 0.9.3.
> > We have to find the problem with your help. Can you tell us the protocol
> > used (UDP?) and the forwarding method?
> Proto : UDP, forwarding method WRR (preferred) or WLC. I don't use others.

        DR/NAT/TUN ?

Regards

--
Julian Anastasov <ja@xxxxxx>


