Hello,
On Fri, 30 Nov 2001, Lionel Bringuier wrote:
> On ven, nov 30, 2001 at 02:35:04 +0200, Julian Anastasov wrote:
> > > 1. On a single CPU machine, with a kernel compiled with SMP support, I
> > > get a kernel freeze in mod_sltimer (ip_vs_timer.c). I get locked on a
> > > concurrent write_lock/write_unlock(&__ip_vs_sltimerlist_lock) access in
> > > mod_sltimer. That problem disappears if I disable CONFIG_SMP (on a single
> > > CPU machine). Note that I could not reproduce it on a dual-CPU machine.
> >
> > Can you reproduce it with 0.9.7? It seems it will need a fresh
> > kernel.
> I have not tried yet. The validation process started some time ago and was
> based on 2.4.5 (which all in all worked quite well), and I was very
> suspicious about all the buzz about VM and stability in recent kernels. I'll
> give 2.4.16 a try, as it seems to be usable again.
Yes, the results with 2.4.16 will be very interesting.
>
> > BTW, how did you find that it is in mod_sltimer?
> With an old technique: I added a (dirty) function that prints characters
> directly into video memory (because I suspected that printk was not as
> accurate as I expected):
:)
>
> #include <asm/io.h> /* phys_to_virt() */
> #define OFFSMAX (80*20) /* 20 lines of chars */
> int ncxoffset; /* index of the next screen cell to write */
> static inline void printncx (const char c, char color) {
> int j;
> char * video_mem_v = phys_to_virt(0xb8000); /* VGA text buffer */
> video_mem_v += 2*(ncxoffset++); /* 2 bytes per cell: char, attribute */
> if (ncxoffset >= OFFSMAX) ncxoffset = 0; /* wrap around */
> *(video_mem_v++)=c;
> *(video_mem_v++)=color;
> for (j=0; j<4; j++) *(video_mem_v+j) = '#'; /* where we are */
> }
>
> Then in ip_vs_timer.c :
> void mod_sltimer(struct timer_list *timer, unsigned long expires)
> {
> int ret;
> printncx('l','B'); /* B : green on red */
> write_lock(&__ip_vs_sltimerlist_lock);
> printncx('L','B');
> timer->expires = expires;
> ret = detach_sltimer(timer);
> internal_add_sltimer(timer);
> printncx('u','B');
> write_unlock(&__ip_vs_sltimerlist_lock);
> printncx('U','B');
> }
>
> And I could see l L u U l L u U l L u U l u (lock). I repeat, that happens
> only on a UP machine with kernel configured as SMP.
>
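For anyone not familiar with the trick above: in VGA text mode each screen cell in the buffer at 0xb8000 is two bytes, the character followed by an attribute byte. The low nibble of the attribute is the foreground colour and bits 4-6 are the background. A small userspace sketch of that layout follows; the helper names are made up for illustration and are not part of the debugging patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not from the patch). */

/* Low nibble of the attribute byte: foreground colour (0-15). */
static inline unsigned vga_fg(uint8_t attr) { return attr & 0x0f; }

/* Bits 4-6: background colour (0-7). Bit 7 is blink or bright background. */
static inline unsigned vga_bg(uint8_t attr) { return (attr >> 4) & 0x07; }

/* Byte offset of cell n in the text buffer: two bytes per cell. */
static inline unsigned vga_cell_offset(unsigned n) { return 2 * n; }

/* Decode the 'B' attribute used in the printncx() calls. */
static inline void vga_selfcheck(void)
{
    assert(vga_fg('B') == 2); /* 'B' is 0x42: colour 2 is green */
    assert(vga_bg('B') == 4); /* colour 4 is red */
}
```

So 'B' (0x42) gives green text on a red background, which matches the comment in the mod_sltimer trace code.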
> > Can you find which ip_vs_conn_put call causes this problem?
> No... how can I (easily ?).
If you compile the kernel with SysRq support (in the kernel debugging
options), you can use Alt-SysRq-p to see at what PC each CPU loops, and
then use gdb vmlinux to check at which place in LVS the code loops. But
it is a difficult process and I can't explain it here.
In general, if some lock blocks, there are two possibilities:
- lock and unlock don't match
- the locking in user context does not use the _bh functions and the
current user context is interrupted on the same CPU between lock and unlock
But in the case of mod_sltimer I don't see how user context would deal
with connection states. There must be something we are missing.
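To make the second case concrete, here is a toy userspace model of one CPU. It is only a sketch and not kernel code: every name in it is hypothetical, and it assumes the lock is not recursive and that a bottom half wanting the same lock arrives on the CPU that already holds it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; NOT kernel code. All names are hypothetical. */
struct toy_cpu {
    bool lock_held;   /* non-recursive lock, like an rwlock held for write */
    bool bh_disabled; /* true while bottom halves are masked on this CPU */
};

/* Models write_lock(): takes the lock, bottom halves stay enabled. */
static void toy_lock(struct toy_cpu *c) { c->lock_held = true; }
static void toy_unlock(struct toy_cpu *c) { c->lock_held = false; }

/* Models write_lock_bh(): masks bottom halves, then takes the lock. */
static void toy_lock_bh(struct toy_cpu *c)
{
    c->bh_disabled = true;
    c->lock_held = true;
}

/* Models write_unlock_bh(): releases the lock, then unmasks bottom halves. */
static void toy_unlock_bh(struct toy_cpu *c)
{
    c->lock_held = false;
    c->bh_disabled = false;
}

/*
 * A bottom half that wants the same lock arrives on this CPU.
 * Returns true if it would spin forever: it runs while the lock is held
 * by the context it interrupted, and that context cannot run to unlock.
 */
static bool toy_bh_would_deadlock(const struct toy_cpu *c)
{
    if (c->bh_disabled)
        return false;    /* deferred until after the _bh unlock */
    return c->lock_held; /* spins on a lock its own CPU holds */
}

static void toy_selfcheck(void)
{
    struct toy_cpu c = { false, false };
    toy_lock(&c);                       /* plain lock in user context */
    assert(toy_bh_would_deadlock(&c));  /* same-CPU bottom half hangs */
    toy_unlock(&c);
    toy_lock_bh(&c);                    /* _bh variant */
    assert(!toy_bh_would_deadlock(&c)); /* bottom half is held off */
    toy_unlock_bh(&c);
}
```

With the plain lock the interrupted holder can never reach its unlock, so the CPU spins forever; with the _bh variants the bottom half cannot run until the lock is released. In a kernel built without CONFIG_SMP these lock operations compile to nothing, so a hang of this kind would not show up there.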
> > I don't remember any problems with mod_sltimer being fixed after 0.9.3.
> > We have to find the problem with your help. Can you tell us the proto used
> > (UDP?), the forwarding method?
> Proto : UDP, forwarding method WRR (preferred) or WLC. I don't use others.
Forwarding method: DR/NAT/TUN?
Regards
--
Julian Anastasov <ja@xxxxxx>