To: "'LinuxVirtualServer.org users mailing list.'" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] LVS-NAT wrr crashing + tiny patch
From: "Kees Hoekzema" <kees@xxxxxxxxxxxx>
Date: Thu, 12 Jul 2007 18:47:30 +0200
Ok, I have found a bit more information from my debugging, 
and it seems that Horms already knows about it:
http://marc.info/?l=linux-netdev&m=118040107213444&w=2


Basically, I adjust the weights a lot too; not as often as twice per
second, but quite often. I recompiled the IPVS modules with a bit more
debugging, and every time my system crashed I got the same debug output:

Jul 12 15:43:15 atropos kernel: Enter: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 885
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 886
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 891
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 897
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 906
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 908
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 910
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 913
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 916
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 918
Jul 12 15:43:15 atropos kernel: Leave: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 919
Jul 12 15:43:15 atropos kernel: Enter: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 885
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 886
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 891
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 897
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 906
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 908
Jul 12 15:43:15 atropos kernel: DEBUG: ip_vs_edit_dest, net/ipv4/ipvs/ip_vs_ctl.c line 910
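
The Enter:/Leave: lines come from the EnterFunction()/LeaveFunction()
macros in include/net/ip_vs.h; the DEBUG: lines are the extra markers I
compiled in, roughly like the following (a reconstruction from memory,
not the exact code):

/* Reconstruction of the debug marker sprinkled through
 * ip_vs_edit_dest(); prints function, file and line. */
#define DEBUG_HERE() \
        printk(KERN_DEBUG "DEBUG: %s, %s line %d\n", \
               __FUNCTION__, __FILE__, __LINE__)

Note that the second invocation never gets past line 910.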

The code after line 910 reads:
while (atomic_read(&svc->usecnt) > 1) {};

Every other busy-wait loop in the code reads:
IP_VS_WAIT_WHILE(atomic_read(&svc->usecnt) > 1);

which is basically the same thing, except for a cpu_relax() in the
loop body.
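
If I read the source right, that macro is defined near the top of
net/ipv4/ipvs/ip_vs_ctl.c as:

#define IP_VS_WAIT_WHILE(expr)  while (expr) { cpu_relax(); }

so it really is the same loop, just with the CPU hint in the body
(more on why that might matter below).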

At the moment I am testing my server with the cpu_relax() version in
the ip_vs_edit_dest function; so far it has not crashed, and it has
been directing traffic for quite a bit longer than was previously
possible.

The only differences between this server and the old server (which
didn't have any problems) are:
- SMP (4 cores) vs. single core
- 64-bit vs. 32-bit
- 2.6.21.5 vs. 2.6.20.4 (but I do not see any changes in ip_vs_ctl.c)

In my first mail I blamed the 64-bit/32-bit difference, but right now
I am leaning more towards an SMP issue; unfortunately I lack the
kernel-hacking skills to say why, or why that cpu_relax() helps so
much in the while loop.
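
From what I have read (an assumption on my part, not something I have
verified in this kernel), cpu_relax() on x86 boils down to the PAUSE
instruction ("rep; nop"), which throttles a spin loop and gives other
hardware threads a better chance to run and drop the reference being
spun on. A hypothetical user-space sketch of the two wait styles
(made-up names, x86 only, not kernel code):

/* Compile with: gcc -O2 -pthread spin.c (x86 only, because of the
 * inline PAUSE). One thread spins on a use count the way
 * ip_vs_edit_dest() does; the other releases its reference. */
#include <pthread.h>
#include <stdio.h>

static volatile int usecnt = 2;         /* stand-in for svc->usecnt */

static void my_cpu_relax(void)
{
        __asm__ __volatile__("rep; nop" ::: "memory");  /* x86 PAUSE */
}

static void *other_user(void *arg)
{
        (void)arg;
        __sync_fetch_and_sub(&usecnt, 1);  /* other svc user goes away */
        return NULL;
}

int main(void)
{
        pthread_t t;
        pthread_create(&t, NULL, other_user, NULL);

        /* The crashing variant would be:  while (usecnt > 1) {};  */
        /* The IP_VS_WAIT_WHILE variant adds the relax hint:       */
        while (usecnt > 1)
                my_cpu_relax();

        pthread_join(t, NULL);
        printf("all other users gone, usecnt = %d\n", usecnt);
        return 0;
}

With the bare while loop the spinning core hammers that cache line as
fast as it can; the PAUSE hint backs it off, which would fit what I am
seeing on the quad-core box.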

Well, hopefully Horms understands it better than I do ;)

-kees

--- old/net/ipv4/ipvs/ip_vs_ctl.c       2007-07-10 20:56:30.000000000 +0200
+++ linux-2.6.22.1/net/ipv4/ipvs/ip_vs_ctl.c    2007-07-12 19:41:27.000000000 +0200
@@ -909,8 +909,8 @@
        write_lock_bh(&__ip_vs_svc_lock);

        /* Wait until all other svc users go away */
-       while (atomic_read(&svc->usecnt) > 1) {};
-
+       IP_VS_WAIT_WHILE(atomic_read(&svc->usecnt) > 1);
+
        /* call the update_service, because server weight may be changed */
        svc->scheduler->update_service(svc);

> -----Original Message-----
> From: lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf 
> Of Kees Hoekzema
> Sent: Wednesday, July 11, 2007 14:28
> To: 'LinuxVirtualServer.org users mailing list.'
> Subject: [lvs-users] LVS-NAT wrr crashing on 64-bits
> 
> 
> Well, apparently it didn't have anything to do with the NAT issue
> Cristi was having, so let's split those two problems, as it would
> seem I have a different problem than he does ;).
> 
> -kees
> 
>  
> 
> > -----Original Message-----
> > From: lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx
> > [mailto:lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Kees 
> > Hoekzema
> > Sent: Wednesday, July 11, 2007 14:20
> > To: 'LinuxVirtualServer.org users mailing list.'
> > Subject: Re: [lvs-users] LVS-NAT issue
> > 
> >  
> > 
> > > -----Original Message-----
> > > The problem is as follows: the setup works randomly, from 15 mins
> > > to 1-2 hours, flawlessly, I might add, serving content from both
> > > backend machines. However, it randomly stops doing that. When that
> > > happens, I cannot ping the VIP from the outside, only from within
> > > the LAN (I have a backup LB, not configured yet; I plan to use
> > > ultramonkey later on). I checked logs and tcpdumped, but with no
> > > clue as to what is causing this. Some input would be really
> > > appreciated.
> > 
> > Now I know this is an old message, and this issue has been
> > 'resolved' by not using LVS-NAT anymore, but recently I had a
> > similar problem.
> > 
> > Let me explain my setup first: I have two loadbalancers, which use
> > wrr to direct traffic to 5 realservers. A small script on the
> > loadbalancers checks the realservers periodically and requests some
> > numbers from them. Based on those numbers, the weight of each server
> > is adjusted using 'ipvsadm --edit-server'.
> > 
> > The setup I described above worked flawlessly for years (well,
> > after an iptables problem, and after a small patch to the wrr code)
> > until my traffic could spike so high that the loadbalancers were not
> > able to handle it properly. So we decided to upgrade the
> > loadbalancers with new hardware.
> > 
> > The new hardware runs on a quad-core 64-bit Xeon, while the old
> > machine had a 32-bit Celeron, so quite an upgrade. More notably, the
> > new server was able to process 950 Mbit/s with only 20% CPU time,
> > while the old one was eating up more than 90% CPU time at around
> > 60 Mbit/s.
> > 
> > So we went from a 32-bit OS to a 64-bit OS. We tested the hardware
> > and it seemed stable; next we put the machines into production, and
> > after several hours they would crash and not respond to anything,
> > much like Cristi experienced before. So we pulled them out, put the
> > old loadbalancers back in, and started testing a bit more.
> > 
> > After running and writing several programs, I finally got the
> > loadbalancers to crash again, this time in our testing environment.
> > To achieve a crash I had to generate enough traffic from different
> > IPs and ports through the IPVS services while running 'ipvsadm
> > --edit-server' on the loadbalancer. Running the traffic through
> > iptables wouldn't crash the server, nor would one client IP
> > hammering the services from different ports.
> > 
> > So I started debugging a lot more, and I am still working on it;
> > the problem is that the server freezes totally, so I can't look
> > anything up. But it seems that changing the weights on the server
> > will make your system crash if you run it on a 64-bit OS. Our 'old'
> > 32-bit environment still happily changes the weights of the servers
> > every couple of seconds without crashing. So there is a problem
> > somewhere in the ipvsadm program or in the kernel code - I'll keep
> > debugging.
> > 
> > What I want to know is if there is anyone out there with:
> > 1) a 64-bit installation,
> > 2) using wrr,
> > 3) changing the weights on the server while it is getting heavy
> > traffic from multiple ip:ports,
> > who is experiencing the same problem as I do: a freezing server
> > which needs a cold reset.
> > 
> > For the moment, I'll just keep looking at traces to see if I can
> > spot anything in particular, and I hope someone has a suggestion as
> > to where to look / what debugger to use.
> > 
> > -kees
> > 
> > 
> 
> 


