LVS
lvs-users
Google
 
Web LinuxVirtualServer.org

Re: LVS oops/panic

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS oops/panic
Cc: wensong@xxxxxxxxxxxx
From: Horms <horms@xxxxxxxxxxxx>
Date: Wed, 30 Mar 2005 17:02:43 +0900
On Mon, Mar 28, 2005 at 03:51:01PM +0900, Horms wrote:
> On Sat, Mar 26, 2005 at 11:56:15AM +1100, Bradley Baetz wrote:
> > We have multiple LVS systems set up in master/backup configuration,
> > running 2.4.27. The backup server is a recently setup pair oopses and
> > then panics once every 0.5-2 weeks.
> > 
> > The only non-build patch to the stock 2.4.27 kernel is a 2.4 version of
> > http://sourceforge.net/mailarchive/message.php?msg_id=9252540 which I
> > don't think is causing the problem here.
> > 
> > The one that fails is the busiest of the various pairs that we have set
> > up, both in terms of number of connections, and number of real servers
> > set up - its handling mail for a large ISP.
> > Its also running an smp kernel, with a single CPU with hyperthreading,
> > while the other LVS systems we have are UP. That may be related..
> > 
> > Theres one VIP, with multiple services on different ports, all set up
> > for LVS-DR and managed with keepalived.
> > 
> > This is not hardware related; we swapped out the entire chassis last
> > week and its failed again.
> > 
> > Happy to run some debugging patches on this kernel if required, after
> > the long weekend.

I have taken a look into this, and firstly I think that the oops is
actually occuring in ip_vs_conn_unhash(), not
__kstrtab_register_ip_vs_app() as the trace seems to indicate.
This makes more sense as ip_vs_conn_unhash() is actually called by
ip_vs]ip_vs_conn_expire(), which is in turn controlled by a timer.

http://archive.linuxvirtualserver.org/html/lvs-users/2005-03/msg00199.html

More specifically I think that the oops is occuring in list_del(),
which makes some sense if list_del() was called on the same cp
without list_add() being called in between. In a nutshell,
ip_vs_conn_unhash() being called twice, without ip_vs_conn_hash()
being called in between.

First, I am assuming that you were running an SMP kernel on 
a machine that had multiple CPUs or CPU threads (hyperthreading).
I know the former is true from some offline correspondance.

The use and behaviour of ip_vs_conn_unhash() and ip_vs_conn_hash()
is nicely summarised by the patch below that reworked it somewhat.
Actually, the patch below is from 2.6 not 2.4, but its the same code
as far as I can see.

What I am concerned about is ip_vs_conn_hashkey() is called
by ip_vs_conn_unhash() and ip_vs_conn_hash() without the
cp being locked. I wonder if a situation arose where the port of
cp was modified between its calculation and calling ct_write_lock(hash)
might cause an issue with the wront part of the table being locked
and thus two writers being permitted.

I am also somewhat curious that cp->lock is only used in 
ip_vs_conn_fill_cport(). I am not really sure why it is needed
if the behaviour of ip_vs_conn_unhash() is reliable. And if it is
needed, why it is not also needed in ip_vs_check_template().

-- 
Horms

# origin: ja (BitKeeper)
# cset: 1.1371.617.12 (2.6) key=405fa317kaaZXZ4SjfkiBI9rIrvHWw
# URL: 
http://linux.bkbits.net:8080/linux-2.6/cset@405fa317kaaZXZ4SjfkiBI9rIrvHWw
# inclusion: upstream
# descrition: [IPVS] Fix connection rehashing with new cport
# revision date: Wed, 30 Mar 2005 16:28:12 +0900
#
# S rset: ChangeSet|1.1371.617.11..1.1371.617.12
# I rset: net/ipv4/ipvs/ip_vs_conn.c|1.13..1.14
#
# Key:
# S: Skipped  ChangeSet file only
# O: Original Followed by Updated
# U: Updated  Included with updated range of versions
# I: Included Included verbatim
# E: Excluded Excluded on request from user
# D: Deleted  Manually deleted by subsequent user edit
# R: Revised  Manually revised by subsequent user edit
#
#
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/03/22 18:38:15-08:00 ja@xxxxxx 
#   [IPVS] Fix connection rehashing with new cport
# 
# net/ipv4/ipvs/ip_vs_conn.c
#   2004/03/22 18:38:03-08:00 ja@xxxxxx +44 -32
#   [IPVS] Fix connection rehashing with new cport
# 
#
===== net/ipv4/ipvs/ip_vs_conn.c 1.13 vs 1.14 =====
--- 1.13/net/ipv4/ipvs/ip_vs_conn.c     2004-02-19 06:03:52 +09:00
+++ 1.14/net/ipv4/ipvs/ip_vs_conn.c     2004-03-23 11:38:03 +09:00
@@ -125,25 +125,27 @@ static unsigned int ip_vs_conn_hashkey(u
 static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 {
        unsigned hash;
-
-       if (cp->flags & IP_VS_CONN_F_HASHED) {
-               IP_VS_ERR("ip_vs_conn_hash(): request for already hashed, "
-                         "called from %p\n", __builtin_return_address(0));
-               return 0;
-       }
+       int ret;
 
        /* Hash by protocol, client address and port */
        hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
 
        ct_write_lock(hash);
 
-       list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
-       cp->flags |= IP_VS_CONN_F_HASHED;
-       atomic_inc(&cp->refcnt);
+       if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
+               list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
+               cp->flags |= IP_VS_CONN_F_HASHED;
+               atomic_inc(&cp->refcnt);
+               ret = 1;
+       } else {
+               IP_VS_ERR("ip_vs_conn_hash(): request for already hashed, "
+                         "called from %p\n", __builtin_return_address(0));
+               ret = 0;
+       }
 
        ct_write_unlock(hash);
 
-       return 1;
+       return ret;
 }
 
 
@@ -154,24 +156,24 @@ static inline int ip_vs_conn_hash(struct
 static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 {
        unsigned hash;
-
-       if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
-               IP_VS_ERR("ip_vs_conn_unhash(): request for unhash flagged, "
-                         "called from %p\n", __builtin_return_address(0));
-               return 0;
-       }
+       int ret;
 
        /* unhash it and decrease its reference counter */
        hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
+
        ct_write_lock(hash);
 
-       list_del(&cp->c_list);
-       cp->flags &= ~IP_VS_CONN_F_HASHED;
-       atomic_dec(&cp->refcnt);
+       if (cp->flags & IP_VS_CONN_F_HASHED) {
+               list_del(&cp->c_list);
+               cp->flags &= ~IP_VS_CONN_F_HASHED;
+               atomic_dec(&cp->refcnt);
+               ret = 1;
+       } else
+               ret = 0;
 
        ct_write_unlock(hash);
 
-       return 1;
+       return ret;
 }
 
 
@@ -285,12 +287,18 @@ void ip_vs_conn_put(struct ip_vs_conn *c
  */
 void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __u16 cport)
 {
-       atomic_dec(&ip_vs_conn_no_cport_cnt);
-       ip_vs_conn_unhash(cp);
-       cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
-       cp->cport = cport;
-       /* hash on new dport */
-       ip_vs_conn_hash(cp);
+       if (ip_vs_conn_unhash(cp)) {
+               spin_lock(&cp->lock);
+               if (cp->flags & IP_VS_CONN_F_NO_CPORT) {
+                       atomic_dec(&ip_vs_conn_no_cport_cnt);
+                       cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
+                       cp->cport = cport;
+               }
+               spin_unlock(&cp->lock);
+
+               /* hash on new dport */
+               ip_vs_conn_hash(cp);
+       }
 }
 
 
@@ -457,11 +465,14 @@ int ip_vs_check_template(struct ip_vs_co
                /*
                 * Invalidate the connection template
                 */
-               ip_vs_conn_unhash(ct);
-               ct->dport = 65535;
-               ct->vport = 65535;
-               ct->cport = 0;
-               ip_vs_conn_hash(ct);
+               if (ct->cport) {
+                       if (ip_vs_conn_unhash(ct)) {
+                               ct->dport = 65535;
+                               ct->vport = 65535;
+                               ct->cport = 0;
+                               ip_vs_conn_hash(ct);
+                       }
+               }
 
                /*
                 * Simply decrease the refcnt of the template,
@@ -493,7 +504,8 @@ static void ip_vs_conn_expire(unsigned l
        /*
         *      unhash it if it is hashed in the conn table
         */
-       ip_vs_conn_unhash(cp);
+       if (!ip_vs_conn_unhash(cp))
+               goto expire_later;
 
        /*
         *      refcnt==1 implies I'm the only one referrer

<Prev in Thread] Current Thread [Next in Thread>