On Mon, Mar 28, 2005 at 03:51:01PM +0900, Horms wrote:
> On Sat, Mar 26, 2005 at 11:56:15AM +1100, Bradley Baetz wrote:
> > We have multiple LVS systems set up in master/backup configuration,
> > running 2.4.27. The backup server is a recently setup pair oopses and
> > then panics once every 0.5-2 weeks.
> >
> > The only non-build patch to the stock 2.4.27 kernel is a 2.4 version of
> > http://sourceforge.net/mailarchive/message.php?msg_id=9252540 which I
> > don't think is causing the problem here.
> >
> > The one that fails is the busiest of the various pairs that we have set
> > up, both in terms of number of connections, and number of real servers
> > set up - its handling mail for a large ISP.
> > Its also running an smp kernel, with a single CPU with hyperthreading,
> > while the other LVS systems we have are UP. That may be related..
> >
> > Theres one VIP, with multiple services on different ports, all set up
> > for LVS-DR and managed with keepalived.
> >
> > This is not hardware related; we swapped out the entire chassis last
> > week and its failed again.
> >
> > Happy to run some debugging patches on this kernel if required, after
> > the long weekend.
I have taken a look into this, and firstly I think that the oops is
actually occuring in ip_vs_conn_unhash(), not
__kstrtab_register_ip_vs_app() as the trace seems to indicate.
This makes more sense as ip_vs_conn_unhash() is actually called by
ip_vs]ip_vs_conn_expire(), which is in turn controlled by a timer.
http://archive.linuxvirtualserver.org/html/lvs-users/2005-03/msg00199.html
More specifically I think that the oops is occuring in list_del(),
which makes some sense if list_del() was called on the same cp
without list_add() being called in between. In a nutshell,
ip_vs_conn_unhash() being called twice, without ip_vs_conn_hash()
being called in between.
First, I am assuming that you were running an SMP kernel on
a machine that had multiple CPUs or CPU threads (hyperthreading).
I know the former is true from some offline correspondance.
The use and behaviour of ip_vs_conn_unhash() and ip_vs_conn_hash()
is nicely summarised by the patch below that reworked it somewhat.
Actually, the patch below is from 2.6 not 2.4, but its the same code
as far as I can see.
What I am concerned about is ip_vs_conn_hashkey() is called
by ip_vs_conn_unhash() and ip_vs_conn_hash() without the
cp being locked. I wonder if a situation arose where the port of
cp was modified between its calculation and calling ct_write_lock(hash)
might cause an issue with the wront part of the table being locked
and thus two writers being permitted.
I am also somewhat curious that cp->lock is only used in
ip_vs_conn_fill_cport(). I am not really sure why it is needed
if the behaviour of ip_vs_conn_unhash() is reliable. And if it is
needed, why it is not also needed in ip_vs_check_template().
--
Horms
# origin: ja (BitKeeper)
# cset: 1.1371.617.12 (2.6) key=405fa317kaaZXZ4SjfkiBI9rIrvHWw
# URL:
http://linux.bkbits.net:8080/linux-2.6/cset@405fa317kaaZXZ4SjfkiBI9rIrvHWw
# inclusion: upstream
# descrition: [IPVS] Fix connection rehashing with new cport
# revision date: Wed, 30 Mar 2005 16:28:12 +0900
#
# S rset: ChangeSet|1.1371.617.11..1.1371.617.12
# I rset: net/ipv4/ipvs/ip_vs_conn.c|1.13..1.14
#
# Key:
# S: Skipped ChangeSet file only
# O: Original Followed by Updated
# U: Updated Included with updated range of versions
# I: Included Included verbatim
# E: Excluded Excluded on request from user
# D: Deleted Manually deleted by subsequent user edit
# R: Revised Manually revised by subsequent user edit
#
#
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/03/22 18:38:15-08:00 ja@xxxxxx
# [IPVS] Fix connection rehashing with new cport
#
# net/ipv4/ipvs/ip_vs_conn.c
# 2004/03/22 18:38:03-08:00 ja@xxxxxx +44 -32
# [IPVS] Fix connection rehashing with new cport
#
#
===== net/ipv4/ipvs/ip_vs_conn.c 1.13 vs 1.14 =====
--- 1.13/net/ipv4/ipvs/ip_vs_conn.c 2004-02-19 06:03:52 +09:00
+++ 1.14/net/ipv4/ipvs/ip_vs_conn.c 2004-03-23 11:38:03 +09:00
@@ -125,25 +125,27 @@ static unsigned int ip_vs_conn_hashkey(u
static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
{
unsigned hash;
-
- if (cp->flags & IP_VS_CONN_F_HASHED) {
- IP_VS_ERR("ip_vs_conn_hash(): request for already hashed, "
- "called from %p\n", __builtin_return_address(0));
- return 0;
- }
+ int ret;
/* Hash by protocol, client address and port */
hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
ct_write_lock(hash);
- list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
- cp->flags |= IP_VS_CONN_F_HASHED;
- atomic_inc(&cp->refcnt);
+ if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
+ list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
+ cp->flags |= IP_VS_CONN_F_HASHED;
+ atomic_inc(&cp->refcnt);
+ ret = 1;
+ } else {
+ IP_VS_ERR("ip_vs_conn_hash(): request for already hashed, "
+ "called from %p\n", __builtin_return_address(0));
+ ret = 0;
+ }
ct_write_unlock(hash);
- return 1;
+ return ret;
}
@@ -154,24 +156,24 @@ static inline int ip_vs_conn_hash(struct
static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
{
unsigned hash;
-
- if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
- IP_VS_ERR("ip_vs_conn_unhash(): request for unhash flagged, "
- "called from %p\n", __builtin_return_address(0));
- return 0;
- }
+ int ret;
/* unhash it and decrease its reference counter */
hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
+
ct_write_lock(hash);
- list_del(&cp->c_list);
- cp->flags &= ~IP_VS_CONN_F_HASHED;
- atomic_dec(&cp->refcnt);
+ if (cp->flags & IP_VS_CONN_F_HASHED) {
+ list_del(&cp->c_list);
+ cp->flags &= ~IP_VS_CONN_F_HASHED;
+ atomic_dec(&cp->refcnt);
+ ret = 1;
+ } else
+ ret = 0;
ct_write_unlock(hash);
- return 1;
+ return ret;
}
@@ -285,12 +287,18 @@ void ip_vs_conn_put(struct ip_vs_conn *c
*/
void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __u16 cport)
{
- atomic_dec(&ip_vs_conn_no_cport_cnt);
- ip_vs_conn_unhash(cp);
- cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
- cp->cport = cport;
- /* hash on new dport */
- ip_vs_conn_hash(cp);
+ if (ip_vs_conn_unhash(cp)) {
+ spin_lock(&cp->lock);
+ if (cp->flags & IP_VS_CONN_F_NO_CPORT) {
+ atomic_dec(&ip_vs_conn_no_cport_cnt);
+ cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
+ cp->cport = cport;
+ }
+ spin_unlock(&cp->lock);
+
+ /* hash on new dport */
+ ip_vs_conn_hash(cp);
+ }
}
@@ -457,11 +465,14 @@ int ip_vs_check_template(struct ip_vs_co
/*
* Invalidate the connection template
*/
- ip_vs_conn_unhash(ct);
- ct->dport = 65535;
- ct->vport = 65535;
- ct->cport = 0;
- ip_vs_conn_hash(ct);
+ if (ct->cport) {
+ if (ip_vs_conn_unhash(ct)) {
+ ct->dport = 65535;
+ ct->vport = 65535;
+ ct->cport = 0;
+ ip_vs_conn_hash(ct);
+ }
+ }
/*
* Simply decrease the refcnt of the template,
@@ -493,7 +504,8 @@ static void ip_vs_conn_expire(unsigned l
/*
* unhash it if it is hashed in the conn table
*/
- ip_vs_conn_unhash(cp);
+ if (!ip_vs_conn_unhash(cp))
+ goto expire_later;
/*
* refcnt==1 implies I'm the only one referrer
|