LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

[lvs-devel] IPVS fix: real server overcommit on director failover

Subject: [lvs-devel] IPVS fix: real server overcommit on director failover
From: horms at verge.net.au (Simon Horman)
Date: Mon, 29 Oct 2007 14:53:43 +0900
[ Again, I forgot the CC to lvs-devel ]

On Mon, Oct 29, 2007 at 02:45:28PM +0900, Simon Horman wrote:
> [CCed lvs-devel, just so this is archived somewhere
>  CCed Wensong and Julian, the other LVS maintainers]
> 
> On Sat, Oct 27, 2007 at 02:27:19PM +0300, Rumen Bogdanovski wrote:
> > Hallo all,
> > 
> > This patch fixes the problem with node overload on director fail-over.
> > Given the scenario: 2 nodes each accepting 3 connections at a time and 2
> > directors, director failover occurs when the nodes are fully loaded (6
> > connections to the cluster) in this case the new director will assign
> > another 6 connections to the cluster, If the same real servers exist
> > there.
> > 
> > The he problem turned to be in not binding the inherited connections to
> > the real servers (destinations) on the backup director. Therefore:
> > "ipvsadm -l" reports 0 connections:
> > root at test2:~# ipvsadm -l
> > IP Virtual Server version 1.2.1 (size=4096)
> > Prot LocalAddress:Port Scheduler Flags
> >   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> > TCP  test2.local:5999 wlc
> >   -> node473.local:5999           Route   1000   0          0         
> >   -> node484.local:5999           Route   1000   0          0         
> > 
> > while "ipvs -lnc" is right
> > root at test2:~# ipvsadm -lnc
> > IPVS connection entries
> > pro expire state       source             virtual            destination
> > TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
> > 192.168.0.51:5999
> > TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
> > 192.168.0.52:5999 
> > 
> > So the patch I am sending fixes the problem by binding the received
> > connections to the appropriate service on the backup director, if it
> > exists, else the connection will be handled the old way. So if the
> > master and the backup directors are synchronized in terms of real
> > services there will be no problem with server over-committing since 
> > new connections will not be created on the nonexistent real services 
> > on the backup. With this patch the inherited connections will show as
> > inactive on the backup:
> > 
> > root at test2:~# ipvsadm -l
> > IP Virtual Server version 1.2.1 (size=4096)
> > Prot LocalAddress:Port Scheduler Flags
> >   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> > TCP  test2.local:5999 wlc
> >   -> node473.local:5999           Route   1000   0          1         
> >   -> node484.local:5999           Route   1000   0          1         
> > 
> > 
> > The patch is based on kernel 2.6.22.10, but patches 2.6.23 also (with
> > two hunks, but it is not tested on 2.6.23).
> > 
> > The result for 2.6.23.1:
> > patching file net/ipv4/ipvs/ip_vs_ctl.c
> > Hunk #2 succeeded at 543 (offset -1 lines).
> > Hunk #3 succeeded at 580 (offset -1 lines).
> > 
> > 
> > Regards,
> > Rumen Bogdanovski
> 
> 
> Hi Rumen,
> 
> this seems entirely reasonably to me if we think of 2 linux-directors
> and a cluster, and thus the connection counts refering to the number
> of connections handled by the cluster. In any case, a connection
> is really just an entry in the connection table, and surely
> synchronised entried meet that criteria.
> 
> The only down side that I can see is for people who really
> want to know how many (non-sychronised) connections a machine
> is handling. But I'm not entirely sure that such people exist.
> 
> I have made a few inline comments below, but generally everything
> seems fine, and the patch seems to apply cleanly to DaveM's net-2.6
> tree - which basically means it should be easy to get him to apply it.
> 
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=summary
> 
> One thing that would make my life a little easier is if you added a
> sign-off line, as per section 5 the following document. Basically
> it says that this is your work, and that you have the rights to
> add ask for it to be included in the Linux kernel which is GPL licenced.
> 
> http://linux.yyz.us/patch-format.html
> 
> 
> > diff -Naur linux-2.6.22.10/include/net/ip_vs.h 
> > linux-2.6.22.10_patch/include/net/ip_vs.h
> > --- linux-2.6.22.10/include/net/ip_vs.h     2007-10-10 20:50:35.000000000 
> > +0300
> > +++ linux-2.6.22.10_patch/include/net/ip_vs.h       2007-10-27 
> > 12:09:06.000000000 +0300
> > @@ -901,6 +901,9 @@
> >  extern void ip_vs_use_count_dec(void);
> >  extern int ip_vs_control_init(void);
> >  extern void ip_vs_control_cleanup(void);
> > +extern struct ip_vs_dest *
> > +ip_vs_find_dest( __be32 daddr,__be16 dport,
> > +                 __be32 vaddr,__be16 vport,__u16 protocol);
> >  
> >  
> >  /*
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_core.c 
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_core.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_core.c      2007-10-10 
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_core.c        2007-10-27 
> > 04:14:43.000000000 +0300
> > @@ -43,7 +43,6 @@
> >  
> >  #include <net/ip_vs.h>
> >  
> > -
> 
> This is a whitespace-only change.
> I'm quite ok with whitespace fixes, in fact I quite like them.
> But please put them in a separate patch.
> 
> 
> >  EXPORT_SYMBOL(register_ip_vs_scheduler);
> >  EXPORT_SYMBOL(unregister_ip_vs_scheduler);
> >  EXPORT_SYMBOL(ip_vs_skb_replace);
> > @@ -55,6 +54,7 @@
> >  EXPORT_SYMBOL(ip_vs_tcp_conn_listen);
> >  #endif
> >  EXPORT_SYMBOL(ip_vs_conn_put);
> > +EXPORT_SYMBOL(ip_vs_find_dest);
> >  #ifdef CONFIG_IP_VS_DEBUG
> >  EXPORT_SYMBOL(ip_vs_get_debug_level);
> >  #endif
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_ctl.c 
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_ctl.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_ctl.c       2007-10-10 
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_ctl.c 2007-10-27 
> > 13:00:49.000000000 +0300
> > @@ -17,6 +17,9 @@
> >   *              2 of the License, or (at your option) any later version.
> >   *
> >   * Changes:
> > + * 
> > + * Rumen Bogdanovski :  Added function ip_vs_find_dest() to be used in the 
> > + *                      backup sync daemon.
> >   *
> >   */
> 
> Adding yourself to the Changes file isn't entirely neccessary
> as all changes are logged through git these days. But if you
> want your name there, thats fine by me.
> 
> >  
> > @@ -541,7 +544,6 @@
> >      *      Return the first found entry
> >      */
> >     hash = ip_vs_rs_hashkey(daddr, dport);
> > -
> 
> Again a whitespace change.
> 
> >     read_lock(&__ip_vs_rs_lock);
> >     list_for_each_entry(dest, &ip_vs_rtable[hash], d_list) {
> >             if ((dest->addr == daddr)
> > @@ -579,6 +581,28 @@
> >     return NULL;
> >  }
> >  
> > +/*
> > + * Find destination by {daddr,dport,vaddr,protocol}
> > + * Cretaed to be used in ip_vs_process_message() in 
> > + * the backup synchronization daemon. It finds the 
> > + * destination to be bound to the received connection 
> > + * on the backup.
> > + * 
> > + * ip_vs_lookup_real_service() looked promissing, but 
> > + * seems not working as expected.
> > + * 
> > + * by Rumen Bogdanovski <rumen at voicecho.com>
> > + */
> 
> Again, putting your name here isn't entirely neccessary.
> 
> > +struct ip_vs_dest *ip_vs_find_dest( __be32 daddr,__be16 dport,
> > +                                    __be32 vaddr,__be16 vport,__u16 
> > protocol) {
> > +   struct ip_vs_dest *dest;
> > +   struct ip_vs_service * svc;
> > +   svc = ip_vs_service_get(0, protocol, vaddr, vport);
> > +   if(!svc) return NULL;
> > +   dest=ip_vs_lookup_dest(svc,daddr,dport);
> > +   ip_vs_service_put(svc);
> > +   return dest;
> > +}
> >  
> >  /*
> >   *  Lookup dest by {svc,addr,port} in the destination trash.
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_sync.c 
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_sync.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_sync.c      2007-10-10 
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_sync.c        2007-10-27 
> > 12:25:27.000000000 +0300
> > @@ -17,6 +17,9 @@
> >   * Alexandre Cassen        :       Added SyncID support for incoming sync
> >   *                                 messages filtering.
> >   * Justin Ossevoort        :       Fix endian problem on sync message size.
> > + *  Rumen Bogdanovski   :   Added binding the received connections to the 
> > + *                          appropriate destinations on the backup daemon
> > + *                          (if the destination exists) 
> >   */
> 
> The whitespaece in the change above seems strange.
> It might be better to use tabs as per the above entries.
> 
> >  
> >  #include <linux/module.h>
> > @@ -284,8 +287,10 @@
> >     struct ip_vs_sync_conn *s;
> >     struct ip_vs_sync_conn_options *opt;
> >     struct ip_vs_conn *cp;
> > +   struct ip_vs_dest *dest;
> >     char *p;
> >     int i;
> > +   
> 
> This new blank line seems uneccessary.
> 
> >  
> >     /* Convert size back to host byte order */
> >     m->size = ntohs(m->size);
> > @@ -317,11 +322,19 @@
> >                                            s->caddr, s->cport,
> >                                            s->vaddr, s->vport);
> >             if (!cp) {
> > +                   /*
> > +                    * Find the appropriate destination for the connection.
> > +                    * If it is not found the connection will remain unbound
> > +                    * but still handled.
> > +                    */
> > +                   dest = ip_vs_find_dest( s->daddr, s->dport,
> > +                                        s->vaddr, s->vport,
> > +                                        s->protocol);
> >                     cp = ip_vs_conn_new(s->protocol,
> >                                         s->caddr, s->cport,
> >                                         s->vaddr, s->vport,
> >                                         s->daddr, s->dport,
> > -                                       flags, NULL);
> > +                                       flags, dest);
> >                     if (!cp) {
> >                             IP_VS_ERR("ip_vs_conn_new failed\n");
> >                             return;
> > @@ -344,7 +357,7 @@
> >             atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]);
> >             cp->timeout = IP_VS_SYNC_CONN_TIMEOUT;
> >             ip_vs_conn_put(cp);
> > -
> > +           
> 
> The whitespace change on the line immediately above is spurious
> and adds trailing whitespace.
> 
> >             if (p > buffer+buflen) {
> >                     IP_VS_ERR("bogus message\n");
> >                     return;
> 
> 
> -- 
> Horms
>   H: http://www.vergenet.net/~horms/
>   W: http://www.valinux.co.jp/en/
> 

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



<Prev in Thread] Current Thread [Next in Thread>
  • [lvs-devel] IPVS fix: real server overcommit on director failover, Simon Horman <=