[ Again, I forgot the CC to lvs-devel ]
On Mon, Oct 29, 2007 at 02:45:28PM +0900, Simon Horman wrote:
> [CCed lvs-devel, just so this is archived somewhere
> CCed Wensong and Julian, the other LVS maintainers]
>
> On Sat, Oct 27, 2007 at 02:27:19PM +0300, Rumen Bogdanovski wrote:
> > Hallo all,
> >
> > This patch fixes the problem with node overload on director fail-over.
> > Given the scenario: 2 nodes each accepting 3 connections at a time and 2
> > directors, director failover occurs when the nodes are fully loaded (6
> > connections to the cluster) in this case the new director will assign
> > another 6 connections to the cluster, If the same real servers exist
> > there.
> >
> > The he problem turned to be in not binding the inherited connections to
> > the real servers (destinations) on the backup director. Therefore:
> > "ipvsadm -l" reports 0 connections:
> > root at test2:~# ipvsadm -l
> > IP Virtual Server version 1.2.1 (size=4096)
> > Prot LocalAddress:Port Scheduler Flags
> > -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> > TCP test2.local:5999 wlc
> > -> node473.local:5999 Route 1000 0 0
> > -> node484.local:5999 Route 1000 0 0
> >
> > while "ipvs -lnc" is right
> > root at test2:~# ipvsadm -lnc
> > IPVS connection entries
> > pro expire state source virtual destination
> > TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
> > 192.168.0.51:5999
> > TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
> > 192.168.0.52:5999
> >
> > So the patch I am sending fixes the problem by binding the received
> > connections to the appropriate service on the backup director, if it
> > exists, else the connection will be handled the old way. So if the
> > master and the backup directors are synchronized in terms of real
> > services there will be no problem with server over-committing since
> > new connections will not be created on the nonexistent real services
> > on the backup. With this patch the inherited connections will show as
> > inactive on the backup:
> >
> > root at test2:~# ipvsadm -l
> > IP Virtual Server version 1.2.1 (size=4096)
> > Prot LocalAddress:Port Scheduler Flags
> > -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> > TCP test2.local:5999 wlc
> > -> node473.local:5999 Route 1000 0 1
> > -> node484.local:5999 Route 1000 0 1
> >
> >
> > The patch is based on kernel 2.6.22.10, but patches 2.6.23 also (with
> > two hunks, but it is not tested on 2.6.23).
> >
> > The result for 2.6.23.1:
> > patching file net/ipv4/ipvs/ip_vs_ctl.c
> > Hunk #2 succeeded at 543 (offset -1 lines).
> > Hunk #3 succeeded at 580 (offset -1 lines).
> >
> >
> > Regards,
> > Rumen Bogdanovski
>
>
> Hi Rumen,
>
> this seems entirely reasonably to me if we think of 2 linux-directors
> and a cluster, and thus the connection counts refering to the number
> of connections handled by the cluster. In any case, a connection
> is really just an entry in the connection table, and surely
> synchronised entried meet that criteria.
>
> The only down side that I can see is for people who really
> want to know how many (non-sychronised) connections a machine
> is handling. But I'm not entirely sure that such people exist.
>
> I have made a few inline comments below, but generally everything
> seems fine, and the patch seems to apply cleanly to DaveM's net-2.6
> tree - which basically means it should be easy to get him to apply it.
>
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=summary
>
> One thing that would make my life a little easier is if you added a
> sign-off line, as per section 5 the following document. Basically
> it says that this is your work, and that you have the rights to
> add ask for it to be included in the Linux kernel which is GPL licenced.
>
> http://linux.yyz.us/patch-format.html
>
>
> > diff -Naur linux-2.6.22.10/include/net/ip_vs.h
> > linux-2.6.22.10_patch/include/net/ip_vs.h
> > --- linux-2.6.22.10/include/net/ip_vs.h 2007-10-10 20:50:35.000000000
> > +0300
> > +++ linux-2.6.22.10_patch/include/net/ip_vs.h 2007-10-27
> > 12:09:06.000000000 +0300
> > @@ -901,6 +901,9 @@
> > extern void ip_vs_use_count_dec(void);
> > extern int ip_vs_control_init(void);
> > extern void ip_vs_control_cleanup(void);
> > +extern struct ip_vs_dest *
> > +ip_vs_find_dest( __be32 daddr,__be16 dport,
> > + __be32 vaddr,__be16 vport,__u16 protocol);
> >
> >
> > /*
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_core.c
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_core.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_core.c 2007-10-10
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_core.c 2007-10-27
> > 04:14:43.000000000 +0300
> > @@ -43,7 +43,6 @@
> >
> > #include <net/ip_vs.h>
> >
> > -
>
> This is a whitespace-only change.
> I'm quite ok with whitespace fixes, in fact I quite like them.
> But please put them in a separate patch.
>
>
> > EXPORT_SYMBOL(register_ip_vs_scheduler);
> > EXPORT_SYMBOL(unregister_ip_vs_scheduler);
> > EXPORT_SYMBOL(ip_vs_skb_replace);
> > @@ -55,6 +54,7 @@
> > EXPORT_SYMBOL(ip_vs_tcp_conn_listen);
> > #endif
> > EXPORT_SYMBOL(ip_vs_conn_put);
> > +EXPORT_SYMBOL(ip_vs_find_dest);
> > #ifdef CONFIG_IP_VS_DEBUG
> > EXPORT_SYMBOL(ip_vs_get_debug_level);
> > #endif
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_ctl.c
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_ctl.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_ctl.c 2007-10-10
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_ctl.c 2007-10-27
> > 13:00:49.000000000 +0300
> > @@ -17,6 +17,9 @@
> > * 2 of the License, or (at your option) any later version.
> > *
> > * Changes:
> > + *
> > + * Rumen Bogdanovski : Added function ip_vs_find_dest() to be used in the
> > + * backup sync daemon.
> > *
> > */
>
> Adding yourself to the Changes file isn't entirely neccessary
> as all changes are logged through git these days. But if you
> want your name there, thats fine by me.
>
> >
> > @@ -541,7 +544,6 @@
> > * Return the first found entry
> > */
> > hash = ip_vs_rs_hashkey(daddr, dport);
> > -
>
> Again a whitespace change.
>
> > read_lock(&__ip_vs_rs_lock);
> > list_for_each_entry(dest, &ip_vs_rtable[hash], d_list) {
> > if ((dest->addr == daddr)
> > @@ -579,6 +581,28 @@
> > return NULL;
> > }
> >
> > +/*
> > + * Find destination by {daddr,dport,vaddr,protocol}
> > + * Cretaed to be used in ip_vs_process_message() in
> > + * the backup synchronization daemon. It finds the
> > + * destination to be bound to the received connection
> > + * on the backup.
> > + *
> > + * ip_vs_lookup_real_service() looked promissing, but
> > + * seems not working as expected.
> > + *
> > + * by Rumen Bogdanovski <rumen at voicecho.com>
> > + */
>
> Again, putting your name here isn't entirely neccessary.
>
> > +struct ip_vs_dest *ip_vs_find_dest( __be32 daddr,__be16 dport,
> > + __be32 vaddr,__be16 vport,__u16
> > protocol) {
> > + struct ip_vs_dest *dest;
> > + struct ip_vs_service * svc;
> > + svc = ip_vs_service_get(0, protocol, vaddr, vport);
> > + if(!svc) return NULL;
> > + dest=ip_vs_lookup_dest(svc,daddr,dport);
> > + ip_vs_service_put(svc);
> > + return dest;
> > +}
> >
> > /*
> > * Lookup dest by {svc,addr,port} in the destination trash.
> > diff -Naur linux-2.6.22.10/net/ipv4/ipvs/ip_vs_sync.c
> > linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_sync.c
> > --- linux-2.6.22.10/net/ipv4/ipvs/ip_vs_sync.c 2007-10-10
> > 20:50:35.000000000 +0300
> > +++ linux-2.6.22.10_patch/net/ipv4/ipvs/ip_vs_sync.c 2007-10-27
> > 12:25:27.000000000 +0300
> > @@ -17,6 +17,9 @@
> > * Alexandre Cassen : Added SyncID support for incoming sync
> > * messages filtering.
> > * Justin Ossevoort : Fix endian problem on sync message size.
> > + * Rumen Bogdanovski : Added binding the received connections to the
> > + * appropriate destinations on the backup daemon
> > + * (if the destination exists)
> > */
>
> The whitespaece in the change above seems strange.
> It might be better to use tabs as per the above entries.
>
> >
> > #include <linux/module.h>
> > @@ -284,8 +287,10 @@
> > struct ip_vs_sync_conn *s;
> > struct ip_vs_sync_conn_options *opt;
> > struct ip_vs_conn *cp;
> > + struct ip_vs_dest *dest;
> > char *p;
> > int i;
> > +
>
> This new blank line seems uneccessary.
>
> >
> > /* Convert size back to host byte order */
> > m->size = ntohs(m->size);
> > @@ -317,11 +322,19 @@
> > s->caddr, s->cport,
> > s->vaddr, s->vport);
> > if (!cp) {
> > + /*
> > + * Find the appropriate destination for the connection.
> > + * If it is not found the connection will remain unbound
> > + * but still handled.
> > + */
> > + dest = ip_vs_find_dest( s->daddr, s->dport,
> > + s->vaddr, s->vport,
> > + s->protocol);
> > cp = ip_vs_conn_new(s->protocol,
> > s->caddr, s->cport,
> > s->vaddr, s->vport,
> > s->daddr, s->dport,
> > - flags, NULL);
> > + flags, dest);
> > if (!cp) {
> > IP_VS_ERR("ip_vs_conn_new failed\n");
> > return;
> > @@ -344,7 +357,7 @@
> > atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]);
> > cp->timeout = IP_VS_SYNC_CONN_TIMEOUT;
> > ip_vs_conn_put(cp);
> > -
> > +
>
> The whitespace change on the line immediately above is spurious
> and adds trailing whitespace.
>
> > if (p > buffer+buflen) {
> > IP_VS_ERR("bogus message\n");
> > return;
>
>
> --
> Horms
> H: http://www.vergenet.net/~horms/
> W: http://www.valinux.co.jp/en/
>
--
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
|