
[PATCH][RFC]: add threshhold per RS (dirty hospital version)

To: Wensong Zhang <wensong@xxxxxxxxxxxx>
Subject: [PATCH][RFC]: add threshhold per RS (dirty hospital version)
Cc: "ja@xxxxxx" <ja@xxxxxx>, Joseph Mack <mack.joseph@xxxxxxx>, "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: ratz <ratz@xxxxxx>
Date: Mon, 29 Jan 2001 17:16:27 +0100
Hi guys,

This patch on top of ipvs-1.0.3-2.2.18 adds per-realserver threshold
settings for all schedulers that support the -w (weight) option.

Description/Purpose:
--------------------
I had often wondered how a kernel-based implementation of per-realserver
connection limits would work and how it could be implemented. While
waiting in the hospital for an x-ray, I had enough time to write a quick
and dirty hack as a proof of concept. It works as follows: I added three
new fields to struct ip_vs_dest in ip_vs.h (u_thresh, l_thresh and
old_weight), used them in ip_vs.c, and extended ipvsadm with two new
options, -x (upper threshold) and -y (lower threshold).
A typical setup would be:

ipvsadm -A -t 192.168.100.100:80 -s wlc
ipvsadm -a -t 192.168.100.100:80 -r 192.168.100.3:80 -w 3 -x 1145 -y 923
ipvsadm -a -t 192.168.100.100:80 -r 192.168.100.3:80 -w 2 -x 982 -y 677
ipvsadm -a -t 192.168.100.100:80 -r 127.0.0.1:80 -w 1 -x 100 -y 50

So, as soon as (dest->inactconns + dest->activeconns) reaches the upper
threshold (-x), the weight of this server is saved and set to zero. As
soon as the connection count drops to the lower threshold (-y) or below,
the weight is set back to its saved value.
What is it good for? Well, I don't know exactly; use your imagination.
But first of all, this is a proposal, and I wanted to start a discussion
about a possible inclusion of this feature, or one derived from it, into
the main code (after fixing the race conditions and bugs and cleaning up
the code, of course). Second, in countless talks with customers I found
that such a feature is needed: commercial load balancers have it, and
managers always like a nice feature-by-feature comparison when deciding
which product to take. Unfortunately, doing all of this in user space is
just not atomic enough.

Anyway, if anybody else thinks that such a feature is worth including,
we can talk about it. If you look at the code, it wouldn't break
anything and only adds a couple of lousy CPU cycles for checking whether
u_thresh is non-zero. The feature can easily be disabled by setting
u_thresh to zero or by not setting it at all.

Well, I'm open to discussion and flames. I have it running in
production :) but with a special SLA. I also implemented a server of
last resort, which works like this: if all RS of a service are down
(a health check took them out or the threshold check set their weight
to zero), my userspace tool automagically brings in the server of last
resort, a tiny httpd serving a static page saying that the service is
currently unavailable. This is also useful if you want to do maintenance
on the realservers.

I have already deployed a dozen such setups and they all work pretty
well.

Best regards,
Roberto Nibali, ratz

-- 
mailto: `echo NrOatSz@xxxxxxxxx | sed 's/[NOSPAM]//g'`
--- ipvsadm.c-old       Mon Jan 29 08:39:45 2001
+++ ipvsadm.c   Mon Jan 29 08:56:24 2001
@@ -265,6 +265,10 @@
         {"gatewaying",'g', POPT_ARG_NONE, NULL, 'g'};
         struct poptOption weight_option =
         {"weight", 'w', POPT_ARG_STRING, &optarg, 'w'};
+        struct poptOption u_thresh_option =
+        {"u_thresh", 'x', POPT_ARG_STRING, &optarg, 'x'};
+        struct poptOption l_thresh_option =
+        {"l_thresh", 'y', POPT_ARG_STRING, &optarg, 'y'};
         struct poptOption numeric_option =
         {"numeric", 'n', POPT_ARG_NONE, NULL, 'n'};
         struct poptOption NULL_option =
@@ -326,6 +330,8 @@
                udp_service_option,
                fwmark_service_option,
                weight_option,
+               u_thresh_option,
+               l_thresh_option,
                real_server_option,
                real_server2_option,
                gatewaying_option,
@@ -517,6 +523,17 @@
                              string_to_number(optarg,0,65535)) == -1)
                                 fail(2, "illegal weight specified");
                         break;
+               case 'x':
+                       if ((mc->u.vs_user.u_thresh=
+                            string_to_number(optarg, 0, 65535)) == -1)
+                               fail(2, "illegal u_thresh specified");
+                       break;
+               case 'y':
+                       if ((mc->u.vs_user.l_thresh=
+                            string_to_number(optarg, 0, 65535)) == -1)
+                               fail(2, "illegal l_thresh specified");
+                       break;
+
                 case 'n':
                         *format |= FMT_NUMERIC;
                         break;
@@ -611,6 +628,8 @@
                {"ipip", 0, 0, 'i'},
                {"gatewaying", 0, 0, 'g'},
                {"weight", 1, 0, 'w'},
+               {"u_thresh", 1, 0, 'x'},
+               {"l_thresh", 1, 0, 'y'},
                {"numeric", 0, 0, 'n'},
                {"help", 0, 0, 'h'},
                {0, 0, 0, 0}
@@ -624,7 +643,7 @@
        /* Re-process the arguments each time options is called*/
        optind = 1;
 
-       if ((cmd = getopt_long(argc, argv, "AEDCSRaedlLhv",
+       if ((cmd = getopt_long(argc, argv, "AEDCSRaedlLhvxy",
                                long_options, NULL)) == EOF)
                usage_exit(argv[0], -1);
 
@@ -643,11 +662,11 @@
                 break;
         case 'a':
                 mc->m_cmd = IP_MASQ_CMD_ADD_DEST;
-                optstr = "t:u:f:w:r:R:gmi";
+                optstr = "t:u:f:w:r:R:gmix:y:";
                 break;
         case 'e':
                 mc->m_cmd = IP_MASQ_CMD_SET_DEST;
-                optstr = "t:u:f:w:r:R:gmi";
+                optstr = "t:u:f:w:r:R:gmix:y:";
                 break;
         case 'd':
                 mc->m_cmd = IP_MASQ_CMD_DEL_DEST;
@@ -787,6 +806,20 @@
                              string_to_number(optarg,0,65535)) == -1)
                                 fail(2, "illegal weight specified");
                         break;
+               case 'x':
+                       if (mc->u.vs_user.u_thresh != 0)
+                                fail(2, "multiple server u_thresh specified");
+                       if ((mc->u.vs_user.u_thresh=
+                            string_to_number(optarg, 0, 65535)) == -1)
+                               fail(2, "illegal u_thresh specified");
+                       break;
+               case 'y':
+                       if (mc->u.vs_user.l_thresh != 0)
+                                fail(2, "multiple server l_thresh specified");
+                       if ((mc->u.vs_user.l_thresh=
+                            string_to_number(optarg, 0, 65535)) == -1)
+                               fail(2, "illegal l_thresh specified");
+                       break;
                 case 'n':
                         *format |= FMT_NUMERIC;
                         break;
@@ -814,6 +847,9 @@
         ctl.m_target = IP_MASQ_TARGET_VS;
         /* weight=0 is allowed, which means that server is quiesced */
         ctl.u.vs_user.weight = -1;
+       /* set u_thresh and l_thresh to zero -> disabled */
+       ctl.u.vs_user.u_thresh = 0;
+       ctl.u.vs_user.l_thresh = 0;
         /* Set direct routing as default forwarding method */
         ctl.u.vs_user.masq_flags = IP_MASQ_F_VS_DROUTE;
         /* Set the default persistent granularity to /32 masking */
@@ -1078,7 +1114,7 @@
                 "  %s -R\n"
                 "  %s -S [-n]\n"
 #endif
-                "  %s -[a|e] -[t|u|f] service-address -[r|R] server-address [-g|-i|-m] [-w weight]\n"
+                "  %s -[a|e] -[t|u|f] service-address -[r|R] server-address [-g|-i|-m] [-w weight] [-x u_thresh] [-y l_thresh]\n"
                 "  %s -d -[t|u|f] service-address -[r|R] server-address\n"
                 "  %s -[L|l] [-n]\n"
                 "  %s -h\n\n",
@@ -1128,6 +1164,8 @@
         fprintf(stream,
                 "  --ipip         -i                   ipip encapsulation (tunneling)\n"
                 "  --masquerading -m                   masquerading (NAT)\n"
+                "  --u_thresh     -x <u_thresh>        max. connections\n"
+                "  --l_thresh     -y <l_thresh>        weight fallback connections\n"
                 "  --weight       -w <weight>          capacity of real server\n"
                 "  --numeric      -n                   numeric output of addresses and ports\n"
                );
@@ -1230,7 +1268,7 @@
         }
         if (fgets(buffer, sizeof(buffer), handle) && !(format & FMT_RULE))
                 printf("  -> RemoteAddress:Port          "
-                       "Forward Weight ActiveConn InActConn\n");
+                       "Forward Weight ActiveConn InActConn u_thresh l_thresh\n");
         
         /*
          * Print the VS information according to the format
@@ -1280,6 +1318,8 @@
         int weight;
         int activeconns;
         int inactconns;
+       int u_thresh;
+       int l_thresh;
         
         int n;
         unsigned long temp;
@@ -1289,11 +1329,11 @@
        
         if (buf[0] == ' ') {
                 /* destination entry */
-                if ((n = sscanf(buf, " %s %lX:%hX %s %d %d %d",
+                if ((n = sscanf(buf, " %s %lX:%hX %s %d %d %d %d %d",
                                 arrow, &temp, &dport, fwd, &weight,
-                                &activeconns, &inactconns)) == -1)
+                                &activeconns, &inactconns, &u_thresh, &l_thresh)) == -1)
                         exit(1);
-                if (n != 7)
+                if (n != 9)
                         fail(2, "unexpected input data");
                 
                 daddr.s_addr = (__u32) htonl(temp);
@@ -1315,8 +1355,9 @@
                                        dname, get_fwd_switch(fwd), weight);
                         }
                 } else {
-                        printf("  -> %-27s %-7s %-6d %-10d %-10d\n",
-                               dname , fwd, weight, activeconns, inactconns);
+                        printf("  -> %-27s %-7s %-6d %-10d %-10d %-10d %-10d\n",
+                               dname , fwd, weight, activeconns, inactconns, 
+                               u_thresh, l_thresh);
                 }
                 free(dname);
         } else if (buf[0] == 'F') {
Only in linux-2.2.18.vanilla/include/linux: coda_opstats.h
Only in linux-2.2.18.vanilla/include/linux: dasd.h
diff -ur linux-2.2.18.vanilla/include/linux/ip_masq.h linux-2.2.18/include/linux/ip_masq.h
--- linux-2.2.18.vanilla/include/linux/ip_masq.h        Fri Jan 26 22:28:59 2001
+++ linux-2.2.18/include/linux/ip_masq.h        Thu Jan 25 08:54:06 2001
@@ -121,6 +121,9 @@
        u_int16_t       dport;
        unsigned        masq_flags;     /* destination flags */
        int             weight;         /* destination weight */
+       int             old_weight;     /* old destination weight */
+       int             u_thresh;       /* upper threshold, 0 = disabled */
+       int             l_thresh;       /* lower threshold */
 };
 
 
diff -ur linux-2.2.18.vanilla/include/net/ip_vs.h linux-2.2.18/include/net/ip_vs.h
--- linux-2.2.18.vanilla/include/net/ip_vs.h    Fri Jan 26 22:28:59 2001
+++ linux-2.2.18/include/net/ip_vs.h    Sun Jan 28 10:10:28 2001
@@ -110,7 +110,10 @@
         atomic_t               activeconns;    /* active connections */
         atomic_t               inactconns;     /* inactive connections */
         atomic_t               refcnt;         /* reference counter */
+       __u16                   u_thresh;       /* upper threshold */
+       __u16                   l_thresh;       /* lower threshold */
         int                    weight;         /* server weight */
+        int                    old_weight;     /* old server weight */
        struct list_head        d_list;   /* table with all dests */
 
         /* for virtual service */
@@ -215,6 +218,8 @@
 extern int ip_vs_wrr_init(void);
 extern int ip_vs_lc_init(void);
 extern int ip_vs_wlc_init(void);
+extern int ip_vs_lblc_init(void);
+extern int ip_vs_lblcr_init(void);
 
 
 /*
diff -ur linux-2.2.18.vanilla/net/ipv4/ip_vs.c linux-2.2.18/net/ipv4/ip_vs.c
--- linux-2.2.18.vanilla/net/ipv4/ip_vs.c       Fri Jan 26 22:28:59 2001
+++ linux-2.2.18/net/ipv4/ip_vs.c       Sat Jan 27 14:07:42 2001
@@ -69,6 +69,7 @@
  *     Wensong Zhang           :    changed to two service hash tables
  *     Julian Anastasov        :    corrected trash_dest lookup for both
  *                                  normal service and fwmark service
+ *     Roberto Nibali         :    added per realserver threshold (hospital version)
  *
  */
 
@@ -1274,6 +1275,8 @@
          *    Set the weight and the flags
          */
         dest->weight = mm->weight;
+       dest->u_thresh = mm->u_thresh;
+       dest->l_thresh = mm->l_thresh;
         dest->masq_flags = mm->masq_flags;
 
         dest->masq_flags |= IP_MASQ_F_VS;
@@ -1817,9 +1820,21 @@
         ms->dest = dest;
 
         /*
-         *    Increase the refcnt counter of the dest.
+         *    Increase the refcnt counter of the dest and set the weight
+         *    accordingly. I don't know why dest->refcnt is conns+1.
          */
         atomic_inc(&dest->refcnt);
+       if ( dest->u_thresh != 0) {
+               if (( (atomic_read(&dest->inactconns) + atomic_read(&dest->activeconns)+1) >= dest->u_thresh) && (dest->weight > 0)){
+                       IP_VS_DBG(7, "Bind-masq [changing weight] conns:%d "
+                               "weight=%d oldweight=%d\n",
+                               atomic_read(&dest->inactconns) + 
+                               atomic_read(&dest->activeconns), 
+                               dest->weight, dest->old_weight);
+                       dest->old_weight=dest->weight;
+                       dest->weight=0;
+               }
+       }
 
         IP_VS_DBG(9, "Bind-masq fwd:%c s:%s c:%u.%u.%u.%u:%d v:%u.%u.%u.%u:%d "
                   "d:%u.%u.%u.%u:%d flg:%X cnt:%d destcnt:%d\n",
@@ -1862,6 +1877,21 @@
                         }
                 }
                 
+               /*
+                  If the total number of connections has dropped to the lower
+                  threshold or below and the old weight isn't zero, restore it.
+               */
+               if (dest->u_thresh != 0) {
+                       if (((atomic_read(&dest->inactconns) + atomic_read(&dest->activeconns)) <= dest->l_thresh) && (dest->old_weight > 0)){
+                               IP_VS_DBG(7, "Unbind-masq conns:%d weight=%d "
+                                    "oldweight=%d\n", 
+                                    atomic_read(&dest->inactconns) + 
+                                    atomic_read(&dest->activeconns), 
+                               dest->weight, dest->old_weight);
+                               dest->weight=dest->old_weight;
+                               dest->old_weight=0;
+                       }
+               }
                 /*
                  *  Decrease the refcnt of the dest, and free the dest
                  *  if nobody refers to it (refcnt=0).
@@ -2415,7 +2445,7 @@
         size = sprintf(buf+len,
                        "IP Virtual Server version %d.%d.%d (size=%d)\n"
                        "Prot LocalAddress:Port Scheduler Flags\n"
-                       "  -> RemoteAddress:Port Forward Weight ActiveConn InActConn\n",
+                       "  -> RemoteAddress:Port Forward Weight ActiveConn InActConn u_thresh l_thresh\n",
                        NVERSION(IP_VS_VERSION_CODE), IP_VS_TAB_SIZE);
         pos += size;
         len += size;
@@ -2456,13 +2486,15 @@
                                 dest = list_entry(q,struct ip_vs_dest,n_list);
                                 size = sprintf(buf+len,
                                                "  -> %08X:%04X      %-7s "
-                                               "%-6d %-10d %-10d\n",
+                                               "%-6d %-10d %-10d %-10d %-10d\n",
                                                ntohl(dest->addr),
                                                ntohs(dest->port),
                                                ip_vs_fwd_name(dest->masq_flags),
                                                dest->weight,
                                                atomic_read(&dest->activeconns),
-                                               atomic_read(&dest->inactconns));
+                                               atomic_read(&dest->inactconns),
+                                              dest->u_thresh,
+                                              dest->l_thresh);
                                 len += size;
                                 pos += size;
                                 
@@ -2505,13 +2537,15 @@
                                 dest = list_entry(q,struct ip_vs_dest,n_list);
                                 size = sprintf(buf+len,
                                                "  -> %08X:%04X      %-7s "
-                                               "%-6d %-10d %-10d\n",
+                                               "%-6d %-10d %-10d %-10d %-10d\n",
                                                ntohl(dest->addr),
                                                ntohs(dest->port),
                                                ip_vs_fwd_name(dest->masq_flags),
                                                dest->weight,
                                                atomic_read(&dest->activeconns),
-                                               atomic_read(&dest->inactconns));
+                                               atomic_read(&dest->inactconns),
+                                              dest->u_thresh,
+                                              dest->l_thresh);
                                 len += size;
                                 pos += size;
                                 
@@ -2565,6 +2599,7 @@
                        atomic_read(&ip_vs_concurrentconns),
                        atomic_read(&ip_vs_connshandled),
                        atomic_read(&ip_vs_packetshandled));
+       /* Here we should add a per svc and per rs statistics */
         pos += size;
         len += size;
         