LVS
lvs-users
Google
 
Web LinuxVirtualServer.org

Re: invoke scheduler for every received packet

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: invoke scheduler for every received packet
Cc: Horms <horms@xxxxxxxxxxxx>
Cc: Wensong Zhang <wensong@xxxxxxxxxxxx>
Cc: Julian Anastasov <ja@xxxxxx>
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Thu, 13 Jul 2006 09:28:41 +0200
Hi guys,

I have just briefly skimmed over this patch and am a bit unsure how
efficient it is, but it seems to not populate the template cache.  Why
invoking the scheduler module for every packet is special, I fail to
see. Having non-persistent scheduling to does the same. But I've only
looked at the patch for 2 minutes.

Isn't the difference that persistance acts on connections,
where as this patch acts on packets? Of course, UDP ones,
TCP would make no sense.

Yes, I reckon I must have been mislead by the wording used in "schedule" which I was referring to the kernel scheduler. What this patch does is (and we leave TCP out, since it does not make sense there):

srcIP:srcPort --> dstIP:dstPort --> call scheduler code (RR, LC, ...)
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing

Whereas the non-OPS patched kernel would do:

srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing

Is this more or less accurate? Also, the "do nothing" part is to my avail independent of LVS_DR or LVS_NAT, since UDP has no state transition to be handled.

I seem to recall needing a similar feature myself at some stage.
The usage that I had was related to SIP, where all packets seem to
come and go to and from the same ports and addresses, but can actually
be loadbalanced independantly.

I am quite happy to push this to DaveM for inclusion. After some reveiw
of course. I will start with a review myself, the patch is taken from
the archive.linuxvirtualserver.org URL above.

[How can one make email stuff visible after the '-- '-tag?]

> diff -ruN linux-2.6.13.orig/include/net/ip_vs.h linux-2.6.13/include/net/ip_vs.h > > --- linux-2.6.13.orig/include/net/ip_vs.h 2005-09-28 00:49:21.000000000 +0200 > > +++ linux-2.6.13/include/net/ip_vs.h 2005-09-28 14:39:32.000000000 +0200
> > @@ -19,6 +19,7 @@
> >   */
> >  #define IP_VS_SVC_F_PERSISTENT       0x0001          /* persistent port */
> >  #define IP_VS_SVC_F_HASHED   0x0002          /* hashed entry */
> > +#define IP_VS_SVC_F_ONEPACKET        0x0004          /* one-packet 
scheduling */

Maybe "EACHPACKET"? Since the scheduler is invoked for each packet and not for one packet :). The same of course applies to the whole patch.

> >
> >  /*
> >   *      Destination Server Flags
> > @@ -84,6 +85,7 @@
> >  #define IP_VS_CONN_F_IN_SEQ  0x0400          /* must do input seq adjust */
> >  #define IP_VS_CONN_F_SEQ_MASK        0x0600          /* in/out sequence 
mask */
> >  #define IP_VS_CONN_F_NO_CPORT        0x0800          /* no client port set 
yet */
> > +#define IP_VS_CONN_F_ONE_PACKET      0x1000          /* forward only one 
packet */
> >
> >  /* Move it to better place one day, for now keep it unique */
> >  #define NFC_IPVS_PROPERTY    0x10000
> > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c > > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c 2005-09-28 00:49:23.000000000 +0200 > > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c 2005-09-28 02:54:55.000000000 +0200
> > @@ -127,6 +127,9 @@
> >       unsigned hash;
> >       int ret;
> >
> > +     if (cp->flags & IP_VS_CONN_F_ONE_PACKET)
> > +             return 0;
> > +
> >       /* Hash by protocol, client address and port */
> >       hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
> >
> > @@ -275,6 +278,11 @@
> >   */
> >  void ip_vs_conn_put(struct ip_vs_conn *cp)
> >  {
> > +     if (cp->flags & IP_VS_CONN_F_ONE_PACKET) {
> > +             ip_vs_conn_expire_now(cp);
> > +             return;
> > +     }
> > +

Is it really needed to have a timer in case OPS is active? Or would it be even more effort to comment the timer code out for OPS-flagged packets, than just leaving it in.

> >       /* reset it expire in its timeout */
> >       mod_timer(&cp->timer, jiffies+cp->timeout);
> >
> > @@ -506,7 +514,7 @@
> >       /*
> >        *      unhash it if it is hashed in the conn table
> >        */
> > -     if (!ip_vs_conn_unhash(cp))
> > +     if (!ip_vs_conn_unhash(cp) && !(cp->flags & IP_VS_CONN_F_ONE_PACKET))
> >               goto expire_later;
> >
> >       /*
> > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c > > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c 2005-09-28 00:49:23.000000000 +0200 > > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c 2005-09-28 02:54:55.000000000 +0200
> > @@ -215,6 +215,7 @@
> >       struct ip_vs_dest *dest;
> >       struct ip_vs_conn *ct;
> >       __u16  dport;    /* destination port to forward */
> > +     __u16 flags;

Since I'm working on Windows only right now for a customer, I have no easy access to kernel repositories. However, from looking at that snipped, I have two comments:

1.) Potential user space breakage?
2.) Namespace: We don't need another variable just being named "flags".
    In the past I've cleaned up some of this for my own tree because it
    was simple too cumbersome to find out to which struct a sprinkled
    flags variable in the code would belong.
    Make it ops_flags or cp_flags or whichever context it belongs to.

> >       __u32  snet;     /* source network of the client, after masking */
> >
> >       /* Mask saddr with the netmask to adjust template granularity */
> > @@ -345,6 +346,9 @@
> >               dport = ports[1];
> >       }
> >
> > +     flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> > +              && iph->protocol == IPPROTO_UDP)?
> > +             IP_VS_CONN_F_ONE_PACKET : 0;

> Could this be changed to an if then construct?

Agreed.

> > +     flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> > +              && iph->protocol == IPPROTO_UDP)?
> > +             IP_VS_CONN_F_ONE_PACKET : 0;

> Again, if then, please.

Again, agreed :).

> >       /*
> >        *    Create a connection entry.
> >        */
> > @@ -419,7 +427,7 @@
> >                           iph->saddr, pptr[0],
> >                           iph->daddr, pptr[1],
> >                           dest->addr, dest->port?dest->port:pptr[1],
> > -                         0,
> > +                         flags,
> >                           dest);
> >       if (cp == NULL)
> >               return NULL;
> > @@ -462,6 +470,9 @@
> >           && (inet_addr_type(iph->daddr) == RTN_UNICAST)) {
> >               int ret, cs;
> >               struct ip_vs_conn *cp;
> > +             __u16 flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
> > +                             iph->protocol == IPPROTO_UDP)?
> > +                             IP_VS_CONN_F_ONE_PACKET : 0;

Since this construct is in use at least 3 times already, how about a macro (ip_vs.h)?

> > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_ctl.c 2005-09-28 14:49:20.000000000 +0200
> > @@ -1753,14 +1753,18 @@
> >               const struct ip_vs_dest *dest;
> >
> >               if (iter->table == ip_vs_svc_table)
> > -                     seq_printf(seq, "%s  %08X:%04X %s ",
> > +                     seq_printf(seq, "%s  %08X:%04X %s%s ",
> >                                  ip_vs_proto_name(svc->protocol),
> >                                  ntohl(svc->addr),
> >                                  ntohs(svc->port),
> > -                                svc->scheduler->name);
> > +                                svc->scheduler->name,
> > +                                (svc->flags & IP_VS_SVC_F_ONEPACKET)?
> > +                                " ops":"");
> >               else
> > -                     seq_printf(seq, "FWM  %08X %s ",
> > -                                svc->fwmark, svc->scheduler->name);
> > +                     seq_printf(seq, "FWM  %08X %s%s ",
> > +                                svc->fwmark, svc->scheduler->name,
> > +                                (svc->flags & IP_VS_SVC_F_ONEPACKET)?
> > +                                " ops":"");

> I'm not entirely comforatle with this proc change. Won't it break
> user-space compatiblility?

I'm not sure if it will break or simply ignore it. I believe that if you cat /proc/... OPS will be shown, however ipvsadm needs a patch to show it. This should be around anyway to honour setting the ONEPACKET flag to a service.

The other problem I see is that it heavily interferes with my work done on the session overflow code. I've also already enhanced that line heavily with more information. It's a pity that the ABI was never frozen. Ipvsadm actually should query all this information from the kernel, instead of parsing proc-fs output. And this preferably via netlink :).

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

<Prev in Thread] Current Thread [Next in Thread>