Hi guys,
I have just briefly skimmed over this patch and am a bit unsure how
efficient it is, but it seems to not populate the template cache. Why
invoking the scheduler module for every packet is special, I fail to
see. Having non-persistent scheduling to does the same. But I've only
looked at the patch for 2 minutes.
Isn't the difference that persistance acts on connections,
where as this patch acts on packets? Of course, UDP ones,
TCP would make no sense.
Yes, I reckon I must have been mislead by the wording used in "schedule"
which I was referring to the kernel scheduler. What this patch does is
(and we leave TCP out, since it does not make sense there):
srcIP:srcPort --> dstIP:dstPort --> call scheduler code (RR, LC, ...)
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code
srcIP:srcPort <-- dstIP:dstPort --> do nothing
Whereas the non-OPS patched kernel would do:
srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
srcIP:srcPort --> dstIP:dstPort --> read template entry for this hash
srcIP:srcPort <-- dstIP:dstPort --> do nothing
[IP_VS_S_UDP]
srcIP:srcPort --> dstIP:dstPort --> call scheduler code & add template
srcIP:srcPort <-- dstIP:dstPort --> do nothing
Is this more or less accurate? Also, the "do nothing" part is to my
avail independent of LVS_DR or LVS_NAT, since UDP has no state
transition to be handled.
I seem to recall needing a similar feature myself at some stage.
The usage that I had was related to SIP, where all packets seem to
come and go to and from the same ports and addresses, but can actually
be loadbalanced independantly.
I am quite happy to push this to DaveM for inclusion. After some reveiw
of course. I will start with a review myself, the patch is taken from
the archive.linuxvirtualserver.org URL above.
[How can one make email stuff visible after the '-- '-tag?]
> diff -ruN linux-2.6.13.orig/include/net/ip_vs.h
linux-2.6.13/include/net/ip_vs.h
> > --- linux-2.6.13.orig/include/net/ip_vs.h 2005-09-28
00:49:21.000000000 +0200
> > +++ linux-2.6.13/include/net/ip_vs.h 2005-09-28 14:39:32.000000000
+0200
> > @@ -19,6 +19,7 @@
> > */
> > #define IP_VS_SVC_F_PERSISTENT 0x0001 /* persistent port */
> > #define IP_VS_SVC_F_HASHED 0x0002 /* hashed entry */
> > +#define IP_VS_SVC_F_ONEPACKET 0x0004 /* one-packet
scheduling */
Maybe "EACHPACKET"? Since the scheduler is invoked for each packet and
not for one packet :). The same of course applies to the whole patch.
> >
> > /*
> > * Destination Server Flags
> > @@ -84,6 +85,7 @@
> > #define IP_VS_CONN_F_IN_SEQ 0x0400 /* must do input seq adjust */
> > #define IP_VS_CONN_F_SEQ_MASK 0x0600 /* in/out sequence
mask */
> > #define IP_VS_CONN_F_NO_CPORT 0x0800 /* no client port set
yet */
> > +#define IP_VS_CONN_F_ONE_PACKET 0x1000 /* forward only one
packet */
> >
> > /* Move it to better place one day, for now keep it unique */
> > #define NFC_IPVS_PROPERTY 0x10000
> > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c
linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c
> > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_conn.c 2005-09-28
00:49:23.000000000 +0200
> > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_conn.c 2005-09-28
02:54:55.000000000 +0200
> > @@ -127,6 +127,9 @@
> > unsigned hash;
> > int ret;
> >
> > + if (cp->flags & IP_VS_CONN_F_ONE_PACKET)
> > + return 0;
> > +
> > /* Hash by protocol, client address and port */
> > hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
> >
> > @@ -275,6 +278,11 @@
> > */
> > void ip_vs_conn_put(struct ip_vs_conn *cp)
> > {
> > + if (cp->flags & IP_VS_CONN_F_ONE_PACKET) {
> > + ip_vs_conn_expire_now(cp);
> > + return;
> > + }
> > +
Is it really needed to have a timer in case OPS is active? Or would it
be even more effort to comment the timer code out for OPS-flagged
packets, than just leaving it in.
> > /* reset it expire in its timeout */
> > mod_timer(&cp->timer, jiffies+cp->timeout);
> >
> > @@ -506,7 +514,7 @@
> > /*
> > * unhash it if it is hashed in the conn table
> > */
> > - if (!ip_vs_conn_unhash(cp))
> > + if (!ip_vs_conn_unhash(cp) && !(cp->flags & IP_VS_CONN_F_ONE_PACKET))
> > goto expire_later;
> >
> > /*
> > diff -ruN linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c
linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c
> > --- linux-2.6.13.orig/net/ipv4/ipvs/ip_vs_core.c 2005-09-28
00:49:23.000000000 +0200
> > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_core.c 2005-09-28
02:54:55.000000000 +0200
> > @@ -215,6 +215,7 @@
> > struct ip_vs_dest *dest;
> > struct ip_vs_conn *ct;
> > __u16 dport; /* destination port to forward */
> > + __u16 flags;
Since I'm working on Windows only right now for a customer, I have no
easy access to kernel repositories. However, from looking at that
snipped, I have two comments:
1.) Potential user space breakage?
2.) Namespace: We don't need another variable just being named "flags".
In the past I've cleaned up some of this for my own tree because it
was simple too cumbersome to find out to which struct a sprinkled
flags variable in the code would belong.
Make it ops_flags or cp_flags or whichever context it belongs to.
> > __u32 snet; /* source network of the client, after masking */
> >
> > /* Mask saddr with the netmask to adjust template granularity */
> > @@ -345,6 +346,9 @@
> > dport = ports[1];
> > }
> >
> > + flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> > + && iph->protocol == IPPROTO_UDP)?
> > + IP_VS_CONN_F_ONE_PACKET : 0;
> Could this be changed to an if then construct?
Agreed.
> > + flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
> > + && iph->protocol == IPPROTO_UDP)?
> > + IP_VS_CONN_F_ONE_PACKET : 0;
> Again, if then, please.
Again, agreed :).
> > /*
> > * Create a connection entry.
> > */
> > @@ -419,7 +427,7 @@
> > iph->saddr, pptr[0],
> > iph->daddr, pptr[1],
> > dest->addr, dest->port?dest->port:pptr[1],
> > - 0,
> > + flags,
> > dest);
> > if (cp == NULL)
> > return NULL;
> > @@ -462,6 +470,9 @@
> > && (inet_addr_type(iph->daddr) == RTN_UNICAST)) {
> > int ret, cs;
> > struct ip_vs_conn *cp;
> > + __u16 flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
> > + iph->protocol == IPPROTO_UDP)?
> > + IP_VS_CONN_F_ONE_PACKET : 0;
Since this construct is in use at least 3 times already, how about a
macro (ip_vs.h)?
> > +++ linux-2.6.13/net/ipv4/ipvs/ip_vs_ctl.c 2005-09-28
14:49:20.000000000 +0200
> > @@ -1753,14 +1753,18 @@
> > const struct ip_vs_dest *dest;
> >
> > if (iter->table == ip_vs_svc_table)
> > - seq_printf(seq, "%s %08X:%04X %s ",
> > + seq_printf(seq, "%s %08X:%04X %s%s ",
> > ip_vs_proto_name(svc->protocol),
> > ntohl(svc->addr),
> > ntohs(svc->port),
> > - svc->scheduler->name);
> > + svc->scheduler->name,
> > + (svc->flags & IP_VS_SVC_F_ONEPACKET)?
> > + " ops":"");
> > else
> > - seq_printf(seq, "FWM %08X %s ",
> > - svc->fwmark, svc->scheduler->name);
> > + seq_printf(seq, "FWM %08X %s%s ",
> > + svc->fwmark, svc->scheduler->name,
> > + (svc->flags & IP_VS_SVC_F_ONEPACKET)?
> > + " ops":"");
> I'm not entirely comforatle with this proc change. Won't it break
> user-space compatiblility?
I'm not sure if it will break or simply ignore it. I believe that if you
cat /proc/... OPS will be shown, however ipvsadm needs a patch to show
it. This should be around anyway to honour setting the ONEPACKET flag to
a service.
The other problem I see is that it heavily interferes with my work done
on the session overflow code. I've also already enhanced that line
heavily with more information. It's a pity that the ABI was never
frozen. Ipvsadm actually should query all this information from the
kernel, instead of parsing proc-fs output. And this preferably via
netlink :).
Best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
|