Hello,
On Mon, 26 Dec 2011, Aleksey Chudov wrote:
> Hello,
>
> Thanks for the answer.
>
> >> Is it possible to implement change schedule_timeout_interruptible via
> >> sysctl?
> > May be better to implement logic with auto-adjustment.
>
> Auto-adjustment looks much better.
>
> > Using in_pkts for templates is not a good idea.
> > As drops are possible, it can be done more often but not every time as for
> > sync version 0.
> > Also, before ip_vs_conn_expire() we do not know if template life will be
> > extended.
> > May be backup server should use longer timeout for templates, so that it
> > can not miss
> > the sync packets during the extended period.
>
> > So, now the question is how to properly reduce the rate of sync packets for
> > templates
> > and may be for other conns when state is not changed but its life is
> > extended. I have
> > to think for some time about such changes.
>
> > Can you try such change: in ip_vs_sync_conn() comment the following two
> > lines under
> > "Reduce sync rate for templates":
> > if (pkts % sysctl_sync_period(ipvs) != 1)
> > return;
>
> > By this way we will sync templates every time a normal connection is
> > synced, as for
> > version 0. It is still too often for templates but now you can try again
> > with "3 100",
> > so that we can see if the difference is reduced.
>
> > BTW, what is the persistence timeout value?
>
> Persistence timeout is 1800 (30 min). It is application specific.
>
> Tried the following:
>
> Linux Kernel 2.6.39.4 + LVS Fwmark
>
> iptables -t mangle -A PREROUTING -d VIP -i bond0 -p tcp -m multiport --dports
> 80,443 -j MARK --set-mark 1
>
> ipvsadm -A -f 1 -s wlc -p 1800
> -a -f 1 -r 1.1.1.1:0 -i -w 100
> -a -f 1 -r 1.1.1.2:0 -i -w 100
> ...
> -a -f 1 -r 1.1.X.X:0 -i -w 100
> (320 servers total)
OK, so this is the IPIP method. It seems we have a problem
with this rport 0. See the appended patch: it should allow
sync-ed conns in the backup to find their real server. As a
result, the inact/act counters should work again, and CPU usage
should be lower, because until now we failed to bind to the real
server on every sync message for the connection.
> # ipvsadm -l --daemon
> master sync daemon (mcast=eth3, syncid=1)
> backup sync daemon (mcast=eth3,syncid=1)
>
> 1. ip_vs_sync_conn original, schedule_timeout_interruptible(HZ/10) and
> sync_threshold = "3 10"
> Results: sync traffic 60 Mbit/s, 6000 packets/sec, 60 %sys CPU on Backup node,
> 8% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
>
> 2. ip_vs_sync_conn patched, schedule_timeout_interruptible(HZ/10) and
> sync_threshold = "3 10"
> Results: sync traffic 100 Mbit/s, 8500 packets/sec, 93 %sys CPU on Backup
> node,
> <1% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
>
> 3. ip_vs_sync_conn patched, schedule_timeout_interruptible(HZ/10) and
> sync_threshold = "3 100"
> Results: sync traffic 70 Mbit/s, 6000 packets/sec, 70 %sys CPU on Backup node,
> ~2% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
>
> 4. ip_vs_sync_conn patched, schedule_timeout_interruptible(HZ/10) and
> sync_threshold = "3 200"
> Results: sync traffic 66 Mbit/s, 5800 packets/sec, 66 %sys CPU on Backup node,
> ~3% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
>
> 5. ip_vs_sync_conn patched, schedule_timeout_interruptible(HZ/10) and
> sync_threshold = "3 1000"
> Results: sync traffic 64 Mbit/s, 5600 packets/sec, 64 %sys CPU on Backup node,
> ~3% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
>
> In all tests I can't check the difference in Active and InAct connections,
> because ipvsadm does not show Active and InAct connections on the Backup
> node for the Fwmark virtual, only Persist connections.
>
> There are no significant differences in sync traffic after "3 100".
Yes, it depends on how many packets we see for conns.
> >> As mentioned in another report
> >> http://www.gossamer-threads.com/lists/lvs/users/24331
> >> after switching from TCP VIP to Fwmark %sys CPU is raised from 40 - 50
> >> % (TCP VIP) to 80 - 100 % (Fwmark) with no difference in sync traffic.
>
> Could you explain why %sys CPU is raised with Fwmark?
> Could you explain why ipvsadm does not show Active and InAct connections on
> Backup node for Fwmark virtual?
Yes, we try to bind to a dest for every sync message,
without success, because the conns come with dport=80/443 while
the real server port is 0. Only the template conns find the
server, because they have rport 0. I hope the appended patch
fixes it. How much better is the CPU usage then?
> Regards,
> Aleksey
Subject: [PATCH] ipvs: try also real server with port 0 in backup server
We should not forget to try the real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use a different
forwarding method.
Signed-off-by: Julian Anastasov <ja@xxxxxx>
---
include/net/ip_vs.h | 2 +-
net/netfilter/ipvs/ip_vs_conn.c | 2 +-
net/netfilter/ipvs/ip_vs_ctl.c | 10 ++++++++--
net/netfilter/ipvs/ip_vs_sync.c | 2 +-
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 48fd12e..ebe517f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1207,7 +1207,7 @@ extern void ip_vs_control_cleanup(void);
 extern struct ip_vs_dest *
 ip_vs_find_dest(struct net *net, int af, const union nf_inet_addr *daddr,
                __be16 dport, const union nf_inet_addr *vaddr, __be16 vport,
-               __u16 protocol, __u32 fwmark);
+               __u16 protocol, __u32 fwmark, __u32 flags);
 extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 12571fb..29fa5ba 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -616,7 +616,7 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
        if ((cp) && (!cp->dest)) {
                dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
                                       cp->dport, &cp->vaddr, cp->vport,
-                                      cp->protocol, cp->fwmark);
+                                      cp->protocol, cp->fwmark, cp->flags);
                ip_vs_bind_dest(cp, dest);
                return dest;
        } else
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 008bf97..e1a66cf 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -619,15 +619,21 @@ struct ip_vs_dest *ip_vs_find_dest(struct net *net, int af,
                                   const union nf_inet_addr *daddr,
                                   __be16 dport,
                                   const union nf_inet_addr *vaddr,
-                                  __be16 vport, __u16 protocol, __u32 fwmark)
+                                  __be16 vport, __u16 protocol, __u32 fwmark,
+                                  __u32 flags)
 {
        struct ip_vs_dest *dest;
        struct ip_vs_service *svc;
+       __be16 port = dport;
        svc = ip_vs_service_get(net, af, fwmark, protocol, vaddr, vport);
        if (!svc)
                return NULL;
-       dest = ip_vs_lookup_dest(svc, daddr, dport);
+       if (fwmark && (flags & IP_VS_CONN_F_FWD_MASK) != IP_VS_CONN_F_MASQ)
+               port = 0;
+       dest = ip_vs_lookup_dest(svc, daddr, port);
+       if (!dest)
+               dest = ip_vs_lookup_dest(svc, daddr, port ^ dport);
        if (dest)
                atomic_inc(&dest->refcnt);
        ip_vs_service_put(svc);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index bcf5563..8a0d6d6 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -740,7 +740,7 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
         * but still handled.
         */
        dest = ip_vs_find_dest(net, type, daddr, dport, param->vaddr,
-                              param->vport, protocol, fwmark);
+                              param->vport, protocol, fwmark, flags);
        /* Set the approprite ativity flag */
        if (protocol == IPPROTO_TCP) {
--
1.7.3.4
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users