Re: Okay, it happened again. (incredible expire times)

To: Lars Marowsky-Bree <lmb@xxxxxxxxx>
Subject: Re: Okay, it happened again. (incredible expire times)
Cc: linux-virtualserver@xxxxxxxxxxxx
From: Wensong Zhang <wensong@xxxxxxxxxxxx>
Date: Sun, 29 Aug 1999 20:31:36 +0800

Hi Lars,

The masq entries with huge expire times are mostly due to the slow timer.
The slow timer was added to avoid the overhead of cascading masq
entries when there are lots of masq entries in the system.

Let me explain the reason in detail. The kernel keeps a tick counter,
the system jiffies, and the masq timeout that is displayed is computed as
        timeout value = (unsigned) masq->expires - jiffies

run_timer_list(), which collects stale timers, sometimes runs behind
the system jiffies. Then some masq->expires values are less than
jiffies, so the difference is negative, and when it is converted to an
unsigned value it becomes the huge timeout value that you saw. Anyway,
when run_timer_list() is activated, it collects all timers whose
expires are less than the system jiffies, so the masq entries with huge
timeout values will be collected soon. The masq entries with huge
timeouts that you see are mostly different each time.
Please write down these masq entries with huge timeout values, then
run "ipchains -M -L -n" again and check the new masq entries with huge
timeout values. You should see that they are totally different, because
the original entries with huge timeout values were already deleted.
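
The wraparound described above can be reproduced in a few lines of
user-space C. This is only a sketch, not kernel code: the helper names
are made up for illustration, and it assumes a 32-bit tick counter (as
on i386), modelled here with uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper: remaining ticks computed the naive, unsigned way.
 * When expires < jiffies the subtraction wraps modulo 2^32. */
uint32_t remaining_unsigned(uint32_t expires, uint32_t jiffies)
{
    return expires - jiffies;
}

/* Hypothetical helper: the same difference read as a signed value, so an
 * overdue timer shows up as a small negative number. The conversion is
 * implementation-defined in C; it assumes two's complement. */
int32_t remaining_signed(uint32_t expires, uint32_t jiffies)
{
    return (int32_t)(expires - jiffies);
}
```

With expires = 1000 and jiffies = 1005, the unsigned version yields
4294967291 ticks while the signed version yields -5. Assuming HZ=100,
2^32 ticks is roughly 715827 minutes, which is the same order of
magnitude as the expire time in the listing quoted below.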

The other reason for huge expire times is an incorrect reference
count on the masq entry: I forgot to call ip_masq_put when the
checksum failed. Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
submitted the following fix to me:

@@ -2335,6 +2347,9 @@
                                if (csum_tcpudp_magic(iph->saddr, iph->daddr, 
                                                size, iph->protocol, skb->csum))
                                {
+#ifdef CONFIG_IP_MASQUERADE_VS
+                                       if (ms) ip_masq_put(ms);
+#endif
                                        IP_MASQ_DEBUG(0, "Incoming failed %s checksum from %d.%d.%d.%d (size=%d)!\n",
                                               masq_proto_name(iph->protocol),
                                               NIPQUAD(iph->saddr),
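
To show why the missing put matters, here is a minimal user-space
sketch of a reference leak on an error path. It is not the kernel
code: struct entry, entry_get, entry_put, handle_packet and the
release_on_error flag are all hypothetical names used only for
illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a masq entry with a reference count. */
struct entry { int refcnt; };

static void entry_get(struct entry *e) { e->refcnt++; }
static void entry_put(struct entry *e) { e->refcnt--; }

/* Takes a reference for the duration of packet handling.
 * Returns 0 on success, -1 on checksum failure.
 * release_on_error = 0 models the bug, 1 models the fix. */
int handle_packet(struct entry *e, int checksum_ok, int release_on_error)
{
    if (e)
        entry_get(e);

    if (!checksum_ok) {
        /* Early exit: without this put the reference is never dropped. */
        if (release_on_error && e)
            entry_put(e);
        return -1;
    }

    /* ... normal processing would go here ... */

    if (e)
        entry_put(e);
    return 0;
}
```

In this sketch every bad checksum handled with release_on_error = 0
leaves the count one higher than before, so an entry that is freed
only when its count drops to zero would stay around.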

Julian also gave other fixes and suggestions. Thanks, Julian.
We are working on the next release of the IPVS patch now.

Thanks,

Wensong


Lars Marowsky-Bree wrote:
> 
> # /sbin/ipchains -L -Mn | wc -l
>    6079
> 
> The number is increasing steadily.
> 
> # /sbin/ipvsadm -L
> IP Virtual Server (Version 0.7)
> Protocol Local Address:Port Scheduler
>       -> Remote Address:Port   Forward Weight ActiveConn FinConn
> TCP XXX.XXX.XXX.XX:80 pcc
>       -> 192.168.168.42:80     Masq    2000   86         192
>       -> 192.168.168.43:80     Masq    2000   95         229
>       -> 192.168.168.44:80     Masq    2000   93         343
>       -> 192.168.168.41:80     Masq    2000   175        264
> 
> The client servers don't open that many outgoing connections to account for
> the difference.
> 
> The entries look just like this:
> 
> TCP  715092:31.23 192.168.168.44       141.15.3.1           80 (80) -> 48142
> 
> While this is a clumsy solution, I vote for purging any entries with an expire
> time >1000 minutes or so to get rid of those.
> 
> (This is using 2.1.12pre7) Hopefully, I assume that this is not going to
> impact performance. Since the machines are HA, I _could_ reboot them every time
> it gets worse, but I would rather not.
> 
> Sincerely,
>     Lars Marowsky-Brée
> 
> --
> Lars Marowsky-Brée
> Network Management
> 
> teuto.net Netzdienste GmbH - DPN Verbund-Partner


