Re: firewall marks + tunneling + persistence = ERR! state

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: firewall marks + tunneling + persistence = ERR! state
From: Jaroslav Libák <jarol1@xxxxxxxxx>
Date: Fri, 01 Dec 2006 19:27:50 +0100 (CET)
>tcp timeouts have the values they do for a good reason. If you understand your 
>system and are prepared to deal with the consequences of changing the 
>timeouts, then this being a GPL project you can go ahead and change anything 
>you like.

In an LVS/TUN configuration the director receives only the incoming traffic, so its connection table is highly inaccurate, which is probably the reason for the unusually high number of FIN_WAIT entries. Getting rid of the FIN_WAITs quickly will free some memory on the director. A short tcpfin timeout will not cause an ACK to be routed to the wrong server if you use persistence. In other words, if you don't use persistence and you use NAT, or DR/TUN with the director as the default route for the realservers, then you need proper timeouts. On the other hand, if you use persistence and DR/TUN with the realservers having a different default route, then in my opinion monitoring the state of the TCP connection is pointless and you can live with very short timeouts.
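
For illustration, a minimal sketch of that tuning with ipvsadm (the three values are the tcp, tcpfin and udp timeouts in seconds; only the 15 s tcpfin matches my setup, the other two are just examples):

  # set the tcp, tcpfin and udp connection timeouts, in seconds;
  # 15 s tcpfin is the short FIN_WAIT timeout discussed above
  ipvsadm --set 900 15 120

  # verify the currently active timeouts
  ipvsadm -L --timeout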

>> I'm using hash table size 2^20 (which doesn't limit the maximum number of 
>> values in it, it just sets the number of rows, then each row has a linked 
>> list). Doesn't it cause some slowdown in the LVS?
>have you found a slowdown?

Unfortunately I didn't have time to test LVS with a smaller hash table, and testing it at home under VMware doesn't make sense.
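
For reference, the connection table size is a kernel build-time option; 2^20 buckets corresponds to the following in the kernel .config (the bucket count doesn't cap the number of connections, each bucket just holds a linked list):

  # IPVS connection hash table has 2^IP_VS_TAB_BITS buckets
  CONFIG_IP_VS_TAB_BITS=20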

>I understand that your concern is memory pressure on the slave in
>the case of a DoS attack. And it is true that the simplification
>in the synchronisation protocol can exacerbate that problem.
>However, by doing it this way the synchronisation traffic is actually
>reduced, including in the case of a DoS attack. So expanding it
>may actually just move the problem elsewhere.

What I would welcome in the LVS code is a setting for a minimum amount of free memory; once free memory drops below that limit, no more memory would be allocated for new connections.
Today I tested LVS on a PIII 1.4 GHz with 1 GB RAM as director D2. I had another director, D1, which was used solely for sending packets to D2 over a 1 Gbps link. D2 was connected directly by 1 Gbps to R1 (realserver) and R2. The realservers were dual 2.8 GHz Xeons. I used http://www.ssi.bg/~ja/testlvs-0.1.tar.gz to send packets from different source IP addresses with the aim of causing R2 to crash, and the script included in the tarball to measure incoming packets on the realservers. I was sending TCP SYN packets from R1 from 16 000 000 different IP addresses.

With amemthresh at its default value I was able to crash D2 within one minute (it ran out of memory, heartbeat and other services stopped working properly, only pings still worked, and a reboot was necessary). I changed amemthresh to 16384 and even with that value I could crash D2 within a couple of minutes; 16384 * 4096 bytes = 67 MB is the free-memory level at which the drop_packet defense started working. So I changed amemthresh to 65536 (65536 * 4096 bytes = 268 MB) and with that value I wasn't able to crash it: it survived the SYN DoS, and after I stopped DoSing it, it resumed working. However, I think it would be better to have a configurable hard limit on the minimum amount of free memory, e.g. 50 MB, below which LVS stops allocating memory for new connections.
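
For reference, the relevant knobs live under /proc/sys/net/ipv4/vs/ (amemthresh is measured in pages, hence the * 4096 arithmetic above); setting them looks roughly like this:

  # available memory threshold, in pages (65536 * 4096 bytes = 268 MB)
  echo 65536 > /proc/sys/net/ipv4/vs/amemthresh

  # drop_packet defense: when enabled it drops a fraction of packets
  # for new connections once free memory falls below amemthresh
  echo 1 > /proc/sys/net/ipv4/vs/drop_packet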

What would happen if D1 were used as the slave? Would it run out of memory with amemthresh set to 65536? Does this value have any meaning on the slave? Is there any protection on the slave?
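
For context, the sync daemons would be started with something like the following (the interface name is a placeholder):

  # on the master director
  ipvsadm --start-daemon master --mcast-interface eth0

  # on the backup (slave) director
  ipvsadm --start-daemon backup --mcast-interface eth0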

Some results I measured (a rough sketch of the corresponding iptables/ipvsadm configuration follows the numbers):
(LVS/TUN with 160 s persistence, wrr scheduling, firewall marks, 1 director, 2 realservers, 1 Gbps links, kernel 2.6.18.3, hash table size 2^20, director with 1 GB memory, tcpfin 15 s)

TCP 40-byte SYN packets - 60 000/s received on R1/R2 = 2.4 MB/s = 20 Mbit/s, using 16 000 000 different IP addresses. Not sustainable.

UDP 1400 bytes - 44 000/s = 62 MB/s = 500 Mbit/s, using 16 000 000 IP addresses. Not sustainable.

UDP 1400 bytes - 64 000/s = 90 MB/s = 717 Mbit/s, using 200 000 IP addresses. Sustainable.

UDP 1400 bytes - 73 000/s = 102 MB/s = 817 Mbit/s, using 10 000 IP addresses.

UDP 1400 bytes - 73 000/s = 102 MB/s = 817 Mbit/s, using 1 000 IP addresses.

We don't really need this performance; we won't use more than 100 Mbit/s, so there is a big reserve.
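
The test setup described above corresponds roughly to the following (the VIP, port, realserver addresses and fwmark value are placeholders, not the ones from my network):

  # mark the traffic for the virtual service
  iptables -t mangle -A PREROUTING -d 10.0.0.100 -p tcp --dport 80 \
           -j MARK --set-mark 1

  # virtual service on fwmark 1: wrr scheduling, 160 s persistence
  ipvsadm -A -f 1 -s wrr -p 160

  # two tunnelled (LVS/TUN) realservers with equal weight
  ipvsadm -a -f 1 -r 10.0.0.11 -i -w 1
  ipvsadm -a -f 1 -r 10.0.0.12 -i -w 1

plus the shortened timeouts shown earlier (tcpfin 15 s).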


>I don't think that this is to do with ipvsadm, as I think that the
>strings come from the kernel. Can you see if the same problem shows
>up when you cat /proc/net/ip_vs_conn?
>Once these connections get into that state, do they stay in that state
>until they time out, or do they progress to a different state?

cat shows ERR! too. They didn't cause any problems in my tests; they disappeared after the specified timeout.
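
A quick way to spot them is just to grep the connection table, e.g.:

  # show connection entries currently reported in the ERR! state
  grep 'ERR!' /proc/net/ip_vs_conn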

>Could you send some examples of this behaviour?
>I suspect that it is harmless, but I also think it is
>a bug in the reporting functionality.

It occurs on kernel 2.6.18.3 with tunneling + fwmarks + persistence enabled (it happens to me at work and at home in VMware too). It's very easy to reproduce.

Jaro
