>tcp timeouts have the values they do for a good reason. If you understand your
>system and are prepared to deal with the consequences of changing the
>timeouts, then this being a GPL project you can go ahead and change anything
>you like.
In an LVS/tunneling configuration the director sees only the incoming traffic,
so its connection table is highly inaccurate, which is probably the reason for
the unusually high number of FIN_WAIT entries. Getting rid of the FIN_WAITs
quickly will free some memory on the director. A short tcpfin timeout will not
cause an ACK to be routed to the wrong realserver as long as you use
persistence. Put the other way around: if you don't use persistence, and you
use NAT or DR/TUN with the director as the default route for the realservers,
then you need proper timeouts. On the other hand, if you use persistence and
DR/TUN with the realservers having a different default route, then in my
opinion monitoring the state of the TCP connections is pointless and you can
live with very short timeouts.
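For what it's worth, the timeouts can be changed at runtime with ipvsadm; the
values below are only an illustration of a short tcpfin, not a recommendation:

  # set the TCP, TCP-FIN and UDP timeouts, in seconds:
  #   ipvsadm --set <tcp> <tcpfin> <udp>
  ipvsadm --set 900 15 300
  # verify:
  ipvsadm -L --timeout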
>> I'm using hash table size 2^20 (which doesn't limit the maximum number of
>> values in it, it just sets the number of rows, then each row has a linked
>> list). Doesn't it cause some slowdown in the LVS?
>have you found a slowdown?
Unfortunately I didn't have time to test LVS with a smaller hash table, and
testing it at home under VMware doesn't make much sense.
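For reference, the number of rows is a compile-time option, so trying a
smaller table means rebuilding the kernel:

  # in the kernel .config (rows = 2^CONFIG_IP_VS_TAB_BITS, default 12):
  CONFIG_IP_VS_TAB_BITS=20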
>I understand that your concern is memory pressure on the slave in
>the case of a DoS attack. And it is true that the simplification
>in the synchronisation protocol can exacerbate that problem.
>However, by doing it this way the synchronisation traffic is actually
>reduced, including in the case of a DoS attack. So expanding it
>may actually just move the problem elsewhere.
What I would welcome in the LVS code is a setting for a minimum amount of free
memory: once free memory dropped below it, no more memory would be allocated
for new connection entries.
Today I tested LVS on a PIII 1.4GHz with 1GB RAM as director D2. I had another
director, D1, which was used solely for sending packets to D2 via a 1Gbps
link. D2 was connected directly by 1Gbps links to R1 and R2 (the realservers),
which were dual 2.8GHz Xeons. I used http://www.ssi.bg/~ja/testlvs-0.1.tar.gz
to send packets from different source IP addresses with the aim of causing R2
to crash, and the script included in the tar to measure incoming packets on
the realservers. I was sending TCP SYN packets from D1 with 16 000 000
different source IP addresses.

With amemthresh at its default value I was able to crash D2 within 1 minute:
it ran out of memory, heartbeat and other things stopped working properly
(though pings did), and a reboot was necessary. I changed amemthresh to 16384,
which means the drop_packet defense starts working at 16384 * 4096 bytes =
67MB of free memory, and even with that value I could crash D2 within a couple
of minutes. So I changed amemthresh to 65536 (65536 * 4096 bytes = 268MB), and
with that value I wasn't able to crash it: it survived the SYN DoS, and after
I stopped DoSing it, it resumed working. However, I think it would be better
to have a configurable hard limit on the minimum amount of free memory, e.g.
50MB, below which LVS stops allocating memory for new connections.
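For completeness, the knobs I was changing live under /proc/sys/net/ipv4/vs/;
amemthresh is measured in pages, and drop_packet has to be enabled for the
defense to fire (the values below are just the ones from my test):

  # threshold in pages (65536 * 4096 bytes = 268MB):
  echo 65536 > /proc/sys/net/ipv4/vs/amemthresh
  # 1 = drop incoming packets at a rate derived from memory pressure
  # once free memory falls below amemthresh:
  echo 1 > /proc/sys/net/ipv4/vs/drop_packet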
What would happen if D1 were used as a slave? Would it run out of memory with
amemthresh set to 65536? Does this value have any meaning on the slave? Is
there any protection on the slave?
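For reference, this is how such a master/slave pair would be set up with the
connection synchronisation daemons; the interface name is just an example, not
my real configuration:

  # on the master director (D2):
  ipvsadm --start-daemon master --mcast-interface eth0
  # on the slave/backup director (D1):
  ipvsadm --start-daemon backup --mcast-interface eth0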
Some results I measured:
(LVS-TUN with persistence 160s, wrr scheduling, firewall marks, 1 director,
2 realservers, 1Gbps links, kernel 2.6.18.3, hash table size 2^20, director
with 1GB memory, tcpfin 15s)
tcp 40 byte SYN packets - 60 000/s received on R1/R2 = 2.4MB/s = 20Mbit/s,
using 16 000 000 different IP addresses. Not sustainable.
udp 1400 bytes - 44 000/s = 62MB/s = 500Mbit/s, using 16 000 000 IP
addresses. Not sustainable.
udp 1400 bytes - 64 000/s = 90MB/s = 717Mbit/s, using 200 000 IP addresses.
Sustainable.
udp 1400 bytes - 73 000/s = 102MB/s = 817Mbit/s, using 10 000 IP addresses.
udp 1400 bytes - 73 000/s = 102MB/s = 817Mbit/s, using 1 000 IP addresses.
We don't really need this performance; we won't use more than 100Mbit/s, so
there is a big reserve.
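For anyone who wants to reproduce the setup, the service definition looked
roughly like this; the VIP, port and realserver addresses are placeholders,
not my real ones:

  # mark traffic to the VIP so the service can match on the fwmark:
  iptables -t mangle -A PREROUTING -d 192.0.2.10 -p tcp --dport 80 \
      -j MARK --set-mark 1
  # fwmark-based virtual service, wrr scheduling, 160s persistence:
  ipvsadm -A -f 1 -s wrr -p 160
  # realservers reached via IP-IP tunneling (-i):
  ipvsadm -a -f 1 -r 192.0.2.21 -i -w 1
  ipvsadm -a -f 1 -r 192.0.2.22 -i -w 1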
>I don't think that it is to do with ipvsadm, as I think that the
>strings come from the kernel. Can you see if the same problem shows
>up when you cat /proc/net/ip_vs_conn ?
>Once these connections get into that state, do they stay in that state
>until they time out, or do they progress to a different state?
cat shows ERR! too. They didn't cause any problems in my tests; they
disappeared after the specified timeout.
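If it helps, this is roughly how I look at them; I'm assuming the state is the
eighth column of /proc/net/ip_vs_conn:

  # count connections per state, skipping the header line:
  awk 'NR > 1 { print $8 }' /proc/net/ip_vs_conn | sort | uniq -c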
>Could you send some examples of this behaviour?
>I suspect that it is harmless, but I also think it is
>a bug in the reporting functionality.
It occurs on kernel 2.6.18.3 with tunneling + fwmarks + persistence enabled
(it happens to me both at work and at home under VMware). It's very easy to
reproduce.
Jaro