Hello,
On Sat, 10 Nov 2001, Matthijs van der Klip wrote:
> - I have a custom hit tester (run from an Origin 200) which can generate
> between 3000 and 3500 hits/connections per second.
You are missing one reason for this problem: your client(s) create
connections from a limited number of addresses and ports. Ask yourself
from how many different client saddr/sport pairs you hit the LVS
cluster. IMO, you are reaching this limit. I'm not sure how many test
client hosts you are using. If there is only one client host, then
there is a limit of 65536 TCP ports per src IP addr. Each connection
has an expiration time that depends on its proto state. When the rate
is high enough that the old entries do not get a chance to expire, you
reach a situation where the connections are reused, i.e. the connection
count shown by ipvsadm -L does not increase.
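As a rough sketch of this effect (the 120 sec timeout in the example
and the function names are only assumptions for illustration, the real
timeout depends on the proto state):

```python
# Rough model: every distinct client saddr/sport pair can own at most
# one connection entry, so the pair count caps the entry count.
PORTS_PER_ADDR = 65536  # TCP port space per source IP (usable range is smaller)

def steady_state_entries(rate, timeout, client_addrs=1):
    """Entries alive at once: rate * timeout, capped by the number of
    distinct saddr/sport pairs the clients can produce."""
    return min(rate * timeout, client_addrs * PORTS_PER_ADDR)

def seconds_to_wrap(rate, client_addrs=1):
    """Time until the clients have cycled through all their pairs."""
    return client_addrs * PORTS_PER_ADDR / rate
```

With one client host at 3000 conn/sec the port space wraps in about 22
seconds, long before a 120 sec entry would expire, so the count stays
at 65536.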
> My problem/question:
>
> - When I test the LVS (again by throwing more than 3000 hits/second at it),
> it tops at about 16384 (*4=65536) connections (inactcon) per realserver.
> Packets are not being dropped by ip_conntrack at the realservers so it looks
> like they're being dropped at the director. My question is: why are these
> packets being dropped? I expected a maximum of 4*32768 = 131072 connections
> before packets being dropped (by ip_conntrack again).
>
> - I have done a second testrun where I removed the director as a realserver,
> so I had three realservers instead of four. This time the number of
> connections (inactcon) topped at about 21000 (again *4=65536) per
> realserver.
In this case the number of client connections is still the
same. The only difference is how many real servers they are
scheduled to (65536 / 3 is about 21845 per real server).
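The arithmetic, as a small sketch that assumes the scheduler spreads
the entries evenly over the real servers (the helper name is made up
for illustration):

```python
def per_realserver(total_entries, realservers):
    """Entries per real server when a fixed total is split evenly."""
    return total_entries // realservers

four = per_realserver(65536, 4)   # four real servers
three = per_realserver(65536, 3)  # director removed as real server
```

Both cases come from the same total of 65536 client saddr/sport
pairs, which is close to the numbers you observed.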
> What is the limiting factor in this story? I have searched the mailing
> archives and it has been explained there several times that a table size of
> 65536 does _not_ mean a maximum of 65536 connections. I expected to be able
> to saturate the webservers (due to the tcp TIMEWAIT state timeout), but I
> did not expect any limitations (other than RAM/CPU etc) in the LVS itself.
Yes, there are no such limits in LVS, at least none that low.
> The reason I switched from LVS/NAT to LVS/DR was exactly because I hit this
> limit of 65536 simultaneous connections (which I then believed was to blame
> the NAT tables).
>
> I hope I have explained the situation/problem clear enough. This setup has
> to be able to handle >3000 hits/s in the near future, so I hope you will be
> able to help me.
Use more client hosts. These days one client host cannot load a
director of the same CPU class. You are lucky that TCP timestamp
support lets you avoid the 65536 ports / 120 sec (about 546 conn/sec)
limit.
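To estimate how many client hosts you would need without that luck,
here is a sketch under the pessimistic assumption that every source
port stays unusable for 120 sec after use (function names are
illustrative only):

```python
import math

def max_conn_rate(hold_secs=120, ports=65536, client_addrs=1):
    """Highest sustained conn/sec when each port is blocked for hold_secs."""
    return client_addrs * ports / hold_secs

def client_addrs_needed(target_rate, hold_secs=120, ports=65536):
    """Smallest number of client source addresses that sustains target_rate."""
    return math.ceil(target_rate * hold_secs / ports)
```

For a 3000 conn/sec target this gives 6 client addresses, compared to
about 546 conn/sec from a single one.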
> Best regards,
>
> Matthijs van der Klip
> NOS (dutch public broadcasting organisation)
Regards
--
Julian Anastasov <ja@xxxxxx>