Hi
We are planning to use LVS for a setup with a large number (millions) of
concurrent (mostly idle) connections and were setting up the sync daemon
to avoid a reconnect flood when the master fails.
Originally I was planning to ask for help, but it turned out to be one
of those cases where you go over the problem description and refine the
details until the problem description ceases to exist. So, instead I'll
post the results and what we needed to tune to get it working.
Short summary: the sync daemon works very well even at a high connection
rate if you increase the rmem_default and wmem_default sysctls.
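For completeness, the sync daemons themselves are started the usual way,
something like this (the interface name and syncid below are just
placeholders, not our actual setup):

  # on the master director
  ipvsadm --start-daemon master --mcast-interface eth0 --syncid 1
  # on the backup director
  ipvsadm --start-daemon backup --mcast-interface eth0 --syncid 1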
Initially, there was a problem with the sync_master daemon sending updates.
As it only sent updates once per second, the send buffer of the socket got
full and we got ip_vs_sync_send_async errors in the kernel log. We decreased
the sleep time to 100ms, which gave slightly better results, but
net.core.wmem_max and net.core.wmem_default also needed increasing
(which probably means that we could have left the kernel unchanged).
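The buffer change itself was just the standard sysctl tweak; the numbers
below are illustrative rather than the exact values we ended up with:

  # master side: enlarge default/max socket send buffers (example values)
  sysctl -w net.core.wmem_default=4194304
  sysctl -w net.core.wmem_max=8388608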
After that we had problems on the sync_backup daemon side, where the receive
buffer now got full from time to time and resulted in lost sync packets
(visible as UDP receive errors). So we also increased the rmem
sysctls quite a bit, which solved that problem as well.
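Same idea on the backup, just for the receive side (again, example values
only; we went quite a bit above the stock defaults):

  # backup side: enlarge default/max socket receive buffers (example values)
  sysctl -w net.core.rmem_default=8388608
  sysctl -w net.core.rmem_max=16777216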
Another consideration for mostly idle connections seems to be choosing
appropriate sync_threshold and TCP timeout (ipvsadm -L --timeout)
values. Our current plan is to increase the TCP timeout to 30 minutes
(1800 seconds) and reduce sync_threshold to (3 10) so that the connections
stay current on the backup even with relatively infrequent keepalives
being sent.
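In ipvsadm/proc terms the plan would look roughly like this (the tcpfin and
udp timeouts below are just the usual defaults, only the tcp timeout is
changed; treat them as example values):

  # show current timeouts (tcp tcpfin udp)
  ipvsadm -L --timeout
  # raise the tcp established timeout to 1800s
  ipvsadm --set 1800 120 300
  # sync a connection after 3 packets, then refresh roughly every 10 packets
  echo "3 10" > /proc/sys/net/ipv4/vs/sync_threshold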
Hardware for testing was a few dual quad-core Opteron machines with 16GB of
memory, dual e1000 and onboard dual bnx network cards, running LVS-NAT with
sync_threshold = "0 1" (sync on every packet, for testing). Set up and run
by a very diligent coworker :)
Some results:
8.5 million connections, all synced
~100K packets/s of keepalives on the external interface
900 packets/s of sync daemon traffic
just over 100Mbps of traffic (short packets)
On the primary LVS, ~1% of one core for the sync_master daemon, one core at
10-40% in softirq (ipvs?), ~1.7GB of memory used in total
On the secondary LVS, ~10% of one core for the sync_backup daemon, one core
at 20% in softirq (ipvs?), ~1.7GB of memory used in total
Failover with keepalived worked as expected once all connections were
established.
The likely limiting factor seems to be the one core at 40% in softirq. This
was also the core which serviced the bnx network card, so it's possible
that switching entirely to e1000 would alleviate the problem (the core
responsible for e1000 was at ~10% in softirq). Also, time spent in softirq
was not really consistent and sometimes dropped quite low (maybe an
altogether different problem).
Interrupt load was low (8K/s in total) with both the e1000 and bnx cards in
use, although we still superstitiously suspect Broadcom is not quite as
scalable as Intel.
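We haven't actually tried rebalancing interrupts yet, but if the single busy
core turns out to be the limit, the obvious next step would be checking which
CPU services which NIC and moving the IRQs around (the IRQ number and CPU
mask below are placeholders):

  # see which CPUs are handling the NIC interrupts
  grep -iE 'eth|bnx' /proc/interrupts
  # pin e.g. IRQ 24 to CPU2 (mask 0x4); placeholder values
  echo 4 > /proc/irq/24/smp_affinity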
Siim