I'm curious whether there are any known DR LVS bottlenecks. My company had the
opportunity to put LVS to the test the day after the Super Bowl, when we
delivered 12TB of data in 1 day and peaked at about 750Mbps.
In doing this we had a couple of problems that I think were with LVS. I was
using the latest LVS for 2.2.18, and ldirectord to move the machines in and
out of LVS. The LVS servers were running Red Hat with an
EEPro100. I had two clusters, web and video. The web cluster was a couple of
1U's with an acenic gig card, running 2.4.0, thttpd, with a somewhat
performance-tuned system (following parts of the C10K advice). At peak our LVS
got slammed with 40K active connections (according to ipvsadm). When we
reached this number, or sometime before, LVS became inaccessible. I could
still pull content directly from a real server, just not through the LVS. The
LVS was running on a single-processor P3, and load never went much above 3%
the entire time. I could execute tasks on the LVS box, but HTTP requests
weren't getting passed along.
A similar thing occurred with our video LVS. While our real servers aren't
quite capable of handling the C10K, we did about 1500 connections apiece and
maxed out at about 150Mbps per machine. I think this is primarily the modem
users' fault; we would have pushed more bandwidth to a smaller number of
high-bandwidth users (of course).
I know this volume of traffic choked LVS. What I'm wondering is whether there
is anything I could do to prevent this. Until we got hit with too many
connections (mostly modems, I imagine) LVS performed superbly. I wonder if we
could have gotten better performance with a gig card, or with some other
scheduling algorithm. (I started with wlc, but quickly changed to wrr because
all the rr calculations should be done initially and never need to be done
again unless we change weights; I thought this would save us.)
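To make the wlc/wrr difference concrete, here is a minimal Python sketch of
the two ideas. It is not the kernel code: the function names are my own, the
wrr loop follows the gcd-based description in the LVS documentation, and the
wlc pick uses a plain active/weight ratio (the real scheduler counts
connections somewhat differently).

```python
from functools import reduce
from itertools import islice
from math import gcd


def wrr_schedule(weights):
    """Yield real-server indices in weighted round-robin order.

    Illustrative only: needs nothing but the static weights, so no
    per-connection counters are consulted.
    """
    n = len(weights)
    max_w = max(weights)
    step = reduce(gcd, weights)
    i, cw = -1, 0  # i: last server picked, cw: current weight threshold
    while True:
        i = (i + 1) % n
        if i == 0:
            cw -= step
            if cw <= 0:
                cw = max_w
                if cw == 0:
                    return  # every weight is zero: nothing to schedule
        if weights[i] >= cw:
            yield i


def wlc_pick(weights, active):
    """Pick the server with the lowest active/weight ratio.

    Illustrative only: has to look at every server's live connection
    count each time a new connection arrives.
    """
    candidates = [i for i, w in enumerate(weights) if w > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda i: active[i] / weights[i])


if __name__ == "__main__":
    # Hypothetical weights and connection counts, not from our clusters.
    print(list(islice(wrr_schedule([4, 3, 2]), 9)))
    print(wlc_pick([4, 3, 2], [8, 3, 2]))
```

In this sketch wrr never reads the connection counts, while wlc scans one
counter per real server for every new connection.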
Another problem I had was with ldirectord and the test (negotiate, connect).
It seemed like I needed some type of test to put the servers in initially.
Then too many connections came in, so I wanted no test (off), but the servers
would still drop out of ldirectord. That's a snowball-type problem for my
amount of traffic: one server gets bumped because it's got too many
connections, then the other servers get overloaded and get dropped too, and I
end up with an LVS directing to localhost.
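The snowball can be sketched as a toy model in Python. Everything here is an
assumption for illustration: the function name, the per-server limits, and
the rule that a server is dropped as soon as its even share of connections
exceeds its limit. Only the 40K total comes from what I saw.

```python
def simulate_cascade(total_conns, capacities):
    """Return the indices of servers still in the pool.

    Toy model: connections are split evenly over the live servers; each
    round, the weakest server whose share exceeds its limit fails its
    check and is removed, pushing its load onto the rest.
    """
    alive = list(range(len(capacities)))
    while alive:
        share = total_conns / len(alive)
        overloaded = [i for i in alive if share > capacities[i]]
        if not overloaded:
            break  # everyone can carry their share; the pool is stable
        alive.remove(min(overloaded, key=lambda i: capacities[i]))
    return alive


if __name__ == "__main__":
    # Hypothetical limits: four servers, one slightly weaker than the rest.
    print(simulate_cascade(40000, [11000, 10000, 10000, 10000]))
    print(simulate_cascade(40000, [9000, 10000, 10000, 10000]))
```

An empty result corresponds to the case where every real server has been
dropped and the LVS is left directing to localhost.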
So, if anyone has pushed DR LVS to the limits and has ideas to share on how to
maximize its potential for given hardware, please let me know.
Jeffrey Schoolcraft