Hi,
Case 1 is already handled with the normal persistence setting and WLC.
Case 3 can be enabled using the sysctl setting in the new LVS.
No, case 3 can only be solved if you have an intelligent user space
daemon running which takes out the service. Otherwise the load balancer
doesn't really know about the state of a RS.
Obviously. But does anyone here run LVS without ldirectord or keepalived at
all?
You bet! Even if you just count me, you would have 30 e-commerce sites in
Switzerland being balanced over LVS without ldirectord or keepalived. In fact
those didn't exist or weren't in a usable shape back then. I know of a lot of
other people using their own tools. Since you seem to be located in the
Netherlands you might be interested to know that the Fleurop corporation for
example is being load balanced over LVS.
[At least now you know who you can blame if you sent a bunch of flowers to your
mistress in the States (business trip, I know) and while entering the CC number
you got redirected to a new server :).]
For new client IP addresses, yes, but not for clients that already connected.
I seem to have a barrier when it comes to reassigning already connected clients.
Let's see if I understand it in the course of this email.
In the case of non-persistent connections each HTTP request will be spread
over all available servers and the balance will always be close to optimal.
Completely agreed.
More importantly, changes in the weighting will take effect almost instantly.
Agreed.
In the case of fully persistent connections it will take quite a while before
the clients are gone, so the weight shift is not even remotely instant.
In theory you're right but you can't generalize it like that. It depends on the
RTT of the page fetch. I also don't see it as a problem of "... before the
clients are gone..." because using the WLC algorithm and changing weights will
not change the scheduler's view of the total number of connections. So even if
you change the weights, the load distribution should happen fairly quickly.
Unless of course you take an extreme case where you assign one server 10 times
the weight of another one while the former already has tons of active
connections.
During normal operation the persistency performs best because it avoids
throwing session state around. When reassigning the weight, however, it's
It performs best for _your_ application framework. I also have customers that
have session IDs stored on a central DB and never cache them. So it doesn't
matter for those customers, and in their case it is better _not_ to use
persistency.
fine to move all clients to another server as if they were not persistent at
all. All subsequent requests are then treated as 'persistent' again.
Here is where I have a problem understanding you. How does your idea differ from
the WLC scheduler? Please give me an outline with an example, something like:
t0: RS1, WLC pers, w=3, a_conns=300, i_conns=3500
t0: RS2, WLC pers, w=2, a_conns=200, i_conns=2700
t0: RS3, WLC pers, w=3, a_conns=300, i_conns=1380
t1: change RS2 weight to 3
t2: ??? (new incoming connections go where)
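(For a common baseline, here is roughly how I understand plain persistent WLC
to decide at t2 for a *new* template, using your numbers from above. The
overhead constant is an assumption on my part, it differs between versions, but
the ranking comes out the same in this example.)

/*
 * Rough sketch only: plain WLC picks the RS with the lowest
 * (activeconns * K + inactconns) / weight for a new template; K is an
 * assumption here (some versions use 50, others 256), the outcome for
 * this example is the same. Established templates are not touched.
 */
#include <stdio.h>

struct rs { const char *name; long weight, aconns, iconns; };

static void pick(struct rs *p, int n, const char *when)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        /* compare overhead/weight via cross multiplication, avoiding division */
        long oh_i    = p[i].aconns    * 256 + p[i].iconns;
        long oh_best = p[best].aconns * 256 + p[best].iconns;
        if (oh_i * p[best].weight < oh_best * p[i].weight)
            best = i;
    }
    printf("%s: new templates/connections go to %s\n", when, p[best].name);
}

int main(void)
{
    struct rs pool[] = {
        { "RS1", 3, 300, 3500 },
        { "RS2", 2, 200, 2700 },
        { "RS3", 3, 300, 1380 },
    };

    pick(pool, 3, "t0");    /* RS3: lowest overhead per weight            */
    pool[1].weight = 3;     /* t1: change RS2 weight to 3                 */
    pick(pool, 3, "t2");    /* RS2: wins until its connection count grows */
    return 0;
}

So with plain WLC the weight change only steers new client IPs; the open
question remains what to do with the templates already bound to RS1/RS3.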
This way there is only a very limited amount of reassigning done when there
truly is an imbalance (accidentally, because a single IP turns out to be a
NAT-ed LAN, or implied, because we changed weighting). Non-persistent
If you have a single IP representing a NAT pool like those AOL ones, there is
not a lot you can do. Once you have assigned this IP with a template to a RS
you're stuck with it, be it soft persistent or hard persistent. Unless you mean
that in such a case we should say: oh well, everyone is hitting my RS1 from the
same IP today, so subsequent requests from this IP go to RS2, where we need (and
I think this is your point) to _reload_ the session ID from disk to RAM, but
where this is still better in terms of equalizing the load imbalance than if
those subsequent requests had gone to the initial RS?
I know I did get it right this time. Don't tell me almost, I won't take it, I'm
already under medical care because of this problem, I can't sleep anymore ... :)
reassigns for each and every request, i.e. waaaaaay too often, and fully
persistent never reassigns at all, which works, but isn't really required.
Ok, this calls for an addition to the persistent WLC scheduler. It would be very
difficult though, because the template to choose the RS would be the same but
according to the load of the RS you would need to generate subtemplates or
subpools of RS for one template. This is a nice idea!
Now, what I come up with after thinking about it for 2 minutes is:
o what we could do is add an additional field to the template which indicates
the subpool of assigned RS. Normally this is 0.
o when a per-template byte or packet counter exceeds a threshold, you do an
internal split into those pools and reassign the clients to a new RS
o Wensong and Julian are going to beat the crap out of me for that :)
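To make the hand waving above a bit more concrete, a purely illustrative
sketch; none of these names exist in ip_vs, it is just the shape of the idea:

/*
 * Illustrative only: made-up structure and helper names, not actual
 * ip_vs code. It just shows where the subpool field and the per-template
 * counters from the bullet points above would live.
 */
struct my_dest;                                     /* stands in for a RS   */
extern struct my_dest *pick_least_loaded_rs(void);  /* hypothetical helper  */

struct my_template {
    unsigned int    caddr;      /* client (source) IP the template hashes on */
    unsigned int    subpool;    /* 0 = normal, >0 = split into a subpool     */
    unsigned long   bytes;      /* per-template byte counter                 */
    unsigned long   packets;    /* per-template packet counter               */
    struct my_dest *dest;       /* RS currently assigned to this template    */
};

/*
 * Would run when a packet hits the template: once the counter exceeds the
 * threshold, split into a subpool and let the scheduler pick a less loaded
 * RS for subsequent connections; established connections keep the old RS.
 */
static void maybe_split(struct my_template *t, unsigned long byte_threshold)
{
    if (t->subpool == 0 && t->bytes > byte_threshold) {
        t->subpool = 1;
        t->dest    = pick_least_loaded_rs();
    }
}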
Ok. You call it soft requirement or non-true persistency if you set
persistency where you wouldn't really need it but where you gain from it
by not needing to load session IDs for new requests, right?
Yup.
Very good. It starts making a lot of sense now. I had a knot in my brain because
I hadn't realized that you actually don't need persistency at all, it's just
for performance reasons (the session ID disk -> RAM thing).
You're _very_ close now... I'm not intending to reassign active TCP socket
connections, only _subsequent_ incoming connections from the same IP.
I get it now, what a wonderful world this is ...
Remember that we're talking something similar to persistency here, so we're
not working on a connection-per-connection basis, but IP-per-IP basis,
stretching over much more than a single socket connection.
Yep.
No, each subsequent request would get to another RS, incurring the session
retrieval penalty once and only once. The active requests will obviously
still be served by the 'old' RS. After all a single request can't take that
long...
I'm inclined to tell you not to use persistency and to upgrade your DB ;)
Quiescence and persistency are a pain, as I discovered to my dismay. One of our
realservers crashed due to a broken motherboard and some clients (amongst
which the website's very company itself...) got connected to the broken
server. I had to turn on quiescence in ldirectord.
I don't know ldirectord and those tools because I've written my own suite, but
in my tools this is detected, the template is taken out immediately, and the
remaining connections are flushed. No need to manually adapt it. YMMV.
Or do you mean my above example, where the quiescence is not caused by
ldirectord but enforced manually before maintenance? Yes, in that case the
Yes, for maintenance you might need to force it manually unless your
application does it (mine for example has a listener on the director where I can
send such requests via an XML form to the core user space engine). Other than
that there is no need to quiesce a RS unless you use my threshold limitation
patch. There it is wise to set the weights to zero once the upper connection
threshold is reached. But this is all done in the kernel.
old clients will stay on the quiesced RS, which is exactly what I _don't_
want. If the server is down or weighted downwards it's fine to reassign
clients one time. It's not fine to reassign them every time, but there's
nothing wrong with reassigning them only once.
There are two things you're talking about now, and I assume we're always talking
about the persistent case.
If a service on a RS is down, your user space core detection engine should take
the template out and, before that, make sure you have set
/proc/sys/net/ipv4/vs/expire_nodest_conn.
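A minimal sketch of what that first case boils down to if your own user space
tool sets it directly (only the proc path is the real one, everything around it
is simplified):

/*
 * Minimal sketch of the "service on a RS is down" case: a user space
 * health check sets expire_nodest_conn before taking the dead RS out,
 * so connections to the vanished destination get expired on the next
 * packet instead of having their packets silently dropped.
 */
#include <stdio.h>

static int set_expire_nodest_conn(int on)
{
    FILE *f = fopen("/proc/sys/net/ipv4/vs/expire_nodest_conn", "w");
    if (!f)
        return -1;              /* no ip_vs loaded or no permission */
    fprintf(f, "%d\n", on);
    return fclose(f);
}

int main(void)
{
    if (set_expire_nodest_conn(1) != 0)
        perror("expire_nodest_conn");

    /* ... then remove the RS / its templates from the service (e.g. via
     * ipvsadm or your own engine) and let the remaining connections flush. */
    return 0;
}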
The second case is when you have soft persistency (I'm starting to like that
term) where you see that one RS is overloaded because of stupid proxy
configurations such as AOL's and where you would like to reassign sessions to a
new RS to equalize the load imbalance. I've sort of proposed something but this
needs to be
investigated. Also I am not sure if the source hash scheduler would help you. I
haven't played with it in a while and the documentation is kind of sparse ;).
Wait a minute, what is being reassigned? Old connections are being
assigned according to the existing template to their appropriate RS.
And that's what I wanted to influence ;-)
I see.
For best performance and reliability it would be very nice if LVS could change
the template when needed, but not on every HTTP request if there is no need
for it. See what I'm getting at?
Absolutely. I got it further up. Thanks.
Yes. It's only a very small number of people that stay longer than 15
minutes, but disregarding the monitoring software from us and the web site
owner, those few clients are the ones actually ordering products and hence the
most valuable...
I see. Maybe you should have a separate 'gold client' pool with a dedicated RS.
This can be achieved by generating an IP pool of your most valuable customers,
fwmarking it, and load balancing on that fwmark.
I know that current LVS cannot do this :-)
:) Just checking ...
I was curious if it was feasible for future versions...
Maybe yes, let's see if other people have an opinion on this too once we all
understand what it is about.
Same as the ASP session timeout, 15 minutes. The few people staying that long
are really requesting multiple pages.
Ok, fair enough. One little thing that bothers me though is that you're talking
about upgrading and problems with hardware a lot. I run several hundred boxes
around the world and I hardly ever need to exchange anything. That's why it
almost never occurred to me to quiesce a RS to be able to perform upgrades.
Software upgrades are of course another matter; there I understand your pain.
Hmm, the calculation seems to be right, although I'm unsure why you are
dividing by 3 in the end. You want the bandwidth usage per realserver? I took
Yes. Actually you should even divide it by 8-10 if you're using LVS-DR because
this will then be the request stream rate hitting the load balancer, if one
assumes that the request/reply ratio is 1/8 in bytes. I also used the three
because you seem to have a Gaussian distribution, and taking 1/3 of the peak
and rounding up/down just pops out this divisor.
a quick look at the bandwidth usage graphs for our leased line and it looks
like we have close to 0 MBit/s during the night and well over 2.5 MBit/s
during the evenings. The 1.2-1.5 is reasonable for the rest of the day. All
Fine. So we're talking about a site with very low bandwidth requirements. I
just checked one of our customers' sites and they have between 4 and 13 Mbit/s.
measured for the entire cluster, not for single realservers.
Yes.
It's not exactly what you'd call 'high volume' though, indeed.
Exactly. Do you actually have numbers about the mean packet size? This would be
very interesting. It's the first step in optimizing your DB :).
Reasonable is a better word here, but there's not too much room for
improvement without partitioning the database or playing other advanced
tricks. Either way it's a tradeoff of budget vs performance and as long as
DB tuning always needs advanced tricks and most of the time you need a Russian
guy to do this :)
the site still performs well enough management wants to delay buying a bigger
database. And frankly, the database can still handle the load, it's only that
there's no growth space available.
And that is a risky business, but I'm sure you know that already.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc