To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: 'Preference' instead 'persistence'?
From: Martijn Klingens <mklingens@xxxxxx>
Date: Wed, 9 Oct 2002 16:30:55 +0200
On Wednesday 09 October 2002 12:11, Roberto Nibali wrote:
> > In the case of fully persistent connections it will take quite a while
> > before the clients are gone, so the weight shift is not even remotely
> > instant.
>
> In theory you're right but you can't generalize it like that. It depends on
> the RTT of the page fetch. I also don't see it as a problem of "... before
> the clients are gone..." because using the WLC algorithm and changing
> weights will not change the view of the scheduler about the total amount of
> connections. So even if you change it, the load distribution should happen
> fairly quick. Unless of course you do an extreme case where you assign one
> server 10 times the weight of another one, while the former already has
> tons of active connections.

Well, I'm only concerned about two cases anyway: 1. I manually set the weight 
to 0 because of maintenance work, and 2. the machine goes down because of 
problems. Upgrading LVS to the new version with the sysctl setting covers #2, 
so I hope my boss assigns some time on my TODO list to upgrade LVS.

Which leaves me with case 1. Ideally there's no downtime because of 
maintenance, but the reality is that you often can't apply service packs and 
hotfixes to the realservers without bringing them down (the realservers run 
win2k, which is unfortunate from a sysadmin point of view, but implied by a 
rather large ASP codebase).

> It performs best for _your_ application framework.

That's what our developers tell me. They claim that the IIS default session 
handler (which keeps session state in RAM and only in RAM) is by far the best 
performing option, and that custom session save handlers degrade performance 
a lot. I don't have the ASP/IIS knowledge to question that, but it sounds 
reasonable enough to me.

(Currently we don't use shared storage for session data at all, because of the 
potential performance problems, so we stick with pure persistency. But I 
personally dislike this choice and would prefer to move on to something more 
professional and reliable.)

> If you have a single IP representing a NAT pool like those AOL ones, there
> is not a lot you can do. Once you have assigned this IP with a template to
> a RS you're stuck with it, be it soft persistent or hard persistent. Unless
> you mean that in such a case we should say: Oh well, everyone is hitting my
> RS1 from the same IP today, so I say in that case, that sending subsequent
> requests from this IP will go to RS2 where we need (and I think this is
> your point) to _reload_ the session ID from disk to RAM but where we can
> say that this is better in terms of equalizing load imbalance than if those
> subsequent requests had gone to the initial RS?

Reassigning part of a client IP's traffic but not the whole IP is a bit 
complex, but if you notice that, say, IP 1.2.3.4 causes a lot of traffic on 
RS1, why not reassign all _other_ IPs that are currently using RS1 to the 
other realservers? It sounds to me that you only need to track the amount of 
activity per IP, base the weighting on that, and reassign the _least_ active 
IPs first, since those are the best ones to use for rebalancing.
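
Something along these lines is the bookkeeping I have in mind, purely as a 
sketch in Python (none of this exists in ipvs, the names and counters are all 
made up):

from collections import defaultdict

# Traffic seen per client IP since the last rebalancing run.  In a real
# implementation this would live in the kernel next to the persistence
# templates; here it's just a dict.
activity = defaultdict(int)        # client_ip -> bytes (or packets) seen
assigned_rs = {}                   # client_ip -> realserver it is pinned to

def account(client_ip, nbytes):
    """Called for every packet/connection forwarded for client_ip."""
    activity[client_ip] += nbytes

def reassign_candidates(rs):
    """Client IPs pinned to 'rs', least active first - cheapest to move."""
    clients = [ip for ip, r in assigned_rs.items() if r == rs]
    return sorted(clients, key=lambda ip: activity[ip])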

But again, it's not imbalance during normal business that really worries me, 
as it has never hit me so far. It's the imbalance penalty when doing 
maintenance on the site that annoys me most.

> I know I did get it right this time. Don't tell me almost, I won't take it,
> I'm already under medical care because of this problem, I can't sleep
> anymore ... :)

At least you don't need medical care for lack of humour ;-)

> Ok, this calls for an addition to the persistent WLC scheduler. It would be
> very difficult though, because the template to choose the RS would be the
> same but according to the load of the RS you would need to generate to
> subtemplates or subpools of RS for one template. This is a nice idea!

You made it a bit more complex than what I had in mind, but overengineering 
is what every programmer does, no? ;-)

I was thinking along the lines of (rough sketch after the list):

- We monitor each client IP for activity.
- If the realserver is imbalanced by more than a few percent, or if its target
  weight is 0, we start reassigning each client IP to a new realserver, by
  modifying the template, starting on the first _new_ socket connection.
- If the target weight for the realserver is 0 we always reassign.
- If the realserver is only seriously imbalanced, but the weight is nonzero, we
  reassign only if a given client's activity is smaller than the average
  activity per client IP for this realserver (or something similar), thereby
  balancing the cluster using the single hosts and leaving the big NAT-ed
  networks alone if possible. Even if multiple NAT ranges end up on a single
  realserver, the disappearing smaller IPs raise the average activity for each
  subsequent reassign run, so in the end even the NAT blocks get reassigned,
  but only if really needed.
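
Glued to the per-IP counters from the sketch above, the decision logic would 
roughly be (again just a sketch, the thresholds and helper names are invented):

def rebalance(rs, target_weight, imbalance_pct, threshold_pct=5):
    """Pick the client IPs to move away from 'rs' on their next _new_
    connection, i.e. the IPs whose template should be rewritten."""
    clients = reassign_candidates(rs)
    if not clients:
        return []

    # Target weight 0 (quiesced for maintenance): move everybody.
    if target_weight == 0:
        return clients

    # Only mildly off balance: leave it alone.
    if imbalance_pct <= threshold_pct:
        return []

    # Seriously imbalanced: move only the clients whose activity is below
    # the average for this RS, so big NAT blocks stay put as long as the
    # small fry is enough to even things out.
    avg = sum(activity[ip] for ip in clients) / len(clients)
    return [ip for ip in clients if activity[ip] < avg]

The director would then rewrite the template for each returned IP the moment 
its first new connection comes in, and leave established connections alone.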

> o Wensong and Julian are going to beat the crap out of me for that :)

I hope you can run fast ;-)

> I get it now, what a wonderful world this is ...

;-)

> I'm inclined to tell you not to use persistency and upgrade your DB ;)

Tell the people deciding over the money and I'm all for it ;-)

> > Quiescence and persistency is a pain I discovered to my disgrace. One of
> > our realservers crashed due to a broken motherboard and some clients
> > (amongst which the website's very company itself...) got connected to the
> > broken server. I had to turn on quiescence in ldirectord.
                          ^^^^^^^^^^^^^^^^

Make that "off", not "on". Oops.

> I don't know ldirectord and those tools because I've written my own suite
> but in my tools this is detected and the template is being taken out
> immediately and the remaining connections are flushed. No need to manually
> adapt it. YMMV.

What ldirectord does if it detects a realserver failure is set the weight to 0 
when quiescence is turned on. That's nice for transient errors and/or for 
non-persistent connections, but when the connections are persistent it simply 
means clients are never redirected to another RS at all until the persistence 
timeout expires. Needless to say that's unwanted behaviour :-)

Turning the quiescence option off avoids this problem btw.
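
(A toy model of why that happens, in case it helps: weight 0 only keeps _new_ 
clients away, an existing persistence template still wins. Rough Python, not 
the real scheduler:)

def schedule(client_ip, templates, realservers):
    # 'templates' maps client IP -> RS, 'realservers' maps RS -> weight.
    rs = templates.get(client_ip)
    if rs is not None:
        # Existing persistence template: reuse it, the weight is never asked.
        return rs
    # No template yet: quiesced (weight 0) servers are skipped for new clients.
    candidates = {name: w for name, w in realservers.items() if w > 0}
    rs = max(candidates, key=candidates.get)   # stand-in for the real WLC pick
    templates[client_ip] = rs
    return rs

templates = {"1.2.3.4": "RS1"}                               # RS1 just died...
print(schedule("1.2.3.4", templates, {"RS1": 0, "RS2": 100}))  # -> RS1, ouch
print(schedule("5.6.7.8", templates, {"RS1": 0, "RS2": 100}))  # -> RS2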

> If a service on a RS is down, your user space core detection engine should
> take the template out and before that make sure you have set
> /proc/sys/net/ipv4/vs/expire_nodest_conn.

ldirectord with quiescence doesn't do that by default, but that's indeed what 
I configured it to do.
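
In concrete terms that boils down to something like this on the director 
(sketch with placeholder addresses; the proc path is the one you mention, and 
removing the dead realserver outright is what ldirectord does with quiescence 
off):

import subprocess
from pathlib import Path

# Expire established connections whose destination has been removed, instead
# of letting clients hang until the connection entry times out.
Path("/proc/sys/net/ipv4/vs/expire_nodest_conn").write_text("1\n")

# On failure detection, delete the realserver from the virtual service
# (placeholder VIP/RIP below) so clients get rescheduled to the remaining
# realservers instead of waiting for the persistence timeout.
subprocess.run(
    ["ipvsadm", "-d", "-t", "192.168.0.1:80", "-r", "10.0.0.11:80"],
    check=True,
)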

> The second case is when you have soft persistency (I start liking that
> term) where you see that one RS is overloaded because of stupid proxy
> configurations such as AOL and where you would like to reassign sessions to
> new RS to equalize the load imbalance. I've sort of proposed something but
> this needs to be investigated. Also I am not sure if the source hash
> scheduler would help you. I haven't played with it in a while and the
> documentation is kind of sparse ;).

Indeed :-) I read the docs as well, and it seems to me it doesn't do what I 
need. But as you correctly state, the docs are a bit sparse...

> Ok, fair enough. One little thing that bothers me though is that you're
> talking about upgrading and problems with hardware a lot. I run several 100
> boxes around the world and I hardly ever need to exchange anything. That's
> why it also almost never occured to me to quiesce a RS to be able to
> perform upgrades. One thing besides that of course are software upgrades.
> Then I understand your pain.

Ever had to admin win2k realservers? ;-)

Besides, most of the maintenance downtime comes from code updates, because the 
sites are still evolving. And copying new code from the beta site to the live 
site is not something you do on an active RS...

> Yes. Actually you should even divide it per 8-10 if you're using LVS-DR
> because this will then be the request stream rate hitting the load balancer
> if one assumes that the ratio request/reply is 1/8 in bytes. I also used
> the three because you seem to have a gaussian distribution and by taking
> 1/3 of the peak and rounding up/down it would just pop out this divisor.

Not really, as RS1 has half the weight of RS2 and RS3 (the backoffice and some 
other stuff runs on RS1, so that machine is loaded enough without LVS 
activity :-)

> Fine. So we're talking about a site with very low bandwidth constraints. I
> just checked one of our customers site and they have between 4 and
> 13Mbit/s.

I could only dream about adminning such boxes :-)

Then again, for a first job after graduation it's not bad at all. This setup 
is more than ambitious enough until I actually know how all of it works...

> Exactly. Do you actually have numbers about the mean packet size? This
> would be very interesting. It's the first step in optimizing your DB :).

Hmm, actually I have no idea how to measure that. Will ask the database admin 
if he can come up with something.
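
(Though now that I think of it, a short capture of the DB traffic plus 
something like this might already give a rough number - just a sketch, 
assuming scapy and a pcap taken with tcpdump beforehand:)

# Mean packet size from a capture of the database traffic,
# e.g. taken with: tcpdump -w db.pcap host <db server>
from scapy.all import rdpcap

packets = rdpcap("db.pcap")          # hypothetical capture file
mean = sum(len(p) for p in packets) / len(packets)
print(len(packets), "packets, mean size", round(mean, 1), "bytes")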

> DB tuning always needs advanced tricks and most of the time you need a
> Russian guy to do this :)

We have a Russian developer, would that qualify? :-P

> > the site still performs well enough management wants to delay buying a
> > bigger database. And frankly, the database can still handle the load,
> > it's only that there's no growth space available.
>
> And that is a risky business, but I'm sure you know that already.

Yes, I am well aware of this and so is the rest of the sysadmin team. But we 
don't do the budgets :(
-- 
Martijn


