Hi,
Case 1 is already handled with the normal persistence setting and WLC.
Case 3 can be enabled using the sysctl setting in the new LVS.
No, case 3 can only be solved if you have an intelligent user space
daemon running which takes out the service. Otherwise the load balancer
doesn't really know about the state of a RS.
Obviously. But does anyone here run LVS without ldirectord or keepalived at
all?
You bet! Even if you just count me, you would have 30 e-commerce sites in
Switzerland being balanced over LVS without ldirectord or keepalived. In fact
those didn't exist or weren't in a usable shape back then. I know of a lot of
other people using their own tools. Since you seem to be located in the
Netherlands you might be interested to know that the Fleurop corporation for
example is being load balanced over LVS.
[At least now you know who you can blame if you sent a bunch of flowers to your
mistress in the States (business trip, I know) and while entering the CC number
you got redirected to a new server :).]
For new client IP addresses, yes, but not for clients that already connected.
I seem to have a barrier when it comes to reassigning already connected clients.
Let's see if I understand it in the course of this email.
In the case of non-persistent connections each HTTP request will be spread
over all available servers and the balance will always be close to optimal.
Completely agreed.
More importantly, changes in the weighting will take effect almost instantly.
Agreed.
In the case of fully persistent connections it will take quite a while before
the clients are gone, so the weight shift is not even remotely instant.
In theory you're right but you can't generalize it like that. It depends on the
RTT of the page fetch. I also don't see it as a problem of "... before the
clients are gone..." because using the WLC algorithm and changing weights will
not change the scheduler's view of the total number of connections. So even if
you change the weights, the load distribution should happen fairly quickly.
Unless of course you take an extreme case where you assign one server 10 times
the weight of another one while the former already has tons of active
connections.
During normal operation the persistency performs best because it avoids
throwing session state around. When reassigning the weight, however, it's
It performs best for _your_ application framework. I also have customers that
have session IDs stored on a central DB and never cache them. So it doesn't
matter for those customers, and in their case it is better _not_ to use
persistency.
fine to move all clients to another server as if they were not persistent at
all. All subsequent requests are then treated as 'persistent' again.
Here is where I have a problem understanding you. How does your idea differ from
the WLC scheduler? Please give me an outline with an example, something like:
t0: RS1, WLC pers, w=3, a_conns=300, i_conns=3500
t0: RS2, WLC pers, w=2, a_conns=200, i_conns=2700
t0: RS3, WLC pers, w=3, a_conns=300, i_conns=1380
t1: change RS2 weight to 3
t2: ??? (new incoming connections go where)
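(For a common baseline, here is roughly how I understand plain persistent WLC
to decide at t2 for a *new* template, using your numbers from above. The
overhead constant is an assumption on my part, it differs between versions, but
the ranking comes out the same in this example.)

/*
 * Rough sketch only: plain WLC picks the RS with the lowest
 * (activeconns * K + inactconns) / weight for a new template; K is an
 * assumption here (some versions use 50, others 256), the outcome for
 * this example is the same. Established templates are not touched.
 */
#include <stdio.h>

struct rs { const char *name; long weight, aconns, iconns; };

static void pick(struct rs *p, int n, const char *when)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        /* compare overhead/weight via cross multiplication, avoiding division */
        long oh_i    = p[i].aconns    * 256 + p[i].iconns;
        long oh_best = p[best].aconns * 256 + p[best].iconns;
        if (oh_i * p[best].weight < oh_best * p[i].weight)
            best = i;
    }
    printf("%s: new templates/connections go to %s\n", when, p[best].name);
}

int main(void)
{
    struct rs pool[] = {
        { "RS1", 3, 300, 3500 },
        { "RS2", 2, 200, 2700 },
        { "RS3", 3, 300, 1380 },
    };

    pick(pool, 3, "t0");    /* RS3: lowest overhead per weight            */
    pool[1].weight = 3;     /* t1: change RS2 weight to 3                 */
    pick(pool, 3, "t2");    /* RS2: wins until its connection count grows */
    return 0;
}

So with plain WLC the weight change only steers new client IPs; the open
question remains what to do with the templates already bound to RS1/RS3.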
This way there is only a very limited amount of reassigning done when there
truly is an imbalance (accidentally, because a single IP turns out to be a
NAT-ed LAN, or implied, because we changed weighting). Non-persistent
If you have a single IP representing a NAT pool like those AOL ones, there is
not a lot you can do. Once you have assigned this IP with a template to a RS
you're stuck with it, be it soft persistent or hard persistent. Unless you mean
that in such a case we should say: oh well, everyone is hitting my RS1 from the
same IP today, so subsequent requests from this IP go to RS2, where we need (and
I think this is your point) to _reload_ the session ID from disk to RAM, but
where this is still better in terms of equalizing the load imbalance than if
those subsequent requests had gone to the initial RS?
I know I did get it right this time. Don't tell me almost, I won't take it, I'm
already under medical care because of this problem, I can't sleep anymore ... :)
reassigns for each and every request, i.e. waaaaaay too often, and fully
persistent never reassigns at all, which works, but isn't really required.
Ok, this calls for an addition to the persistent WLC scheduler. It would be very
difficult though, because the template to choose the RS would be the same but
according to the load of the RS you would need to generate subtemplates or
subpools of RS for one template. This is a nice idea!
Now, what I come up with after thinking about it for 2 minutes is:
o what we could do is add an additional field to the template which indicates
the subpool of assigned RS. Normally this is 0.
o when a per-template byte or packet counter exceeds a threshold, you do an
internal split into those pools and reassign the clients to a new RS
o Wensong and Julian are going to beat the crap out of me for that :)
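To make the hand waving above a bit more concrete, a purely illustrative
sketch; none of these names exist in ip_vs, it is just the shape of the idea:

/*
 * Illustrative only: made-up structure and helper names, not actual
 * ip_vs code. It just shows where the subpool field and the per-template
 * counters from the bullet points above would live.
 */
struct my_dest;                                     /* stands in for a RS   */
extern struct my_dest *pick_least_loaded_rs(void);  /* hypothetical helper  */

struct my_template {
    unsigned int    caddr;      /* client (source) IP the template hashes on */
    unsigned int    subpool;    /* 0 = normal, >0 = split into a subpool     */
    unsigned long   bytes;      /* per-template byte counter                 */
    unsigned long   packets;    /* per-template packet counter               */
    struct my_dest *dest;       /* RS currently assigned to this template    */
};

/*
 * Would run when a packet hits the template: once the counter exceeds the
 * threshold, split into a subpool and let the scheduler pick a less loaded
 * RS for subsequent connections; established connections keep the old RS.
 */
static void maybe_split(struct my_template *t, unsigned long byte_threshold)
{
    if (t->subpool == 0 && t->bytes > byte_threshold) {
        t->subpool = 1;
        t->dest    = pick_least_loaded_rs();
    }
}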
Ok. You call it soft requirement or non-true persistency if you set
persistency where you wouldn't really need it but where you gain from it
by not needing to load session IDs for new requests, right?
Yup.
Very good. It starts making a lot of sense now. I had a knot in my brain because
I hadn't realized that you actually don't need persistency at all, it's just
for performance reasons (the session ID disk -> RAM thing).
You're _very_ close now... I'm not intending to reassign active TCP socket
connections, only _subsequent_ incoming connections from the same IP.
I get it now, what a wonderful world this is ...
Remember that we're talking something similar to persistency here, so we're
not working on a connection-per-connection basis, but IP-per-IP basis,
stretching over much more than a single socket connection.
Yep.
No, each subsequent request would get to another RS, incurring the session
retrieval penalty once and only once. The active requests will obviously
still be served by the 'old' RS. After all a single request can't take that
long...
I'm inclined to tell you not to use persistency and to upgrade your DB ;)
Quiescence and persistency are a pain, as I discovered to my dismay. One of our
realservers crashed due to a broken motherboard and some clients (amongst
which the website's very company itself...) got connected to the broken
server. I had to turn on quiescence in ldirectord.
I don't know ldirectord and those tools because I've written my own suite, but
in my tools this is detected, the template is taken out immediately, and the
remaining connections are flushed. No need to manually adapt it. YMMV.
Or do you mean my above example, where the quiescence is not caused by
ldirectord but enforced manually before maintenance? Yes, in that case the
Yes, for maintenance you might need to force it manually unless your
application does it (mine for example has a listener on the director where I can
send such requests via an XML form to the core user space engine). Other than
that there is no need to quiesce a RS unless you use my threshold limitation
patch. There it is wise to set the weights to zero once the upper connection
threshold is reached. But this is all done in the kernel.
old clients will stay on the quiesced RS, which is exactly what I _don't_
want. If the server is down or weighted downwards it's fine to reassign
clients one time. It's not fine to reassign them every time, but there's
nothing wrong with reassigning them only once.
There are two things you're talking about now, and I assume we're always talking
about the persistent case.
If a service on a RS is down, your user space core detection engine should take
the template out and, before that, make sure you have set
/proc/sys/net/ipv4/vs/expire_nodest_conn.
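A minimal sketch of what that first case boils down to if your own user space
tool sets it directly (only the proc path is the real one, everything around it
is simplified):

/*
 * Minimal sketch of the "service on a RS is down" case: a user space
 * health check sets expire_nodest_conn before taking the dead RS out,
 * so connections to the vanished destination get expired on the next
 * packet instead of having their packets silently dropped.
 */
#include <stdio.h>

static int set_expire_nodest_conn(int on)
{
    FILE *f = fopen("/proc/sys/net/ipv4/vs/expire_nodest_conn", "w");
    if (!f)
        return -1;              /* no ip_vs loaded or no permission */
    fprintf(f, "%d\n", on);
    return fclose(f);
}

int main(void)
{
    if (set_expire_nodest_conn(1) != 0)
        perror("expire_nodest_conn");

    /* ... then remove the RS / its templates from the service (e.g. via
     * ipvsadm or your own engine) and let the remaining connections flush. */
    return 0;
}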
The second case is when you have soft persistency (I'm starting to like that
term) where you see that one RS is overloaded because of stupid proxy
configurations such as AOL's and where you would like to reassign sessions to a
new RS to equalize the load imbalance. I've sort of proposed something but this
needs to be
investigated. Also I am not sure if the source hash scheduler would help you. I
haven't played with it in a while and the documentation is kind of sparse ;).
Wait a minute, what is being reassigned? Old connections are being
assigned according to the existing template to their appropriate RS.
And that's what I wanted to influence ;-)
I see.
For best performance and reliability it would be very nice if LVS could change
the template when needed, but not on every HTTP request if there is no need
for it. See what I'm getting at?
Absolutely. I got it further up. Thanks.
Yes. It's only a very small number of people that stay longer than 15
minutes, but disregarding the monitoring software from us and the web site
owner, those few clients are the ones actually ordering products and hence the
most valuable...
I see. Maybe you should have a separate 'gold client' pool with a dedicated RS.
This can be achieved by generating an IP pool of your most valuable customers,
fwmarking it, and load balancing on that fwmark.
I know that current LVS cannot do this :-)
:) Just checking ...
I was curious if it was feasible for future versions...
Maybe yes, let's see if other people have an opinion on this too once we all
understand what it is about.
Same as the ASP session timeout, 15 minutes. The few people staying that long
are really requesting multiple pages.
Ok, fair enough. One little thing that bothers me though is that you're talking
about upgrading and problems with hardware a lot. I run several hundred boxes
around the world and I hardly ever need to exchange anything. That's why it
almost never occurred to me to quiesce a RS to be able to perform upgrades.
Software upgrades are of course another matter; there I understand your pain.
Hmm, the calculation seems to be right, although I'm unsure why you are
dividing by 3 in the end. You want the bandwidth usage per realserver? I took
Yes. Actually you should even divide it by 8-10 if you're using LVS-DR because
this will then be the request stream rate hitting the load balancer, if one
assumes that the request/reply ratio is 1/8 in bytes. I also used the three
because you seem to have a Gaussian distribution, and taking 1/3 of the peak
and rounding up/down just pops out this divisor.
a quick look at the bandwidth usage graphs for our leased line and it looks
like we have close to 0 MBit/s during the night and well over 2.5 MBit/s
during the evenings. The 1.2-1.5 is reasonable for the rest of the day. All
Fine. So we're talking about a site with very low bandwidth requirements. I
just checked one of our customers' sites and they have between 4 and 13 Mbit/s.
measured for the entire cluster, not for single realservers.
Yes.
It's not exactly what you'd call 'high volume' though, indeed.
Exactly. Do you actually have numbers about the mean packet size? This would be
very interesting. It's the first step in optimizing your DB :).
Reasonable is a better word here, but there's not too much room for
improvement without partitioning the database or playing other advanced
tricks. Either way it's a tradeoff of budget vs performance and as long as
DB tuning always needs advanced tricks and most of the time you need a Russian
guy to do this :)
the site still performs well enough management wants to delay buying a bigger
database. And frankly, the database can still handle the load, it's only that
there's no growth space available.
And that is a risky business, but I'm sure you know that already.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc