Re: 'Preference' instead 'persistence'?

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: 'Preference' instead 'persistence'?
From: Roberto Nibali <ratz@xxxxxx>
Date: Wed, 09 Oct 2002 12:11:59 +0200
Hi,

Case 1 is already handled with the normal persistence setting and WLC.
Case 3 can be enabled using the sysctl setting in the new LVS.

No, case 3 can only be solved if you have an intelligent user space
daemon running which takes the service out. Otherwise the load balancer
doesn't really know about the state of a RS.
Obviously. But does anyone here run LVS without ldirectord or keepalived at all?

You bet! Even if you just count me, you would have 30 e-commerce sites in Switzerland being balanced over LVS without ldirectord or keepalived. In fact those didn't exist, or weren't in a usable shape, back then. I know of a lot of other people using their own tools. Since you seem to be located in the Netherlands, you might be interested to know that the Fleurop corporation, for example, is load balanced over LVS.

[At least now you know whom you can blame if you sent a bunch of flowers to your mistress in the States (business trip, I know) and, while entering the CC number, you got redirected to a new server :).]

For new client IP addresses, yes, but not for clients that already connected.

I seem to have a mental barrier when it comes to reassigning already connected clients. Let's see if I understand it in the course of this email.

In the case of non-persistent connections each HTTP request will be spread over all available servers and the balance will always be close to optimal.

Completely agreed.

More importantly, changes in the weighting will take effect almost instantly.

Agreed.

In the case of fully persistent connections it will take quite a while before the clients are gone, so the weight shift is not even remotely instant.

In theory you're right, but you can't generalize it like that. It depends on the RTT of the page fetch. I also don't see it as a problem of "... before the clients are gone ...", because using the WLC algorithm and changing weights will not change the scheduler's view of the total amount of connections. So even if you change them, the load distribution should happen fairly quickly. Unless of course you construct an extreme case where you assign one server 10 times the weight of another while the former already has tons of active connections.

During normal operation the persistency performs best because it avoids throwing session state around. When reassigning the weight, however, it's

It performs best for _your_ application framework. I also have customers that have session IDs stored on a central DB and never cache them. So it doesn't matter for those customers, and in their case it is better _not_ to use persistency.

fine to move all clients to another server as if they were not persistent at all. All subsequent requests are then treated as 'persistent' again.

Here is where I have a problem understanding you. How does your idea differ from the WLC scheduler? Please give me an outline by an example, something like:

t0: RS1, WLC pers, w=3, a_conns=300, i_conns=3500
t0: RS2, WLC pers, w=2, a_conns=200, i_conns=2700
t0: RS3, WLC pers, w=3, a_conns=300, i_conns=1380
t1: change RS2 weight to 3
t2: ??? (new incoming connections go where)
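(For concreteness, here is a sketch of how WLC would answer the t2 question, assuming the usual in-kernel overhead metric of (active*256 + inactive)/weight; the numbers are the made-up ones from the example above, after t1 has raised RS2's weight to 3.)

```shell
# Sketch: compute the WLC overhead per RS after t1 (RS2 weight now 3).
# The scheduler sends new connections to the RS with the lowest value of
# (active*256 + inactive) / weight.
for rs in "RS1 3 300 3500" "RS2 3 200 2700" "RS3 3 300 1380"; do
  set -- $rs                       # name weight active inactive
  echo "$1 $(( ($3 * 256 + $4) / $2 ))"
done | sort -n -k 2
```

With these numbers RS2 comes out lowest, so at t2 new connections from client IPs without an existing template would land on RS2; templated clients stay where their template points.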

This way there is only a very limited amount of reassigning done when there truly is an imbalance (accidental, because a single IP turns out to be a NAT-ed LAN, or intentional, because we changed the weighting). Non-persistent

If you have a single IP representing a NAT pool, like those AOL ones, there is not a lot you can do. Once you have assigned this IP with a template to a RS you're stuck with it, be it soft persistent or hard persistent. Unless you mean that in such a case we should say: oh well, everyone is hitting my RS1 from the same IP today, so subsequent requests from this IP will go to RS2, where we need (and I think this is your point) to _reload_ the session ID from disk to RAM, but where we can say that this is better in terms of equalizing the load imbalance than if those subsequent requests had gone to the initial RS?

I know I did get it right this time. Don't tell me almost, I won't take it, I'm already under medical care because of this problem, I can't sleep anymore ... :)

reassigns for each and every request, i.e. waaaaaay too often, and fully persistent never reassigns at all, which works, but isn't really required.

Ok, this calls for an addition to the persistent WLC scheduler. It would be very difficult though, because the template to choose the RS would be the same, but according to the load of the RS you would need to generate two subtemplates or subpools of RS for one template. This is a nice idea!

Now, what I come out with after thinking about it for 2 minutes is:

o what we could do is add an additional field to the template which indicates
  the subpool of assigned RS. Normally this is 0
o when a per template byte or packet counter exceeds a threshold, you do an
  internal split into those pools and reassign the clients to new RS
o Wensong and Julian are going to beat the crap out of me for that :)

Ok. You call it soft requirement or non-true persistency if you set
persistency where you wouldn't really need it but where you gain from it
by not needing to load session IDs for new requests, right?
Yup.

Very good. It starts making a lot of sense now. I had a knot in my brain because I wasn't realizing that you actually don't need persistency at all; it's just there for performance reasons (the session ID disk -> RAM thing).

You're _very_ close now... I'm not intending to reassign active TCP socket connections, only _subsequent_ incoming connections from the same IP.

I get it now, what a wonderful world this is ...

Remember that we're talking something similar to persistency here, so we're not working on a connection-per-connection basis, but IP-per-IP basis, stretching over much more than a single socket connection.

Yep.

No, each subsequent request would get to another RS, incurring the session retrieval penalty once and only once. The active requests will obviously still be served by the 'old' RS. After all a single request can't take that long...

I'm inclined to tell you not to use persistency and upgrade your DB ;)

Quiescence and persistency are a pain, as I discovered to my chagrin. One of our realservers crashed due to a broken motherboard and some clients (among which the website's own company...) got connected to the broken server. I had to turn on quiescence in ldirectord.

I don't know ldirectord and those tools because I've written my own suite, but in my tools this is detected, the template is taken out immediately, and the remaining connections are flushed. No need to adapt it manually. YMMV.

Or do you mean my above example, where the quiescence is not caused by ldirectord but enforced manually before maintenance? Yes, in that case the

Yes, for maintenance you might need to force it manually, unless your application does it (mine for example has a listener on the director where I can send such requests via an XML form to the core user space engine). Other than that there is no need to quiesce a RS unless you use my threshold limitation patch. There it is wise to set the weights to zero once the upper connection threshold is reached. But this is all done in the kernel.
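A minimal sketch of that manual quiescing, assuming an existing TCP service; the VIP 192.168.1.100:80, realserver 10.0.0.2 and the weights are purely illustrative:

```shell
# Quiesce: weight 0 means no new connections (or templates) are assigned,
# while established connections keep draining to the realserver.
ipvsadm -e -t 192.168.1.100:80 -r 10.0.0.2:80 -w 0

# ... perform the maintenance ...

# Bring it back with its old weight:
ipvsadm -e -t 192.168.1.100:80 -r 10.0.0.2:80 -w 3
```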

old clients will stay on the quiesced RS, which is exactly what I _don't_ want. If the server is down or weighted downwards it's fine to reassign clients one time. It's not fine to reassign them every time, but there's nothing wrong with reassigning them only once.

There are two things you're talking about now, and I assume we're always talking about the persistent case.

If a service on a RS is down, your user space core detection engine should take the template out; before that, make sure you have set /proc/sys/net/ipv4/vs/expire_nodest_conn.
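For reference, that sysctl can be enabled like this (on a kernel built with the LVS sysctls):

```shell
# Expire connections whose destination RS has been removed, instead of
# leaving clients pinned to a dead realserver.
echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn
```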

The second case is when you have soft persistency (I start liking that term) where you see that one RS is overloaded because of stupid proxy configurations such as AOL and where you would like to reassign sessions to new RS to equalize the load imbalance. I've sort of proposed something but this needs to be investigated. Also I am not sure if the source hash scheduler would help you. I haven't played with it in a while and the documentation is kind of sparse ;).

Wait a minute, what is being reassigned? Old connections are being
assigned according to the existing template to their appropriate RS.
And that's what I wanted to influence ;-)

I see.

For best performance and reliability it would be very nice if LVS could change the template when needed, but not on every HTTP request if there is no need for it. See what I'm heading at?

Absolutely. I got it further up. Thanks.

Yes. It's only a very small amount of people that stay longer than 15 minutes, but, disregarding the monitoring software from us and the web site owner, those few clients are the ones actually ordering products and hence the most valuable...

I see. Maybe you should have a separate 'gold client' pool with a dedicated RS. This can be achieved by generating an IP pool of your most valuable customers, fwmarking it, and load balancing on that fwmark.
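Such a fwmark setup could look roughly like this; the 'gold' network 203.0.113.0/24, the VIP 192.168.1.100 and the dedicated RS 10.0.0.9 are invented for the example:

```shell
# Mark packets from the valuable-customer pool that are headed for the VIP ...
iptables -t mangle -A PREROUTING -s 203.0.113.0/24 -d 192.168.1.100 \
         -p tcp --dport 80 -j MARK --set-mark 1

# ... and balance on that mark with a dedicated, persistent virtual service.
ipvsadm -A -f 1 -s wlc -p 900
ipvsadm -a -f 1 -r 10.0.0.9:80 -g -w 1
```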

I know that current LVS cannot do this :-)

:) Just checking ...

I was curious if it was feasible for future versions...

Maybe yes, let's see if other people have an opinion on this too once we all understand what it is about.

Same as the ASP session timeout, 15 minutes. The few people staying that long are really requesting multiple pages.

Ok, fair enough. One little thing that bothers me though is that you're talking about upgrading and problems with hardware a lot. I run several hundred boxes around the world and I hardly ever need to exchange anything. That's why it also almost never occurred to me to quiesce a RS to be able to perform upgrades. Software upgrades are of course another matter; there I understand your pain.

Hmm, the calculation seems to be right, although I'm unsure why you are dividing by 3 in the end. You want the bandwidth usage per realserver? I took

Yes. Actually you should even divide it by 8-10 if you're using LVS-DR, because this will then be the request stream rate hitting the load balancer, if one assumes that the request/reply ratio is 1/8 in bytes. I used the three because you seem to have a Gaussian distribution, and taking 1/3 of the peak and rounding up/down just pops out this divisor.
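As a sanity check on those divisors, taking a hypothetical 2.5 Mbit/s cluster-wide peak of reply traffic, a 1/3-of-peak daytime average, and the assumed 1/8 request/reply byte ratio for LVS-DR:

```shell
# Rough arithmetic only; all numbers are illustrative assumptions.
awk 'BEGIN {
  peak = 2.5                                          # Mbit/s, cluster peak
  printf "average load  : ~%.2f Mbit/s\n", peak / 3   # 1/3-of-peak rule
  printf "DR request in : ~%.2f Mbit/s\n", peak / 8   # 1/8 request/reply
}'
```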

a quick look at the bandwidth usage graphs for our leased line and it looks like we have close to 0 MBit/s during the night and well over 2.5 MBit/s during the evenings. The 1.2-1.5 is reasonable for the rest of the day. All

Fine. So we're talking about a site with very low bandwidth constraints. I just checked one of our customers' sites and they have between 4 and 13 Mbit/s.

measured for the entire cluster, not for single realservers.

Yes.

It's not exactly what you'd call 'high volume' though, indeed.

Exactly. Do you actually have numbers about the mean packet size? This would be very interesting. It's the first step in optimizing your DB :).

Reasonable is a better word here, but there's not too much room for improvement without partitioning the database or playing other advanced tricks. Either way it's a tradeoff of budget vs performance and as long as

DB tuning always needs advanced tricks and most of the time you need a Russian guy to do this :)

the site still performs well enough management wants to delay buying a bigger database. And frankly, the database can still handle the load, it's only that there's no growth space available.

And that is a risky business, but I'm sure you know that already.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc


