
Re: 'Preference' instead 'persistence'?

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: 'Preference' instead 'persistence'?
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Wed, 09 Oct 2002 22:55:15 +0200
Hello,

Well, I'm only concerned about two cases anyway: 1. I manually set the weight to 0 because of maintenance work, and 2. the machine goes down because of problems. Upgrading LVS to the new version with the sysctl setting covers #2, so I hope my boss assigns some time on my TODO list to upgrade LVS.

You can prepare everything, and when the next software upgrade is due, you simply upgrade a bit more ;).
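
For the archives: I presume the sysctl in question is expire_quiescent_template; a minimal sketch of enabling it would be (an assumption on my part, check your ipvs version):

  # Assumption: the sysctl meant above is expire_quiescent_template.
  # When set, persistence templates pointing at a quiesced (weight 0)
  # RS are expired, so stuck clients get rescheduled elsewhere.
  echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template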

Which leaves me with option 1. Ideally there's no downtime because of maintenance, but the reality is that you often can't apply service packs and hotfixes to the realservers without bringing them down (the realservers run win2k, which is unfortunate from a sysadmin point of view, but implied by a rather large ASP codebase).

I don't understand why. Let's assume you want to apply SP9239834.233a-1 to your RS. Since you have multiple RS doing the work for you, you can simply quiesce one, wait half an hour, and then perform the upgrade with SP9239834.233a-1. Then you put it back in with the old weight and play the same game with the next RS. No downtime, no customer problems.
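
As a rough sketch with ipvsadm (VIP, RS address and weights are made up here, adjust them to your actual service):

  VIP=10.0.0.1:80             # hypothetical virtual service
  RS=192.168.1.11:80          # hypothetical realserver

  ipvsadm -e -t $VIP -r $RS -w 0      # quiesce: no new clients land here
  sleep 1800                          # let the persistent sessions drain
  # ... apply SP9239834.233a-1 on the RS ...
  ipvsadm -e -t $VIP -r $RS -w 100    # put it back in with the old weight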

That's what our developers tell me. They claim that the IIS default handler (which keeps session state in RAM and only in RAM) is by far the best-performing option, and that using custom session save handlers degrades performance a lot. I don't have the ASP/IIS knowledge to question that, but it sounds reasonable enough to me.

Aha, does it now. Did they do any reasonable performance tests with different frameworks? Apart from that, this is only feasible if you have a very low-traffic web site, because otherwise the memory consumption will kill you or you will end up with memory thrashing.

(Currently we don't use shared storage of session data at all, because of the potential performance problems, so we stick with pure persistency. But I personally dislike this choice and would rather prefer to move on to something more professional and reliable.)

Maybe the approach for your site is the right one. I can't tell you, and I guess your programmers are doing a good enough job of exploiting the best techniques applicable to solving these issues.

It's a bit complex to reassign parts of a client IP, but not the whole IP, but if you notice that, say, IP 1.2.3.4 causes a lot of traffic on RS1, why not reassign all _other_ IPs that are currently using RS1 to the other

Yes, you could do that too.

realservers? It sounds to me like you only need to track the amount of activity per IP, base the weighting on that, and reassign the RS starting with the _least_ active IPs, since those are best used for balancing.

But then it would take longer to get the proper balance again. What about a policy selector where you can simply choose which approach you want to take?

But again, it's not imbalance during normal business that really worries me, as it has never hit me until now. It's the imbalance penalty when doing maintenance to the site that annoys me most.

Hmm, I never had imbalance when doing maintenance work because the load simply distributes on the remaining RS.

At least you don't need medical care for lack of humour ;-)

Thanks.

You made it a bit more complex than what I thought about, but overengineering is what every programmer does, no? ;-)

I'd call it careful design with possible extensibility in the future, but yes, overengineering is not a bad expression either ... I guess.

I was thinking along the lines of

- We monitor each client IP for activity

Who is 'we', and what is the monitoring metric?

- If the realserver is imbalanced by more than a few percent, or if the target
  weight is 0, we start reassigning each client IP to a new realserver, by
  modifying the template, starting on the first _new_ socket connection.

I don't know if this is a good idea. Try to imagine the worst-case situation. You have 3 RS. AOL connects to RS1, and some other proxy connections come from a provider in the UK and are stuck to RS3. Now we of course have a load imbalance which will soon show up as more than a few percent. And the schedulers (because they run in parallel and asynchronously) will both hand clients over to RS2, which in turn will be the overloaded one. Trust me, the Internet is so dynamic that reacting with latency to a network anomaly that causes load imbalance, and trying to equalize it, will result in a chaotic oscillation of the network load distribution.

- If the target weight for the realserver is 0 we always reassign

Already done, yes.

- If the realserver is only seriously imbalanced, but the weight is nonzero, we
  reassign only if a given client's activity is smaller than the average
  activity per client IP for this realserver (or something similar), thereby
  balancing the cluster using the single hosts and leaving the big NAT-ed
  networks alone if possible. Even if multiple NAT ranges end up
  on a single realserver, the disappearing smaller IPs make the average
  activity higher for each subsequent reassign run, so in the end even the
  NAT blocks will be reassigned if needed, but only if really needed.

I think you should have more patience. The Internet is slow and things will equalize. You cannot assume to reach load equilibrium within the first half day after an AOL burst, especially when the complexity and dynamics of the site vary. Your approach simply sharpens the bursts and tries to settle to a mean earlier, but I'm not so sure this will work.

I hope you can run fast ;-)

No problem with that.

I'm inclined to tell you not to use persistency and upgrade your DB ;)
Tell the people deciding over the money and I'm all for it ;-)

Give me the phone number of your boss or project manager and I will talk with him about it.

(amongst which the website's very company itself...) got connected to the
broken server. I had to turn on quiescence in ldirectord.

^^^^^^^^^^^^^^^ Make that "off", not "on". Oops.

Ok.

What ldirectord does if it detects a realserver failure is set the weight to 0 if quiescence is turned on. That's nice for transitional errors and/or for non-persistent connections, but when the connections are persistent it simply means clients are never redirected to another RS at all until the persistence timeout expires. Needless to say, that's unwanted behaviour :-)

Needless to say, it is completely broken behaviour. If a service is not available anymore you _mustn't_ set the weight to 0. Never, ever; it's a big nononononono. Take the service template out and put it back in when the healthcheck says so. There are only two cases where you need weight 0.

a) You want to do maintenance work and instead of pissing off your
   potential customers by killing their sessions you quiesce the RS
   until the template timeout expires.
b) You use the per RS threshold limitation patch that will put a RS
   into 'quiesced/cripple' mode until the amount of sessions is below
   the lower threshold.

And you're completely sure that ldirectord shows this behaviour when using persistency and the RS goes down (and the quiesce option is on)?

Turning the quiescence option off avoids this problem btw.

I'm stunned.

If a service on a RS is down, your user space core detection engine should
take the template out and before that make sure you have set
/proc/sys/net/ipv4/vs/expire_nodest_conn.


ldirectord with quiescence doesn't do that by default, but that's indeed what I configured it to do.

It's none of the quiesce functionality's business. If a service on a RS is not available anymore, take it out. End of story. No hard feelings about it, just rip it out of the connection template because that RS ain't gonna give you any warm feelings anyway ;).
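
In shell terms, your healthcheck script would do something along these lines when the service stops answering (made-up addresses again):

  # Flush connection entries pointing at the dead RS instead of letting
  # clients hang until the entries time out.
  echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn

  # Rip the failed RS out of the virtual service; re-add it with
  # 'ipvsadm -a' once the healthcheck succeeds again.
  ipvsadm -d -t 10.0.0.1:80 -r 192.168.1.11:80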

Indeed :-) I read the docs as well, and it seems to me it doesn't do what I need. But as you correctly state, the docs are a bit sparse...

Written by a genius, you know how this is ...

Ever had to admin win2k realservers? ;-)

Sure, and I love it. I mean, come on, read my lips: I'm a born Windows administrator! Ok, seriously, here's what I do when a customer wants W2K as RS:

1.) get a decent box with lots of RAM (1-2GB)
2.) install Linux on it (doesn't matter which one)
3.) Now: Install vmware on it!
4.) install W2K into the vmware
5.) setup bridged networking and a terminal service access
6.) give the customer the login and passwd and tell him the IP address
7.) if W2K stalls in a way the customer can't do anything about it:
    pkill -9 vmware
8.) if the customer wants backup:
    dd if=/var/vmware/nt.disk of=/nfs/backup/bck-$(date +%F)-cust
9.) go into the pub to have a few beers because I don't need to spend
    time with support
10.) sleep well because I know the backup is there, we can easily powercycle
     W2K, and because of the amount of beer.

Note to our customers: All those 10 points are not true, it's just a fairy tale. It would never work that way. [/me runs again like hell]

Besides, most of the maintenance downtime comes from code updates, because the sites are still evolving. And copying new code over from the beta to the live site is not something you do on an active RS...

Ok, I'm not so sure about your business, but in 4 years of doing e-commerce projects I've seen some pretty funny stories and failures, and one thing I learned was: make an exact copy of your product framework in a pilot network setup. Have it in-house and do your software tests and SP upgrades on the exact same setup in-house. How can you make sure that a new SP doesn't all of a sudden disable the MS loopback adapter or move it to a different place in the registry? How can you make sure that the session ID fetching from disk to RAM still works? I mean, we're talking about big applications here, but even if yours is not so big, you might convince your boss to spend a few bucks on a decent pilot setup.

Not really, as RS1 has half the weight of RS2 and RS3 (the backoffice and some other stuff runs on RS1, so that machine is loaded enough without LVS activity :-)

So RS1 to RS3 are not providing the same load-balanced service? May I ask you to share your 'ipvsadm -n -L' output with us, please?
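
Something along these lines is what I mean (purely made-up addresses, weights and counters, just to show the format):

  ipvsadm -L -n
  IP Virtual Server version 1.0.x (size=4096)
  Prot LocalAddress:Port Scheduler Flags
    -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
  TCP  10.0.0.1:80 wrr persistent 1800
    -> 192.168.1.11:80              Masq    50     0          0
    -> 192.168.1.12:80              Masq    100    0          0
    -> 192.168.1.13:80              Masq    100    0          0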

Fine. So we're talking about a site with very low bandwidth constraints. I
just checked one of our customers' sites and they have between 4 and
13 Mbit/s.

I could only dream about adminning such boxes :-)

Well, there is not a lot to do there, you know ... Linux and things like that :).

Then again, for a first job after graduation it's not bad at all. This setup
                                    ^^^^^^^^^^
                            still working on that one

is more than ambitious enough before I think I know how all of it works...

Absolutely.

Hmm, actually I have no idea how to measure that. Will ask the database admin if he can come up with something.

Very good.

DB tuning always needs advanced tricks and most of the time you need a
Russian guy to do this :)
We have a Russian developer, would that qualify? :-P

See, my work experience with Russians tells me that there are (besides thousands of other nice things) 3 things they produce for sure:

o vodka in all flavours and colours
o excellent mathematicians (Hello NSA, do you copy?)
o fully fledged (Oracle) DB admins with in-depth Delphi knowledge

I haven't found the conjunction of those three items yet.

Yes, I am well aware of this and so is the rest of the sysadmin team. But we don't do the budgets :(

I understand. Good luck anyway.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


