On Wednesday 09 October 2002 22:55, Roberto Nibali wrote:
> I don't understand why? Let's assume you want to apply SP9239834.233a-1
> to your RS. Now since you have multiple RS doing the work for you, you
> can simply quiesce one and wait half an hour and then perform the
> upgrade with SP9239834.233a-1. Then you put it back in with the old
> weight. Then play the game with the next RS. No downtime, no customer
> problems.
Sure, but that takes a bit of time for us. I *hoped* it would be possible to
get that waiting time down to a few minutes at most, but it seems it's too
difficult or not worth it...
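For the archives (the addresses, port and weights below are made-up
placeholders, not our actual config), that quiesce/upgrade/restore cycle
boils down to roughly:

    # quiesce RS2: with weight 0 no new clients get assigned, existing
    # sessions and persistence templates simply drain out
    ipvsadm -e -t 192.168.1.100:80 -r 192.168.1.2:80 -g -w 0

    # ...wait for the connections to drain, apply the SP, reboot...

    # put it back in with the old weight
    ipvsadm -e -t 192.168.1.100:80 -r 192.168.1.2:80 -g -w 2

It's the waiting in the middle that I was hoping to shorten.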
> Hmm, I never had imbalance when doing maintenance work because the load
> simply distributes on the remaining RS.
Not with persistency, unless you wait for quite a while. You won't get new
clients, but it will take a while before existing clients disappear.
> I don't know if this is a good idea. Try to imagine the worst case
> situation. You have 3 RS. AOL connects to RS1, some other proxy
> connections come from a provider in the UK and are stuck to RS3. Now we
> of course have a load imbalance which will soon show up as a difference
> of a few percent. And the schedulers (because they run in parallel and
> asynchronously) will both give away clients to RS2, which in turn will
> become the overloaded one. Trust me, the Internet is so dynamic that the
> latency of any reaction that tries to equalize a load imbalance caused
> by a network anomaly will result in a chaotic oscillation of the network
> load distribution.
Point taken. Hmm, maybe the idea is not such a good one after all. With every
line further down your mail I'm doubting it more and more...
> > - If the target weight for the realserver is 0 we always reassign
>
> Already done, yes.
With persistency? With persistency no reassigning occurs AFAICS.
> I think you should have more patience. The Internet is slow and things
> will equalize. You cannot assume to have load equilibrium within the
> first half day after an AOL burst, especially when the complexity and
> dynamics of the site vary. Your approach simply sharpens the bursts and
> tries to modulate towards a mean earlier, but I'm not so sure this will
> work.
Again, point taken. This is indeed rather tricky business now that I think
about it again. For offloading a quiesced server this approach should work
fairly well, but for equalizing it's dangerous.
> Needless to say, that is completely broken behaviour. If a service is
> not available anymore you _mustn't_ set the weight to 0. Never, ever,
> it's a bit nononononono. Take the service template out and put it back
> in when the healthcheck says so. There are only two cases where you need
> weight 0.
>
> a) You want to do maintenance work and instead of pissing off your
> potential customers by killing their sessions you quiesce the RS
> until the template timeout expires.
> b) You use the per RS threshold limitation patch that will put a RS
> into 'quiesced/cripple' mode until the amount of sessions is below
> the lower threshold.
>
> And you're completely sure that ldirectord does show this behaviour when
> using persistency and the RS goes down (and quiesce option is on)?
Yes. I found out the hard way. The earlier-mentioned motherboard problem
took down an RS and the customer was assigned to exactly that RS (Murphy's
Law, I guess ;-). Anyway, the phone rang pretty quickly and it took me a
while to figure out that the ldirectord upgrade caused this. I did notice
the new option before upgrading, but I figured it would actually be useful
and my testing was apparently flaky, so I ended up with a horribly broken
quiescence option in a running live config... (see the config sketch below)
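For anyone else who gets bitten by this: if I read the ldirectord docs
right, the knob is the 'quiescent' directive. A minimal ldirectord.cf
sketch (addresses, check URL and timings are placeholders, not our live
config):

    checktimeout=10
    checkinterval=5
    # no:  a failed RS is removed from the IPVS table entirely
    # yes: a failed RS stays in with weight 0, which with persistency
    #      keeps gluing existing clients to a dead box
    quiescent=no

    virtual=192.168.1.100:80
            real=192.168.1.1:80 gate 1
            real=192.168.1.2:80 gate 2
            real=192.168.1.3:80 gate 2
            service=http
            request="alive.html"
            receive="OK"
            scheduler=wlc
            persistent=300
            protocol=tcp

With quiescent=no a failed healthcheck takes the real server out of the
table instead of leaving it in at weight 0; at least, that is how I
understand the option.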
> Note to our customers: All those 10 points are not true, it's just a
> fairy tale. It would never work that way. [/me runs again like hell]
You might be surprised how far you could get in trying this, but only for
testing. For a real-life situation I'd rather stay away from these practices
;-)
On a more serious note, using Win2k as a realserver is not too bad, but
requiring a reboot for most service packs and hotfixes makes it dreadfully
annoying at times.
> Ok, I'm not so sure about your business but in 4 years of doing
> e-commerce projects I've seen some pretty funny stories and failures and
> one thing that I learned was: Make an exact copy of your product
> framework in a pilot network setup. Have it in-house and do your
> software tests and SP upgrades on the exact same setup in-house. How can
> you make sure that a new SP doesn't all of a sudden disable the MS
> loopback adapter or move it to a different place in the registry? How
> can you make sure that the session ID fetching from disk to RAM still
> works? I mean, we're talking about big applications here but even if
> yours is not so big, you might convince your boss to spend a few bucks
> on a decent pilot setup.
The pilot is not 100% identical and I hadn't really thought about this yet,
but it does indeed make sense to bring them in line. I have plenty of other
tasks besides the LVS cluster though, so I'm afraid this has to wait.
> > Not really, as RS1 has half the weight of RS2 and RS3 (the backoffice and
> > some other stuff runs on RS1, so that machine is loaded enough without
> > LVS activity :-)
>
> So RS1 to RS3 are not providing the same load balanced service? May I
> ask you to share your 'ipvsadm -n -L' with us, please?
Same service, different weights.
RS2 == RS3, RS1 = RS2 / 2 (or RS3 / 2)
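In ipvsadm terms (placeholder addresses and port; the scheduler and the
persistence timeout are just examples too) that weighting would be set up
roughly like:

    ipvsadm -A -t 192.168.1.100:80 -s wlc -p 300
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.1:80 -g -w 1   # RS1, half weight
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.2:80 -g -w 2   # RS2
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.3:80 -g -w 2   # RS3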
> Well, there is not a lot to do there, you know ... Linux and things like
> that :).
Yeah, Linux boxes tend to pretty much administer themselves once you really
know how they work and have had the time to set them up decently. Our mail
and DNS servers hardly need any maintenance anymore. It's the LVS business
that's new and isn't as automated as it should be yet.
> See, my work experience with Russians tells me that there are (besides
> thousands of other nice things) 3 things they produce for sure:
>
> o vodka in all flavours and colours
> o excellent mathematicians (Hello NSA, do you copy?)
> o fully fledged (Oracle) DB admins with in-depth Delphi knowledge
>
> I haven't found the conjunction of those three items yet.
* rotfl *
Martijn