To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: memory use on long persistent connection (eg for e-commerce sites, squids)
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Fri, 20 Sep 2002 14:59:29 +0200
Hi,

> 2 on-topic questions at the bottom, honest!

;)

> It's not too bad: I use the NT2k built-in service checker, which restarts IIS on failure (normally takes about 20 seconds). Server reboots are very rare.

But from what I can see of your CISCO LD timing, this is not the effective downtime of the server. Even if IIS restarts within 20s, the LD will not forward requests to the server for another 40s, right?

BTW, are you using your CLD in NAT or triangulation mode? In NAT mode there is a possibility of up to 10% packet loss under certain circumstances (maybe they've fixed it since I tested it back in 2000).

> The current live site has a CISCO 416 LocalDirector in front of it that detects 8 failures (ACKs?) in a row, then takes the real server out of action for 1 minute+ (until it comes back online).

Ugh, what about network congestion?

> Sometimes, however, IIS crashes but still manages to respond to HTTP GET requests (CISCO say this is a bug in IIS), and then the CISCO can't detect the real server failure...

Cisco has to give you that answer because they can't fix it, and the reason is very simple: the CLD is a hardware load balancer and thus can only be equipped with very simplistic healthchecks such as ping, TCP connect, HTTP GET, RADIUS, POP/IMAP and a few other protocols/services. But every 'bigger' site has content-specific data that is most of the time created dynamically via DB calls and whatnot. Now it can happen that IIS crashes but HTTP GET still works, either because the page is in IIS's cache handler (which didn't crash) or because that page is somehow not affected by the crash.

For such cases you need sophisticated healthchecks, which can be quite lengthy and complex in nature. But they assure that everything you want to run does run, by doing the appropriate test. Look at it as an automated QA test that verifies that no one (not even the process itself) has changed the specifications.
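A minimal sketch of such a content-specific healthcheck, assuming a hypothetical /health.asp page that renders a known marker token only when the application logic and its DB backend actually work (URL, token and addresses here are illustrative, not from any real setup):

```python
#!/usr/bin/env python3
"""Content-specific healthcheck sketch (hypothetical page and token).

A server whose web process still answers plain HTTP GETs but whose
application has crashed will fail this check, because the expected
marker string is only rendered when the dynamic code path works.
"""
import urllib.request


def check_real_server(url, expected_token, timeout=5):
    """Return True only if the page loads AND contains the token."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False                  # connect/read failure -> down
    return expected_token in body     # page up but wrong content -> down
```

A monitoring script (or an ldirectord-style wrapper) would call something like check_real_server("http://192.168.1.10/health.asp", "DB-OK") for each RS and remove servers for which it returns False.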

As you can imagine, the space and complexity of what you can implement in ASICs/FPGAs is limited, and of course you can't implement a sophisticated healthcheck script in hardware. CISCO can't do it for you, and you can't do it either. You're stuck with the existing healthchecks (which are accurate enough most of the time).

A typical equivalent situation is Oracle. Oracle manages to crash internally but still somehow delivers SNMP or other relevant data. Now if you only have a healthcheck that checks for the correct SNMP values, and maybe a SQL manager port connect and a successful user/passwd login, you might not recognize that Oracle has crashed because of a memory segment violation somewhere inside its own wicked world.

That is why the LVS project is such a nice approach from the design point of view. You're not restricted by any hardware issues (unless you need Gigabit Ethernet and actually have that amount of traffic), and you can write your healthcheck in whatever language pleases you.

> That's why I have LVS / ldirectord under test at the moment, so that I can force it to check for a specific page and test the result.

Exactly.

> 1) The only problem I have so far is that all the solutions I've tried for a non-ARP interface on the real servers seem to knock out Windows file sharing (SMB/CIFS), which I use for ROBOCOPY replication of files...

So much for Microsoft's own way of defining how things should work, starting from layer 4. No, let's get serious and keep the rant out: I can very well imagine that this happens. Maybe you have to spend a few bucks, put an additional NIC into your RS, and do the ROBOCOPY over those dedicated interfaces. How does that sound?

> 2) Possible feature request?
> The CISCO has a slow-start option, i.e. it brings the real server back online slowly in order not to overload it...

Ok.

> Without this option our real servers will sometimes continuously crash as soon as they are brought online, because the load is too high...

I've seen this on HP boxes running Netscape servers as well.

> I think it's because IIS caches the script the first time it runs, and this takes about 8 seconds; if a rush of people ask for a page while the script is compiling, then IIS dies.

Yes.

> Could this be made an option in LVS?

Well, there are two ways to achieve it, both of which have basically been implemented already or are available by design default:

o threshold limitation (I've made a patch for the 2.2.x kernel series,
  and for the 2.5.x kernels it is already in):
  - You would need to dynamically raise the RS's upper threshold in
    fixed timeslices. This will do the job. Once the server seems to
    be stable, you can remove the RS threshold limitation for it. You
    have to write a script that does this, but it is not very
    difficult. And actually this is a great idea. I've only done this
    for a shutdown procedure, but I think I will do it for a startup
    sequence too, using something like a TCP slow-start algorithm,
    only without window notification feedback :)

o Use QoS as an egress policy on the outgoing interface of the load
  balancer to rate-limit the incoming (actually outgoing) requests to
  the RS. This limit also has to be raised in fixed timeslices; a
  script similar to the one above would result.
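The first variant above can be sketched as a small script that ramps the real server up in fixed timeslices, doubling each step like TCP slow start (without any feedback). This sketch only prints the ipvsadm invocations it would run; the VIP, RIP and port are hypothetical, and a real script would execute each command and then sleep for one timeslice:

```python
#!/usr/bin/env python3
"""Slow-start sketch for bringing a real server back online.

Computes a weight schedule 1, 2, 4, ... up to the final weight and
prints the corresponding ipvsadm edit command for each timeslice,
instead of executing anything.
"""


def slow_start_weights(final_weight, start=1):
    """Yield the weight for each timeslice: start, 2*start, ..., final."""
    w = start
    while w < final_weight:
        yield w
        w *= 2
    yield final_weight


def ramp_commands(vip, rip, final_weight):
    """Return one 'ipvsadm -e' (edit real server) command per timeslice."""
    return [
        f"ipvsadm -e -t {vip}:80 -r {rip}:80 -m -w {w}"
        for w in slow_start_weights(final_weight)
    ]


# Hypothetical VIP/RIP; print the schedule instead of running it.
for cmd in ramp_commands("10.0.0.1", "192.168.1.10", 100):
    print(cmd)
```

The same loop structure would work for raising a per-RS connection threshold instead of the weight; only the printed command changes.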

> I know you could probably fake it using WLC and a mod to ldirectord, but that doesn't sound like the right way to do it.

Exactly. The application level is not the right layer to do this.

> BTW, did I say thanks for a great product?

Which product are you referring to?

Best regards and I hope this helps,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


