abhijeet wrote:
>
> Well I did think a lot about supersparrow.
> Though it does attempt geographic load balancing,
> it suffers from the drawback of people not being ready to
> share routing information (according to Joseph Mack).
Maybe, maybe not. All things are difficult if no-one is doing them
already, and it may be that in a few years, when/if the supersparrow
approach is more widespread, the people who control the routers will
be forced/be delighted to give others access to this information.
Whether this happens will be more a matter of politics, people looking
out for job security, and money than anything technical, and you'd
need to be more of a Solomon than I am to say whether it will happen.
More importantly, do you have access to some BGP router
and can you put some machines at geographically different
sites to test the code you produce?
> Secondly, I believe that network latency + server load is the real
> problem. Even though a server is closer, it might be loaded and thus
> might take longer to reply to a request than a server that's farther
> away. (Please correct me if I am wrong.)
We don't have much good data on this, although plenty of it must exist
somewhere. It seems that LVS realservers either have well-balanced
loads, or else, when serving a population with a large proportion of
clients coming through a single proxy (e.g. AOL), one realserver gets
all the AOL clients. There isn't much that any code you produce can do
to fix that. So I would expect that in most cases the realservers will
be well balanced.
> And what we aim for is to see that a request gets handled as soon as
> possible. (Again, please correct me.)
fine. Are your realservers in the same place or geographically distributed?
> Therefore what one really needs to do is to design a metric based
> on network latency + server load which would be used for load balancing.
>
> I found the following suggestion very interesting:
>
> > Wouldn't it be nice if you had hierarchical load balancing - if a
> > server's weight didn't only include itself, but also all servers
> > behind it. (Think "tree" and "aggregated weight" in complex server
> > topologies)
> >
> > And if the node value you computed did not only take the pure
> > machine load into account, but also network latency to the remote
> > server, so it would scale nicely to distributed environments.
> >
>
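(As an aside, the aggregated-weight idea is easy to picture as a toy
calculation - each node reporting its own weight plus its children's,
discounted by the latency needed to reach them. Nothing like this
exists in LVS today; the class, names and latency discount below are
made up purely to illustrate the idea.)

# Toy sketch of "aggregated weight" in a tree of directors/realservers.
class Node:
    def __init__(self, name, own_weight, latency_ms=0.0, children=None):
        self.name = name
        self.own_weight = own_weight    # capacity of this node itself
        self.latency_ms = latency_ms    # latency from the parent to this node
        self.children = children or []  # nodes "behind" this one

    def aggregated_weight(self):
        # own weight plus the children's, each child discounted by the
        # latency needed to reach it
        total = float(self.own_weight)
        for child in self.children:
            total += child.aggregated_weight() / (1.0 + child.latency_ms / 100.0)
        return total

# e.g. a lower-tier director with two realservers behind it
rs1 = Node("rs1", own_weight=10, latency_ms=1)
rs2 = Node("rs2", own_weight=5, latency_ms=1)
tier2 = Node("director-2", own_weight=0, latency_ms=40, children=[rs1, rs2])
print(tier2.aggregated_weight())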
> What I suggest is that latency be computed using *pseudo requests*
> to each lower-tier LVS from a *master* director (which would be
> required in this architecture) and measuring their response time.
>
> > A side aspect of this would be to roll out the uptodate configuration
> > seamlessly to all nodes from a master node.
>
> Another side aspect would be that the master director would be a critical
> point in the architecture in terms of failure.
> This can be overcome either:
> * by using a redundant master director
> * and/or configuring the system such that, in case of
> such a failure, the LVSs function independently.
the dead one is usually failed out.
> How do you people feel about this idea? (pseudo requests being the
> central point)
> Has anybody done it before?
do you have a geographically distributed system which is showing the
problems you are hoping to correct? You need such a system whose
problems are measurable in terms that can be clearly appreciated by
the people who own/run/use it (usually pain/aggravation/$/failed
connections... rather than ms of latency), and you need to be able to
show that you reduced them.
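Mechanically, the latency half is at least easy to prototype in user
space: a timed pseudo request from the would-be master director to
each lower-tier site. The sketch below is only an illustration - the
URLs and the one-second timeout are placeholders, not anything that
exists in LVS.

# Time a small HTTP GET against each lower-tier director and use the
# elapsed wall-clock time as the latency part of the metric.
import time
import urllib.request

LOWER_TIER = {
    "site-a": "http://lvs-a.example.com/probe",
    "site-b": "http://lvs-b.example.com/probe",
}

def probe_latency(url, timeout=1.0):
    # return response time in ms, or None if the site didn't answer in time
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            reply.read(1)          # just touch the reply
    except OSError:
        return None
    return (time.monotonic() - start) * 1000.0

for site, url in LOWER_TIER.items():
    print(site, probe_latency(url))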
Your approach has two components
o controlling the system by some parameter
o the parameter itself being network latency (or whatever you decide)
You should separate these two (in case someone later decides that
some other parameter(s) should control the system) and implement them
so that they fit in with the current LVS code, so that there is
some hope that people will use them. To that end you should contact
Lars, who gave you the original suggestion quoted here, and the other
people who have code which controls LVS (e.g. ldirectord) to make sure
your code will fit in with theirs.
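One way to keep the two separate is to make the metric a pluggable
function and keep the LVS-facing side to nothing more than weight
updates, e.g. via ipvsadm. In the sketch below the VIP, realserver
addresses and the metric itself are placeholders; only the
ipvsadm -e/-t/-r/-w usage (edit a realserver entry, set its weight)
is real.

# Keep "how the system is controlled" (weight updates via ipvsadm)
# separate from "what parameter controls it" (any metric you like).
import subprocess

VIP = "192.168.0.100:80"
REALSERVERS = ["10.0.0.1:80", "10.0.0.2:80"]

def metric(rs):
    # return a weight for this realserver; swap in latency, load,
    # latency+load, or whatever is decided later - the rest of the
    # code doesn't care
    return 1                      # placeholder

def apply_weights():
    for rs in REALSERVERS:
        w = max(0, int(metric(rs)))
        subprocess.run(
            ["ipvsadm", "-e", "-t", VIP, "-r", rs, "-w", str(w)],
            check=True,
        )

if __name__ == "__main__":
    apply_weights()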
Remember that once you have your metric, it must be available at the
moment the client connects, while the director is running in kernel
mode. That in-kernel decision has to complete within the time of the
context switch and can't wait long for replies back from the
realservers as to their load. Horms has the client wait for the BGP
info from the router, and he gets it back quickly enough. You don't
have time to go off and query the realservers for their disk latency
etc. The realserver will have to poll itself and have a number ready,
updated every second say, for your code to request.
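One way to arrange that is a tiny reporter on each realserver that
recomputes its load number in the background, say once a second, and
answers a query instantly from the cached value, so the asking side
never waits on the measurement itself. A sketch along those lines
(the UDP port, the use of the 1-minute load average from /proc/loadavg
and the one-second interval are all just assumptions):

# Recompute a load number once a second in the background; answer UDP
# queries from the cached value so the asking side never waits.
import socket
import threading
import time

PORT = 3334
current = b"0.00"

def updater():
    global current
    while True:
        with open("/proc/loadavg") as f:
            current = f.read().split()[0].encode()   # 1-minute load average
        time.sleep(1)

def main():
    threading.Thread(target=updater, daemon=True).start()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        _, addr = sock.recvfrom(64)     # any datagram counts as a query
        sock.sendto(current, addr)      # reply instantly with cached number

if __name__ == "__main__":
    main()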
Joe
--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph@xxxxxxx ph# 919-541-0007, RTP, NC, USA