Re: Release new code: Scheduler for distributed caching

To: "Matthew S. Crocker" <matthew@xxxxxxxxxxx>, lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Release new code: Scheduler for distributed caching
From: Joe Cooper <joe@xxxxxxxxxxxxx>
Date: Fri, 27 Oct 2000 04:41:32 -0500
An API for detecting load within the real server and reporting it to the
director is a good idea.  The only problem is that it isn't quite as
simple a solution as it sounds.

In the case of Squid (our current topic for discussion), it's very
difficult to ascertain what the load is like.  Yes, you can monitor the
CPU usage, but a cache can be running at 0% idle and still effectively
serve requests, as long as the disks are keeping up.  In fact, most of
our boxes run with only about 5-10% idle CPU _all_ the time, even when
not under very heavy load.  It is a quirk of the current async i/o
implementation.  All the disk threads eat a lot of CPU even when not
being pushed hard.

Response time is also problematic, because the origin servers--which are
completely outside Squid's control--affect the overall average response
time of the proxy.  How do we know whether a slowdown and an increase in
open connections are due to a heavily loaded Squid, a heavily loaded
origin server, a heavily loaded internet pipe (which is presumably
shared across all caches), or a heavily loaded/slow pipe to the clients
(like a dialup line that Squid has no control over)?

Monitoring disk load is much more difficult still, but is probably the
best way to decide how loaded a cache is.  But the scale must be
adjusted, by hand, for every cache.  It's impossible to know, without
some benchmarking, how a certain combination of disks will perform with
a certain version of Squid (performance varies wildly between Squid
versions and between compile-time options) on a certain processor.
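
To make the "scale adjusted by hand" point concrete, here is a minimal
Python sketch.  Everything in it is an assumption for illustration: the
cache names, the ops/sec metric, and the ceiling figures are made up,
and the ceilings stand in for whatever each box's benchmark showed.  It
maps a raw disk figure onto the 0-100 rating Matt suggested.

```python
# Hypothetical per-cache ceilings (disk ops/sec), found by benchmarking
# each box by hand.  The names and numbers are illustrative only.
BENCHMARKED_MAX_OPS = {"cache1": 120.0, "cache2": 300.0}


def disk_load_rating(cache, observed_ops_per_s):
    """Scale a raw disk figure to 0-100 using that cache's own ceiling."""
    ceiling = BENCHMARKED_MAX_OPS[cache]
    # Clamp at 100 so a box pushed past its benchmark still reports "full".
    return min(100, round(100 * observed_ops_per_s / ceiling))
```

The same raw number means different things on different boxes: 60
ops/sec is half load on the hypothetical cache1 but only a fifth on
cache2, which is why no single scale works for every cache.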

I suspect the easiest metric to obtain that would provide some level of
predictability is hit response time for a local artificial client, like
echoping or wget.  Run it on the director and measure how long a known
hit takes to come back from the cache.  If it climbs over, say, 4
seconds, readjust the hash table to lower its load a little.  If the
next check 30 seconds later comes back slow again, lower the load some
more.
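
A rough Python sketch of that probe loop, standard library only.  The 4
second threshold and 30 second interval are the figures above; the 10%
step, the 10% floor, and the apply_share hook (which stands in for
whatever actually readjusts the hash table) are assumptions for
illustration, not part of any existing scheduler.

```python
import time
import urllib.request

SLOW_THRESHOLD_S = 4.0  # "say, 4 seconds"
CHECK_INTERVAL_S = 30   # seconds between checks
STEP = 10               # assumed: shed 10 points of share per slow check
MIN_SHARE = 10          # assumed floor, so a cache is never cut off entirely


def time_known_hit(proxy_url, object_url, timeout=10.0):
    """Fetch object_url through the cache at proxy_url; return seconds taken."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url}))
    start = time.monotonic()
    try:
        with opener.open(object_url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        # Network errors and timeouts count as "slow".
        return float("inf")
    return time.monotonic() - start


def adjust_share(share, elapsed):
    """Return the cache's new share (percent) given one probe result."""
    if elapsed > SLOW_THRESHOLD_S:
        return max(MIN_SHARE, share - STEP)
    return share


def monitor(proxy_url, object_url, apply_share, share=100):
    """Probe forever; call apply_share(new_share) whenever the share drops."""
    while True:
        new_share = adjust_share(share, time_known_hit(proxy_url, object_url))
        if new_share != share:
            share = new_share
            apply_share(share)
        time.sleep(CHECK_INTERVAL_S)
```

This only ever lowers the share, which is all that is described above.
When and how fast to give load back once the cache recovers is a
separate decision.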

Just my .02...I can't think of any easier way to get a relatively good
idea of the current overload status of a web cache.  Too many factors
out of the control of the cache.

"Matthew S. Crocker" wrote:
> On Fri, 27 Oct 2000, Julian Anastasov wrote:
> Why can't we come up with an API the Real Servers can use to tell the
> ldirector their load so the ldirector can update its routing table?  It
> doesn't sound like a very difficult problem.  Just have the real servers
> give a rating from 0 to 100 (100 being fully loaded, 0 being no load). The
> ldirector can multiply that by the weighting and figure out its load.
> The Servers would update their load setting every x seconds.
> -Matt

                     Joe Cooper <joe@xxxxxxxxxxxxx>
                 Affordable Web Caching Proxy Appliances
