Re: Release new code: Scheduler for distributed caching

To: Joe Cooper <joe@xxxxxxxxxxxxx>
Subject: Re: Release new code: Scheduler for distributed caching
Cc: "Matthew S. Crocker" <matthew@xxxxxxxxxxx>, lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Fri, 27 Oct 2000 22:56:23 +0000 (GMT)

On Fri, 27 Oct 2000, Joe Cooper wrote:

> An API for detecting load within the real server and reporting it to the
> director is a good idea.  The only problem is that it isn't quite as
> simple a solution as it sounds.
> In the case of Squid (our current topic for discussion), it's very
> difficult to ascertain what the load is like.  Yes, you can monitor the
> CPU usage, but a cache can be running at 0% idle and still effectively
> serve requests, as long as the disks are keeping up.  In fact, most of
> our boxes run with only about 5-10% idle CPU _all_ the time, even when
> not under very heavy load.  It is a quirk of the current async i/o
> implementation.  All the disk threads eat a lot of CPU even when not
> being pushed hard.

        5-10% idle is a very dangerous area. One FastEthernet link
delivers about 11MB/sec, and a system with many SCSI disks can keep up
with that network speed, provided the server application can spread
the load across the disks. IMO, with the current hardware the memory
and the CPU are the only likely bottlenecks in a single server. But
10% idle is near the dark zone, and if you enter it with many processes
the box just dies. With one process the effect is only a delay in the
processing. Even the load balancing can only keep the load difference
between servers near this level, i.e. 5%.

> Response time is also problematic, because the origin servers--which are
> completely out of the control of the Squid--have an effect on the
> overall average response time of the proxy.  How do we know whether the
> slowdowns, and increase in open connections is due to a heavily loaded
> Squid, or heavily loaded origin server, or a heavily loaded internet
> pipe (which is presumably shared across all caches), or heavily
> loaded/slow pipe to the clients (like a dialup line that Squid has no
> control over)?
> Monitoring disk load is much more difficult still, but is probably the
> best way to decide how loaded a cache is.  But the scale must be
> adjusted, by hand, for every cache.  It's impossible to know, without
> some benchmarking, how a certain combination of disks will perform with
> a certain version of Squid (there are wild variances in the performance
> of different versions of squid with different compile time options), on
> a certain processor.
> I suspect the easiest metric to obtain that would provide some level of
> predictability is hit response time for a local artificial client, like
> echoping or wget.  Run it on the director and measure how long a known
> hit takes to come back from the cache.  If it climbs over, say 4
> seconds, readjust the hash table to lower its load a little.  If the
> next check 30 seconds later comes back slow again, lower the load some
> more.
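
The probe-and-adjust loop described in the quoted text can be sketched
roughly as follows. This is only an illustration under stated
assumptions, not code from the scheduler being discussed: the integer
"weight" stands in for the cache's share of the hash table, the names
next_weight, timed_probe and control_loop are hypothetical, and raising
the weight again after a fast probe is an added assumption (the message
only talks about lowering the load). The 4 second threshold and the 30
second interval are the figures from the message.

```python
import time
from typing import Callable

SLOW_THRESHOLD_S = 4.0   # "say 4 seconds" from the quoted message
CHECK_INTERVAL_S = 30.0  # "next check 30 seconds later"


def timed_probe(fetch: Callable[[], object]) -> float:
    """Time one fetch of an object that is known to be a cache hit."""
    start = time.monotonic()
    fetch()
    return time.monotonic() - start


def next_weight(weight: int, response_s: float, step: int = 1,
                min_weight: int = 1, max_weight: int = 10) -> int:
    """Lower the weight after a slow probe, otherwise let it recover.

    The recovery branch is an assumption; the message only describes
    lowering the load when the probe comes back slow.
    """
    if response_s > SLOW_THRESHOLD_S:
        return max(min_weight, weight - step)
    return min(max_weight, weight + step)


def control_loop(fetch: Callable[[], object],
                 apply_weight: Callable[[int], None],
                 weight: int, rounds: int,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Probe, adjust, publish the new weight, wait, and repeat."""
    for _ in range(rounds):
        weight = next_weight(weight, timed_probe(fetch))
        apply_weight(weight)  # hypothetical hook that updates the director
        sleep(CHECK_INTERVAL_S)
    return weight
```

On a director, fetch would wrap something like a wget or echoping run
against the cache, and apply_weight would be whatever mechanism the
scheduler offers for shrinking that cache's share. Both are left as
callables here because the thread does not specify them.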

        Yes, the expression with load parameters can be very complex.
But I, as a user, would prefer to control this expression myself,
i.e. run your tests with different expressions, tune the parameters
and try to define what "load" means for your systems. I don't like
systems that make the decisions with predefined parameters and give
the user no control over the load sharing. The good example is the WLC

> Just my .02...I can't think of any easier way to get a relatively good
> idea of the current overload status of a web cache.  Too many factors
> out of the control of the cache.

        I think many people can come up with different ideas about
controlling the load. We have to move all these good ideas to a
separate topic very soon. The experience from production clusters will
be the key to the success of this story.


Julian Anastasov <ja@xxxxxx>
