Re: Release new code: Scheduler for distributed caching

To: Wensong Zhang <wensong@xxxxxxxxxxxx>
Subject: Re: Release new code: Scheduler for distributed caching
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Thomas Proell <Thomas.Proell@xxxxxxxxxx>
Date: Thu, 2 Nov 2000 12:04:16 +0100 (MET)

On Thu, 2 Nov 2000, Wensong Zhang wrote:

> Have you thought about the algorithm that I posted? It aims for both

Yes. We both map IP addresses into an array using a hash function.
The main difference between our schedulers is what we write into
the array.

You write the address of the least-loaded cache, i.e. you associate
the client address with the least-connected cache.

In my implementation, the entry depends only on the result of
hashing the IP address.
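To make the difference concrete, here is a minimal sketch of my
approach in C -- not the released module's code; the hash function,
table size, and names are only illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_TABLE_SIZE 65536  /* same slot count as in my experiment */

/* Illustrative multiplicative hash of the client IP (host byte order). */
static unsigned int ip_hash(uint32_t ip)
{
    return (ip * 2654435761u) % CACHE_TABLE_SIZE;
}

/* Pure hash scheduling: the chosen cache depends only on the client IP,
 * never on current load, so every redirector picks the same cache for
 * the same client -- that is what preserves cache locality. */
static int hash_schedule(uint32_t client_ip, int num_caches)
{
    return ip_hash(client_ip) % num_caches;
}
```

The point is that the function is stateless: two redirectors running
it independently always agree on the target cache.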

At first, your idea seems to be the better one, because it shows
a much better load distribution. But your solution is unusable in
the environment of my scheduler.
There, you'll have a redirector in Paris, one in Cannes, and at
least two others. They are using the same caches. The client
population is huge, which is very important for the performance
of the caching system.
Assume the redirectors have been running for a week now. What will
happen when people request the same new page from all 4 redirectors?

Each redirector sends the request to the cache that is the least
loaded from its point of view.
So, the locality is very poor when using several redirectors.

The next problem is that you won't have "new" addresses after a
few weeks. When I used a 65536-slot hash table, half of the
slots were used after one day. Let it work with a large client
population and you'll soon have every slot != NULL.
So, new incoming requests are not sent to the least-loaded cache,
as the original idea intended, because they'll be mapped into a
hash-table slot that is already used by another address (or, as
I experienced, used by 5 other addresses). You'll have something
like static routing then.
Only if a cache is very loaded do you use another cache -- I
called this the "hot spot solution".
I'm only distributing the hot spots among the least-loaded caches;
you'll distribute everything.
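The hot spot solution can be sketched like this -- again only an
illustration, with a hypothetical overload threshold and a simple
per-cache connection counter standing in for the real bookkeeping:

```c
#include <assert.h>
#include <stdint.h>

#define OVERLOAD_THRESHOLD 100  /* hypothetical: when a cache counts as "hot" */

/* Illustrative hash, as in the pure scheme. */
static unsigned int ip_hash(uint32_t ip)
{
    return ip * 2654435761u;
}

/* Hot-spot scheduling: stick to the hashed cache for locality; only
 * when that cache is overloaded, fall back to the least-loaded one.
 * conns[i] is the current connection count of cache i. */
static int hotspot_schedule(uint32_t client_ip, const int *conns, int n)
{
    int primary = ip_hash(client_ip) % n;
    if (conns[primary] < OVERLOAD_THRESHOLD)
        return primary;                  /* normal case: keep locality */

    int least = 0;                       /* hot spot: spread by load */
    for (int i = 1; i < n; i++)
        if (conns[i] < conns[least])
            least = i;
    return least;
}
```

So the load-based choice only ever kicks in for the few hot spots;
everything else keeps its static, locality-friendly mapping.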

So, to still get "new" addresses even after months, you'll need
an aging function that treats hash-table entries that weren't used
for some time as "unused" = NULL.
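Such an aging function could look roughly like this -- a sketch
under assumed names, with a hypothetical one-hour expiry:

```c
#include <assert.h>
#include <time.h>

#define ENTRY_TTL 3600  /* hypothetical: seconds before a slot counts as unused */

struct slot {
    int cache;          /* assigned cache index, -1 = unused */
    time_t last_used;   /* last time this slot was hit */
};

/* Aging: a slot not hit for ENTRY_TTL seconds is treated as NULL again,
 * so its address range can be re-assigned to the least-loaded cache. */
static int slot_is_free(const struct slot *s, time_t now)
{
    return s->cache < 0 || (now - s->last_used) > ENTRY_TTL;
}
```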

So, I don't see any disadvantages in my scheduler (with the hot spot solution).

> 1. I don't think the algorithm is good. :)

I do :-)

> 2. The scheduler can only work for one service. Scheduler are usually
>    bound to several services.

I don't understand that very well. Can you explain?

> 3. Hash method is not good.

Like 1?

> 4. code is not neat. :)

THAT is true. But I won't waste energy on a project that people
don't want. So, I released it to start a discussion and to see
from the feedback whether it's worth improving. Moreover, I'll
need some help with some details.



Salut

Thomas
