On Sun, 2 Jul 2000, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 1 Jul 2000, Chetan Ahuja wrote:
> > What exactly constitutes "Load" on a real server and how often should it
> > be measured?
>
> Only the LVS user can decide what is "Load" and what
> is the preferred period to update its values. The different
> real services change different parameters in the real hosts.
> We have to allow the user to select which parameters to be
> monitored for each real service.
Yes, I agree that ultimately we would want to let the user decide as
many parameters of the scheduling algorithm as possible. (More about this later.)
> > If yes, what kind of things should be measured as "load". The astute
> > reader would shoot back immediately with stuff like CPU, memory and
> > network load. Let's treat them one by one:
>
> >
> > (All of the following is assuming that the realservers are running Linux.
> > At least I'm going to deal with only such cases for now)
> >
> > 1) CPU: How good is polling /proc/loadavg? My problem with that is the
> > load introduced by the measurement itself if polling is done too often.
>
> 5 seconds is not fatal.
I think sampling the load only once every 5 seconds is probably
not enough; we'd need more frequent sampling. This is where people
with real experience in running large LVS clusters come in. I would
really LOVE to hear from people who think that they are not entirely
satisfied with the current schedulers and would like a
load-informed scheduler. What kind of applications demand such a
scheme? Once we have this information, we could decide what
would be the best sampling period. (And of course, ideally we should
make it a runtime-configurable parameter, as I said before.)
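For concreteness, here is a rough user-space sketch (Python, purely for
illustration) of the kind of per-realserver sampler I have in mind. The
interval and the report_to_director() hook are placeholders -- no reporting
protocol is decided yet:

    # Hypothetical per-realserver load sampler; the reporting step is a
    # stand-in since the transport to the director is not defined here.
    import time

    SAMPLE_INTERVAL = 5.0   # seconds; should become runtime-configurable

    def read_loadavg():
        # /proc/loadavg looks like: "0.12 0.08 0.01 1/57 1234"
        with open("/proc/loadavg") as f:
            one, five, fifteen = f.read().split()[:3]
        return float(one), float(five), float(fifteen)

    while True:
        load1, load5, load15 = read_loadavg()
        print(load1)            # stand-in for reporting to the director
        time.sleep(SAMPLE_INTERVAL)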
> > Besides, how good is the info in loadavg? Doesn't it just
> > measure the number of processes in the queue every few milliseconds or
> > so to calculate the load. One could argue (and many people do argue)
> > that this is not a good metric of CPU load. Any ideas ??
>
> Yes, loadavg is not good for web and the other well
> known services. But the user still can run some real
> services that eat CPU. If the loadavg can be high the user
> can select it as load parameter.
So what would be a better way to get CPU load info? I would like
to use the /proc interface as much as possible and avoid special
kernel patches, but that's not an absolute requirement. I'm looking for
suggestions as to the best metric for CPU activity (averaged, say,
over the past one second). I'm thinking some combination of:
a) num. of passes through schedule()
b) num. of interrupts
c) num. of processes in the run queues
d) some count of how many processes used up their allotted time quantum
   without sleeping on I/O (might indicate CPU-intensive work as
   opposed to I/O-intensive work)
Comments please... (a rough sketch of sampling (a)-(c) follows below)
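To make (a)-(c) concrete, here is a rough sketch (Python, illustration
only) that approximates them from /proc counter deltas. It assumes the
kernel exposes "ctxt" and "intr" lines in /proc/stat and the
running/total field in /proc/loadavg; if not, the numbers would have to
come from somewhere else:

    import time

    def read_counters():
        vals = {}
        with open("/proc/stat") as f:
            for line in f:
                fields = line.split()
                if fields[0] in ("ctxt", "intr"):
                    vals[fields[0]] = int(fields[1])   # cumulative since boot
        with open("/proc/loadavg") as f:
            # fourth field is "running/total", e.g. "1/57"
            running = int(f.read().split()[3].split("/")[0])
        return vals, running

    prev, _ = read_counters()
    time.sleep(1.0)
    cur, running = read_counters()
    ctxt_per_sec = cur["ctxt"] - prev["ctxt"]   # ~ passes through schedule()
    intr_per_sec = cur["intr"] - prev["intr"]   # interrupt rate
    print(ctxt_per_sec, intr_per_sec, running)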
> > 2) Memory: We could just do what free command does (which is
> > just reading /proc/meminfo). Is that good enough. Anybody see any
> > pitfalls in that approach? Of course, polling /proc too often is again
> > a problem here. Besides that ??
>
> Yes, you can create many load parameters from
> /proc/meminfo. Even "Cached" and "Buffers" are useful. And
> sometimes it is faster to open and read from /proc than
> to read the parameters one by one by using other interfaces.
Seems like we have all the info we need for memory from /proc/meminfo.
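Something along these lines (Python again, just a sketch, and assuming
the "Key:  value kB" line format) would pull out the fields Julian
mentioned:

    def read_meminfo(keys=("MemTotal", "MemFree", "Buffers", "Cached")):
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                if key in keys:
                    info[key] = int(rest.split()[0])   # value in kB
        return info

    m = read_meminfo()
    # one possible "memory load" figure: fraction not easily reclaimable
    used = 1.0 - float(m["MemFree"] + m["Buffers"] + m["Cached"]) / m["MemTotal"]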
> > 3) Network: This is the hardest one. What would be a good metric of
> > network load... number of alive TCP connections?? Is that good
> > enough... I'm not deeply familiar with the kernel networking code. Could
> > somebody who is more familiar would throw some more light on this....
>
> You can try with /proc/net/dev. There are bytes and
> packets for each interface but the drawback is that they are
> sometimes zeroed and the interfaces sometimes disappear :)
Well, yes, the number of dropped packets might give us an indication that
the networking load is heavy (which is a good thing to know if it's
happening), but the numbers in /proc/net/dev are cumulative since
the interface was brought up and may have no relation to the
current situation (though we could do some simple math to extract the
numbers for the last second or whatever; see the sketch below).
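The delta math would look roughly like this (Python sketch; the column
layout is the standard /proc/net/dev one, and since counters can be
zeroed or interfaces can disappear, negative deltas are clamped):

    import time

    def read_netdev():
        stats = {}
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:        # skip the two header lines
                iface, data = line.split(":", 1)
                fields = data.split()
                # fields[0] = rx bytes, fields[8] = tx bytes
                stats[iface.strip()] = (int(fields[0]), int(fields[8]))
        return stats

    a = read_netdev()
    time.sleep(1.0)
    b = read_netdev()
    for iface in b:
        if iface in a:
            rx = max(0, b[iface][0] - a[iface][0])   # bytes/s received
            tx = max(0, b[iface][1] - a[iface][1])   # bytes/s sent
            print(iface, rx, tx)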
I was thinking more in terms of TCP connection overhead and
all the costs associated with that. Since the most likely
use of LVS is for web servers, proxies, etc., TCP load is probably
the most important issue here. At least that's what I've come up
with so far. I am really looking for comments from the "experts" on
this one...
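One crude TCP-side figure that needs no kernel patch would be the number
of sockets currently in the ESTABLISHED state, counted from /proc/net/tcp
(sketch only; the "st" column is the fourth field, and 01 means
ESTABLISHED):

    def count_established():
        count = 0
        with open("/proc/net/tcp") as f:
            for line in f.readlines()[1:]:        # skip the header line
                if line.split()[3] == "01":       # st == 01 -> ESTABLISHED
                    count += 1
        return count

Whether that (or something like connection setup rate instead) is the
right metric is exactly what I'd like to hear opinions on.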
> The list is a good place for such discussion :)
>
> More ideas:
>
> - use ioctls to add/delete LVS services/destinations
>
> - use all kinds of virtual services, forwarding methods and
> scheduling methods (configured from the user). IOW, all LVS
> features.
>
> - user space tool to manage the config file and the network
> interfaces/routes/settings. For example:
>
> <tool> start <domain> send gratuitous ARP, set ifaces, etc
> <tool> stop <domain> stop ifaces, etc
> <tool> secondary <domain> role: director -> real server
> <tool> primary <domain> role: real server -> director
>
> - call scripts to play with policy routing and other kernel
> settings, etc.
>
> - support for backup directors working as real servers
These are all nice TODO items, but I'm afraid I'll have just
enough time to focus only on load-informed scheduling for now.
Thanks for the considered reply...
Chetan