Re: What is "load"? Monitoring, load-informed scheduling and so on..

To: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: What is "load"? Monitoring, load-informed scheduling and so on..
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Chetan Ahuja <ahuja@xxxxxxxxxxxxxxxxx>
Date: Mon, 3 Jul 2000 01:40:53 -0400 (EDT)

On Sun, 2 Jul 2000, Julian Anastasov wrote:

> 
>       Hello,
> 
> On Sat, 1 Jul 2000, Chetan Ahuja wrote:

> >  What exactly constitutes "Load" on a real server and how often should it
> > be measured? 
> 
>       Only the LVS user can decide what is "Load" and what
> is  the preferred period to update its values. The different
> real services change different parameters in the real hosts.
> We  have to allow the user  to select which parameters to be
> monitored for each real service.
 
   Yes.. I agree that ultimately we would want to let the user decide as
many parameters of the scheduling algorithm as possible. (More about this later..)


> >    If yes, what kind of things should be measured as "load". The astute
> > reader would shoot back immediately with stuff like CPU, memory and
> > network load. Let's treat them one by one:
> 
> > 
> > (All of the following is assuming that the realservers are running Linux.
> >   At least  I'm going to deal with only such cases for now)
> > 
> > 1) CPU: How good is polling  /proc/loadavg? My problem with that is the
> >   load  introduced by the measurement itself if polling is done too often.
> 
>       5  seconds is  not fatal.

   I think sampling the load once every 5 seconds is probably not
 frequent enough; we'd need more frequent sampling. This is where people
 with real experience in running large LVS clusters come in. I would
 really LOVE to hear from people who think that they are not entirely
 satisfied with the current schedulers and would like a
 load-informed scheduler. What kind of applications demand such a
 scheme? Once we have this information, we could decide what
 would be the best sampling period. (And of course, ideally we should
 make it a runtime-configurable parameter, as I said before.)
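
   Just so we're talking about the same thing, here's a rough, untested
sketch in C of the kind of /proc/loadavg polling I mean (the 5-second
interval is only an example and should really be runtime-configurable):

/* Rough sketch (untested): poll /proc/loadavg every few seconds.
 * The 1-minute average is the first field; the 5-second interval
 * is just an example value, not a recommendation. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        FILE *f = fopen("/proc/loadavg", "r");
        double avg1, avg5, avg15;

        if (f) {
            if (fscanf(f, "%lf %lf %lf", &avg1, &avg5, &avg15) == 3)
                printf("load: 1min=%.2f 5min=%.2f 15min=%.2f\n",
                       avg1, avg5, avg15);
            fclose(f);
        }
        sleep(5);   /* the 5-second period under discussion */
    }
    return 0;
}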

> >         Besides, how good is the info in loadavg?   Doesn't it just
> >  measure  the number of processes in the queue every few  milliseconds or
> >  so to calculate the load. One could argue  (and many  people do argue)
> >  that this is not a good metric of CPU load. Any ideas ??
> 
>       Yes,  loadavg is not good for web and the other well
> known  services.   But  the  user still  can  run  some real
> services  that eat CPU. If the  loadavg can be high the user
> can select it as load parameter.

     So what would be a better way to get CPU load info? I would like
to use the /proc interface as much as possible and avoid special
kernel patches, but that's not an absolute requirement. I'm looking for
suggestions as to the best metric for CPU activity (averaged, say,
over the past one second). I'm thinking of some combination of:

a) num. of passes through schedule()
b) num. of interrupts
c) num. of processes in run queues
d) some count of how many processes used up their allotted time quantum
   without sleeping on I/O (which might indicate CPU-intensive work as
   opposed to I/O-intensive work)

  Comments please...   
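
   To make (a) and (b) a bit more concrete, here's a rough, untested
sketch that samples the "ctxt" and "intr" counters from /proc/stat and
turns them into per-second rates. The field names are an assumption
about what the running kernel actually exposes in /proc/stat:

/* Rough sketch (untested): sample context-switch and interrupt
 * counters from /proc/stat and print per-second rates.
 * "ctxt" and "intr" are assumed to be present in this kernel. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void read_stat(unsigned long *ctxt, unsigned long *intr)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");

    *ctxt = *intr = 0;
    if (!f)
        return;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "ctxt ", 5) == 0)
            sscanf(line + 5, "%lu", ctxt);
        else if (strncmp(line, "intr ", 5) == 0)
            sscanf(line + 5, "%lu", intr);   /* first number is the total */
    }
    fclose(f);
}

int main(void)
{
    unsigned long c0, i0, c1, i1;

    read_stat(&c0, &i0);
    for (;;) {
        sleep(1);
        read_stat(&c1, &i1);
        printf("ctxt/s=%lu intr/s=%lu\n", c1 - c0, i1 - i0);
        c0 = c1;
        i0 = i1;
    }
    return 0;
}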

> > 2) Memory: We could just do  what free command does (which is
> >    just   reading   /proc/meminfo). Is that good enough. Anybody see any
> >    pitfalls in that approach?  Of course, polling /proc too often is again
> >    a problem here. Besides that ??
> 
>       Yes,  you  can  create  many  load  parameters  from
> /proc/meminfo.   Even "Cached" and "Buffers" are useful. And
> sometimes  it is faster to open and read from /proc than to
> read the parameters one by one by using other interfaces.

    Seems like we have all the info we need for memory from /proc/meminfo.
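
   Something along these lines (rough, untested) should be enough to pull
the interesting numbers out of /proc/meminfo; a real monitor would ship
them to the director instead of printing them:

/* Rough sketch (untested): extract MemTotal, MemFree, Buffers and
 * Cached from /proc/meminfo -- the same numbers "free" reports. */
#include <stdio.h>

int main(void)
{
    char line[256];
    unsigned long total = 0, freemem = 0, buffers = 0, cached = 0;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        sscanf(line, "MemTotal: %lu", &total);
        sscanf(line, "MemFree: %lu", &freemem);
        sscanf(line, "Buffers: %lu", &buffers);
        sscanf(line, "Cached: %lu", &cached);
    }
    fclose(f);

    /* "effectively free" = free + buffers + cache, in kB */
    printf("total=%lu kB free=%lu kB (free+buf+cache=%lu kB)\n",
           total, freemem, freemem + buffers + cached);
    return 0;
}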
  
> > 3) Network: This is the hardest one. What would be a good metric of
> >   network  load... number of alive TCP connections??  Is that good
> >   enough... I'm not deeply familiar with the kernel networking code. Could
> >   somebody who is more familiar would throw some more light on this....
> 
>       You  can try with /proc/net/dev. There are bytes and
> packets for each interface but the drawback is that they are
> sometimes zeroed and the interfaces sometimes disappear :)

  Well, yes, the number of dropped packets might give us an indication
  that the networking load is heavy (which is a good thing to know if
  it's happening), but the numbers in /proc/net/dev are cumulative
  since the interface was brought up and may not have any relation to
  the current situation (though we could do some simple math to extract
  the numbers for the last second or whatever).
       I was thinking more in terms of TCP connection overhead and
  all the costs associated with that... Since the most likely use of
  LVS is for web servers, proxies, etc., TCP load is probably
  the most important issue here. At least that's what I've come up
  with so far. I am really looking for comments from the "experts" on
  this one...
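
   For the interface counters, something like this rough, untested
sketch could do the "simple math": compute per-second rates from the
cumulative numbers in /proc/net/dev ("eth0" and the exact field layout
are assumptions, and the counters can be reset when an interface goes
down and comes back up):

/* Rough sketch (untested): read rx/tx byte counters for one interface
 * from /proc/net/dev and print a bytes-per-second rate. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int read_dev(const char *ifname, unsigned long *rx, unsigned long *tx)
{
    char line[512];
    FILE *f = fopen("/proc/net/dev", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        char *p = strchr(line, ':');
        char *name = line;

        if (!p)
            continue;            /* the two header lines have no ':' */
        *p = '\0';
        while (*name == ' ')
            name++;
        if (strcmp(name, ifname) == 0) {
            /* rx bytes is the 1st field after the colon, tx bytes the 9th */
            sscanf(p + 1, "%lu %*u %*u %*u %*u %*u %*u %*u %lu", rx, tx);
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}

int main(void)
{
    unsigned long rx0, tx0, rx1, tx1;

    if (read_dev("eth0", &rx0, &tx0) < 0)    /* "eth0" is just an example */
        return 1;
    for (;;) {
        sleep(1);
        if (read_dev("eth0", &rx1, &tx1) == 0) {
            printf("rx=%lu B/s tx=%lu B/s\n", rx1 - rx0, tx1 - tx0);
            rx0 = rx1;
            tx0 = tx1;
        }
    }
    return 0;
}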

  
>       The list is a good place for such discussion :)
> 
>       More ideas:
> 
> - use ioctls to add/delete LVS services/destinations
> 
> - use  all kinds of virtual services, forwarding methods and
> scheduling  methods (configured from the user). IOW, all LVS
> features.
> 
> - user  space tool to manage the config file and the network
> interfaces/routes/settings. For example:
> 
> <tool> start <domain> send gratuitous ARP, set ifaces, etc
> <tool> stop <domain>  stop ifaces, etc
> <tool> secondary <domain>     role: director -> real server
> <tool> primary <domain>               role: real server -> director
> 
> - call  scripts to play with policy routing and other kernel
> settings, etc.
> 
> - support for backup directors working as real servers


    These are all nice TODO items, but I'm afraid I'll have just
enough time to focus only on load-informed scheduling for now.

Thanks for the considered reply...
Chetan
  





