Re: What is "load"? Monitoring, load-informed scheduling and so on..

To: Chetan Ahuja <ahuja@xxxxxxxxxxxxxxxxx>
Subject: Re: What is "load"? Monitoring, load-informed scheduling and so on..
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 4 Jul 2000 07:47:46 +0300 (EEST)
        Hello,

On Mon, 3 Jul 2000, Chetan Ahuja wrote:

> 
> 
> On Sun, 2 Jul 2000, Julian Anastasov wrote:
> 
> > 
> >     Hello,
> > 
> > On Sat, 1 Jul 2000, Chetan Ahuja wrote:
> 
> > >  What exactly constitutes "Load" on a real server and how often should it
> > > be measured? 
> > 
> >     Only the LVS user can decide what "Load" is and what
> > the preferred period to update its values is. Different
> > real services change different parameters on the real hosts.
> > We have to allow the user to select which parameters are
> > monitored for each real service.
>  
>    Yes.. I agree that ultimately we would want to let the user decide as
> many parameters of the scheduling algorithm as possible. (More about this
> later..)
> 
> 
> > >    If yes, what kind of things should be measured as "load". The astute
> > > reader would shoot back immediately with stuff like CPU, memory and
> > > network load. Let's treat them one by one:
> > 
> > > 
> > > (All of the following is assuming that the realservers are running Linux.
> > >   At least I'm going to deal with only such cases for now)
> > > 
> > > 1) CPU: How good is polling  /proc/loadavg? My problem with that is the
> > >   load  introduced by the measurement itself if polling is done too often.
> > 
> >     5  seconds is  not fatal.
> 
>    I think sampling the load once every 5 seconds is probably
>  not enough. We'd need more frequent sampling. This is where people
>  with real experience in running large LVS clusters come in. I would
>  really LOVE to hear from people who think that they are not entirely
>  satisfied with the current schedulers and would like a
>  load-informed scheduler. What kind of applications demand such a
>  scheme? Once we have this information, we could decide what
>  would be the best sampling period. (And of course, ideally we should
>  make it a runtime-configurable parameter, as I said before.)

        Yes, it is a complex task. But you can select weights
using averaged parameters. It is difficult to react to client
requests, and don't forget about the load generated by local
processes that are not part of the service. I don't think
reducing the interval will help. My experience shows that
selecting an expression from averaged values works with the
current WRR without stressing some of the real hosts.

        And yes, selecting big weights is not good for WRR.
It is possible for some of the real servers to be flooded
with requests. We must follow this rule when using WRR:

weight < request_rate * update_interval

        So, for 50 req/sec and a 5 second interval, don't set
a weight greater than 250. In fact, the problem is in the
different weights. Big differences between the weights,
compared to the request rate, lead to imbalance. Of course,
this is with the current WRR, and I don't see any better
method. And the kernel provides a backlog for any TCP
service. WRR may schedule more requests to one real server
for a short period if its weight is big, but if the sum of
all weights is less than the number of requests per update
interval, the requests are scheduled according to the
weights. You will need another scheduling method if you want
to break the above rule and use small update intervals that
react to the client requests. Reducing the update interval
in the above formula leads to imbalance when you use real
servers with very different weights.

        The result: we need benchmarks from production. The
user can tune their expressions and update interval according
to the load.
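
        For example, a minimal sketch of the rule above in
Python (only an illustration of the arithmetic; the function
name is just for this note, nothing from LVS itself):

    # WRR rule of thumb: weight < request_rate * update_interval.
    # request_rate is in requests/sec, update_interval in seconds.
    def max_safe_weight(request_rate, update_interval):
        return request_rate * update_interval

    # The example above: 50 req/sec and a 5 second interval -> 250.
    assert max_safe_weight(50, 5) == 250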

> 
> > >         Besides, how good is the info in loadavg? Doesn't it just
> > >  measure the number of processes in the queue every few milliseconds or
> > >  so to calculate the load? One could argue (and many people do argue)
> > >  that this is not a good metric of CPU load. Any ideas ??
> > 
> >     Yes, loadavg is not good for web and the other well
> > known services. But the user can still run some real
> > services that eat CPU. If the loadavg can be high, the user
> > can select it as a load parameter.
> 
>      So what would be a better way to get CPU load info? I would like
> to use the /proc interface as much as possible and avoid special
> kernel patches. But that's not an absolute requirement. I'm looking for
> suggestions as to the best metric for CPU activity (averaged, say,
> over the past one second). I'm thinking some combination of

        /proc/stat is good; there is a CPU load breakdown:
user/nice/system/idle. You have to compute averaged values
over the update interval, or maybe over another interval
that is long enough to smooth out the load peaks.
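
        A minimal sketch of such averaging in Python,
assuming the first line of /proc/stat is
"cpu user nice system idle" (jiffies, summed over all CPUs):

    import time

    def cpu_times():
        # First line of /proc/stat: "cpu user nice system idle".
        with open('/proc/stat') as f:
            fields = f.readline().split()
        return [int(v) for v in fields[1:5]]

    def idle_percent(interval=5):
        # Sample twice and average over the interval to smooth
        # out short load peaks.
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        deltas = [a - b for a, b in zip(after, before)]
        total = sum(deltas)
        # deltas[3] is the idle time within the interval.
        return 100.0 * deltas[3] / total if total else 100.0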

> 
> a) num. of passes through schedule()
> b) num. of interrupts
> c) num. of processes in run queues
> d) some count of how many processes used up their allotted time quantum
>    without sleeping on I/O (might indicate CPU-intensive work as
>    against I/O-intensive work)
> 
>   Comments please...   

        Hm, very exotic parameters, but they may be needed.

        For web I use:

- a 5 second interval

- CPU idle (on all CPUs) in percent, averaged over the last
5 seconds

- free memory in megabytes

- WRR with weights < 200

- I can build a kernel on a host while it is being used as a
real server. Does this sound good :) This is not possible
with WLC. A sketch of such an expression follows below.
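
        A hypothetical expression along these lines (the
scaling and the cap are only illustrations of the rules of
thumb above, not anything LVS prescribes):

    # Turn averaged CPU idle (percent) and free memory (MB) into
    # a WRR weight, kept under 200 per the rule of thumb above.
    def wrr_weight(cpu_idle_pct, free_mem_mb):
        weight = int(cpu_idle_pct * min(free_mem_mb, 100) / 100)
        return max(1, min(weight, 199))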


        I haven't tested ftp, but maybe an expression from:

- number of processes

- network packets/bytes (per interface?)

        Anyone running this in production?

> 
> > > 2) Memory: We could just do what the free command does (which is
> > >    just reading /proc/meminfo). Is that good enough? Anybody see any
> > >    pitfalls in that approach? Of course, polling /proc too often is again
> > >    a problem here. Besides that ??
> > 
> >     Yes, you can create many load parameters from
> > /proc/meminfo. Even "Cached" and "Buffers" are useful. And
> > sometimes it is faster to open and read from /proc than to
> > read the parameters one by one using other interfaces.
> 
>     Seems like we have all the info we need for memory from /proc/meminfo

        Yes
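
        A minimal sketch of reading it, assuming the usual
"Name: value kB" lines of /proc/meminfo:

    def meminfo():
        # Collect the "Name: value kB" lines into a dict (values
        # in kB); lines in other formats are skipped.
        info = {}
        with open('/proc/meminfo') as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 2 and parts[0].endswith(':') \
                        and parts[1].isdigit():
                    info[parts[0][:-1]] = int(parts[1])
        return info

    # e.g. free memory in megabytes; "Buffers" and "Cached" can
    # be added to the expression as mentioned above.
    free_mb = meminfo()['MemFree'] / 1024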

>   
> > > 3) Network: This is the hardest one. What would be a good metric of
> > >   network  load... number of alive TCP connections??  Is that good
> > >   enough... I'm not deeply familiar with the kernel networking code. Could
> > >   somebody who is more familiar throw some more light on this....
> > 
> >     You  can try with /proc/net/dev. There are bytes and
> > packets for each interface but the drawback is that they are
> > sometimes zeroed and the interfaces sometimes disappear :)
> 
>   Well, yes, num. of dropped packets might give us an indication that
>   the networking load is heavy (which is a good thing to know if it's
>   happening) but the numbers in /proc/net/dev are cumulative since
>   the interface was brought up and may not have any relation to the
>   current situation (but yes, we could do some simple math to extract the
>   numbers for the last second or whatever)

        Yes, and this is not difficult. Just create a
parameter cpuidle5 or numpackets5 and do some calculations
to support it. Some of the values are not useful in raw
form. And the world is not perfect; there are different
clients: some of the server processes can be delayed, or
they can add a different load. cpuidle or freemem below 20%
is not very good, and past that point we have to add more
real servers to the cluster.
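
        A sketch of a numpackets5-style parameter, assuming
the usual /proc/net/dev layout (two header lines, then one
line per interface):

    import time

    def packet_counters(iface):
        # Per-interface line: "iface: rx_bytes rx_packets ...
        # tx_bytes tx_packets ..."; fields 1 and 9 after the
        # colon are the rx and tx packet counters.
        with open('/proc/net/dev') as f:
            for line in f.readlines()[2:]:
                name, data = line.split(':', 1)
                if name.strip() == iface:
                    fields = [int(v) for v in data.split()]
                    return fields[1] + fields[9]
        return None  # interfaces sometimes disappear, as noted

    def numpackets5(iface='eth0', interval=5):
        # Cumulative counters turned into a per-interval delta.
        before = packet_counters(iface)
        time.sleep(interval)
        after = packet_counters(iface)
        if before is None or after is None or after < before:
            return 0  # interface vanished or counters were zeroed
        return after - before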

>        I was thinking more in terms of TCP connection overhead and
>   all the costs associated with that... Since the most likely
>   use of LVS is for web servers, proxies, etc., TCP load is probably
>   the most important issue here. At least that's what I've come up
>   with so far. I am really looking for comments from the "experts" on
>   this one...

        Hm, measuring the kernel load is very difficult,
and I don't think it is needed. Which TCP parameters do
you need? Maybe some from /proc/net/snmp? To be honest, it
seems I don't need to react to the clients' requests;
keeping the load below 80% allows this. The problem is the
load generated by processes that are not part of the
service. Including cpuidle and freeram in the expression
is mandatory in this case and solves all problems. Maybe
the net packets too.
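
        A sketch of pulling TCP counters from
/proc/net/snmp, assuming its paired header/value lines:

    def snmp_tcp():
        # /proc/net/snmp carries pairs of lines per protocol:
        # a header line with field names, then the values:
        #   Tcp: RtoAlgorithm ... CurrEstab InSegs OutSegs ...
        #   Tcp: 1 ... 42 ...
        with open('/proc/net/snmp') as f:
            lines = f.readlines()
        for names, values in zip(lines, lines[1:]):
            if names.startswith('Tcp:') and values.startswith('Tcp:'):
                return dict(zip(names.split()[1:],
                                (int(v) for v in values.split()[1:])))
        return {}

    # e.g. currently established TCP connections:
    established = snmp_tcp().get('CurrEstab', 0)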

> 
>   
> >     The list is a good place for such discussion :)
> > 
> >     More ideas:
> > 
> > - use ioctls to add/delete LVS services/destinations
> > 
> > - use all kinds of virtual services, forwarding methods and
> > scheduling methods (configured by the user). IOW, all LVS
> > features.
> > 
> > - user  space tool to manage the config file and the network
> > interfaces/routes/settings. For example:
> > 
> > <tool> start <domain>       send gratuitous ARP, set ifaces, etc
> > <tool> stop <domain>        stop ifaces, etc
> > <tool> secondary <domain>   role: director -> real server
> > <tool> primary <domain>             role: real server -> director
> > 
> > - call  scripts to play with policy routing and other kernel
> > settings, etc.
> > 
> > - support for backup directors working as real servers
> 
> 
>     These are all nice TODO items, but I'm afraid I'll have just
> enough time to focus only on the load-informed scheduling for now.

        OK

        Yes, the list can grow :) It is preferable for the
load monitoring to work together with the virtual/real
service management, but maybe it is good for these modules
to be separated, who knows. We need other opinions, maybe
from someone using similar expressions and with other
golden rules :)

> 
> Thanks for the considered reply...
> Chetan


Regards

--
Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>


