To: Lars Marowsky-Bree <lmb@xxxxxxxxx>
Subject: Thanks, & Systems for monitoring and maintaining a large scale LVS (was Re: Factual Inaccuracy :-/ Re: Announcement made to Slashdot, LinuxToday)
Cc: Michael Sparks <zathras@xxxxxxxxxxxxxxxxxx>, lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Michael Sparks <zathras@xxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Nov 1999 17:23:55 +0000 (GMT)
On Fri, 12 Nov 1999, Lars Marowsky-Bree wrote:
> On 1999-11-12T16:32:02,
> I am sorry I included such incorrect data and I have corrected the webpage
> with your data.

Excellent ! :-)))

The reason I reacted so quickly was that, aside from anything else, the
error would have conflicted with an announcement we're making to our users
on Monday... (he says cryptically, since he can't say anything yet :)

> This is something I have already done, based on the - admittedly not that
> perfectly suited - load average of the real servers.
> 
> I keep having this grand idea about how monitoring etc should be done in the
> back of my head, but I think I will need a few weeks vacation to get it
> done... *sigh*

I know the feeling... However, if the LVS Cluster continues to deliver on
its promises - as it has done to date - we're going to be needing
monitoring systems etc, so I'll be spending quite a chunk of work time
developing these. As a result I've put together some thoughts on how to
monitor and maintain a large scale system - based on what we already do
for the main JWCS production system.

For the moment, our small system is monitored by the Director based on the
following assumptions:

   * If packets in a TCP stream cannot get from the real server to the
     director, then the real server is as good as dead.

   * Given that the real servers have thousands of concurrent connections,
     one more TCP stream to monitor that the system is alive is useful, and
     adds minimal overhead.

As a result our current monitoring system is a set of perl programs that
do this:

   * Real servers connect to the Director using a TCP stream, and send
     data pulses out along it regularly (see the sketch after this list).

   * The director expects a pulse every N seconds, and can therefore
     detect if pulses are late, repeatedly late, or if the server appears
     to've stopped sending them altogether.

   * If the TCP connection is broken at either end, the other end will
     notice straight away:

          - in the case of a real server detecting this, it will expect a
            new Director to become active shortly, and so pauses, then
            starts sending pulses to the new Director, and is automatically
            added back into the cluster.

          - in the case of the Director detecting this, it can mark the
            real server as down and send no more requests to it.

   * The data pulses can be anything. This provides us with a window to
     explore more interesting - and potentially very useful - automatic
     load balancing options, such as basing decisions on pulses carrying
     the median TCP HIT time for a cache: if this becomes too high, traffic
     is reduced automatically, and if it's "too low", traffic levels are
     automatically increased.

     This would have the potential side effect of allowing the system to
     find the best levels of traffic for a server, rather than relying on
     human judgement. Secondly it could result in a system whereby no
     single server could become overloaded by sudden increases in traffic -
     unlike a DNS based  approach.

   * A simple API to help the production of such a load balancing system
     has been developed; it has proved itself very useful, and helps to
     point the way to more flexible systems in the future.
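
To give a feel for what the pulse scheme looks like in practice, here's a
rough sketch of the real-server end in perl. The director hostname, port,
interval and payload format below are purely illustrative, and the real
programs differ in detail:

#!/usr/bin/perl -w
# pulse-sender: rough sketch of the real-server end of the pulse scheme.
# The director address, port, interval and payload format are illustrative.
use strict;
use IO::Socket::INET;

$SIG{PIPE} = 'IGNORE';              # a broken pipe just ends the inner loop

my $director = 'wwwcache-director'; # hypothetical director hostname
my $port     = 4321;                # hypothetical pulse port
my $interval = 5;                   # send a pulse every N seconds

while (1) {
    # (Re)connect to the director; if it is down, wait for a new one to
    # become active and try again - this is the automatic re-adding.
    my $sock = IO::Socket::INET->new(
        PeerAddr => $director,
        PeerPort => $port,
        Proto    => 'tcp',
    );
    unless ($sock) {
        sleep $interval;
        next;
    }
    $sock->autoflush(1);

    # Send pulses until the TCP connection breaks.
    while (1) {
        # The payload can be anything; here, a made-up median HIT time.
        my $hit_ms = 42;            # placeholder for a real measurement
        last unless print $sock "PULSE hit_ms=$hit_ms\n";
        sleep $interval;
    }
    close $sock;
}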

This final point is based on some ideas as to how to make this work better
in a mission critical environment. Some thoughts we've had follow.


For failover this has worked well. For configuration, the manual approach
is fairly OK. Clearly though, in a production environment with (if current
purchase trends continue) dozens of machines, manual configuration becomes
impractical at best.

As a result, a framework is desirable that allows automated failover, load
balancing based on response times, manual configuration where necessary (no
automated system is perfect :-), and automated intervention by a third
party (eg swedish-chef).

Outline analysis:

3 desirable forms of external system input:
  * A console interface, a la ether switches.
  * A web interface.
  * A shell based interface - so that an independent machine can remotely 
    execute commands.

Things we'd like the system to be able to do:
  * Monitor the pulses' data content, and increase/decrease the load based
    on this (a rough sketch of this follows the list).
  * The system must detect server failure. (Failure to do this would
    seriously affect the service for all clients)
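
As a very rough illustration of how those two requirements might be met on
the director side - the thresholds, the hit_ms field and the data
structures here are invented, and the real policy is still being thought
about:

# Sketch of the director-side policy for the two requirements above.
# Thresholds, the hit_ms field and the data structures are invented.
use strict;

my $interval    = 5;     # seconds between expected pulses
my $max_late    = 3;     # pulses missed before a server is declared dead
my $too_slow_ms = 500;   # median HIT time above this => shed load
my $too_fast_ms = 100;   # below this => the server can take more

# Given one pulse line (e.g. "PULSE hit_ms=230") and the server's current
# LVS weight, return the new weight.
sub new_weight {
    my ($pulse, $weight) = @_;
    my ($hit_ms) = $pulse =~ /hit_ms=(\d+)/;
    return $weight unless defined $hit_ms;               # unparsable: no change
    return int($weight * 0.8) if $hit_ms > $too_slow_ms; # back traffic off
    return $weight + 1        if $hit_ms < $too_fast_ms; # take more traffic
    return $weight;
}

# Given a hash of realserver => time of last pulse, return the servers
# that have gone quiet for long enough to be marked down.
sub dead_servers {
    my (%last_pulse) = @_;
    my $now = time;
    return grep { $now - $last_pulse{$_} > $interval * $max_late }
           keys %last_pulse;
}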

Block diagram wise, one way of achieving this is:

 +-----------------+
 | External inputs |
 +-----------------+--\|                                       +------------+
                      --+-----------+     +---------------+     | Low Level  |
                       | Actionlog | ==> | Central state | ==> |    LVS     |
                     --+-----------+     +---------------+     | Modifier   |
                      /|                                       +------------+
+-------------------+/
| LVS Pulse Monitor |
+-------------------+

ie both the internal and external stimuli are sent to an action log. The
reason for this is to allow the system to smoothly handle changes in the
weights of servers. The action log would then be used to update a central
state determining the currently required services, servers and weights.
Finally, the low level LVS modifier would examine (but not modify) this
central state and make the necessary calls to /sbin/ipvsadm so that the
real traffic reflects the desired central state.
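
To make that last stage concrete, the low level LVS modifier could be as
dumb as the sketch below. The virtual service address, the scheduler and
the shape of the central state are made up for illustration, and a real
modifier would diff against the current ipvsadm state rather than blindly
re-issuing commands:

# Sketch of the low level LVS modifier: read the desired central state and
# make /sbin/ipvsadm reflect it. The VIP, scheduler and %desired hash are
# illustrative only.
use strict;

my $vip = '10.0.0.1:80';           # hypothetical virtual service address
my %desired = (                    # central state: real server => weight
    '10.0.0.11:80' => 100,
    '10.0.0.12:80' => 80,
    '10.0.0.13:80' => 0,           # weight 0: send it no new connections
);

# Make sure the virtual service exists (weighted least-connection here).
system('/sbin/ipvsadm', '-A', '-t', $vip, '-s', 'wlc') == 0
    or warn "service $vip may already exist\n";

# Bring each real server's weight into line with the central state.
for my $rs (sort keys %desired) {
    my $w = $desired{$rs};
    # Try to edit first; if the real server isn't known yet, add it.
    if (system('/sbin/ipvsadm', '-e', '-t', $vip, '-r', $rs, '-g', '-w', $w) != 0) {
        system('/sbin/ipvsadm', '-a', '-t', $vip, '-r', $rs, '-g', '-w', $w) == 0
            or warn "failed to add $rs to $vip\n";
    }
}

Keeping this stage stupid is deliberate: all the policy lives upstream of
the central state, so the modifier never needs to know why a weight changed.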

Whilst this may seem like jumping through hoops, it allows us to have an
automatic system that does the load balancing, yet which can be modified
and overridden if necessary. (For example, to do the "chuck another
server in" test (*) I had to effectively disable the LVS Pulse Monitor -
which is highly undesirable - and that is because the existing setup isn't
of the above architecture.)

    (*) cf http://www.mailbase.ac.uk/lists/wwwcache-users/1999-10/0001.html

It should also be obvious from the above that we could have many
different external inputs and many different sorts of LVS pulse
monitor - some complex, some not - with this scenario.

Comments on the above architecture welcome. 

One potential race hazard I can see would be in the action log, but since
there would be a single process managing the application of actions to the
central state, the action log effectively serialises the system, allowing
us to detect any such problems. The rest of the system consists of
write-only or read-only processes, which by definition cannot interfere
with the algorithmic correctness of the other portions of the system.
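
For what it's worth, the serialising process I have in mind is nothing
cleverer than a single loop draining the action log in order and applying
each action to the central state - roughly like this, with the action
format and state layout invented for the sake of the sketch:

# Sketch of the single process that serialises the action log into the
# central state. The action format and %state layout are invented here.
use strict;

my %state;        # realserver => weight: the "central state"
my @action_log;   # appended to by the external inputs and pulse monitors

sub apply_action {
    my ($action) = @_;
    if ($action =~ /^set-weight (\S+) (\d+)$/) {
        $state{$1} = $2;
    } elsif ($action =~ /^server-down (\S+)$/) {
        $state{$1} = 0;             # weight 0: stop sending it traffic
    } else {
        warn "ignoring unknown action: $action\n";
    }
}

# Drain the log in order; a real implementation would block here waiting
# for new actions (say, lines arriving on a pipe or socket) rather than
# working from an in-memory array. Because only this loop touches %state,
# writers can only ever race over the order of entries in the log, never
# over the state itself.
while (@action_log) {
    apply_action(shift @action_log);
}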

FWIW, I've already started work on the console style interface at home,
and once a dummy version of that's in place I'll put it somewhere readily
accessible for comments. I may well discuss this on the LVS list as well,
if there are no objections, since the above would probably be of interest
there too.

In the absence of comments etc, these things are going to get developed
for our use anyway :-)



Michael. 
--
National & Local Web Cache Support        R: G117
Manchester Computing                      T: 0161 275 7195
University of Manchester                  F: 0161 275 6040
Manchester UK M13 9PL                     M: Michael.Sparks@xxxxxxxxxxxxxxx




----------------------------------------------------------------------
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
To unsubscribe, e-mail: lvs-users-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: lvs-users-help@xxxxxxxxxxxxxxxxxxxxxx
