To: Linux Virtual Servers Mailing List <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [lvs-users] self adaptive weight changing (was Re: [lvs-users] New to LVS)
From: Michael Sparks <zathras@xxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Feb 2000 12:40:28 +0000 (GMT)
> But everything is theory and must be tried in production :) We must be
> ready for surprises :)

I've currently been trying the following (aimed at self-adaptive weight
changing) in a production environment on the smaller of our LVS based
caches, which has a greater variety of hardware styles/ages and therefore
needs it more badly!

Summary version of the setup:
  * Pinger process/pinger receiver process on realserver/director.
    (The director has three IPs - the primary machine IP, the VIP of the
    service, and then a service IP for the VIP - the real servers ping
    the service IP for the VIP.)

  * Pinger information is currently a single line of data.

  * If the pinger receiver process doesn't get a ping within a certain
    time period, the director assumes that the realserver is unreachable
    - either due to a broken network or a broken server - so there is no
    point in forwarding new connections. The server's weight is dropped
    to 0.

  * On the real server, load information is measured periodically and
    written to a file - currently /usr/local/bin/ping_data.

  * A symbolic link called /usr/local/bin/data_to_ping refers to the
    above file.

  * The pinger only sends data if the symbolic link exists; if it does,
    the data in the file it points to is sent. (A sketch of such a pinger
    follows this list.)
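
As an illustration, here's a minimal sketch of a pinger along those lines
in perl (the one we actually run is an even simpler shell script around
netcat, as mentioned further down; the service IP and port 8890 here are
just made-up values for the example):

#!/usr/bin/perl -w
# Sketch of a pinger: if the data_to_ping link exists, send the file it
# points at to the receiver on the director, otherwise stay silent.
use strict;
use IO::Socket::INET;

my $link = '/usr/local/bin/data_to_ping';
exit 0 unless -l $link && -e $link;          # no link (or dangling link) => say nothing

open my $fh, '<', $link or exit 1;           # follows the link to ping_data
my $data = do { local $/; <$fh> };           # slurp the one-line fragment
close $fh;

my $sock = IO::Socket::INET->new(
    PeerAddr => '192.0.2.1',                 # the service IP for the VIP (placeholder)
    PeerPort => 8890,                        # receiver port (placeholder)
    Proto    => 'tcp',
) or exit 1;
print $sock $data;
close $sock;

Run it from cron (or a small wrapper loop) at whatever interval you want
the ping period to be.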

The format of this file (and hence of the data sent) is a simple XML
fragment as follows (chosen because XML is simple to deal with in perl -
the language I use most :-)

(reformatted so it's not on one line for readability)

<lvs-mon>
   <realinfo t="hostip:port:type:status" 
             r="server:ver1:ver2">
        server specific info
   </realinfo>
</lvs-mon>

This isn't very good XML, since t & r should really be elements themselves
since they contain composite data, but currently this is constrained to
being on a single line. This will change when I get round to re-writing
the pinger-receiving code to detect "</lvs-mon>" as an end-of-message
marker rather than "\n" :-)

Fields of the t attribute refer to realserver info:
  hostip:port - obvious info on the real server.
  type - TCP/UDP.
  status - underload/nominal/overload.

Fields of the r attribute refer to service info - they dictate what the
  server-specific info relates to.
  server - servername/type
  ver1   - version of the server software
  ver2   - version of the format of the server-specific info in the body.

server-specific info - just that - any other data that the system wants to
send.

A (real) example:
<lvs-mon>
   <realinfo t="194.83.240.13:8080:TCP:nominal"
             r="SQUID:2.1:0.0">
       335,34076,510,4906,20799
   </realinfo>
</lvs-mon>

So in this case the server-specific info relates to Squid 2.1, and the
format version of the body contents is 0.0. The body contents in this case
are:

field1 - Median TCP_HIT time
field2 - Median "throughput", for want of a better phrase. (It isn't
         actually throughput, but is related to it, since it is based on
         the response times & object sizes over a period of time.)
field3 - Mean TCP_HIT time
field4 - Mean object size
field5 - Mean "throughput"
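
Since each fragment has a fixed shape, pulling it apart in perl doesn't
need a real XML parser at this level. A minimal sketch (reading the
fragment from stdin; purely illustrative):

#!/usr/bin/perl -w
# Split a <lvs-mon> fragment into its t fields, r fields and body fields.
use strict;

my $frag = do { local $/; <STDIN> };         # slurp the whole fragment
if ($frag =~ m{<realinfo\s+t="([^"]*)"\s+r="([^"]*)"\s*>\s*(.*?)\s*</realinfo>}s) {
    my ($t, $r, $body) = ($1, $2, $3);
    my ($hostip, $port, $type, $status) = split /:/, $t;
    my ($server, $ver1, $ver2)          = split /:/, $r;
    my @fields                          = split /,/, $body;
    print "$hostip:$port ($type) $status - $server $ver1, body v$ver2: @fields\n";
}

Fed the example above, that prints something like
"194.83.240.13:8080 (TCP) nominal - SQUID 2.1, body v0.0: 335 34076 510 4906 20799".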

In our case, the status field is calculated like this:

if (median TCP_HIT < 100) then underload
if (median TCP_HIT > 1000 AND median THROUGHPUT < 30000) then overload
otherwise nominal
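
Or, as a (purely illustrative) perl version of the same rule - the
subroutine and argument names are just made up for the example:

# Status rule as above, returning the status string for the t attribute.
sub squid_status {
    my ($median_hit, $median_throughput) = @_;
    return 'underload' if $median_hit < 100;
    return 'overload'  if $median_hit > 1000 && $median_throughput < 30000;
    return 'nominal';
}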


Since this data is only sent if the symbolic link exists, another
subsystem can check to see whether the system believes itself to be
healthy or not, and manipulate this link accordingly. (squid can be a
beast when running constantly under its upper tolerances :-)
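
A sketch of the sort of subsystem I mean (the "is squid healthy" test here
- a TCP connect to the local squid port - is just a stand-in; whatever
check suits the service would go there):

#!/usr/bin/perl -w
# If squid looks unhealthy, remove the data_to_ping link so the pinger
# falls silent and the director drops the weight to 0; put the link back
# when things look sane again.
use strict;
use IO::Socket::INET;

my $link   = '/usr/local/bin/data_to_ping';
my $target = '/usr/local/bin/ping_data';

my $healthy = IO::Socket::INET->new(PeerAddr => '127.0.0.1',
                                    PeerPort => 8080,
                                    Proto    => 'tcp',
                                    Timeout  => 2) ? 1 : 0;
if ($healthy) {
    symlink $target, $link unless -l $link;
} else {
    unlink $link if -l $link;
}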

The information received by the director is designed to be dealt with in
layers. Currently only the lowest layer is implemented, but the rest will
follow shortly. (It has to, for various reasons.)

Lowest layer - This deals with receiving ping data & monitoring whether
               machines are alive or not.

Middle layer - Receives the t field information, and is given rules as to
               what to do in various situations - if underloaded, increase
               the weight by a factor of 1.1; if overloaded, halve the
               weight. (A sketch of this follows below.)

Top Layer    - Receives the r field information. Can be designed to look
               out for particular pieces of server-specific information.
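
A minimal sketch of that middle-layer rule, assuming the weights get
pushed to the kernel by shelling out to ipvsadm (the virtual service
address, the tunnelling flag and the in-memory weight table are all just
assumptions for the example - this isn't the code on our directors):

use strict;

my %weight;                              # realserver "ip:port" => current weight
my $vip = '194.83.240.130:8080';         # the virtual service

sub adjust_weight {
    my ($rip, $status) = @_;             # eg ('194.83.240.13:8080', 'overload')
    my $w = $weight{$rip} || 1000;       # base level of 1000, as discussed below
    $w = int($w * 1.1) if $status eq 'underload';
    $w = int($w / 2)   if $status eq 'overload';
    $weight{$rip} = $w;
    system('ipvsadm', '-e', '-t', $vip, '-r', $rip, '-i', '-w', $w);
}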

The base level I use for manipulating weights is 1000, since this gives a
fairly good level of granularity - especially when adding servers into the
cluster. (I looked at the source to see the weight's data type, and came to
the conclusion that anything less was "silly" :-)

Currently I'm personally acting as the middle/top layer by looking at the
data we receive periodically and adjusting weights accordingly. This has
already proved its worth by allowing me to adjust the weights on the
smaller LVS cache cluster we use, which has a much greater variety of
hardware styles/machine ages than the larger cluster. Doing this has kept
the majority of machines in an underload state, and avoided the situation
where one machine is underloaded while another is overloaded.

For example, current figures for kai.mcc.wwwcache.ja.net are:
(Easily extractable, because they get dumped to a file :-)

<lvs-mon><realinfo t="194.83.240.42:8080:TCP:nominal" 
r="SQUID:2.1:0.0">272,265,407,5149,19073</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.22:8080:TCP:overload" 
r="SQUID:2.1:0.0">2471,2351,2982,4037,1117</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.14:8080:TCP:nominal" 
r="SQUID:2.1:0.0">483,3404,980,4148,4010</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.13:8080:TCP:overload" 
r="SQUID:2.1:0.0">2900,332,1908,4473,1956</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.21:8080:TCP:nominal" 
r="SQUID:2.1:0.0">391,330,680,4658,5974</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.18:8080:TCP:nominal" 
r="SQUID:2.1:0.0">416,3482,997,4847,5429</realinfo></lvs-mon>

(sometimes overloads are unavoidable no matter how hard you try manually...)

With the director's routing table looking like this: 
(though I'm likely to change weights shortly :-)

IP Virtual Server version 0.8.3 (size=65536)
Protocol LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port         Forward Weight ActiveConn InActConn
TCP 194.83.240.130:8080 wlc
      -> bla.bla.bla.bla:8080    Tunnel  1750   708        5502
      -> bla.bla.bla.bla:8080    Tunnel  1000   410        2779
      -> bla.bla.bla.bla:8080    Route   500    212        1211
      -> bla.bla.bla.bla:8080    Route   500    206        1468
      -> bla.bla.bla.bla:8080    Route   1000   396        3778
      -> bla.bla.bla.bla:8080    Tunnel  750    309        2182

Other points:
  * The real server decides when it's overloaded, not the director, and
    expects the director to try and do something about it.

  * The real server takes itself out of service when/if it detects
    software problems, or if it needs to stop receiving requests. (eg a
    large virtual web server created by taking a bunch of mirror sites
    and tunneling requests from the director to the mirrors' sites. By
    simply deleting the symbolic link they remove themselves from the
    service without needing access to the director, and can rejoin it
    when they need to.)

  * These last two could be combined, say, by having the overload/underload/
    removal mechanism controlled by cost - the mirror site only deals with
    a certain number of requests per day/hour for cost reasons, and then
    removes itself.

  * Automatic weight selection code at a bare minimum just has to deal with
    a single piece of data in a simple format: overload/underload/nominal.

  * Support for higher level/greater monitoring if needed - eg the higher
    level software could mail/page the admin if servers have gone down.

  * Very loose coupling of components. Changing any section of this
    monitoring system can be done on the fly in a production environment
    without affecting service. The pinger at the moment is a very simple
    shell script using netcat as its transport tool. The receiver
    mechanism is currently a perl script that does the same. (A sketch of
    such a receiver follows this list.) The program that collates the
    load balancing info doesn't care how it's called or why. The program
    that monitors the system state doesn't care about the contents of the
    file it deletes.

  * Quickly finding out the status of the entire cluster is dead easy as
    above - using XML/XSL style sheets, creating fancy reports/pretty
    status indicators is trivial using the right (existing) tools.
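
For what it's worth, a minimal sketch of such a receiver (the port matches
the made-up pinger example earlier; the dump file location and keying the
"last heard from" table on the peer address are also just illustrative):

#!/usr/bin/perl -w
# Lowest layer on the director: accept pings, dump each fragment to a file,
# and remember when each realserver was last heard from. Something walking
# %last_seen and zeroing the weight of anything silent for too long would
# hook in where marked.
use strict;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(LocalPort => 8890,
                                     Proto     => 'tcp',
                                     Listen    => 5,
                                     ReuseAddr => 1) or die "listen: $!";
my %last_seen;                            # realserver IP => time of last ping

while (my $client = $listener->accept) {
    my $peer = $client->peerhost;
    my $data = do { local $/; <$client> };
    close $client;
    next unless defined $data && $data =~ /<lvs-mon>/;

    $last_seen{$peer} = time;
    if (open my $out, '>>', '/var/tmp/lvs-mon.log') {   # dump file: illustrative
        print $out $data;
        close $out;
    }

    # ... periodically walk %last_seen and drop the weight of anything
    # that has been silent for too long to 0 (via ipvsadm, as above) ...
}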

There's quite a lot more to come on the director's monitoring side for our
systems, which I'm still working on, but for an indication of where I'm
heading you can take a peek at http://epsilon3.mcc.ac.uk/~zathras/LVS/XML/eg.xml

It probably seems a bit OTT, but when you consider that the whole thing can
be read in as a single, usefully structured data structure using a single
line of perl code, it might be obvious why I'm using that format :-)

The other thing is that you'll see some parts are designed to feed back
into the system (eg the LVSMONITORS elements specifically) and allow us to
control the behaviour of the system without having to write extra code to
do so. This also means we can get human-readable/pretty-printable reports
on system activity very simply - just use an XML/XSL style sheet. (Things
I've noticed management tend to like.)

I'm writing this code in between a lot of other things, so it's taking a
while to get out - but any/all feedback is most welcome.


Michael.
--
National & Local Web Cache Support        R: G117
Manchester Computing                      T: 0161 275 7195
University of Manchester                  F: 0161 275 6040
Manchester UK M13 9PL                     M: Michael.Sparks@xxxxxxxxxxxxxxx


