> But everything is theory and must be tried in production :) We must be
> ready for surprises :)
Currently I've been trying the following (aimed at self-adaptive weight
changing) in a production environment on the smaller of our LVS-based
caches, which has a greater variety of hardware styles/ages and therefore
needs it more badly!
Summary version of the setup:
* Pinger process/pinger receiver process on realserver/director.
(The director has 3 IPs - the primary machine IP, the VIP of the service,
and a service IP for the VIP - the real servers ping the service IP
for the VIP.)
* Pinger information is currently a single line of data.
* If the pinger receiver process doesn't get a ping within a certain
time period, the director assumes that the realserver is unreachable
- either due to a broken network or a broken server - so there's no point
in forwarding new connections. The weight of the server is dropped to 0.
* On the real server, load information is periodically measured and
written to a file - currently /usr/local/bin/ping_data.
* A symbolic link called /usr/local/bin/data_to_ping points to the above
file.
* The pinger only sends data if the symbolic link exists; when it does,
the contents of the file it points at are sent. (A rough sketch of the
pinger follows this list.)
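To give a flavour of how little is involved, here's a rough perl sketch of
what the pinger does - the real thing is just a simple shell script piping
the file through netcat, and the director address/port below are
placeholders:

#!/usr/bin/perl -w
# Sketch of the pinger: send the data file to the director's service IP,
# but only if the symbolic link exists.
use strict;
use IO::Socket::INET;

my $link     = "/usr/local/bin/data_to_ping";
my $director = "service.ip.for.vip";   # placeholder - the service IP the realservers ping
my $port     = 3334;                   # placeholder - whatever port the receiver listens on

exit 0 unless -l $link;                # no symbolic link => send nothing

open(PING, "<$link") or exit 1;        # following the link to ping_data
my $data = join("", <PING>);
close(PING);

my $sock = IO::Socket::INET->new(PeerAddr => $director,
                                 PeerPort => $port,
                                 Proto    => 'tcp') or exit 1;
print $sock $data;
close($sock);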
The format of this file (and hence the data sent) is a simple XML fragment
as follows. (XML because it's simple to deal with in perl - the language I
use most :-)
(reformatted so it's not on one line for readability)
<lvs-mon>
<realinfo t="hostip:port:type:status"
r="server:ver1:ver2">
server specific info
</realinfo>
</lvs-mon>
This isn't very good XML, since t & r really should be elements themselves
given that they contain composite data, but currently the message is
constrained to being on a single line. This will change when I get round to
re-writing the pinger-receiving code to detect "</lvs-mon>" as an
end-of-message marker rather than "\n" :-)
The fields of the t attribute refer to realserver info:
hostip:port - obvious info on the real server.
type - TCP/UDP.
status - underload/nominal/overload.
The fields of the r attribute refer to service info - they dictate what the
server-specific info relates to.
server - servername/type
ver1 - version of the server software
ver2 - version of the format used for the server-specific info in the body.
server-specific info - just that - any other data that the system wants to
send.
(Real) example:
<lvs-mon>
<realinfo t="194.83.240.13:8080:TCP:nominal"
r="SQUID:2.1:0.0">
335,34076,510,4906,20799
</realinfo>
</lvs-mon>
So in this case the server-specific info relates to Squid 2.1, and the
format version of the body contents is 0.0. The body contents in this case
are:
field1 - Median TCP_HIT time
field2 - Median "throughput", for want of a better phrase. (It isn't really
throughput, but it's related, since it's based on response times &
object sizes over a period of time.)
field3 - Mean TCP_HIT time
field4 - Mean object size
field5 - Mean "throughput"
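Pulling one of these messages apart on the director side is trivial in
perl. A minimal sketch - XML::Simple is used here purely as one convenient
choice of parser:

#!/usr/bin/perl -w
use strict;
use XML::Simple;

# One complete message as received from a real server (the example above)
my $msg = <<'XML';
<lvs-mon>
<realinfo t="194.83.240.13:8080:TCP:nominal"
r="SQUID:2.1:0.0">
335,34076,510,4906,20799
</realinfo>
</lvs-mon>
XML

my $real = XMLin($msg)->{realinfo};
my ($hostip, $port, $type, $status) = split(/:/, $real->{t});
my ($server, $ver1, $ver2)          = split(/:/, $real->{r});

(my $body = $real->{content}) =~ s/\s+//g;     # strip whitespace around the body
my ($med_hit, $med_tput, $mean_hit, $mean_size, $mean_tput) = split(/,/, $body);

print "$hostip:$port ($server $ver1, status $status): median TCP_HIT $med_hit\n";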
In our case, the status field is calculated like this:
if (median TCP_HIT < 100) then underload
if (median TCP_HIT > 1000 AND median THROUGHPUT < 30000) then overload
otherwise nominal
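In perl that check is nothing more than something like (thresholds as
above, and obviously tunable per cluster):

sub status {
    my ($med_hit, $med_tput) = @_;     # median TCP_HIT time, median "throughput"
    return "underload" if $med_hit < 100;
    return "overload"  if $med_hit > 1000 && $med_tput < 30000;
    return "nominal";
}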
Since this data is only sent if the symbolic link exists, another
subsystem can check whether the system believes itself to be healthy
or not, and manipulate this link accordingly. (Squid can be a beast
when running constantly under its upper tolerances :-)
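For example, a watchdog along these lines would do the job - the health
test here is just a placeholder for whatever check makes sense for the
service:

#!/usr/bin/perl -w
use strict;

my $link   = "/usr/local/bin/data_to_ping";
my $target = "/usr/local/bin/ping_data";

# Placeholder health test - in practice this would poke the local squid.
sub looks_healthy { return 1 }

if (looks_healthy()) {
    symlink($target, $link) unless -l $link;   # (re)join the service
} else {
    unlink($link) if -l $link;   # drop out - the director soon sets our weight to 0
}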
The information received by the director is designed to be dealt with
in layers. Currently only the lowest layer is implemented, but the rest
will follow shortly. (It has to, for various reasons.)
Lowest layer - Deals with receiving ping data & monitoring whether
               machines are alive or not.
Middle layer - Receives the t attribute information and is given rules as
               to what to do in various situations - e.g. if underload,
               increase the weight by a factor of 1.1; if overload, halve
               it. (A sketch of this follows below.)
Top layer - Receives the r attribute information. Can be designed to look
            out for particular pieces of server-specific information.
The base level I use for manipulating weights is 1000, since this gives a
fairly good level of granularity - especially when adding servers into the
cluster. (I looked at the source to see the weight's data type, and came to
the conclusion that anything less was "silly" :-)
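As a rough sketch, the middle layer rule then boils down to something like
the following, assuming the new weights get pushed back in via ipvsadm
(check the exact flags against your ipvsadm version; the addresses are
illustrative):

#!/usr/bin/perl -w
use strict;

my $vip = "194.83.240.130:8080";   # the virtual service
my %weight;                        # current weight per real server

# $rs is "hostip:port" from the t attribute, $status is its fourth field
sub adjust {
    my ($rs, $status) = @_;
    $weight{$rs} ||= 1000;                      # base level
    if    ($status eq "underload") { $weight{$rs} = int($weight{$rs} * 1.1) }
    elsif ($status eq "overload")  { $weight{$rs} = int($weight{$rs} / 2)   }
    system("ipvsadm", "-e", "-t", $vip, "-r", $rs, "-w", $weight{$rs});
}

adjust("194.83.240.13:8080", "overload");       # e.g. halve this server's weight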
Currently I'm personally acting as the middle/top layer by looking at the
data we receive periodically and adjusting weights accordingly. This has
already proved its worth by allowing me to adjust the weights on the
smaller LVS cache cluster we use, which has a much greater variety of
hardware styles/machine ages than the larger cluster. Doing this has
kept the majority of machines in an underload state, and avoided the
situation where one machine is underloaded while another is overloaded.
For example, current figures for kai.mcc.wwwcache.ja.net are:
(Easily extractable because the data gets dumped to a file :-)
<lvs-mon><realinfo t="194.83.240.42:8080:TCP:nominal"
r="SQUID:2.1:0.0">272,265,407,5149,19073</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.22:8080:TCP:overload"
r="SQUID:2.1:0.0">2471,2351,2982,4037,1117</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.14:8080:TCP:nominal"
r="SQUID:2.1:0.0">483,3404,980,4148,4010</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.13:8080:TCP:overload"
r="SQUID:2.1:0.0">2900,332,1908,4473,1956</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.21:8080:TCP:nominal"
r="SQUID:2.1:0.0">391,330,680,4658,5974</realinfo></lvs-mon>
<lvs-mon><realinfo t="194.83.240.18:8080:TCP:nominal"
r="SQUID:2.1:0.0">416,3482,997,4847,5429</realinfo></lvs-mon>
(sometimes overloads are unavoidable no matter how hard you try manually...)
With the director's routing table looking like this:
(though I'm likely to change weights shortly :-)
IP Virtual Server version 0.8.3 (size=65536)
Protocol LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP 194.83.240.130:8080 wlc
  -> bla.bla.bla.bla:8080         Tunnel  1750   708        5502
  -> bla.bla.bla.bla:8080         Tunnel  1000   410        2779
  -> bla.bla.bla.bla:8080         Route   500    212        1211
  -> bla.bla.bla.bla:8080         Route   500    206        1468
  -> bla.bla.bla.bla:8080         Route   1000   396        3778
  -> bla.bla.bla.bla:8080         Tunnel  750    309        2182
Other points:
* The real server decides when it's overloaded, not the director, and
expects the director to try and do something about it.
* The real server takes itself out of service when/if it detects software
problems, or if it needs to stop receiving requests. (eg a large
virtual web server created by taking a bunch of mirror sites and
tunneling requests from the director to the mirrors. By simply
deleting the symbolic link they remove themselves from the service
without needing access to the service, and can rejoin it when they
need to.)
* These last two could be combined, say, by having the overload/underload/
removal mechanism controlled by cost - the mirror site only deals with
a certain number of requests per day/hour for cost reasons, and then
removes itself.
* Automatic weight selection code at a bare minimum just has to deal with
a single, simply formatted piece of data: overload/underload/nominal.
* Support for higher level/greater monitoring if needed - eg the higher
level software could mail/page admin if servers have gone down.
* Very loose coupling of components. Changing any section of this
monitoring system can be done on the fly in a production environment
without affecting service. The pinger at the moment is a very simple
shell script using netcat as its transport tool. The receiver
mechanism is currently a perl script that does the same.
The program that collates the load balancing info doesn't care how
it's called or why. The program to monitor the system state doesn't
care about the contents of the file it deletes.
* Quickly finding out the status of the entire cluster is dead easy as
above - using XML/XSL style sheets, creating fancy reports/pretty
status indicators is trivial using the right (existing) tools.
There's quite a lot more to come on the director's monitoring side for our
systems which I'm still working on, but for an indicator of where I'm
heading you can take a peek at http://epsilon3.mcc.ac.uk/~zathras/LVS/XML/eg.xml
It probably seems a bit OTT, but when you consider that the whole thing can
be read in as a single, usefully structured data structure using a single
line of perl code, it might be obvious why I'm using that format :-)
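(With a module like XML::Simple, for instance, that single line is
essentially just:
use XML::Simple;
my $cluster = XMLin("eg.xml");   # whole file as one nested hash/array structure
and everything after that is just walking a perl data structure.)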
The other thing is that you'll see some parts are designed to feed back
into the system (eg the LVSMONITORS elements specifically) and allow us to
control the behaviour of the system without having to write extra code to
do so. This also means we can get human-readable/pretty-printable reports
on system activity very simply - just use an XML/XSL stylesheet. (Things
I've noticed management tend to like.)
I'm writing this code in between a lot of other things, so it's taking a
while to get out - but any/all feedback is most welcome.
Michael.
--
National & Local Web Cache Support R: G117
Manchester Computing T: 0161 275 7195
University of Manchester F: 0161 275 6040
Manchester UK M13 9PL M: Michael.Sparks@xxxxxxxxxxxxxxx