Poor man's level 7 switch

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Poor man's level 7 switch
Cc: Joseph Mack <mack@xxxxxxxxxxx>
From: andreas.koenig@xxxxxxxx (Andreas J. Koenig)
Date: 15 Aug 2000 10:58:25 +0200
The following is my first stab at describing application level
switching. Joe Mack has asked me to write it for his HOWTO and that is
most probably the place where this document will end up. Your feedback
is welcome.


The poor man's level 7 switch realized with LVS and Squid

An often lamented shortcoming of LVS clusters is that the realservers
have to be configured to work identically. Thus if you want to build
up a service with many servers that need to be configured differently
for some reason, you cannot take advantage of the powerful LVS.

The following describes an LVS topology where not all servers in the
pool of available servers are configured identically and where
load balancing is content-based.

The goal is achieved by combining the features of Squid and LVS. The
workhorses are running Apache, but any HTTP server would do.


Before we start we need to introduce a bit of Squid terminology. A
redirector is a director that examines the URL and request method of
an HTTP request and is able to change the URL in any way it needs. An
accelerator plays the role of a buffer and cache. The accelerator
handles a relatively large number of slow connections to the clients
on the internet with a relatively small amount of memory. It passes
requests through to any number of back-end servers. It can be
configured to cache the results of the back-end servers according to
the HTTP headers.


In the following example installation we will realize this
configuration (real IP addresses anonymized):



Squid2      # Same box as Webserver2

Webserver2   # Same box as Squid2

Note that a squid and a webserver can coexist in a single box. That is
why we have put Squid2 and Webserver2 into a single machine.

Note also that squids can cache webservers' output and thus reduce the
work for them. We dedicate 24 GB disk to caching in Squid1 and 6 GB
disk in Squid2.

And finally note that several squids can exchange digest information
about cached data if they want. We haven't yet configured for this.

Strictly speaking, a single squid can take the role of an LVSdirector,
but only for HTTP. It's slower, but it works. By accessing one of the
squids in our setup directly, this can be easily demonstrated.

Let's start assembling

I'd suggest that the first thing to do is to set up the four apaches
on Webserver1..4. These servers are the workhorses for the whole
cluster. They are not what LVS terminology calls realservers though.
The realservers according to LVS are the Squids.

We configure the apaches in a completely standard way. The only
deviation from a standard installation here is that we specify

    Port 81

in the httpd.conf. Everything else is the default configuration file
that comes with apache. We are, of course, free to choose any port we
like. It's an old habit of mine to select 81 if a squid is around to
act as accelerator.

We finish this round of assembling with tests that only try to access
Webserver1..4 on port 81 directly. For later testing, I recommend
activating the printenv CGI program that comes with Apache:

    chmod 755 /usr/local/apache/cgi-bin/printenv

This program shows us on which server the script is running
(SERVER_ADDR) and which server appears as the requesting site
(REMOTE_ADDR).

One squid

Next we should configure one Squid box. The second one will mostly be
a replication of the first, so let's first nail that first one down.

When we compile squid 2.3-STABLE4, we already need to decide on
compilation options. Personally I like the features associated with
this configuration:

./configure --enable-heap-replacement --disable-http-violations \
            --enable-cache-digests    --enable-delay-pools 

We can build and install squid with these settings. But before we
start squid, we must go through a 2700-line configuration file and
set lots of options. The following is a collection of diffs between
the squid.conf.default and my squid.conf with comments in between.

--- squid.conf.default  Mon Aug 14 12:04:33 2000
+++ squid.conf  Mon Aug 14 14:34:35 2000
@@ -47 +47 @@
-#http_port 3128
+http_port 80

Yes, we want this squid on port 80 because from outside it looks like
a normal HTTP server.

@@ -54 +54 @@
-#icp_port 3130
+icp_port 0

In the demo installation I turned ICP off, but I'll turn it on again
later. ICP is the protocol that the squids can use to exchange sibling
information about what they have on their disks.

@@ -373 +373 @@
-#cache_mem  8 MB
+cache_mem 700 MB

This is the memory reserved for holding cache data. We have 1 GB total
physical memory and 24 GB disk cache. To manage the disk cache, squid
needs about 150 MB of memory (estimate 6 MB per GB for an average
object size of 13kB). Once you're running, you can use squid's
statistics to find out *your* average object size. I usually leave 1/6
of the memory for the operating system, but at least 100 MB.
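
The arithmetic behind the 700 MB can be written down as a small
calculation. This is only a sketch of the rule of thumb given above,
in Python. The function name cache_mem_budget and its parameters are
made up, and the 6 MB per GB figure holds only for the average object
size mentioned above.

```python
def cache_mem_budget(total_mb, disk_cache_gb, index_mb_per_gb=6, os_floor_mb=100):
    """Split physical memory into disk-cache index, OS reserve and the rest."""
    # Memory squid needs to manage the on-disk cache.
    index_mb = disk_cache_gb * index_mb_per_gb
    # One sixth of the memory for the operating system, but at least the floor.
    os_mb = max(total_mb / 6, os_floor_mb)
    # Whatever remains is the upper bound for cache_mem.
    return index_mb, os_mb, total_mb - index_mb - os_mb

index_mb, os_mb, left_mb = cache_mem_budget(total_mb=1024, disk_cache_gb=24)
print(f"index {index_mb} MB, OS {os_mb:.0f} MB, left for cache_mem {left_mb:.0f} MB")
```

With 1 GB of memory and 24 GB of disk cache this leaves a bit more
than 700 MB, which is where the cache_mem value comes from.
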

@@ -389,2 +389,2 @@
-#cache_swap_low  90
-#cache_swap_high 95
+#cache_swap_low  94
+#cache_swap_high 96
@@ -404 +404 @@
-#maximum_object_size 4096 KB
+maximum_object_size 8192 KB

Please refer to squid's docs for these values.

@@ -463,0 +464,5 @@
+cache_dir ufs /var/squid01 5600 16 256
+cache_dir ufs /var/squid02 5600 16 256
+cache_dir ufs /var/squid03 5600 16 256
+cache_dir ufs /var/squid04 5600 16 256

You do not need bigger disks; you need many disks to speed up squid.
Join the squid mailing list to find out about the efficiency of
filesystem tuning like "noatime" or Reiser FS.

@@ -660 +665 @@
-#redirect_program none
+redirect_program /usr/local/squid/etc/

This is the meat of our usage of squid. This program can be as simple
as you want or as powerful as you want. It can be implemented in any
language and it will be run within a pool of daemons. My program is
written in perl and looks something like the following:

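    #!/usr/bin/perl
    use strict;
    $| = 1;    # unbuffered output, squid waits for each answer line
    while (<>) {
      # squid sends one request per line: URL, client host, ident, method
      my($url,$host,$ident,$method) = split;
      # h=6,7 in the querystring restricts the choice of backends
      my @redir = $url =~ /\bh=([\d,]+);?/ ?
                 split(/,/,$1) : (6,7,8,9); # last components of our IP numbers
      my $redir = $redir[int rand scalar @redir];
      $url =~ s/PLACEHOLDER:81/10.0.0.$redir\:81/i;
      print STDOUT "$url\n";
    }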

This is ideal for testing, because it allows me to request a single
backend server or a set of backend servers to choose from via the CGI
querystring. A request whose querystring contains, for example, h=6
will then be served by the backend apache at 10.0.0.6.
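
For readers who do not speak perl, here is the same selection logic
as a minimal sketch in Python. It is a translation for illustration,
not the program running in the demo. The names rewrite, serve and
BACKENDS are made up, and the input line format (URL, client, ident,
method, separated by whitespace) is the one the perl program splits
on.

```python
import io
import random
import re
import sys

# Last components of the backend IP numbers, as in the perl version.
BACKENDS = ["6", "7", "8", "9"]

def rewrite(line, choose=random.choice):
    """Return the rewritten URL for one redirector input line."""
    fields = line.split()
    if not fields:
        return ""  # answer an empty line for empty input
    url = fields[0]
    # h=6 or h=6,7 in the querystring restricts the pool of backends.
    match = re.search(r"\bh=([\d,]+)", url)
    pool = [p for p in match.group(1).split(",") if p] if match else []
    backend = choose(pool or BACKENDS)
    return re.sub(r"PLACEHOLDER:81", f"10.0.0.{backend}:81", url,
                  count=1, flags=re.IGNORECASE)

def serve(stream=sys.stdin, out=sys.stdout):
    """Answer one line per request and flush, since squid waits for it."""
    for line in stream:
        out.write(rewrite(line) + "\n")
        out.flush()

# Demonstration with an in-memory request instead of squid's pipe.
demo = io.StringIO("http://PLACEHOLDER:81/cgi-bin/printenv?h=6,7 10.0.0.1/- - GET\n")
serve(demo)
```

Passing the choose function as a parameter is only there to make the
random choice replaceable in tests. Under squid one would call
serve() without arguments.
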

@@ -668 +673 @@
-#redirect_children 5
+redirect_children 10

The more complex the redirector program is, the more processes should
be allocated to run it.

@@ -674 +679 @@
-#redirect_rewrites_host_header on
+redirect_rewrites_host_header off
@@ -879 +884 @@
-#replacement_policy LFUDA
+replacement_policy LFUDA
@@ -1168 +1173 @@
-acl Safe_ports port 80 21 443 563 70 210 1025-65535
+acl Safe_ports port 80 81 21 443 563 70 210 1025-65535
@@ -1204 +1209 @@
-http_access deny all
+http_access allow all

For all of the above changes, please refer to the squid.conf.default.

@@ -1370,2 +1375,3 @@
-#httpd_accel_host hostname
-#httpd_accel_port port
+# we will replace with our host of choice
+httpd_accel_host PLACEHOLDER
+httpd_accel_port 81

As we are redirecting everything through the redirector, we can fill
in anything we want. No real hostname, no real port is needed. The
redirector program will have to know what we chose here.

@@ -1377 +1383 @@
-#httpd_accel_with_proxy off
+httpd_accel_with_proxy on

If we want ICP working (and we said we would like to get it working),
we need this turned on.

We're done with our first squid, so we can start it and test it. If
you send a request to this squid, one of the backend servers will
answer according to the redirect policy of the redirector program.

Basically, at this point in time we have a fully working content-based
redirector. As already mentioned, we do not really need LVS to
accomplish this. But the downsides of this approach are:

- we are comparatively slow: squid is not famous for speed.

- we do not scale well: if the bottleneck is the squid, we want LVS
  to scale up.

Another squid

So the next step in our demo is to build another squid. This is
trivial given that we already have one. We just copy the whole
configuration and adjust a few parameters if there are any differences
in the hardware.

Combining pieces with LVS

The rest of the story is to read the appropriate docs for LVS. I have
used Horms's ultramonkey docs and there's nothing to be added for this
kind of setup. Keep in mind that only the squids are to be known by
the LVS box. They are the "realservers" in LVS terminology. The apache
back end servers are only known to the squids' redirector program.


It has been said that LVS is fast and squid is slow, so people believe
they must implement a level 7 switch in LVS to have it faster. This
remains to be proven.

Squid is really slow compared to some of the HTTP servers that are
tuned for speed. If you're serving static content with a kernel HTTP
daemon, you definitely do not want to lose the speed by running it
through a squid.

If you want persistent connections, you need to implement them in
your redirector. If you want to take dead servers out of the pool, you
must implement it in your redirector. If you have a complicated
redirector, you need more of them and thus need more resources.
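
If persistence here means that a given client should keep hitting the
same backend, one cheap way to get it is to derive the choice from the
client address instead of a random number. The following Python
sketch shows the idea. It is not something the demo installation
does. The name sticky_backend is made up, and the client field is
assumed to look like "address/hostname" as squid passes it to the
redirector.

```python
import zlib

def sticky_backend(client, pool):
    """Map a client to one pool member, the same one on every call."""
    if not pool:
        raise ValueError("empty backend pool")
    # Keep only the address part of "address/hostname".
    address = client.split("/")[0]
    # crc32 gives the same value in every redirector process, unlike
    # Python's built-in hash(), which is salted per process for strings.
    return pool[zlib.crc32(address.encode("ascii", "replace")) % len(pool)]

pool = ["6", "7", "8", "9"]
print(sticky_backend("192.0.2.10/-", pool))
```

Note that the mapping changes for many clients as soon as the pool
changes, so this is persistence only as long as no backend is added
or removed.
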

In the above setup, ldirectord monitors just the two squids. A failure
of one of the apaches might go unnoticed, so you need to do something
about this.
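
One possible "something" is to let the redirector itself check the
backends before choosing one. The following is a minimal Python
sketch under several assumptions: a plain TCP connect to port 81
counts as "alive", the names is_alive and live_pool are made up, and
the 10.0.0. prefix is the anonymized one used above. It is an
illustration, not what the demo installation runs.

```python
import socket

def is_alive(host, port=81, timeout=0.5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def live_pool(pool, probe=is_alive, prefix="10.0.0."):
    """Keep only the backends that answer the probe."""
    alive = [p for p in pool if probe(prefix + p)]
    # If nothing answers, fall back to the full pool rather than to nothing.
    return alive or list(pool)

# Demonstration with a fake probe that pretends 10.0.0.7 is down.
print(live_pool(["6", "7", "8", "9"], probe=lambda host: host != "10.0.0.7"))
```

Probing on every request would make a slow redirector even slower, so
in practice the result should be cached for a few seconds or be
maintained by a separate process.
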

If you do not have much cacheable data, for example because of SSL,
things that need to expire immediately, or a high fraction of POST
requests, the squid seems like a waste of resources. I'd say in that
case you just give it less disk space and memory.

Sites that prove unviewable through Squid are a real problem (Joe
Cooper reports there is a stock ticker that doesn't work through
squid). If you have content that cannot be served through a squid,
you're in big trouble--and as it seems, on your own.

