
Re: Redundant Load balanced cluster,

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Redundant Load balanced cluster,
From: ipvsuser <ipvsuser@xxxxxxxxxxxxxxxx>
Date: Tue, 15 Feb 2005 08:28:53 -0800
Jan Klopper wrote:
> I just bought 5 new servers to replace/extend my old 2 webservers.
> Would this be a good setup?
> The 2 old servers now in service could have their websites migrated to
> the cluster, and then later on join the cluster.
> Internal lan will be 100mbit, internet on both the routers 100mbit.
> Heartbeats between routers and between mysqls will be serial.
What about another NIC instead of serial, connected via a small hub? It
could be more flexible than serial and easier to configure or adjust
remotely.

> The application which I am going to run on this is a really database
> heavy php application, caching with squids won't do much good.

I just managed a commercial site with almost that exact configuration
over the holidays. On the busiest day, Dec 23rd, we pumped about 10 GB
of data out of each of the 4 servers during business hours, at least a
few million hits per server.

We had:
* 2 schedulers using tun,
* 4 web apache 2.0.X content servers with all site content rsync'd onto
each server when content changes were made,
* 1 netapp with 480 GB of user files connected to the 4 content servers
via nfs (backed up to slow raid system)
* 1 mysql running about 2000 "questions"/sec peak with a cold backup
(being converted to warm)

ipvs created near-zero cpu load on the schedulers at peak, but it hung a
couple of times in a two-month period with no sign of why; the whole site
was just dead.
# ipvsadm -C; ipvsadm -R < blah.youdidbackuptheconfig
fixed it (ipvsadm -R reads the saved rules from stdin). I would have
automated checks that evaluate whether they can see the realservers
responding but not the vip, and that try the above after exhausting other
revival methods. That is also why it is nice to have the real servers
bound to another address besides the vip: you can test for liveness
somewhat outside of the ipvs world.
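
A rough sketch of that kind of check, in Python, assuming a plain TCP connect is a good enough liveness probe. The function names, the port, and the rules file path are all hypothetical, and where the vip probe runs from (the director itself or an outside host) is left open. It only reloads when the vip is dead while at least one realserver still answers on its own address; it is meant as a last resort, not a replacement for the other revival methods.

```python
import socket
import subprocess


def tcp_alive(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_reload(vip_ok, realservers_ok):
    """Reload only when the vip is dead but at least one realserver answers."""
    return (not vip_ok) and any(realservers_ok)


def reload_ipvs(saved_rules_path):
    """Clear the ipvs table, then restore the saved rules (ipvsadm -R reads stdin)."""
    subprocess.run(["ipvsadm", "-C"], check=True)
    with open(saved_rules_path, "rb") as rules:
        subprocess.run(["ipvsadm", "-R"], stdin=rules, check=True)


def check_once(vip, realservers, port, saved_rules_path):
    """Probe the vip and each realserver once; reload ipvs if only the vip is dead."""
    vip_ok = tcp_alive(vip, port)
    realservers_ok = [tcp_alive(rs, port) for rs in realservers]
    if should_reload(vip_ok, realservers_ok):
        reload_ipvs(saved_rules_path)
        return True
    return False
```

If every realserver is down as well, the sketch does nothing, since reloading the ipvs table would not help in that case.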

We were running mod_perl, so I had the individual children live for only
so many requests so they didn't kill the machine; apache uptime
improved once the children lived for only a few hours' worth of
requests. Keep alive on the apache servers saves setup/teardown effort.
I would watch the extended server-status and could see multiple requests
coming through the same child/connection as people navigated through
the site. You start to see when the keep alive time is too long and the
children are sitting there waiting. I kept one server with keep alive
turned off so it was more likely to have open slots if a burst of home
page "looky-loos" came in.

Check out spread for the apache logs instead of logging to nfs, and use
rsync for static content if you can. Even if the content changes every
hour, you only need to create the script once. They also have a project
doing https session sharing if you are doing https, but it looks stagnant.

If you are running the director on the same machine(s) as a content/real
server, make sure your network parameters are beefy.

I used a cronjob to record ipvsadm -l -n --rate every few minutes so I
could go back and review director/dispatcher reaction to different
conditions, server outages and re-ups, etc.
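
Something along these lines would do for the recording job; this is a sketch in Python rather than the script I actually used, and the function name and log path are made up. It appends a timestamped copy of the command output to a log file each time cron runs it.

```python
import subprocess
import time


def record_snapshot(log_path, command=("ipvsadm", "-l", "-n", "--rate")):
    """Append one timestamped snapshot of the command's output to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    result = subprocess.run(list(command), capture_output=True, text=True)
    with open(log_path, "a") as log:
        # The header line makes it easy to find a given time when reviewing later.
        log.write(f"=== {stamp} exit={result.returncode} ===\n")
        log.write(result.stdout)
        if result.stderr:
            log.write(result.stderr)
    return result.returncode
```

Calling record_snapshot("/var/log/ipvs-rate.log") (a hypothetical path) from a small script run by cron every few minutes gives you a history you can grep through after an outage.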

And do yourself a favor: set up your ipvs config using a simple web
server first, one where you can go to a directory and assign an ip and
port to bind to on the command line. That way you can isolate ipvs config
problems from apache problems, etc. I use gatling, because it screams and
I can load test the hell out of my ipvs setup to pre-stress everything
except apache.
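
If you don't have gatling handy, a stand-in can be sketched with Python's standard library. This is not gatling and will not come close to its speed, so treat it as a way to check the ipvs config rather than to load test it. The make_server and main names and the --bind/--port/--dir options are my own for illustration.

```python
import argparse
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(bind_ip, port, directory):
    """Build an HTTP server that serves files from directory on bind_ip:port."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((bind_ip, port), handler)


def main(argv=None):
    """Parse the bind address, port and directory, then serve until interrupted."""
    parser = argparse.ArgumentParser(description="Tiny static web server for ipvs testing")
    parser.add_argument("--bind", default="0.0.0.0", help="ip address to bind to")
    parser.add_argument("--port", type=int, default=90, help="tcp port to listen on")
    parser.add_argument("--dir", default=".", help="directory to serve")
    args = parser.parse_args(argv)
    with make_server(args.bind, args.port, args.dir) as httpd:
        httpd.serve_forever()
```

Add the usual call to main() at the bottom to run it as a script. Binding to a port below 1024, such as 90, normally needs root.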

Test setup: 1 director and 2 web servers. All 3 are old, $100, P-III
650 MHz, 384 MB RAM Compaq ENs running FC 3, not tuned, on a 3Com 100 Mb
hub (not a switch):
.21/.22 # cd /test/docs
.21/.22 # /opt/diet/bin/gatling -V -E -P 4M -F -d -i -p 90

.30 # ipvsadm -l -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP rr
  ->               Route   1      0          0
  ->               Route   1      0          0

.31 # nohup ab -n 1000000 -c 120 -v 2
Server Software:        Gatling/0.7
Server Hostname:
Server Port:            90

Document Path:          /
Document Length:        108 bytes

Concurrency Level:      120
Time taken for tests:   287.195 seconds
Complete requests:      1000000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      281017422 bytes
HTML transferred:       108006696 bytes
Requests per second:    3481.95 [#/sec] (mean)
Time per request:       34.46 [ms] (mean)
Time per request:       0.29 [ms] (mean, across all concurrent requests)
Transfer rate:          978.49 [Kbytes/sec] received

Connnection Times (ms)
             min  mean[+/-sd] median   max
Connect:        0    12   99.8      9  9017
Processing:     1    22    5.9     20   244
Waiting:        0    21    5.9     20   244
Total:          1    34  100.2     30  9050

Percentage of the requests served within a certain time (ms)
 50%     30
 66%     31
 75%     33
 80%     35
 90%     39
 95%     42
 98%     46
 99%     50
100%   9050 (last request)
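
For anyone wanting to sanity-check the summary figures, they follow from the raw counts in the report. A small sketch in Python; the per-server number assumes the two realservers split the load evenly under rr, which the output above does not show directly.

```python
# Figures copied from the ab report above.
complete_requests = 1_000_000
seconds = 287.195
concurrency = 120
total_bytes = 281_017_422
realservers = 2  # assumption: both gatling servers take an equal share under rr

# Throughput as seen by the client through the director.
requests_per_second = complete_requests / seconds

# Mean latency per request as seen by each of the 120 concurrent clients.
ms_per_request = concurrency * seconds / complete_requests * 1000

# Mean time per request across all concurrent clients.
ms_per_request_overall = seconds / complete_requests * 1000

# Dividing by 1000 (not 1024) is what reproduces the reported Kbytes/sec.
kbytes_per_second = total_bytes / seconds / 1000

# Rough share of the load per realserver, under the even-split assumption.
per_server_rps = requests_per_second / realservers

print(f"{requests_per_second:.2f} requests/sec through the director")
print(f"{ms_per_request:.2f} ms per request (mean)")
print(f"{ms_per_request_overall:.2f} ms per request (across all concurrent requests)")
print(f"{kbytes_per_second:.2f} Kbytes/sec received")
print(f"{per_server_rps:.0f} requests/sec per realserver (assuming an even split)")
```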
