To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Redundant Load balanced cluster,
From: Jan Klopper <janklopper@xxxxxxxxx>
Date: Wed, 16 Feb 2005 09:22:36 +0100
ipvsuser wrote:

>Jan Klopper wrote:
>  
>
>>I just bought 5 new servers to replace/extend my old 2 webservers.
>>    
>>
>...
>  
>
>>Would this be a good setup?
>>The 2 old servers now in service could have their websites migrated to
>>the cluster, and then later on join the cluster.
>>The internal LAN will be 100 Mbit, and the internet on both routers 100 Mbit.
>>Heartbeats between the routers and between the mysql servers will be serial.
>>    
>>
>What about another NIC connected via a small hub instead of serial? It could
>be more flexible than serial, and easier to configure or adjust remotely.
>  
>

I'm already using a second internal LAN, but I might add extra NICs.
Since I'm already running on two separate switches, I can just create
a heartbeat VLAN.
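
Roughly what I have in mind for /etc/ha.d/ha.cf on the two routers -- just a
sketch, device and node names are made up:

  # existing serial link
  serial /dev/ttyS0
  baud 19200
  # second path over the heartbeat vlan
  bcast eth1
  udpport 694
  keepalive 2
  deadtime 10
  node router1
  node router2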

>  
>
>>The application which I am going to run on this is a really database-
>>heavy php application, so caching with squids won't do much good.
>>    
>>
>
>I just managed a commercial site with almost that exact configuration
>over the holidays - on the busiest day, Dec 23rd, we pumped about 10 GB
>of data out from each of the 4 servers during business hours, at least a
>few million hits per server.
>
>  
>
I'm already doing a hefty 10 GB per day on the old win2k dual P2-450
machine, with an average of 400 queries/sec.
I think my cluster will need to handle an average of 6000 queries per sec.

>We had:
>* 2 schedulers using tun,
>* 4 web apache 2.0.X content servers with all site content rsync'd onto
>each server when content changes were made,
>* 1 netapp with 480 GB of user files connected to the 4 content servers
>via nfs (backed up to slow raid system)
>* 1 mysql running about 2000 "questions"/sec peak with a cold backup
>(being converted to warm)
>  
>
I was planning on using an NFS server, which would double as the mysql
slave (for data gathering), but the server proves to be horribly unstable.
It will keep running, but sometimes the scsi controller gives up and
brings down linux, and while booting it might forget it's a dual machine,
or only count half its memory. Not very data center proof. I'm ditching
it; I might build a different server in the same case.

>ipvs created near zero cpu load on the schedulers at peak, but hung a
>couple of times in a two month period with no sign of why; the whole site
>was just dead.
># ipvsadm -C;ipvsadm -R blah.youdidbackuptheconfig
>fixed it. I would set up automated checks that try to evaluate whether they
>can see the realservers responding but not the vip, and try the above
>after exhausting other revival methods. Also, that is why it is nice to
>have the real servers bound to another address besides the vip - you can
>test for liveness somewhat outside of the ipvs world.
>  
>
So this is one of the harder things to figure out, right?
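
If I understand you, the check on the director would be something roughly
like this (just a sketch; the addresses, the /alive.html page and the rules
file path are made up -- the rules file is whatever "ipvsadm -S" saved):

  #!/bin/sh
  VIP=http://10.20.10.80:90/alive.html
  REAL1=http://10.20.10.21:90/alive.html
  REAL2=http://10.20.10.22:90/alive.html
  ok() { wget -q -T 5 -t 1 -O /dev/null "$1"; }
  if ! ok "$VIP" && ok "$REAL1" && ok "$REAL2"; then
      # realservers answer on their own addresses but the vip is dead:
      # flush the table and reload the saved config
      ipvsadm -C
      ipvsadm -R < /etc/lvs/ipvsadm.rules
  fi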

>We were running mod_perl, so I had the individual children only live for
>so many requests so they didn't kill the machine - apache uptime
>improved after having the children only live for a few hours worth of
>requests. Keep alive on the apache servers saves setup/teardown effort,
>I would watch the extended server-status and could see multiple requests
>coming through the same child/connection as people navigated through
>the site. You start to see when the keep alive time is too long and the
>children are there waiting. I kept one server with keep alive turned off
>to be more likely to have open slots if a burst of home page
>"looky-lu's" came in.
>  
>
I'll be running an accelerated php version, and I'm not having a whole lot
of problems with the uptimes so far. I think mod_perl is the bitch in
your example, which I won't need to deal with.
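
For reference, I take it the child recycling and keepalive tuning you
describe come down to the usual httpd.conf knobs, roughly (the numbers are
just a guess):

  MaxRequestsPerChild  10000
  KeepAlive            On
  MaxKeepAliveRequests 100
  # keep this short so idle children free up quickly
  KeepAliveTimeout     5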

>Check out spread for the apache logs instead of logging to nfs
>http://www.lethargy.org/mod_log_spread/ and use rsync for static
>content if you can; even if it changes every hour, you only need to
>create the script once. They also have a project doing https session
>sharing if you are doing https, but it looks stagnant.
>  
>
I don't think I will be creating a whole lot of apache logs, since I'm
already recording most hits in my application; only error logging might
be needed.
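
For the static content I'll probably just cron an rsync push from a master
copy to each realserver, something like (hostnames and paths are made up):

  rsync -az --delete /var/www/htdocs/ web1:/var/www/htdocs/
  rsync -az --delete /var/www/htdocs/ web2:/var/www/htdocs/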

>If you are running the director on the same machine(s) as a content/real
>server, make sure your network parameters are beefy:
>http://www.web100.org/
>  
>
Thanks, I'll try that.
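
I guess that mostly comes down to bumping the usual socket buffer sysctls
on the director; something like this (the numbers are just a guess):

  sysctl -w net.core.rmem_max=8388608
  sysctl -w net.core.wmem_max=8388608
  sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"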

>I used a cronjob to record ipvsadm -l -n --rate every few minutes so I
>could go back and review director/dispatcher reaction to different
>conditions, server outages and re-ups, etc.
>
>And do yourself a favor, set up your ipvs config using a simple web
>server first, one where you can go to a directory and assign an ip and
>port to bind to on the command line. You can isolate any ipvs config
>problems vs apache, etc problems. I use gatling
>http://www.fefe.de/gatling/ , because it screams and I can load test the
>hell out of my ipvs set up to pre-stress everything except apache.
>Example:
>1 director, 2 web servers; all 3 are old $100 P-III 650 MHz, 384 MB
>RAM Compaq EN's running FC 3, not tuned, on a 3com/100mb hub (not a switch):
>.21/.22 # cd /test/docs
>.21/.22 # /opt/diet/bin/gatling -V -E -P 4M -F -d -i 10.20.10.80 -p 90
>
>.30 # ipvsadm -l -n
>IP Virtual Server version 1.2.0 (size=4096)
>Prot LocalAddress:Port director Flags
>  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>TCP  10.20.10.80:90 rr
>  -> 10.20.10.22:90               Route   1      0          0
>  -> 10.20.10.21:90               Route   1      0          0
>
>.31 # nohup ab -n 1000000 -c 120 -v 2 10.20.10.80:90/simple.sh
>Server Software:        Gatling/0.7
>Server Hostname:        10.20.10.80
>Server Port:            90
>
>Document Path:          /simple.sh
>Document Length:        108 bytes
>
>Concurrency Level:      120
>Time taken for tests:   287.195 seconds
>Complete requests:      1000000
>Failed requests:        0
>Broken pipe errors:     0
>Total transferred:      281017422 bytes
>HTML transferred:       108006696 bytes
>Requests per second:    3481.95 [#/sec] (mean)
>Time per request:       34.46 [ms] (mean)
>Time per request:       0.29 [ms] (mean, across all concurrent requests)
>Transfer rate:          978.49 [Kbytes/sec] received
>
>Connnection Times (ms)
>             min  mean[+/-sd] median   max
>Connect:        0    12   99.8      9  9017
>Processing:     1    22    5.9     20   244
>Waiting:        0    21    5.9     20   244
>Total:          1    34  100.2     30  9050
>
>Percentage of the requests served within a certain time (ms)
> 50%     30
> 66%     31
> 75%     33
> 80%     35
> 90%     39
> 95%     42
> 98%     46
> 99%     50
>100%   9050 (last request)
>
>
>  
>
Okay, thanks,
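
If I recreate your gatling test, I guess the service setup on the director
and the stats cronjob would be roughly this (addresses copied from your
example, log and binary paths made up):

  # create the virtual service and add both realservers (direct routing)
  ipvsadm -A -t 10.20.10.80:90 -s rr
  ipvsadm -a -t 10.20.10.80:90 -r 10.20.10.21:90 -g
  ipvsadm -a -t 10.20.10.80:90 -r 10.20.10.22:90 -g

  # crontab entry to record the rate counters every 5 minutes
  */5 * * * * /sbin/ipvsadm -l -n --rate >> /var/log/ipvs-rate.log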

As soon as I have the machines, and have Gentoo running on them, I'll get
back to you guys.
