On Sat, Nov 01, 2008 at 08:53:51AM +0000, Graeme Fowler wrote:
> On Fri, 2008-10-31 at 16:57 -0700, Robinson, Eric wrote:
> > 2. When I asked about the throughput measurement, I meant HOW to obtain
> > that measurement. I know we're looking for packets per second, I just
> > don't know the best way to obtain that. There's no snmp on this
> > computer. All the counters I can find show totals for packets in and
> > out, but not pps. I was hoping to get away without showing my ignorance.
> > Too late. :0)
>
> Heh :)
>
> ipvsadm -L -n --stats
> ipvsadm -L -n --rate
>
> Those two commands will get you pretty far down the road of what you
> need in terms of packets/sec, conns/sec and so on. --rate will give you
> the instantaneous rate, where --stats will give you counters since this
> LVS was started. This is useful for post-processing to get overall
> averages.
>
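If you only have cumulative totals to hand, a rough packets-per-second figure can be had by reading a counter twice and dividing the difference by the elapsed time. A minimal sketch in Python, assuming the totals have already been pulled out of something like the InPkts column of "ipvsadm -L -n --stats"; the function name and the sample numbers are made up for illustration:

```python
def rate_per_second(count_start, count_end, seconds):
    """Average events per second between two cumulative counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_end - count_start) / seconds


# Hypothetical readings of a cumulative packet counter taken 10 seconds apart.
print(rate_per_second(1_200_000, 1_250_000, 10))
```

A longer gap between the two readings smooths out bursts; a shorter one gets closer to what --rate reports.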
> > 4. The health check interval is 2 seconds. The 60 VS with 2 RS each are
> > checking via http. The other 60 VS with 1 RS each check by tcp connect.
>
> ...and you also said in reply to Horms:
>
> > It also does the health checking, right?
Yes. It does health checking and configures LVS accordingly.
> > I think the earlier
> > suggestion about overlapping requests may have merit.
I'm not entirely sure what overlapping means here, but a single
ldirectord process runs checks in series. One runs until it
finishes or times out, then the next one starts. There is no parallelisation.
> Given the large number of services and realservers you have, I think
> this is the key. Looking at the config file (if I read it correctly) you
> have:
>
> 60x tomcat services, 2 realservers each == 120 checks
> 60x MySQL services, 1 realserver each == 60 checks
> 5x other services, 1 or 2 realservers each == 7 checks
>
> That's a total of 187 checks to be run every two seconds. If we make it
> a round 200 (since the maths is then easier!) then you're talking a
> maximum latency of 0.01 seconds per check.
>
> It would appear that ldirectord isn't being given a chance to draw
> breath. Ever.
Ldirectord isn't that smart. Each ldirectord process just sits in a loop
that looks a bit like this:
while (1) {
    run check 1 and wait for it to either succeed or time-out;
    run check 2 and wait for it to either succeed or time-out;
    ...
    run check n and wait for it to either succeed or time-out;
    if configuration file has changed
        if $AUTOCHECK is set
            re-read configuration file;
    else
        sleep $CHECKINTERVAL;
}
So unless something odd is happening with the configuration file,
it should always get a chance to take a breath for $CHECKINTERVAL seconds.
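To put rough numbers on that loop: because the checks run in series, one pass takes the sum of the individual check times plus the sleep. The sketch below is only a back-of-the-envelope model in Python, not ldirectord code. The 10 ms response time and the 5 second timeout are assumptions picked for illustration, and cycle_seconds is a made-up helper name.

```python
def cycle_seconds(check_durations, checkinterval):
    """Length of one pass of the loop: checks run one after another, then sleep."""
    return sum(check_durations) + checkinterval


NUM_CHECKS = 187     # total checks counted earlier in this thread
CHECKINTERVAL = 2    # seconds, as configured

# Assumption: every check answers in 10 ms.
fast = cycle_seconds([0.01] * NUM_CHECKS, CHECKINTERVAL)

# Assumption: one real server is down and its check waits out a 5 s timeout.
slow = cycle_seconds([0.01] * (NUM_CHECKS - 1) + [5.0], CHECKINTERVAL)

print(round(fast, 2), round(slow, 2))
```

Under this model the interval between two checks of the same real server is the whole cycle, not just $CHECKINTERVAL, and a single slow or timing-out check stretches it for everything behind it in the queue.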
> Just as a test, what happens if you move the checkinterval out to (say)
> 5, 10, 20 or 30 seconds? Can you tolerate that level of pause if
> something happens to a realserver?
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en