On Fri, 2008-10-31 at 16:57 -0700, Robinson, Eric wrote:
> 2. When I asked about the throughput measurement, I meant HOW to obtain
> that measurement. I know we're looking for packets per second, I just
> don't know the best way to obtain that. There's no snmp on this
> computer. All the counters I can find show totals for packets in and
> out, but not pps. I was hoping to get away without showing my ignorance.
> Too late. :0)
Heh :)
ipvsadm -L -n --stats
ipvsadm -L -n --rate
Those two commands will get you pretty far down the road of what you
need in terms of packets/sec, conns/sec and so on. --rate will give you
the instantaneous rate, whereas --stats will give you cumulative counters
since this LVS was started. The --stats counters are useful for
post-processing to get overall averages.
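If you want to turn the --stats counters into an average yourself, one rough way is to take two readings a known number of seconds apart and divide the difference by the elapsed time. The sketch below only does that arithmetic; the function name and the sample numbers are made up for illustration, and it assumes you have copied the InPkts values out of the ipvsadm output by hand.

```python
def per_second(prev_count, curr_count, elapsed_seconds):
    """Average rate between two cumulative counter readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_count - prev_count) / elapsed_seconds


# Hypothetical InPkts readings from two runs of
# `ipvsadm -L -n --stats`, taken 10 seconds apart (made-up numbers).
first_inpkts = 1_200_000
second_inpkts = 1_204_500

print(per_second(first_inpkts, second_inpkts, 10))  # 450.0 packets/sec
```

The same function works for the Conns column if you want conns/sec instead.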
> 4. The health check interval is 2 seconds. The 60 VS with 2 RS each are
> checking via http. The other 60 VS with 1 RS each check by tcp connect.
...and you also said in reply to Horms:
> It also does the health checking, right? I think the earlier
> suggestion about overlapping requests may have merit.
Given the large number of services and realservers you have, I think
this is the key. Looking at the config file (if I read it correctly) you
have:
60x tomcat services, 2 realservers each == 120 checks
60x MySQL services, 1 realserver each == 60 checks
5x other services, 1 or 2 realservers each == 7 checks
That's a total of 187 checks to be run every two seconds. If we make it
a round 200 (since the maths is then easier!) then you're talking a
budget of at most 0.01 seconds per check (2 seconds / 200 checks,
assuming the checks run one after another).
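For what it's worth, here is that arithmetic spelled out as a small sketch. It assumes the checks run strictly one after another inside a single interval, which is my assumption about how the checking is scheduled here; the function name is just for illustration.

```python
def seconds_per_check(check_interval, total_checks):
    """Time available per check if all checks must fit, serially, in one interval."""
    return check_interval / total_checks


# Counts taken from the list above: tomcat + MySQL + other services.
total_checks = 60 * 2 + 60 * 1 + 7
print(total_checks)                                  # 187

print(round(seconds_per_check(2, total_checks), 4))  # 0.0107
print(seconds_per_check(2, 200))                     # 0.01 (the rounded figure)
```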
It would appear that ldirectord isn't being given a chance to draw
breath. Ever.
Just as a test, what happens if you move the checkinterval out to (say)
5, 10, 20 or 30 seconds? Can you tolerate that level of pause if
something happens to a realserver?
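To give a feel for the trade-off, the sketch below works out the per-check time budget for each of those intervals, again using the round figure of 200 checks and assuming they run one after another (both simplifications of mine). The flip side is that a dead realserver could go unnoticed for roughly as long as the interval you pick.

```python
TOTAL_CHECKS = 200  # rounded up from 187 for easy arithmetic


def budget_ms(check_interval, total_checks=TOTAL_CHECKS):
    """Milliseconds available per check for a given interval in seconds."""
    return check_interval / total_checks * 1000


for interval in (2, 5, 10, 20, 30):
    print(f"checkinterval={interval:>2}s -> about {budget_ms(interval):.0f} ms per check")
```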
Graeme