On Fri, Oct 31, 2008 at 10:52:13PM +0000, Graeme Fowler wrote:
> On Fri, 2008-10-31 at 14:57 -0700, Robinson, Eric wrote:
> > > What sort of packet throughput are you getting?
> > How would you like that measured?
> Packets/sec in and packets/sec out on the director is usually a good
> bet :)
> > > Are you using LVS-DR or LVS-NAT?
> > LVS-NAT
> Right... NAT makes the CPU work harder than DR because, well, it's doing
> more work. If that isn't self-evident, say so, and I'll explain further.
> > Aside from running heartbeat and ldirectord with 100+ virtual servers,
> > not too much. Here's the output from top:
> > top - 13:43:47 up 81 days, 9:12, 1 user, load average: 1.40, 1.42, 1.38
> > Tasks: 60 total, 1 running, 59 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 46.8% us, 3.0% sy, 0.0% ni, 48.8% id, 0.0% wa, 1.3% hi, 0.0% si
> > Mem: 516304k total, 506348k used, 9956k free, 45448k buffers
> > Swap: 1048568k total, 4k used, 1048564k free, 369656k cached
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 2762 root 17 0 13708 9884 1744 S 50.4 1.9 13386:29 ldirectord
> Whoa there, horsey!
> 81 days uptime is 116640 minutes; that means ldirectord has consumed more
> than 10% of the CPU in the time the server's been up. What's the health
> check interval here?
> With (say) 100 virtual servers, 2 realservers each, an interval of 10
> seconds means 200 checks every ten seconds (nominally). Assuming a 0.1
> second latency for each check, you're talking overlapping checks there
> so a given check thread is only half way through running when it starts
> I can see some tuning being required here - or trying to make ldirectord
> fork and thread correctly (if it doesn't already). Horms, can you
> comment here?
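The arithmetic quoted above can be checked with a short sketch. The uptime and TIME+ figures come from the top output; the check load (100 virtual servers, 2 realservers each, 10 second interval, 0.1 second latency) is Graeme's hypothetical, not a measurement.

```python
# Figures taken from the quoted top output.
uptime_minutes = 81 * 24 * 60      # 81 days of uptime, ignoring the extra 9h12m
cpu_minutes = 13386 + 29 / 60      # TIME+ for ldirectord, read as minutes:seconds

cpu_share = cpu_minutes / uptime_minutes

# Hypothetical check load from the quoted example (assumed, not measured).
virtual_servers = 100
real_servers_each = 2
interval_s = 10
latency_s = 0.1

checks_per_cycle = virtual_servers * real_servers_each
serial_cycle_s = checks_per_cycle * latency_s  # time to run every check back to back

print(f"uptime: {uptime_minutes} minutes")
print(f"ldirectord CPU share since boot: {cpu_share:.1%}")
print(f"serial check time per cycle: {serial_cycle_s:.0f}s, interval: {interval_s}s")
```

If the assumed numbers are anywhere near reality, one serial pass over the checks takes longer than the interval itself.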
It seems odd to me that ldirectord would take up so much CPU;
it's primarily either a) sending small amounts of data and waiting
for a reply or b) sleeping. So if it is consuming lots of CPU,
I suspect a bug, probably in one of the checks (or more specifically
one of the modules that is used for one of the checks). There have
been problems with the HTTPS check leaking memory in the past, so I
would start by seeing if that is the culprit.
In answer to the multi-threading question: no, ldirectord is not
multi-threaded, though you can split your configuration up into
multiple configuration files and run multiple instances of ldirectord.
It can handle the forking for you, or you can do it manually.
Somewhere in the thread it was suggested that you could split your
configuration up so that you have one ldirectord process per virtual
service as a means of attempting to narrow down the problem. I think that
this is a good idea.
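As a rough sketch of the manual split, something like the following could turn one configuration into one configuration per virtual service. It assumes the usual ldirectord.cf layout (global directives unindented, each "virtual=" line followed by its indented options) and that each virtual address appears only once; the function name is hypothetical and this is not part of ldirectord.

```python
def split_ldirectord_config(text):
    """Return {virtual_address: config_text}, one entry per virtual service.

    Assumes global directives are unindented and that the options of each
    virtual service are indented under its "virtual=" line.
    """
    global_lines = []
    blocks = {}      # virtual address -> lines belonging to that service
    current = None   # address of the block currently being read, if any

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are dropped for brevity
        if line[0].isspace():
            if current is None:
                raise ValueError(f"indented line outside a virtual block: {line!r}")
            blocks[current].append(line)
            continue
        key, _, value = stripped.partition("=")
        if key.strip() == "virtual":
            current = value.strip()
            blocks[current] = [line]
        else:
            global_lines.append(line)  # global directive, copied to every file
            current = None

    return {
        address: "\n".join(global_lines + lines) + "\n"
        for address, lines in blocks.items()
    }
```

Each returned text could then be written to its own file and given to its own ldirectord instance, which makes it easier to see in top which virtual service is burning the CPU.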
A long time ago there was an effort to use non-blocking IO to allow
ldirectord to run multiple checks in parallel in a single process.
However, the code (in the supporting modules) did not work well.
The primary motivation for parallelising ldirectord either within
a single process or with multiple processes is usually to minimise
the delays inherent in running checks serially. This would actually
result in increased CPU usage - as it would be doing more work in
a given space of time.
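To illustrate why parallelising means more work per unit of time, here is a toy model. It assumes each process runs its share of the checks back to back and then sleeps for the check interval before starting again; that is a simplification for the sake of the example, and the numbers are the hypothetical ones from earlier in the thread.

```python
def checks_per_second(total_checks, latency_s, interval_s, processes=1):
    """Checks completed per second when the checks are spread over `processes`.

    Toy model: every process runs its checks serially, then sleeps for
    `interval_s` before the next pass.
    """
    per_process = -(-total_checks // processes)  # ceiling division
    cycle_s = per_process * latency_s + interval_s
    return total_checks / cycle_s


serial = checks_per_second(200, 0.1, 10)                 # one process
split = checks_per_second(200, 0.1, 10, processes=100)   # one per virtual service

print(f"single process: {serial:.1f} checks/s")
print(f"100 processes:  {split:.1f} checks/s")
```

Under this model the split setup completes roughly three times as many checks per second, so if each check costs about the same CPU, total CPU usage goes up rather than down.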
With regard to LVS, it is almost certainly not the cause of ldirectord
taking up 50% of CPU. ldirectord only configures LVS. And this is done by
forking an ipvsadm process. So if there were a problem with ldirectord
configuring LVS, it should show up as ipvsadm processes consuming lots of
resources. (Although I guess it is possible that ldirectord is having
trouble forking ipvsadm.)
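One way to look for that would be to sample the process table for ldirectord and ipvsadm. This is only a sketch: it assumes a procps-style ps that accepts "-eo pid,pcpu,comm", and the helper names are made up for the example.

```python
import subprocess


def parse_ps(output, names=("ldirectord", "ipvsadm")):
    """Extract (pid, %cpu, command) rows for the named commands from ps output."""
    rows = []
    for line in output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        if comm.strip() in names:
            rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def snapshot():
    """Run ps once and return the matching rows (assumes procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Since ipvsadm processes are short-lived, a single call to snapshot() may well miss them; calling it repeatedly over a few minutes would give a better picture.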
As Joe stated elsewhere in this thread, LVS is part of the Linux
kernel, which is able to use multiple CPUs. This is a fairly complex topic,
and how well it can utilise multiple CPUs depends on many issues, including
how many NICs are involved.
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en