On Mon, Nov 03, 2008 at 10:08:51AM +1100, Simon Horman wrote:
> On Sat, Nov 01, 2008 at 02:25:54PM +0000, Graeme Fowler wrote:
> > > Which raises a question about LVS. Could it get confused with multiple
> > > ldirectord instances constantly forking ipvsadm?
> > As long as they are managing discrete pools of virtual & real servers,
> > then no I don't think it will *unless* you hit the problem someone else
> > reported very recently where realservers seem to migrate between
> > virtuals at random. Horms was going to try to work on that, but it might
> > be tricky to isolate.
> ldirectord (or any other code that manages LVS from user-space) may get
> confused if one process is reading things and another is changing things
> for the same virtual server - though as Graeme says, if they are managing
> discrete pools this should not be a problem, with the caveat that there
> seems to be a bug in that code in ldirectord.
> It is not possible to confuse LVS itself (unless there is a bug I don't
> know about). It just does what it is configured to do. And it uses locking to
> ensure that only one user-space process can change things at a time. So
> even if user-space is making multiple changes simultaneously (on multiple
> processors or cores) to the same real server in the same virtual service,
> the LVS kernel code will serialise these changes and something sensible
> should result - albeit perhaps not what the multiple user-space processes
> were expecting.
> In other words, LVS serialises changes from user-space.
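By analogy only (this is not LVS code), a minimal Python sketch of what
serialised updates look like: many concurrent callers race to change the
same entry, a lock admits one change at a time, and the end state is
consistent even if it is not what any single caller expected. All names
here are made up for illustration.

```python
import threading

# Hypothetical stand-in for the kernel's table of real-server weights.
table = {}
table_lock = threading.Lock()

def set_weight(real_server, weight):
    # Only one "user-space" change is applied at a time, much as the
    # LVS kernel code serialises changes with its own locking.
    with table_lock:
        table[real_server] = weight

threads = [
    threading.Thread(target=set_weight, args=("192.168.0.1:80", w))
    for w in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The table holds exactly one of the requested weights - sensible,
# though perhaps not what every caller was expecting.
print(table["192.168.0.1:80"] in range(10))
```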
> > For such a large number of realservers I think you may need to get
> > creative with your healthchecking. You could use the "checkcommand"
> > setting to ldirectord to read a value from a file which is kept updated
> > by some other script which can check in parallel. Unfortunately I can't
> > pull one of those out of a hat right now... :)
> Yes, I agree that some sort of creativity is in order.
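To make the "read a value from a file" idea concrete, here is a rough
sketch of the companion script that would do the real checks in
parallel. It probes each real server with a plain TCP connect using a
thread pool; a real version would write one status file per server so
that the ldirectord check command only has to test for that file. The
paths, hosts and ports below are placeholders, not anything ldirectord
actually ships with.

```python
# Probe many real servers concurrently; ldirectord's check command
# would then only read the result, keeping each check cheap.
import concurrent.futures
import socket

STATUS_DIR = "/var/run/rs-status"  # hypothetical location

def probe(host, port, timeout=2.0):
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_all(servers, workers=50):
    """Probe all (host, port) pairs in parallel; return {server: bool}."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(lambda s: probe(*s), servers)
        return dict(zip(servers, results))

if __name__ == "__main__":
    servers = [("127.0.0.1", 9), ("127.0.0.1", 10)]  # placeholder list
    for (host, port), up in check_all(servers).items():
        # A real script would touch or remove a file under STATUS_DIR
        # here, and the ldirectord check command would just test for it.
        print(host, port, "up" if up else "down")
```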
> I did some work on making ldirectord more scalable, but that was a long
> time ago, and for a somewhat different scenario. The main outcome of that
> work was fwmark support in both LVS and ldirectord, which allowed many
> virtual services with the same real servers to be aggregated.
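For anyone unfamiliar with that aggregation trick, it looks roughly like
the fragment below: packets for several VIPs are given the same firewall
mark, and LVS then treats everything with that mark as a single virtual
service with one set of real servers. The addresses are placeholders;
adapt to taste.

```
# Mark traffic to two example VIPs with the same firewall mark.
iptables -t mangle -A PREROUTING -d 10.0.0.1/32 -p tcp --dport 80 \
         -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -d 10.0.0.2/32 -p tcp --dport 80 \
         -j MARK --set-mark 1

# One fwmark-based virtual service covering both VIPs, with shared
# real servers (direct routing).
ipvsadm -A -f 1 -s wlc
ipvsadm -a -f 1 -r 192.168.1.10:0 -g
ipvsadm -a -f 1 -r 192.168.1.11:0 -g
```

With this in place ldirectord can health-check one fwmark service
instead of one service per VIP.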
> > Thinking about it laterally, how does something like Nagios cope with a
> > very large number of service checks? It does them in parallel, by
> > running multiple threads. So does OpenNMS, and Zabbix, and in fact
> > pretty much every one of the decent (fsvo "decent") NMS apps I've ever
> > used.
> > Making ldirectord threaded and parallel however isn't likely to start
> > working straight away! Anyone fancy a stab at that?
> As ldirectord is written in Perl, doing non-blocking IO to parallelise
> things is difficult - or more to the point, it appeared not to work the
> last time it was tried. I believe that keepalived, which is written in C,
> has an easier time here.
> On the other hand ldirectord does have a forking option, which parallelises
> things by forking a process for each virtual service. Though now
> that I think about it, it might be better if it used a pool of processes:
> if you have 50 virtual services it will try to fork 50 processes for
> each iteration of the main loop!
I misread the code: the processes should only be forked on startup,
and then re-forked if they die, not forked for each iteration
of the main loop. But still, a pool might be a good idea, albeit
more complex than the current code.
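For what it's worth, a pool along those lines might look like this
sketch (illustrative only, not ldirectord code, which is Perl): a fixed
number of worker processes service an arbitrary number of virtual
services, rather than one long-lived process per service.

```python
# Sketch: a pool of N workers checking M virtual services, M >> N,
# instead of forking one process per virtual service on startup.
# The service names and the "check" itself are placeholders.
from multiprocessing import Pool

def check_virtual_service(vs):
    # Stand-in for one round of health checks for one virtual service.
    return (vs, "ok")

def run_checks(virtual_services, pool_size=8):
    # With 50 services and pool_size=8 this forks 8 processes, not 50.
    with Pool(processes=pool_size) as pool:
        return dict(pool.map(check_virtual_service, virtual_services))

if __name__ == "__main__":
    services = ["vip%d:80" % i for i in range(50)]
    results = run_checks(services)
    print(len(results))
```

The trade-off is as noted above: the pool bounds the process count, but
the dispatch logic is more complex than fork-per-service.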
> It also allows you to split up the configuration file manually and fork
> at that granularity at start-up.
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en