Hi Salah, Hi Other LVS People,
I spent quite a lot of time looking into this, on a setup somewhat
similar to yours - Red Hat 7, 2GB of memory, Gigabit to the Linux
Director, 5 real servers and 5 clients. Unfortunately, no matter how
much I load the system - at one stage I had 3,000,000 inactive
connections - I don't seem to be able to reproduce the problem.
My main suggestions are:
1. Use the kernel-bigmem-2.4.20-28.7.um.1.i686.rpm kernel from
UltraMonkey.Org, if you are not already.
2. You might want to try not using ldirectord and just configuring
ipvs manually. Obviously this is not that desirable, as a failed
real-server won't be detected. However, under very high load I
noticed that ldirectord took servers offline that were actually
alive and well - it's a scheduling/timeout issue.
While I don't think this would cause the problem you described,
it might be worth checking.
As an aside, I did check to see what happens when the system
runs out of memory. Basically, the kernel will kill off user-space
programs like ldirectord, but LVS itself will keep functioning fine.
Some new connections may be refused, but existing ones seem fine.
But as you have plenty of memory that is neither here nor there.
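To illustrate suggestion 2, here is a rough sketch of what a manual
ipvs configuration might look like. It only prints the ipvsadm commands;
it does not run them. The virtual IP, the real-server addresses, the rr
scheduler and the -g (direct routing) forwarding method are all
placeholders I made up for the example, so substitute whatever your
ldirectord configuration currently uses.

```python
# Sketch only: builds ipvsadm command lines for a static LVS setup.
# The VIP, real-server addresses, scheduler and forwarding method
# below are hypothetical placeholders, not values from this thread.

def ipvsadm_commands(vip, port, real_servers, scheduler="rr", forward="-g"):
    """Return the ipvsadm commands for one TCP virtual service."""
    service = f"{vip}:{port}"
    # -A adds the virtual service, -t marks it as TCP, -s picks the scheduler.
    commands = [f"ipvsadm -A -t {service} -s {scheduler}"]
    for rip in real_servers:
        # -a adds a real server; -g is direct routing, -m is NAT,
        # -i is tunnelling; -w sets the weight.
        commands.append(
            f"ipvsadm -a -t {service} -r {rip}:{port} {forward} -w 1"
        )
    return commands


if __name__ == "__main__":
    reals = [f"192.168.0.{n}" for n in range(11, 16)]  # five real servers
    for cmd in ipvsadm_commands("192.168.0.100", 80, reals):
        print(cmd)
```

Running the printed commands as root on the Linux Director (after
clearing the existing table with ipvsadm -C) would give a static
configuration with no health checking, which takes ldirectord out of
the picture for the duration of the test.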
I also asked my colleague Kurosawa-san, who has done a lot of
performance testing of LVS in the past. He hadn't seen this kind of
behaviour - where all active connections are dropped - before, so
unfortunately he didn't have any ideas either.