Yo!
We've had LVS machines dying a couple of times when the service is using
the wrr scheduler and keepalived pulls all real servers from behind the
service IP.
The symptom is a flood of messages in syslog (thousands, apparently one
for every packet?):
ip_vs_wrr_schedule(): no available servers
After which the machine hangs. I don't recall if I've had to reboot it
manually or if it reboots by itself.
Also, I'm not sure if it is that message that is killing the machine,
but the problem hasn't occurred with other schedulers (that don't print
such a message). We use wrr the most though.
I think we should either remove the message or ratelimit it (unless the
bug is somewhere else). I tested the patch and it seems to be ok, but as
I'm unable to reproduce the hanging/crashing in a test environment, I
can't verify whether it actually helps.
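
To illustrate the ratelimiting idea suggested above, here is a minimal
userspace sketch. It is not the kernel code: the struct and function
names are made up for this example, and the interval and burst values
are arbitrary. In the kernel the equivalent check would be
net_ratelimit(), which, as far as I can tell, is what IP_VS_DBG_RL
wraps around printk() (and only when CONFIG_IP_VS_DEBUG is enabled).

```c
#include <stdio.h>

/* Hypothetical userspace rate limiter: at most `burst` messages per
 * `interval` seconds; anything beyond that is counted and dropped. */
struct ratelimit {
	long interval;     /* window length, in seconds */
	int  burst;        /* messages allowed per window */
	long window_start; /* time at which the current window began */
	int  printed;      /* messages let through in this window */
	long missed;       /* messages suppressed in this window */
};

/* Returns 1 if the caller may log at time `now`, 0 if it should stay quiet. */
static int ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* The window has expired: start a new one. */
		rl->window_start = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}

/* Simulate `n` scheduling failures at time `now`; returns how many
 * of them actually produced a log line. */
static int log_failures(struct ratelimit *rl, long now, int n)
{
	int logged = 0;

	for (int i = 0; i < n; i++) {
		if (ratelimit_ok(rl, now)) {
			fprintf(stderr, "ip_vs_wrr_schedule(): "
					"no available servers\n");
			logged++;
		}
	}
	return logged;
}
```

With an interval of 5 and a burst of 10, a thousand failures arriving
within the same window produce only ten log lines; the rest are merely
counted in `missed`.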
Siim
--- linux-2.6.24/net/ipv4/ipvs/ip_vs_wrr.c 2008-01-24 22:58:37.000000000 +0000
+++ linux-2.6.24-ipvs_patches/net/ipv4/ipvs/ip_vs_wrr.c 2008-05-06 16:17:17.790662800 +0000
@@ -169,7 +169,7 @@
 				 */
 				if (mark->cw == 0) {
 					mark->cl = &svc->destinations;
-					IP_VS_INFO("ip_vs_wrr_schedule(): "
+					IP_VS_DBG_RL("ip_vs_wrr_schedule(): "
 						   "no available servers\n");
 					dest = NULL;
 					goto out;