
Re: ping hanging?

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: ping hanging?
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Fri, 20 Sep 2002 00:29:38 +0200
Hi,

Sorry for being confusing. You are correct, the BB server is outside the
LVS. The BB client running on the director pings the RIPs using one
interface (eth1) and the BB server using the other interface (eth0). I've

And is the load-balanced traffic routed through both interfaces? How do you run the LVS: LVS-DR or LVS-NAT?

seen it hung up on both interfaces - i.e. the problem is not specific to
just one interface, not that it's hung on both interfaces at once.

I doubt it would be specific to any interface anyway.

Ok, so far we have a BB client running on LVS and a server (also used to
display the results) outside the LVS cluster. The BB client hangs while
trying to ping. If you run a BB client on a non-LVS machine, it works.
Exactly.

Well, how long did you test with a client that was not on a director? I'm asking because further down you mention that it fails only about once a week.

I'm confused by the part of the statement "... every once in a while ..."
because this is a bit flaky. Either it works or it doesn't, how can it
work sometimes and sometimes not?


That's my question too. I don't see why it ever fails, but the facts are
that it runs successfully every 5 minutes, and only once every week or so
does it get stuck on something.

Could you check your crontabs to see if there is anything that generates a high CPU load when executed?

I once had a similar problem while performing a monitoring task on a pool of RS. Since the whole trick when setting up an LVS cluster is that you have replicable RS, everything was the same on all RS. Now some brain-dead engineer made a crontab entry to call a script which would push out 500MB worth of logfiles over the front interface (where the LVS'd packets would normally arrive) to a central loghost server. As you can imagine, cron is highly accurate and started the cronjob on all RS at the same time (they were NTP synchronized). And this happened every day at 2:30 at night, when no one was at the office to fix it.

So maybe something similar is going on on your servers, and it happens exactly when the LVS (BB client) would like to ping them.
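To make the crontab check a bit less tedious, here is a minimal sketch in Python that lists the entries of a crontab which would fire at a given hour and minute, for example the time of the last hang. It assumes standard five-field crontab lines and only looks at the minute and hour fields (day and month fields are ignored). The names field_matches and jobs_at are made up for this illustration.

```python
def field_matches(field, value):
    """Return True if a crontab minute/hour field such as '*/5', '1-5' or '2,30' matches value."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            lo, hi = 0, 59  # wide enough for both minutes and hours
        elif "-" in base:
            lo_text, hi_text = base.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(base)
        if lo <= value <= hi and (value - lo) % step == 0:
            return True
    return False


def jobs_at(crontab_text, hour, minute):
    """Return the commands whose minute and hour fields match the given time."""
    jobs = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)
        # Skip environment assignments (MAILTO=...) and @reboot-style entries.
        if len(fields) < 6 or not (fields[0][0].isdigit() or fields[0][0] == "*"):
            continue
        if field_matches(fields[0], minute) and field_matches(fields[1], hour):
            jobs.append(fields[5])
    return jobs
```

Feeding it the output of `crontab -l` from each RS together with the time of a hang would show whether the same heavy job is scheduled everywhere at that moment.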

Nothing changes dynamically. I do run mon to test the status of my real
servers, but actions triggered by mon are not related to ping hanging,
either in time or frequency.

Do you monitor the in/out bytes on your LVS server? Could you possibly check that?

By hand. As far as I can tell, LVS works perfectly before, after, and during
a ping hang. I'm not implying that LVS is at fault for this - it's just that
I can't think of any other factors and was hoping somebody else here had
some ideas.
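Rather than checking the byte counters by hand, they could be sampled automatically around the time of a hang. The following is a rough sketch in Python, assuming a Linux director where /proc/net/dev is available; it reads the receive and transmit byte counters per interface and turns two samples into bytes per second. The function names are made up for this illustration, and counter wraparound is not handled.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        values = rest.split()
        if len(values) < 9:
            continue
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def byte_rates(before, after, seconds):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    rates = {}
    for iface, (rx_after, tx_after) in after.items():
        if iface in before:
            rx_before, tx_before = before[iface]
            rates[iface] = ((rx_after - rx_before) / seconds,
                            (tx_after - tx_before) / seconds)
    return rates


def read_counters(path="/proc/net/dev"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

Calling read_counters() once a minute from the same cron cycle as the BB client and logging byte_rates() for eth0 and eth1 would show whether a traffic spike coincides with a hung ping.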

Hmm, I'm honestly quite clueless as you probably can tell.

Sure, next time I need to clear the table.

Oh, you're running in production? Then don't do it. I also didn't know that the problem only shows up once a week or so. Nevertheless it is a pain, because it kills the statistics and keeps the manager asking you uncomfortable questions.

That works fine whenever I try it by hand. And like I said above, it works
fine for BB most of the time too.

Strange, strange. Could you tell me again whether BB spawns a ping process for every RS in parallel, or whether it spawns the ping for the next RS only after the previous one has returned something (serialized monitoring)?
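To illustrate the difference between the two modes, here is a minimal sketch in Python. It is not how BB is implemented, which I don't know; it only shows serialized versus parallel probing, with a hard timeout so that a single stuck ping cannot block the whole run. It assumes a Linux-style `ping -c 1`; the function names and the 5 second timeout are arbitrary choices for this illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_once(host, timeout_s=5.0):
    """Send one echo request; return False on failure or if ping itself hangs."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def probe_serial(hosts, probe=ping_once):
    """Probe one host after another; a slow host delays all later ones."""
    return {host: probe(host) for host in hosts}


def probe_parallel(hosts, probe=ping_once, max_workers=8):
    """Probe all hosts concurrently; total time is bounded by the slowest probe."""
    hosts = list(hosts)
    if not hosts:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(hosts))) as pool:
        return dict(zip(hosts, pool.map(probe, hosts)))
```

In the serialized case without a timeout, one ping that never returns stalls every check behind it, which would look exactly like a monitoring run that gets stuck.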

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


