Hi,
Sorry for being confusing. You are correct, the BB server is outside the
LVS. The BB client running on the director pings the RIPs using one
interface (eth1) and the BB server using the other interface (eth0).
And is the load balanced traffic routed through both? How do you run
the LVS: LVS-DR or LVS-NAT?
I've seen it hung up on both interfaces - i.e. the problem is not specific to
just one interface, not that it's hung on both interfaces at once.
I doubt it would be specific to any interface anyway.
Ok, so far we have a BB client running on LVS and a server (also used to
display the results) outside the LVS cluster. The BB client hangs while
trying to ping. If you run a BB client on a non-LVS machine, it works.
Exactly.
Well, how long did you test with a client not being on a director? I'm
just asking because further down you mention something like once every
week it fails.
I'm confused by the "... every once in a while ..." part of the
statement because this is a bit flaky. Either it works or it doesn't, how
can it work sometimes and sometimes not?
That's my question too. I don't see why it ever fails, but the fact is
that it runs successfully every 5 minutes, and only once every week or so
does it get stuck on something.
Could you check your crontabs to see if there is anything that generates
a high CPU load when executed?
I once had a similar problem while performing a monitoring task on a
pool of RS. And since the whole trick when setting up an LVS cluster is
that you have replicable RS, everything was the same for all RS. Now
some brain dead engineer made a crontab entry to call a script which
would push out 500MB worth of logfiles over the front interface (where
the LVS'd packets would normally arrive) to a central loghost server. As
you can imagine, cron is highly accurate and started the cronjob on all
RS at the same time (they were NTP synchronized). And this happened
every day at 2:30 at night when no one was at the office to fix it.
So, maybe there is something similar going on on your servers and this
happens exactly when the LVS (BB client) would like to ping them.
Nothing changes dynamically. I do run mon to test the status of my real
servers, but actions triggered by mon are not related to ping hanging,
either in time or frequency.
Do you monitor the in/out bytes on your LVS server? Could you possibly
check that?
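For what it's worth, here is a minimal sketch of how the in/out byte counters could be sampled on a Linux director. It assumes the usual /proc/net/dev layout (receive bytes as the first counter after the interface name, transmit bytes as the ninth). The name parse_net_dev and the sample text are made up for illustration, and this is not something BB or LVS ships:

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue  # not a counter line in the assumed layout
        # Assumed layout: field 0 is receive bytes, field 8 is transmit bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


# Made-up sample in the assumed layout, used when /proc/net/dev is unavailable.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
  eth1: 3000 30 0 0 0 0 0 0 4000 40 0 0 0 0 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE
    for iface, (rx, tx) in sorted(parse_net_dev(text).items()):
        print(iface, rx, tx)
```

Run from cron every minute or so and appended to a file, the output would at least show whether eth0 or eth1 sees a traffic spike around the time ping gets stuck.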
By hand. As far as I can tell, LVS works perfectly before, after, and during
a ping hang. I'm not implying that LVS is at fault for this - it's just that
I can't think of any other factors and was hoping somebody else here had
some ideas.
Hmm, I'm honestly quite clueless as you probably can tell.
Sure, next time I need to clear the table.
Oh, you're running in production? Then don't do it. I also didn't know
that the problem only shows up once a week or so. Nevertheless it is a
pain because it kills statistics and keeps the manager asking you
uncomfortable questions.
That works fine whenever I try it by hand. And like I said above, it works
fine for BB most of the time too.
Strange, strange. Could you tell me again whether BB spawns a ping
process for every RS in parallel, or spawns the ping for the next RS
only after the previous one has returned something (serialized monitoring)?
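To make the distinction concrete, here is a rough sketch of the two strategies. I don't know how BB actually does it, so this is not BB's code: probe_serial, probe_parallel and ping_once are hypothetical names, and ping_once assumes a system ping that accepts "-c 1".

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_once(host, timeout=5):
    """Hypothetical probe: one echo request via the system ping, bounded by a timeout."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # treat a hung or missing ping as a failed probe
    return result.returncode == 0


def probe_serial(hosts, probe):
    """Serialized: each host is probed only after the previous probe returned."""
    return {host: probe(host) for host in hosts}


def probe_parallel(hosts, probe, workers=8):
    """Parallel: all probes are started together, results are collected at the end."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(probe, hosts)))
```

In the serialized case a single ping that never returns blocks every RS behind it, whereas in the parallel case only that one result is late. Either way, a timeout around the ping call keeps one stuck probe from stalling the whole run.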
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc