> -----Original Message-----
> The problem is as follows: the setup works randomly, from 15
> mins to 1-2 hours, flawlessly, i might add, serving content
> from both backend machines. However, it randomly stops doing
> that. When that happens, i cannot ping the VIP from the
> outside, only from within the LAN (i have a backup LB, not
> configured yet, i plan to use ultramonkey later on). I
> checked logs, tcpdumped but with no clue as of what is causing this.
> Some input would be really appreciated.
Now I know this is an old message, and this issue has been 'resolved' by not
using LVS-NAT anymore, but recently I had a similar problem.
Let me explain my setup first. I have two loadbalancers, which use wrr to
direct traffic to 5 realservers. A small script on the loadbalancers checks
the realservers periodically and requests some numbers from them. Based on
those numbers, the weight of each server is adjusted using 'ipvsadm
--edit-server'.
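
To make the idea concrete, here is a minimal sketch in Python. It is not the
actual script: the load figure (assumed to be between 0.0 and 1.0), the
mapping from load to weight, the function names and the masquerading default
are all assumptions for illustration. Only the ipvsadm long options are the
real ones.

```python
import subprocess

def load_to_weight(load, max_weight=100):
    """Map a load figure in [0.0, 1.0] to a wrr weight (assumed mapping)."""
    load = min(max(load, 0.0), 1.0)
    # Keep at least weight 1 so a busy server is not drained completely.
    return max(1, round((1.0 - load) * max_weight))

def edit_server_cmd(vip, realserver, weight, forward="--masquerading"):
    """Build the ipvsadm command line that changes one realserver's weight."""
    # ipvsadm documents gatewaying as the default forwarding method, so the
    # method is stated explicitly here; "--masquerading" is only an assumption.
    return ["ipvsadm", "--edit-server", "--tcp-service", vip,
            "--real-server", realserver, forward, "--weight", str(weight)]

def apply_weight(vip, realserver, load):
    """Run the edit; needs root and an already configured virtual service."""
    cmd = edit_server_cmd(vip, realserver, load_to_weight(load))
    subprocess.run(cmd, check=True)
```

Calling apply_weight("10.0.0.1:80", "192.168.0.11:80", 0.25) with these
made-up addresses would set that realserver's weight to 75.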
The setup I described above worked flawlessly for years (well, after an
iptables problem and a small patch to the wrr code) until my traffic
started to spike so high that the loadbalancers could no longer handle it
properly. So we decided to replace the loadbalancers with new hardware.
The new hardware has a quad-core 64-bit Xeon, while the old had a 32-bit
Celeron, so quite an upgrade. More notably, the new server was able to
process 950 Mbit using only 20% CPU time, while the old one was eating up
more than 90% CPU time at around 60 Mbit.
So we went from a 32-bit OS to a 64-bit OS. We tested the hardware and it
seemed stable. Next we put the new machines into production, and after
several hours they would crash and stop responding to anything, much like
Cristi experienced before. So we pulled them out, put the old loadbalancers
back in, and started testing a bit more.
After writing and running several programs I finally got the loadbalancers
to crash again, this time in our testing environment. To trigger a crash I
had to generate enough traffic from different IPs and ports through the
ipvs services while running 'ipvsadm --edit-server' on the loadbalancer.
Running the traffic through iptables wouldn't crash the server, nor would
a single client IP hammering the services from different ports.
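
As a rough illustration of the traffic side of such a test (not the programs
I actually used), the Python sketch below opens TCP connections to a service
from several local source addresses and ports. The function names, addresses
and port numbers are hypothetical, and the source IPs are assumed to be
configured on a local interface already. The weight changes would have to
run at the same time, for example from a second process that keeps calling
'ipvsadm --edit-server'.

```python
import itertools
import socket

def source_endpoints(source_ips, first_port, ports_per_ip):
    """Return (ip, port) pairs so that both the source IP and port vary."""
    ports = range(first_port, first_port + ports_per_ip)
    # product() pairs every source address with every port in the range.
    return list(itertools.product(source_ips, ports))

def hit_service(source, target, timeout=2.0):
    """Open one TCP connection to target from the local (ip, port) source."""
    try:
        with socket.create_connection(target, timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts and unusable source addresses.
        return False

def run_once(source_ips, target, first_port=20000, ports_per_ip=100):
    """Connect once from every source endpoint; return the success count."""
    endpoints = source_endpoints(source_ips, first_port, ports_per_ip)
    return sum(hit_service(src, target) for src in endpoints)
```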
So I started debugging a lot more and I am still working on it. The problem
is that the server freezes completely, so I can't look anything up.
But it seems that changing the weights on the server will make the system
crash if it runs a 64-bit OS. Our 'old' 32-bit environment still
happily changes the weights of the servers every couple of seconds without
crashing. So there is a problem somewhere in the code of the ipvsadm program
or in the kernel code, and I'll keep debugging.
What I want to know is whether there is anyone out there who:
1) has a 64-bit installation
2) is using wrr
3) is changing the weights on the server while the server is getting heavy
traffic from multiple ip:ports
and is experiencing the same problem as I am: a freezing server which needs
a cold reset.
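
For anyone who wants to check the first two points quickly, here is a small
Python sketch. It assumes the usual layout of 'ipvsadm -L -n' output
(protocol, address, scheduler on each service line). The function names are
made up, and the 64-bit test is deliberately crude.

```python
import platform

def is_64bit_kernel():
    """Crude check: the machine name of 64-bit systems usually contains '64'."""
    return "64" in platform.machine()

def wrr_services(listing):
    """Return the virtual services using wrr, given 'ipvsadm -L -n' output."""
    services = []
    for line in listing.splitlines():
        fields = line.split()
        # Service lines look like: "TCP  10.0.0.1:80 wrr"
        if (len(fields) >= 3 and fields[0] in ("TCP", "UDP", "FWM")
                and fields[2] == "wrr"):
            services.append(fields[1])
    return services
```

The listing can be captured with something like
subprocess.run(["ipvsadm", "-L", "-n"], capture_output=True, text=True).stdout
when run as root.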
For the moment, I'll just keep looking at traces to see if I can spot
anything in particular, and I hope someone has a suggestion as to where to
look or what debugger to use.
-kees