On Mon, 27 Nov 2006, Olle Östlund wrote:
Our LVS backends typically freeze after 5-7 days of normal operation.
These freezes are not system crashes, but it seems like all new
TCP connections towards the servers will hang forever. It is impossible
to log on or perform an su (they will hang), but existing sessions will
function fine as long as you don't issue 'critical commands' (commands
which open a TCP connection?).
such as logging in remotely, but not from the console...?
The syslogd stops writing the syslog,
etc. Looking at the server's activity using top reveals nothing abnormal
-- there is no swapping, CPU usage is low, etc.
what about various outputs from ipvsadm on the director?
Anything monotonically increasing there (I know the problem
is on the realservers)?
Anything monotonically increasing on the realservers (eg
look with netstat to see if running out of ports or all
connections in FIN_WAIT)?
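A rough sketch of the netstat check suggested above, assuming a Linux
realserver where /proc/net/tcp is readable (IPv4 only; /proc/net/tcp6
holds the IPv6 sockets). The function names count_tcp_states and
read_tcp_states are made up for illustration. It tallies sockets per
TCP state, so a pile-up in FIN_WAIT1/FIN_WAIT2 or TIME_WAIT would show
up if you sample it periodically.

```python
from collections import Counter

# State codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        # fields: slot, local address, remote address, state, ...
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live socket table (hypothetical helper, Linux only)."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Calling read_tcp_states() from cron every few minutes and logging the
result to a local file would show whether any state grows steadily in
the days before a freeze.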
The cluster is hosting 14 websites, typically serving 1.5 million
requests on a normal day. As I said, the cluster may run fine for a week
and then suddenly the backends freeze. The funny thing is that both
backends usually freeze at roughly the same time.
presumably they've been evenly balanced.
I take it that this has something to do with LVS, ie you
don't get the same behaviour with a bare single server?
The only cure we have come up with so far is to reboot the servers. Once
rebooted, a server will run for days again. It has occasionally happened
that the second (frozen) server has recovered once the first server is
rebooted.
do the realservers talk to each other (eg have a common disk
system)?
Anyone out there having a good idea where to look for clues to what may
be wrong?
not a clue sorry
Joe
--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!