
RE: No buffer space available

To: 'Peter Mueller' <pmueller@xxxxxxxxxxxx>, "''lvs-users@xxxxxxxxxxxxxxxxxxxxxx' '" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: No buffer space available
From: Jeremy Kusnetz <JKusnetz@xxxxxxxx>
Date: Mon, 30 Sep 2002 10:37:19 -0400
I posted a detailed response to Roberto.  Unfortunately the response was
over 40K, so I'm waiting for the moderators to post it for me.

Other things running on this box:

Besides LVS, I'm running MON, which brings the various LVS connections
to the realservers up and down in case they go down.

Heartbeat is running so I can failover to the secondary director if the
primary goes down (which is happening a lot lately :)

I'm also running stunnel, which allows some encrypted rsyncing, and sql
queries through to the realserver network.

I'm also running snmpd to graph out ethernet connections, and various other
statistics.

-----Original Message-----
From: Peter Mueller [mailto:pmueller@xxxxxxxxxxxx]
Sent: Monday, September 30, 2002 2:54 AM
To: Jeremy Kusnetz; ''lvs-users@xxxxxxxxxxxxxxxxxxxxxx' '
Subject: RE: No buffer space available


what else is running on the box?  since this sounds like an event that
happens regularly is it possible for you to get some kernel.info & debugging
messages for us?  I think vmstat & top & summaries of ntop or tcpdump might
be interesting.
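
One more data point that could go alongside vmstat and top is the
per-interface error and drop counters in /proc/net/dev.  A minimal
sketch of how they might be pulled out, assuming the usual 16-column
layout of that file; the function name parse_net_dev and the sample
numbers are made up for illustration:

```python
def parse_net_dev(text):
    """Return per-interface error and drop counters from /proc/net/dev text.

    Assumes 16 numeric columns per interface: 8 receive, then 8 transmit.
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        try:
            fields = [int(value) for value in data.split()]
        except ValueError:
            continue  # not a counter line
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats


# Illustrative sample only; these are not real figures from the director.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000   40000    3    7    0     0          0         0  6000000   45000    1    2    0     0       0          0
"""

counters = parse_net_dev(SAMPLE)
```

On a live box the same function could be fed open("/proc/net/dev").read()
at intervals, so that counters from before and after a failure can be
compared.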

-----Original Message-----
From: Jeremy Kusnetz
To: 'lvs-users@xxxxxxxxxxxxxxxxxxxxxx'
Cc: Jeremy Kusnetz
Sent: 9/29/2002 4:34 PM
Subject: No buffer space available

Not sure if this is an LVS problem, but maybe someone here is smart
enough to at least get me looking in the right direction.

My primary director box had been running kernel 2.4.7 with the LVS
version that was current at the time 2.4.7 came out.  It was running
perfectly stable for at least 6 months, never a single problem.

I'm running LVS-NAT.  There are 53 VIPs on the box, pointing to 6
realservers.  Each VIP forwards mail, pop, dns, radius, http, and https
to virtual interfaces running on the realservers.

The director is an SMP PIII 1 GHz box with 512 MB of RAM.  It has two
built-in Intel NICs: eth0 has the 53 VIPs, and eth1 is the gateway to
the realservers.  There is also a 3Com NIC, which runs heartbeat with
our secondary director.

About a month ago I got an alert that the primary director was down.  I
logged in, and it was up.  I could ping out, I could ping the RIPs, but
I couldn't ping any of its own interfaces, even the loopback.  (The
realservers could ping those interfaces on the director.)  Pinging these
interfaces from the director gave me the following error:

ping: sendto: No buffer space available
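
That message is the standard error string for errno ENOBUFS coming back
from sendto().  A rough sketch of a check that could tell this failure
apart from other send errors; note that it sends a UDP datagram rather
than the ICMP echo that ping uses, and that the names probe and
classify_send_error, and the choice of port 9, are assumptions made for
illustration:

```python
import errno
import socket


def classify_send_error(exc):
    """Return "ENOBUFS" for the 'No buffer space available' case, else "other"."""
    if isinstance(exc, OSError) and exc.errno == errno.ENOBUFS:
        return "ENOBUFS"
    return "other"


def probe(addr, port=9):
    """Try to send one UDP datagram to addr and report how the send went.

    This is a stand-in for ping: it exercises sendto() on a UDP socket,
    which needs no root privileges, instead of a raw ICMP socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(b"x", (addr, port))
        return "ok"
    except OSError as exc:
        return classify_send_error(exc)
    finally:
        sock.close()
```

Calling probe("127.0.0.1") from a monitoring loop would be one way to
log the moment the condition starts, instead of finding out from a page.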

A reboot fixed the problem.

This problem started happening more and more frequently, until it was
happening at least once a day, usually in the middle of the night.

I figured it was time to upgrade the kernel and LVS version.  I upgraded
to 2.4.19 and LVS 1.0.6.  I hoped this would fix the problem, but it did
not.

Next I swapped out all the hardware, everything but the drives and the
cables.  The replacement box had the same amount of memory but slightly
slower CPUs (800 MHz), but I figured even those are probably overkill.

This did not help either.

I changed the driver for the Intel NICs from eepro100 to the latest e100
from Intel.  I've always had problems with eepro100 drivers, but when I
was running the old version of LVS, it had problems with the e100
drivers.  But now with the latest version, e100 seems to work.

But alas this did not fix the problem either.

The only thing that had changed before I started upgrading everything
was the number of VIPs on the director.  A couple of realservers had
been added to the mix too, along with more RIPs.  We had also added some
iptables rules to drop SMTP connections from some external IPs that were
really bad spammers.  This list grew to about 50 chains.

After changing drivers to e100, and it not fixing the problem, I changed
the iptables rules to reject the packets instead of dropping them.  This
changed the symptoms slightly.  Instead of not being able to ping any of
its own interfaces on the director, I can no longer ping random RIPs, to
the point where I start losing services because the LVS can't forward
connections to those IPs.

I've now removed the iptables rules completely, hoping that was the
cause.  Didn't help.

The only thing I can do besides rebooting that helps is to bring eth2,
the heartbeat interface, down and up.  Sometimes that will clear the
problem for a few hours or longer, but sometimes it will only clear up
the problem for a few seconds.  I've now resorted to a cron job that
brings this interface down and up once a minute.  It sort of helps, but
I still end up having to reboot.  Bringing the loopback down and up
sometimes works too, for a few seconds.  I tried bringing eth0 and eth1
down and up, but that didn't seem to have any effect.

Is there some sort of tuning I need to do to the /proc file system?  I
have no idea what else to do, I've upgraded everything I can think of.
Apparently it's not an issue of buggy software.  Please help, being
paged at all hours of the night for the past few weeks is getting really
old!  If you can't help, what would be a good list to post this question
to?
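
Before tuning anything under /proc it may help to record what the
current values are.  A minimal sketch, assuming the tunables live under
/proc/sys; the entries in CANDIDATES are only guesses at settings that
might be worth looking at, since nothing so far establishes which of
them, if any, is involved, and the name read_tunables is made up:

```python
import os

# Hypothetical shortlist: socket buffer limits and neighbour-table
# thresholds.  Whether any of these matter for this problem is unknown.
CANDIDATES = [
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/neigh/default/gc_thresh1",
    "net/ipv4/neigh/default/gc_thresh2",
    "net/ipv4/neigh/default/gc_thresh3",
]


def read_tunables(names, root="/proc/sys"):
    """Return {name: value string} for each tunable, or None if unreadable."""
    values = {}
    for name in names:
        path = os.path.join(root, name)
        try:
            with open(path) as handle:
                values[name] = handle.read().strip()
        except OSError:
            values[name] = None  # missing on this kernel, or no permission
    return values
```

Saving the output of read_tunables(CANDIDATES) before and after a change
makes it possible to tell later which setting was actually altered.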

