Re: No buffer space available

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: No buffer space available
Cc: 'Peter Mueller' <pmueller@xxxxxxxxxxxx>, Jeremy Kusnetz <JKusnetz@xxxxxxxx>
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Wed, 02 Oct 2002 12:36:48 +0200
Hmm, found this following thread:
http://marc.theaimsgroup.com/?l=apache-modssl&m=103336922830201&w=2

Lowering MaxKeepAliveRequests from 100 to 20 is not bad advice, and you also need to lower the maximum number of forked child processes. I don't recall the parameters by heart and I'm too lazy to delve into the httpd.conf syntax right now.

This person found out he was removing his mutex file.  My SSL mutex file is
there on both the cluster and the other webserver, so that's not the
problem, but my symptoms are exactly the same.  I can't believe I can't find
more info out there.

We need the signature of the worm, if it is one; then we can create a u32 filter to rate-limit it.

Any ideas?  Especially the rate limiting stuff?

Yes, ideas are there but they are pretty complex. You have two choices:

o Do the pattern matching with iptables and mark the matching packets
  with a fwmark. Then set up an ingress policy to rate-limit or even
  drop packets carrying that fwmark.
o Do the pattern matching with a u32 selector and put the matching
  packets into a queue to reorder their priority.

The latter is certainly the better choice but also rather difficult to implement (it means tedious work with a hex editor on TCP/SSL payloads).
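To give an idea of what the u32 selector does once a signature is known, here is a minimal Python sketch of the comparison itself: take the 32-bit word at a fixed offset, AND it with a mask, and compare it to a value. The function name, the offset of 40 (assuming 20-byte IP and TCP headers without options) and the byte pattern are placeholders for illustration only; they are not the signature of any actual worm.

```python
import struct

def u32_match(packet: bytes, offset: int, mask: int, value: int) -> bool:
    """Return True if the big-endian 32-bit word at `offset`,
    ANDed with `mask`, equals `value` ANDed with `mask`."""
    if offset < 0 or offset + 4 > len(packet):
        return False  # not enough bytes to read a full 32-bit word
    (word,) = struct.unpack_from("!I", packet, offset)
    return (word & mask) == (value & mask)

# Placeholder packet: 40 zero bytes standing in for the IP and TCP
# headers, followed by a payload that starts with 0x16 0x03.
pkt = bytes(40) + b"\x16\x03\x01\x00\x2f"

# Compare only the first two payload bytes (mask 0xffff0000).
matched = u32_match(pkt, 40, 0xFFFF0000, 0x16030000)
```

The tedious part is finding offset/mask/value triples that hit the worm's payload and nothing else; that is where the hex editor comes in.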

I know that before the /proc tuning, I would get the buffer space issues at
the same time as the worm attack.  I think the /proc tuning prevented LVS
from having problems this time.  But the outage is even worse with MON

No, the proc-fs tuning prevented the node from running out of space in the routing cache bucket when it had to fill in another entry. LVS is pretty innocent in all this ;).

running, because it will actually remove the RIP, and take a little while to
notice that apache is responding again.  Maybe I can play around with the
mon settings.

Yes, only take the server out if:

o at least two consecutive tests, each with a timeout of 5 seconds and
  with 3 seconds of settling time in between, return a miss, or
o you get an immediate RST from the socket.


RIP http healthcheck flow:
--------------------------
test1(return a miss after 5 seconds) -----> 3 second relaxing -----> test2(return a miss after 5 seconds) -----> down -----> take out RIP,
otherwise leave it in.
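
The flow above could be sketched roughly like this in Python. It is only an illustration of the decision logic, not a mon monitor script; the function names, the HEAD request and the "ok"/"miss"/"rst" return values are my own assumptions.

```python
import socket
import time

def http_probe(host, port=80, timeout=5.0):
    """One test: 'ok' on an HTTP answer, 'rst' on a refused or reset
    connection, 'miss' on a timeout or any other socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            return "ok" if s.recv(12).startswith(b"HTTP/") else "miss"
    except (ConnectionRefusedError, ConnectionResetError):
        return "rst"
    except OSError:  # includes socket timeouts
        return "miss"

def should_take_out(probe, settle=3.0, sleep=time.sleep):
    """Return True if the RIP should be taken out of the LVS table."""
    first = probe()
    if first == "ok":
        return False       # server answers, leave it in
    if first == "rst":
        return True        # immediate RST, take it out right away
    sleep(settle)          # 3 second relaxing
    return probe() != "ok" # second miss in a row means down
```

You would call it as should_take_out(lambda: http_probe(rip)) with rip being the real server's address; a single slow answer from apache then no longer removes the RIP.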

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


