Re: No buffer space available

To: Jeremy Kusnetz <JKusnetz@xxxxxxxx>
Subject: Re: No buffer space available
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Mon, 30 Sep 2002 16:44:13 +0200
Hello Jeremy,

First of all, would you mind not quoting the whole posting when replying unless you refer to some specific part of the text? Trimming makes postings easier to read, thank you.

I've setup a script to run which calls all of your requested commands, plus
dmesg.  I'll show you what I get now when I'm not experiencing problems.  I
shut down the cron job that brings eth2 down and up, and hopefully I'll get
the problem before the day is over.

I think I've already found your problems.

When director goes down, heartbeat tells director2 to bring up all its eth0
interfaces, sends out ARPs for them, and also changes its eth1 address to
10.75.0.1, so it's now the DGW for the realservers.  This seems to work
fine, and it's been getting a real workout for a while now. :)

Ok, thanks for the explanation, if only every poster here with problems would be as detailed and specific as you are ... ;)
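
For the archives, the takeover amounts to roughly the following; the VIP
and the /24 netmasks are pure assumptions on my part, and I'm using
iputils arping here instead of heartbeat's send_arp, just as a sketch:

# rough manual equivalent of the heartbeat takeover on director2
VIP=192.168.0.10                    # example only, one of your eth0 VIPs
ip addr add ${VIP}/24 dev eth0
arping -U -I eth0 -c 3 ${VIP}       # gratuitous ARP so the peers update
ip addr add 10.75.0.1/24 dev eth1   # take over the realservers' DGW
arping -U -I eth1 -c 3 10.75.0.1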

We have another web server on the same network as the director.  Twice
during the director's problems I got alerts that the other webserver was
down too.  I logged into that box and saw a bunch of httpd processes
running, a lot more than normal.  Looking at the apache log files I saw
there were a bunch of SSL handshake errors.  This sounds like the new apache
mod_ssl worm that's out there.  All of our openssls have been upgraded and
mod_ssl/apache recompiled.  I think it may have been infected servers
hitting our servers trying to figure out if we were exploitable.

Ok.

Another thing I found during these problems through ntop is sometimes a huge
spike of mail will come in.  I think these are spammers doing dictionary
attacks on us.  We have a few tens of thousands of email accounts on the
realservers, so you can imagine the spam that comes into our network.

I hope you know how to start proper countermeasures against these 'attacks'.
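
One crude example, purely a sketch with arbitrary numbers: throttle new
inbound SMTP connections on the director with the iptables limit match, so
a dictionary run cannot flood the realservers (adjust chain and rates for
your setup):

# rate-limit new SMTP connections; the numbers are arbitrary examples
iptables -A INPUT -p tcp --dport 25 --syn \
  -m limit --limit 10/minute --limit-burst 20 -j ACCEPT
iptables -A INPUT -p tcp --dport 25 --syn -j DROP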

The thing is, I've seen those two issues during the buffer space problem,
but not all of the time.  Sometimes one, sometimes the other, sometimes
neither.  Don't know if it's coincidence or not.

I think it has to do with the net_ratelimit() or the neighbour gc_thresh settings. See further below.
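
To see where you stand right now (standard 2.4 proc paths; the printk
rate limiter behind net_ratelimit() is tuned via message_cost and
message_burst):

# current neighbour table GC thresholds (defaults are 128/512/1024)
cat /proc/sys/net/ipv4/neigh/default/gc_thresh1
cat /proc/sys/net/ipv4/neigh/default/gc_thresh2
cat /proc/sys/net/ipv4/neigh/default/gc_thresh3
# printk rate limiting used by net_ratelimit()
cat /proc/sys/net/core/message_cost
cat /proc/sys/net/core/message_burst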


4: eth2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:01:03:e4:4b:93 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    4827139    25897    0       0       0       0
    RX errors: length  crc  frame  fifo  missed
               0       0    0      25    0
                                   ^^^^
                             not much, but still.
[deleted the IP addresses for now]
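
If you want to see whether those fifo drops grow under load, a throwaway
loop like this is enough (eth2 as the interface name is taken from your
output):

# print the RX error counters every 10 seconds
while true; do
    date
    ip -s -s link show dev eth2 | grep -A1 'RX errors'
    sleep 10
done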

-------------------------------------------------
cat /proc/net/softnet_stat
0163073a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00007cff
0162c847 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00007c00

Ok
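
The values are hex; the first column is the total number of packets that
CPU has processed, the second is drops at the backlog queue. A quick way
to eyeball them in decimal (just a throwaway loop):

# convert the first two softnet_stat columns to decimal, one line per CPU
while read total dropped rest; do
    printf "processed=%d dropped=%d\n" 0x$total 0x$dropped
done < /proc/net/softnet_stat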

-------------------------------------------------
cat /proc/net/rt_cache_stat
00000b2d 0138e91f 0021bb47 00000000 00000000 00000441 00000000 00000025 00368c6a 000a014c 00000444 000a03fe 0009fe0c 0000006c 00000000
00000b2d 0138ae70 0021b50c 00000000 00000000 00000414 00000000 00000020 003697a3 0009e898 000004bd 000a033e 0009fcf4 00000061 00000000

Ok

cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache            80     80    244    5    5    1 :  252  126
ip_conntrack        1884   3806    352  275  346    1 :  124   62
ip_fib_hash          339    339     32    3    3    1 :  252  126
ip_vs_conn          8322  11850    128  290  395    1 :  252  126
tcp_tw_bucket        420    420    128   14   14    1 :  252  126
tcp_bind_bucket      326    452     32    4    4    1 :  252  126
tcp_open_request     280    280     96    7    7    1 :  252  126
inet_peer_cache      408   1416     64   24   24    1 :  252  126


ip_dst_cache        3074   6980    192  254  349    1 :  252  126
arp_cache           1044   1170    128   39   39    1 :  252  126

Aha, might get full soon.
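
An arp_cache slab that size fits the neighbour table overflows further
down; you can compare the live entry count against the hard limit
directly:

ip -4 neigh show | wc -l                          # current entries
cat /proc/sys/net/ipv4/neigh/default/gc_thresh3   # hard cap, default 1024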

blkdev_requests      400    400     96   10   10    1 :  252  126
nfs_write_data       132    132    352   12   12    1 :  124   62
nfs_read_data        132    132    352   12   12    1 :  124   62
nfs_page             280    280     96    7    7    1 :  252  126
journal_head         324   2340     48    7   30    1 :  252  126
revoke_table         126    253     12    1    1    1 :  252  126
revoke_record        226    226     32    2    2    1 :  252  126
dnotify cache          0      0     20    0    0    1 :  252  126
file lock cache      126    126     92    3    3    1 :  252  126
fasync cache           0      0     16    0    0    1 :  252  126
uid_cache            226    226     32    2    2    1 :  252  126
skbuff_head_cache    582    960    192   30   48    1 :  252  126
sock                 184    184    928   46   46    1 :  124   62
sigqueue             261    261    132    9    9    1 :  252  126
cdev_cache          1239   1239     64   21   21    1 :  252  126
bdev_cache           118    118     64    2    2    1 :  252  126
mnt_cache            118    118     64    2    2    1 :  252  126

---------------------
inode_cache       114100 114100    512 16300 16300    1 :  124   62
dentry_cache      116370 116370    128 3879 3879    1 :  252  126

Jeez, what the hell are you running on this box?
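
If you are curious how much of that is actually reclaimable, the kernel
exports the counters behind those slabs as well (nr_dentry/nr_unused and
nr_inodes/nr_free_inodes are the first two fields, respectively):

cat /proc/sys/fs/dentry-state
cat /proc/sys/fs/inode-state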

192.168.0.128 sent an invalid ICMP error to a broadcast.
192.168.0.128 sent an invalid ICMP error to a broadcast.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.

Ok, try the following:

echo "4096" > /proc/sys/net/ipv4/neigh/default/gc_thresh3

and try to ping again and check dmesg.
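
If that makes the overflows go away, raise gc_thresh1/gc_thresh2 in
proportion as well and make the change survive a reboot via sysctl.conf;
the exact values below are only an example:

echo "1024" > /proc/sys/net/ipv4/neigh/default/gc_thresh1
echo "2048" > /proc/sys/net/ipv4/neigh/default/gc_thresh2
echo "4096" > /proc/sys/net/ipv4/neigh/default/gc_thresh3

or, equivalently, in /etc/sysctl.conf:

net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096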

192.168.0.128 sent an invalid ICMP error to a broadcast.
IPVS: incoming ICMP: failed checksum from 65.113.143.64!

:) Julian, look at that!

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


