
Re: No buffer space available

To: Peter Mueller <pmueller@xxxxxxxxxxxx>
Subject: Re: No buffer space available
Cc: "'lvs-users@xxxxxxxxxxxxxxxxxxxxxx '" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>, 'Jeremy Kusnetz ' <JKusnetz@xxxxxxxx>
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Mon, 30 Sep 2002 21:42:55 +0200
Hi Peter,

Peter Mueller wrote:
You guys have been busy.  I'm glad I got to sleep ;)

Good for you, then you can take over because I'm busy conducting high speed packet filtering tests and hacking procps :)

Ratz, here are choice cuts from IPaddr. Someday you will have to mail me an

Ouch, who wrote that? And why is this person not using ip? :)

explanation of why /proc/slabinfo is so useful and uh.. what it is.  It
makes me want to BBQ.

Ok, since I had a hard time reading the code before finding out that slabinfo(5) was just about what I wanted, I'll give it a whirl:

Let's start with the man page:

DESCRIPTION
       Frequently used objects in the Linux kernel (buffer heads,
       inodes, dentries, etc.)  have their own  cache.  The  file
       /proc/slabinfo gives statistics. For example:

              % cat /proc/slabinfo
              slabinfo - version: 1.1
              kmem_cache            60     78    100    2    2    1
              blkdev_requests     5120   5120     96  128  128    1
              mnt_cache             20     40     96    1    1    1
              inode_cache         7005  14792    480 1598 1849    1
              dentry_cache        5469   5880    128  183  196    1
              filp                 726    760     96   19   19    1
              buffer_head        67131  71240     96 1776 1781    1
              vm_area_struct      1204   1652     64   23   28    1
              ...
              size-8192              1     17   8192    1   17    2
              size-4096             41     73   4096   41   73    1
              ...

       For  each  slab  cache, the cache name, the number of cur-
       rently active  objects,  the  total  number  of  available
       objects,  the  size of each object in bytes, the number of
       pages with at least one active object, the total number of
       allocated  pages,  and  the  number  of pages per slab are
       given.

Ok, what does it mean for us easy people? Let's say I would like to know the memory usage of a nifty new packet filter tool like nf-hipac (www.hipac.org). I suspect a packet filter rule entry to be 64 bytes because I've read the struct my_cool_fw_packet {};. Good, so what do I do? I load a few rules, say 1000, and check the before and after status:

bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64              123   4130     64    3   70    1
bloodyhell:/var/FWTEST/nf-hipac # cat hipac.rules_1000 | sh
bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64             2123   4130     64   36   70    1
bloodyhell:/var/FWTEST/nf-hipac #

Ok, as you can see, the number of active size-64 slab objects (cache objects of 64 bytes each) has increased from 123 to 2123. Hmmm, strange, that's 2000 new objects for 1000 rules; let's do it again:

bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64             2123   4130     64   36   70    1
bloodyhell:/var/FWTEST/nf-hipac # cat hipac.rules_1000 | sh
bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64             3123   4130     64   53   70    1
bloodyhell:/var/FWTEST/nf-hipac #

Aha, now we get the predicted 1000 slab objects. Why was it 2000 the first time? After checking the source I found out that they store the entries in a btree structure. This is of course very nice, and for the first 1000 entries they needed one leaf each, thus had to allocate double the amount of memory.
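
By the way, if you do this a lot, snapshotting the file and diffing saves squinting at numbers. A quick sketch (the /tmp path is just my choice):

bloodyhell:/var/FWTEST/nf-hipac # cat /proc/slabinfo > /tmp/slab.before
bloodyhell:/var/FWTEST/nf-hipac # cat hipac.rules_1000 | sh
bloodyhell:/var/FWTEST/nf-hipac # diff /tmp/slab.before /proc/slabinfo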

Now let's check if the implementation has some obvious kfree() bug :)

bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64             3123   4130     64   53   70    1
bloodyhell:/var/FWTEST/nf-hipac # nf-hipac -F
bloodyhell:/var/FWTEST/nf-hipac # egrep "size-64 |size-128 |size-256 " < /proc/slabinfo
size-256               7     15    256    1    1    1
size-128             467    510    128   16   17    1
size-64              123   4130     64    3   70    1
bloodyhell:/var/FWTEST/nf-hipac #

Nope, it doesn't look like it. So I'm already very pleased with the implementation. As you can see, /proc/slabinfo shows you the details of cached objects in the kernel. If you multiply the second with the fourth column per row and sum it up, you get roughly the memory held in active slab objects by the kernel on your system.
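
A quick way to compute that sum is a one-liner like this (a sketch using awk; the NR > 1 just skips the version header):

bloodyhell:~ # awk 'NR > 1 { sum += $2 * $4 } END { printf "%.0f bytes in active slab objects\n", sum }' /proc/slabinfo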

To put a header above the /proc/slabinfo output for a second:

      1)               2)     3)     4)   5)   6)   7)
------------------------------------------------------
[...]
kmem_cache            58     72    108    2    2    1
ip_fib_hash           13    113     32    1    1    1
tcp_tw_bucket          0     40     96    0    1    1
[...]


1) the name of the cached object
2) the number of active objects
3) the total number of objects
4) the size of an object in bytes
5) the number of pages (a page = 4kb on x86) with at least one active object
6) the total number of allocated pages (a page = 4kb on x86)
7) the number of pages per slab

You get a page with '__get_free_pages(gfp_mask, order);', where

gfp_mask =
  o GFP_ATOMIC: __GFP_WAIT=0 && __GFP_IO=0
  o GFP_KERNEL: __GFP_WAIT=1 && __GFP_IO=0
  o GFP_USER  : __GFP_WAIT=1 && __GFP_IO=1

__GFP_WAIT == 1: the kernel is allowed to discard contents of page
                 frames in order to free memory
__GFP_IO   == 1: the kernel is allowed to write pages to disk in order
                 to free corresponding page frames.

order    = the power of 2 of the number of pages to be allocated.
           Example: if order=2, then 2^2 (2**2) = 4 contiguous pages
           will be requested, each with a size of 4kb on x86.

Maybe you've seen messages in the kernel log like '0-order allocation of ... failed'. Those messages appear when __get_free_pages doesn't succeed.
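
If you want to check whether your box has been hitting this, something like the following should dig them out (assuming klogd writes to /var/log/messages on your distribution):

bloodyhell:~ # grep -i "allocation failed" /var/log/messages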

I hope this helps you understand it a little better.

IFCONFIG=/sbin/ifconfig
ROUTE=/sbin/route
SENDARP=$HA_BIN/send_arp
FINDIF=$HA_BIN/findif
USAGE="usage: $0 ip-address {start|stop|status}";
  IP=/sbin/ip

find_interface() {
  ^^^^^^^^^^^^^^^^^^
can't work; you probably cut out too much :). Where is this function? I need it.
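
For reference, a minimal sketch of what I would expect it to do, namely map an address to the interface currently carrying it (this is my guess, the real function may well differ):

    find_interface() {
      # Sketch: print the device holding the given address.
      ${IP} -o addr show to "${1}" | awk '{ print $2 }' | head -1
    }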

ip_stop() {
  BASEIP=`echo $1 | sed s'%/.*%%'`

    BASEIP="$1"

  IF=`find_interface $BASEIP`

    IF=`find_interface ${BASEIP%/*}`

  if
    [ -z "$IF" ]
  then
    : Requested interface not in use
    exit 0
  fi

  if
    [ -x $HA_RCDIR/local_giveip ]
  then
    $HA_RCDIR/local_giveip $*
  fi

Ok.

  $ROUTE del -host $BASEIP

Why? Drop that thing.

  $IFCONFIG $IF down

    ${IP} link set dev ${IF} down

  ha_log "info: IP Address $BASEIP released"

    That's actually not what the above command did!

}
ip_start() {
#
#       Do we already service this IP address?
#
  if
    $IFCONFIG | grep "inet addr:$1 " >/dev/null 2>&1

WTF!! Please, whoever wrote this, what about consulting the ifconfig man page? Plain ifconfig only shows interfaces that are up. You can have an interface that is down && has a defined IP address.

Better:
    BASEIP="$1"
    IF=$(find_interface ${BASEIP%/*})
    tmp=$(${IP} addr show to "${BASEIP}" dev ${IF})

  then
    exit 0      # We already own this IP address
  fi

  if
    IFINFO=`find_free_interface $1`

Now what is this? There is nothing but the instantiated physical interfaces to choose from, so what is supposed to be 'free'? I suspect the author counts aliases like eth0:1 as interfaces too. Just bloody take one?

  then
    : OK got interface [$IFINFO] for $1
  else
    exit 1
  fi

Drop it.

  IF=`echo "$IFINFO" | cut -f1`
  IFEXTRA=`echo "$IFINFO" | cut -f2-`
  BASEIP=`echo $1 | sed s'%/.*%%'`

Inconsistent programming: why is BASEIP evaluated so late here, while in ip_stop() it is evaluated first?

  if
    [ -x $HA_RCDIR/local_takeip ]
  then
    $HA_RCDIR/local_takeip $*
  fi

  ha_log "info: ifconfig $IF $BASEIP $IFEXTRA"
  $IFCONFIG $IF $BASEIP $IFEXTRA

    ${IP} addr add ${BASEIP} brd + dev ${IF%:*} label ${IF}
    ${IP} link set dev ${IF%:*} up
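
To verify that the address actually landed where you expect it:

    ${IP} addr show dev ${IF%:*}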

  $ROUTE add -host $BASEIP dev $IF

Not needed unless you run a 2.0.x kernel!

  TARGET_INTERFACE=`echo $IF | sed 's%:.*%%'`

  MACADDR=$($IFCONFIG $TARGET_INTERFACE | \
      fgrep $TARGET_INTERFACE | \
      sed 's/^.*HWaddr \(..\):\(..\):\(..\):\(..\):\(..\):\(..\).*$/\1\2\3\4\5\6/')

  if [ "${MACADDR:=NULL}" = "NULL" ]; then
      ha_log "ERROR: Could not locate obtain hardware address for
$TARGET_INTERFACE"
  fi

  ha_log "info: Sending Gratuitous Arp for $BASEIP on $IF
[$TARGET_INTERFACE]"

Unfixable but I guess it works.
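
If you ever want to get rid of the ifconfig/fgrep/sed dance, something along these lines should also do it (a sketch, assuming iproute2's one-line output format):

    MACADDR=$(${IP} -o link show dev ${TARGET_INTERFACE} | \
        sed 's%^.*link/ether \(..\):\(..\):\(..\):\(..\):\(..\):\(..\).*$%\1\2\3\4\5\6%')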

  for j in 1 2 3 4 5
  do
    $SENDARP $TARGET_INTERFACE ${BASEIP} ${MACADDR} ${BASEIP} ffffffffffff \
      || ha_log "ERROR: Could not send gratuitous arp"
    sleep 2
  done &
}

Ok, this might work.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


