Re: hash table size

To: Joseph Mack <mack.joseph@xxxxxxx>
Subject: Re: hash table size
Cc: Joseph Mack <mack.joseph@xxxxxxxxxxxxxxx>, Roberto Nibali <ratz@xxxxxx>, <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Thu, 7 Jun 2001 01:26:59 +0000 (GMT)
        Hello,

On Wed, 6 Jun 2001, Joseph Mack wrote:

> Julian Anastasov wrote:
>
> > > what about returning to a hash table with fixed upper size?
> >
> >         There is nothing to return to.
>
> In the old days, I thought the table size was fixed. At least that's
> what I've been telling everyone in the HOWTO. Here's from the HOWTO
> in the section about setting up the director
>
> http://www.linuxvirtualserver.org/Joseph.Mack/HOWTO/LVS-HOWTO-6.html#ss6.1
>
>
>       The default LVS hash table size (2^12 entries) originally meant
>       2^12 simultanous connections. If you are editing the .config

        Hm, I don't remember such semantics. Maybe Wensong does :)

>       by hand look for CONFIG_IP_MASQUERADE_VS_TAB_BITS. Each
>       entry (for a connection to a client) takes 128 bytes,
>       2^12 entries requires 512kbytes. If you have 128M spare
>       memory you can have 10^6 entries if you set the table size
>       to 2^20. (Note: not all connections are active - some are waiting to 
> timeout).

        The last 2 lines are wrong. Yes, 128MB holds 2^20 entries, but
this is not related to the table size. You can achieve the same number
of connections with 256 rows, for example.

>       Early versions of ipvs would crash your machine if you alloted
>       too much memory to this table. This problem has been fixed in 0.9.9.

        Yes, because the bzImage becomes too large, and because users
selected values so big that even the empty table (without linked
connections) couldn't fit in the available memory.

>       (Note "top" reports memory allocated, not memory you are
>       using. No matter how much memory you have, Linux will eventually
>       allocate all of it as you continue to run the machine and load 
> programs.)


        Here is the picture:

the hash table is an array of doubly-linked list heads, i.e.

struct list_head *ip_vs_conn_tab;

        Some versions ago (< 0.9.9?) it was a static array, i.e.

struct list_head ip_vs_table[IP_VS_TAB_SIZE];


struct list_head is 8 bytes (a doubly-linked list head): the next and
prev pointers.

        In the second variant, when IP_VS_TAB_SIZE is selected too high,
the kernel crashes on boot. Currently (the first variant),
vmalloc(IP_VS_TAB_SIZE*sizeof(struct list_head)) is used to allocate
the space for the empty connection hash table. Once the table
is created, more memory is allocated only for connections, not for the
table itself.
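The two variants can be sketched in userspace C (a sketch only: malloc stands in for the kernel's vmalloc, the list initialization mimics the kernel's INIT_LIST_HEAD, and `ip_vs_conn_tab_init` is an illustrative name, not the actual ipvs source):

```c
#include <stdlib.h>

/* Minimal doubly-linked list head, as in the kernel: two pointers,
 * i.e. 8 bytes on a 32-bit machine (16 on 64-bit). */
struct list_head {
    struct list_head *next, *prev;
};

#define IP_VS_TAB_BITS 12
#define IP_VS_TAB_SIZE (1 << IP_VS_TAB_BITS)

/* Old variant (< 0.9.9?): a static array, linked into the kernel image:
 *     struct list_head ip_vs_table[IP_VS_TAB_SIZE];
 */

/* Current variant: only a pointer lives in the image; the table itself
 * is allocated once at boot (vmalloc in the kernel; malloc here). */
struct list_head *ip_vs_conn_tab;

int ip_vs_conn_tab_init(void)
{
    int i;

    ip_vs_conn_tab = malloc(IP_VS_TAB_SIZE * sizeof(struct list_head));
    if (!ip_vs_conn_tab)
        return -1;
    /* An empty row is a list head pointing to itself. */
    for (i = 0; i < IP_VS_TAB_SIZE; i++) {
        ip_vs_conn_tab[i].next = &ip_vs_conn_tab[i];
        ip_vs_conn_tab[i].prev = &ip_vs_conn_tab[i];
    }
    return 0;
}
```

After this initialization, the only memory consumed is the table of empty row heads; each connection later allocates its own entry and is linked into one of the rows.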

        In any case, after boot, before any connections are created,
the memory occupied by this empty table is IP_VS_TAB_SIZE*8 bytes.
For 20 bits this is (2^20)*8 bytes = 8MB. When we start to create
connections, they are enqueued into one of these 2^20 doubly-linked
lists after evaluating a hash function. In the ideal case you can
have one connection per row (a dream), so 2^20 connections. When I
talk about columns, in this example we have 2^20 rows and on
average 1 column used.

        So, you have to fix the HOWTO. The *TAB_BITS value defines only
the number of rows (a power of 2 is useful because the hash function
result can be masked with IP_VS_TAB_SIZE-1 instead of using the '%'
modulo operation). But it is not a limit on the number of connections.
When the value is selected by the user, the real number of connections
must be considered. For example, if you think your site can accept
1,000,000 simultaneous connections, you have to select a number
of hash rows that will spread all connections into short rows. You
can create these 1,000,000 conns with TAB_BITS=1 too, but then all
these connections will be linked into two rows, and the lookup will
take too much time walking 500,000 entries. This lookup is
performed for each received packet.
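The masking trick mentioned above works because the table size is a power of 2, so the low TAB_BITS bits of the hash are exactly the remainder modulo the table size. A small userspace check (not kernel code):

```c
#define IP_VS_TAB_BITS 12
#define IP_VS_TAB_SIZE (1 << IP_VS_TAB_BITS)
#define IP_VS_TAB_MASK (IP_VS_TAB_SIZE - 1)

/* For a power-of-2 table size, AND-ing with SIZE-1 keeps only the low
 * TAB_BITS bits -- the same result as hash % IP_VS_TAB_SIZE, but a
 * single AND instead of a division. */
unsigned int hash_row(unsigned int hash)
{
    return hash & IP_VS_TAB_MASK;
}
```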

        The selection of *TAB_BITS is entirely based on the
recommendation to keep the doubly-linked lists short (fewer than 20
entries, not 500,000). This speeds up the lookup dramatically.

        So, for our example of 1,000,000 we must select a table with
1,000,000/20 rows, i.e. 50,000 rows. In this case the minimum TAB_BITS
value is 16 (2^16 = 65536 >= 50000). If we select 15 bits (32768 rows)
we can expect 30 entries per row (doubly-linked list), which increases
the average time to access these connections.
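The sizing rule above can be written out as a small helper (illustrative only; `min_tab_bits` is not an ipvs function, and the real value is a compile-time config option):

```c
/* Smallest TAB_BITS such that expected_conns spread over 2^bits rows
 * gives at most max_row_len entries per row on average. */
int min_tab_bits(unsigned long expected_conns, unsigned long max_row_len)
{
    /* Rows needed so that the average row length stays <= max_row_len,
     * rounding up. */
    unsigned long rows_needed = (expected_conns + max_row_len - 1) / max_row_len;
    int bits = 0;

    /* Round up to the next power of 2. */
    while ((1UL << bits) < rows_needed)
        bits++;
    return bits;
}
```

For 1,000,000 connections and rows of at most 20 entries this yields 16 bits, matching the calculation in the mail.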

        So, the TAB_BITS selection is a compromise between the
memory the empty table will use and the lookup speed within one
table row: more rows => more memory => faster access. So, for
1,000,000 entries (which is a real limit for 128MB directors) you
don't need more than 16 bits for the conn hash table. The space
occupied by such an empty table is 65536*8 = 512KBytes. More than
16 bits can speed up the lookup further, but wastes too much memory.
And we usually don't reach 1,000,000 conns on 128MB directors, since
some memory is occupied by other things.

        The reason to move to a vmalloc-ed buffer is that a 65536-row
table occupies 512KB, and if the table is statically defined in the
kernel, the boot image is 512KB larger, which is obviously very
bad. So, the new definition is a pointer (4 bytes in the bzImage
instead of 512KB) to the vmalloc-ed area.

> > May be you are talking about
> > a new sysctl var in /proc/.../vs/conn_limit ?
>
> didn't know about this.
>
> I'm looking in 0.9.1-2.4.5 /proc/sys/net/ipv4/vs and don't
> see it (I have amemthresh, timeout*, drop*)

        This is an idea I'm proposing to you, in case the defense
strategies are not triggered fast enough to follow the incoming
packet rate :)) It is not implemented yet :) It is not hard to do;
the patch would be short:

value 0 => no limit
value > 0 => limit of the number of connections

for example:

echo 300000 > /proc/sys/net/ipv4/vs/conn_limit

=> max 38MBytes for connections (300,000 entries * 128 bytes each)

this is a blind DoS prevention :)))
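Since the sysctl is explicitly not implemented yet, the following is only a guess at what the proposed check might look like, in userspace C with invented names (`sysctl_conn_limit`, `ip_vs_conn_allowed`), not a patch against ipvs:

```c
/* Hypothetical sysctl-backed limit, per the proposal: 0 = no limit,
 * > 0 = maximum number of simultaneous connections. */
unsigned int sysctl_conn_limit = 0;
unsigned int ip_vs_conn_count = 0;

/* Would be consulted before creating a new connection entry:
 * refuse the connection instead of allocating 128 more bytes. */
int ip_vs_conn_allowed(void)
{
    if (sysctl_conn_limit && ip_vs_conn_count >= sysctl_conn_limit)
        return 0;   /* over the limit: drop the packet */
    return 1;       /* under the limit (or no limit set) */
}
```

With `echo 300000 > /proc/sys/net/ipv4/vs/conn_limit`, such a check would cap connection memory at roughly 300,000 * 128 bytes, as in the example above.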

> Will it limit the memory the hash table can use?

        Sure, once implemented :) Of course, it should be coordinated
with the other objects that allocate memory: the processes, etc. And
it is not a limit on the hash table memory; it is a limit on the
number of connections, and so on the memory they allocate. We
allocate memory for the hash table only once, at boot.

> Joe


Regards

--
Julian Anastasov <ja@xxxxxx>


