
To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS performance bug
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Fri, 16 Mar 2007 09:31:45 +0100
Hi Graeme,

Thanks for the analysis, it's very much appreciated, especially since most of us here do not have much time to spare at the moment.

>> So the only thing I see shooting up higher in memory used is
>> buffers/cache used seems to grow. But in the slabinfo the ip_vs_conn
>> active objects grows fast. I watched it grow during the test from 39K
>> objects to over 2 million objects. Maybe something isn't being reset or
>> returned to the pool. We are running the OPS patch (one packet
>> scheduling) because we are using LVS for the udp service DNS. I'm sure
>> it treats connections differently than the regularly hashed connections
>> thing.

> Aside from OPS, is this a relatively stock kernel (or a distributed
> one); ie. not custom compiled by you? I'm going to have a pitch at
> something slightly out of my normal range here... I'm wondering if the
> conns/sec * conn time in sec is greater than the default connection
> table size - the error you see is from ip_vs_conn.c:

Unless I'm misinterpreting you, your statement

conns/sec * conn time in sec > default connection table size

is almost always true; however, that's exactly why the buckets are linked lists:

struct ip_vs_conn entries go into the buckets according to a simple Jenkins hash distribution:

Buckets (# == hash table size := n)
[entry 1  ] --> entry 1.1
[entry 2  ] --> entry 2.1 --> entry 2.2 --> entry 2.3
[entry 3  ] --> entry 3.1 --> entry 3.2
[...]
[entry n-1]
[entry n  ] --> entry n.1 --> entry n.2
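
Roughly, the bucket index is a Jenkins hash over the connection tuple, and every new connection is simply linked into its bucket's list head. A simplified sketch from memory (not a verbatim excerpt of ip_vs_conn.c; names and exact arguments may differ between kernel versions):

  /* sketch: bucket index = Jenkins hash over (proto, caddr, cport) */
  static unsigned int ip_vs_conn_hashkey(unsigned proto, __be32 addr,
                                         __be16 port)
  {
          return jhash_3words((__force u32)addr, (__force u32)port, proto,
                              ip_vs_conn_rnd) & IP_VS_CONN_TAB_MASK;
  }

  /* sketch: insertion just prepends the entry to its bucket's list */
  hash = ip_vs_conn_hashkey(cp->protocol, cp->caddr, cp->cport);
  list_add(&cp->c_list, &ip_vs_conn_tab[hash]);

The error message you quoted comes from the per-connection allocation: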

  cp = kmem_cache_alloc(ip_vs_conn_cachep, GFP_ATOMIC);
  if (cp == NULL) {
          IP_VS_ERR_RL("ip_vs_conn_new: no memory available.\n");
          return NULL;
  }

If we cannot allocate a new (HW cache-aligned) struct ip_vs_conn object from the slab cache behind ip_vs_conn_cachep using GFP_ATOMIC, we bail out.

In turn, ip_vs_conn_cachep is created inside ip_vs_conn_init():

  /*
   * Allocate the connection hash table and initialize its list heads
   */
  ip_vs_conn_tab = vmalloc(IP_VS_CONN_TAB_SIZE*sizeof(struct list_head));
  if (!ip_vs_conn_tab)
          return -ENOMEM;

This is the table memory.

  /* Allocate ip_vs_conn slab cache */
  ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
                                        sizeof(struct ip_vs_conn), 0,
                                        SLAB_HWCACHE_ALIGN, NULL, NULL);

This creates the slab cache from which the individual struct ip_vs_conn objects are allocated.

  if (!ip_vs_conn_cachep) {
          vfree(ip_vs_conn_tab);
          return -ENOMEM;
  }

If there wasn't enough space (in our case 256 bytes) to set up the initial slab cache for struct ip_vs_conn, we will of course also free the table space allocated, since there's no chance in hell that even one connection could be inserted into the lookup table. This really should not happen on any box, or there has been a serious memory leak in the kernel previously.

> The ip_vs_conn_tab is therefore sized according to IP_VS_CONN_TAB_SIZE,
> which is set in the compile process and defaults to 12. This gives a
> table size of 4096 (2^12). If you hit your server with a *very* high

Correct.

> connection rate (as in busy DNS) then you're going to exhaust your
> connection table in no time, especially if the DNS servers take a little

Not quite, since the table buckets are linked lists. The average lookup (chain) length gets higher, which will hurt performance if you have a slow CPU or little L1/L2 cache, but there's nothing inherently wrong with a smaller table. That said, I believe nowadays the default size in the kernel configuration could be increased to 2^16 entries.
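
A back-of-the-envelope example with the numbers from this thread: with the roughly 2 million ip_vs_conn objects you saw in slabinfo and the default 4096 buckets, the average chain is about 2,000,000 / 4,096, i.e. around 500 entries per lookup; with 2^16 (65,536) buckets it drops to about 30.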

> longer to respond when loaded (in this case you get a variant of the
> "thundering herd" problem; namely that when responses start to take
> longer, you get more requests).
>
> I'd try recompiling with IP_VS_TAB_BITS set to something higher.

This will shorten the average lookup (and increase the initially allocated static memory for the table), since the per-bucket linked lists will be shorter.
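
For reference, the table size is derived from that compile-time setting roughly like this in ip_vs_conn.c (or ip_vs.h, depending on the kernel version; quoted from memory, so please double-check against your tree):

  #ifndef CONFIG_IP_VS_TAB_BITS
  #define CONFIG_IP_VS_TAB_BITS   12
  #endif

  /* bits -> number of buckets -> mask used by the hash function */
  #define IP_VS_CONN_TAB_BITS     CONFIG_IP_VS_TAB_BITS
  #define IP_VS_CONN_TAB_SIZE     (1 << IP_VS_CONN_TAB_BITS)
  #define IP_VS_CONN_TAB_MASK     (IP_VS_CONN_TAB_SIZE - 1)

So IP_VS_TAB_BITS=16 gives 65536 buckets instead of 4096.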

> I'd also try not using OPS, to see whether or not that in itself is the
> problem *or* if the more straightforward schedulers exhaust the
> connection table before OPS does.

I have already forgotten how OPS works :). But yes, this could be an option, as well.

Take care and best regards,
Roberto Nibali, ratz

ps: Hope all is fine with your new family member and I reckon your memory is pretty much exhausted too at the moment :)
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
