Re: FW: new question - iptables on LB and connection limit?

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: FW: new question - iptables on LB and connection limit?
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Wed, 22 Nov 2006 15:34:47 +0100
Larry Ludwig wrote:
Sorry, but I simply don't understand this. iptables is a user space
command which cannot be started or stopped. It's a command line tool and
has little to do with your problem. Is the connection tracking still
running in the kernel? What does your lsmod show?

Sure it gets unloaded via the 'service' command

Ahh, now I get it.

[root@loadb1 ha.d]# service iptables stop
Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: filter                    [  OK  ]
Unloading iptables modules:                                [  OK  ]

lsmod doesn't show it running.

So if you stop your iptables service, there is no /proc/net/ip_conntrack
anymore, right?
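
For reference, a quick way to verify this (module names as on a 2.6 box of that era):

  lsmod | egrep 'ip_tables|ip_conntrack'
  ls -l /proc/net/ip_conntrack

With the service stopped and the modules unloaded, the second command should fail with "No such file or directory".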

What kind of page do you fetch with this? Static or dynamic?

Simple static page.

Ok.

What's its size?

Under 5k for the testing.  Page is much bigger for the real content now,
still static.  See below

So this does not fit into a single TCP packet with one PSH. This will have an impact on what you measure.
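Roughly: with a standard 1500-byte MTU the MSS is about 1460 bytes, so a 5-7 KB response already takes 4-6 data segments instead of a single PSH, which means more round trips per request and more opportunities for drops under load.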

BTW, with a 2.6 kernel, test clients spawning 1000 threads sometimes
stall due to the local_port_range and gc cleanups. What are the
local port range settings on your client? Also please show the ulimit -a
output right before you start your test runs.

[root@zeus ~]# ab  -n 300000 -c 1000 http://67.72.106.71/
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation,
http://www.apache.org/

Benchmarking 67.72.106.71 (be patient)

[root@zeus ~]# ab -n 100000 -c 1000 http://67.72.106.71/
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation,
http://www.apache.org/

Benchmarking 67.72.106.71 (be patient)

[root@zeus ~]# ab -n 10000 -c 1000 http://67.72.106.71/
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation,
http://www.apache.org/

Benchmarking 67.72.106.71 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software:        lighttpd
Server Hostname:        67.72.106.71
Server Port:            80

Document Path:          /
Document Length:        7327 bytes

Concurrency Level:      1000
Time taken for tests:   10.679202 seconds
Complete requests:      10000
Failed requests:        5694
   (Connect: 0, Length: 5694, Exceptions: 0)

This is a massive number of failed requests!

Write errors:           0
Total transferred:      122363820 bytes
HTML transferred:       119753282 bytes
Requests per second:    936.40 [#/sec] (mean)
Time per request:       1067.920 [ms] (mean)
Time per request:       1.068 [ms] (mean, across all concurrent requests)
Transfer rate:          11189.51 [Kbytes/sec] received

Wire speed; you capped the test with your available throughput. Try to recreate the test on a Gbit network. The retransmits due to TCP timeouts probably won't even get through this net-pipe anymore.
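To put a number on it: 11189.51 Kbytes/sec is roughly 11 MB/s, i.e. about 90 Mbit/s on the wire, essentially line rate on a 100 Mbit link before counting protocol overhead and retransmissions.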

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8  168 666.8     20    9020
Processing:    21  544 1336.0    102   10609
Waiting:        8  279 1072.8     21    9032
Total:         32  713 1476.9    127   10647

The difference between min and max is so big that one of the servers is probably at 0% idle, and that is where the drops happen. Netfilter only makes matters worse.

Percentage of the requests served within a certain time (ms)
  50%    127
  66%    341
  75%    379
  80%    760
  90%   3069
  95%   3308
  98%   4716
  99%   9149
 100%  10647 (longest request)

This percentile distribution does not explain the timed-out requests directly, unless an RTT of over 127 ms is already deadly for one side of the test setup.

[root@zeus ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024

This might be another source of your problems: this limits the number of files ab can have open (per thread), including already-opened fds, and 1024 is a low number for this kind of test (see the note after the listing).

pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16383
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
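
Regarding the open files line above: for the test alone, the limit can be raised in the shell that runs ab right before starting it, e.g. (as root, or within the hard limit):

  ulimit -n 8192
  ab -n 10000 -c 1000 http://67.72.106.71/

Whether ab actually exhausts 1024 fds depends on how it handles its sockets, so treat this as something to rule out rather than a definite culprit.
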
[root@zeus ~]# sysctl -a | grep local_
net.ipv4.ip_local_port_range = 32768    61000

This is enough.
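For the record: 61000 minus 32768 leaves roughly 28000 ephemeral ports, comfortably above the 1000 sockets ab keeps open concurrently, so the port range itself should not be the limiting factor here.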

??? In both traces you have the LB enabled? Or did you mean netfilter?

Iptables was disabled in the second case

I see now :). What are ab's conclusions when you run those tests? How
many dropped connections, how many packets ... and so on.
With iptables enabled, the IP address stops responding on the test client
server (zeus).

How long does this usually take?

Could you send along the ethtool $intf and ethtool -k $intf output?

[root@loadb1 ha.d]# ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes
[root@loadb1 ha.d]# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

Ok, no TSO.
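
Should you want to compare runs with TSO on and off (assuming the NIC and driver support it), it can be toggled with ethtool; note the capital -K for setting versus -k for showing:

  ethtool -K eth0 tso on
  ethtool -K eth0 tso off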

 Please show cat /proc/interrupts and /proc/slabinfo

[root@loadb1 ha.d]# cat /proc/interrupts
           CPU0       CPU1
  0:   26109146   26134774    IO-APIC-edge  timer
  4:     822532     821228    IO-APIC-edge  serial
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 10:          0          2   IO-APIC-level  ehci_hcd, ohci_hcd, ohci_hcd
 11:          0          0   IO-APIC-level  libata
 14:     234713     234473    IO-APIC-edge  ide0
177:      23675      32438   IO-APIC-level  3ware Storage Controller
185:          0    1001131   IO-APIC-level  eth0
193:     309782        257   IO-APIC-level  eth1
NMI:          0          0
LOC:   52246993   52246992
ERR:          0
MIS:          0
[root@loadb1 ha.d]# cat /proc/slabinfo
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_vs_conn             2     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
fib6_nodes             7    119     32  119    1 : tunables  120   60    8 : slabdata      1      1      0
ip6_dst_cache          7     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
ndisc_cache            1     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
rawv6_sock             4     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
udpv6_sock             1     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
tcpv6_sock             2      3   1216    3    1 : tunables   24   12    8 : slabdata      1      1      0
ip_fib_alias          16    226     16  226    1 : tunables  120   60    8 : slabdata      1      1      0
ip_fib_hash           16    119     32  119    1 : tunables  120   60    8 :

Nothing special here.
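
If you want to watch the relevant slabs grow during a run, something along these lines is handy (slab names as they appear on your 2.6 box):

  watch -n1 "egrep 'ip_vs_conn|ip_conntrack' /proc/slabinfo"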

Care to show your lighttpd configuration?

Very basic... The site we are prepping for is mostly static too, with FastCGI
for PHP.  I'll show the settings that are important for performance:

server.max-fds = 2048
server.max-keep-alive-requests = 32
server.max-keep-alive-idle = 5

Fine.

If it's something like a connection tracking table overflow, you'll see it in
your kernel logs.
No message on the LB when this happens.
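
You can also check the table directly instead of waiting for the usual "table full, dropping packet" message; on a 2.6 kernel with the ip_conntrack module loaded this would be something like:

  cat /proc/sys/net/ipv4/ip_conntrack_max
  wc -l /proc/net/ip_conntrack

If the second number gets anywhere near the first during a run, conntrack is your bottleneck.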

Could you share the socket states on the RS during both runs? Also the
ipvsadm -L -n -c output in the middle of the run?

With iptables enabled.

[root@loadb1 ha.d]# ipvsadm -L -n -c | wc
  27724  166341 2162413

[root@loadb1 ha.d]# ipvsadm -L -n -c | grep "ESTABLISHED" | wc
  27719  166314 2162082

I'm interested also in the socket states on the RS and also compared to not using netfilter.
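
One quick way to summarise them on the real server would be:

  netstat -tan | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn

which prints a count per TCP state (ESTABLISHED, TIME_WAIT, SYN_RECV, ...).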

I'm not sure the firewall is the issue; it could be the client machine, as I
just ran ab with iptables disabled and it still gave me the error.  Iptables
is enabled on the client test machine.

Well, if you disable netfilter (iptables service) completely on all systems, and it still exhibits the problems, we need to debug this further.

Best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

