Sorry for the delay in this response... been out for the past few days.
No worries, we're not in a hurry here.
stopping iptables is how I removed it.
Sorry, but I simply don't understand this. iptables is a user space
command which cannot be started or stopped. It's a command line tool and
has little to do with your problem. Is the connection tracking still
running in the kernel? What does your lsmod show?
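A quick check (assuming the 2.6.9-era module names):

lsmod | grep -E 'ip_conntrack|ip_tables'

If ip_conntrack shows up there, connection tracking is still running in
the kernel regardless of what your iptables rules say.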
What _EXACT_ test do you run?
ab -n 50000 -c 1000 http://67.72.106.71/
What kind of page do you fetch with this? Static or dynamic? What's its
size? BTW, with 2.6 kernels, test clients spawning 1000 threads sometimes
lead to stalls due to the local_port_range and gc cleanups. What are your
local port range settings on your client? Also please show the ulimit -a
command output right before you start your test runs.
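For example, on the client (these are the standard 2.6 sysctl locations):

cat /proc/sys/net/ipv4/ip_local_port_range
ulimit -a

With 1000 concurrent connections churning through 50000 requests, a
narrow port range can run out of ephemeral ports while old sockets still
sit in TIME_WAIT.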
What are the ipvsadm -L -n and the stats numbers?
with iptables enabled on the LB:
[root@loadb1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 67.72.106.71:80 wlc
-> 67.72.106.68:80 Route 5 3493 93
-> 67.72.106.66:80 Route 5 3483 96
Active connections with the LB enabled seem to "hang around" more and take much
longer to become inactive.
??? In both traces you have the LB enabled? Or did you mean netfilter?
With iptables disabled and the same ab test, the active connections match my ab
concurrency.
Meaning connections are being made inactive once they are passed along.
[root@loadb1 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 67.72.106.71:80 wlc
-> 67.72.106.68:80 Route 5 524 12137
-> 67.72.106.66:80 Route 5 456 14222
I see now :). What are ab's conclusions when you run those tests? How
many dropped connections, how many packets ... and so on. Could you
post the results, both for the netfilter-connection-tracking-enabled and
the LB-only test runs?
What's the kernel version on the director?
2.6.9-42.0.3.ELsmp CentOS 4.4
Should not be an issue then.
What processor, how much ram on the RS, what kind of NIC?
AMD 170 dual core, 2 GB of RAM
Broadcom BCM5704C
Could you send along the ethtool $intf and ethtool -k $intf output?
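For example (assuming your interface is eth0):

ethtool eth0
ethtool -k eth0

The -k output shows which offloads (TSO, checksumming, scatter-gather)
the driver currently has enabled.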
LB and RS are all the same (for now)
NAPI enabled?
what is that? How do you check if it is running?
ethtool or dmesg after your driver's loaded.
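For example, assuming the BCM5704C runs on the tg3 driver, something like:

dmesg | grep -i -E 'tg3|napi'

should show the driver banner after loading; builds of that era
typically note there whether NAPI support was compiled in.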
SMP?
yes
Please show cat /proc/interrupts and /proc/slabinfo
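For example, to watch the conntrack and TCP slab usage while the test
runs:

cat /proc/interrupts
watch -n1 "grep -E 'ip_conntrack|tcp' /proc/slabinfo"

/proc/interrupts also tells you whether the NIC's interrupts are pinned
to one CPU or spread across both cores.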
What kind of HTTP server is running on the RS?
lighthttpd but may go with apache 2
I like lighttpd (I reckon you mean that one) a lot. When I did tests
with static pages, it outperformed apache2 by a factor of 4.5 (take this
with a grain of salt). We were able to serve 12'500 pages/s sustained
using a 100kb page IIRC, using a LVS cluster with 4 RS (HP DL380) and
dual-NIC cards bonded. According to the author's own test runs, the
number does not seem to be far off:
http://www.lighttpd.net/benchmark/
Care to show your lighttpd configuration?
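For reference, a minimal static-serving lighttpd 1.4 configuration looks
something like this (paths and values are placeholders, not your setup):

server.document-root = "/var/www/html"
server.port = 80
server.event-handler = "linux-sysepoll"
server.max-fds = 8192
server.max-keep-alive-idle = 5

The keep-alive settings matter here, since kept-alive connections stay
ESTABLISHED and would show up as ActiveConn in ipvsadm.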
If it's something with the connection tracking overflow you'll see it in
your kernel logs.
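On a 2.6.9-era kernel the telltale line is usually:

ip_conntrack: table full, dropping packet

You can also compare the current entry count against the limit:

wc -l /proc/net/ip_conntrack
sysctl -a | grep conntrack_max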
No message on the LB when this happens.
Could you share the socket states on the RS during both runs? Also the
ipvsadm -L -n -c output in the middle of the run?
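For the socket states, something like this on each RS gives a quick
per-state summary:

netstat -tan | awk 'NR>2 {print $6}' | sort | uniq -c

And on the director, mid-run:

ipvsadm -L -n -c

which lists the individual connection entries with their state
(ESTABLISHED, FIN_WAIT, ...).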
Cheers,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc