Re: foundry vs. lvs

To: Wayne <wayne@xxxxxxxxxxxxxxx>
Subject: Re: foundry vs. lvs
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Thu, 19 Jul 2001 15:48:14 +0300 (EEST)
        Hello,

On Wed, 18 Jul 2001, Wayne wrote:

> NAT takes some CPU and memory copying; with a slower CPU, it will
> be slower.  It is different from the switch only -- even switch only

        This is a myth from the 2.2 era. In 2.2 there are two input
routing calls for the out->in traffic and this reduces the performance.
By default, in 2.2 (and in 2.4 too) the data is not copied when the IP
header is changed. Updating the checksum in the IP header does not
cost much time compared to the total packet handling time.
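
        For illustration, here is a minimal user-space sketch (my own
illustration, not the actual kernel code) of the RFC 1624 incremental
update that makes the NAT header rewrite so cheap: when one 32-bit
address in the IP header changes, the checksum is adjusted from the old
and new values alone, and the rest of the packet is never read or copied.

#include <stdint.h>
#include <stdio.h>

/* Plain one's-complement sum over 16-bit words (a from-scratch
 * IP header checksum, for comparison). */
static uint16_t csum_full(const uint16_t *words, int n)
{
    uint32_t sum = 0;
    while (n--)
        sum += *words++;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update (RFC 1624): adjust an existing checksum when one
 * 32-bit field (e.g. the address rewritten by NAT) changes from
 * 'from' to 'to'. */
static uint16_t csum_update32(uint16_t check, uint32_t from, uint32_t to)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~(from >> 16);
    sum += (uint16_t)~(from & 0xffff);
    sum += to >> 16;
    sum += to & 0xffff;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void)
{
    /* toy header: 10 words, checksum word zeroed, the last 2 words
     * are the 32-bit address we are going to rewrite */
    uint16_t hdr[10] = { 0x4500, 0x003c, 0x1c46, 0x4000, 0x4006,
                         0x0000, 0xac10, 0x0a63, 0xc0a8, 0x0001 };
    uint16_t check = csum_full(hdr, 10);

    uint32_t from = ((uint32_t)hdr[8] << 16) | hdr[9];
    uint32_t to   = 0x0a000001;    /* the new address */

    hdr[8] = to >> 16;
    hdr[9] = to & 0xffff;

    printf("full recompute: %04x, incremental: %04x\n",
           csum_full(hdr, 10), csum_update32(check, from, to));
    return 0;
}

Both paths print the same value, and the incremental one touches only
the checksum field and the rewritten address.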

        To check the difference between the NAT and DR forwarding
methods in the out->in direction, you can use testlvs from
http://www.linux-vs.org/~julian/ to flood a 2.4 director in two
setups: DR and NAT. My tests show no visible difference.
We are talking about 110,000 SYN packets/sec with 10 pseudo clients and
the same CPU idle during both tests (there is not enough client power in
my setup for a full test), 2 CPUs x 866MHz, 2 internal 100mbit
i82557/i82558 NICs, switched hub:

3 testlvs client hosts -> NIC1-LVS-NIC2 -> packets/sec.

        I use a small number of clients because I don't want to spend time
in routing cache or LVS table lookups.
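
        For reference, a rough way to count the packets/sec arriving
behind the director (just one possible way to measure it, not necessarily
what the testlvs setup itself reports) is a small Linux-only program like
this; it needs root and counts every IP packet the box sees:

#include <stdio.h>
#include <time.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>

int main(void)
{
    /* packet socket: receive a copy of every IP packet on any interface */
    int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
    if (fd < 0) {
        perror("socket (root needed)");
        return 1;
    }

    char buf[2048];
    long count = 0;
    time_t start = time(NULL);

    for (;;) {
        if (recv(fd, buf, sizeof(buf), 0) < 0)
            continue;
        count++;
        time_t now = time(NULL);
        if (now != start) {
            printf("%ld packets/sec\n", count / (long)(now - start));
            count = 0;
            start = now;
        }
    }
}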

        Of course, NAT also involves in->out traffic, and this can cut
the performance in half if the CPU or the PCI bus is not powerful enough
to handle the traffic in both directions. This is the real reason the NAT
method looks so slow in 2.4. IMO, the overhead from the TUN encapsulation
or from the NAT processing itself is negligible.

        Here come the surprises:

The basic setup: 1 CPU PIII 866MHz, 2 NICs (1 IN and 1 OUT), LVS-NAT,
SYN flood using testlvs with 10 pseudo clients, no ipchains rules.
Kernels: 2.2.19 and 2.4.7pre7.

Linux 2.2 (with ipchains support, with a modified demasq path that uses
one input routing call, something like LVS uses in 2.4 but without dst
cache usage):

In 80,000 SYNs/sec, Out 80,000 SYNs/sec, CPU idle: 99% (strange)
In 110,000 SYNs/sec, Out 88,000 SYNs/sec, CPU idle: 0%

Linux 2.4 (with ipchains support):

with 3-4 ipchains rules: In 80,000 SYNs/sec, Out 55,000 SYNs/sec, CPU idle: 0%
In 80,000 SYNs/sec, Out 80,000 SYNs/sec, CPU idle: 0%
In 110,000 SYNs/sec, Out 63,000 SYNs/sec (strange), CPU idle: 0%

Linux 2.4 (without ipchains support):

In 80,000 SYNs/sec, Out 80,000 SYNs/sec, CPU idle: 20%
In 110,000 SYNs/sec, Out 96,000 SYNs/sec, CPU idle: 2%

Linux 2.4, 2 CPU (with ipchains support):

In 80,000 SYNs/sec, Out 80,000 SYNs/sec, CPU idle: 30%
In 110,000 SYNs/sec, Out 96,000 SYNs/sec, CPU idle: 0%

Linux 2.4, 2 CPU (without ipchains support):

In 80,000 SYNs/sec, Out 80,000 SYNs/sec, CPU idle: 45%
In 110,000 SYNs/sec, Out 96,000 SYNs/sec, CPU idle: 15%, 30000 ctxswitches/sec


What I see is that:

- modified 2.2 and 2.4 UP look equal at 80,000 P/s; limits: 2.2=88,000 P/s,
2.4=96,000 P/s, i.e. an 8% difference

- 1 and 2 CPUs in 2.4 look equal: 110,000->96,000 (100mbit or PCI
bottleneck?); maybe we can't send more than 96,000 P/s through a 100mbit
NIC? (see the back-of-the-envelope check after this list)

- the ipchains rules can dramatically reduce the performance: from
88,000 to 55,000 P/s

- 2.4.7pre7 SMP shows too many context switches

- DR and NAT show equal results for 2.4 UP: 110,000->96,000P/s, 2-3% idle,
so I can't claim that there is a NAT-specific overhead.
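
        A quick back-of-the-envelope check on the 100mbit question: a
40-byte SYN is padded to a minimum 64-byte Ethernet frame, which together
with the 8-byte preamble and the 12-byte inter-frame gap occupies 84 bytes
of wire time per packet:

#include <stdio.h>

int main(void)
{
    /* minimum-size frame as it appears on a Fast Ethernet wire:
     * 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap */
    const double wire_bytes = 64 + 8 + 12;
    const double link_bps   = 100e6;

    printf("theoretical ceiling: %.0f packets/sec\n",
           link_bps / (wire_bytes * 8));    /* ~148800 */
    return 0;
}

So ~148,800 P/s is the raw limit for minimum-size frames on 100mbit,
which suggests the 96,000 P/s plateau comes from the NIC/driver, PCI or
CPU rather than from the wire itself.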

        In the next few days I'll try to repeat these tests. Maybe I have
to add more NICs (oh, all 8 ports in my switch are full).

        I performed other tests: testlvs with a UDP flood. The packet rate
was lower and the CPU idle time on the LVS box increased dramatically,
but the client hosts showed 0% CPU idle; maybe more testlvs client
hosts are needed.

> has a speed that is slower than wire.  We measured a top-of-the-line
> Intel Gigabit switch; the latency is about 8 microseconds without
> NAT.  We tested the load balancer we built based on Wensong's
> kernel code; the latency including the NAT is under 45 microseconds
> through two 10/100 Ethernet interface cards.  CPU usage also has

        Can you repeat the tests with kernel 2.4?

> things to do with how many user daemons you have and
> how much other overhead you have.  With kernel only, there
> is really little CPU demand, no matter how much load you have.
> The bandwidth is limited by the network interface.

        And what is the speed limit of one netif?

Regards

--
Julian Anastasov <ja@xxxxxx>


