Balancing outgoing traffic (was Re: MAC address on dummy0)

To: Alexandre Cassen <Alexandre.Cassen@xxxxxxxxxx>
Subject: Balancing outgoing traffic (was Re: MAC address on dummy0)
Cc: Roberto Nibali <ratz@xxxxxx>, Alexandre Cassen <acassen@xxxxxxxxxxxx>, Joseph Mack <mack.joseph@xxxxxxx>, Wensong Zhang <wensong@xxxxxxxxxxxx>, <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Sat, 25 Aug 2001 03:11:58 +0000 (GMT)
        Hello,

On Fri, 24 Aug 2001, Alexandre Cassen wrote:

> > >         Hm, we had better first put the goals and the possible
> > > setups on the table. Tonight I'll try to read the RFC again and
> > > will reply, but for now, IMO, we should consider many things:
> >
> >I think this is also what I would consider, Alexandre. Could you write
> >down the exact specifications and wishes? Is it the paper you once
> >sent, or can I find it on the project's homepage?
>
> Ok, will do it tonight! The first shot is the simple PDF I have sent. I
> will write detailed specs tonight.

        I'm reading your PDF; first, some questions:

- in some setups we must consider how to send the outgoing traffic,
usually according to the source addresses. I.e., when you get WANs from
one ISP, how many subnets do you receive? The same subnet, or a
different subnet for each WAN?

- when we have different public nets for each WAN, can we send traffic
with source addresses from WAN1 through WAN2? If not, we don't have
many options for balancing the lines: we can balance only NAT-ed
hosts (see my example from yesterday on the mailing list). There is
no place for VRRP here, except maybe if we want the internal host to
use another path to the NAT box on eth failure. In that posting you
can see that only by using routing on the NAT box can one use many
WANs for the NAT-ed traffic, via multipath (yes, you can saturate
them :)). Then what happens: the NAT box dies and our two WANs fail
with it. So, we have to use another gateway (NAT box) with another
two WANs. We all agree that multipath routing plays a nice role for
the NAT-ed hosts - you can use many WANs, no matter whether they
provide you with different public nets or with the same nets. You
hardcode them in the default routes for the NAT-ed network and you
usually add source routing for the different nets. If you want to
send traffic with public IPs, you can only use the available (not
failed) WANs for these IPs. In our case, when one WAN fails, nobody
can replace it.
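
        To make this concrete, a minimal iproute2 sketch, assuming a
hypothetical NAT box with two uplink gateways 10.1.0.254 and 10.2.0.254
reachable on eth0 (all names and addresses here are illustrative only):

    # one multipath default route for the NAT-ed traffic
    ip route add default scope global \
        nexthop via 10.1.0.254 dev eth0 weight 1 \
        nexthop via 10.2.0.254 dev eth0 weight 1

    # source routing: traffic sourced from each public net may only
    # leave through its own WAN
    ip rule add prio 100 from 10.1.0.0/24 table 101
    ip route add default via 10.1.0.254 dev eth0 table 101
    ip rule add prio 100 from 10.2.0.0/24 table 102
    ip route add default via 10.2.0.254 dev eth0 table 102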

        If only one WAN fails and you are lucky enough that the driver
sets the DEAD flag on the route, then multipath can stop creating new
routes through this WAN while it is down. So, the NAT-ed hosts notice
only the broken existing routes through this WAN, but any new
connections can be routed through the available devices. OK, we can
still serve the NAT-ed hosts. But wait, there are non-NAT-ed hosts,
i.e. other public hosts (this is the reason you get a /27 subnet, for
example, and not one IP). Such hosts can usually use the different
WANs (indirectly) as default gateways. Then they will select a source
address from a non-failed public net. For example:


   WAN1     WAN2        WAN3
        \ /             |               INT3
        GW1             GW2             |
--------+---------------+---------------+----------------LAN



GW1     - gateway for WAN1 and WAN2
        - WAN1 serves public net 10.1.0.0/24, which is placed on eth
        - WAN2 serves public net 10.2.0.0/24, which is placed on eth
        - has a public address from 10.3.0.0/24 via 10.3.0.1 on eth

GW2     - gateway for WAN3
        - WAN3 serves public net 10.3.0.0/24, which is placed on eth
        - has a public address from 10.1.0.0/24 via 10.1.0.1 on eth
        - has a public address from 10.2.0.0/24 via 10.2.0.1 on eth

INT3    - internal NAT-ed box that uses all WANs
        - needs two private networks to use source routing to the
        two gateways


        The NAT-ed clients don't care which WAN is used, and they can
use any public net (they don't know about it). If one WAN fails, the
others are used. The main question in this setup:

- how can GW2 and INT3 notice that WAN1 has failed?

        One of the solutions is to put 10.1.0.1 (everyone uses this IP
as the gateway for packets from 10.1.0.0/24 to the world) on the WAN1
device. Yes, not on the usual eth device. In Linux we can put a local
IP on any device, but now we have a reason to put it on WAN1. If the
net driver clears IFF_UP, GW2 and INT3 will notice that the ARP probes
for 10.1.0.1 are not answered. It is possible to check this IP more
often and not rely on the ARP timers. GW2 and INT3 can then mark the
path to their gateway as broken. This is already implemented in all
end hosts. So, even with static routes, the internal hosts can
accommodate the routing changes: the kernel marks the dead gateways
and does not use them.
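
        A sketch of this trick on GW1, assuming the WAN1 link shows up
as a hypothetical device wan1:

    # put the gateway IP on the WAN1 device itself, not on eth
    ip addr add 10.1.0.1/32 dev wan1

    # Linux answers ARP for any local address on any interface, so
    # the LAN hosts can still resolve 10.1.0.1 via eth; when wan1
    # loses IFF_UP, the local route goes away and the ARP probes for
    # 10.1.0.1 stay unanswered - exactly the failure signal we want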

        Both GW1 and GW2 have 3 default routes; each of them can
originate traffic to the world while at least one WAN is working. The
admin can choose the order. Of course, traffic with a specific source
can fail if there is no available WAN for that public source.
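
        One way to express that preferred order (again with invented
device names and addresses) is a metric per default route; the kernel
uses the live route with the lowest metric:

    # on GW2: its own WAN3 link first, then the two nets behind GW1
    ip route add default dev wan3 metric 1
    ip route add default via 10.1.0.1 dev eth0 metric 2
    ip route add default via 10.2.0.1 dev eth0 metric 3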

        It seems VRRP can't help here. There are no IPs that move from
one host to another, because you can't send one WAN's traffic through
another device. If you put a master and a backup device between the
internal hosts and the real gateways, you can then use VRRP to provide
a single gateway IP for the NAT-ed hosts, but you introduce a SPOF.
The question is whether we need such a device. See below.

        What we can notice here is that "we assume" the lines are
balanced, but this depends highly on the kind of traffic. Multipath
works better when there are many flows aggregated in each path, but
we are relying on chaos.

        The real problems come when we use these lines for servers,
i.e. not for NAT-ed clients. In that case the remote clients select
the public IP they connect to, so the admins have to differentiate
the traffic by DNS names and services. It is hard to load the links
proportionally in such a case.

        So, on to the multipath balancing. What we know about
multipath, bonding, teql and LVS:

- multipath forwards traffic by scheduling paths to different
nexthops. Cons: the many connections sharing one cached path use only
one nexthop

- bonding and teql can reorder packets. Cons: they require/assume
support on both ends. Any ARP problems with bonding, or when using
hubs?

- LVS can route traffic to different next hops

- LVS forwards only traffic belonging to a virtual service, or
related to it

        So, can we use LVS as an intermediate router, to provide one
gateway IP to the internal hosts and to balance the outgoing traffic
through different links or gateways? It is already known that LVS can
be used in Direct Routing setups where the real servers are
transparent proxy servers. This works by delivering the outgoing web
traffic locally and balancing it through the real servers. Now the
question is: can we do it for most of the traffic in use (see the
sketch after the pros and cons below):

- TCP/UDP (a virtual service with port 0?)

- the related ICMP traffic already follows the TCP/UDP traffic

- what remains is the outgoing ICMP

- a stupid WLC scheduler, or complex tools that tune WRR

Cons: we waste memory keeping the info for each connection - this is
the way LVS works

Pros: per-"connection" balancing

        The result (if such balancing can work) is that an LVS box
can be used to balance traffic from internal hosts through many
gateways, per connection. One can add VRRP too. The benefit can be in
providing higher-layer metrics about the balancing: how much the
lines are loaded with traffic, etc. We can at least try to balance
better than multipath. Of course, proper routing should be used to
deliver all outgoing traffic locally; for example, the order of the
ip rules can be (a sketch with ip rule commands follows the list):

- prio 0: local routes (already hardcoded)

- prio X: link routes (by destination)

- prio Y: gatewayed routes (by source)

- prio Z: everything from device XXX and IPs YYY is delivered locally
and fed to LVS. LVS should forward, without change, any traffic that
can't be balanced

- last prio: our default routes
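
        A minimal sketch of that rule order, with invented table
numbers and with eth1/192.168.0.0/24 standing in for device XXX and
IPs YYY:

    # prio 0: the local table is already there

    # prio X: link routes, by destination
    ip rule add prio 100 to 10.3.0.0/24 table 100
    ip route add 10.3.0.0/24 dev eth0 table 100

    # prio Y: gatewayed routes, by source
    ip rule add prio 200 from 10.1.0.0/24 table 101
    ip route add default via 10.1.0.1 dev eth0 table 101

    # prio Z: deliver the internal traffic locally so LVS can see it
    ip rule add prio 300 iif eth1 from 192.168.0.0/24 table 102
    ip route add local 0/0 dev lo table 102

    # last prio: the usual main/default tables at 32766/32767 hold
    # our default routes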

Some pros:

- if you can't use multipath for non-gatewayed routes, you can do
it with LVS-DR and the proposed feature

- you can tune your cluster software to stop using one gateway (WAN) if
the web page of your ISP does not work :)


Example setup:


        GW1     GW2     LVSIN1  LVSOUT1         INT1    INT2
        |       |       |       |               |       |
--------+-------+-------+-------+---------------+-------+---

LVSIN1: for out->in traffic, using INT1 and INT2 as real servers

LVSOUT1: for in->out traffic, using GW1 and GW2 as real servers (a
hypothetical ipvsadm sketch for both follows the list below)

The situation:

- the requests come according to the clients' actions

- the replies are balanced in the best way: they are sent where
possible, and we hope the links are proportionally loaded. If we have
one public net per WAN, then we don't need LVS to balance
server-oriented traffic - there is no way to send one packet to more
than one place.
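
        To make the two roles concrete, a hypothetical configuration
(every address here is invented) could be:

    # LVSIN1, out->in: a normal virtual service on a public VIP,
    # with the internal hosts as DR real servers
    ipvsadm -A -t 10.1.0.10:80 -s wlc
    ipvsadm -a -t 10.1.0.10:80 -r 192.168.0.1 -g     # INT1
    ipvsadm -a -t 10.1.0.10:80 -r 192.168.0.2 -g     # INT2

    # LVSOUT1, in->out: a fwmark service with the gateways as DR real
    # servers, fed by a mark on the internal traffic as sketched above
    ipvsadm -A -f 2 -s wlc
    ipvsadm -a -f 2 -r 10.0.0.253 -g     # GW1
    ipvsadm -a -f 2 -r 10.0.0.254 -g     # GW2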


        Another view:

             ,--in1---> LVSIN1 --------> INT 1, same for INT 2
        GW1 / <-out1--- LVSOUT1 <--------'
        GW2 <---out2----'

[I really can't draw]

        The SNAT function, if used, can be performed on the GW hosts
or on the LVS box (LVSIN1 == LVSOUT1).
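
        For reference, the SNAT half on a GW host would be a plain
netfilter rule of this kind (device and address invented):

    # on GW1: rewrite the source of traffic leaving through WAN1
    iptables -t nat -A POSTROUTING -o wan1 -j SNAT --to-source 10.1.0.2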

So, gurus, what problems do you see in trying to balance outgoing
traffic by using LVS? Pros, cons, what is possible and what is not?
Possible good and bad setups?

        We assume the out->in traffic reaches the internal hosts
directly from the real servers (the real/intermediate gateways) - the
DR method. Of course, maybe we can SNAT the traffic too; then the
replies will return through the LVS box.

For now, I see that we must check all ICMP packets and try to forward
them after creating a connection structure (the same as the MASQ code
does). Maybe I'm missing something that breaks this idea (I'm already
half asleep).

> Regards,
> Alexandre


Regards

--
Julian Anastasov <ja@xxxxxx>


