
Re: NAT cluster....

To: Stephen Rowles <spr@xxxxxxxxxxxxxxx>
Subject: Re: NAT cluster....
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Joseph Mack <mack@xxxxxxxxxxx>
Date: Sat, 9 Sep 2000 07:39:37 -0400 (EDT)
On Fri, 8 Sep 2000, Stephen Rowles wrote:

> hi,
> 
> After trying to use Direct Routing on an ATM network I discovered that 
> because of the ATM it is not possible to have duplicate MAC addresses for a 
> single IP. 

Well this is Linux, we have the source code.

Possible solutions (I'm assuming the ATM router which talks to the
director and the real-servers is the problem here).

1. demo that ATM is causing the problem: Put ethernet cards on a small
number of real-servers and put a linux router between the real-servers and
the ATM router.

2. rather than using a linux router as in #1, allow the director to be the
router/default route for the real-servers. The standard ipvs patch
will _not_ work with the director as the default gw for the
real-servers. 

2a. add Julian's martian modification patch (I've tried this).
This patches a piece of code that is not being changed much in the 2.2
kernel and you can probably patch the current kernel with it. Then put
ethernet on the real-server/director network.

2b. have the director accept packets by transparent proxy (see
below). (I've used TP for other things, but have not tested whether it
works for making the director the default gw for a VS-DR setup.)

3. Contact the people who write the ATM drivers and tell them your
problem. There may even be a switch in /proc to turn this behaviour off
already.

4. If #3 fails, see if Julian is prepared to look at the ATM code (you
may need to send/lend him 2 ATM cards).

5. See if anyone has a priority routing work around (Julian - any ideas?)

Joe

----------------------------------
here's the stuff from the next version of the HOWTO on making
the director the default gw for the real-servers.

8.6 Julian's martian modification: director is default gw for
real-servers

In the case where the director is the firewall for the real-server
network, the director has to be the default gw for the real-servers.

If the reply packet from the real-server to the client
(VIP->CIP) goes through the director (which has a device
with IP=VIP), the director is being asked to route a packet with a
src address that is on the director.

 >From: Horms <horms@xxxxxxxxxxxx>
 >
 >The problem is that with Direct routing the reply from the real
 >server has the vip as the source address. As this is an address
 >of one of the interfaces on the director it will drop it if you
 >try and forward it through the director. It appears from
 >experimentation with /proc/sys/net/ipv4/conf/*/rp_filter
 >that at least on 2.2.14, there is no way to turn this behaviour
 >off.

This type of packet is called a "source martian" and is dropped by
the director. (martians can be logged with

# echo 1 >/proc/sys/net/ipv4/conf/all/log_martians).
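Before changing anything it can help to see what each interface is currently set to. The following is a minimal sketch, not part of the HOWTO: the function name is made up, and the optional directory argument exists only so the function can be tried against a copy of the tree rather than the live /proc.

```shell
#!/bin/sh
# Print rp_filter and log_martians for every interface directory under
# a conf tree. Defaults to the standard /proc location; the optional
# argument (hypothetical, for testing) points at another directory.
show_martian_settings() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for d in "$conf_dir"/*/; do
        [ -d "$d" ] || continue
        iface=$(basename "$d")
        # A missing file is reported as "?" instead of aborting.
        rp=$(cat "$d/rp_filter" 2>/dev/null || echo "?")
        lm=$(cat "$d/log_martians" 2>/dev/null || echo "?")
        printf '%s rp_filter=%s log_martians=%s\n' "$iface" "$rp" "$lm"
    done
}
```

Calling `show_martian_settings` with no argument reads the live settings and changes nothing.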

Julian has come up with 2 solutions to this.

8.6.1 Director has 1 NIC, accepts packets via transparent proxy.

If the director accepts packets for the VIP via transparent proxy,
then the director doesn't have the VIP and the return packets are
processed normally.

Here's Julian's posting



                Clients
                   |
                  ISP
                   |eth0/ppp0/...
                Router/Firewall/Director (LVS box)
                   |eth1
        +----------+------------+
        |eth0                   |eth0
        Real 1                  Real2

Router: transparent proxy for VIP (or all served VIPs)
The ISP must feed your Director with packets for your subnet
199.199.199.0/24
VS/DR mode (Yes, VS/DR, this is not a mistake)
eth1: 199.199.199.2
default gw is ISP

Real server(s): nothing special
VIP on hidden device or via transparent proxy
eth0: 199.199.199.3
default gateway is 199.199.199.2 (the Director)

        This is a minimum required config. You can add internal subnets
yourself using the same physical network (one NIC) or by adding additional
NICs, etc. They are not needed for this test.

        Packets from the real servers with saddr=VIP will be forwarded
from the director because VIP is not configured in the Director. We expect
that this setup is faster than VS/NAT.

8.6.2 Kernel Patch, director has 2 NICs, VIP is on outside NIC.

8.6.2.1 Martian modification setup

        The patch (below) has been tested against 2.2.15pre9 (Joe)
and 2.2.13 (Stephen Zander <gibreel@xxxxxxxxx>).
The kernel code is not changing very fast for these files. If
patching other 2.2 kernels produces no rejects (HUNK FAILED)
then the patch is probably OK.
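
One way to check for rejects without touching the tree is a dry run. This is a minimal sketch, assuming a patch program that supports --dry-run and -i (GNU patch does) and that the patch has been saved to a file; the function name and the example paths are hypothetical.

```shell
#!/bin/sh
# check_patch SRC_PARENT PATCHFILE
# SRC_PARENT is the directory that contains the linux/ tree, because the
# patch headers use paths such as linux/net/ipv4/route.c (hence -p0).
# PATCHFILE must be an absolute path, since we cd before running patch.
check_patch() {
    src_parent="$1"
    patchfile="$2"
    # --dry-run reports what would happen but modifies no files.
    if (cd "$src_parent" && patch -p0 --dry-run -i "$patchfile"); then
        echo "no rejects: patch applies cleanly"
    else
        echo "rejects reported: patch does not apply cleanly" >&2
        return 1
    fi
}
```

For example, `check_patch /usr/src /tmp/martian.patch` (both paths are assumptions) leaves the source untouched either way.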

        2 NICs are required: one for the external net and one
for the internal net (with the real servers). It doesn't work
with one NIC.

        After applying this patch, for a test, use the default
values for */rp_filter(=0). This allows real servers to send
packets with saddr=VIP and daddr=client through the Director.

        If this patch is applied and external_eth/rp_filter is
0 (which is the default) the real servers can receive packets
with saddr=any_director_ip and dst=any_RIP_or_VIP which is not
very good. On the external net, set rp_filter=1 for better
security.
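
The rp_filter settings described above can be applied per interface. A minimal sketch follows; the function name is made up, the interface names in the usage note are assumptions (this section does not name the external NIC), and the optional third argument exists only so the function can be tried on a scratch directory.

```shell
#!/bin/sh
# set_rp_filter IFACE VALUE [CONF_DIR]
# Writes VALUE (0 or 1, the values used in this section) to IFACE's
# rp_filter entry. CONF_DIR defaults to the standard /proc location.
set_rp_filter() {
    iface="$1"
    value="$2"
    conf_dir="${3:-/proc/sys/net/ipv4/conf}"
    case "$value" in
        0|1) ;;
        *) echo "rp_filter must be 0 or 1" >&2; return 2 ;;
    esac
    target="$conf_dir/$iface/rp_filter"
    if [ ! -e "$target" ]; then
        echo "no such interface entry: $target" >&2
        return 1
    fi
    echo "$value" > "$target"
}
```

If the external NIC were eth1 and the internal NIC eth0, then `set_rp_filter eth1 1` followed by `set_rp_filter eth0 0`, run as root, would give the stricter external setting while still letting the real servers' replies through.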


Here's the test setup (Joe)

             ____________
            |            |
            |  client    |
            |____________|
                  |
                  |  192.168.2.0/24
             _____|______
            |            |
            |  director  | VS-DR director has 2 NICs
            |____________|
                  | eth0    192.168.1.9
                  | eth0:12 192.168.1.1
                  |
                  |  192.168.1.0/24
            ______|____________________
            |
            |
       _____|_______
      |             |
      |real-server(s)| default gw=192.168.1.1
      |_____________|

192.168.1.1 is the normal router. For the test it was put on the
director as an alias. The director has 2 NICs, with forwarding=on
(client and real-servers can ping each other).

Director runs linux-0.9.8-2.2.15pre9 unpatched or with Julian's
patch. LVS is setup using the configure script in the HOWTO,
redirecting telnet, with rr scheduling to 3 real-servers.
Of the real-servers, 1 was running 2.0.36 and 2 were running 2.2.14. The
arp problem was handled for the 2.2.14 real-servers by permanently
installing in the client's arp table the MAC address
of the NIC on the outside of the director, using
the command `arp -f /etc/ethers`.


The director was booted 4 times, into unpatched, patched, unpatched and
patched. After each reboot the lvs scripts were run on the director and
the real-servers, then the functioning of the LVS tested by telnet'ing
multiple times from the client to the VIP.

For the unpatched kernel, the client connection hung and inactive
connections accumulated for each real-server. For the patched kernel,
the client telnet'ed to the VIP, connecting with each real-server in turn.

The configure script will set up the modified VS-DR (it will warn you
that you need the patch for it to work). Setup details are in

http://www.linuxvirtualserver.org/Joseph.Mack/performance/single_real-server_performance.html

8.6.2.2 Martian modification performance

Latency is similar to VS-NAT, but the load on the director at high
throughput is low, as with VS-DR.

http://www.linuxvirtualserver.org/Joseph.Mack/performance/single_real-server_performance.html

8.6.2.3 Martian modification patch

--- linux/include/net/ip_fib.h.orig     Wed Feb 23 16:54:27 2000
+++ linux/include/net/ip_fib.h  Wed Mar 15 13:46:22 2000
@@ -200,7 +200,7 @@
 extern int inet_rtm_getroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg);
 extern int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb);
 extern int fib_validate_source(u32 src, u32 dst, u8 tos, int oif,
-                              struct device *dev, u32 *spec_dst, u32 *itag);
+                       struct device *dev, u32 *spec_dst, u32 *itag, int our);
 extern void fib_select_multipath(const struct rt_key *key, struct fib_result *res);

 /* Exported by fib_semantics.c */
--- linux/net/ipv4/fib_frontend.c.orig  Wed Feb 23 16:54:27 2000
+++ linux/net/ipv4/fib_frontend.c       Wed Mar 15 14:44:45 2000
@@ -189,7 +189,7 @@
  */

 int fib_validate_source(u32 src, u32 dst, u8 tos, int oif,
-                       struct device *dev, u32 *spec_dst, u32 *itag)
+                       struct device *dev, u32 *spec_dst, u32 *itag, int our)
 {
        struct in_device *in_dev = dev->ip_ptr;
        struct rt_key key;
@@ -206,7 +206,8 @@
                return -EINVAL;
        if (fib_lookup(&key, &res))
                goto last_resort;
-       if (res.type != RTN_UNICAST)
+       if ((res.type != RTN_UNICAST) &&
+               ((res.type != RTN_LOCAL) || our))
                return -EINVAL;
        *spec_dst = FIB_RES_PREFSRC(res);
        if (itag)
@@ -216,13 +217,20 @@
 #else
        if (FIB_RES_DEV(res) == dev)
 #endif
+       {
+               if (res.type == RTN_LOCAL) {
+                       *itag = 0;
+                       return -EINVAL;
+               }
                return FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
+       }

        if (in_dev->ifa_list == NULL)
                goto last_resort;
        if (IN_DEV_RPFILTER(in_dev))
                return -EINVAL;
        key.oif = dev->ifindex;
+       if (res.type == RTN_LOCAL) key.iif = loopback_dev.ifindex;
        if (fib_lookup(&key, &res) == 0 && res.type == RTN_UNICAST) {
                *spec_dst = FIB_RES_PREFSRC(res);
                return FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
--- linux/net/ipv4/route.c.orig Wed Feb 23 17:00:07 2000
+++ linux/net/ipv4/route.c      Wed Mar 15 13:07:28 2000
@@ -1037,7 +1037,7 @@
                if (!LOCAL_MCAST(daddr))
                        return -EINVAL;
                spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK);
-       } else if (fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag) < 0)
+       } else if (fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag, our) < 0)
                return -EINVAL;

        rth = dst_alloc(sizeof(struct rtable), &ipv4_dst_ops);
@@ -1181,7 +1181,7 @@
        if (res.type == RTN_LOCAL) {
                int result;
                result = fib_validate_source(saddr, daddr, tos, loopback_dev.ifindex,
-                                            dev, &spec_dst, &itag);
+                                            dev, &spec_dst, &itag, 1);
                if (result < 0)
                        goto martian_source;
                if (result)
@@ -1206,7 +1206,7 @@
                return -EINVAL;
        }

-       err = fib_validate_source(saddr, daddr, tos, FIB_RES_OIF(res), dev, &spec_dst, &itag);
+       err = fib_validate_source(saddr, daddr, tos, FIB_RES_OIF(res), dev, &spec_dst, &itag, 0);
        if (err < 0)
                goto martian_source;

@@ -1279,7 +1279,7 @@
        if (ZERONET(saddr)) {
                spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK);
        } else {
-               err = fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag);
+               err = fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag, 1);
                if (err < 0)
                        goto martian_source;
                if (err)




--
Joseph Mack mack@xxxxxxxxxxx


