LVS NAT and source address routing/antefacto patches

To: LVS Users <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: LVS NAT and source address routing/antefacto patches
From: Mark Weaver <mark@xxxxxxxxxx>
Date: Thu, 15 Jul 2004 17:37:37 +0100
One of our customers wants to add a second, lower-bandwidth IP connection (to be used in conjunction with a low-TTL DNS server that monitors the servers) as a cheapish way of keeping the site reasonably available if the main bandwidth provider breaks.

The setup is currently using LVS NAT in a standard configuration, e.g.:

ipvsadm -A -t website_ip:80 -s rr
ipvsadm -a -t website_ip:80 -r rs1:80 -m
ipvsadm -a -t website_ip:80 -r rs2:80 -m
...

where website_ip = the external ip address of the service, and the ip addresses of the real servers are assigned from private ip space.

My idea was to simply add the extra ip addresses in as separate load balanced services, and then use something like:

ip rule add from backup_ip table backup_route
ip route add default via backup_gw table backup_route

This works fine for non-LVS services (and I can therefore provide a straightforward NAT service without redundancy), but with LVS services the traffic is pushed straight down the default route. I'm guessing that this is because the packets are routed before the NAT happens. A few questions:

- Am I right therefore in thinking that this would work with LVS/DR?
- Can anyone think of another method of using LVS-NAT to get these packets to take the right route?

Digging around a little, I thought that the old antefacto patches might sort this out, and in fact they do. Unfortunately they are unstable: in testing they seemed fine, but with real traffic the box just drops off the network, presumably with a kernel oops that I can't see because the machine is in some hosting centre miles away. Reading the patches a bit further, there is one particular section that would seem to be just what I want:


 /*
  *     It is hooked at the NF_IP_FORWARD chain, used only for VS/NAT.
@@ -642,6 +686,7 @@ static unsigned int ip_vs_out(unsigned i
        struct ip_vs_conn *cp;
        int size;
        int ihl;
+       int retval;

        EnterFunction(11);

@@ -809,8 +854,20 @@ static unsigned int ip_vs_out(unsigned i

        skb->nfcache |= NFC_IPVS_PROPERTY;

+        /* For policy routing, packets originating from this
+         * machine itself may be routed differently to packets
+         * passing through.  We want this packet to be routed as
+         * if it came from this machine itself.  So re-compute
+         * the routing information.
+         */
+        if (route_me_harder(skb) == 0)
+            retval = NF_ACCEPT;
+        else
+            /* No route available; what can we do? */
+            retval = NF_DROP;
+
        LeaveFunction(11);
-       return NF_ACCEPT;
+       return retval;
 }


I believe that this is just rerouting the packet after the NAT rewrite has taken place. Can any kernel experts see any problems with this approach? Should I apply the same change to ip_vs_out_icmp?

Thanks,

Mark


The route function is:

+/* This code stolen from ip_nat_standalone.c, as is the
+ * following comment:
+ *
+ * FIXME: change in oif may mean change in hh_len.  Check and realloc
+ * --RR
+ * (
+ * note from Joe: function name retained for compatibility with Rusty's code
+ * - in recent kernels has been moved to a different file and called ip_route_me_harder()
+ * )
+ */
+static int
+route_me_harder(struct sk_buff *skb)
+{
+       struct iphdr *iph = skb->nh.iph;
+       struct rtable *rt;
+       struct rt_key key = { dst:iph->daddr,
+                             src:iph->saddr,
+                             oif:skb->sk ? skb->sk->bound_dev_if : 0,
+                             tos:RT_TOS(iph->tos)|RTO_CONN,
+#ifdef CONFIG_IP_ROUTE_FWMARK
+                             fwmark:skb->nfmark
+#endif
+                           };
+
+        /* Note that ip_route_output_key() makes routing
+         * decisions assuming that the packet has originated
+         * from this machine itself.  This is the correct
+         * behaviour for our case.
+         */
+       if (ip_route_output_key(&rt, &key) != 0) {
+               printk("route_me_harder(): No more route.\n");
+               return -EINVAL;
+       }
+
+       /* Drop old route. */
+       dst_release(skb->dst);
+
+       skb->dst = &rt->u.dst;
+       return 0;
+}
+