First, I need to correct the stated provenance of this patch. It is
a small, tweaked subset of an earlier Antefacto patch posted to
integrate netfilter's connection tracking into LVS, not the nfct
patches as I said. Lots of Googling, not enough brain cells. The
patch applies to v1.0.10 (the 2.4 series) but appears to be portable
to 2.6.

During a maintenance window this morning, I had the opportunity to
test the patch.

This was the first time I had ever loaded the patched module, and
shockingly it worked perfectly: outbound traffic from masq VIPs now
follows source routes and chooses the correct outbound gateway. No
side effects so far, and no obvious increase in load.
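
For context, the kind of policy routing I mean looks something like
this (addresses and table numbers are made up for illustration):

        # each provider's VIP block gets its own table and gateway
        ip rule add from 192.0.2.0/24 table 100
        ip rule add from 198.51.100.0/24 table 101
        ip route add default via 192.0.2.1 table 100
        ip route add default via 198.51.100.1 table 101

Without the patch, masq replies leave with a route calculated from
the real server's IP, so the "from" rules keyed on the VIP source
never match.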
I also poked around the 2.6 LVS source a bit to see if this issue had
been resolved in later versions, and noticed uses of
ip_route_output_key, but the source address was always set to 0
instead of something more specific. It might be worth reviewing the
LVS code to make sure source addresses are set usefully and that
routes are recalculated where necessary.
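
To illustrate, here's a minimal sketch of what a source-preserving
lookup might look like against the 2.6 tree. Untested, the function
name is made up, and the flowi usage is modeled on what netfilter's
route_me_harder appears to do:

/* Sketch only: recompute the route using the packet's own source
 * address so that source-based ("ip rule ... from") policy routing
 * can take effect. Passing saddr = 0 lets the kernel pick any
 * source and skips those rules entirely.
 */
static inline int ip_vs_reroute_sketch(struct sk_buff *skb)
{
        struct iphdr *iph = skb->nh.iph;
        struct rtable *rt;
        struct flowi fl = {
                .oif = 0,
                .nl_u = {
                        .ip4_u = {
                                .daddr = iph->daddr,
                                .saddr = iph->saddr, /* the key part: not 0 */
                                .tos   = RT_TOS(iph->tos),
                        },
                },
        };

        if (ip_route_output_key(&rt, &fl) != 0)
                return -EINVAL; /* no route for this source */

        /* Drop the old route and attach the recomputed one. */
        dst_release(skb->dst);
        skb->dst = &rt->u.dst;
        return 0;
}

The only point here is the saddr field; the rest mirrors the rt_key
version in the 2.4 patch below.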

In any case, if anyone has a similar problem with VIPs spanning
multiple external IP spaces and gateways, this patch has been working
like a charm for me under significant production load. So far.
*knock*on*wood* I'll update if it crashes and/or burns.

Cheers,
--
Ken.
krb@xxxxxxxxxxx
On Mar 15, 2006, at 2:52 AM, Ken Brownfield wrote:
> On Mar 14, 2006, at 7:00 PM, Ken Brownfield wrote:
>> On Mar 14, 2006, at 2:49 PM, Joseph Mack NA3T wrote:
>>> Julian's nfct code is not used much so we don't hear a lot about
>>> it. It came after the -SH scheduler. Maybe the -SH scheduler
>>> shouldn't be needed if the netfilter problems really have been
>>> cleaned up.
>> Yes, the route_me_harder() function in the nfct code seems
>> promising. I fear I'm going to have to grab the source and track
>> down the routing behavior specifically.
> Scanning the nfct patch and looking at the icmp handling, I'm
> pretty sure the problem is that ip_vs_out() is sending out the
> packet with a route calculated from the real server's IP. Since
> ip_vs_out() is reputedly only called for masq return traffic, I
> think this is just plain incorrect behavior.
>
> I pulled out the route_me_harder() mod and created the attached
> patch. My only concern would be performance, but it seems
> netfilter's NAT uses this.
>
> I'll try to set up a tiny test environment tomorrow. Assuming I'm
> not all wet!
> --
> Ken.
--- ip_vs_core.c	2006/03/15 08:10:01	1.1
+++ ip_vs_core.c	2006/03/15 08:41:48
@@ -625,6 +625,42 @@
         return NF_ACCEPT;
 }
 
+/* This code stolen from ip_nat_standalone.c, as is the
+ * following comment:
+ *
+ * FIXME: change in oif may mean change in hh_len. Check and realloc
+ * --RR
+ */
+static inline int
+ip_vs_route_me_harder(struct sk_buff *skb)
+{
+        struct iphdr *iph = skb->nh.iph;
+        struct rtable *rt;
+        struct rt_key key = { dst:iph->daddr,
+                              src:iph->saddr,
+                              oif:skb->sk ? skb->sk->bound_dev_if : 0,
+                              tos:RT_TOS(iph->tos)|RTO_CONN,
+#ifdef CONFIG_IP_ROUTE_FWMARK
+                              fwmark:skb->nfmark,
+#endif
+                            };
+
+        /* Note that ip_route_output_key() makes routing
+         * decisions assuming that the packet has originated
+         * from this machine itself. This is the correct
+         * behaviour for outgoing VS/NAT traffic.
+         */
+        if (ip_route_output_key(&rt, &key) != 0) {
+                printk("ip_vs_route_me_harder(): No more route.\n");
+                return -EINVAL;
+        }
+
+        /* Drop old route. */
+        dst_release(skb->dst);
+        skb->dst = &rt->u.dst;
+
+        return 0;
+}
 
 /*
  * It is hooked at the NF_IP_FORWARD chain, used only for VS/NAT.
@@ -643,6 +679,7 @@
         struct ip_vs_conn *cp;
         int size;
         int ihl;
+        int retval;
 
         EnterFunction(11);
@@ -812,8 +849,20 @@
         skb->nfcache |= NFC_IPVS_PROPERTY;
 
+        /* For policy routing, packets originating from this
+         * machine itself may be routed differently to packets
+         * passing through. We want this packet to be routed as
+         * if it came from this machine itself. So re-compute
+         * the routing information.
+         */
+        if (ip_vs_route_me_harder(skb) == 0)
+                retval = NF_ACCEPT;
+        else
+                /* No route available; what can we do? */
+                retval = NF_DROP;
+
         LeaveFunction(11);
-        return NF_ACCEPT;
+        return retval;
 }