The configuration of ipvs at Facebook is relatively straightforward. All
ipvs instances advertise a set of VIPs over BGP, and the network prefers the
nearest one or uses ECMP in the event of a tie. For the uninitiated, ECMP
deterministically and statelessly load balances by hashing the packet
(usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using
the hash as an index into the set of equal-cost next hops (basic hash table
type logic).
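
To make the hashing step concrete, here is a minimal sketch of ECMP-style
next-hop selection. It is not taken from any router or from this patch set:
struct flow_key, flow_hash() and ecmp_pick() are hypothetical names, the key
is IPv4 only, and FNV-1a merely stands in for whatever hash a given device
actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the 5-tuple described above (IPv4 only). */
struct flow_key {
	uint8_t  protocol;
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/* One FNV-1a step; a stand-in for a real router's hash function. */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/* Feed a 32-bit value into the hash one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		h = fnv1a_byte(h, (uint8_t)(v >> (8 * i)));
	return h;
}

/* Hash every field of the 5-tuple; no per-flow state is kept. */
uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;

	h = fnv1a_byte(h, k->protocol);
	h = fnv1a_u32(h, k->saddr);
	h = fnv1a_u32(h, k->daddr);
	h = fnv1a_u32(h, ((uint32_t)k->sport << 16) | k->dport);
	return h;
}

/* Use the hash as an index into n_paths equal-cost next hops. */
unsigned int ecmp_pick(const struct flow_key *k, unsigned int n_paths)
{
	assert(n_paths > 0);
	return flow_hash(k) % n_paths;
}
```

Because the index depends only on the packet's own fields, every packet of
a given flow picks the same next hop, and a packet with different fields
may pick a different one.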

The problem is that ICMP packets (which contain really important
information like whether or not an MTU has been exceeded) will get a
different hash value and may end up at a different ipvs instance. That
instance has no information about where to route these packets, so it drops
them, creating ICMP black holes and breaking Path MTU discovery. Suddenly,
my mom's pictures can't load and I'm fielding midday calls that I want
nothing to do with.
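
The mismatch can be illustrated with a small sketch. It assumes the ICMP
error was triggered by a packet sent from the VIP back to the client, so
the header quoted inside the error is the inverse of the client-to-VIP
flow. struct l4_tuple, tuple_invert() and tuple_equal() are hypothetical
names for illustration, not functions from this patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-tuple; IPv4 addresses and ports in host byte order. */
struct l4_tuple {
	uint8_t  protocol;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/*
 * Swap the endpoints. The header quoted in an ICMP error describes the
 * offending packet, which travelled in the opposite direction to the
 * ICMP error itself.
 */
struct l4_tuple tuple_invert(struct l4_tuple t)
{
	struct l4_tuple r = {
		.protocol = t.protocol,
		.saddr = t.daddr,
		.daddr = t.saddr,
		.sport = t.dport,
		.dport = t.sport,
	};

	return r;
}

/* Field-by-field comparison of two tuples. */
bool tuple_equal(const struct l4_tuple *a, const struct l4_tuple *b)
{
	return a->protocol == b->protocol &&
	       a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}
```

For a TCP flow from client C to VIP V, the outer ICMP packet is
(ICMP, router, V) with no ports, which does not match (TCP, C, V, sport,
dport) and so may hash to another instance. Inverting the quoted header
(TCP, V, C, dport, sport) recovers the original flow's tuple.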

To address this, this patch set introduces the ability to schedule ICMP
packets, gated by the sysctl net.ipv4.vs.schedule_icmp. If it is set to 0,
the old behavior is maintained; otherwise ICMP packets are scheduled.

Alex Gartrell (12):
  ipvs: pull out ip_vs_try_to_schedule function
  ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off
  ipvs: Add hdr_flags to iphdr
  ipvs: drop inverse argument to conn_{in,out}_get
  ipvs: Make ip_vs_schedule aware of inverse iph'es
  ipvs: add schedule_icmp sysctl
  ipvs: Use outer header in ip_vs_bypass_xmit_v6
  ipvs: attempt to schedule icmp packets
  ipvs: ensure that ICMP cannot be sent in reply to ICMP
  ipvs: support scheduling inverse and icmp TCP packets
  ipvs: support scheduling inverse and icmp UDP packets
  ipvs: support scheduling inverse and icmp SCTP packets

 include/net/ip_vs.h                     | 101 ++++++++++++-----
 net/netfilter/ipvs/ip_vs_conn.c         |  12 +-
 net/netfilter/ipvs/ip_vs_core.c         | 190 +++++++++++++++++++-------------
 net/netfilter/ipvs/ip_vs_ctl.c          |   8 +-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |  17 ++-
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |  35 ++++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |  37 +++++--
 net/netfilter/ipvs/ip_vs_proto_udp.c    |  26 ++++-
 net/netfilter/ipvs/ip_vs_xmit.c         |   9 +-
 9 files changed, 287 insertions(+), 148 deletions(-)

--
Alex Gartrell <agartrell@xxxxxx>