[PATCH net-next,v4 00/14] ipvs: Add icmp scheduling

To: <horms@xxxxxxxxxxxx>, <ja@xxxxxx>, <lvs-devel@xxxxxxxxxxxxxxx>
Subject: [PATCH net-next,v4 00/14] ipvs: Add icmp scheduling
Cc: <alexgartrell@xxxxxxxxx>, <kernel-team@xxxxxx>, Alex Gartrell <agartrell@xxxxxx>
From: Alex Gartrell <agartrell@xxxxxx>
Date: Wed, 26 Aug 2015 10:47:52 -0700

The configuration of ipvs at Facebook is relatively straightforward.  All
ipvs instances advertise a set of VIPs over BGP, and the network prefers the
nearest one or uses ECMP in the event of a tie.  For the uninitiated, ECMP
deterministically and statelessly load balances by hashing the packet
(usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using
that number as an index (basic hash table type logic).
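
As a rough illustration of the hashing described above, here is a minimal
sketch in Python.  The function name ecmp_index, the use of SHA-256, and the
example addresses are assumptions made for illustration only; real routers
use their own (often vendor-specific) hash functions.

```python
import hashlib
import socket
import struct


def ecmp_index(proto, saddr, daddr, sport, dport, n_paths):
    """Map a 5-tuple to one of n_paths next hops.

    Illustrative only: SHA-256 stands in for whatever hash a router uses.
    """
    # Pack the 5-tuple into a fixed byte layout (network byte order).
    key = struct.pack(
        "!B4s4sHH",
        proto,
        socket.inet_aton(saddr),
        socket.inet_aton(daddr),
        sport,
        dport,
    )
    digest = hashlib.sha256(key).digest()
    # Use the first 32 bits of the digest as the hash value, then index.
    return int.from_bytes(digest[:4], "big") % n_paths


# Every packet of one TCP flow lands on the same path, with no state kept.
path = ecmp_index(6, "198.51.100.7", "203.0.113.10", 40000, 443, n_paths=4)
```

Because the result depends only on the packet fields, any router computing
the same hash makes the same choice without sharing state.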

The problem is that ICMP packets (which contain really important
information like whether or not an MTU has been exceeded) will get a
different hash value and may end up at a different ipvs instance.  With no
information about where to route these packets, they are dropped, creating
ICMP black holes and breaking Path MTU discovery.  Suddenly, my mom's
pictures can't load and I'm fielding midday calls that I want nothing to do
with.
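
To make the mismatch concrete, the sketch below compares the tuple that ECMP
sees for an ICMP error with the tuple of the flow it refers to.  It is not
the kernel code from this series; FlowKey, outer_key, inverse_key, and all
addresses are hypothetical names and values chosen for illustration.  The
point is that the header quoted inside the ICMP error, with source and
destination swapped, identifies the original client-to-VIP flow.

```python
from typing import NamedTuple

IPPROTO_ICMP = 1
IPPROTO_TCP = 6


class FlowKey(NamedTuple):
    proto: int
    saddr: str
    daddr: str
    sport: int
    dport: int


def outer_key(router: str, vip: str) -> FlowKey:
    """Tuple a 5-tuple hash sees for the ICMP error itself (ICMP has no ports)."""
    return FlowKey(IPPROTO_ICMP, router, vip, 0, 0)


def inverse_key(embedded: FlowKey) -> FlowKey:
    """Swap source and destination of the header quoted in the ICMP error."""
    return FlowKey(
        embedded.proto,
        embedded.daddr,
        embedded.saddr,
        embedded.dport,
        embedded.sport,
    )


# The client's TCP flow to the VIP, as hashed by ECMP.
client_flow = FlowKey(IPPROTO_TCP, "198.51.100.7", "203.0.113.10", 40000, 443)
# Header of the oversized reply (VIP -> client) quoted inside the ICMP error.
embedded = FlowKey(IPPROTO_TCP, "203.0.113.10", "198.51.100.7", 443, 40000)
# The ICMP error, sent by an intermediate router to the VIP.
icmp_outer = outer_key("192.0.2.1", "203.0.113.10")

print(icmp_outer == client_flow)             # False
print(inverse_key(embedded) == client_flow)  # True
```

The outer ICMP tuple differs from the flow tuple, so ECMP may deliver the
error to an instance with no matching connection, while the inverted
embedded header still carries enough information to pick a destination.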

To address this, this patch set introduces the ability to schedule ICMP
packets, gated by the sysctl net.ipv4.vs.schedule_icmp.  If set to 0,
the old behavior is maintained -- otherwise ICMP packets are scheduled.
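
For reference, a sysctl name maps to a file under /proc/sys with the dots
replaced by slashes.  The helpers below are a small sketch (sysctl_path and
read_sysctl are hypothetical names) for checking the setting; the file is
only present on a kernel that carries this series with ipvs loaded.

```python
import os


def sysctl_path(name: str) -> str:
    """Translate a dotted sysctl name into its /proc/sys path."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name: str):
    """Return the current value as a string, or None if the sysctl is absent."""
    path = sysctl_path(name)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


# None on kernels without this patch set; otherwise the configured value.
value = read_sysctl("net.ipv4.vs.schedule_icmp")
```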

  v2: Added ip_vs_sh change, IP_VS_DBG_PKT macro changes,
      reordered ip_vs_try_to_schedule, and other ja fixes.
  v3: Added ip_vs_leave change, ip_vs_sched_persist handling,
      and `offset = ciph.len` change.  Dropped unnecessary !cp check
  v4: Return NF_DROP from ip_vs_leave on icmp_case if not ftp special case.
      Fix LOG invocation with iph->off argument in ip_vs_try_to_schedule

Alex Gartrell (14):
  ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off
  ipvs: Add hdr_flags to iphdr
  ipvs: Handle inverse and icmp headers in ip_vs_leave
  ipvs: pull out ip_vs_try_to_schedule function
  ipvs: drop inverse argument to conn_{in,out}_get
  ipvs: Make ip_vs_schedule aware of inverse iph'es
  ipvs: add schedule_icmp sysctl
  ipvs: Use outer header in ip_vs_bypass_xmit_v6
  ipvs: sh: support scheduling icmp/inverse packets consistently
  ipvs: attempt to schedule icmp packets
  ipvs: ensure that ICMP cannot be sent in reply to ICMP
  ipvs: support scheduling inverse and icmp TCP packets
  ipvs: support scheduling inverse and icmp UDP packets
  ipvs: support scheduling inverse and icmp SCTP packets

 include/net/ip_vs.h                     | 109 ++++++++----
 net/netfilter/ipvs/ip_vs_conn.c         |  12 +-
 net/netfilter/ipvs/ip_vs_core.c         | 289 +++++++++++++++++++-------------
 net/netfilter/ipvs/ip_vs_ctl.c          |   8 +-
 net/netfilter/ipvs/ip_vs_pe_sip.c       |   2 +-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |  17 +-
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |  34 ++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |  38 ++++-
 net/netfilter/ipvs/ip_vs_proto_udp.c    |  25 ++-
 net/netfilter/ipvs/ip_vs_sh.c           |  45 +++--
 net/netfilter/ipvs/ip_vs_xmit.c         |  24 +--
 net/netfilter/xt_ipvs.c                 |   4 +-
 12 files changed, 390 insertions(+), 217 deletions(-)

Alex Gartrell <agartrell@xxxxxx>
