Hello,
On Wed, 15 Apr 2026, Yingnan Zhang wrote:
> Currently, IPVS skips MTU checks for GSO packets by excluding them with
> the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
> mode encapsulates GSO packets with IPIP headers.
>
> The issue manifests in two ways:
>
> 1. MTU violation after encapsulation:
>    When a GSO packet passes through IPVS tunnel mode, the original MTU
>    check is bypassed. After adding the IPIP tunnel header, the packet
>    size may exceed the outgoing interface MTU, leading to unexpected
>    fragmentation at the IP layer.
>
> 2. Fragmentation with problematic IP IDs:
>    When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
>    is fragmented after encapsulation, each segment gets a sequentially
>    incremented IP ID (0, 1, 2, ...). This happens because:
>
>    a) The GSO packet bypasses MTU check and gets encapsulated
>    b) At __ip_finish_output, the oversized GSO packet is split into
>       separate SKBs (one per segment), with IP IDs incrementing
>    c) Each SKB is then fragmented again based on the actual MTU
>
>    This sequential IP ID allocation differs from the expected behavior
>    and can cause issues with fragment reassembly and packet tracking.
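
To put rough numbers on the two cases above, here is a small userspace
sketch (not kernel code). It assumes a 1500-byte outgoing MTU, inner
segments that are already 1500 bytes at the network layer, and a 20-byte
outer IPv4 header without options for IPIP. The helper names encap_len()
and frag_count() are made up for illustration only.

```c
#include <stdio.h>

#define IPV4_HDR_LEN 20u /* assumed: outer IPv4 header without options */

/* Network-layer length of one segment after IPIP encapsulation. */
static unsigned int encap_len(unsigned int inner_len)
{
	return inner_len + IPV4_HDR_LEN;
}

/* Number of IPv4 fragments needed to send a packet of pkt_len bytes
 * (header included) over a link with the given MTU. Every fragment
 * except the last carries a payload that is a multiple of 8 bytes.
 */
static unsigned int frag_count(unsigned int pkt_len, unsigned int mtu)
{
	unsigned int payload, per_frag;

	if (pkt_len <= mtu)
		return 1;

	payload = pkt_len - IPV4_HDR_LEN;
	per_frag = (mtu - IPV4_HDR_LEN) & ~7u;

	return (payload + per_frag - 1) / per_frag;
}
```

Under these assumptions encap_len(1500) is 1520, and frag_count(1520, 1500)
is 2, so every segment of the GSO packet would leave the box as two IP
fragments once the MTU check has been skipped.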
>
> Fix this by properly validating GSO packets using
> skb_gso_validate_network_len(). This function correctly validates
> whether the GSO segments will fit within the MTU after segmentation. If
> validation fails, send an ICMP Fragmentation Needed message to enable
> proper PMTU discovery.
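
For readers without the kernel tree at hand, the decision described above
can be modelled in userspace roughly as below. This is only a sketch:
struct pkt_model and the model_*() helpers are hypothetical stand-ins, and
model_gso_fits() merely approximates skb_gso_validate_network_len() as
"per-segment headers plus gso_size fit in the MTU". It also assumes the
caller passes an MTU that already has the tunnel overhead subtracted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few sk_buff fields the check looks at. */
struct pkt_model {
	unsigned int len;      /* total network-layer length */
	unsigned int gso_size; /* payload bytes per segment, 0 if not GSO */
	unsigned int hdr_len;  /* network + transport header bytes per segment */
};

static bool model_is_gso(const struct pkt_model *p)
{
	return p->gso_size != 0;
}

/* Approximation of skb_gso_validate_network_len(): true if each
 * segment, headers included, fits within mtu.
 */
static bool model_gso_fits(const struct pkt_model *p, unsigned int mtu)
{
	return p->hdr_len + p->gso_size <= mtu;
}

/* Same three-step shape as the ip_vs_exceeds_mtu() helper in the patch. */
static bool model_exceeds_mtu(const struct pkt_model *p, unsigned int mtu)
{
	if (p->len <= mtu)
		return false;

	if (model_is_gso(p) && model_gso_fits(p, mtu))
		return false;

	return true;
}
```

With a GSO packet of 1448-byte segments and 52 bytes of headers, the model
reports no violation for an MTU of 1500 but does report one for 1480,
which is the case where the ICMP Fragmentation Needed message would be
sent.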
>
> Fixes: 4cdd34084d53 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
> Signed-off-by: Yingnan Zhang <342144303@xxxxxx>
Looks good to me for the nf tree, thanks!
Acked-by: Julian Anastasov <ja@xxxxxx>
> ---
> v4:
> - Introduce a new helper function ip_vs_exceeds_mtu() to improve readability
> (reviewer feedback)
>
> v3:
> https://lore.kernel.org/netdev/tencent_73010FBD5FA1C05C3BC23A07A50B11CEC90A@xxxxxx/
> v2:
> https://lore.kernel.org/netdev/tencent_CA2C1C219C99D315086BE55E8654AF7E6009@xxxxxx/
> v1:
> https://lore.kernel.org/netdev/tencent_4A3E1C339C75D359093BE4F08648AFAA6009@xxxxxx/
> ---
> net/netfilter/ipvs/ip_vs_xmit.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index 0fb5162992e5..64dfdf8b00c4 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -102,6 +102,18 @@ __ip_vs_dst_check(struct ip_vs_dest *dest)
>  	return dest_dst;
>  }
>
> +/* Based on ip_exceeds_mtu(). */
> +static bool ip_vs_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
> +{
> +	if (skb->len <= mtu)
> +		return false;
> +
> +	if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
> +		return false;
> +
> +	return true;
> +}
> +
>  static inline bool
>  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  {
> @@ -112,7 +124,7 @@ __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  		if (IP6CB(skb)->frag_max_size > mtu)
>  			return true; /* largest fragment violate MTU */
>  	}
> -	else if (skb->len > mtu && !skb_is_gso(skb)) {
> +	else if (ip_vs_exceeds_mtu(skb, mtu)) {
>  		return true; /* Packet size violate MTU size */
>  	}
>  	return false;
> @@ -232,7 +244,7 @@ static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
>  			return true;
>
>  		if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) &&
> -			     skb->len > mtu && !skb_is_gso(skb) &&
> +			     ip_vs_exceeds_mtu(skb, mtu) &&
>  			     !ip_vs_iph_icmp(ipvsh))) {
>  			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>  				  htonl(mtu));
> --
> 2.51.0.windows.1
Regards
--
Julian Anastasov <ja@xxxxxx>