Re: [PATCH ipvs 1/2] ipvs: Do tcp/udp checksumming prior to tunnel xmit

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH ipvs 1/2] ipvs: Do tcp/udp checksumming prior to tunnel xmit
Cc: "horms@xxxxxxxxxxxx" <horms@xxxxxxxxxxxx>, "lvs-devel@xxxxxxxxxxxxxxx" <lvs-devel@xxxxxxxxxxxxxxx>, kernel-team <Kernel-team@xxxxxx>
From: Alex Gartrell <agartrell@xxxxxx>
Date: Fri, 25 Jul 2014 07:40:50 +0000

On 7/24/14, 9:30 PM, "Julian Anastasov" <ja@xxxxxx> wrote:

>       Hello,
>On Thu, 24 Jul 2014, Alex Gartrell wrote:
>> So I've found a patch that addresses the problem in what I suspect is
>> right way.
>> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c
>> index 6f70bdd..ea7ef5e 100644
>> --- a/net/netfilter/ipvs/ip_vs_xmit.c
>> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
>> @@ -486,6 +486,7 @@ static inline int ip_vs_tunnel_xmit_prepare(struct
>>sk_buff *skb,
>>         if (ret == NF_ACCEPT) {
>>                 nf_reset(skb);
>>                 skb_forward_csum(skb);
>> +               skb->encapsulation = 1;
>>         }
>>         return ret;
>>  }
>> Empirically, this solves my problem :) I believe it solves the problem
>> generally as well.
>> Digging deeper into the v6 case (FB <3's v6)
>> ip_vs_tunnel_xmit_v6
>> -> ip6_local_out
>> -> __ip6_local_out
>> -> dst_output
>> -> dst_output_sk
>> -> skb_dst(skb)->output (= ip6_output)
>> -> ip6_finish_output
>> -> ip6->finish_output2
>> -> neigh_direct_output
>> -> dev_queue_xmit
>> -> __dev_queue_xmit
>> -> dev_hard_start
>> And then we run into this:
>>         /* If encapsulation offload request, verify we are testing
>>          * hardware encapsulation features instead of standard
>>          * features for the netdev
>>          */
>>         if (skb->encapsulation)
>>                 features &= dev->hw_enc_features;
>> This essentially strips out NETIF_F_ALL_CSUM, so we end up invoking
>> skb_checksum_help below.
>       For the case with missing NETIF_F_ALL_CSUM bits
>skb_checksum_help() does not care for skb->encapsulation,
>it works only by checking skb->csum_* fields. The
>skb_set_inner_transport_header() and skb_set_transport_header()
>calls simply update the header pointers for the drivers
>that have NETIF_F_ALL_CSUM bits set.
>       What is your driver? I have to check why do you
>have problem when NETIF_F_ALL_CSUM is not set. Is it
>missing the NETIF_F_IP_CSUM (v4) and NETIF_F_IPV6_CSUM (v6) bits?

The driver in question is mlx4_en version 2.2-1

In this case, we have MLX4_TUNNEL_OFFLOAD_MODE turned off, so
hw_enc_features is set to 0.

>       Also, can you clarify again which test has
>the problem with broken csums? All 3 tests? Or it
>depends on enabled HW csums?

We run into problems with packets captured from NF_INET_LOCAL_OUT when
we have hardware checksumming enabled.  Disabling hardware checksumming
fixes the problem by removing the bits from the features and makes it
act roughly the same as just setting the encapsulation bit

>From my reading, the pseudo code for it is basically as follows

Features = device_features # which has a bunch of stuff, including full

If skb->encapsulated:
    Features = device_encapsulated_features # which is zero

If needs_gso(skb, features) # requires that gso params are set and that
the features have it
    # gso segment stuff
    If skb->ip_summed == CHECKSUM_PARTIAL: # This is us
         If skb->encapsulation:
             set_transport_header(skb, skb_checksum_start_offset(skb))
         If !(features & NETIF_F_ALL_CSUM): # Basically equal to
skb->encapsulation for us
             If skb_checksum_help(skb) # will do the software checksumming
we need
                    # do error stuff

>>         /* If packet is not checksummed and device does not
>>          * support checksumming for this protocol, complete
>>          * checksumming here.
>>          */
>>         if (skb->ip_summed == CHECKSUM_PARTIAL) {
>>                 if (skb->encapsulation)
>>                         skb_set_inner_transport_header(skb,
>>                                 skb_checksum_start_offset(skb));
>>                 else
>>                         skb_set_transport_header(skb,
>>                                 skb_checksum_start_offset(skb));
>>                 if (!(features & NETIF_F_ALL_CSUM) &&
>>                      skb_checksum_help(skb))
>>                         goto out_kfree_skb;
>>         }
>> Does this seem like a reasonable approach to you, Julian?
>       Exactly, I was referring to this place. For IPv4
>we do the same as your change in ip_vs_tunnel_xmit_prepare,
>only that it is via iptunnel_handle_offloads() which in
>addition to setting skb->encapsulation adds SKB_GSO_IPIP bit
>for GSO purposes but skb_checksum_help() is called again
>in dev_hard_start_xmit().
>       Don't forget that the devices with NETIF_F_ALL_CSUM set
>need skb_reset_inner_headers() call, they use the
>skb->inner_* fields. What I have done in the patch is
>just to copy the handling from ip*_tunnel*.c. When
>NETIF_F_ALL_CSUM is not set we call skb_checksum_help() and
>then the device driver detects CHECKSUM_NONE, not

Taking a second look, I think that the problem with my approach is
that it won¹t take advantage of gso offload for ipip encapsulation.
Strictly speaking, I don¹t think we need it (the volume of ipvs
Traffic originating from the host itself is almost always going to
be negligible).  That said, taking advantage is clearly the right
way to do it.

With that in mind, my only problem with your patch is that ipv4 is
inconsistent with ipv6.  Can we do something like this?  (Apologies,
I¹m stuck with outlook at home and it is ruining the formatting)  The
specific change is that we use the same iptunnel_handle_offloads
Invocation in ipv4 and ipv6.  It¹s disconcerting to me that it¹s unclear
if we can do IP6IP6 with gso offload ‹ any thoughts there?

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c
index 6f70bdd..b62a23e 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -37,6 +37,7 @@
 #include <net/icmp.h>                   /* for icmp_send */
 #include <net/route.h>                  /* for ip_route_output */
 #include <net/ipv6.h>
+#include <net/ip_tunnels.h>
 #include <net/ip6_route.h>
 #include <net/addrconf.h>
 #include <linux/icmpv6.h>
@@ -862,6 +863,10 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct
ip_vs_conn *cp,
                old_iph = ip_hdr(skb);
+       skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP);
+       if (IS_ERR(skb))
+               goto tx_error;
        skb->transport_header = skb->network_header;
        /* fix old IP header checksum */
@@ -900,7 +905,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct
ip_vs_conn *cp,
        return NF_STOLEN;
-       kfree_skb(skb);
+       if (!IS_ERR(skb))
+               kfree_skb(skb);
        return NF_STOLEN;
@@ -953,6 +959,10 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct
ip_vs_conn *cp,
                old_iph = ipv6_hdr(skb);
+       skb = iptunnel_handle_offloads(skb, false, 0);
+       if (IS_ERR(skb))
+               goto tx_error;
        skb->transport_header = skb->network_header;
        skb_push(skb, sizeof(struct ipv6hdr));
@@ -988,7 +998,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct
ip_vs_conn *cp,
        return NF_STOLEN;
-       kfree_skb(skb);
+       if (!IS_ERR(skb))
+               kfree_skb(skb);
        return NF_STOLEN;

To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

<Prev in Thread] Current Thread [Next in Thread>