Re: [PATCH ipvs 0/7] Support v6 real servers in v4 pools and vice versa

To: Alex Gartrell <agartrell@xxxxxx>
Subject: Re: [PATCH ipvs 0/7] Support v6 real servers in v4 pools and vice versa
Cc: horms@xxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, kernel-team@xxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Thu, 31 Jul 2014 00:32:58 +0300 (EEST)

On Tue, 29 Jul 2014, Alex Gartrell wrote:

> At Facebook we use ipip forwarding to deliver packets from our layer 4 ipvs
> load balancers to our layer 7 proxies.  Today these layer 7 proxies are all
> dual stacked, so we can simply send v4 over v4 and v6 over v6.  In the
> future, we're going to have v6-only layer 7 load balancers (no internal v4
> address).  To deal with this, we'll forward v4 packets in v6 tunnels.  This
> patchset introduces the necessary functionality into ipvs.
> The noteworthy limitation of this is that it is not compatible with state
> synchronization, so great care is taken to keep these things mutually
> exclusive.
> This patchset includes changes that add an additional netlink attribute to
> destinations and changes that plumb the destination address family through
> parts of the code where it was assumed that it was the same as the service.
> Finally, there's a change that updates the transmit functions for tunneling
> to share common code and support v4 in v6 and vice versa.
> Alex Gartrell (7):
>   ipvs: Add destination address family to netlink interface
>   ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
>   ipvs: Pass destination address family to ip_vs_trash_get_dest
>   ipvs: Supply destination address family to ip_vs_conn_new
>   ipvs: maintain a mixed_address_family_dest count
>   ipvs: prevent mixing heterogeneous pools and synchronization
>   ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding

        This feature looks good to me.

        Such change should go first in net-next tree, after the change
for skb->encapsulation (net tree). So, probably, after some -rc
we can add this patchset.

        So, to summarize our plan:

- we have sockopt and netlink interface

- IPv6 is supported only on netlink interface

- the sockopt structs can not be changed, eg.
struct ip_vs_dest_user in include/uapi/linux/ip_vs.h.

- ipvsadm should allow different family (svc and dest) only for TUN method

- we can also encode somehow the tunnel mode in sync message, eg.
around the STYPE_F_INET6 value. May be this octet can become
tunnel/header type. This can be done later in another patchset.

        Comments for your patchset:

Patch 1:
        - please fix the dest->af = in __ip_vs_update_dest(),
        so that this patch can compile and as result this fix will
        disappear from patch 2:
        -       dest->af =;
        +       dest->af = udest->af;

        - __ip_vs_get_dest_entries() needs to skip dests with
        dest->af != svc->af, we can not exports such dests to
        sockopt users

        - too long comments, use scripts/ --strict /tmp/p1.diff

        - everything else looks ok

Patch 2:
        - avoid error from patch 1 in __ip_vs_update_dest

        - errors from scripts/ --strict /tmp/p2.diff

Patch 3:
        - in the "Destination %u/%s:%u still in trash, " message
        we should use dest->af, not dest_af.

Patch 4:
        - we have to introduce u16 cp->daf, just below cp->protocol

        - as result, cp->af is for caddr and vaddr, cp->daf is for cp->daddr,
        this is needed because sync protocol can create connections
        with cp->dest = NULL, cp binds to dest later

        - ip_vs_ftp.c uses ip_vs_conn_new, now it does not compile

        - Another file net/netfilter/xt_ipvs.c depends on IPVS
        definitions but I think this patchset does not break it

Patch 5:
        - note that dest address can not be changed, so if you see
        reason this patch to exist the check in __ip_vs_update_dest
        should be: if (add && udest->af != svc->af)

        - errors from scripts/ --strict /tmp/p5.diff

Patch 6:
        - comment and code do not match

Patch 7:
        - patch looks based on skb_checksum_help patch, the plan
        is to base it after the offload change we are about to
        send soon

        - ip_vs_bind_xmit*: sync protocol does not guarantee that
        cp->dest exists in all cases, we can not use cp->dest->af
        because cp->dest can be NULL, we have to use cp->daf (from
        patch 4)

        - for ip_vs_prepare_tunneled_skb():
                - usage of 'version' can be changed to 'af'

        - __ip_vs_get_out_rt_v6 and __ip_vs_get_out_rt should
        be changed to account header from both families for MTU check,
        these checks happen for IP_VS_RT_MODE_TUNNEL

        - errors from scripts/ --strict /tmp/p7.diff

- looking at cp, set_tcp_state() can not know what family is used for
        cp->daddr, we have to use cp->daf. We have to check the
        cp->af usage everywhere.

- I didn't fixed ip_vs_core.c logs yet

        I'm attaching some patches for your patchset. If it is
difficult for you to maintain them in the patchset, we can post
them later after your patchset, as separate patchset. For now you
can use them as reference. Or may be they should be inserted
at some point in your patchset, so that your patchset looks
correct if applied up to some patch.

        Let me know if you plan ipvsadm changes based on the
ipvsadm tree:


        I can do it, if needed.


Julian Anastasov <ja@xxxxxx>

Attachment: af.tgz
Description: AF patches

<Prev in Thread] Current Thread [Next in Thread>