Re: Forwarding method in backup server

To:	Simon Horman <horms@xxxxxxxxxxxx>
Subject:	Re: Forwarding method in backup server
Cc:	Julian Anastasov <ja@xxxxxx>, Wensong Zhang <wensong@xxxxxxxxxxxx>, "lvs-devel@xxxxxxxxxxxxxxx" <lvs-devel@xxxxxxxxxxxxxxx>
From:	Hans Schillstrom <hans.schillstrom@xxxxxxxxxxxx>
Date:	Mon, 4 Oct 2010 14:19:00 +0200

On Mon, 2010-10-04 at 11:44 +0200, Simon Horman wrote:
> On Mon, Oct 04, 2010 at 08:34:59AM +0200, Hans Schillstrom wrote:
> > Hi 
> > 
> > On Sat, 2010-10-02 at 10:30 +0200, Simon Horman wrote:
> > > On Wed, Sep 29, 2010 at 01:01:37AM +0300, Julian Anastasov wrote:
> > > 
> > > Hi Julian, Hi all,
> > > 
> > > > 
> > > >         Hello,
> > > > 
> > > >         From the recent discussion about loaded backup server
> > > > it looks like we do not properly assign forwarding method
> > > > to connections in backup server. If backup is used in master
> > > > as real server, eg. DR, then backup should use LOCALNODE
> > > > for its IP. May be ip_vs_find_dest should allow real server
> > > > with port 0 to be used as default server? And if real server
> > > > is found its forwarding method should be used for the
> > > > connection? So, backup should have the same IP and Port but
> > > > it can choose to use different forwarding method? For example,
> > > > master uses DR but backup TUN for the same real server.
> > > > 
> > > >         Because now when server is added its method can
> > > > be converted to LOCALNODE but when such connections
> > > > are created in backup server we should use DR or NAT
> > > > or whatever the method is configured there. The same is
> > > > when backup is added as DR server in master but the
> > > > connections should be LOCALNODE when created in backup.
> > > > 
> > > >         If we still allow DR/NAT/TUN connections in backup
> > > > to work without real server then all such xmitters should
> > > > check RTCF_LOCAL and assume LOCALNODE if needed. This is
> > > > needed for the case when we do not know the fwmark used
> > > > by connection and we can not find the virtual service.
> > > > 
> > > >         Then __ip_vs_update_dest should not replace the
> > > > configured forwarding method with IP_VS_CONN_F_LOCALNODE
> > > > to allow backup to see this method in fwmark connections.
> > > > If needed, we can remember that it is local in some
> > > > new dest flag, eg. IP_VS_DEST_F_LOCAL. But better to
> > > > show it as it was configured?
> > > > 
> > > >         So, how to fix these problems? May be:
> > > > 
> > > > - ip_vs_find_dest to find svc and dest in more complex way
> > > > 
> > > > - if backup has dest it should assign its forwarding method
> > > > to the connection (ip_vs_bind_dest)
> > > > 
> > > > - allow some transmitters to deliver traffic locally to support
> > > > fwmark setups, eg. when no dest is assigned to connection
> > > 
> > > This seems rather tricky to say the least.
> > > I prefer the 2nd version of struct ip_vs_sync_conn option...
> > > 
> > > >         There is also an option to create 2nd version
> > > > of struct ip_vs_sync_conn. For example, size in
> > > > struct ip_vs_sync_mesg can be moved after new field
> > > > version which will be in place of size. Old backups will
> > > > think the small version number as some short size and will
> > > > ignore the message. New backup servers can support both
> > > > formats. The new format can add new fields for fwmark,
> > > > IPv6 addresses, 1 byte af (AF_INET/AF_INET6), 1 byte len
> > > > for easy skipping of messages if af or protocol are not
> > > > supported.
> > 
> > From my narrow view of LVS:
> > If you use network namespaces there is no need for LOCALNODE, since the
> > entire LVS could be placed in its own netns....
> > (I know people will use what they always have been using.)
> 
> I'm not quite sure what you are getting at there.
> 
> LOCAL NODE is basically an optimisation in the transmit path for
> the case where the real-server is the local host. But I think
> that most of the problem with it relates to it being determined
> at the time that a real-server is added.
> 
> I'm unclear about how name spaces can help here,
> but I'm certainly very happy to learn.
> 

If LVS runs in its own network namespace on a real server,
there is no need for LOCALNODE. From the LVS point of view it is running
on "another machine" (i.e. another netns).

> > > It's funny that you should mention that. I need to extend the
> > > synchronisation protocol to allow the synchronisation of persistence
> > > engine data. And I came up with more or less the same scheme for
> > > extending the protocol without breaking old implementations - set the
> > > current size field to 0 (or any other value that doesn't match the
> > > packet length), add a new size field and a version field.
> > 
> > Why not change the port?
> 
> I considered that too. But I think changing the protocol is easy enough.
> And in any case new kernels will need to understand both the new and
> old ways of doing things.
> 
> > > Let's spend a bit of time thinking out a v2 of the protocol that solves the
> > > outstanding problems that we have.
> > > 
> > > * No version field
> > > * Only 16 bits of flags
> > > * No space for IPv6 addresses
> > > * No space for fwmarks
> > > (* No space for persistence engine data)
> > > 
> > 
> > I have started to implement IPv6 backup using IPv6 multicast.
> > My idea was to keep IPv4 and IPv6 separated, i.e. send IPv4 over its
> > own socket and IPv6 over another, just to keep IPv4 untouched.
> > If there is a need for changes I vote for "keep them together".
> > 
> > I think a version 2 would be nice, with IPv6 as a part of it.
> > 
> > New fields needed:
> > * Version (must be there)
> > * Next field (offset to the next field: IPv4, fwmark, IPv6)
> > * Flags/type field
> > 
> > 
> > Divide the messages into the required number of fields, e.g.
> > IPv4
> > fwmark
> > IPv6
> 
> Perhaps we just need an addrlen field somewhere.
> Or if we wanted to save space, an addr type field.
> 
> If you have some firm ideas perhaps you could send
> them here, perhaps in the form of a C structure or a diagram?
> 

These are the structures that I am working with right now
(have a look at them and treat them as a basis for discussion).

Only the IP addresses in the connection entry are changed, i.e. to IPv6:

struct ip_vs_sync_conn_v6 {
        __u8                    reserved;

        /* Protocol, addresses and port numbers */
        __u8                    protocol;       /* Which protocol (TCP/UDP) */
        __be16                  cport;
        __be16                  vport;
        __be16                  dport;
        struct in6_addr         caddr;          /* client address */
        struct in6_addr         vaddr;          /* virtual address */
        struct in6_addr         daddr;          /* destination address */

        /* Flags and state transition */
        __be16                  flags;          /* status flags */
        __be16                  state;          /* state info */

        /* The sequence options start here */
};
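
To make the layout above concrete, here is a minimal userspace sketch (an
illustration only, not the kernel code). It mirrors the draft struct with
<stdint.h> types and fills one entry. The helper name sync_conn_v6_fill()
and its argument list are hypothetical; only the field names and their order
come from the draft. Ports, flags and state are stored in network byte
order, as the __be16 types imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Userspace mirror of the draft struct; uint8_t/uint16_t stand in for
 * the kernel's __u8/__be16. */
struct ip_vs_sync_conn_v6 {
        uint8_t         reserved;
        uint8_t         protocol;       /* IPPROTO_TCP, IPPROTO_UDP, ... */
        uint16_t        cport;          /* network byte order */
        uint16_t        vport;          /* network byte order */
        uint16_t        dport;          /* network byte order */
        struct in6_addr caddr;          /* client address */
        struct in6_addr vaddr;          /* virtual address */
        struct in6_addr daddr;          /* destination address */
        uint16_t        flags;          /* network byte order */
        uint16_t        state;          /* network byte order */
};

/* The fixed part is expected to pack without padding:
 * 2 + 3*2 + 3*16 + 2*2 = 60 bytes. */
static_assert(sizeof(struct ip_vs_sync_conn_v6) == 60, "unexpected padding");

/* Hypothetical helper: fill one entry from textual addresses and
 * host-order values. Returns 0 on success, -1 if an address fails to parse. */
static int sync_conn_v6_fill(struct ip_vs_sync_conn_v6 *e, uint8_t protocol,
                             const char *caddr, uint16_t cport,
                             const char *vaddr, uint16_t vport,
                             const char *daddr, uint16_t dport,
                             uint16_t flags, uint16_t state)
{
        memset(e, 0, sizeof(*e));
        if (inet_pton(AF_INET6, caddr, &e->caddr) != 1 ||
            inet_pton(AF_INET6, vaddr, &e->vaddr) != 1 ||
            inet_pton(AF_INET6, daddr, &e->daddr) != 1)
                return -1;

        e->protocol = protocol;
        e->cport = htons(cport);
        e->vport = htons(vport);
        e->dport = htons(dport);
        e->flags = htons(flags);
        e->state = htons(state);
        return 0;
}
```

With this layout the three addresses start at offset 8, so they stay 4-byte
aligned when the entry itself starts on a 4-byte boundary.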

struct ipvs_synchdr {
        __u8            version;
        __u8            type;
        __u8            nexthdr;
        __u8            size;
};

 New
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version      |    type       |  next hdr     |   size        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |            Payload data  ex IPv4 Connections                  |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version      |    type       |  next hdr     |   size        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |            Payload data  ex IPv6 Connections                  |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
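
A receiver for the chained layout in the "New" diagram could walk the blocks
roughly as sketched below. The draft does not define the units of "size" or
the values of "type" and "next hdr", so everything in the enum is an
assumption made for this sketch: version is taken to be 2, size is taken to
be the payload length in 32-bit words (an 8-bit byte count could not cover
several 60-byte IPv6 entries), and a next hdr of 0 is taken to mean "last
block". The names sync_walk() and sync_count_bytes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the draft header. */
struct ipvs_synchdr {
        uint8_t version;
        uint8_t type;
        uint8_t nexthdr;
        uint8_t size;
};

/* Assumed values, not taken from the draft. */
enum {
        SYNC_V2        = 2,     /* assumed protocol version */
        SYNC_T_NONE    = 0,     /* assumed "no further block" in nexthdr */
        SYNC_T_CONN_V4 = 1,
        SYNC_T_CONN_V6 = 2,
        SYNC_T_FWMARK  = 3,
};

typedef void (*sync_block_fn)(uint8_t type, const uint8_t *payload,
                              size_t len, void *arg);

/* Walk all blocks in buf, calling fn (if non-NULL) for each payload.
 * Returns the number of blocks visited, or -1 if the message is
 * truncated or carries an unexpected version. */
static int sync_walk(const uint8_t *buf, size_t len,
                     sync_block_fn fn, void *arg)
{
        size_t off = 0;
        int n = 0;

        for (;;) {
                struct ipvs_synchdr h;
                size_t plen;

                if (len - off < sizeof(h))
                        return -1;              /* header does not fit */
                memcpy(&h, buf + off, sizeof(h));
                if (h.version != SYNC_V2)
                        return -1;

                plen = (size_t)h.size * 4;      /* assumed: 32-bit words */
                if (len - off - sizeof(h) < plen)
                        return -1;              /* payload does not fit */

                if (fn)
                        fn(h.type, buf + off + sizeof(h), plen, arg);
                n++;
                off += sizeof(h) + plen;

                if (h.nexthdr == SYNC_T_NONE)
                        return n;               /* last block reached */
        }
}

/* Example callback: add up the payload bytes of all blocks. */
static void sync_count_bytes(uint8_t type, const uint8_t *payload,
                             size_t len, void *arg)
{
        (void)type;
        (void)payload;
        *(size_t *)arg += len;
}
```

Because every block carries its own length, a backup that does not
understand a given type can skip it without parsing the payload, which is
the "easy skipping" property asked for earlier in the thread.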

Old
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Count Conns  |    SyncID     |            Size               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (1)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            .                                  |
      |                            .                                  |
      |                            .                                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (n)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
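
For the backward-compatibility trick discussed above (a size value that does
not match the packet length makes old backups ignore the message), a new
receiver could tell the two formats apart along these lines. This is only a
sketch under stated assumptions: the old size field is assumed to be in
network byte order and to cover the whole message, and a size of 0 is used
as the "newer format" marker, which is one of the variants mentioned in the
thread. The names sync_mesg_v1, sync_kind and sync_classify() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Old header as drawn in the "Old" diagram. */
struct sync_mesg_v1 {
        uint8_t  nr_conns;      /* Count Conns */
        uint8_t  syncid;        /* SyncID */
        uint16_t size;          /* assumed: network byte order, total length */
};

enum sync_kind {
        SYNC_KIND_BAD,          /* neither format, drop it */
        SYNC_KIND_V1,           /* old format, size matches the datagram */
        SYNC_KIND_NEWER,        /* size == 0: assumed marker for a newer format */
};

/* Classify a received datagram of len bytes. */
static enum sync_kind sync_classify(const uint8_t *buf, size_t len)
{
        struct sync_mesg_v1 h;

        if (len < sizeof(h))
                return SYNC_KIND_BAD;
        memcpy(&h, buf, sizeof(h));

        if (ntohs(h.size) == len)
                return SYNC_KIND_V1;
        if (h.size == 0)
                return SYNC_KIND_NEWER;
        return SYNC_KIND_BAD;
}
```

An old backup only has the first check, so it would treat a size of 0 as a
bogus length and drop the message, while a new backup can go on to parse
the new header.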


> > > Individually those problems don't seem to warrant a new protocol.
> > > But when combined it seems worthwhile to me.
> > > 
> > > >         Simon, may be now ip_vs_nat_xmit should see
> > > > RTCF_LOCAL flag and we should check all NAT handlers
> > > > to support the LOCALNODE fallback where the port can
> > > > be changed too.
> > > 
> > > I'm not quite sure what you are describing there.
> > > 
> > > Is the idea that if the forwarding mechanism is NAT
> > > then packets will always go via ip_vs_nat_xmit, even if
> > > the IP is local (at config time). And that ip_vs_nat_xmit()
> > > will use local xmit if RTCF_LOCAL is set?
> > 
> > IPv6 also has a number of other issues not related to the backup
> > protocol, like usage of an IPv6 or IPv4 multicast address etc.
> 
> Could you elaborate?

If I think about it, they shrink to nothing if a common solution
is used.

My first approach was a separate sync thread for IPv6 with its own
socket, leaving the IPv4 part untouched. If that approach is used,
new sysctls are needed, as well as new switches for ipvsadm.



--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html