To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: Failover with a high persistent timeout
Cc: ratz@xxxxxx
From: Alexandre Cassen <Alexandre.Cassen@xxxxxxxxxx>
Date: Fri, 13 Sep 2002 15:44:48 +0200
Hi ratz,


> All this is of course steerable from user space. Currently via ioctl's and in a
> very inefficient way, just like the insertion of new rules :)

>         I know that Alexandre and Wensong are deeply into this
> stuff but I'm somehow not convinced to play with it :)


Last month I took some time to work on the IPVS syncd. I started from the current code, moving from the current mcast UDP design to a new dedicated protocol I call SSYNCP, which stands for Stateful Synchronization Protocol.


I have written a first working draft as a proof of concept. The design implemented is:

/*
  Protocol definition.
  - Alexandre Cassen, <Alexandre.Cassen@xxxxxxxxxxxxxx>, (2002, 08/08)

  Name: SSYNCP (Stateful Synchronization Protocol)
  Goal: Sending stateful based adverts to a set of
        backup nodes.
  Specs:
    o Adverts will be sent using multicast. We will
      use the multicast address 224.0.0.81, which is
      part of the reserved IANA range 224.0.0.69-.100
      [http://www.iana.org/assignments/multicast-addresses]


    o Adverts will be multicast using IP. We will
      use the Internet Protocol number 114, which is
      referred to as "any 0-hop protocol".
      [http://www.iana.org/assignments/protocol-numbers]


  Note: RFC1700 is out-of-date. Prefer the IANA assignment
        directory. [http://www.iana.org/numbers.html]


  Protocol frame:

  The master multicasts messages to the backup load balancers in the
  following format.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Count Conns  |    Sync ID    |            Size               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Reserved           |          Checksum             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (1)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            .                                  |
      |                            .                                  |
      |                            .                                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                    IPVS Sync Connection (n)                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


  o We set the max IPVS Sync Connection count to 50.
  o Message size :
    - Unconditional size = 50 * sizeof(ip_vs_sync_conn) +
                                sizeof(ip_vs_sync_mesg)
    - For VS/NAT entries we append in/out seq numbers to the
      previous static size. sizeof(ip_vs_sync_conn_options) = 24


  Considering that all synced IPVS connections are VS/NAT conns with
  the IP_VS_CONN_F_SEQ_MASK flag set, this introduces a fixed message
  length equal to :


  SYNC_MESG_MAX_SIZE = 50 * (24 + 24) + 8 = 2408 Bytes


  But since common configurations use the default MTU of 1500 Bytes,
  we truncate this global value to 50*24+8 = 1208 Bytes.
  In conclusion, for conns flagged with IP_VS_CONN_F_SEQ_MASK, the
  maximum number of IPVS sync connections per advert is 25.
*/
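
To make the frame above concrete, the 8-byte header could map to a C structure like this (a minimal sketch from the field widths in the diagram; the struct and field names are illustrative, not the actual draft code):

struct ssyncp_mesg {
        __u8    nr_conns;       /* Count Conns: sync conns in this advert */
        __u8    syncid;         /* Sync ID: identifies the sync group     */
        __u16   size;           /* total message length in bytes          */
        __u16   reserved;       /* must be zero                           */
        __u16   checksum;       /* checksum over the whole message        */
};                              /* sizeof() == 8, as used in the math     */

The IPVS Sync Connection entries then follow back-to-back after this header, nr_conns of them per advert.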


My goal is to design a sync protocol that is as generic as possible, so that we will be able to use it for other connection-oriented kernel frameworks like netfilter.

This design allows synchronization for active/active director setups. The code registers the protocol with the kernel via inet_add_protocol(...), using the following kernel protocol definition :

static struct inet_protocol ssyncp_protocol = {
        ssyncp_rcv,             /* SSYNCP handler       */
        ssyncp_err,             /* Error handler        */
        0,                      /* Next                 */
        IPPROTO_SSYNCP,         /* Protocol ID          */
        0,                      /* Copy ?               */
        NULL,                   /* data                 */
        "SSYNCP"                /* Protocol name        */
};
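
For illustration, on a 2.4 kernel the two handlers referenced above would have the following shape (a rough sketch under that assumption; the bodies of ssyncp_rcv and ssyncp_err here are stubs of mine, not the actual draft code):

#include <linux/init.h>
#include <linux/skbuff.h>
#include <net/protocol.h>

/* Illustrative stubs only. */
static int ssyncp_rcv(struct sk_buff *skb)
{
        struct ssyncp_mesg *m;

        /* Drop adverts too short to carry the 8-byte header. */
        if (skb->len < sizeof(struct ssyncp_mesg))
                goto drop;
        m = (struct ssyncp_mesg *) skb->data;
        /* ... verify m->checksum, then walk m->nr_conns entries ... */
drop:
        kfree_skb(skb);
        return 0;
}

static void ssyncp_err(struct sk_buff *skb, u32 info)
{
        /* ICMP errors related to protocol 114 land here; ignore them. */
}

/* Registration at init time (2.4 API). */
void __init ssyncp_init(void)
{
        inet_add_protocol(&ssyncp_protocol);
}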

The current code is working, but it is a very rough draft and needs to be audited, especially for locking issues.

For now, the question I am thinking about is the location of this SSYNCP: kernel space or user space. Considering that packets are sent every second (a maximum of about 10 pps), the user-space location can be acceptable.

I was thinking of a machinery that reduces the syncd part of LVS to a simple queue, in charge of queueing the new connections passed in by ip_vs_in(...), just like currently. All the mcast and routing work would be done in user space. The question now is: in order to mcast those queued connection entries from user space, we need a good design that permits kernel-space to user-space copying of the queued connections, like an ioctl call... I don't really know what the best design is here; I need input on this :) One possible shape is sketched below.
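
For example, the user-space side could drain adverts from the kernel queue through a character device and multicast them as raw IP protocol 114 packets. This is only one possible shape: /dev/ipvs_sync and its blocking read() semantics are assumptions of mine, not an existing interface.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define IPPROTO_SSYNCP  114             /* "any 0-hop protocol"          */
#define SSYNCP_MCAST    "224.0.0.81"    /* reserved IANA mcast address   */
#define SSYNCP_MAX_MESG 1208            /* truncated advert size         */

int main(void)
{
        char buf[SSYNCP_MAX_MESG];
        struct sockaddr_in mcast;
        unsigned char ttl = 1;          /* 224.0.0.0/24 is never routed  */
        int fd, sock, len;

        fd = open("/dev/ipvs_sync", O_RDONLY);  /* hypothetical device   */
        sock = socket(AF_INET, SOCK_RAW, IPPROTO_SSYNCP);
        if (fd < 0 || sock < 0) {
                perror("ssyncp");
                exit(1);
        }
        setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

        memset(&mcast, 0, sizeof(mcast));
        mcast.sin_family = AF_INET;
        mcast.sin_addr.s_addr = inet_addr(SSYNCP_MCAST);

        /* Block until the kernel queue hands us a full advert, then
           multicast it; the kernel builds the IP header with
           protocol 114 for us since IP_HDRINCL is not set. */
        for (;;) {
                len = read(fd, buf, sizeof(buf));
                if (len > 0)
                        sendto(sock, buf, len, 0,
                               (struct sockaddr *) &mcast, sizeof(mcast));
        }
        return 0;
}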

With this design the user-space code of SSYNCP will be clean and generic, and a netfilter conntrack extension to this sync protocol will be easy. Anyway, we still need to think about some backlog queue and the like... :) As for the user-space part, it can be done very quickly since it is very close to the VRRP code... and my future goal is to drive the SSYNCP protocol state according to the VRRP finite state machine... The road is long :)

What would be the most generic design possible, in order to simplify future extensions? User space with a kernel-space connection queue, drained by periodic calls?...

regards,
Alexandre

PS: I have not published my current (very rough) draft code, so if you want to take a look at it, ask me for it.


