Hi ratz,
> All this is of course steerable from user space. Currently via ioctl's
> and in a very inefficient way, just like the insertion of new rules :)
> I know that Alexandre and Wensong are deeply into this stuff but I'm
> somehow not convinced to play with it :)
Last month, I took some time to work on the IPVS syncd. I started from the
current code, moving away from the current mcast UDP design to a new
dedicated protocol I call SSYNCP, which stands for Stateful
Synchronization Protocol.
I have written a first working draft as a proof of concept. The design
implemented is:
/*
  Protocol definition.
  - Alexandre Cassen, <Alexandre.Cassen@xxxxxxxxxxxxxx>, (2002, 08/08)

  Name: SSYNCP (Stateful Synchronization Protocol)
  Goal: Sending stateful-based adverts to a set of backup nodes.

  Specs:
    o Adverts will be broadcast using multicast. We will use the
      multicast address 224.0.0.81, which is part of the reserved
      IANA range 224.0.0.69-.100
      [http://www.iana.org/assignments/multicast-addresses]
    o Adverts will be multicast directly over IP. We will use the
      Internet Protocol number 114, which is referred to as
      "any 0-hop protocol".
      [http://www.iana.org/assignments/protocol-numbers]
      Note: RFC1700 is out-of-date. Prefer the IANA assignment
      directory. [http://www.iana.org/numbers.html]

  Protocol frame:
    The master multicasts messages to the backup load balancers in the
    following format.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Count Conns | Sync ID | Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| IPVS Sync Connection (1) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| . |
| . |
| . |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| IPVS Sync Connection (n) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    o We set the maximum IPVS Sync Connection count per message to 50.
    o Message size:
      - Unconditional size = 50 * sizeof(ip_vs_sync_conn) +
        sizeof(ip_vs_sync_mesg)
      - For VS/NAT entries we append the in/out seq numbers to the
        previous static size: sizeof(ip_vs_sync_conn_options) = 24

      Considering that all synced IPVS connections are VS/NAT conns
      with the IP_VS_CONN_F_SEQ_MASK flag set, this introduces a fixed
      message length equal to:
        SYNC_MESG_MAX_SIZE = 50 * (24 + 24) + 8 = 2408 bytes
      But since common configurations use the default MTU of 1500
      bytes, we truncate this value to 50 * 24 + 8 = 1208 bytes.
      In conclusion, for conns flagged with IP_VS_CONN_F_SEQ_MASK, the
      maximum number of IPVS sync connections per advert is 25.
*/
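To make the header layout explicit, it maps to something like the
following C struct. This is only a sketch: the field names and the
8/8/16-bit split of the first word are my reading of the diagram above.

#include <linux/types.h>

/* SSYNCP message header (8 bytes, the "+ 8" in the size computation).
 * It is followed by nr_conns IPVS sync connection entries, i.e.
 * struct ip_vs_sync_conn (24 bytes each), each optionally followed by
 * a 24-byte ip_vs_sync_conn_options block carrying the in/out seq
 * numbers for VS/NAT conns. */
struct ssyncp_mesg {
    __u8   nr_conns;   /* Count Conns: number of entries that follow */
    __u8   syncid;     /* Sync ID: sync daemon instance id */
    __u16  size;       /* Size: total message length in bytes */
    __u16  reserved;   /* Reserved: set to zero */
    __u16  checksum;   /* Checksum over the whole message */
};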
My goal is to design this sync protocol to be as generic as possible, so
that we will be able to use it for other connection-oriented kernel
frameworks like netfilter.
This design allows synchronization for active/active director setups. The
code registers the protocol with the kernel via inet_add_protocol(...)
using the following kernel protocol definition:
static struct inet_protocol ssyncp_protocol = {
    ssyncp_rcv,        /* SSYNCP handler */
    ssyncp_err,        /* Error handler */
    0,                 /* Next */
    IPPROTO_SSYNCP,    /* Protocol ID */
    0,                 /* Copy ? */
    NULL,              /* data */
    "SSYNCP"           /* Protocol name */
};
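For completeness, registering the handler is just the usual 2.4
inet_add_protocol()/inet_del_protocol() pair at module load/unload time.
Roughly, it looks like this (a simplified sketch: the handler bodies are
stubbed out, and the protocol structure is repeated in compact form so
the sketch stands alone):

#include <linux/module.h>
#include <linux/init.h>
#include <linux/skbuff.h>
#include <net/protocol.h>

#define IPPROTO_SSYNCP 114    /* IANA "any 0-hop protocol" */

static int ssyncp_rcv(struct sk_buff *skb)
{
    /* check the header, verify the checksum, then walk the
     * ip_vs_sync_conn entries and update the backup conn table */
    kfree_skb(skb);
    return 0;
}

static void ssyncp_err(struct sk_buff *skb, u32 info)
{
    /* ICMP errors related to protocol 114 land here */
}

static struct inet_protocol ssyncp_protocol = {
    ssyncp_rcv, ssyncp_err, NULL, IPPROTO_SSYNCP, 0, NULL, "SSYNCP"
};

static int __init ssyncp_init(void)
{
    inet_add_protocol(&ssyncp_protocol);    /* hook IP protocol 114 */
    return 0;
}

static void __exit ssyncp_exit(void)
{
    inet_del_protocol(&ssyncp_protocol);
}

module_init(ssyncp_init);
module_exit(ssyncp_exit);
MODULE_LICENSE("GPL");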
The current code is working, but it is very much a draft and needs to be
audited, especially for locking issues.
For now, the question I am thinking about is the location of this SSYNCP
code: kernel space or user space. Considering the fact that packets are
sent every second (say a maximum of 10 pps), a user-space location can be
acceptable.
I was thinking of a machinery that reduces the syncd part of LVS to a
simple queue, which is in charge of queueing the new connections handed to
it by ip_vs_in(...), just like currently. All the mcast part and the
routing issues are handled in user space. The question now is: in order to
mcast those queued connection entries from user space, we need a good
design that permits a kernel-space to user-space copy of the queued
connections, like an ioctl call... I don't really know what the best
design is here, and need inputs on this :)
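Just to illustrate what I have in mind on the kernel side (everything
below is hypothetical, none of it is in the draft: the request structure
and ssyncp_queue_get() are made-up names), a drain call could copy up to
N queued entries into a user-supplied buffer:

#include <linux/errno.h>
#include <linux/types.h>
#include <asm/uaccess.h>

/* Hypothetical request passed in by the userspace syncd */
struct ssyncp_drain_req {
    unsigned int count;   /* in: room for this many entries */
    void *buf;            /* in: userspace buffer */
};

/* Hypothetical drain handler: dequeue up to req.count entries from the
 * kernel-side queue filled by ip_vs_in() and copy them to userspace.
 * ssyncp_queue_get() stands for whatever dequeue primitive the queue
 * will use; struct ip_vs_sync_conn is the 24-byte entry from
 * ip_vs_sync.c; freeing the entry and appending the VS/NAT seq options
 * are left out for brevity. */
static int ssyncp_drain(unsigned long arg)
{
    struct ssyncp_drain_req req;
    struct ip_vs_sync_conn *entry;
    unsigned int copied = 0;

    if (copy_from_user(&req, (void *) arg, sizeof(req)))
        return -EFAULT;

    while (copied < req.count && (entry = ssyncp_queue_get()) != NULL) {
        if (copy_to_user((char *) req.buf + copied * sizeof(*entry),
                         entry, sizeof(*entry)))
            return -EFAULT;
        copied++;
    }
    return copied;   /* number of entries handed to userspace */
}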
With this design the user-space code of ssyncp will be clean and generic.
Extending this sync protocol to netfilter conntrack will then be easy.
Anyway, I still need to think about some backlog queue, and the like...
:) Concerning the user-space part, it can be done very quickly since it
is very close to the VRRP code... and my future goal is to drive the
SSYNCP protocol state according to the VRRP finite state machine... The
road is long :)
So, what would be the best design, as generic as possible, in order to
simplify future extensions? User space with a kernel-space connection
queue, drained by periodic calls...
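On the user-space side, the periodic part would then be little more than
the following loop (again purely illustrative: the drain step refers to
the hypothetical ioctl sketched above, and the multicast/raw-socket setup
is reduced to the minimum):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define IPPROTO_SSYNCP   114            /* IANA "any 0-hop protocol" */
#define SSYNCP_MCAST     "224.0.0.81"   /* multicast group from the spec */
#define SSYNCP_MAX_SIZE  1208           /* max advert size from the spec */

int main(void)
{
    unsigned char advert[SSYNCP_MAX_SIZE];
    struct sockaddr_in grp;
    int raw, len;

    /* raw socket: the kernel builds the IP header, proto field = 114 */
    raw = socket(AF_INET, SOCK_RAW, IPPROTO_SSYNCP);
    if (raw < 0) {
        perror("socket");
        return 1;
    }

    memset(&grp, 0, sizeof(grp));
    grp.sin_family = AF_INET;
    grp.sin_addr.s_addr = inet_addr(SSYNCP_MCAST);

    for (;;) {
        /* drain the kernel queue (e.g. via the hypothetical drain
         * ioctl above) and build the SSYNCP header + entries into
         * advert[]; len = number of bytes built */
        len = 0;

        if (len > 0)
            sendto(raw, advert, len, 0,
                   (struct sockaddr *) &grp, sizeof(grp));
        sleep(1);   /* adverts are sent once per second */
    }
    return 0;
}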
regards,
Alexandre
PS: I have not published my current (very draft) code, so if you want to
take a look at it, just ask me for it.