Hello Horms,
> The attached patch adds two new proc entries that affect the operation
> of the synchronisation daemon.
>
> /proc/sys/net/vs/sync_frequency
>
> This is a companion to the existing /proc/sys/net/vs/sync_threshold proc
> entry. It sets how often a connection entry will be synchronised, counted
> in packets received on the connection. The default is 50, which was the
> previously hard-coded value.
>
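Just to make sure I read the semantics right: the way I understand it, a
connection entry is picked for synchronisation whenever its packet counter
hits the threshold within each frequency-sized period. A tiny userland
illustration of that reading (the variable names are mine, not necessarily
the ones in the patch):

    #include <stdio.h>

    int main(void)
    {
            /* stand-ins for the two proc entries */
            int sync_threshold = 3;   /* /proc/sys/net/vs/sync_threshold */
            int sync_frequency = 50;  /* /proc/sys/net/vs/sync_frequency */
            int in_pkts;              /* packets seen on one connection */

            for (in_pkts = 1; in_pkts <= 150; in_pkts++) {
                    /* same test as the per-packet check in ip_vs,
                     * with the 50 no longer hard-coded */
                    if (in_pkts % sync_frequency == sync_threshold)
                            printf("sync at packet %d\n", in_pkts);
            }
            return 0;
    }

With those values it fires at packets 3, 53, 103 and so on.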
> /proc/sys/net/vs/sync_msg_max_size
>
> This sets the maximum size, in bytes, of messages sent by the
> synchronisation daemon. The intention is to be able to fine-tune this for
> networks whose MTU may be other than 1500 bytes. One example that
> springs to mind is a (Gigabit) network that uses jumbo frames of
> 6000 bytes. The default is 1228, which is the old hard-coded value plus a
> little extra space: enough for 50 simple connection entries, with an
> extra 24 bytes to cover the case where the last connection is a full
> connection.
I like it very much. It's a very small and clean patch.
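For the archives, here is the arithmetic behind 1228 as I understand it.
The sizes are the ones I remember from the 2.4 ip_vs_sync.c (4-byte
message header, 24 bytes per simple connection entry, 48 bytes for a full
one), so treat them as an assumption rather than gospel:

    #include <stdio.h>

    #define SYNC_MESG_HEADER_LEN  4   /* struct ip_vs_sync_mesg */
    #define SIMPLE_CONN_SIZE      24  /* struct ip_vs_sync_conn */
    #define FULL_CONN_SIZE        48  /* + ip_vs_sync_conn_options */

    int main(void)
    {
            /* 50 simple entries plus enough spare room for the last
             * entry to be a full one: 4 + 50*24 + (48-24) = 1228 */
            int max = SYNC_MESG_HEADER_LEN
                      + 50 * SIMPLE_CONN_SIZE
                      + (FULL_CONN_SIZE - SIMPLE_CONN_SIZE);

            printf("sync_msg_max_size default = %d\n", max);
            return 0;
    }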
> I have also updated the ipvsadm(8) man pages to document the
> functionality of these proc entries, as well as the existing
> /proc/sys/net/vs/sync_threshold proc entry that was previously
> undocumented.
Thanks for de-Americani[sz]ing and the name space cleanup for
FULL_CONN_SIZE :)
One idea I discussed at the OLS with Harald Welte about state
synchronisation (which he didn't like) was this:
Currently we synchronise all entries as well as we can, and we don't
really care too much if not all entries are synchronised. This is
not what Harald wants. He wants an almost 100% up-to-date state
transition table on all backup nodes. Obviously this is not feasible,
or only if you stop the packet in the input queue, synchronise,
wait for the acknowledgement, and only then send the packet through
the stack. The latency issues of such a theoretical approach should
be clear to everyone. There is a commercial hardware load balancer
I have worked with that does do this, but only because it uses fast
dedicated hardware and Motorola DSPs.
Now, to get to the point: during his talk I proposed the following model.
I want to weight services to express service priority. For example,
under an http 'flood' I don't care if http connections are not sync'd,
but I certainly do care about my https/ssh or whatever persistent
or mission-critical protocol I'm running. To address this, we add a
weight flag to the ip_vs_* structures, and the sync daemon parses it
and does a kind of qdisc over the connection entries waiting to be
sync'd. So if you give the https service a weight of 100 and the http
service a weight of 1, and you have 100 valid entries of each service
in the ipvs_conntrack table when the sync daemon kicks in, you would
sync 100 https entries, then, if there is still room, the first http
entry, then 100 https entries again. That way you can assign priorities
and make sure that the probability of an important service not being
100% in sync is small or even zero.
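Here is a minimal userland sketch of the weighted draining I have in
mind; all the names are made up and counts stand in for real connection
entries, but it shows the round-robin-with-weights behaviour:

    #include <stdio.h>

    struct sync_queue {
            const char *service;
            int weight;   /* the proposed weight flag */
            int pending;  /* entries waiting to be sync'd */
    };

    int main(void)
    {
            struct sync_queue q[] = {
                    { "https", 100, 100 },
                    { "http",    1, 100 },
            };
            int room = 120;  /* entries that still fit in this run */
            int i, n;

            while (room > 0 && (q[0].pending || q[1].pending)) {
                    for (i = 0; i < 2; i++) {
                            /* each pass, service i may sync up to
                             * 'weight' of its pending entries */
                            for (n = 0; n < q[i].weight &&
                                        q[i].pending && room; n++) {
                                    q[i].pending--;
                                    room--;
                            }
                    }
            }
            printf("left unsynced: https=%d http=%d\n",
                   q[0].pending, q[1].pending);
            return 0;
    }

With those numbers all 100 https entries go out first and only the http
service is left with unsynced entries, which is exactly the trade-off I
want under load.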
What do you think about that?
[I will also post some replies to the ongoing discussion at <dev>@linux-vs
internally about how exactly Harald wants his replication to work.
Give me some time to catch up with work again :)]
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc