On Wed, 20 Jun 2018, Phillip Moore wrote:
> I'm trying to understand some behavior we are seeing. We run a group of
> IPVS director nodes all active with BGP for advertising their addresses and
> every IPVS node is running both the master and slave sync processes. We run
> our real nodes in TUN mode (for direct server return).
> We believe that sometimes new connections are reset before they can be
> established if the network delivers packets to the wrong node before state
> has been synced. For the most part this doesn't happen but we have
> encountered a few scenarios mostly involving maintenance or testing new BGP
> related configs (like removing source interface from ECMP hash) that have
> caused it.
There is a sloppy_tcp sysctl var that allows creating
connections from non-SYN packets.
> Currently we have these default settings:
> >cat /proc/sys/net/ipv4/vs/sync_threshold
> 3 50
This file holds two values: sync_threshold and sync_period.
The first is a packet number (1 .. sync_period-1, or 0);
the second is the period, defined as a number of packets.
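As a small illustration of how the two fields are read, the sketch below
splits the file contents into its two integers. The helper name
parse_sync_threshold and the SyncThreshold type are hypothetical, made up
for this example; the sample string is the value quoted above.

```python
from typing import NamedTuple


class SyncThreshold(NamedTuple):
    threshold: int  # first field: packet number that triggers a sync
    period: int     # second field: period, counted in packets


def parse_sync_threshold(text: str) -> SyncThreshold:
    """Split the two whitespace-separated integers of the sysctl value."""
    first, second = text.split()
    return SyncThreshold(threshold=int(first), period=int(second))


# Sample value taken from the quoted 'cat' output above.
print(parse_sync_threshold("3 50"))
```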
> >cat /proc/sys/net/ipv4/vs/sync_refresh_period
This is in seconds.
> I've read the sysctl docs for these settings and I don't really understand
> the interaction of the 2nd number in sync_threshold with
> sync_refresh_period being set to 0.
It may be that 'sync_period=0' is considered only
when sync_refresh_period=0? I.e. 'sync_refresh_period>0'
deactivates the 'sync_period=0' mechanism.
sync_refresh_period is a new mechanism to reduce the
rate of sync messages on loaded setups. You should use
"0 0" for sync_threshold when setting sync_refresh_period > 0.
sync_refresh_period is clamped to a range that depends
on the state's timeout: [10 sec .. timeout/2].
Why do we send sync messages? To create the same connections
on the backup server and to keep their (e.g. TCP) state/timeout
in sync. When TCP sends many packets in a second it is
useless to send sync messages which do not change much
on the backup server. That is why sync_refresh_period was
created. Basically, it says "keep connections on the backup
server with the same state and a timeout difference not above
sync_refresh_period".
The sync_refresh_period mechanism works
in this way: if set to 10, do not send a sync message if the
previous one was sent in the last 10 seconds. A change of
TCP state to a final state (on FIN/RST) is an exception to this
rule, because we want the new state to be synced, switching
from the large EST timeout to the small final timeout.
But the switch to EST state is not always reported;
it is under sync_refresh_period/sync_threshold control.
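The rule above can be sketched roughly as follows. This is a simplified
model, not the kernel code: it ignores the clamping against the state's
timeout and any interaction with sync_threshold, and the function name
refresh_allows_sync and its parameters are hypothetical.

```python
def refresh_allows_sync(now: float, last_sync: float,
                        refresh_period: float,
                        entered_final_state: bool) -> bool:
    """Simplified model of the sync_refresh_period rule (times in seconds)."""
    if entered_final_state:
        # A change to a final state (on FIN/RST) is always synced,
        # so the backup switches to the small final timeout.
        return True
    # Otherwise stay quiet while the previous sync message is still recent.
    return now - last_sync >= refresh_period


# With refresh_period=10: a packet 5 s after the last sync is not reported,
# unless it moved the connection to a final state.
print(refresh_allows_sync(5.0, 0.0, 10.0, entered_final_state=False))
print(refresh_allows_sync(5.0, 0.0, 10.0, entered_final_state=True))
```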
> Looking at this, I think we would desire sync_threshold to be "1
> $something" but I don't know what $something should be. Or should we only
> really care about state changes and set it to "0 $something" ?
If $something is 10, it means one sync message will
be sent per 10 packets. If a TCP connection has a rate of
500 packets/sec, you will see 50 sync messages per second
for this connection. I.e. sync_period 10 means the number of
sync messages is 10% of the connection's packets. Basically,
sync_period=N defines a rate of 1 sync message per N
connection packets.
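The arithmetic is simple enough to check with a few lines. The function
name sync_messages_per_second is hypothetical, and the estimate only
covers the sync_period > 0 case described above.

```python
def sync_messages_per_second(packets_per_second: float,
                             sync_period: int) -> float:
    """Estimate the per-connection sync message rate for sync_period > 0."""
    if sync_period <= 0:
        raise ValueError("this estimate only applies when sync_period > 0")
    # One sync message is sent per sync_period packets.
    return packets_per_second / sync_period


print(sync_messages_per_second(500, 10))  # 50.0
```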
The first value (threshold) just defines which of the
packets trigger a sync message, e.g. for "2 10" sync messages
will be sent for packets 2, 12, 22, 32, 42, 52, etc. Note that
the SYN packet is #1 and the first ACK is #2. So, the earliest TCP
packet you can sync is with threshold=2. Also, if sync_period>0,
sync_threshold should be below sync_period, otherwise
'pkts % sync_period != sysctl_sync_threshold(ipvs)' will
always be TRUE, resulting in no sync messages in EST state.
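To see which packet numbers pass the 'pkts % sync_period' check, here is
a small sketch. It models only that one comparison for sync_period > 0;
the function name synced_packets is hypothetical and the packet numbering
(SYN is #1) follows the description above.

```python
from typing import List


def synced_packets(threshold: int, period: int,
                   total_packets: int) -> List[int]:
    """List packet numbers for which pkts % period == threshold."""
    if period <= 0:
        raise ValueError("this sketch only covers sync_period > 0")
    return [pkts for pkts in range(1, total_packets + 1)
            if pkts % period == threshold]


print(synced_packets(2, 10, 60))   # [2, 12, 22, 32, 42, 52]
# A threshold that is not below the period never matches.
print(synced_packets(10, 10, 60))  # []
```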
> I'm trying to figure out if there would be any unintended consequences of
> changing these to either 1 or 0.
"2 0" is the minimal traffic, only one sync message in
EST state and one per change to a final state.
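A rough count for this minimal setting, under a simplified model that is
assumed here rather than taken from the kernel source: with sync_period=0
and sync_refresh_period=0, only the packet whose number equals the
threshold is synced in EST state, and each change to a final state adds
one message. The function name count_sync_messages is hypothetical.

```python
def count_sync_messages(threshold: int, total_packets: int,
                        final_state_changes: int) -> int:
    """Count sync messages for sync_period=0, sync_refresh_period=0."""
    # In EST state only the packet numbered 'threshold' is synced,
    # and only if the connection lives long enough to see it.
    est_syncs = 1 if 1 <= threshold <= total_packets else 0
    # Every change to a final state is reported once.
    return est_syncs + final_state_changes


# "2 0": a 1000-packet connection that ends with one final-state change.
print(count_sync_messages(2, 1000, 1))  # 2
```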
Julian Anastasov <ja@xxxxxx>
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx