On Fri, 1 Jun 2018, Poh Chiat wrote:
> I'm trying this setup where an upstream router ECMPs to multiple
> active-active LVS nodes with the exact same configuration.
> --- | router | -- ECMP --> | LVS nodes | -- IPinIP --> | HAProxy nodes |
> The LVS nodes are configured as such:
> 1. BIRD to advertise VIP to upstream router
> 2. ipvs setup with VIP and sh scheduler
> 3. ipvs uses TUN forwarding method
> 4. loopback interface has VIP
> LVS load balances to some real servers with a tunnel interface set up.
> The real servers will direct return to the client.
> In one of my tests, when I stop BIRD and hence drop the route to node
> A, the upstream router starts forwarding packets to node B. So far so
> good.
> What I noticed then is that node B will send TCP RST packets back to
> the client. Instead, what I hoped to see was that
> 1. node B receives the packet
> 2. sh scheduler means that the same real server picked by node A will
> also be picked by node B
> 3. the same real server receives the packet and the connection stays up
What people use in such cases is:
1. Enable Sloppy Mode to create a connection from any non-RST TCP packet
(the files below are under /proc/sys/net/ipv4/vs/):
echo 1 > sloppy_tcp
echo 1 > sloppy_sctp
2. For active-active setups do not use sync but:
2.1 Maglev ("mh" scheduler, added in Linux 4.18): non-persistent connections,
MH is better than SH when real servers are added or removed
2.2 SH scheduler (not better than MH): non-persistent connections
3. For persistence use sync only for persistent templates,
any scheduler can be used:
# sync only templates (requires sloppy_tcp=1):
echo 1 > sync_persist_mode
# How often to sync the templates, in seconds:
echo 10 > sync_refresh_period
echo 0 > sync_retries
echo "0 0" > sync_threshold
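The settings from the list above can be collected into a small shell sketch. This is only an illustration: the function names set_ipvs_knob and apply_ipvs_active_active are made up here, the default directory assumes the ip_vs module is loaded and exposes its sysctls under /proc/sys/net/ipv4/vs, and the values are the ones given in this mail. Writing to the real directory needs root.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, values are taken from the mail.

# set_ipvs_knob DIR NAME VALUE
# Write VALUE to DIR/NAME; fail if the knob is missing or not writable.
set_ipvs_knob() {
    if [ ! -w "$1/$2" ]; then
        echo "missing or read-only: $1/$2" >&2
        return 1
    fi
    printf '%s\n' "$3" > "$1/$2"
}

# apply_ipvs_active_active [DIR] [WITH_PERSISTENCE]
# DIR defaults to the usual IPVS sysctl directory.
# WITH_PERSISTENCE=1 also applies the "sync only templates" settings (step 3).
apply_ipvs_active_active() {
    dir=${1:-/proc/sys/net/ipv4/vs}
    persist=${2:-0}

    # Step 1: sloppy mode, so a mid-stream packet can create a connection.
    set_ipvs_knob "$dir" sloppy_tcp 1 || return 1
    set_ipvs_knob "$dir" sloppy_sctp 1 || return 1

    # Step 2 needs no sysctl: just do not start the sync daemon.
    [ "$persist" -eq 1 ] || return 0

    # Step 3: sync only persistence templates, refreshed every 10 seconds.
    set_ipvs_knob "$dir" sync_persist_mode 1 || return 1
    set_ipvs_knob "$dir" sync_refresh_period 10 || return 1
    set_ipvs_knob "$dir" sync_retries 0 || return 1
    set_ipvs_knob "$dir" sync_threshold "0 0" || return 1
}
```

On a director this would be called as "apply_ipvs_active_active" for steps 1 and 2, or "apply_ipvs_active_active /proc/sys/net/ipv4/vs 1" when persistence templates are synced as in step 3.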
> When I enable daemon state sync between node A and B the connection
> stays up. However, state sync happens periodically and so it is still
> possible for some connections to be dropped during this interval.
> From the above observation, my guess is that ipvs evaluated the packet
> (perhaps in ip_vs_in()), but somehow managed it differently because
> the connection does not exist in the table.
> What I hope to understand is
> 1. is ipvs sending the RST? or is that somewhere else in the netfilter system?
When not in sloppy mode, IPVS simply skips the packet, and it is
delivered to the local stack, where the RST is generated. So IPVS
itself does not send the RST.
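To make that rule concrete, here is a small shell model of the decision for a TCP packet that matches no existing connection. It is a simplified sketch, not the kernel code: the function name ipvs_new_conn_decision is made up for illustration, its three arguments (SYN flag, RST flag, sloppy_tcp value) are passed as 0 or 1, and it ignores every other check IPVS makes, such as whether a matching virtual service exists.

```shell
#!/bin/sh
# Simplified model, assuming only SYN, RST and sloppy_tcp matter.

# ipvs_new_conn_decision SYN RST SLOPPY
# Prints "schedule" if IPVS would create a new connection for the packet,
# or "pass" if IPVS skips it and the local stack handles it (and, with the
# VIP configured locally and no listener, answers with an RST).
ipvs_new_conn_decision() {
    syn=$1
    rst=$2
    sloppy=$3
    if [ "$rst" -eq 0 ] && { [ "$syn" -eq 1 ] || [ "$sloppy" -eq 1 ]; }; then
        echo schedule
    else
        echo pass
    fi
}

# A mid-stream ACK arriving at node B after the route moved:
#   ipvs_new_conn_decision 0 0 0    prints "pass"      (sloppy_tcp=0)
#   ipvs_new_conn_decision 0 0 1    prints "schedule"  (sloppy_tcp=1)
```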
> 2. since sh scheduler (and the recent maglev scheduler patch) are
> consistent hashes, they don't actually have a state to sync (except
> for purely optimization purposes). Is it reasonable to hope that we do
> not need state sync to get this working at all?
Yes, as long as you do not need persistence.
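Why no sync is needed can be shown with a toy example. The sketch below picks a real server from nothing but the client address and the configured server list, so two directors with the same list make the same choice without exchanging state. It is only an illustration: pick_real_server is a made-up name, cksum is a stand-in and not the hash that the SH or MH scheduler actually uses, and the addresses are documentation addresses.

```shell
#!/bin/sh
# Toy model: stateless, deterministic choice of a real server.
# cksum is a stand-in hash, NOT the one used by the IPVS sh/mh schedulers.

# pick_real_server CLIENT_IP SERVER...
# Prints one SERVER chosen only from CLIENT_IP and the order of the list.
pick_real_server() {
    client_ip=$1
    shift
    n=$#
    [ "$n" -gt 0 ] || return 1
    # CRC of the client address as an unsigned decimal number
    sum=$(printf '%s' "$client_ip" | cksum | cut -d ' ' -f 1)
    # Drop (sum mod n) servers from the front; the next one is the pick
    shift $(( sum % n ))
    printf '%s\n' "$1"
}

# Node A and node B run the same function on the same inputs, so the
# packet lands on the same real server on either node:
#   pick_real_server 198.51.100.7 10.0.0.11 10.0.0.12 10.0.0.13
```

The choice depends on the server list and its order, so this only holds while both nodes really have the same configuration, as in the setup described above.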
> 3. should I be setting up this in a different way?
Maybe you are missing only the sloppy_tcp setting...
> Forgive me if I have gaps in my knowledge of the Linux networking
> stack and ipvs's hook with netfilter (and the schedulers and
> connection tracking too).
No problem, we are still learning too :)
Julian Anastasov <ja@xxxxxx>
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx