[lvs-users] Connection sync breaks fwmark-based localnode setup

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] Connection sync breaks fwmark-based localnode setup
From: svensven <svensven@xxxxxxxxx>
Date: Sun, 28 Mar 2010 12:31:20 +0200
In short: ip_vs_conn_in_get() does not match on fwmark. Packets that
the master LVS forwards to the backup LVS therefore match a
synchronized connection entry on the backup and are sent through ipvs
there, even though the backup is itself the destination realserver.
ipvs then loops the packet and the node hangs. Without conn sync, the
nodes work fine (though existing connections are of course broken on
failover). Tested on Linux 2.6.33.

Here's my setup:

      client ----+
     10.0.0.3    | vip: 10.0.0.10
                / \
               /   \
   +------------+ +------------+
   | LVS A (mst)| | LVS B (bkp)|
   |Realserver A| |Realserver B|
   |  10.0.0.5  | |  10.0.0.6  |
   +------------+ +------------+

Both nodes have the vip on lo:10, an iptables rule that sets the
fwmark unless the request comes from the other LVS, and arp_ignore=1
and arp_announce=2 on all interfaces. See the network, iptables and
sysctl config for the LVS master [3] and backup [4]. The realservers
run lighttpd on port 9999, bound to 0.0.0.0.

Both nodes have an identical keepalived.conf, except for the priority.
See full keepalived.conf for LVS A [5]. The important parts of it are
shown below:

   virtual_server fwmark 10 {
       lb_algo rr
       lb_kind DR
       real_server 10.0.0.5 9999 {...}
       real_server 10.0.0.6 9999 {...}
   }

The config includes notify_master/notify_backup scripts that
start/stop the ipvs connection synchronization daemon. For testing
purposes, the sync threshold is tweaked to sync after the TCP 3-way
handshake is done (2 incoming packets seen: SYN and ACK):

   net.ipv4.vs.sync_threshold="2 10"
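
As a rough illustration of what the two values do, here is a small
Python model. It assumes the master syncs an established connection
whenever its incoming packet count, modulo the second value, equals
the first value; that is a reading of the 2.6.3x code rather than
something shown in the logs, and it ignores the TCP state checks.
The name should_sync is made up for this sketch.

```python
def should_sync(in_pkts: int, threshold: int = 2, period: int = 10) -> bool:
    """Simplified model of sync_threshold="2 10".

    Assumption: a sync message is sent when the connection's incoming
    packet counter modulo `period` equals `threshold`.
    """
    return in_pkts % period == threshold


# Which of the first 25 incoming packets would trigger a sync message?
sync_points = [n for n in range(1, 26) if should_sync(n)]
print(sync_points)  # [2, 12, 22]
```

Under that assumption the first sync happens on the second incoming
packet, i.e. the ACK that completes the handshake.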

The debug kernel output in [1] shows how the connection fails when the
client queries the vip, LVS A is master, and the connection is
forwarded to realserver B.

The debug kernel output in [2] shows how the connection works when the
client queries the vip, LVS B is the master, and the connection is
forwarded to realserver B (itself), i.e. with no connection
synchronization.


Questions:
1. Should the ip_vs_conn_in_get() function also take fwmark into
    consideration when matching incoming packets to its list of
    established ipvs connections?

2. Is this the right way to set up a two-node LVS with localnodes
    and connection synchronization on a modern kernel?
    (Assuming the conn sync did not break things.)
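
To make question 1 concrete, here is a toy Python lookup that includes
the fwmark in the connection key. This is not kernel code. The
function name conn_in_get_fwmark is hypothetical, and the sketch
assumes that a synchronized entry would record fwmark 10 of the
virtual service, which nothing in this setup confirms.

```python
from typing import Dict, Optional, Tuple

# (proto, src, sport, dst, dport, fwmark) -> realserver address
MarkedKey = Tuple[str, str, int, str, int, int]


def conn_in_get_fwmark(table: Dict[MarkedKey, str], proto: str, src: str,
                       sport: int, dst: str, dport: int,
                       fwmark: int) -> Optional[str]:
    """Hypothetical lookup that only hits when the fwmark matches too."""
    return table.get((proto, src, sport, dst, dport, fwmark))


# Assumption: the synced entry carries the service's fwmark (10).
table = {("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999, 10): "10.0.0.6"}
flow = ("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999)

marked = conn_in_get_fwmark(table, *flow, fwmark=10)   # straight from the client
unmarked = conn_in_get_fwmark(table, *flow, fwmark=0)  # forwarded by the other LVS
print(marked)    # 10.0.0.6
print(unmarked)  # None
```

With such a key, the unmarked packet forwarded by the master would
miss on the backup and could go to the local socket instead of
through ipvs again.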


thanks!
S.

***

[1]: Example of failure
LVS A is master, balances to realserver B.
The output below is from LVS B / realserver B kern.log after:
* adding LOG entries to iptables -t filter, chain INPUT and OUTPUT
* setting net.ipv4.vs.debug_level to 13 (max)
* stripping away some crud, cleaning timestamps, etc
* adding <notes> on progress

Interesting lines: 11, 21, 28

  1 <Connection from client to VIP>
  2 [52.351] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 SYN
  3 [52.351] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
  4 [52.351] IPVS: lookup/out TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
  5 [52.351] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
  6 [52.351] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=54590 ACK SYN
  7 [52.457] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK
  8 [52.457] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
  9 [52.457] IPVS: lookup/out TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
 10 <TCP handshake complete>
 11 <IPVS state is synchronized from MASTER to BACKUP>
 12 [52.869] IPVS: packet type=2 proto=17 daddr=224.0.0.81 ignored
 13 [52.869] IPVS: Enter: ip_vs_receive, net/netfilter/ipvs/ip_vs_sync.c line 722
 14 [52.869] IPVS: Leave: ip_vs_receive, net/netfilter/ipvs/ip_vs_sync.c line 733
 15 [52.869] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
 16 [52.869] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
 17 [53.353] IPVS: packet type=5 proto=2 daddr=224.0.0.81 ignored
 18 <One line of data sent from client to VIP>
 19 [60.906] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
 20 <Packet matches synchronized state>
 21 [60.906] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
 22 [60.906] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
 23 <IPVS forwards the packet to the local interface>
 24 [60.906] filter-OUTPUT: IN= OUT=lo SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
 25 [60.906] IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 789
 26 [61.011] filter-INPUT : IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
 27 <Packet matches synchronized state again ...>
 28 [61.019] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
 29 [61.019] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
 30 <IPVS repeats the forwarding in a loop, machine stops responding>
 31 [61.030] filter-OUTPUT: IN= OUT=lo SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
 32 [61.041] IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 789
 33 [61.074] filter-INPUT : IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
 34 [61.083] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
 35 [61.084] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
 36 <etc, etc>

Note that the incoming packet is not fwmarked, and that the ipvs
lookup/in check does not try to match on fwmark.
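
The loop in the log above can be mimicked with a toy Python model.
It is only a sketch of the behaviour seen in the log, not of the
kernel code: conn_in_get and receive are made-up names, the table is
a plain dict, and max_passes exists only so the example terminates.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int
    fwmark: int = 0  # carried by the packet, but never consulted below


ConnKey = Tuple[str, str, int, str, int]


def conn_in_get(table: Dict[ConnKey, str], pkt: Packet) -> Optional[str]:
    """Toy stand-in for ip_vs_conn_in_get(): fwmark is not part of the key."""
    return table.get((pkt.proto, pkt.src, pkt.sport, pkt.dst, pkt.dport))


def receive(table: Dict[ConnKey, str], pkt: Packet, max_passes: int = 5) -> str:
    """Follow one packet on LVS B; max_passes only bounds the toy loop."""
    for passes in range(max_passes):
        if conn_in_get(table, pkt) is None:
            return f"delivered to local socket after {passes} ipvs pass(es)"
        # "hit": DR transmit towards the realserver, which is this host,
        # so the unchanged packet comes back in on lo and is looked up again.
    return f"still looping after {max_passes} ipvs passes"


pkt = Packet("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999)  # unmarked, via LVS A
synced = {("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999): "10.0.0.6"}

print(receive({}, pkt))      # delivered to local socket after 0 ipvs pass(es)
print(receive(synced, pkt))  # still looping after 5 ipvs passes
```

The only difference between the two calls is whether a synchronized
entry exists; the packet's fwmark plays no part in the lookup.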

***

[2]: Example of success
LVS B is master, balances to realserver B (itself).
The output below is from LVS B / realserver B kern.log:

Interesting lines: 13-16

  1 <Connection from client to VIP>
  2 [74.370] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 SYN
  3 [74.370] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
  4 [74.370] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
  5 [74.370] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
  6 [74.370] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK SYN
  7 [74.461] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK
  8 [74.471] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
  9 [74.471] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 10 <TCP handshake complete>
 11 <One line of data sent from client to VIP>
 12 [76.894] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
 13 <Packet does not match synchronized state (there is none)>
 14 [76.894] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 15 [76.894] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 16 <Packet exchange continues as normal>
 17 [76.894] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
 18 [77.062] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
 19 [77.062] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 20 [77.062] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 21 [77.062] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
 22 [77.300] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
 23 [77.309] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 24 [77.309] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 25 [77.320] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
 26 [77.402] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK PSH
 27 [77.439] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK
 28 [77.450] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 29 [77.450] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 30 [77.463] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK FIN
 31 [77.508] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK FIN
 32 [77.518] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 33 [77.518] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 34 [77.531] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
 35 <etc, etc>

***

[3]: LVS master / Realserver A:
1: lo: <LOOPBACK,UP,LOWER_UP>
     inet 127.0.0.1/8 scope host lo
     inet 10.0.0.10/32 scope global lo:10
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
     inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0

iptables -t mangle -A PREROUTING  -d 10.0.0.10 -p tcp -m tcp \
   --dport 9999 -m mac ! --mac-source <realserverB_mac> \
   -j MARK --set-mark 10

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.vs.sync_threshold = 2 10

***

[4]: LVS backup / Realserver B:
1: lo: <LOOPBACK,UP,LOWER_UP>
     inet 127.0.0.1/8 scope host lo
     inet 10.0.0.10/32 scope global lo:10
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
     inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0

iptables -t mangle -A PREROUTING  -d 10.0.0.10 -p tcp -m tcp \
   --dport 9999 -m mac ! --mac-source <realserverA_mac> \
   -j MARK --set-mark 10

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.vs.sync_threshold = 2 10
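
The intent of the mangle rule on each node can be written out as a
small Python decision function. This is only a restatement of the
iptables rule for illustration: fwmark_for is a made-up name, and the
MAC strings are placeholders like the ones used in the logs.

```python
VIP = "10.0.0.10"
PORT = 9999
MARK = 10


def fwmark_for(dst: str, proto: str, dport: int,
               src_mac: str, peer_mac: str) -> int:
    """Mark traffic to the vip unless the peer LVS forwarded it."""
    if dst == VIP and proto == "tcp" and dport == PORT and src_mac != peer_mac:
        return MARK
    return 0  # unmarked: should not match the fwmark 10 virtual service


# Seen from LVS B, whose peer is LVS A.
from_client = fwmark_for(VIP, "tcp", PORT, "client_mac", peer_mac="lvsA_mac")
from_peer = fwmark_for(VIP, "tcp", PORT, "lvsA_mac", peer_mac="lvsA_mac")
print(from_client)  # 10
print(from_peer)    # 0
```

Packets relayed by the other LVS stay unmarked so that the receiving
node treats them as realserver traffic rather than balancing them
again.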

***

[5]: LVS master keepalived.conf
global_defs {
         lvs_id testlvs
}
vrrp_sync_group test {
         group {
                 VI_1
         }
}
vrrp_instance VI_1 {
         state BACKUP
         interface eth0
         virtual_router_id 10
         priority 100
         advert_int 1
         notify_master /etc/keepalived/master.sh
         notify_backup /etc/keepalived/backup.sh
         notify_fault /etc/keepalived/fault.sh
         authentication {
                 auth_type pass
                 auth_pass hulahoop
         }
         virtual_ipaddress {
                 10.0.0.10
         }
         nopreempt
}
virtual_server fwmark 10 {
         lb_algo rr
         lb_kind DR
         persistence_timeout 0
         delay_loop 20
         protocol TCP
         real_server 10.0.0.5 9999 {
                 weight 1
                 TCP_CHECK
                 {
                         connect_timeout 20
                 }
         }
         real_server 10.0.0.6 9999 {
                 weight 1
                 TCP_CHECK
                 {
                         connect_timeout 20
                 }
         }
}
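
The notify scripts themselves are not included here. As a sketch of
what they might run, the Python snippet below builds the ipvsadm
command lines for each state. The function name, the choice of eth0
as multicast interface and the handling of the fault state are
assumptions made for this example; the snippet only prints the
commands and does not execute them.

```python
import shlex
from typing import List


def sync_daemon_commands(new_state: str, interface: str = "eth0") -> List[List[str]]:
    """Return the ipvsadm invocations for a keepalived state change."""
    stop_master = ["ipvsadm", "--stop-daemon", "master"]
    stop_backup = ["ipvsadm", "--stop-daemon", "backup"]
    if new_state == "master":
        start = ["ipvsadm", "--start-daemon", "master",
                 "--mcast-interface", interface]
        return [stop_backup, start]
    if new_state == "backup":
        start = ["ipvsadm", "--start-daemon", "backup",
                 "--mcast-interface", interface]
        return [stop_master, start]
    if new_state == "fault":
        # Assumption: on fault, stop whichever sync daemon is running.
        return [stop_master, stop_backup]
    raise ValueError(f"unknown state: {new_state}")


for argv in sync_daemon_commands("master"):
    print(shlex.join(argv))
```

Stopping the daemon of the previous role before starting the new one
keeps a node from running as sync master and sync backup at once.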

***

