To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>, keepalived-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] KeepAlived + LVS NAT + UDP DNS + Multiple Ext VIPS = All of a sudden, responses go out on wrong VIP
From: Tom <tom@xxxxxxxx>
Date: Mon, 09 Jan 2012 14:53:36 +0000
Hi Guys,

I'm sending this to both the LVS and Keepalived mailing lists, as both technologies are involved here and I'm not sure where the failure might be. It's conceivable that it should go to the netfilter list too, but perhaps you can advise me on that.

The basic gist of the problem is that my DNS cluster has multiple external VIPs, made highly available using keepalived. DNS requests are load balanced across 6 back-end real servers. In most cases, DNS requests come in on one of three external VIPs (88.88.192.250, 88.88.193.250 and 88.88.192.254), and the responses go back out from the same IP. I'm not sure whether LVS NAT or iptables NAT takes care of that side of things; to me it's just magic.

A few times now, big external providers have come to us saying that no requests to our nameservers were working. Looking into it, it appears we were replying from the wrong VIPs (i.e. not the one that the request came in on). If you have a look at the attached tcpdump.txt, you'll see one provider (Virgin Media) constantly getting responses from the wrong IPs. This doesn't go on forever, but it does go badly wrong for an extended period. Unfortunately, the tcpdump doesn't cover the traffic to the real servers. I'm waiting for this to happen again so I can capture a dump from the LB to the real servers in parallel to compare (although I'm praying that it doesn't happen again at the same time..)
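
For anyone who wants to check a capture for the same symptom, here is a minimal sketch in Python. It assumes the usual `tcpdump -n` text format for UDP DNS (the sample lines in the comments are made up, not taken from tcpdump.txt), and the function name `find_wrong_vip_replies` is purely illustrative. It pairs each query with its reply by client address, client port and DNS ID, then reports replies whose source address differs from the VIP that was queried.

```python
import re

# Assumed tcpdump -n line format (hypothetical sample, not from the real capture):
#   14:53:36.000001 IP 203.0.113.7.40000 > 88.88.192.250.53: 4660+ A? example.com. (29)
#   14:53:36.000900 IP 88.88.192.254.53 > 203.0.113.7.40000: 4660 1/0/0 A 192.0.2.1 (45)
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): (\d+)"
)

# The three external VIPs named in this post.
VIPS = {"88.88.192.250", "88.88.193.250", "88.88.192.254"}


def find_wrong_vip_replies(lines, vips=VIPS):
    """Return (client, port, dns_id, queried_vip, reply_src) for mismatched replies."""
    pending = {}  # (client_ip, client_port, dns_id) -> VIP the query was sent to
    mismatches = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a line in the assumed format
        src, sport, dst, dport, dns_id = m.groups()
        if dport == "53" and dst in vips:
            # Query towards one of the VIPs: remember which VIP was asked.
            pending[(src, sport, dns_id)] = dst
        elif sport == "53":
            # Reply: look up the matching query, if we saw one.
            queried = pending.pop((dst, dport, dns_id), None)
            if queried is not None and queried != src:
                mismatches.append((dst, dport, dns_id, queried, src))
    return mismatches
```

Feeding it the lines of a capture (for example `find_wrong_vip_replies(open("tcpdump.txt"))`) would list each reply that left from a different address than the one queried. Replies with no matching query in the capture are ignored.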

Could anyone advise me why the NAT/connection tracking has been failing in these cases?

Please see my keepalived.conf attached (some information has been replaced).

*Here are my IP addresses:*

[root@dns-lb-02 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
    link/ether 00:22:19:57:97:ca brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
    link/ether 00:22:19:57:97:ca brd ff:ff:ff:ff:ff:ff
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:22:19:57:97:ca brd ff:ff:ff:ff:ff:ff
    inet6 fe80::222:19ff:fe57:97ca/64 scope link
       valid_lft forever preferred_lft forever
8: bond0.192@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:22:19:57:97:ca brd ff:ff:ff:ff:ff:ff
    inet 88.88.192.36/23 brd 88.88.193.255 scope global bond0.192
    inet 88.88.192.250/23 scope global secondary bond0.192
    inet 88.88.193.250/23 scope global secondary bond0.192
    inet 88.88.192.254/23 scope global secondary bond0.192
    inet 88.88.192.37/23 scope global secondary bond0.192
    inet6 fe80::222:19ff:fe57:97ca/64 scope link
       valid_lft forever preferred_lft forever
9: bond0.81@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:22:19:57:97:ca brd ff:ff:ff:ff:ff:ff
    inet 10.44.81.110/24 brd 10.44.81.255 scope global bond0.81
    inet 10.44.81.108/24 scope global secondary bond0.81
    inet6 fe80::222:19ff:fe57:97ca/64 scope link
       valid_lft forever preferred_lft forever

*Routing table:*

[root@dns-lb-02 ~]# ip ro s
10.44.81.0/24 dev bond0.81  proto kernel  scope link  src 10.44.81.110
88.88.192.0/23 dev bond0.192  proto kernel  scope link  src 88.88.192.36
169.254.0.0/16 dev bond0.81  scope link
10.216.0.0/16 via 10.44.81.1 dev bond0.81
10.44.0.0/16 via 10.44.81.1 dev bond0.81
default via 88.88.192.1 dev bond0.192


*This is my iptables nat table:*

[root@dns-lb-02 ~]# iptables -t nat -nL -v
Chain PREROUTING (policy ACCEPT 1032M packets, 82G bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 23M packets, 1372M bytes)
pkts bytes target prot opt in out source destination
7378 592K MASQUERADE all -- * bond0.192 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT 23M packets, 1372M bytes)
pkts bytes target prot opt in out source destination


*Here is my IPVS table:*

[root@dns-lb-02 ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  88.88.192.37:53 lc persistent 600
  -> 10.44.81.151:53              Masq    10     0          0
  -> 10.44.81.153:53              Masq    10     0          0
  -> 10.44.81.152:53              Masq    10     0          0
  -> 10.44.81.154:53              Masq    10     0          0
  -> 10.44.81.155:53              Masq    10     0          0
  -> 10.44.81.150:53              Masq    10     0          0
TCP  88.88.192.37:53 lc persistent 600
  -> 10.44.81.155:53              Masq    10     0          0
  -> 10.44.81.151:53              Masq    10     0          0
  -> 10.44.81.153:53              Masq    10     0          0
  -> 10.44.81.152:53              Masq    10     0          0
  -> 10.44.81.154:53              Masq    10     0          0
  -> 10.44.81.150:53              Masq    10     0          0
TCP  88.88.193.250:53 lc persistent 600
  -> 10.44.81.155:53              Masq    10     0          10
  -> 10.44.81.151:53              Masq    10     0          6
  -> 10.44.81.153:53              Masq    10     0          5
  -> 10.44.81.152:53              Masq    10     0          6
  -> 10.44.81.154:53              Masq    10     0          4
  -> 10.44.81.150:53              Masq    10     0          5
TCP  88.88.192.250:53 lc persistent 600
  -> 10.44.81.155:53              Masq    10     0          5
  -> 10.44.81.151:53              Masq    10     0          7
  -> 10.44.81.152:53              Masq    10     0          6
  -> 10.44.81.153:53              Masq    10     0          6
  -> 10.44.81.154:53              Masq    10     0          6
  -> 10.44.81.150:53              Masq    10     0          5
TCP  88.88.192.254:53 lc persistent 600
  -> 10.44.81.155:53              Masq    10     0          7
  -> 10.44.81.151:53              Masq    10     1          5
  -> 10.44.81.153:53              Masq    10     0          6
  -> 10.44.81.152:53              Masq    10     0          5
  -> 10.44.81.154:53              Masq    10     0          5
  -> 10.44.81.150:53              Masq    10     0          5
UDP  88.88.192.254:53 lc persistent 600
  -> 10.44.81.151:53              Masq    10     0          23976
  -> 10.44.81.155:53              Masq    10     0          23961
  -> 10.44.81.150:53              Masq    10     0          23966
  -> 10.44.81.153:53              Masq    10     0          23969
  -> 10.44.81.154:53              Masq    10     0          23969
  -> 10.44.81.152:53              Masq    10     0          23985
UDP  88.88.193.250:53 lc persistent 600
  -> 10.44.81.151:53              Masq    10     0          49915
  -> 10.44.81.155:53              Masq    10     0          49916
  -> 10.44.81.153:53              Masq    10     0          50559
  -> 10.44.81.154:53              Masq    10     0          49982
  -> 10.44.81.152:53              Masq    10     0          50210
  -> 10.44.81.150:53              Masq    10     0          49945
UDP  88.88.192.250:53 lc persistent 600
  -> 10.44.81.151:53              Masq    10     0          48668
  -> 10.44.81.152:53              Masq    10     0          48668
  -> 10.44.81.154:53              Masq    10     0          48686
  -> 10.44.81.155:53              Masq    10     0          48650
  -> 10.44.81.153:53              Masq    10     0          49025
  -> 10.44.81.150:53              Masq    10     0          48655
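
To compare the inactive connection counts per VIP without adding them up by hand, a rough Python sketch like the one below can total the InActConn column for each virtual service. It assumes the column layout shown in the `ipvsadm -Ln` output above, and the function name `inactive_per_service` is just illustrative.

```python
def inactive_per_service(ipvsadm_output):
    """Sum the InActConn column per (protocol, VIP:port) from `ipvsadm -Ln` text."""
    totals = {}
    current = None  # virtual service the following "->" lines belong to
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP"):
            # Virtual service line, e.g. "UDP  88.88.192.254:53 lc persistent 600"
            current = (fields[0], fields[1])
            totals[current] = 0
        elif fields[0] == "->" and current is not None and fields[-1].isdigit():
            # Real server line; the last column is InActConn.
            totals[current] += int(fields[-1])
    return totals
```

Calling it on the saved output returns a dictionary keyed by protocol and VIP, which makes it easier to see how the UDP entries are spread across the three external VIPs.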

*This is the IP/route information from a real server:*

[root@dns-be-01 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:22:19:57:97:cf brd ff:ff:ff:ff:ff:ff
    inet 10.44.81.150/24 brd 10.44.81.255 scope global eth0
    inet6 fe80::222:19ff:fe57:97cf/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:22:19:57:97:d1 brd ff:ff:ff:ff:ff:ff
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
[root@dns-be-01 ~]# ip ro s
10.44.81.0/24 dev eth0  proto kernel  scope link  src 10.44.81.150
10.216.0.0/16 via 10.44.81.1 dev eth0
10.44.0.0/16 via 10.44.81.1 dev eth0
default via 10.44.81.108 dev eth0

Attachment: keepalived.conf
Description: Text document

Attachment: tcpdump.txt
Description: Text document

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users