To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster
From: John Donath <john.donath@xxxxx>
Date: Wed, 24 Oct 2007 14:46:39 +0200
Hi,

I have set up a 2-node HA cluster based on the Streamline High 
Availability and Load Balancing concept.

The weird thing is that it works perfectly for tcp/80, but it doesn't 
work properly for a UDP service like RADIUS (udp/1812).

-------------------
Problem description
-------------------

Assume we have both the http and radius services down on the failover 
director (grind12):

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
   -> 172.31.1.11:radius           Local   1      0          0
TCP  172.31.1.10:http rr persistent 600
   -> 172.31.1.11:http             Local   1      0          0

I can now access the web server, but I don't get any response from the 
RADIUS service.

Here are the tcpdump results from both nodes when a RADIUS request is 
initiated:
[root@grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0xdb length: 65
14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS, Access Accept (2), id: 0xdb length: 26

As you can see, the wrong source address is used!
The reply comes from the real server IP (172.31.1.11) instead of the 
VIP (172.31.1.10), and that is causing the problem.
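To spot this in a longer capture, the replies can be filtered mechanically. The following is a hypothetical Python helper (the name `replies_not_from_vip` and the regex are illustrative, not part of tcpdump or ldirectord); it assumes IPv4 addresses printed numerically with `-n` and flags any packet sent to the client whose source address is not the VIP.

```python
import re

# Matches the "IP <src>.<port> > <dst>.<port>:" part of a tcpdump line.
# Assumes IPv4 addresses printed numerically (tcpdump -n); ports may still
# appear as service names such as "radius" or "http", as in the captures above.
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\S+) > (\d+\.\d+\.\d+\.\d+)\.(\S+?):"
)


def replies_not_from_vip(lines, vip, client):
    """Return (source, destination) for packets sent to `client` whose source is not `vip`."""
    offenders = []
    for line in lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # not a packet line (e.g. a wrapped continuation line)
        src_ip, _src_port, dst_ip, _dst_port = match.groups()
        # Only traffic heading back to the client counts as a reply here.
        if dst_ip == client and src_ip != vip:
            offenders.append((src_ip, dst_ip))
    return offenders


capture = [
    "14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS, "
    "Access Request (1), id: 0xdb length: 65",
    "14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS, "
    "Access Accept (2), id: 0xdb length: 26",
]
print(replies_not_from_vip(capture, vip="172.31.1.10", client="83.162.10.97"))
# prints [('172.31.1.11', '83.162.10.97')]
```

Run against the grind12 capture from the working case, where the reply leaves from 172.31.1.10, the same call returns an empty list.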

I am puzzled that this problem does not occur with http (tcp/80), 
as you can see here:
14:43:53.399206 IP 83.162.10.97.41143 > 172.31.1.10.http: F 553:553(0) ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
14:43:53.399224 IP 172.31.1.10.http > 83.162.10.97.41143: . ack 554 win 1724 <nop,nop,timestamp 507325582 496389562>

Might this be UDP-related?

[root@grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
** nothing of course **

If I reverse the situation - bringing down both services on the primary 
director node (grind11) and starting them up on the failover director 
(grind12) then both services are accessible.

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
   -> 172.31.1.12:radius           Route   1      0          0
TCP  172.31.1.10:http rr persistent 600
   -> 172.31.1.12:http             Route   1      0          0

[root@grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
11:28:18.604803 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0x88 length: 65
11:28:18.604915 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0x88 length: 65

[root@grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
11:28:22.517935 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0x88 length: 65
11:28:22.522124 IP 172.31.1.10.radius > 83.162.10.97.32841: RADIUS, Access Accept (2), id: 0x88 length: 26
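The two ipvsadm listings differ in which real server is active and in its forwarding method: Local when the service runs on the director itself (grind11), Route (direct routing) when it runs on grind12. UDP fails only in the Local case. A small, hypothetical Python helper (`parse_ipvsadm` is illustrative, not an ipvsadm feature) that reads the default `ipvsadm` listing makes such comparisons explicit; it assumes the column layout shown above.

```python
def parse_ipvsadm(text):
    """Map (protocol, virtual address) to a list of (real server, forwarding method)."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "FWM"):
            # Virtual service line, e.g. "UDP  172.31.1.10:radius rr"
            current = (fields[0], fields[1])
            services[current] = []
        elif fields[0] == "->" and current is not None:
            # Real server line, e.g. "-> 172.31.1.11:radius  Local  1  0  0".
            # The column header also starts with "->", but it comes before any
            # virtual service, so `current` is still None there and it is skipped.
            services[current].append((fields[1], fields[2]))
    return services


failing = """\
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.11:radius           Local   1      0          0
TCP  172.31.1.10:http rr persistent 600
  -> 172.31.1.11:http             Local   1      0          0
"""

working = """\
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.12:radius           Route   1      0          0
TCP  172.31.1.10:http rr persistent 600
  -> 172.31.1.12:http             Route   1      0          0
"""

for label, table in (("failing", failing), ("working", working)):
    for (proto, vip), reals in parse_ipvsadm(table).items():
        print(label, proto, vip, reals)
```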

I have tried everything I can think of and I am getting a little desperate 
now... :-(

Do you gurus have any clue?

------------------------------------
Configuration and topology
------------------------------------

ha.cf
-----
logfacility   local0
debug         0
keepalive     2
deadtime      10
warntime      5
initdead      120
udpport       694
ucast         eth1 172.31.1.12
ucast         eth3 10.0.0.2
auto_failback on
node          grind11.graddelt.com
node          grind12.graddelt.com
respawn hacluster /usr/lib/heartbeat/ipfail
crm           off

haresources
-----------
grind11.graddelt.com    \
         ldirectord::ldirectord.cf \
         LVSSyncDaemonSwap::master \
         IPaddr2::172.31.1.10/24/eth1/172.31.1.255

/etc/ha.d/ldirectord.cf
-----------------------
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=172.31.1.10:1812
         fallback=127.0.0.1:1812
         real=172.31.1.11:1812 gate
         real=172.31.1.12:1812 gate
         service=radius
         scheduler=rr
         #persistent=600
         protocol=udp
         checktype=negotiate
         login="ldtest@xxxxx"
         passwd="ScdCz32v"
         secret="ldtest123"

virtual=172.31.1.10:80
         fallback=127.0.0.1:80
         real=172.31.1.11:80 gate
         real=172.31.1.12:80 gate
         service=http
         scheduler=rr
         persistent=600
         protocol=tcp
         checktype=negotiate
         request="ldtest.html"
         receive="ALIVE"

sysctl
------

[root@grind11 ~]# sysctl -a | egrep "(forward|arp)"
net.ipv4.conf.eth3.arp_ignore = 1
net.ipv4.conf.eth3.arp_announce = 2
net.ipv4.conf.eth3.arp_filter = 0
net.ipv4.conf.eth3.proxy_arp = 0
net.ipv4.conf.eth3.mc_forwarding = 0
net.ipv4.conf.eth3.forwarding = 1
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.eth1.arp_filter = 0
net.ipv4.conf.eth1.proxy_arp = 0
net.ipv4.conf.eth1.mc_forwarding = 0
net.ipv4.conf.eth1.forwarding = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.ip_forward = 1
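For LVS-DR, the commonly recommended ARP settings are arp_ignore=1 and arp_announce=2, and the listing above shows those values on eth0, eth1, eth3 and all. For both settings the Linux kernel applies the larger of conf/all and conf/<iface>, so the 0 values on lo are overridden by all. The following hypothetical Python helpers (`arp_settings` and `effective` are illustrative names) read `sysctl -a` output and report the effective value per interface; they assume interface names contain no dots.

```python
def arp_settings(sysctl_output):
    """Collect arp_ignore/arp_announce per interface from `sysctl -a` output."""
    settings = {}
    for line in sysctl_output.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        # Keys look like net.ipv4.conf.<iface>.<setting>; assumes no dots in iface names.
        if (len(parts) == 5 and parts[:3] == ["net", "ipv4", "conf"]
                and parts[4] in ("arp_ignore", "arp_announce")):
            settings.setdefault(parts[3], {})[parts[4]] = int(value)
    return settings


def effective(settings, iface, name):
    """The kernel applies the larger of conf/all and conf/<iface> for both settings."""
    return max(settings.get("all", {}).get(name, 0),
               settings.get(iface, {}).get(name, 0))


sample = """\
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
"""
s = arp_settings(sample)
for iface in ("eth1", "lo"):
    print(iface, effective(s, iface, "arp_ignore"), effective(s, iface, "arp_announce"))
```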

