To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster
From: Joseph Mack NA3T <jmack@xxxxxxxx>
Date: Wed, 24 Oct 2007 07:20:56 -0700 (PDT)
On Wed, 24 Oct 2007, John Donath wrote:

> Hi,
> I have set up a 2 node HA cluster based on the Streamline High
> availability and Load Balancing concept.
> The weird thing is that it works fantastic for tcp/80 but it doesn't
> work properly for a udp service like radius (udp/1812).

There are conceptual problems loadbalancing UDP, as there is 
no connection (see UDP in the HOWTO; there are solutions, but 
all have problems). Also, do you understand the 
many-reader/single-writer problem when loadbalancing databases?
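
To illustrate what "no connection" means for a director: with UDP 
there is no SYN/FIN to mark the start and end of a session, so 
the usual workaround is to treat packets from the same client 
address and port as one flow until an idle timeout expires. Here 
is a toy sketch of that idea. The class name, the timeout value 
and the realserver names are all made up for illustration; this 
is not how IPVS is implemented.

```python
import itertools


class UdpFlowTable:
    """Toy director table: maps (client_ip, client_port) to a realserver.

    Hypothetical illustration only. Entries expire after `timeout`
    seconds of inactivity because UDP has no teardown packet.
    """

    def __init__(self, realservers, timeout=300.0):
        self._rr = itertools.cycle(realservers)  # plain round-robin
        self._timeout = timeout                  # arbitrary value
        self._flows = {}  # (ip, port) -> (realserver, last_seen)

    def schedule(self, client, now):
        entry = self._flows.get(client)
        if entry is not None and now - entry[1] <= self._timeout:
            server = entry[0]        # flow still "alive": reuse realserver
        else:
            server = next(self._rr)  # new or expired flow: pick the next one
        self._flows[client] = (server, now)
        return server
```

The weak point is the timeout: too short and one conversation is 
split across realservers, too long and the table fills with dead 
entries.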

> Assume we have both the http and radius service down on the failover
> director (grind12):
> [root@grind11 ~]# ipvsadm
> IP Virtual Server version 1.2.0 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> UDP rr
>   ->           Local   1      0          0
> TCP rr persistent 600
>   ->             Local   1      0          0

(not related to your problem) persistence has problems. You 
could look at the sh (source hashing) scheduler instead 
(ipvsadm -s sh).
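
The idea behind source hashing, roughly: the realserver is picked 
from the client IP alone, so the same client always lands on the 
same realserver without the director keeping a persistence 
template. A minimal sketch follows. The function name and the 
modulo hash are my own simplification; the real sh scheduler uses 
its own hash function and table.

```python
import ipaddress


def pick_realserver(client_ip, realservers):
    """Map a client IP to a realserver deterministically (illustrative only)."""
    key = int(ipaddress.ip_address(client_ip))  # IP address as an integer
    return realservers[key % len(realservers)]  # simplistic stand-in hash
```

Note that with this naive modulo the mapping changes for most 
clients whenever the number of realservers changes.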

> I now can access the webserver but I don't get any response from the
> radius service.

how can you access a service when the service is down?

Is Radius listening on the VIP? (it should be, see writeup 
for LocalNode)
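
Why the listening address matters for UDP: a UDP socket bound to 
one specific address sends its replies from that address, whereas 
a socket bound to the wildcard address leaves the choice of source 
address to the kernel's routing. The sketch below shows the first 
case only. The function name is hypothetical and the loopback 
address stands in for the VIP; whether this is what is happening 
with your Radius daemon is an assumption on my part, not a 
diagnosis.

```python
import socket


def reply_source(bind_addr, dest_addr):
    """Return the source IP seen by a client when a UDP server bound to
    bind_addr answers a datagram sent to dest_addr (illustrative helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.settimeout(2.0)
        client.settimeout(2.0)
        server.bind((bind_addr, 0))          # port 0: let the OS pick a free port
        port = server.getsockname()[1]
        client.sendto(b"request", (dest_addr, port))
        _, peer = server.recvfrom(1024)      # peer = client's address and port
        server.sendto(b"reply", peer)
        _, (src_ip, _) = client.recvfrom(1024)
        return src_ip                        # address the reply came from
    finally:
        server.close()
        client.close()
```

For example, reply_source("127.0.0.1", "127.0.0.1") reports the 
bound address as the reply's source.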

> Here are results from tcpdump on both nodes when a radius request is
> initiated:
> [root@grind11 ~]# tcpdump -ni any -p udp and host
> 14:41:10.069858 IP > RADIUS,
> Access Request (1), id: 0xdb length: 65
> 14:41:10.069891 IP > RADIUS,
> Access Accept (2), id: 0xdb length: 26
> As you will note the wrong source address is used !!
> It's responding with the realnode IP instead of the VIP and that's
> causing the problem.

No idea. I assume that Radius is listening on x.x.x.11 
(instead of x.x.x.10), in which case I can't imagine how 
Radius is getting packets at all.

> I am puzzled why this problem does not exist when testing http (tcp/80)
> as you can see from this:
> 14:43:53.399206 IP > F 553:553(0)
> ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
> 14:43:53.399224 IP > . ack 554 win
> 1724 <nop,nop,timestamp 507325582 496389562>
> Might this be UDP related?

possibly (since I have no idea what's wrong yet).

> [root@grind12 ~]# tcpdump -ni any -p udp and host
> ** nothing of course **

I'm sorry, this went over my head. Why "of course"?

> If I reverse the situation - bringing down both services on the primary
> director node (grind11) and starting them up on the failover director
> (grind12) then both services are accessible.

hmm. let's leave this till later.


Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at
Homepage It's GNU/Linux!