To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster
From: John Donath <john.donath@xxxxx>
Date: Wed, 24 Oct 2007 21:30:03 +0200
Joseph Mack NA3T wrote:
> On Wed, 24 Oct 2007, John Donath wrote:
>
>   
>> Hi,
>>
>> I have set up a 2-node HA cluster based on the Streamline High
>> Availability and Load Balancing concept.
>>
>> The weird thing is that it works fine for tcp/80 but it doesn't
>> work properly for a UDP service like radius (udp/1812).
>>     
>
> There are conceptual problems loadbalancing UDP, as there is 
> no connection (see UDP in the HOWTO; there are solutions, but 
> all have problems). Also, do you understand the many 
> reader/single writer problem when loadbalancing databases?
>
>   
Yes, I do. This is not a problem as only read actions are involved.
>> Assume we have both the http and radius service down on the failover
>> director (grind12):
>>
>> [root@grind11 ~]# ipvsadm
>> IP Virtual Server version 1.2.0 (size=4096)
>> Prot LocalAddress:Port Scheduler Flags
>>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>> UDP  172.31.1.10:radius rr
>>   -> 172.31.1.11:radius           Local   1      0          0
>> TCP  172.31.1.10:http rr persistent 600
>>   -> 172.31.1.11:http             Local   1      0          0
>>     
>
> (not related to your problem) persistence has problems. You 
> could look at the -SH scheduler instead.
>   
I sure will.
>   
>> I now can access the webserver but I don't get any response from the
>> radius service.
>>     
>
> how can you access a service when the service is down?
>
>   
The service is down on the failover node but up on the primary.
> Is Radius listening on the VIP? (it should be, see writeup 
> for LocalNode)
>
>   
Radius is listening on 0.0.0.0.
>   
>> Here are results from tcpdump on both nodes when a radius request is
>> initiated:
>> [root@grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
>> 14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS,
>> Access Request (1), id: 0xdb length: 65
>> 14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS,
>> Access Accept (2), id: 0xdb length: 26
>>
>> As you will note the wrong source address is used !!
>> It's responding with the realnode IP instead of the VIP and that's
>> causing the problem.
>>     
>
> No idea. I assume that Radius is listening on x.x.x.11 
> (instead of x.x.x.10), in which case I can't imagine how 
> Radius is getting packets at all.
>
>   
>> I am puzzled why this problem does not exist when testing http (tcp/80),
>> as you can see from this:
>> 14:43:53.399206 IP 83.162.10.97.41143 > 172.31.1.10.http: F 553:553(0)
>> ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
>> 14:43:53.399224 IP 172.31.1.10.http > 83.162.10.97.41143: . ack 554 win
>> 1724 <nop,nop,timestamp 507325582 496389562>
>>
>> Might this be UDP related?
>>     
>
> possibly (since I have no idea what's wrong yet).
>
>   
>> [root@grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
>> ** nothing of course **
>>     
>
> I'm sorry, this went over my head. Why "of course"?
>   
"Of course" because I don't expect any packets on the failover node as 
the service is only up on the primary node.
So nothing will be forwarded ...
>
>   
>> If I reverse the situation - bringing down both services on the primary
>> director node (grind11) and starting them up on the failover director
>> (grind12) then both services are accessible.
>>     
>
> hmm. let's leave this till later.
>
>   
Just a remark - when the radius service is down on the primary but up on 
the failover node the radius service nicely responds to requests.
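For what it's worth, here is a minimal sketch of the difference I suspect
between a UDP socket bound to 0.0.0.0 and one bound to a specific address.
It is only an illustration, not how the radius daemon is actually written;
the function name reply_source and the loopback addresses are made up for
the example. A UDP socket bound to one address always replies from that
address, while a wildcard-bound socket leaves the source address of each
reply to the kernel's route lookup towards the client, which would explain
the realnode IP in the tcpdump above.

```python
import socket


def reply_source(bind_addr, dest_addr, timeout=2.0):
    """Send one datagram to a UDP server bound to bind_addr and return
    the source IP that the client sees on the reply.

    Hypothetical helper for illustration only.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 lets the kernel pick a free port, so no root is needed.
        server.bind((bind_addr, 0))
        port = server.getsockname()[1]
        server.settimeout(timeout)
        client.settimeout(timeout)

        # Client request, addressed to dest_addr (the VIP in the real setup).
        client.sendto(b"request", (dest_addr, port))

        # Server side: plain recvfrom/sendto, as a simple UDP daemon would do.
        # With a wildcard bind the kernel chooses the reply's source address;
        # with a specific bind the reply always leaves from bind_addr.
        _, peer = server.recvfrom(1024)
        server.sendto(b"reply", peer)

        _, source = client.recvfrom(1024)
        return source[0]
    finally:
        server.close()
        client.close()
```

Calling reply_source("127.0.0.1", "127.0.0.1") exercises the specific-bind
case on loopback. To reproduce the mismatch one would bind to "0.0.0.0" on
a host that holds both the VIP and the realnode address and send to the
VIP from another machine.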
> Joe
>
>   

Thanks for your quick response.

John

