Re: DNS problems solved

To: Simon Pearce <sp@xxxxxxxx>
Subject: Re: DNS problems solved
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Simon Horman <horms@xxxxxxxxxxxx>
Date: Tue, 8 May 2007 11:30:30 +0900
On Fri, Apr 06, 2007 at 01:12:39PM +0200, Simon Pearce wrote:
>  
> 
> Some of you on the list might remember my problem concerning our DNS cluster 
> last year.
> 
> 
> http://archive.linuxvirtualserver.org/html/lvs-users/2006-11/msg00278.html
> 
> 
> These problems (DNS timeouts) have continued throughout this year and
> I have been desperately trying to find the solution. I have been
> following the mailing list and stumbled over the problems Adrian Chapela
> was having with his DNS setup, which brought me to the solution:
> ipvsadm -L --timeout showed that the default timeout for UDP packets
> was set to 300 seconds, which is way too long. The load balancers were
> waiting for 5 minutes to time out a UDP packet, and I get about 1500
> queries a second. I changed the setting to 15 seconds last week and
> moved some of our old Windows/BIND DNS servers to the new Linux DNS
> cluster. Before I changed the timeout settings I always received a
> call from our customers within two hours saying "your DNS services
> are not responding correctly". The IPs that refused to answer would
> always change (I have 254 IPs). Some of the large German dialup
> providers would refuse to talk to us, which resulted in domains not
> being reachable. Our DNS cluster is authoritative for about 250000
> domains, so you can imagine how many complaints I received. I was about
> to give up and scrap keepalived; I am so glad I did not. Changing the
> timeout value solved my problems and I am a happy man at the moment.
> Is there a way to set the timeout value permanently so it is saved after
> a reboot of the server? One last thing I would like to say is a big
> thank you to Graeme Fowler, Horms, Adrian Chapela and Alexandre Cassen
> for writing this great piece of software, and to anyone else on the list
> who maybe contributed to help me finally find the solution. Thank you
> guys, you do a great job on the mailing list.

Hi Simon,

glad to hear that you got to the bottom of your problem.

I am a little concerned about the idea of reducing UDP timeouts
significantly because, to be quite frank, UDP load-balancing is a bit of
a hack. The problem lies in the connectionless nature of the protocol,
so naturally LVS has a devil of a time tracking UDP "connections" - that
is, a series of datagrams between a client and server that are really
part of what would be a connection if TCP were being used.

As UDP doesn't really have any state, all LVS can do to identify
such "connections" is to set up affinity based on the source and
destination IP and port tuples. If my memory serves me correctly,
DNS traffic quite often originates from port 53, and so if you are
getting lots of requests from the same DNS server then this affinity
heuristic breaks down: every request shares one tuple and so looks
like part of the same "connection".
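
As a rough illustration of the tuple problem, here is a small sketch.
The sample datagrams are made up (documentation address ranges, not
from any real setup), and the function name count_tuples is only for
this example. It groups datagrams by source and destination IP and
port, which is roughly the only key LVS has to work with for UDP.

```shell
#!/bin/sh
# Hypothetical sample: one line per datagram, "src_ip src_port dst_ip dst_port".
# The first three come from a resolver that always sends from port 53;
# the last two come from a resolver that uses a fresh source port each time.
datagrams='192.0.2.10 53 198.51.100.1 53
192.0.2.10 53 198.51.100.1 53
192.0.2.10 53 198.51.100.1 53
192.0.2.20 40001 198.51.100.1 53
192.0.2.20 40002 198.51.100.1 53'

# Count datagrams per tuple. Each distinct tuple is one "connection"
# as far as tuple-based affinity is concerned.
count_tuples() {
    printf '%s\n' "$datagrams" |
        awk '{ n[$1 ":" $2 " -> " $3 ":" $4]++ }
             END { for (k in n) print n[k], k }' |
        sort -rn
}

count_tuples
```

The three queries sent from port 53 collapse into a single entry, so
they would all be treated as one "connection" for as long as that
entry lives, while the other resolver produces a new entry per query.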

The trouble is that if the timeout is significantly reduced, the
probability of it breaking down the other way - in the case where
that affinity is correct - increases.

I'm not saying that you don't have a good case. Nor am I saying that
changing the default timeout is off-limits. Just that what exactly is a
good default timeout is a tricky question, because what works well in
some cases will not work well in others, and vice versa.
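
For reference, a minimal sketch of inspecting and changing the
timeouts with ipvsadm. The 15 second value is simply the one Simon
mentioned, not a recommendation. As far as I know ipvsadm does not
store these values itself, so re-running the --set line from a boot
script (for example /etc/rc.local, which is an assumption about the
local setup) after the ip_vs module is loaded is one way to keep them
across a reboot.

```shell
# Show the current timeouts (tcp, tcpfin, udp), in seconds.
ipvsadm -L --timeout

# Set the UDP timeout to 15 seconds. --set always takes three values
# (tcp tcpfin udp); a value of 0 leaves that entry unchanged.
ipvsadm --set 0 0 15
```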

To some extent I wonder if the userspace tools should have the smarts to
change the timeout if port 53 (DNS) is in use. Though that may be an
even worse heuristic.

I wonder if a better idea might be the one-packet scheduling patches
by Julian. Much to my surprise these aren't merged. Perhaps that's my
fault. I should look into it...

http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00214.html

I also wonder, if the problem relates to connection entries for servers
that have been quiesced, whether setting expire_quiescent_template helps:

echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template
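
The same setting can also be read and written with sysctl. This is
only a sketch: it assumes the ip_vs module is already loaded, since
the net.ipv4.vs entries do not exist before that.

```shell
# Check the current value (0 = off, 1 = on).
sysctl net.ipv4.vs.expire_quiescent_template

# Turn it on for the running kernel.
sysctl -w net.ipv4.vs.expire_quiescent_template=1
```

Adding "net.ipv4.vs.expire_quiescent_template = 1" to /etc/sysctl.conf
should make it stick across reboots, provided ip_vs is loaded before
the sysctl settings are applied at boot.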

Sorry if those ideas have been canvassed before, I only briefly
refreshed my memory of the original thread.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

