Re: DNS Server Cluster

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: DNS Server Cluster
Cc: Horms <horms@xxxxxxxxxxxx>
Cc: Roberto Nibali <ratz@xxxxxxxxxxxx>
From: Joseph Mack NA3T <jmack@xxxxxxxx>
Date: Mon, 27 Nov 2006 10:48:17 -0800 (PST)
On Mon, 27 Nov 2006, Simon Pearce wrote:

I have a total of about 250 IP addresses to migrate, and here's where the problems start. Every time the DNS cluster exceeds a certain limit, some of the IP addresses stop working properly.

From Wayne's posting it's possible that this may not work
with our setup, but since I don't know why, I'll just forge ahead anyhow.

Ted Pavlic, back in the early days, had a director with 1024 IPs, so it's not the large number of IPs, at least for TCP.

There was a posting (in the last month, I'd guess) where someone's UDP balancing was not working properly and the suggested solution was Julian's UDP single packet scheduler patch. I forget their symptoms, but they weren't yours. Still, there may be problems with UDP that we haven't found because no one is stressing UDP balancing very hard.

It affects the system in such a way
that for certain domains you get a timeout when querying the cluster.
Some of the transferred IPs

Transferred IPs? These are just the VIPs that you have running on the LVS cluster, nothing special, just VIPs?

seem to stop working or slow down to such an
extent that other DNS servers stop querying us.

Do you know which IPs these are? Is there anything strange in the output of ipvsadm, or of netstat on the realservers, for these IPs?
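
One way to find out which VIPs are misbehaving is to send the same UDP query to each one and note which time out. Here's a minimal sketch, using only the Python standard library. The VIP list and the test name (example.com) are placeholders, not anything from your setup, and the function names are my own.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)

def probe(vip, name, timeout=2.0, port=53):
    """Return True if the VIP answers a UDP query within the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (vip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP errors
            return False
    # A valid reply echoes the query ID in its first two bytes
    return reply[:2] == query[:2]

if __name__ == "__main__":
    # Placeholder VIPs; substitute the addresses on your director.
    vips = ["192.0.2.10", "192.0.2.11"]
    for vip in vips:
        status = "ok" if probe(vip, "example.com") else "TIMEOUT"
        print(vip, status)
```

Running this from a client outside the cluster, once under light load and once when the problem appears, should show whether it's always the same VIPs that fail.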

I am also using iptables on the two load balancers with a conntrack table because the real servers have private IP addresses and I can't update them otherwise.

I don't know the connection between conntrack and private IPs. Want to enlighten me?

I checked the logs but I can't find any indication that the conntrack table is full. But I read on the LVS list that the conntrack table is not needed for LVS-NAT and can slow the system down. I am, however, not sure about this.

Can you do a test with conntrack off?
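
Before that test, it may be worth watching how full the conntrack table gets under load. Here's a rough sketch that reads the current entry count and the limit from /proc. The paths are assumptions: newer kernels use the nf_conntrack names and older 2.6 kernels the ip_conntrack ones, and your kernel may have neither.

```python
from pathlib import Path

# Assumed locations of the conntrack counters; adjust for your kernel.
CANDIDATES = [
    ("/proc/sys/net/netfilter/nf_conntrack_count",
     "/proc/sys/net/netfilter/nf_conntrack_max"),
    ("/proc/sys/net/ipv4/netfilter/ip_conntrack_count",
     "/proc/sys/net/ipv4/netfilter/ip_conntrack_max"),
]

def conntrack_usage(candidates=CANDIDATES):
    """Return (count, limit) from the first readable pair, else None."""
    for count_path, max_path in candidates:
        try:
            count = int(Path(count_path).read_text().strip())
            limit = int(Path(max_path).read_text().strip())
        except (OSError, ValueError):
            continue  # pair missing or unreadable, try the next one
        return count, limit
    return None

if __name__ == "__main__":
    usage = conntrack_usage()
    if usage is None:
        print("no conntrack counters found (module not loaded?)")
    else:
        count, limit = usage
        print(f"conntrack entries: {count} of {limit}")
```

If the count sits near the limit when the timeouts start, that points at conntrack rather than at ipvs.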

Is there anything else someone could think of that I might have done wrong? The unusual thing is that the cluster seems to work fine until the load exceeds the limit I mentioned earlier, which I can't really define in words.

Is the problem load or the number of IPs (if you can tell)?

There is another problem with failover of large numbers of IPs, just in case you want to read more on the topic (it may not be related to your problem).

Can you set up ipvsadm with a single fwmark instead of all the IPs? That would shift the responsibility for handling all the IPs to iptables, rather than ipvsadm.
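
To make the fwmark idea concrete, here's a sketch that only prints the commands for review and runs nothing. The mark value (1), the rr scheduler, the DNS port and the addresses are all placeholders I've picked for illustration; -m assumes you stay with LVS-NAT.

```python
def fwmark_commands(vips, realservers, mark=1, port=53, scheduler="rr"):
    """Return iptables/ipvsadm command lines for one fwmark virtual service."""
    cmds = []
    # Mark DNS traffic (UDP and TCP) for every VIP with the same fwmark.
    for vip in vips:
        for proto in ("udp", "tcp"):
            cmds.append(
                f"iptables -t mangle -A PREROUTING -d {vip} -p {proto} "
                f"--dport {port} -j MARK --set-mark {mark}"
            )
    # One virtual service keyed on the mark replaces the per-VIP services.
    cmds.append(f"ipvsadm -A -f {mark} -s {scheduler}")
    # Add each realserver to that service with masquerading (LVS-NAT).
    for rip in realservers:
        cmds.append(f"ipvsadm -a -f {mark} -r {rip} -m")
    return cmds

if __name__ == "__main__":
    # Placeholder addresses; substitute your VIPs and realserver IPs.
    vips = ["192.0.2.10", "192.0.2.11"]
    realservers = ["10.0.0.2", "10.0.0.3"]
    print("\n".join(fwmark_commands(vips, realservers)))
```

If the VIPs sit in one contiguous block, a single marking rule with -d network/mask would do instead of one rule per VIP.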

Do you have a large iptables rule set that might be slowing things down? iptables scales with O(n^2); still, 250 IPs doesn't seem like a lot.

Are we having UDP problems here?


Joseph Mack NA3T
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at
Homepage It's GNU/Linux!

