Hi,
I am switching out our current Windows cluster running BIND9 for an
LVS/keepalived cluster with four real servers running PowerDNS, plus two
load balancers in an active/active setup. I have to switch about 200 IP
addresses, because most of our customers have their own virtual DNS
servers with a static IP. Up until now everything had been working fine;
I have been gradually switching IPs on a daily basis so I don't create
too much load on the servers at once. Today I switched over a few IPs,
and some of the IPs that I switched weeks ago stopped working properly.
DNS requests were taking about 4-5 seconds to answer, so Windows
nslookup was giving people SERVFAILs or timeouts. The load average on
both load balancers was stable at about 3-4.

I am using NAT for my real servers, with health checking through
keepalived. Are there any ARP issues I should address, or any maximum
connection limits I need to change? At the moment I have this:
lvs01 ~ # ipvsadm
IP Virtual Server version 1.2.1 (size=1048576)
Prot LocalAddress:Port Scheduler Flags
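
For reference, each virtual DNS IP is defined in keepalived roughly like
the block below (the addresses are placeholders, and I have trimmed it
to a single real server; the real config has four):

    virtual_server 192.0.2.53 53 {
        delay_loop 10
        lb_algo wlc
        lb_kind NAT
        protocol UDP

        real_server 10.0.0.11 53 {
            weight 1
            # health check: a small script that fires a test query
            # at the real server and exits non-zero on failure
            MISC_CHECK {
                misc_path "/usr/local/bin/check_dns.sh 10.0.0.11"
                misc_timeout 5
            }
        }
    }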
Am I perhaps overloading the servers with connections? I have 1 GB of
RAM, and I get about three million connections every day, mostly UDP
packets for DNS queries plus a few thousand TCP AXFR zone transfers.
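
One thing I have been wondering about: since I am using NAT, every UDP
query creates an IPVS connection entry that hangs around until the UDP
timeout expires, so at three million queries a day the connection table
could be holding a lot of stale entries. I was thinking of checking the
current timeouts and lowering the UDP one, roughly like this (the
30-second value is just my guess for a DNS workload, not something I
have tested):

    lvs01 ~ # ipvsadm -l --timeout
    (on a stock kernel this should report something like
     "Timeout (tcp tcpfin udp): 900 120 300")
    lvs01 ~ # ipvsadm --set 900 120 30

Would that be a sane thing to do here, or am I looking in the wrong
place?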
At the moment I am clueless about what to do. Every time I think the
system is running stable and I switch over a few of our power customers
with a few thousand domains, the system starts slowing down, the query
timeouts kick in, and I don't know how to address the problem. Perhaps
someone can push me in the right direction; at the moment it is all
really frustrating, especially when you have to explain to crowds of
ranting customers why their DNS servers are giving them timeouts. Sorry
for the long mail.
Regards,
Simon