Today, LVS on our director stopped forwarding DNS requests to the
realservers, for no reason I can find. On each realserver I have 2 bind
processes running, one in recursive-only mode listening on port 53, and
another answering authoritative requests listening on port 5353. On the
director, port 53 on one IP forwards to port 53 on the realservers, and
port 53 on the other director IP forwards to port 5353 on the realservers.
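For reference, here is a rough sketch of ipvsadm commands that would set up this kind of layout. It is not our actual startup script: the addresses and weights are the ones from the listing in this post, the emit_service helper is just a name made up for illustration, and the script only prints the commands so they can be reviewed before piping them to sh.

```shell
#!/bin/sh
# Dry-run sketch: prints ipvsadm commands instead of running them.
# Realserver addresses and weights (ip:weight) come from the ipvsadm -ln
# listing in this post; the helper name emit_service is hypothetical.

REALSERVERS="10.75.0.4:18 10.75.0.7:18 10.75.0.6:19 10.75.0.5:19 10.75.0.3:15 10.75.0.8:25"

# emit_service VIP RPORT
# One virtual service on VIP:53 for both TCP (-t) and UDP (-u), scheduled
# with wlc, forwarding to RPORT on each realserver via masquerading (-m).
emit_service() {
    vip=$1
    rport=$2
    for proto in -t -u; do
        echo "ipvsadm -A $proto $vip:53 -s wlc"
        for entry in $REALSERVERS; do
            rip=${entry%%:*}       # part before the colon
            weight=${entry##*:}    # part after the colon
            echo "ipvsadm -a $proto $vip:53 -r $rip:$rport -m -w $weight"
        done
    done
}

emit_service 216.163.120.19 53      # recursive bind
emit_service 216.163.120.20 5353    # authoritative bind
```

Piping the output to sh as root would apply it; printing first makes it easy to diff against whatever the director is really running.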
Both of these died at roughly the same time. Doing an ipvsadm -ln I found
the following:
TCP 216.163.120.19:53 wlc
-> 10.75.0.4:53 Masq 18 0 0
-> 10.75.0.7:53 Masq 18 0 0
-> 10.75.0.6:53 Masq 19 0 0
-> 10.75.0.5:53 Masq 19 0 0
-> 10.75.0.3:53 Masq 15 0 0
-> 10.75.0.8:53 Masq 25 0 0
TCP 216.163.120.20:53 wlc
-> 10.75.0.4:5353 Masq 18 0 0
-> 10.75.0.7:5353 Masq 18 0 0
-> 10.75.0.6:5353 Masq 19 0 0
-> 10.75.0.5:5353 Masq 19 0 0
-> 10.75.0.3:5353 Masq 15 0 0
-> 10.75.0.8:5353 Masq 25 0 0
UDP 216.163.120.20:53 wlc
-> 10.75.0.4:5353 Masq 18 0 0
-> 10.75.0.7:5353 Masq 18 0 0
-> 10.75.0.6:5353 Masq 19 0 0
-> 10.75.0.5:5353 Masq 19 0 0
-> 10.75.0.3:5353 Masq 15 0 0
-> 10.75.0.8:5353 Masq 25 0 0
UDP 216.163.120.19:53 wlc
-> 10.75.0.4:53 Masq 18 0 2
-> 10.75.0.7:53 Masq 18 0 3
-> 10.75.0.6:53 Masq 19 0 3
-> 10.75.0.5:53 Masq 19 0 3
-> 10.75.0.3:53 Masq 15 0 2
-> 10.75.0.8:53 Masq 25 0 4
This shows no active connections on any service, and only the recursive IP
(216.163.120.19, UDP) had a few inactive connections.
I was able to run dig against all the realservers on both port 53 and 5353
from the director; only requests going through the director were timing out.
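The checks described above can be sketched roughly as follows. This is an illustration, not the exact commands I typed: the query name example.com is a placeholder, the check helper is a made-up name, and by default the script only prints the dig commands (set DIG=dig to actually send the queries).

```shell
#!/bin/sh
# Sketch: query each realserver directly on both ports, then each VIP.
# QNAME is a placeholder name; DIG defaults to "echo dig" so nothing is
# sent unless you run the script with DIG=dig.
QNAME=${QNAME:-example.com}
DIG=${DIG:-echo dig}

# check SERVER PORT
# One short query with a 2 second timeout and no retries, so a dead
# path shows up quickly instead of hanging.
check() {
    $DIG @"$1" -p "$2" "$QNAME" +time=2 +tries=1 +short
}

# Direct queries to the realservers (run on the director).
for rip in 10.75.0.3 10.75.0.4 10.75.0.5 10.75.0.6 10.75.0.7 10.75.0.8; do
    check "$rip" 53      # recursive bind
    check "$rip" 5353    # authoritative bind
done

# Queries through LVS. These are probably only meaningful when run from
# a client outside the director, since that is the path real traffic takes.
check 216.163.120.19 53
check 216.163.120.20 53
```

If the direct queries answer and the VIP queries time out, the problem is on the forwarding path rather than in bind, which is what I saw.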
I stopped and restarted LVS on the director and that seems to have cleared
up the problem. But I'd like to know what happened. I've never seen this
before. LVS had been running for 48 days without any problems prior to this.
BTW, I'm running LVS-NAT (ipvsadm v1.21) on a 2.4.19 kernel.
There were no messages in syslog to indicate any problems.