Dear all,
I am new to LVS and I have a problem setting up two LVSes for failover.
The problem is related to the primary LVS's MAC address being held in
the ARP caches of the real servers and of the router connected to the
Internet. After a failover, all Internet connections stall until the ARP
cache entries in the Web servers and the router have expired. Can anyone
help me solve the problem by making some changes on the Linux LVS side?
(I am not able to change the router's ARP cache time. The router is
owned by the Web hosting company, not by me.)
My situation is:
I have set up two RedHat Linux 6.1 machines, without any kernel patch,
as LVSes. One is the primary LVS and the other is the backup LVS. Behind
the two Linux machines there are two NT IIS4 Web servers.
Each LVS has two network cards installed. eth0 is connected to a router
which is connected to the Internet. eth1 is connected to a private
network on the same segment as the two NT IIS4 servers.
The eth0 of the primary LVS is assigned the IP address 202.53.128.56
The eth0 of the backup LVS is assigned the IP address 202.53.128.57
The eth1 of the primary LVS is assigned the IP address 192.128.1.9
The eth1 of the backup LVS is assigned the IP address 192.128.1.10
In addition, both the primary and the backup LVS have IPV4 FORWARD and
IPV4 DEFRAG enabled. The following command was also added to the file
/etc/rc.d/rc.local:
ipchains -A forward -j MASQ -s 192.168.1.0/24 -d 0.0.0.0/0
I use piranha to configure the LVS so that the two LVSes share a common
IP address 202.53.128.58 on eth0 as eth0:1, and a common IP address
192.128.1.1 on eth1 as eth1:1.
The pulse daemon is also started automatically when the two LVSes boot.
In my configuration, Internet clients can still access our Web site when
one of the NT servers is disconnected from the LVS. The backup LVS
--CAN AUTOMATICALLY-- take over the role of the primary LVS when the
primary LVS is shut down or disconnected from the backup LVS. However, I
found that after the failover none of the NT Web servers can reach the
backup LVS through the common IP address 192.128.1.1, and all Internet
clients stall when connecting to our Web servers.
Later, I found that the problem may be due to ARP caching in the Web
servers and the router. I tried limiting the ARP cache time to 5 seconds
on the NT servers and half of the problem was solved, i.e. the NT Web
servers can reach the backup LVS through the common IP address
192.128.1.1 when the primary LVS is down. However, Internet clients
still cannot connect when the LVS failover occurs.
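To make clear what I think is happening, here is a minimal Python sketch
of a neighbour's ARP cache during a failover. It is only a toy model of
my understanding, not real networking code. The MAC addresses, the
failover time and the 300-second timeout are made up for illustration
(I do not know the router's real ARP timeout); only the 5-second value
and the common IP address come from my setup.

```python
# Toy model of one neighbour's ARP cache (a Web server or the router).
# All MAC addresses and times below are hypothetical.

PRIMARY_MAC = "00:00:00:00:00:01"  # made-up MAC of the primary LVS
BACKUP_MAC = "00:00:00:00:00:02"   # made-up MAC of the backup LVS
FAILOVER_AT = 12                   # primary goes down at t = 12 s (assumed)


def owner_of(ip, now):
    """Stand-in for an ARP request: who answers for `ip` at time `now`."""
    return PRIMARY_MAC if now < FAILOVER_AT else BACKUP_MAC


class ArpCache:
    def __init__(self, timeout):
        self.timeout = timeout  # seconds an entry stays valid
        self.entries = {}       # ip -> (mac, time the entry was learned)

    def resolve(self, ip, now):
        """Return the MAC this host would send frames to at time `now`."""
        cached = self.entries.get(ip)
        if cached is not None and now - cached[1] < self.timeout:
            return cached[0]    # entry still valid, possibly stale
        mac = owner_of(ip, now)  # entry expired: ask on the wire again
        self.entries[ip] = (mac, now)
        return mac


def seconds_stalled(timeout, vip="202.53.128.58", horizon=600):
    """Count the seconds after failover in which frames for the common
    IP address are still sent to the dead primary's MAC."""
    cache = ArpCache(timeout)
    stalled = 0
    for now in range(horizon):
        mac = cache.resolve(vip, now)
        if now >= FAILOVER_AT and mac == PRIMARY_MAC:
            stalled += 1
    return stalled


if __name__ == "__main__":
    print("timeout   5 s:", seconds_stalled(5), "s stalled")
    print("timeout 300 s:", seconds_stalled(300), "s stalled")
```

In this model a neighbour with a short timeout recovers within a few
seconds, which matches what I see on the NT servers, while a neighbour
with a long timeout keeps sending to the dead primary until its entry
expires, which is what I suspect the router is doing.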
Regards
Antony