Hi all,
I'm trying to configure an LVS-DR setup with 2 servers (CentOS 4.3) using
keepalived 1.1.12 for an HTTP service.
The 2 servers act both as directors (master/slave) and as real servers.
The problem arises when the 3rd client request arrives at the director.
On the client side, the browser waits for the connection to be
established without success, and after a while it fails.
From the real servers' point of view, I see a LOT of network traffic
consisting only of SYN packets.
My configuration is:
VIP: 10.0.91.25
RIP1: 10.0.91.23
RIP2: 10.0.91.24
Client: 10.0.90.116
--------------------------- keepalived.conf on real server 1 (10.0.91.23)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    track_interface {
        eth0
    }
    lvs_sync_daemon_interface eth0
    virtual_router_id 25
    priority 150
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass tps
    }
    virtual_ipaddress {
        10.0.91.25/24
    }
    notify_master "/etc/keepalived/ip_localhost del"
    notify_backup "/etc/keepalived/ip_localhost add"
    notify_fault "/etc/keepalived/ip_localhost add"
}
virtual_server 10.0.91.25 80 {
    delay_loop 5
    lb_algo rr
    lb_kind DR
    protocol TCP
    real_server 10.0.91.23 80 {
        weight 1
        inhibit_on_failure
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 10.0.91.24 80 {
        weight 1
        inhibit_on_failure
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 1
        }
    }
}
--------------------------------------------------------------------------------------
--------------------------- keepalived.conf on real server 2 (10.0.91.24)
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    track_interface {
        eth0
    }
    lvs_sync_daemon_interface eth0
    virtual_router_id 25
    priority 100
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass tps
    }
    virtual_ipaddress {
        10.0.91.25/24
    }
    notify_master "/etc/keepalived/ip_localhost del"
    notify_backup "/etc/keepalived/ip_localhost add"
    notify_fault "/etc/keepalived/ip_localhost add"
}
virtual_server 10.0.91.25 80 {
    delay_loop 5
    lb_algo rr
    lb_kind DR
    protocol TCP
    real_server 10.0.91.23 80 {
        weight 1
        inhibit_on_failure
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 10.0.91.24 80 {
        weight 1
        inhibit_on_failure
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 1
        }
    }
}
--------------------------------------------------------------------------------------
/etc/keepalived/ip_localhost is the script used to set up the VIP (bound
to lo) on the real servers:
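#!/bin/sh
# Add or remove the VIP on the loopback interface.
# Called by keepalived through the notify_master/notify_backup/notify_fault hooks.
case "$1" in
add)
    ip addr add 10.0.91.25/32 dev lo brd + scope host
    ;;
del)
    ip addr del 10.0.91.25/32 dev lo
    ;;
*)
    echo "Usage: $0 {add|del}"
    exit 1
    ;;
esac
exit 0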
--------------------------------------------------------------------------------------
/etc/sysctl.conf
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
--------------------------------------------------------------------------------------
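Since /etc/sysctl.conf is normally applied only at boot or with 'sysctl -p', it may be worth confirming that these values are actually live on both nodes. A minimal sketch, assuming a POSIX sh; check_lvs_sysctls and its optional root-directory argument are hypothetical names, not part of keepalived or ipvsadm:

```shell
# check_lvs_sysctls [PROC_SYS_ROOT]
# Hypothetical helper (not part of keepalived): compares the live kernel values
# with the ones listed in /etc/sysctl.conf above. PROC_SYS_ROOT defaults to
# /proc/sys; it is a parameter only so the check can be run against a copy.
# Prints one MISMATCH line per differing key and returns 1 if any differ.
check_lvs_sysctls() {
    root=${1:-/proc/sys}
    rc=0
    while read -r key want; do
        # net.ipv4.conf.eth0.arp_ignore -> $root/net/ipv4/conf/eth0/arp_ignore
        file="$root/$(echo "$key" | tr . /)"
        have=$(cat "$file" 2>/dev/null) || have="missing"
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: want $want, have $have"
            rc=1
        fi
    done <<EOF
net.ipv4.ip_forward 1
net.ipv4.conf.default.rp_filter 0
net.ipv4.conf.default.accept_source_route 1
net.ipv4.conf.all.arp_ignore 1
net.ipv4.conf.all.arp_announce 2
net.ipv4.conf.eth0.arp_ignore 1
net.ipv4.conf.eth0.arp_announce 2
EOF
    return $rc
}
```

After sourcing it, running check_lvs_sysctls with no argument on each node prints nothing and returns 0 when every value matches.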
After starting the keepalived service on the two servers, I get this
network configuration on the first real server:
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:1a:ce:fe brd ff:ff:ff:ff:ff:ff
    inet 10.0.91.23/24 brd 10.0.91.255 scope global eth0
    inet 10.0.91.25/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe1a:cefe/64 scope link
       valid_lft forever preferred_lft forever
and this one on the 2nd real server:
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 10.0.91.25/32 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:7a:c2:d3 brd ff:ff:ff:ff:ff:ff
    inet 10.0.91.24/24 brd 10.0.91.255 scope global eth0
    inet6 fe80::20c:29ff:fe7a:c2d3/64 scope link
       valid_lft forever preferred_lft forever
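With the notify scripts in play, each node should carry the VIP on exactly one interface: eth0 on the VRRP master, lo on the backup. A small sketch for checking that from 'ip addr show' output, assuming a POSIX sh with awk; vip_ifaces is a hypothetical helper name:

```shell
# vip_ifaces VIP
# Hypothetical helper: reads `ip addr show` output on stdin and prints the
# name of every interface that carries VIP, one per line.
vip_ifaces() {
    awk -v vip="$1" '
        # Interface header lines look like "2: eth0: <BROADCAST,...>"
        /^[0-9]+: / { iface = $2; sub(/:$/, "", iface) }
        # Address lines look like "inet 10.0.91.25/24 ..."; the trailing "/"
        # stops 10.0.91.2 from matching 10.0.91.25
        $1 == "inet" && index($2, vip "/") == 1 { print iface }
    '
}
```

Usage: ip addr show | vip_ifaces 10.0.91.25. On the two outputs above it prints eth0 for the first node and lo for the second.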
The ipvsadm status seems to be correct.
On the 1st server:
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
  -> 10.0.91.24:http              Route   1      0          0
  -> 10.0.91.23:http              Local   1      0          0
On the 2nd server:
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
  -> 10.0.91.24:http              Local   1      0          0
  -> 10.0.91.23:http              Route   1      0          0
When the 3rd client request arrives at the server, this is the tcpdump
output on the first node:
...
00:49:02.366902 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.366929 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367082 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367095 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367878 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367902 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367881 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367910 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367882 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.367916 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
00:49:02.368584 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
...
and you can see the same in the tcpdump output from the 2nd node:
...
22:51:39.744887 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746808 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746843 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746816 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746862 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746818 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.746884 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.747879 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.747909 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.747881 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.747949 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.748892 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.748923 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
22:51:39.749745 IP 10.0.90.116.3724 > 10.0.91.25.http: S 2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0 0,nop,nop,sackOK>
...
As the timestamps show, the very same SYN (source port 3724, sequence
number 2143602042) is captured many times within a couple of milliseconds.
It seems like there is a loop between the two servers.
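To put a number on the duplication, a capture can be piped through a small filter that counts identical SYNs. A sketch, assuming a POSIX sh with awk and tcpdump's one-line-per-packet text output in the older "S" flag format shown above (newer tcpdump releases print "Flags [S]" instead, which this does not handle); count_syn_dups is a hypothetical helper name:

```shell
# count_syn_dups
# Hypothetical helper: reads tcpdump text output on stdin (one packet per
# line, old "S" flag format) and prints how many times each identical SYN
# (same source, destination and sequence number) was captured.
count_syn_dups() {
    awk '
        # Fields: time IP src > dst: flags seq ...
        $2 == "IP" && $4 == ">" && $6 == "S" {
            count[$3 " > " $5 " seq " $7]++
        }
        END { for (k in count) print count[k], k }
    '
}
```

Usage: tcpdump -c 200 -i eth0 tcp port 80 | count_syn_dups. A count far above the handful of retransmissions a client normally sends would support the loop theory.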
The first two client requests are handled correctly: the first one goes
to the first node and the 2nd one goes to the other node.
Can anyone give me some hints to debug (and hopefully solve) the problem?
Thank you
Paolo