Re: LVS-DR keepalived problem - SOLUTION

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS-DR keepalived problem - SOLUTION
From: Paolo Perrucci <p.perrucci@xxxxxxxxxx>
Date: Sat, 17 Jun 2006 20:57:48 +0200
Hi all,
after some days (and nights) of work I found the problem and the solution.
Below you can find an explanation of the problem together with the solution.
I hope this helps other people sleep easy...

keepalived controls the ipvs configuration of both the master and the slave director (which in my case are also the real servers). The ipvs on the backup director is not actually sleeping: if the module is loaded in the kernel and the lvs table is not empty, ipvs inspects the network traffic and applies the configured rules. Therefore, in my configuration, ipvs on the 2nd real server handles the network traffic forwarded by the master ipvs, creating the loop.
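
As a quick check, you can verify this on the backup director with standard commands (a minimal sketch, assuming ipvsadm is installed):

lsmod | grep ip_vs   # is the ip_vs module loaded?
ipvsadm -L -n        # a non-empty table means ipvs will act on VIP traffic

If both report something while the node is in BACKUP state, ipvs there is still handling incoming packets.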

For the 1st client request the flow is:
- the request reaches the master ipvs
- according to the configuration table (it's in my first mail) and the counter (I configured a rr scheduler), ipvs forwards the packets to the 2nd real server
- on the 2nd real server, ipvs handles the request using its table, so the request is forwarded to the 2nd real server RIP

For the 2nd client request the flow is:
- the request reaches the master ipvs
- according to the configuration table and the counter, ipvs forwards the packets to the 1st real server RIP

For the 3rd client request the flow is:
- the request reaches the master ipvs
- according to the configuration table and the counter, ipvs forwards the packets to the 2nd real server RIP
- on the 2nd real server, ipvs handles the request using its table, so the request is forwarded to the 1st real server
- on the 1st real server, ipvs handles the request using its table, so the request is forwarded back to the 2nd real server
- ...loop...

To solve the problem I removed the hidden VIP on the real servers and used the following iptables nat rule

-A PREROUTING -d 10.0.91.25 -p tcp -j REDIRECT

activated by keepalived on the slave director.
In this way, the packets arriving on the slave director are rewritten so that they bypass ipvs (ipvs only gets packets directed to the VIP 10.0.91.25).
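
A minimal sketch of such a notify script (the name and exact layout here are illustrative, not something shipped with keepalived; it would be hooked in with notify_master/notify_backup like the ip_localhost script quoted below):

#!/bin/sh
# toggle the nat REDIRECT rule so ipvs on the backup
# never sees packets addressed to the VIP
VIP=10.0.91.25
case "$1" in
 add)
       iptables -t nat -A PREROUTING -d $VIP -p tcp -j REDIRECT
       ;;
 del)
       iptables -t nat -D PREROUTING -d $VIP -p tcp -j REDIRECT
       ;;
 *)
       echo "Usage: $0 {add|del}"
       exit 1
esac
exit 0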

Paolo

Paolo Perrucci wrote:
Hi all,

I am trying to configure an LVS-DR setup with 2 servers (centos 4.3) using
keepalived 1.1.12 for an http service.
The 2 servers act as master director/slave director and real servers.

The problem arises when the 3rd client request arrives on the director.
From the client side, the browser waits for the connection to be
established without success, and after a while it fails.
From the real servers' point of view, I see a LOT of network traffic
consisting of only SYN packets.
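For example, a capture like the following (standard tcpdump syntax; the interface name is assumed) isolates that SYN-only traffic:

tcpdump -n -i eth0 'dst host 10.0.91.25 and tcp[tcpflags] & tcp-syn != 0'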
My configuration is:

VIP: 10.0.91.25
RIP1: 10.0.91.23
RIP2: 10.0.91.24
Client: 10.0.90.116

--------------------------- keepalived.conf on real server 1 (10.0.91.23)
vrrp_instance VI_1 {
       state MASTER
       interface eth0
       track_interface {
               eth0
       }
       lvs_sync_daemon_interface eth0
       virtual_router_id 25
       priority 150
       advert_int 2
       authentication {
               auth_type PASS
               auth_pass tps
       }
       virtual_ipaddress {
               10.0.91.25/24
       }
       notify_master "/etc/keepalived/ip_localhost del"
       notify_backup "/etc/keepalived/ip_localhost add"
       notify_fault "/etc/keepalived/ip_localhost add"
}

virtual_server 10.0.91.25 80  {
       delay_loop 5
       lb_algo rr
       lb_kind DR
       protocol TCP
       real_server 10.0.91.23 80 {
               weight 1
               inhibit_on_failure
               TCP_CHECK {
                       connect_port 80
                       connect_timeout 3
                       nb_get_retry 3
                       delay_before_retry 1
               }
       }
       real_server 10.0.91.24 80 {
               weight 1
               inhibit_on_failure
               TCP_CHECK {
                       connect_port 80
                       connect_timeout 3
                       nb_get_retry 3
                       delay_before_retry 1
               }
       }
}
--------------------------------------------------------------------------------------


--------------------------- keepalived.conf on real server 2 (10.0.91.24)
vrrp_instance VI_1 {
       state BACKUP
       interface eth0
       track_interface {
               eth0
       }
       lvs_sync_daemon_interface eth0
       virtual_router_id 25
       priority 100
       advert_int 2
       authentication {
               auth_type PASS
               auth_pass tps
       }
       virtual_ipaddress {
               10.0.91.25/24
       }
       notify_master "/etc/keepalived/ip_localhost del"
       notify_backup "/etc/keepalived/ip_localhost add"
       notify_fault "/etc/keepalived/ip_localhost add"
}

virtual_server 10.0.91.25 80  {
       delay_loop 5
       lb_algo rr
       lb_kind DR
       protocol TCP
       real_server 10.0.91.23 80 {
               weight 1
               inhibit_on_failure
               TCP_CHECK {
                       connect_port 80
                       connect_timeout 3
                       nb_get_retry 3
                       delay_before_retry 1
               }
       }
       real_server 10.0.91.24 80 {
               weight 1
               inhibit_on_failure
               TCP_CHECK {
                       connect_port 80
                       connect_timeout 3
                       nb_get_retry 3
                       delay_before_retry 1
               }
       }
}
--------------------------------------------------------------------------------------


--------------------------------------------------------------------------------------
/etc/keepalived/ip_localhost is the script used to set up the VIP (bound
to lo) on the real servers:

#!/bin/sh
case "$1" in
 add)
       ip addr add 10.0.91.25/32 dev lo brd + scope host
       ;;
 del)
       ip addr del 10.0.91.25/32 dev lo
       ;;
 *)
       echo "Usage: $0 {add|del}"
       exit 1
esac
exit 0
--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------
/etc/sysctl.conf

net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
--------------------------------------------------------------------------------------
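
(These settings can be applied without a reboot with sysctl -p, and spot-checked with, e.g., sysctl net.ipv4.conf.all.arp_ignore.)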

After starting the keepalived service on the two servers I have this
network configuration on the first real server:

1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 00:0c:29:1a:ce:fe brd ff:ff:ff:ff:ff:ff
   inet 10.0.91.23/24 brd 10.0.91.255 scope global eth0
   inet 10.0.91.25/24 scope global secondary eth0
   inet6 fe80::20c:29ff:fe1a:cefe/64 scope link
      valid_lft forever preferred_lft forever

and this one on the 2nd real server:

1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
   inet 10.0.91.25/32 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 00:0c:29:7a:c2:d3 brd ff:ff:ff:ff:ff:ff
   inet 10.0.91.24/24 brd 10.0.91.255 scope global eth0
   inet6 fe80::20c:29ff:fe7a:c2d3/64 scope link
      valid_lft forever preferred_lft forever

The ipvsadm status seems to be correct.
On the 1st server it is:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
 -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
 -> 10.0.91.24:http              Route   1      0          0
 -> 10.0.91.23:http              Local   1      0          0

On the 2nd server it is:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
 -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.91.25:http rr
 -> 10.0.91.24:http              Local   1      0          0
 -> 10.0.91.23:http              Route   1      0          0

When the 3rd client request arrives on the server, this is the tcpdump
output on the first node:

...
00:49:02.366902 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.366929 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367082 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367095 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367878 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367902 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367881 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367910 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367882 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.367916 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
00:49:02.368584 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
...

and you can see the same in the tcpdump output from the 2nd node:

...
22:51:39.744887 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746808 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746843 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746816 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746862 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746818 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.746884 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747879 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747909 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747881 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.747949 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.748892 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.748923 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
22:51:39.749745 IP 10.0.90.116.3724 > 10.0.91.25.http: S
2143602042:2143602042(0) win 32768 <mss 1460,nop,nop,timestamp 0
0,nop,nop,sackOK>
...

As you can see from the timestamps, this is a lot of network traffic.
It seems like there is a loop between the two servers.
The first two client requests are handled correctly: the first one goes
to the first node and the 2nd one goes to the other node.

Can anyone give me some hints to debug (and hopefully solve) the problem?
Thank you,
Paolo
