
RE: 'no hit' for LVS connection tracking (SYN+ACK not translated)

To: "Julian Anastasov" <ja@xxxxxx>
Subject: RE: 'no hit' for LVS connection tracking (SYN+ACK not translated)
Cc: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: "Jari Takkala" <Jari.Takkala@xxxxxx>
Date: Fri, 2 Sep 2005 11:08:57 -0400

Hi Julian,

Here are my vs settings:

# grep . /proc/sys/net/ipv4/vs/*
/proc/sys/net/ipv4/vs/am_droprate:10
/proc/sys/net/ipv4/vs/amemthresh:2048
/proc/sys/net/ipv4/vs/cache_bypass:0
/proc/sys/net/ipv4/vs/debug_level:0
/proc/sys/net/ipv4/vs/drop_entry:0
/proc/sys/net/ipv4/vs/drop_packet:0
/proc/sys/net/ipv4/vs/expire_nodest_conn:0
/proc/sys/net/ipv4/vs/nat_icmp_send:0
/proc/sys/net/ipv4/vs/secure_tcp:0
/proc/sys/net/ipv4/vs/sync_threshold:3
/proc/sys/net/ipv4/vs/timeout_close:10
/proc/sys/net/ipv4/vs/timeout_closewait:60
/proc/sys/net/ipv4/vs/timeout_established:480
/proc/sys/net/ipv4/vs/timeout_finwait:60
/proc/sys/net/ipv4/vs/timeout_icmp:60
/proc/sys/net/ipv4/vs/timeout_lastack:30
/proc/sys/net/ipv4/vs/timeout_listen:120
/proc/sys/net/ipv4/vs/timeout_synack:100
/proc/sys/net/ipv4/vs/timeout_synrecv:10
/proc/sys/net/ipv4/vs/timeout_synsent:60
/proc/sys/net/ipv4/vs/timeout_timewait:60
/proc/sys/net/ipv4/vs/timeout_udp:180

There are no firewall rules, fwmarks, NAT, or bridging in use, and no extra
patches have been applied to IPVS. CONFIG_IP_NF_IPTABLES, CONFIG_IP_NF_NAT, and
CONFIG_BRIDGE are built as kernel modules, and none of them is loaded.

# lsmod
Module                  Size  Used by    Tainted: P
ip_vs_ftp               5956   0
ip_vs_wlc               1604   4  (autoclean)
ip_vs                  73812   7  (autoclean) [ip_vs_ftp ip_vs_wlc]
sg                     36460   0  (autoclean)
dcdesm                 36124   1
dcdbas                 40184   1
autofs                 13460   0  (autoclean) (unused)
bcm5700               106952   0  (unused)
e100                   57028   2

> From your explanation ip_vs_ftp leads to problems where SYN
> creates web connection, it is hashed in table, DNAT-ed to RS, then RS
> replies SYN+ACK which can not match the connection in table, it looks
> like this connection is not present (may be removed, do you see something
> in debug logs from the SYN to the SYN+ACK) or hash table is damaged.

That sounds correct. Here is the debug log again. The incoming packet hits the
connection table, but the outgoing packet does not. See my first email for the
tcpdump output.

Aug 13 03:20:43 kernel: IPVS: lookup/in TCP 216.220.XX.XXX:9345->10.99.23.64:80 hit
Aug 13 03:20:43 kernel: IPVS: Incoming TCP 216.220.XX.XXX:9345->10.99.23.64:80
Aug 13 03:20:43 kernel: Enter: ip_vs_nat_xmit, ip_vs_conn.c line 680
Aug 13 03:20:43 kernel: IPVS: NAT to 10.99.22.53:80
Aug 13 03:20:43 kernel: Leave: ip_vs_nat_xmit, ip_vs_conn.c line 820
Aug 13 03:20:43 kernel: Enter: ip_vs_out, ip_vs_core.c line 646
Aug 13 03:20:43 kernel: IPVS: lookup/out TCP 10.99.22.53:80->216.220.XX.XXX:9345 not hit
Aug 13 03:20:43 kernel: IPVS: packet for TCP 216.220.XX.XXX:9345 continue traversal as normal.
Aug 13 03:20:43 kernel: Enter: ip_vs_out, ip_vs_core.c line 646
Aug 13 03:20:43 kernel: IPVS: lookup/out TCP 216.220.XX.XXX:9345->10.99.22.53:80 not hit
Aug 13 03:20:43 kernel: IPVS: packet for TCP 10.99.22.53:80 continue traversal as normal.
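
For anyone who wants to scan a longer debug log for the same pattern, here is a
minimal Python sketch. It assumes the log lines look exactly like the excerpt
above (the "IPVS: NAT to" and "IPVS: lookup/out ... not hit" messages); the
function name and the sample text are only illustrative and not part of IPVS.
It collects every address that appears as a NAT destination and then reports
lookup/out misses whose source is one of those addresses, which should
correspond to real-server replies that were not translated.

```python
import re

# Patterns modelled on the debug lines in the excerpt above (an assumption:
# other IPVS versions may format these messages differently).
# \s+ between tokens tolerates lines that were wrapped by the mail client.
LOOKUP_RE = re.compile(
    r"IPVS:\s+lookup/(in|out)\s+(\S+)\s+(\S+?)->(\S+)\s+(not\s+hit|hit)"
)
NAT_RE = re.compile(r"IPVS:\s+NAT\s+to\s+(\S+)")


def find_untranslated_replies(log_text):
    """Return (real_server, client) pairs for lookup/out misses whose source
    address appears somewhere in the log as a NAT destination."""
    # Every address:port the director reported NAT-ing a packet to.
    real_servers = {m.group(1) for m in NAT_RE.finditer(log_text)}

    misses = []
    for m in LOOKUP_RE.finditer(log_text):
        direction, _proto, src, dst, state = m.groups()
        is_miss = state.startswith("not")
        # A reply from a real server that found no connection entry.
        if direction == "out" and is_miss and src in real_servers:
            misses.append((src, dst))
    return misses


# Hypothetical sample: a few lines copied from the log in this message.
SAMPLE_LOG = """\
Aug 13 03:20:43 kernel: IPVS: lookup/in TCP 216.220.XX.XXX:9345->10.99.23.64:80 hit
Aug 13 03:20:43 kernel: IPVS: NAT to 10.99.22.53:80
Aug 13 03:20:43 kernel: IPVS: lookup/out TCP 10.99.22.53:80->216.220.XX.XXX:9345 not hit
Aug 13 03:20:43 kernel: IPVS: lookup/out TCP 216.220.XX.XXX:9345->10.99.22.53:80 not hit
"""

for real_server, client in find_untranslated_replies(SAMPLE_LOG):
    print("untranslated reply:", real_server, "->", client)
```

The second lookup/out miss in the sample is ignored because its source is the
client rather than a real server, so only the SYN+ACK path is reported.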

> Do you still think it is caused by ip_vs_ftp? About your tests, is the
> client IP on lan? Do you think this client IP has many connections to
> the director?

The client IP is not on the LAN. The problem occurs from any source IP trying
to reach a load-balanced VIP. Whenever we add the FTP service to ipvsadm and
begin load balancing to it, the problem starts to occur on all services.
However, it is not consistent. Outgoing SYN+ACK packets are translated
correctly for a certain period of time, and then after a while some of them
are no longer translated.

I do not think it is load related. We have other load balancers built from the 
same image handling many more connections.

# ipvsadm -l -n
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.99.23.64:80 wlc persistent 300
  -> 10.99.22.53:80               Masq    1      13         14
  -> 10.99.22.58:80               Masq    1      15         3
  -> 10.99.22.215:80              Masq    1      14         1
TCP  10.99.23.54:80 wlc persistent 300
TCP  10.99.23.51:80 wlc persistent 300
  -> 10.99.22.199:80              Masq    1      30         6
  -> 10.99.22.197:80              Masq    1      32         4
TCP  10.99.23.98:5061 wlc
  -> 10.99.22.252:5061            Masq    1      0          0
  -> 10.99.22.251:5061            Masq    1      0          0

Here is the ipvsadm entry for the FTP service, which is not currently in the
ipvsadm table because of the problems it causes.

TCP  10.99.23.57:21 wlc
  -> 10.99.22.208:21              Masq    1      0          0
  -> 10.99.22.207:21              Masq    1      0          0

Because this is a production environment, I cannot make many changes or test
the FTP service further. At the moment, we are not load balancing FTP because
of the problems it creates. I have tried to reproduce this in the lab using an
image of the production load balancer, but I have had no luck getting the
problem to occur there. I do not have access to the web and FTP servers, which
prevents me from fully reproducing the production environment and may affect
the validity of the tests.

Any more ideas? Thanks!

Jari
