Hi,
I am currently trying to get to the bottom of a problem where my
LVS director seems to drop a packet coming from a client from time to
time. We have this problem on our production systems and can reproduce
it on staging.
Our setup:
===========
We are using ipvsadm on CentOS 5 x86_64 in a paravirtualized Xen DomU.
Current Version details:
Kernel: 2.6.18-348.1.1.el5xen
ipvsadm: 1.24-13.el5
LVS setup:
- We use IPVS in DR mode; the running connections are managed by lvs-kiss.
- LVS runs in a Heartbeat v1 cluster (two virtual nodes); master and
backup run constantly on both nodes.
- For the LVS services we use logical IPs set up by Heartbeat
(active/passive cluster mode).
- The real servers are physical Linux machines.
Network setup:
The VM acting as director runs as a Xen PV DomU on a Dom0 using
bridged networks.
Networks in play:
- abn network (staging network): connects the client to the director,
is used by the real servers to send their answers directly to the
clients (direct-routing approach), and carries the ipvsadm
master/backup multicast sync traffic.
- lvs network: a dedicated VLAN which connects the director and the
real servers.
The DR ARP problem is solved by suppressing ARP answers for the
service IP on the real servers; the service IP is configured as a
logical IP on the lvs interface of the real servers.
In this setup ip_forwarding is not needed anywhere (neither on the
director nor on the real servers).
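For reference, a common way to suppress those ARP answers in DR setups is via the arp_ignore/arp_announce sysctls on the real servers. This is only a sketch: the interface name below is an assumption, and in this setup the same effect is achieved by placing the VIP only on the dedicated lvs interface.

```shell
# Sketch of the usual DR-mode ARP suppression on a real server.
# eth1 (the lvs VLAN interface) is an assumed name; adjust to the real setup.
sysctl -w net.ipv4.conf.all.arp_ignore=1      # answer ARP only for addresses on the incoming interface
sysctl -w net.ipv4.conf.eth1.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2    # pick the best local source address for outgoing ARP
sysctl -w net.ipv4.conf.eth1.arp_announce=2
```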
VM details:
1 GB RAM, 2 vCPUs, system load almost 0; memory: 73 MB free, 224 MB
in buffers, 536 MB cache, no swap.
top almost always shows 100% idle and 0% us/sy/ni/wa/hi/si/st.
Configuration details:
ipvsadm -Ln for the service in question shows:
TCP x.y.183.217:12405 wrr persistent 7200
-> 192.168.83.234:12405 Route 1000 0 0
-> 192.168.83.235:12405 Route 1000 0 0
x.y: the first two octets are from our internal class B range.
We use 192.168.83.x as the lvs network for staging.
Persistent ipvsadm-configuration:
/etc/sysconfig/ipvsadm: --set 20 20 20
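For completeness, the service shown by `ipvsadm -Ln` above could be recreated by hand roughly like this (a sketch only; in our setup lvs-kiss manages the real servers, and `x.y` stays elided as in the listing):

```shell
# Hypothetical manual recreation of the virtual service above.
ipvsadm -A -t x.y.183.217:12405 -s wrr -p 7200          # wrr scheduler, 7200s persistence
ipvsadm -a -t x.y.183.217:12405 -r 192.168.83.234:12405 -g -w 1000   # -g = gatewaying (DR)
ipvsadm -a -t x.y.183.217:12405 -r 192.168.83.235:12405 -g -w 1000
ipvsadm --set 20 20 20    # tcp / tcpfin / udp timeouts, as in /etc/sysconfig/ipvsadm
```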
Cluster-configuration:
/etc/ha.d/haresources: $primary_directorname lvs-kiss x.y.183.217
lvs-kiss-configuration-snippet for the service above:
<VirtualServer idm-abn:12405>
  ServiceType tcp
  Scheduler wrr
  DynamicScheduler 0
  Persistance 7200
  QueueSize 2
  Fuzz 0.1
  <RealServer rs1-lvs:12405>
    PacketForwardingMethod gatewaying
    Test ping -c 1 -nq -W 1 rs1-lvs >/dev/null
    RunOnFailure "/sbin/ipvsadm -d -t idm-abn:12405 -r rs1-lvs"
    RunOnRecovery "/sbin/ipvsadm -a -t idm-abn:12405 -r rs1-lvs"
  </RealServer>
  <RealServer rs2-lvs:12405>
    PacketForwardingMethod gatewaying
    Test ping -c 1 -nq -W 1 rs2-lvs >/dev/null
    RunOnFailure "/sbin/ipvsadm -d -t idm-abn:12405 -r rs2-lvs"
    RunOnRecovery "/sbin/ipvsadm -a -t idm-abn:12405 -r rs2-lvs"
  </RealServer>
</VirtualServer>
idm-abn, rs1 and rs2 resolve via /etc/hosts.
About the service:
This is a SOA web service.
How we reproduce the error:
From a client we call the web service continuously, one call every
three seconds.
From time to time the director answers the client with a connection
reset.
Interesting: this happens on the (n * 100 + 1)th try - the "+ 1" is
the interesting part.
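To keep track of that pattern while reproducing, a small helper can flag whether a failing attempt number is of the form n * 100 + 1. This is only a sketch; the sample numbers in the loop are hypothetical, not taken from our logs.

```shell
#!/bin/sh
# Succeeds when the attempt number fits the observed n*100 + 1 pattern (n >= 1).
fits_pattern() {
    [ "$1" -gt 1 ] && [ $(( ($1 - 1) % 100 )) -eq 0 ]
}

# Hypothetical failing attempt numbers, for illustration only:
for n in 101 201 305 401; do
    if fits_pattern "$n"; then
        echo "$n fits the pattern"
    else
        echo "$n does not fit"
    fi
done
```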
What we did to trace down the problem:
- Checked /proc/sys/net/ipv4/vs: all values are at their defaults, so
drop_packet is NOT in effect (= 0).
- Ran tcpdump on the client, on the frontend/abn and backend/lvs
interfaces of the director, and on the lvs and abn interfaces of the
real servers.
In these tcpdumps we could see a request from the client being
answered with a connection reset by the director.
The packet was NOT forwarded via LVS.
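To narrow this down further, a capture restricted to SYN and RST segments on the director makes it easier to see which side emits the reset - a sketch, assuming eth0 is the frontend/abn interface:

```shell
# Capture only SYN and RST segments for the service port on the director's
# frontend interface (eth0 is an assumed name) to isolate the reset source.
tcpdump -ni eth0 'tcp port 12405 and tcp[tcpflags] & (tcp-syn|tcp-rst) != 0'
```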
I welcome any ideas on how to track this problem down further.
If any information needed to drill down on the problem is unclear or
missing - please ask.
Kind regards
Nils Hildebrand