[lvs-users] 2-node setup connections hanging to backup

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] 2-node setup connections hanging to backup
From: Lloyd Brown <lloyd_brown@xxxxxxx>
Date: Thu, 21 Aug 2014 15:18:15 -0600
Hi, all.  I'm having another problem that I hope someone can help me
with, or at least point me in the right direction for diagnosing.
It's a little weird, and I'm running out of ideas to test.

I'm in the middle of testing a two-node LVS balancing setup (see
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html#two_box_lvs)
to balance SSH connections.  But for reasons that aren't clear, some of
my connections are getting hung up.

Here's the general info on my setup:

- I'm running on (basically) a modified RHEL 6.2 image with an updated
kernel package
- This setup uses LVS-DR
- The two nodes have the IP addresses 192.168.25.14 and 192.168.25.15
for direct access
- The virtual IP that's being balanced is 192.168.25.16
- The testing client is 192.168.25.17
- I'm using keepalived to manage the setup, VRRP, etc.
- I've already started using an FWMARK balancing setup, using IPTables,
to avoid the double-balancing/packet-storm/battling-directors issue
described in section 9.3 of the LVS-HOWTO (URL above)
- All connections that go to the active director, and get sent locally,
seem to be fine
- Some, but not all, of the connections that go through the active
director, and are forwarded to the backup director (also acting as a
realserver), are hanging up
- When I do something short (e.g. a loop around "ssh 192.168.25.16
hostname"), I can frequently get several good connections through to the
backup director before one of them hangs.
- If I try to send a larger stream of data, e.g. scp a file, then my
connection stalls/hangs every time I'm sent to the backup director/RS
- There doesn't seem to be any pattern yet as to the number of good
connections, packet count, or data volume before the hang occurs
- When the problem occurs, I see a very high packet/byte rate on the "lo"
interfaces, which seems to be mostly SSH packet retransmissions from
192.168.25.17 (client) to 192.168.25.16 (VIP).  Why this traffic is
ending up on "lo" is a mystery to me.
- The problem only occurs when connecting via the floating VIP, and only
when the connection is redirected to the backup director host.
Connecting directly to that same host (e.g. 192.168.25.14 or
192.168.25.15) works just fine every time.
- I've already tried flushing iptables completely on the backup
director, and it didn't seem to help.
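
The short-connection test above (a loop around "ssh 192.168.25.16
hostname") can be sketched as a small script that tallies which host
answered and how many attempts hung.  This is only an illustration, not
the exact loop used here: the function names, the 10-second timeout, and
the BatchMode option are assumptions.

```python
import subprocess
from collections import Counter

VIP = "192.168.25.16"  # virtual IP from the setup described above


def run_ssh_hostname(target, timeout_s):
    """Run `ssh <target> hostname`; return the hostname, or None on hang/failure."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    cmd = ["ssh", "-o", "BatchMode=yes", target, "hostname"]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The connection stalled longer than timeout_s (assumed threshold).
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def probe(target, attempts, timeout_s=10.0, runner=run_ssh_hostname):
    """Tally which host answered each attempt.

    Hangs and failures are counted together under 'HUNG_OR_FAILED'.
    """
    tally = Counter()
    for _ in range(attempts):
        answer = runner(target, timeout_s)
        tally[answer if answer else "HUNG_OR_FAILED"] += 1
    return tally


# Example (run from the test client): print(probe(VIP, attempts=50))
```

Comparing the tally for the VIP against the tally for each director's
direct address should show whether hangs only appear when the answering
host is the backup director.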

I'm going to attach copies of several files (keepalived.conf, iptables
setup, etc.) to see if they're helpful.  If anyone can point me in the
right direction to figure this out, I'd appreciate it greatly.

Thanks again,


-- 
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

Attachment: lvs_diagnosis_21August2014.tar.gz
Description: GNU Zip compressed data
