[lvs-users] Apparent MTU problem using LVS-DR and Windows 2003 RealServers

To: "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [lvs-users] Apparent MTU problem using LVS-DR and Windows 2003 RealServers
From: Christopher Smith <csmith@xxxxxxxxxxxxxxxx>
Date: Tue, 15 Sep 2009 15:19:37 -0700
I have a somewhat weird (at least to me) problem with an LVS-DR setup that has 
some Windows 2003 RealServers.

Firstly, this whole setup is VMs running inside an ESXi 4.0 host, in case that 
sets off anyone's alarm bells up front.


The load balancers are CentOS 5 x86-64, using heartbeat (for failover) and 
ldirectord (to configure IPVS).  The config file for the VIP on the LBs is:

[PHX (UTC-0600) root@lb-test01 ~]# cat /etc/ha.d/conf/10.183.3.112
autoreload = yes
checkinterval = 30
checktimeout = 3
callback = "/etc/ha.d/resource.d/sync_config.sh"
# HTTP.  Mainly used for testing
virtual = 10.183.3.112:80
        ## IMPORTANT.  The following directives for the
        ## above virtual/service IP definition ***MUST*** be
        ## indented by _at least_ four (4) spaces *OR* a single tab.
        protocol = tcp
        scheduler = rr
        #persistent=600
        real = 10.183.3.113:80 gate
        #real = 10.183.3.114:80 gate
        #real = 10.183.3.115:80 gate
        checktype = connect
        quiescent = no
# 104.  Standard DICOM port
virtual = 10.183.3.112:104
        ## IMPORTANT.  The following directives for the
        ## above virtual/service IP definition ***MUST*** be
        ## indented by _at least_ four (4) spaces *OR* a single tab.
        protocol = tcp
        scheduler = rr
        #persistent=600
        real = 10.183.3.113:104 gate
        #real = 10.183.3.114:104 gate
        #real = 10.183.3.115:104 gate
        checktype = ping
        quiescent = no


(I have temporarily disabled two of the RealServers until I get it working with 
just one.)

I have configured the MS loopback adapter on the Windows RealServers with the 
VIP (10.183.3.112) and a netmask of 255.255.255.255.  Port 80 balances fine, at 
least as far as I've tested by refreshing a page in links a few dozen times and 
watching it round-robin between the different servers, so I'm pretty sure the 
basic config is fine.

However, balancing DICOM associations on port 104 does not work.  As far as I 
know, these are just simple TCP connections, so I'm not sure why it is failing.

LBs, RealServers and Clients are all on the same subnet.  I have confirmed that 
sending directly to the RealServer works.

Basically, the data transmission hangs and I see this from tcpdump on the LB:

15:14:26.956058 arp who-has 10.183.3.112 tell 10.183.3.241
15:14:26.956467 arp reply 10.183.3.112 is-at 00:0c:29:fb:40:f1
15:14:26.956483 IP 10.183.3.241.1122 > 10.183.3.112.104: S 
4050535759:4050535759(0) win 65535 <mss 1460,nop,nop,sackOK>
15:14:26.956507 IP 10.183.3.241.1122 > 10.183.3.112.104: S 
4050535759:4050535759(0) win 65535 <mss 1460,nop,nop,sackOK>
15:14:26.956115 arp who-has 10.183.3.241 tell 10.183.3.113
15:14:26.956122 arp reply 10.183.3.241 is-at 00:50:56:99:6c:90
15:14:26.956171 IP 10.183.3.112.104 > 10.183.3.241.1122: S 
1649263224:1649263224(0) ack 4050535760 win 40000 <mss 1460,nop,nop,sackOK>
15:14:26.956173 IP 10.183.3.241.1122 > 10.183.3.112.104: . ack 1 win 65535
15:14:26.956177 IP 10.183.3.241.1122 > 10.183.3.112.104: . ack 1 win 65535
15:14:26.956336 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 1 win 40000
15:14:26.969979 IP 10.183.3.241.1122 > 10.183.3.112.104: P 1:217(216) ack 1 win 
65535
15:14:26.969985 IP 10.183.3.241.1122 > 10.183.3.112.104: P 1:217(216) ack 1 win 
65535
15:14:26.970896 arp who-has 10.183.3.121 tell 10.183.3.113
15:14:26.970992 arp reply 10.183.3.121 is-at 00:50:56:99:78:51
15:14:26.970996 IP 10.183.3.113.1148 > 10.183.3.121.104: S 
4209027996:4209027996(0) win 65535 <mss 1460,nop,nop,sackOK>
15:14:26.970998 IP 10.183.3.121.104 > 10.183.3.113.1148: S 
1556555174:1556555174(0) ack 4209027997 win 64240 <mss 1460,nop,nop,sackOK>
15:14:26.971040 IP 10.183.3.113.1148 > 10.183.3.121.104: . ack 1 win 65535
15:14:26.971313 IP 10.183.3.113.1148 > 10.183.3.121.104: P 1:214(213) ack 1 win 
65535
15:14:27.008491 IP 10.183.3.121.104 > 10.183.3.113.1148: P 1:7(6) ack 214 win 
64027
15:14:27.128736 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 217 win 39784
15:14:27.136524 IP 10.183.3.113.1148 > 10.183.3.121.104: . ack 7 win 65529
15:14:27.136580 IP 10.183.3.121.104 > 10.183.3.113.1148: P 7:188(181) ack 214 
win 64027
15:14:27.136710 IP 10.183.3.112.104 > 10.183.3.241.1122: P 1:185(184) ack 217 
win 39784
15:14:27.146207 IP 10.183.3.241.1122 > 10.183.3.112.104: P 217:365(148) ack 185 
win 65351
15:14:27.146221 IP 10.183.3.241.1122 > 10.183.3.112.104: P 217:365(148) ack 185 
win 65351
15:14:27.146337 IP 10.183.3.113.1148 > 10.183.3.121.104: P 214:362(148) ack 188 
win 65348
15:14:27.150053 IP 10.183.3.241.1122 > 10.183.3.112.104: . 365:3285(2920) ack 
185 win 65351
15:14:27.150073 IP 10.183.3.112 > 10.183.3.241: ICMP 10.183.3.112 unreachable - 
need to frag (mtu 1500), length 556
15:14:27.329941 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 365 win 39636
15:14:27.329948 IP 10.183.3.241.1122 > 10.183.3.112.104: . 3285:6205(2920) ack 
185 win 65351
15:14:27.329965 IP 10.183.3.112 > 10.183.3.241: ICMP 10.183.3.112 unreachable - 
need to frag (mtu 1500), length 556
15:14:27.344068 IP 10.183.3.121.104 > 10.183.3.113.1148: . ack 362 win 63879
15:14:29.510399 IP 10.183.3.241.1122 > 10.183.3.112.104: . 365:1825(1460) ack 
185 win 65351
15:14:29.510421 IP 10.183.3.241.1122 > 10.183.3.112.104: . 365:1825(1460) ack 
185 win 65351
15:14:29.643488 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 1825 win 40000
15:14:29.643493 IP 10.183.3.241.1122 > 10.183.3.112.104: . 1825:4745(2920) ack 
185 win 65351
15:14:29.643507 IP 10.183.3.112 > 10.183.3.241: ICMP 10.183.3.112 unreachable - 
need to frag (mtu 1500), length 556
15:14:32.154209 arp who-has 10.183.3.241 tell 10.183.3.8
15:14:32.154336 arp reply 10.183.3.241 is-at 00:50:56:99:6c:90
15:14:32.884189 IP 10.183.3.8.36599 > 10.183.3.113.80: S 497646369:497646369(0) 
win 5840 <mss 1460,sackOK,timestamp 413727 0,nop,wscale 7>
15:14:32.884272 IP 10.183.3.113.80 > 10.183.3.8.36599: S 
2028145611:2028145611(0) ack 497646370 win 16384 <mss 1460,nop,wscale 
0,nop,nop,timestamp 0 0,nop,nop,sackOK>
15:14:32.884325 IP 10.183.3.8.36599 > 10.183.3.113.80: . ack 1 win 46 
<nop,nop,timestamp 413727 0>
15:14:32.884402 IP 10.183.3.8.36599 > 10.183.3.113.80: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 413727 0>
15:14:32.884457 IP 10.183.3.113.80 > 10.183.3.8.36599: . ack 2 win 65535 
<nop,nop,timestamp 14463 413727>
15:14:32.884588 IP 10.183.3.113.80 > 10.183.3.8.36599: F 1:1(0) ack 2 win 65535 
<nop,nop,timestamp 14463 413727>
15:14:32.884596 IP 10.183.3.8.36599 > 10.183.3.113.80: . ack 2 win 46 
<nop,nop,timestamp 413727 14463>
15:14:32.884943 IP 10.183.3.8 > 10.183.3.113: ICMP echo request, id 4333, seq 
1, length 72
15:14:32.885001 IP 10.183.3.113 > 10.183.3.8: ICMP echo reply, id 4333, seq 1, 
length 72
15:14:33.994888 IP 10.183.3.241.1122 > 10.183.3.112.104: . 1825:3285(1460) ack 
185 win 65351
15:14:33.994901 IP 10.183.3.241.1122 > 10.183.3.112.104: . 1825:3285(1460) ack 
185 win 65351
15:14:34.169938 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 3285 win 40000
15:14:34.169944 IP 10.183.3.241.1122 > 10.183.3.112.104: . 3285:6205(2920) ack 
185 win 65351
15:14:34.169959 IP 10.183.3.112 > 10.183.3.241: ICMP 10.183.3.112 unreachable - 
need to frag (mtu 1500), length 556
15:14:40.924150 IP 10.183.3.113.1143 > 10.181.3.12.80: R 
3625640816:3625640816(0) ack 1539269318 win 0
15:14:42.335482 IP 10.183.3.113.137 > 10.183.3.255.137: NBT UDP PACKET(137): 
QUERY; REQUEST; BROADCAST
15:14:42.335746 arp who-has 10.183.3.113 tell 10.183.3.10
15:14:42.335754 arp who-has 10.183.3.113 tell 10.183.3.10
15:14:42.335757 arp who-has 10.183.3.113 tell 10.183.3.10
15:14:42.335759 arp who-has 10.183.3.113 tell 10.183.3.10
15:14:42.335761 arp reply 10.183.3.113 is-at 00:50:56:99:66:4a
15:14:42.335911 IP 10.183.3.10.137 > 10.183.3.113.137: NBT UDP PACKET(137): 
QUERY; POSITIVE; RESPONSE; UNICAST
15:14:42.336130 IP 10.183.3.10.137 > 10.183.3.113.137: NBT UDP PACKET(137): 
QUERY; POSITIVE; RESPONSE; UNICAST
15:14:42.963860 IP 10.183.3.241.1122 > 10.183.3.112.104: . 3285:4745(1460) ack 
185 win 65351
15:14:42.963874 IP 10.183.3.241.1122 > 10.183.3.112.104: . 3285:4745(1460) ack 
185 win 65351
15:14:43.122349 IP 10.183.3.112.104 > 10.183.3.241.1122: . ack 4745 win 40000
15:14:43.122361 IP 10.183.3.241.1122 > 10.183.3.112.104: . 4745:7665(2920) ack 
185 win 65351
15:14:43.122373 IP 10.183.3.112 > 10.183.3.241: ICMP 10.183.3.112 unreachable - 
need to frag (mtu 1500), length 556



Then it just continues with the 'need to frag' messages indefinitely.
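
One pattern worth noting in the capture above: every 'need to frag' reply 
follows a segment whose payload is 2920 bytes, twice the negotiated MSS of 
1460, while the 1460-byte retransmissions are acknowledged normally.  A 
possible explanation (an assumption, not something the capture proves) is 
some form of segment coalescing on the virtual NICs before the packet reaches 
IPVS.  The sketch below pulls such oversized segments out of tcpdump text 
output; it assumes the old-style output format shown above, and the names 
oversized_segments and SAMPLE are hypothetical, chosen only for illustration.

```python
import re

# Matches the "start:end(length)" sequence field of old-style tcpdump TCP
# output, e.g. "365:3285(2920)".
SEQ_RE = re.compile(r"(\d+):(\d+)\((\d+)\)")
# Matches the MSS option announced in a SYN, e.g. "mss 1460".
MSS_RE = re.compile(r"mss (\d+)")


def oversized_segments(lines, mss):
    """Return (seq_start, seq_end, length) for each segment longer than mss."""
    hits = []
    for line in lines:
        m = SEQ_RE.search(line)
        if m and int(m.group(3)) > mss:
            hits.append((int(m.group(1)), int(m.group(2)), int(m.group(3))))
    return hits


# A few lines copied from the capture above, with the mail wrapping undone.
SAMPLE = [
    "15:14:26.956483 IP 10.183.3.241.1122 > 10.183.3.112.104: S "
    "4050535759:4050535759(0) win 65535 <mss 1460,nop,nop,sackOK>",
    "15:14:27.146207 IP 10.183.3.241.1122 > 10.183.3.112.104: P "
    "217:365(148) ack 185 win 65351",
    "15:14:27.150053 IP 10.183.3.241.1122 > 10.183.3.112.104: . "
    "365:3285(2920) ack 185 win 65351",
    "15:14:29.510399 IP 10.183.3.241.1122 > 10.183.3.112.104: . "
    "365:1825(1460) ack 185 win 65351",
]

# Take the MSS from the client's SYN, then flag anything larger.
mss = int(MSS_RE.search(SAMPLE[0]).group(1))
result = oversized_segments(SAMPLE, mss)
for start, end, length in result:
    print(f"segment {start}:{end} carries {length} bytes, MSS is {mss}")
```

Run against a full saved capture (one packet per line), this would list only 
the segments that exceed the MSS, which makes it easier to line them up with 
the ICMP replies.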


I had a bit of a look around on Google and the list archives, but all the 
postings I could find were referring to using LVS-TUN, not LVS-DR.

Has anyone seen this problem before?  I'm assuming it has something to do with 
the larger data transfers of a DICOM association needing packets to be 
fragmented, where the smaller HTTP requests do not, but surely that shouldn't 
be a problem with all hosts on the same VLAN?
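
For reference, the size arithmetic behind the 'need to frag (mtu 1500)' 
message can be sketched as follows.  This assumes plain IPv4 and TCP headers 
of 20 bytes each with no options, which matches the data segments in the 
capture (no timestamps were negotiated on that connection); the function 
names are hypothetical.

```python
MTU = 1500       # value reported in the ICMP 'need to frag' message
IP_HEADER = 20   # IPv4 header without options (assumption)
TCP_HEADER = 20  # TCP header without options (assumption)


def ip_packet_size(payload):
    """Total IP packet size for a TCP segment carrying `payload` bytes."""
    return payload + IP_HEADER + TCP_HEADER


def fits(payload, mtu=MTU):
    """True if the resulting IP packet fits in one frame of the given MTU."""
    return ip_packet_size(payload) <= mtu


# Payload sizes seen in the capture: a small PDU, one full MSS, and two MSS.
for payload in (148, 1460, 2920):
    print(payload, ip_packet_size(payload), fits(payload))
```

A 1460-byte payload gives exactly 1500 bytes and fits; a 2920-byte payload 
gives 2960 bytes and cannot be forwarded in one frame without fragmentation.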

Cheers,
Chris
-- 
Christopher Smith

UNIX Team Leader
Nighthawk Radiology Services
Limmatquai 4, 6th Floor
8001 Zurich, Switzerland
http://www.nighthawkrad.net
Sydney Fax:    +61 2 8211 2333
Zurich Fax:    +41 43 497 3301
USA Toll free:  866 241 6635

Email:         csmith@xxxxxxxxxxxxxxxx
IP Extension:  8163
Sydney Phone:  +61 2 8211 2363
Sydney Mobile: +61 4 0739 7563
Zurich Phone:  +41 44 267 3363
Zurich Mobile: +41 79 550 2715

All phones forwarded to my current location, however, please consider the local 
time in Zurich before calling from abroad.


CONFIDENTIALITY NOTICE:   This email, including any attachments, contains 
information from NightHawk Radiology Services, which may be confidential or 
privileged. The information is intended to be for the use of the individual or 
entity named above. If you are not the intended recipient, be aware that any 
disclosure, copying, distribution or use of the contents of this information is 
prohibited. If you have received this email in error, please notify NightHawk 
Radiology Services immediately by forwarding message to 
postmaster@xxxxxxxxxxxxxxxx and destroy all electronic and hard copies of the 
communication, including attachments.



_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
