Hello Matthew,
Recently we've had this affliction where if you go to www.omnovia.com,
everything is super ass slow. But if you go to wwwdb1.omnovia.com (or
wwwdb2) everything is blazing fast.
I'm talking huge differences here. People on a T1 downloading a file
from www are getting around 10KB/sec and that same file from wwwdb1 is
around 110KB/sec.
I reckon www is mapped to the VIP, wwwdb[12] are mapped to the RS? And
we're talking about one file only, correct? Is there a traffic shaper in
between your clients and your servers?
This problem started this morning for the 2nd time. It happened about
2 weeks ago but we did nothing to the setup and the problem seemed to
fix itself. But now that it's happened again we need some answers.
Our config as of right now: (ip addys changed to protect the innocent)
.35 VIP (ip that www.omnovia.com points to)
.50 RS1
.130 RS2
[root@lb1 ~]# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 75.52.166.35:80 rr
-> 75.52.166.50:80 Tunnel 1 12 95
-> 75.52.166.130:80 Tunnel 1 12 107
TCP 75.52.166.35:443 rr
-> 75.52.166.50:443 Tunnel 1 18 1331
-> 75.52.166.130:443 Tunnel 1 28 1351
TCP 75.52.166.35:3306 rr
-> 75.52.166.130:3306 Tunnel 1 4 0
-> 75.52.166.50:3306 Tunnel 1 5 0
We have 1 iptables rule on both RSs to combat the POST packet size issue:
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
TCPMSS tcp -- 75.52.166.35 0.0.0.0/0 tcp flags:0x16/0x12 TCPMSS set 1440
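For readers wondering where 1440 comes from: the rule matches SYN/ACK packets leaving the RS with the VIP as source (mask 0x16 = SYN,RST,ACK, value 0x12 = SYN,ACK) and rewrites the advertised MSS. Presumably the value was picked so that a full-sized client segment still fits in a 1500-byte MTU after the director wraps it in an IPIP header. A minimal sketch of that arithmetic, assuming IPv4 with no IP or TCP options; `tunnel_mss` is a hypothetical helper name, not part of any LVS tool:

```shell
# Sketch: largest TCP MSS that still fits the link MTU once LVS-Tun (IPIP)
# adds its outer header. Assumes IPv4 and no IP/TCP options.
tunnel_mss() {
    mtu=$1
    ipip_overhead=20   # outer IPv4 header added by IPIP encapsulation
    ip_header=20       # inner IPv4 header
    tcp_header=20      # TCP header without options
    echo $((mtu - ipip_overhead - ip_header - tcp_header))
}

# Likely equivalent of the listed rule (an assumption, reconstructed from the
# iptables -L output above, not taken from the poster's actual ruleset):
#   iptables -A OUTPUT -p tcp -s 75.52.166.35 \
#       --tcp-flags SYN,RST,ACK SYN,ACK -j TCPMSS --set-mss 1440
```

Calling `tunnel_mss 1500` gives 1440, which matches the clamp in the rule.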
I've been pretty passive on the LVS list for a couple of months, so what
exactly did I miss with regard to POST packets? Could you send me a link
where I can update myself in this matter?
We've had this in place since Jan 3rd so I don't see how suddenly this
could be causing a problem.
Well, it's all dynamic and certain subtle bugs only show up after some
amount of time; for example memory leaks or some such.
Can anyone offer any suggestions on what to check, look for, diagnose,
etc on what this problem is?
netstat -i
netstat -s
dmesg -s 1000000
grep . /proc/sys/net/ipv4/*
cat /proc/slabinfo
And of course: real time tcpdumps of one flow when it happens.
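Since the problem comes and goes, it may help to snapshot the output of the commands above on the director and on both RSs, once while things are slow and once while they are fine, and then diff the two runs. A rough sketch; `collect_diag` and the file names are made up for illustration:

```shell
# Sketch: save the suggested diagnostics into one directory per run.
# Some commands may be missing or need root; errors are recorded in the
# output files instead of aborting the run.
collect_diag() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    netstat -i                  > "$outdir/netstat-i.txt"   2>&1 || true
    netstat -s                  > "$outdir/netstat-s.txt"   2>&1 || true
    dmesg -s 1000000            > "$outdir/dmesg.txt"       2>&1 || true
    grep . /proc/sys/net/ipv4/* > "$outdir/sysctl-ipv4.txt" 2>&1 || true
    cat /proc/slabinfo          > "$outdir/slabinfo.txt"    2>&1 || true
    return 0
}
```

Usage would be something like `collect_diag /tmp/diag-slow` during an incident and `collect_diag /tmp/diag-ok` afterwards, followed by `diff -r /tmp/diag-ok /tmp/diag-slow`. For the tcpdump part, something along the lines of `tcpdump -n -i eth0 -s 0 -w flow.pcap host <client-ip> and port 80` captures one client's flow; the interface name and client address are placeholders.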
I'd also add that my connection from home doesn't have this problem.
From home, www, wwwdb1, wwwdb2 are all blazing fast. But we just had a
customer call from Chicago who was getting slow speeds and here in our
office it's slow as well to www but not to db1, db2.
Is this reproducible? If so, could you check your MSS sizes in your
routing cache? BTW, from here in Switzerland all three VIP, RS1 and RS2
access are not too fast either. And from China it's dog slow, and from
my account in the US it's rather fast:
(CH) # tracepath www.omnovia.com
1: 192.168.1.32 (192.168.1.32) 0.366ms
pmtu 1500
1: 192.168.1.1 (192.168.1.1) 1.736ms
2: 212.55.210.209 (212.55.210.209) asymm 3 2.252ms
3: zhalb-gw1-fe00-1.cyberlink.ch (195.226.12.1) 11.587ms
4: zhalb-cr1.cyberlink.ch (212.55.192.145) 31.591ms
5: glbix-br1.cyberlink.ch (212.55.192.198) 94.863ms
6: pos5-0.gw4.zur4.alter.net (139.4.71.37) 47.533ms
7: so-3-0-0.XR2.ZUR4.ALTER.NET (146.188.4.193) asymm 8 95.024ms
8: so-1-0-0.TR2.ZUR3.ALTER.NET (146.188.5.133) asymm 9 47.408ms
9: so-2-0-0.IR2.NYC12.ALTER.NET (146.188.8.178) asymm 10 141.743ms
10: 0.so-1-0-0.IL2.NYC9.ALTER.NET (152.63.23.69) asymm 11 99.661ms
11: 0.so-7-0-0.XL4.NYC4.ALTER.NET (152.63.17.97) asymm 12 100.214ms
12: 0.ge-5-1-0.BR2.NYC4.ALTER.NET (152.63.3.122) asymm 13 99.517ms
13: 204.255.173.54 (204.255.173.54) asymm 12 104.101ms
14: ae-32-56.ebr2.NewYork1.Level3.net (4.68.97.190) asymm 12 120.824ms
15: ae-1-100.ebr1.NewYork1.Level3.net (4.69.132.25) asymm 12 112.435ms
16: ae-1-100.ebr1.Washington1.Level3.net (4.69.132.29) asymm 12 116.605ms
17: ae-2.ebr1.Atlanta2.Level3.net (4.69.132.85) asymm 12 131.241ms
18: ae-14-53.car4.Dallas1.Level3.net (4.68.122.80) asymm 12 148.308ms
19: ae-14-55.car4.Dallas1.Level3.net (4.68.122.144) asymm 12 140.875ms
20: THE-PLANET.car4.Dallas1.Level3.net (4.71.122.2) asymm 13 177.893ms
21: te7-2.dsr02.dllstx3.theplanet.com (70.87.253.26) asymm 14 173.259ms
22: vl2.car02.dllstx6.theplanet.com (12.96.160.55) asymm 14 180.037ms
23: vl2.car02.dllstx6.theplanet.com (12.96.160.55) asymm 15 191.469ms
24: 23.a6.344a.static.theplanet.com (74.52.166.35) asymm 15 176.687ms reached
Resume: pmtu 1500 hops 24 back 15
Please note that my PMTU is set to 1500 for all 24 hops!
(US) # /usr/sbin/tracepath www.omnovia.com
1?: [LOCALHOST] pmtu 1500
1: virt9.johncompanies.com (69.55.226.161) 0.287ms
2: 69-55-233-156.in-addr.arpa.johncompanies.com (69.55.233.156) asymm 3 0.803ms
3: 69-55-233-161.in-addr.arpa.johncompanies.com (69.55.233.161) asymm 2 1.270ms
4: 69.43.129.83 (69.43.129.83) asymm 5 1.356ms
5: ge0-0-ext-4.castleaccess.com (69.43.169.68) asymm 6 1.894ms
6: ge-5-1-123.hsa1.SanDiego1.Level3.net (4.79.33.253) asymm 14 7.518ms
7: so-6-1-0.mp2.SanDiego1.Level3.net (4.68.113.37) asymm 15 7.288ms
8: ae-0-0.bbr2.Dallas1.Level3.net (64.159.1.110) asymm 13 39.331ms
9: ae-24-56.car4.Dallas1.Level3.net (4.68.122.176) asymm 13 56.716ms
10: THE-PLANET.car4.Dallas1.Level3.net (4.71.122.2) asymm 15 47.689ms
11: te7-2.dsr02.dllstx3.theplanet.com (70.87.253.26) asymm 14 39.762ms
12: vl22.dsr02.dllstx2.theplanet.com (70.85.127.76) asymm 15 40.307ms
13: vl2.car02.dllstx6.theplanet.com (12.96.160.55) asymm 16 40.022ms
14: 23.a6.344a.static.theplanet.com (74.52.166.35) asymm 17 40.060ms reached
Resume: pmtu 1500 hops 14 back 17
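If you want to compare path MTU from several vantage points (office, home, the customer in Chicago), the final "Resume:" line of each tracepath run is the part to look at. A small sketch that pulls it out; `tracepath_summary` is a hypothetical helper, and it assumes the output format shown above (other tracepath versions may print it differently):

```shell
# Sketch: extract path MTU and hop counts from tracepath's "Resume:" line.
# Reads tracepath output on stdin; assumes the field order
# "Resume: pmtu N hops N back N" as in the traces above.
tracepath_summary() {
    awk '$1 == "Resume:" { print "pmtu=" $3, "hops=" $5, "back=" $7 }'
}
```

For example, `tracepath www.omnovia.com | tracepath_summary` run from the slow office network and from the fast home connection should show whether the two see a different pmtu. On kernels that still have an IPv4 routing cache, `ip route show cache` on the RSs should list the cached MTU per destination, which is where I would look for the MSS/MTU values mentioned earlier.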
Cheers,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc