Hi Simon,
Thanks for the response. See below for the answers to your
questions.
On 12/02/2009 08:28 PM, Simon Horman wrote:
> On Wed, Dec 02, 2009 at 07:25:02PM -0800, Scooter Morris wrote:
>
>> OK, I've spent a bunch of time looking at this in more detail, and it
>> looks like I've got an MTU/ICMP problem. Here is a tcpdump between a
>> client and the cluster taken from the client:
>>
>> 19:09:42.102169 IP client.46508> cluster.ssh: . 123430:126326(2896) ack
>> 2318 win 190<nop,nop,timestamp 1066530312 96022114>
>> 19:09:42.102538 IP cluster> client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.302789 IP client.46508> cluster.ssh: . 91574:93022(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530513 96022114>
>> 19:09:42.303138 IP cluster.ssh> client.46508: . ack 93022 win 479
>> <nop,nop,timestamp 96022315 1066530513,nop,nop,sack 1 {94470:97366}>
>> 19:09:42.303158 IP client.46508> cluster.ssh: P 126326:129222(2896) ack
>> 2318 win 190<nop,nop,timestamp 1066530513 96022315>
>> 19:09:42.303533 IP cluster> client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.503791 IP client.46508> cluster.ssh: . 93022:94470(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530714 96022315>
>> 19:09:42.504147 IP cluster.ssh> client.46508: . ack 97366 win 479
>> <nop,nop,timestamp 96022516 1066530714>
>> 19:09:42.504168 IP client.46508> cluster.ssh: . 97366:98814(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530714 96022516>
>> 19:09:42.504176 IP client.46508> cluster.ssh: . 98814:100262(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530714 96022516>
>> 19:09:42.504528 IP cluster> client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.704792 IP client.46508> cluster.ssh: . 97366:98814(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530915 96022516>
>> 19:09:42.705142 IP cluster.ssh> client.46508: . ack 98814 win 501
>> <nop,nop,timestamp 96022717 1066530915>
>> 19:09:42.705162 IP client.46508> cluster.ssh: . 98814:100262(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530915 96022717>
>> 19:09:42.705171 IP client.46508> cluster.ssh: . 100262:101710(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530915 96022717>
>> 19:09:42.705528 IP cluster> client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>>
> [snip]
>
>
>> So, it looks to me like there is something going on with ICMP or path
>> MTU discovery between the client and the redirector, but this is using
>> LVS-DR, so this shouldn't happen like it does with LVS-TUN, right? I've
>> pored over the HOWTO and done several Google searches, but the solution
>> to this still eludes me. As another data point, this only happens when
>> I scp data to the cluster; when I pull data from the cluster using
>> scp, I get great performance.
>>
> Hi Scooter,
>
> this is very curious. It looks a lot like the problem detailed at [1]
> that was reported a few months ago but IIRC the cause was never
> completely diagnosed.
>
>
Yes, I read through that problem report in some detail.
> [1]
> http://www.gossamer-threads.com/lists/lvs/users/22506?do=post_view_threaded#22506
>
> I also suspect that something is going on with ICMP. It seems curious
> that the client continues to send large (2896 byte) packets even after
> receiving an ICMP message from the director (cluster).
>
> * Are client and cluster on the same L2 (ethernet?) network?
>
No. Both are on our campus, but the client is in my office and is on a
different subnet than either the LVS servers or the cluster. The LVS
servers and the cluster /are/ connected to the same Cisco GigE switch,
however.
> * What is the MTU on the relevant interface on client
> (and cluster for that matter).
>
Everything is 1500. I even forced the MTU of the loopback interface on
the cluster to 1500 to see if that might be the problem, but it didn't
have any impact.
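
As a sanity check on the numbers in the capture above, here is a rough
Python sketch of the arithmetic. It assumes a plain 20-byte IPv4 header,
a 20-byte TCP header, and 12 bytes of TCP options (the nop,nop,timestamp
shown in the trace). The function names are made up for illustration, and
the regex only targets the old-style "seq:seq(len)" tcpdump format seen
here.

```python
import re

IP_HEADER = 20    # IPv4 header without options (assumption)
TCP_HEADER = 20   # TCP header without options (assumption)
TS_OPTION = 12    # nop,nop,timestamp as seen in the capture

def max_payload(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - TCP_HEADER - TS_OPTION

# Matches the "first_seq:next_seq(length)" part of a classic tcpdump TCP line.
SEG_RE = re.compile(r"\b(\d+):(\d+)\((\d+)\)")

def oversized_segments(lines, mtu=1500):
    """Return (first_seq, length) for each segment larger than max_payload(mtu)."""
    limit = max_payload(mtu)
    found = []
    for line in lines:
        m = SEG_RE.search(line)
        if m and int(m.group(3)) > limit:
            found.append((int(m.group(1)), int(m.group(3))))
    return found

# Three lines copied from the capture quoted above.
capture = [
    "19:09:42.102169 IP client.46508> cluster.ssh: . 123430:126326(2896) ack",
    "19:09:42.302789 IP client.46508> cluster.ssh: . 91574:93022(1448) ack",
    "19:09:42.303158 IP client.46508> cluster.ssh: P 126326:129222(2896) ack",
]

print(max_payload(1500))
print(oversized_segments(capture))
```

With a 1500-byte MTU the payload limit works out to 1448 bytes, so the
2896-byte segments are exactly two full-sized payloads. One possible
explanation (an assumption on my part, not something I have confirmed) is
segmentation offload on the client, in which case tcpdump on the sender
would show the segment before the NIC splits it.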
> * What OS is client running (out of interest)?
>
Fedora Core 11, although I can reproduce it from a number of other
hosts, including a Tru64 Alpha Cluster (remember those?).
Interestingly, the problem doesn't materialize if I scp from my FC 11
system at home, so there does seem to be a topological factor somewhere.
> Ideally I'd like to try and reproduce the problem,
> though IIRC I failed in that regard the last time this
> problem was reported.
>
>
>
That would be great. Let me know if there is anything I can do to try
to debug it further from my end.
Thanks!
-- scooter