Re: [lvs-users] LVS-DR and scp

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] LVS-DR and scp
From: Scooter Morris <scooter@xxxxxxxxxxxx>
Date: Wed, 02 Dec 2009 22:01:15 -0800

Hi Simon,
     Thanks for the response.  My answers to your questions are inline
below.

On 12/02/2009 08:28 PM, Simon Horman wrote:
> On Wed, Dec 02, 2009 at 07:25:02PM -0800, Scooter Morris wrote:
>    
>> OK, I've spent a bunch of time looking at this in more detail, and it
>> looks like I've got an MTU/ICMP problem.  Here is a tcpdump between a
>> client and the cluster taken from the client:
>>
>> 19:09:42.102169 IP client.46508 > cluster.ssh: . 123430:126326(2896) ack
>> 2318 win 190 <nop,nop,timestamp 1066530312 96022114>
>> 19:09:42.102538 IP cluster > client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.302789 IP client.46508 > cluster.ssh: . 91574:93022(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530513 96022114>
>> 19:09:42.303138 IP cluster.ssh > client.46508: . ack 93022 win 479
>> <nop,nop,timestamp 96022315 1066530513,nop,nop,sack 1 {94470:97366}>
>> 19:09:42.303158 IP client.46508 > cluster.ssh: P 126326:129222(2896) ack
>> 2318 win 190 <nop,nop,timestamp 1066530513 96022315>
>> 19:09:42.303533 IP cluster > client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.503791 IP client.46508 > cluster.ssh: . 93022:94470(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530714 96022315>
>> 19:09:42.504147 IP cluster.ssh > client.46508: . ack 97366 win 479
>> <nop,nop,timestamp 96022516 1066530714>
>> 19:09:42.504168 IP client.46508 > cluster.ssh: . 97366:98814(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530714 96022516>
>> 19:09:42.504176 IP client.46508 > cluster.ssh: . 98814:100262(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530714 96022516>
>> 19:09:42.504528 IP cluster > lclient: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.704792 IP client.46508 > cluster.ssh: . 97366:98814(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530915 96022516>
>> 19:09:42.705142 IP cluster.ssh > client.46508: . ack 98814 win 501
>> <nop,nop,timestamp 96022717 1066530915>
>> 19:09:42.705162 IP client.46508 > cluster.ssh: . 98814:100262(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530915 96022717>
>> 19:09:42.705171 IP client.46508 > cluster.ssh: . 100262:101710(1448) ack
>> 2318 win 190 <nop,nop,timestamp 1066530915 96022717>
>> 19:09:42.705528 IP cluster > client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>>      
> [snip]
>
>    
>> So, it looks to me like something is going wrong with ICMP or path
>> MTU discovery between the client and the redirector, but this is using
>> LVS-DR, so this shouldn't happen like it does with LVS-TUN, right?  I've
>> pored over the HOWTO and done several Google searches, but the solution
>> to this still eludes me.  As another data point, this only happens when
>> I scp data to the cluster, but when I pull data from the cluster using
>> scp, I get great performance.
>>      
> Hi Scooter,
>
> this is very curious. It looks a lot like the problem detailed at [1]
> that was reported a few months ago but IIRC the cause was never
> completely diagnosed.
>
>    
Yes, I read through that problem report in some detail.
> [1] 
> http://www.gossamer-threads.com/lists/lvs/users/22506?do=post_view_threaded#22506
>
> I also suspect that something is going on with ICMP. It seems curious
> that the client continues to send large (2896 byte) packets even after
> receiving an ICMP message from the director (cluster).
>
> * Are client and cluster on the same L2 (ethernet?) network?
>    
No.  Both are on our campus, but the client is in my office and is on a 
different subnet than either the LVS servers or the cluster.  The LVS 
servers and the cluster /are/ connected to the same Cisco GigE switch, 
however.
> * What is the MTU on the relevant interface on client
>    (and cluster for that matter).
>    
Everything is 1500.  I even forced the MTU of the loopback interface on 
the cluster to 1500 to see if that might be the problem, but it didn't 
have any impact.
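
As a sanity check on the sizes in the trace, here is a minimal Python
sketch of the arithmetic.  It assumes plain IPv4 with no IP options
(20 bytes), a 20-byte TCP header, and the 12-byte timestamp option
(nop,nop,timestamp) visible in the trace; the function names are made
up for illustration.

```python
# Assumed header sizes: IPv4 without options, TCP with only the
# timestamp option (10 bytes of option plus 2 NOP padding bytes).
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def ip_packet_size(tcp_payload: int) -> int:
    """Total IP packet size for a TCP segment carrying tcp_payload bytes."""
    return IP_HEADER + TCP_HEADER + TCP_TIMESTAMP_OPTION + tcp_payload


def exceeds_mtu(tcp_payload: int, mtu: int = 1500) -> bool:
    """True if the resulting IP packet would not fit in the given MTU."""
    return ip_packet_size(tcp_payload) > mtu


if __name__ == "__main__":
    # The two payload sizes that appear in the tcpdump output.
    for payload in (1448, 2896):
        print(payload, ip_packet_size(payload), exceeds_mtu(payload))
```

Under those assumptions the 1448-byte segments fill a 1500-byte MTU
exactly, while the 2896-byte segments (2 x 1448) would need a 2948-byte
packet, which is consistent with the "need to frag (mtu 1500)" replies.
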
> * What OS is client running (out of interest)?
>    
Fedora Core 11, although I can reproduce it from a number of other 
hosts, including a Tru64 Alpha Cluster (remember those?).  
Interestingly, the problem doesn't materialize if I scp from my FC 11 
system at home, so there does seem to be a topological factor somewhere.
> Ideally I'd like to try and reproduce the problem,
> though IIRC I failed in that regard the last time this
> problem was reported.
>
>
>    
That would be great.  Let me know if there is anything I can do to try 
to debug it further from my end.
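
In case it helps with comparing captures, here is a rough Python sketch
that summarizes a trace in the tcpdump text format quoted above.  It
counts data segments that would not fit in the MTU and the ICMP "need
to frag" replies.  The helper names and the 52-byte header overhead
(IPv4 + TCP + timestamp option) are my assumptions, and it only handles
the line shapes seen in this trace.

```python
import re

TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ ")
SEGMENT = re.compile(r"\d+:\d+\((\d+)\)")          # e.g. 123430:126326(2896)
NEED_FRAG = re.compile(r"need to\s+frag \(mtu (\d+)\)")
HEADER_OVERHEAD = 52  # assumed: 20 IP + 20 TCP + 12 timestamp option


def unwrap(text):
    """Strip mail quote markers and rejoin records wrapped over two lines."""
    records = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)          # a new tcpdump record
        else:
            records[-1] += " " + line     # continuation of the previous one
    return records


def summarize(text, mtu=1500):
    """Count oversized data segments and ICMP need-to-frag messages."""
    oversized = need_frag = 0
    for record in unwrap(text):
        if NEED_FRAG.search(record):
            need_frag += 1
            continue
        match = SEGMENT.search(record)
        if match and int(match.group(1)) + HEADER_OVERHEAD > mtu:
            oversized += 1
    return {"oversized_segments": oversized, "need_frag_icmp": need_frag}


sample = """\
>> 19:09:42.102169 IP client.46508>  cluster.ssh: . 123430:126326(2896) ack
>> 2318 win 190<nop,nop,timestamp 1066530312 96022114>
>> 19:09:42.102538 IP cluster>  client: ICMP cluster unreachable - need to
>> frag (mtu 1500), length 556
>> 19:09:42.302789 IP client.46508>  cluster.ssh: . 91574:93022(1448) ack
>> 2318 win 190<nop,nop,timestamp 1066530513 96022114>
"""

if __name__ == "__main__":
    print(summarize(sample))
```

Running the same summary on a capture taken from a host that does not
show the problem would give a quick side-by-side comparison.
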

Thanks!

-- scooter

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users