
Re: [lvs-users] LVS-DR Cluster Some Real Servers Stuck in SYN_RECV

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] LVS-DR Cluster Some Real Servers Stuck in SYN_RECV
From: Bruce Rudolph <brudolph@xxxxxxxxxxx>
Date: Wed, 05 Mar 2014 13:24:25 -0500

One more follow-up to see if there are any other suggestions.

Yesterday I added a sixth real server to the cluster. All of these
servers are identical bare-metal machines, and I installed and
configured the new one exactly like the others. It fails in the same
way: a request to the VIP causes the real server to send a SYN-ACK in
response to the SYN, but the client never sees it. The one working
server, of the same type, continues to respond correctly!
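
For anyone comparing captures, the symptom above (SYN sent, SYN-ACK
never seen by the client) can be picked out of a client-side capture
with a small script. This is only a sketch: it assumes the one-line
output of "tcpdump -n tcp" (no -v, no -e), and the function name, the
state names, and the addresses in the example are made up for
illustration.

```python
import re

# Assumed input: one-line `tcpdump -n tcp` output such as
# "18:21:12.1 IP 198.51.100.7.62628 > 203.0.113.10.80: Flags [S], seq 1, ..."
LINE = re.compile(r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]")


def handshake_states(lines, vip_port):
    """Map each client endpoint to how far its handshake with vip_port got.

    "syn_only"    : client sent a SYN, no SYN-ACK was captured
    "synack_seen" : a SYN-ACK was captured, but no ACK from the client
    "established" : the three-way handshake completed
    """
    states = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        src, dst, flags = m["src"], m["dst"], m["flags"]
        if dst == vip_port and flags == "S":
            # Retransmitted SYNs must not reset a later state.
            states.setdefault(src, "syn_only")
        elif src == vip_port and flags == "S." and dst in states:
            states[dst] = "synack_seen"
        elif dst == vip_port and flags == "." and states.get(src) == "synack_seen":
            states[src] = "established"
    return states


if __name__ == "__main__":
    # Hypothetical capture: documentation addresses, not the real VIP/CIP.
    capture = [
        "18:21:12.1 IP 198.51.100.7.62628 > 203.0.113.10.80: Flags [S], seq 1, length 0",
        "18:21:13.1 IP 198.51.100.7.62628 > 203.0.113.10.80: Flags [S], seq 1, length 0",
        "18:21:20.0 IP 198.51.100.7.62700 > 203.0.113.10.80: Flags [S], seq 5, length 0",
        "18:21:20.1 IP 203.0.113.10.80 > 198.51.100.7.62700: Flags [S.], seq 9, ack 6, length 0",
        "18:21:20.2 IP 198.51.100.7.62700 > 203.0.113.10.80: Flags [.], ack 10, length 0",
    ]
    for client, state in handshake_states(capture, "203.0.113.10.80").items():
        print(client, state)
```

Run against the client capture, a connection that was scheduled to a
failing real server would show up as "syn_only", since the SYN-ACK never
arrives there. The same script run against a capture taken on the real
server would show "synack_seen" for that connection.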

Today I reconfigured a non-working server to use Direct Routing via the
arptables_jf technique and tried a request; it failed. Then I
reconfigured the working server to use arptables_jf and it worked. So
the failure persists on all five bad servers with either DR
configuration, while the one good server works with both.

I doubt five servers can have a hardware problem with their NICs.

The cloud vendor has checked their smart switches and states that they
are working fine.

Thanks for listening and for any suggestions you may have.

Regards,
Bruce

On 3/3/14 1:54 PM, Bruce Rudolph wrote:
> On the failing real servers the response is sent but is never received 
> by the client (e4:11:5b:ae:f9:e5). On the working server the response 
> is sent and the client gets it and sends an ACK and the connection is 
> open.
>
> I run tcpdump on the client (my Mac for the testing) and that is how I 
> know that the SYN-ACK packet is not received from the failing real 
> servers.
>
> This is the mind-boggling thing: where are the packets going? Could
> it be a smart switch in the cloud environment? If so, then why would
> one server out of five work correctly?
>
> The real servers are not responding to arping. Only the director does.
>
> Bruce
>
> On 3/3/14 12:28 PM, Julian Anastasov wrote:
>>      Hello,
>>
>> On Mon, 3 Mar 2014, Bruce Rudolph wrote:
>>
>>>      18:21:12.346386 Out e4:11:5b:ae:f9:e5 ethertype IPv4 (0x0800),
>>>      length 76: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP
>>>      (6), length 60)
>>>           <VIP>.80 > <CIP>.62628: Flags [S.], cksum 0xf2a9 (correct),
>>>      seq 4207299083, ack 4011092519, win 14480, options [mss
>>>      1460,sackOK,TS val 82369115 ecr 3844971164,nop,wscale 7], length 0
>>      Response is going to e4:11:5b:ae:f9:e5 ? Do
>> you see it reaching there? Also, simple test with
>> client on LAN can reveal the problem, just check with
>> tcpdump on client box. It can show if problem comes
>> from router or from real servers. Sometimes, smart
>> switches can be the culprit too.
>>
>>      Also, check on the real servers (especially the working
>> one) with tcpdump that you don't see the VIP in
>> outgoing ARP packets; only the director should expose the VIP
>> in ARP packets. This can also be checked from a client on
>> the LAN with 'arping -c 1 VIP'; only the director should
>> reply for the VIP.
>>
>> Regards
>>
>> --
>> Julian Anastasov <ja@xxxxxx>
>
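
The ARP check suggested in the quoted reply (only the director should
expose the VIP in ARP packets) can also be run offline over a saved
capture. The sketch below assumes "tcpdump -n -e arp" output in its usual
one-line form; the regular expression, the function name, and all
addresses are illustrative assumptions, not output from this cluster.

```python
import re

# Assumed input: `tcpdump -n -e arp` lines such as
# "18:30:02.0 00:aa:bb:cc:dd:02 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806),
#  length 42: Request who-has 203.0.113.1 tell 203.0.113.10, length 28"
ARP = re.compile(
    r"(?P<src_mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) > \S+ ethertype ARP.*?"
    r"(?:Reply (?P<reply_ip>[\d.]+) is-at|tell (?P<tell_ip>[\d.]+))",
    re.IGNORECASE,
)


def vip_exposers(lines, vip):
    """Return the source MACs of ARP packets that announce `vip`.

    A packet counts if it is a reply for the VIP, or a request whose
    sender address ("tell ...") is the VIP.
    """
    macs = set()
    for line in lines:
        m = ARP.search(line)
        if m and vip in (m["reply_ip"], m["tell_ip"]):
            macs.add(m["src_mac"].lower())
    return macs


def unexpected_exposers(lines, vip, director_mac):
    """MACs other than the director's that expose the VIP in ARP."""
    return vip_exposers(lines, vip) - {director_mac.lower()}
```

An empty result from unexpected_exposers() is consistent with a correct
DR setup; any MAC it returns belongs to a host that is leaking the VIP
in ARP and is worth checking first.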

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
