Re: [lvs-users] connection broken after 2MB of data transmitted

To: <ja@xxxxxx>
Subject: Re: [lvs-users] connection broken after 2MB of data transmitted
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: <Robert.Grange@xxxxxxxxxxxx>
Date: Fri, 18 May 2018 11:26:55 +0000
Hi Julian,

Here are the 3 anonymized traces (IP, MAC and port scrambled, no payload):

        mq_00001_20180518102401_onlyip_192.168.0.1_anon         at frame 3430
        director_00001_20180518102356_onlyip_192.168.0.1_anon   at frame 3428
                (but with TCP Retransmit and TCP Dup Ack from the beginning)
        client_00001_20180518102350_anon                        at frame 4287

I also tried turning off GSO and GRO:

        ethtool -K ETH gso off
        ethtool -K ETH gro off

but it didn't help.
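
To confirm that the two settings really took effect, the output of `ethtool -k ETH` (lowercase `-k` lists the current state) can be checked. Below is a minimal sketch, assuming the usual `name: on|off [fixed]` layout of that output. The names `parse_offloads` and `still_enabled` and the sample text are illustrative only and are not taken from the director.

```python
def parse_offloads(ethtool_k_output):
    """Parse `ethtool -k ETH` text into {feature_name: True/False}."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        words = value.split()
        # Skip the "Features for ETH:" header, which carries no on/off value.
        if not words or words[0] not in ("on", "off"):
            continue
        features[name] = (words[0] == "on")
    return features


def still_enabled(features, wanted_off=("generic-segmentation-offload",
                                        "generic-receive-offload")):
    """Return the offloads from wanted_off that are still switched on."""
    return [name for name in wanted_off if features.get(name)]


# Illustrative sample, not real output from the director.
SAMPLE = """Features for ETH:
rx-checksumming: on
tcp-segmentation-offload: on
\ttx-tcp-segmentation: on
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

print(still_enabled(parse_offloads(SAMPLE)))
```

On the director, the text could come from `subprocess.run(["ethtool", "-k", "ETH"], capture_output=True, text=True).stdout`, with ETH replaced by the real interface name. An empty list means both offloads are reported as off.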

On the client, arp -a shows the correct MAC for the VIP (that of the active 
director).

Robert

-----Original Message-----
From: Julian Anastasov [mailto:ja@xxxxxx] 
Sent: Thursday, 17 May 2018 22:02
To: Grange Robert, INI-ONE-CIS-GSV-MFS <Robert.Grange@xxxxxxxxxxxx>
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] connection broken after 2MB of data transmitted


        Hello,

On Mon, 14 May 2018, Robert.Grange@xxxxxxxxxxxx wrote:

> We are using this tool in one of our projects, and we are facing a disconnect 
> after every ~2 MB of data transferred.
> 
> rhel 7n
> ipvsadm.x86_64        1.27-7.el7
> keepalived.x86_64     1.3.5-1.el7
> kernel.x86_64         3.1nnN0.0-514.21.1.el7
> 
> 
> Our configuration:
>             VIP       10.1.1.130
>             LB1       10.1.1.131        Virtual Server keepalived Active
>             LB2       10.1.1.132        Virtual Server keepalived backup
>             MQ1      10.1.1.151        Real Server MQ Active
>             MQ2      10.1.1.152        Real Server MQ Standby
> 
> Our keepalived.conf (simplified)
> global_defs {
>   notification_email {
>     blablalba@xxxxxxxxxx
>   }
>   notification_email_from blablalba@xxxxxxxxxx
>   smtp_server sysmail.mymail.com
>   smtp_connect_timeout 30
> }
> 
> vrrp_instance vi_y-maas {
>   state BACKUP
>   virtual_router_id 100
>   interface ens32
>   priority 150
>   advert_int 5
>   nopreempt
>   smtp_alert
>   virtual_ipaddress {
>     10.1.1.130/25
>   }
> }
> 
> # My MQ
> virtual_server 10.1.1.130 1423 {
>   delay_loop 2
>   protocol TCP
>   lb_algo rr
>   lb_kind DR
> 
>   real_server 10.1.1.151 1423 {
>     weight 10
>     TCP_CHECK {
>     }
>   }
>   real_server 10.1.1.152 1423 {
>     weight 10
>     TCP_CHECK {
>     }
>   }
> }
> 
> On MQ1 and MQ2, we have added ARP rules (due to Direct Routing).
> 
> On MQ1:
> :INPUT ACCEPT
> :OUTPUT ACCEPT
> :FORWARD ACCEPT
> -A INPUT -j DROP -d 10.1.1.130
> -A OUTPUT -j mangle -s 10.1.1.130 --mangle-ip-s 10.1.1.151
> 
> And on MQ2:
> :INPUT ACCEPT
> :OUTPUT ACCEPT
> :FORWARD ACCEPT
> -A INPUT -j DROP -d 10.1.1.130
> -A OUTPUT -j mangle -s 10.1.1.130 --mangle-ip-s 10.1.1.152
> 
> MQ1 and MQ2 also have the VIP address as a secondary address on the 
> interface:
> 2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>     link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
>     inet 10.1.1.151/25 brd 10.1.1.255 scope global ens32
>        valid_lft forever preferred_lft forever
>     inet 10.1.1.130/25 scope global secondary ens32
>        valid_lft forever preferred_lft forever
> 
> This lets us direct traffic to the active MQ without intervention: 
> if the active MQ fails, the standby takes over, and the LB detects 
> that MQ1 is down and MQ2 is up.
> 
> My problem
> 
> When reading messages (~8'000) from MQ through the VIP, the program 
> can read ~2 MB, then the connection breaks. The Wireshark trace shows 
> 5 TCP retransmits, with increasing delay between them, between the IP 
> where the application program runs and the VIP address. At the same 
> time, there is no more traffic from LB1 (active LB) to MQ1 
> (active MQ).

        It would be useful to see a trace from just before the retransmissions 
start, taken on the client, the director and the real server:

tcpdump -lnnnv -i any -s 0 port 1423 or icmp

        If you prefer, you can scramble the addresses; we care about things like 
checksums, packet sizes and PMTU (ICMP errors?).
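
As a rough aid for going through such a trace, the text output of the tcpdump command above can be scanned for the three things mentioned: IP packets longer than the MTU, checksums that tcpdump marks as incorrect, and ICMP "need to frag" errors. This is only a sketch, assuming the usual `tcpdump -v` IPv4 line layout. The function name `scan_trace`, the default MTU of 1500 (taken from the `ip addr` output quoted earlier) and the sample lines, including the client address 10.1.1.10, are made up for illustration.

```python
import re

# IPv4 header line printed by `tcpdump -v`, e.g.
# "... IP (tos 0x0, ttl 64, ..., proto TCP (6), length 1500)"
IP_HEADER = re.compile(r"\bIP \(.*\bproto \w+ \(\d+\), length (\d+)\)")


def scan_trace(text, mtu=1500):
    """Collect line numbers of suspicious entries in tcpdump -v text."""
    findings = {"oversized": [], "bad_checksum": [], "frag_needed": []}
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = IP_HEADER.search(line)
        if match and int(match.group(1)) > mtu:
            findings["oversized"].append((lineno, int(match.group(1))))
        if "(incorrect" in line:          # "cksum 0x.... (incorrect -> 0x....)"
            findings["bad_checksum"].append(lineno)
        if "need to frag" in line:        # ICMP fragmentation-needed error
            findings["frag_needed"].append(lineno)
    return findings


# Hypothetical sample lines, not taken from the attached traces.
SAMPLE = """\
10:23:50.000001 IP (tos 0x0, ttl 64, id 1, offset 0, flags [DF], proto TCP (6), length 1500)
    10.1.1.151.1423 > 10.1.1.10.40000: Flags [.], cksum 0x1a2b (correct), seq 1:1449, ack 1, win 229, length 1448
10:23:50.000002 IP (tos 0x0, ttl 64, id 2, offset 0, flags [DF], proto TCP (6), length 2948)
    10.1.1.10.40000 > 10.1.1.130.1423: Flags [.], cksum 0x3c4d (incorrect -> 0x5e6f), seq 1:2897, ack 1, win 229, length 2896
10:23:50.000003 IP (tos 0xc0, ttl 64, id 3, offset 0, flags [none], proto ICMP (1), length 576)
    10.1.1.131 > 10.1.1.10: ICMP 10.1.1.130 unreachable - need to frag (mtu 1500), length 556
"""

print(scan_trace(SAMPLE))
```

Each hit is only a pointer to frames worth a closer look. Incorrect checksums and oversized packets seen on the capturing host itself are often just an effect of checksum or segmentation offload, so they need to be compared across the three capture points.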

        Also, you can try to stop GRO/GSO on the director:

ethtool -K ETH gso off
ethtool -K ETH gro off

        On the client, check with arp -an that the MAC for the VIP is correct, 
just to be sure.
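
The point of this check in a DR setup is that the client must resolve the VIP to the director and not to one of the real servers, which also carry the VIP. A small sketch of the comparison, assuming the common Linux `arp -an` line layout `? (IP) at MAC [ether] on IFACE`. The helper names and the MAC addresses in the sample are hypothetical.

```python
import re

# Matches e.g. "? (10.1.1.130) at 02:00:00:00:00:31 [ether] on ens32".
# Incomplete entries ("at <incomplete>") do not match and are ignored.
ARP_LINE = re.compile(r"\((?P<ip>[0-9.]+)\) at (?P<mac>[0-9A-Fa-f:]{17})\b")


def mac_for(arp_output, ip):
    """Return the MAC listed for ip in `arp -an` text, or None."""
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("ip") == ip:
            return match.group("mac").lower()
    return None


def vip_points_to_director(arp_output, vip, director_mac):
    """True if the client's ARP entry for the VIP is the director's MAC."""
    return mac_for(arp_output, vip) == director_mac.lower()


# Hypothetical client ARP table; the MACs are invented for the example.
SAMPLE = """\
? (10.1.1.131) at 02:00:00:00:00:31 [ether] on ens32
? (10.1.1.130) at 02:00:00:00:00:31 [ether] on ens32
? (10.1.1.151) at 02:00:00:00:00:51 [ether] on ens32
? (10.1.1.129) at <incomplete> on ens32
"""

print(vip_points_to_director(SAMPLE, "10.1.1.130", "02:00:00:00:00:31"))
```

If the function returns False because the VIP maps to the MAC of MQ1 or MQ2, the ARP rules on the real servers would be the first thing to recheck.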

> With the same program and the same read of messages, connecting directly 
> to MQ1 works without problems.
> 
> Could this be a problem related to keepalived or to linux-lvs itself?
> 
> Many thanks and regards
> Robert

Regards

--
Julian Anastasov <ja@xxxxxx>

Attachment: traces.zip
Description: traces.zip

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users