Hello,
I'm the network guy working through this issue with Phillip.
Phillip Moore wrote:
>
> We do not see any ICMP on the hosts.
> On Fri, Aug 28, 2015 at 2:42 AM, Julian Anastasov <ja@xxxxxx> wrote:
> >
> > On Thu, 27 Aug 2015, Phillip Moore wrote:
> >
> >> I have IPVS setup with 2 VIPs talking to the same real server
> >> configured for direct server return (ie TUN type).
> >> One vip is port 80 http and one vip is 443 for https/SSL. The SSL vip
> >> doesn't work properly. There is initial communication that happens but
> >> then it appears as though IPVS stops tunneling the incoming packets to
> >> the real server and the connection stalls and times out. If I switch
> >> ports to just verify there is nothing crazy going on with filtering
> >> and I put SSL on port 80 (or any port) it still fails.
> >>
> >> I've put the relevant info in a gist in hope it might be helpful and
> >> not clutter up the email.
> >>
> >> https://gist.github.com/realpdm/2118bbaa298ff3debe52
I'm going to have to clutter up the email to annotate the packet
captures, so please bear with me...
> >> In various test scenarios we found that the client is having to
> >> retransmit packets after some initial successful back and forth. On
> >> the IPVS node a tcpdump shows that for some reason IPVS stops
> >> forwarding the packets onto the real server over the tunnel. You can
> >> see in the tcpdump IPVS is forwarding things over ipip just fine until
> >> it stops around line 15 in the dump
> >>
> >> http traffic doesn't do this at all only SSL.
> >>
> >> I'm really puzzled and hope i am missing something obvious. I
> >> appreciate any insights or suggestions.
> >
> > I'm not sure what kind of fixes contain your kernel
> > but the problem should be related to PMTU (ICMP FRAG_NEEDED)
> > or GRO. In your packet trace I don't see ICMP messages,
> > may be there are such packets from real server to client.
> > For example:
> >
> > 17:09:40.820663 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [.], ack
> > 1449, win 137, options [nop,nop,TS val 592982574 ecr 2501301618], length 0
> > 17:09:40.820678 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
> > 10.64.96.10.443: Flags [.], ack 1449, win 137, options [nop,nop,TS val
> > 592982574 ecr 2501301618], length 0 (ipip-proto-4)
> > 17:09:40.820704 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [.], ack
> > 3580, win 160, options [nop,nop,TS val 592982574 ecr 2501301618], length 0
> >
> > See how client acks 1449 and then 3580, i.e. 2131 bytes
> > were sent to client.
>
> I do not understand why on line 15 of the tcpdump you can see a 326
> byte packet is received from the client, but isn't forwarded to the
> real server. There wouldn't be any fragmentation issues with that
> would there? On line 15 you can see it keeps receiving the same packet
> 6 times and fails to forward it on.
I want to echo Phillip that we're not seeing anything that implies MTU
problems. The director is using the same interface for ingress and
egress traffic. The trace that we took from the director seems to be a
smoking gun. I realize this will only show traffic from the client to
the director and from the director to the real server, but it clearly
shows a problem as Phillip points out regarding the 326-byte segment.
| 17:09:40.750074 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [S], seq
| 573101480, win 14600, options [mss 1460,sackOK,TS val 592982504 ecr
| 0,nop,wscale 7], length 0
|
| 17:09:40.750111 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [S], seq 573101480, win 14600, options [mss
| 1460,sackOK,TS val 592982504 ecr 0,nop,wscale 7], length 0
| (ipip-proto-4)
|
| 17:09:40.750780 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [.], ack
| 4261714231, win 115, options [nop,nop,TS val 592982504 ecr
| 2501301549], length 0
|
| 17:09:40.750796 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [.], ack 1, win 115, options [nop,nop,TS val
| 592982504 ecr 2501301549], length 0 (ipip-proto-4)
Initial 3-way handshake is completed. All packets coming into the
director from the client are forwarded to the real server in an ipip
tunnel.
| 17:09:40.819644 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [P.], seq
| 0:77, ack 1, win 115, options [nop,nop,TS val 592982573 ecr
| 2501301549], length 77
|
| 17:09:40.819659 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [P.], seq 0:77, ack 1, win 115, options
| [nop,nop,TS val 592982573 ecr 2501301549], length 77 (ipip-proto-4)
A 77-byte segment is sent from the client to the director and forwarded
as expected.
| 17:09:40.820663 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [.], ack
| 1449, win 137, options [nop,nop,TS val 592982574 ecr 2501301618],
| length 0
|
| 17:09:40.820678 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [.], ack 1449, win 137, options [nop,nop,TS
| val 592982574 ecr 2501301618], length 0 (ipip-proto-4)
The client acknowledges 1448 bytes of data received directly from the
real server.
| 17:09:40.820704 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [.], ack
| 3580, win 160, options [nop,nop,TS val 592982574 ecr 2501301618],
| length 0
| 17:09:40.820705 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [.], ack 3580, win 160, options [nop,nop,TS
| val 592982574 ecr 2501301618], length 0 (ipip-proto-4)
The client acknowledges an additional 2131 bytes of data received
directly from the real server.
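
The byte counts above follow directly from the ack numbers in the trace.
A minimal sketch of that arithmetic, assuming tcpdump's relative
sequence numbering (the first ack is 1) and using a helper name,
acked_bytes, that is purely illustrative:

```python
def acked_bytes(ack_numbers, initial_ack=1):
    """Return how many new bytes each ACK acknowledges.

    ack_numbers are relative ack values as printed by tcpdump, in the
    order they appear in the trace. initial_ack is the ack value at the
    end of the handshake (1 with relative numbering).
    """
    deltas = []
    previous = initial_ack
    for ack in ack_numbers:
        deltas.append(ack - previous)
        previous = ack
    return deltas


# The two ACKs the client sends after its 77-byte request.
print(acked_bytes([1449, 3580]))
```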
| 17:09:40.823431 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [P.], seq
| 77:403, ack 3580, win 160, options [nop,nop,TS val 592982577 ecr
| 2501301618], length 326
This is the 326-byte segment that the director receives from the client
a total of 6 times. There is no corresponding segment outbound from the
director to the real server. The director is eating this segment.
During the time that this segment is being transmitted and
retransmitted, `sudo ipvsadm -Ln -c` shows an ESTABLISHED connection.
When the client ultimately gives up (or I ^C), this packet is sent from
the client to the director:
| 17:09:46.017707 IP 10.240.8.72.60642 > 10.64.96.10.443: Flags [F.], seq
| 403, ack 3580, win 160, options [nop,nop,TS val 592987771 ecr
| 2501301618], length 0
And the director dutifully forwards the FIN to the real server:
| 17:09:46.017723 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.60642 >
| 10.64.96.10.443: Flags [F.], seq 403, ack 3580, win 160, options
| [nop,nop,TS val 592987771 ecr 2501301618], length 0 (ipip-proto-4)
So the connection still exists and is known. But the client's data
packets in the middle get lost.
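
The manual comparison above (each inbound packet should have a matching
ipip copy) can be automated for longer traces. This is only a sketch: it
assumes one packet per line as tcpdump prints it (not wrapped as in this
mail), that TCP timestamps are enabled, and that tunneled packets carry
the "(ipip-proto-4)" suffix seen here. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the inner TCP packet. On a tunneled line the outer
# "IP a > b:" is followed by "IP", not "Flags", so the search
# skips it and lands on the inner header.
INNER = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
    r".*?TS val (?P<tsval>\d+).*?length (?P<length>\d+)"
)


def unforwarded_packets(lines):
    """Return inbound lines that have no tunneled counterpart.

    Packets are matched on (src, dst, flags, TS val, length). Sequence
    and ack numbers are deliberately ignored because tcpdump may print
    them as absolute on one line and relative on the other.
    """
    inbound = []
    forwarded = Counter()
    for line in lines:
        match = INNER.search(line)
        if match is None:
            continue
        key = (match["src"], match["dst"], match["flags"],
               match["tsval"], match["length"])
        if "(ipip-proto-4)" in line:
            forwarded[key] += 1
        else:
            inbound.append((key, line))

    missing = []
    for key, line in inbound:
        if forwarded[key] > 0:
            forwarded[key] -= 1  # consume one tunneled copy
        else:
            missing.append(line)
    return missing
```

Fed the director capture, this should list each copy of the 326-byte
segment and nothing else. Two packets that share the same timestamp
value, flags and length are indistinguishable to this key, so treat the
output as a pointer for manual inspection, not as proof.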
> > SSL usually sends large certificates at start, so
> > that may explain the difference with plain HTTP. But you will
> > need more traces, mostly from the real server or better its
> > uplink router, if possible. For TUN mode the following traces
> > catch all traffic:
Packet size definitely has something to do with the issue. But the magic
number isn't what you'd think.
I just ran a series of tests and found that if I send some extra headers
over plain old HTTP, I can also reproduce the issue.
This works:
% curl -kvH 'Host: test' -H 'X: AAAAAAAA' http://10.64.96.10/iptest.php
| 12:43:52.390132 IP 10.240.8.72.39812 > 10.64.96.10.http: Flags [P.], seq
| 0:190, ack 1, win 115, options [nop,nop,TS val 670634144 ecr
| 10917205], length 190
|
| 12:43:52.390135 IP 10.65.74.77 > 10.65.74.72: IP 10.240.8.72.39812 >
| 10.64.96.10.http: Flags [P.], seq 0:190, ack 1, win 115, options
| [nop,nop,TS val 670634144 ecr 10917205], length 190 (ipip-proto-4)
This fails:
% curl -kvH 'Host: test' -H 'X: AAAAAAAAA' http://10.64.96.10/iptest.php
| 12:43:55.955943 IP 10.240.8.72.39815 > 10.64.96.10.http: Flags [P.], seq
| 0:191, ack 1, win 115, options [nop,nop,TS val 670637710 ecr
| 10920771], length 191
|
| 12:43:56.156790 IP 10.240.8.72.39815 > 10.64.96.10.http: Flags [P.], seq
| 0:191, ack 1, win 115, options [nop,nop,TS val 670637911 ecr
| 10920771], length 191
|
| 12:43:56.558762 IP 10.240.8.72.39815 > 10.64.96.10.http: Flags [P.], seq
| 0:191, ack 1, win 115, options [nop,nop,TS val 670638313 ecr
| 10920771], length 191
With no corresponding packet being sent out by the director.
So a TCP payload of 190 bytes or less is a smashing success. At 191
bytes, the director eats the packet.
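
A threshold like this can be located with a binary search instead of
adding one header byte at a time. The sketch below assumes there is a
single cutoff (everything at or below it passes, everything above it
fails), which the tests above suggest but do not prove. The probe
callable is hypothetical: in practice it would wrap something like the
curl invocations above with a timeout and return True on success.

```python
def largest_passing_size(probe, low, high):
    """Binary-search the largest size for which probe(size) is True.

    Requires probe(low) to be True and probe(high) to be False, and
    assumes a single pass/fail cutoff between them.
    """
    if not probe(low):
        raise ValueError("probe(low) must succeed")
    if probe(high):
        raise ValueError("probe(high) must fail")
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid   # mid still passes, cutoff is at or above mid
        else:
            high = mid  # mid fails, cutoff is below mid
    return low


# Stand-in probe that mimics the behavior reported above.
def fake_probe(size):
    return size <= 190


print(largest_passing_size(fake_probe, 1, 1460))
```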
For completeness:
| client % ping -s 1460 -c 1 $VIP
| PING 10.64.96.10 (10.64.96.10) 1460(1488) bytes of data.
| 1468 bytes from 10.64.96.10: icmp_seq=1 ttl=59 time=0.631 ms
|
| --- 10.64.96.10 ping statistics ---
| 1 packets transmitted, 1 received, 0% packet loss, time 0ms
| rtt min/avg/max/mdev = 0.631/0.631/0.631/0.000 ms
|
| director % ping -s 1460 -c 1 $REALIP
| PING 10.65.74.72 (10.65.74.72) 1460(1488) bytes of data.
| 1468 bytes from 10.65.74.72: icmp_seq=1 ttl=64 time=0.154 ms
|
| --- 10.65.74.72 ping statistics ---
| 1 packets transmitted, 1 received, 0% packet loss, time 0ms
| rtt min/avg/max/mdev = 0.154/0.154/0.154/0.000 ms
MTU is well over 1000 all the way through -- we should not be having
trouble with 191-byte segments.
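
For reference, the on-wire sizes being compared can be worked out as
follows. This is a sketch under stated assumptions: IPv4 headers without
options (20 bytes), an 8-byte ICMP echo header, a 20-byte TCP header plus
the 12 bytes of options seen in the traces (nop, nop, timestamp), and a
20-byte outer IPv4 header for ipip. The function names are illustrative.

```python
IPV4_HEADER = 20    # assumes no IP options
ICMP_HEADER = 8     # ICMP echo header
TCP_HEADER = 20     # base TCP header
TCP_OPTIONS = 12    # nop,nop,TS as shown in the traces
IPIP_OVERHEAD = 20  # outer IPv4 header added by the tunnel


def ping_packet_size(payload):
    """IP packet size for `ping -s payload`."""
    return payload + ICMP_HEADER + IPV4_HEADER


def tcp_packet_size(payload, tunneled=False):
    """IP packet size for a TCP segment, optionally ipip-encapsulated."""
    size = payload + TCP_HEADER + TCP_OPTIONS + IPV4_HEADER
    if tunneled:
        size += IPIP_OVERHEAD
    return size


print(ping_packet_size(1460))               # the pings above
print(tcp_packet_size(191))                 # failing segment, as received
print(tcp_packet_size(191, tunneled=True))  # same segment inside ipip
```

Under these assumptions the failing segment is a 243-byte IP packet, or
263 bytes once encapsulated, against 1488-byte pings that pass.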
Now, I'm guessing that if this were widespread behavior, nobody would be
using LVS. That leads me to believe it must be something about either
our specific configuration or our kernel version. Has anybody
encountered this type of behavior before?
> >
> > director# tcpdump -ln -i INDEV host CIP
> > director# tcpdump -ln -i OUTDEV host RIP -vvv
> > real server# tcpdump -ln -i IN_ETH host DIP -vvv
> > real server# tcpdump -ln -i tunl0 host CIP
> > real server# tcpdump -ln -i OUT_DEV host CIP
> > client# tcpdump -ln host VIP
> >
> > You need to catch TCP, IPIP, ICMP. Restricting
> > by client IP may help.
We have these captures from every point of view. Every packet tunneled
by the director is successfully received by the real server. Every
packet sent by the real server is successfully received by the client.
Every packet sent by the client is successfully received by the
director.
If you have any other ideas about specific versions to try or debugging
we could enable, we'd really appreciate it.
--
Chris Cowart
http://www.timesinks.net/
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users