Re: Problems with IPVS

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Problems with IPVS
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Tue, 17 Oct 2006 17:35:42 +0200
 Dumps attached to the previous e-mail were done on the bond0 interface which
 is facing the proxy. tcpdumps done on the proxy confirm the problem.

Hehe, you definitely want to use all possible features of Linux networking. How is your bonding configured, ALB? There is an outstanding issue with regard to packet reassembly on bond devices using ALB. It's highly unlikely that you're experiencing it, though it could explain why your Ethereal capture doesn't look perfect :).
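
In case it helps, you can read the active mode straight from the bonding
driver's /proc interface (assuming /proc support, which you almost certainly
have):

# grep -i "bonding mode" /proc/net/bonding/bond0

"adaptive load balancing" in the output would mean balance-alb (mode 6).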

 tcpdump.cap - DNAT case
 tcpdump2.cap - LVS case
 tcpdump3.cap - LVS case and Nokia phone

Still no data at my end.

 1. phone sends SYN packet to proxy;

Means (from previous email context):

Phone --> GRE tunnel --> netwap --> fwmark --> LVS --> proxy
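
Just so we're on the same page, a minimal fwmark-based setup would look
roughly like this (the mark value, port and proxy address below are made up,
yours will differ):

# iptables -t mangle -A PREROUTING -i netwap -p tcp --dport 8080 -j MARK --set-mark 1
# ipvsadm -A -f 1 -s rr
# ipvsadm -a -f 1 -r 192.168.1.10:8080 -m

i.e. everything coming in over netwap gets marked and the mark selects the
virtual service, which then NATs to the proxy.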

 Yes. netwap is an interface on the same server running LVS.

Ok.

How many devices are we talking about including Phone and proxy?

 Phone, SGSN/GGSN, PIX firewall (one end of GRE is there), server, proxy.

Excellent, thanks. Does the PIX belong to the carrier? I presume the IP addresses after the PIX are still not publicly routable?

@Joe: In case you want to update the LVS-Howto:
      http://en.wikipedia.org/wiki/SGSN
      http://tools.ietf.org/html/rfc3344

 2. proxy responds with SYN,ACK;
 3. phone sends ACK;

Beautiful, if this goes through LVS, it's already a big step towards a correctly working LVS.

 Nokia phones work through LVS without problems.

Hmm, since you talk about re-transmission, I wonder whether one of the following contexts applies (http://tools.ietf.org/html/rfc3344#page-83):

C.1. TCP Timers

   When high-delay (e.g. SATCOM) or low-bandwidth (e.g. High-Frequency
   Radio) links are in use, some TCP stacks may have insufficiently
   adaptive (non-standard) retransmission timeouts.  There may be
   spurious retransmission timeouts, even when the link and network
   are actually operating properly, but just with a high delay because
   of the medium in use.  This can cause an inability to create or
   maintain TCP connections over such links, and can also cause unneeded
   retransmissions which consume already scarce bandwidth.  Vendors
   are encouraged to follow the algorithms in RFC 2988 [31] when
   implementing TCP retransmission timers.  Vendors of systems designed
   for low-bandwidth, high-delay links should consult RFCs 2757 and
   2488 [28, 1].  Designers of applications targeted to operate on
   mobile nodes should be sensitive to the possibility of timer-related
   difficulties.

C.2. TCP Congestion Management

   Mobile nodes often use media which are more likely to introduce
   errors, effectively causing more packets to be dropped.  This
   introduces a conflict with the mechanisms for congestion management
   found in modern versions of TCP [21].  Now, when a packet is dropped,
   the correspondent node's TCP implementation is likely to react as
   if there were a source of network congestion, and initiate the
   slow-start mechanisms [21] designed for controlling that problem.
   However, those mechanisms are inappropriate for overcoming errors
   introduced by the links themselves, and have the effect of magnifying
   the discontinuity introduced by the dropped packet.  This problem has
   been analyzed by Caceres, et al. [5].  TCP approaches to the problem
   of handling errors that might interfere with congestion management
   are discussed in documents from the [pilc] working group [3, 9].
   While such approaches are beyond the scope of this document,
   they illustrate that providing performance transparency to mobile
   nodes involves understanding mechanisms outside the network layer.
   Problems introduced by higher media error rates also indicate the
   need to avoid designs which systematically drop packets; such designs
   might otherwise be considered favorably when making engineering
   tradeoffs.

But then we'd definitely have a problem with IPVS. However, let's not jump to conclusions too early.
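
To quickly rule out spurious retransmissions on the proxy side, you could
compare the stock TCP counters before and after one failing request (nothing
IPVS-specific):

# netstat -s | grep -i retrans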

 4. phone sends HTTP GET request;
 5. proxy ACKs packet 4;

Only ACK? No data?

 Yes.

Window size? Advertised window size?
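
You can read both straight off the capture, e.g. with something like
(interface and port are just placeholders):

# tcpdump -n -vv -i bond0 'tcp port 8080'

and then look at the "win" field of the SYN,ACK and of packet 5, plus the
wscale option in the SYN and SYN,ACK, if present.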

 6. proxy sends HTTP data packet;
 7. proxy sends another HTTP data packet;
 8. proxy sends FIN packet;

 weird things start here

 9. phone once more sends ACK packet acknowledging packet 2 (duplicate of packet 3);

Does the proxy have SACK/FACK support enabled?

 Proxy is a CentOS 4 Linux server running Squid.

And you see nothing unusual in your squid logs when connecting with SE phones?

# sysctl net.ipv4.tcp_fack net.ipv4.tcp_sack
net.ipv4.tcp_fack = 1
net.ipv4.tcp_sack = 1

Does disabling (just for a test) SACK change anything?
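
I.e. something like this on the proxy, only for the duration of the test
(and set it back to 1 afterwards):

# sysctl -w net.ipv4.tcp_sack=0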

 10. and one more dupe of packet 3;
 11.-14. proxy repeats packet 6. 4 times.

It has to, since it never sees an ACK for its data. Is ECN enabled?
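
That would be, on both the director and the proxy:

# sysctl net.ipv4.tcp_ecn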

 Once again sysctl says no. Both on the LVS server and on the proxy.

What are the kernel versions? (Sorry if this is a dupe.)

 The problem is that LVS does not pass packets 11. to 14. to the phone. Why?

Because packet 8 was a FIN and LVS is not stateful with regard to TCP sessions and retransmits.

 But the phone did not acknowledge that FIN yet?

Sure, but we act on the first FIN seen with regard to template expiration, IIRC:

http://www.drugphish.ch/~ratz/IPVS/ip__vs__proto__tcp_8c.html#a36

But I'd need to check the code again. Take this with a grain of salt.
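
What you could do as a quick test (the numbers are arbitrary): check the
IPVS timeouts, raise the tcpfin one, and watch the state of the connection
entry while you reproduce the problem:

# ipvsadm -L --timeout
# ipvsadm --set 900 600 300
# ipvsadm -Lcn

The second value of --set is the FIN_WAIT timeout; -Lcn lists the entries
in the connection table together with their state.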

 In case of DNAT packets 11.-14. are passed to the phone, which at the end
 acknowledges packets 6. and 7. and then acknowledges packet 8., thus closing
 the TCP connection.

Here I don't follow your statements, sorry.

 If I set up DNAT instead of LVS then packets 11.-14. are sent to the phone. In case of LVS they are not.

So you see packets 11-14 arriving on the LVS interface from Squid, but they never go out on the interface towards the PIX? This is very odd!
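
To pin down where they get lost, I'd capture on both sides of the director
at the same time while reproducing it (the port is again just a placeholder):

# tcpdump -n -i bond0 -w proxy-side.cap 'tcp port 8080'
# tcpdump -n -i netwap -w phone-side.cap 'tcp port 8080'

If 11.-14. show up in the first capture but not in the second, they are
being dropped inside the director.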

 And after the phone receives those packets it sends ACKs to packets 6. and 7. and then to 8.

But only for DNAT.

Regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
