To: Andreas John <lists@xxxxxxxxxxxxxx>
Subject: Re: ipvsadm ..set / tcp timeout was: ipvsadm --set... what does it modify?
Cc: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Tue, 14 Feb 2006 00:55:10 +0100
Dear Andreas,

I'm currently not able to reply in-depth to your email because of special work I'm doing, which requires my full attention. I'll be back in April, if you're still interested.

implemented. Of course I could fly down to Julian's place over the
week-end and we could implement it together; want to sponsor it? ;).
Well, currently I am not in a position to sponsor such development,
but things may change. In that case I would contact you via p-mail.

Fair enough.

(Ryanair does not fly to .bg .... :/ )

It's not easy to get to Julian's home. One has to fly to Sofia and then bargain for a ticket to fly down to Varna :).

As you are from .ch (as your domain tells me ;)) ... will you be at
LinuxTag (Wiesbaden, Germany)?

Unlikely; it depends on how many other kernel developers I know will attend. The conference is too user-space centric :). Also, Wiesbaden is in the middle of nowhere, and during that time I will probably be in Morocco, surfing.

And I looked at Julian's proposal. Ufff. I'm not a fluent C speaker, so if
I tried to do that myself, I would recommend not to use the result :)

To be fair, we should probably just re-enable the proc-fs based timer settings until we have a replacement. Maybe if I find my vim in time, I can cook something up for 2.6.

Eh, I got it from /proc:

# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

Ahh, but that's not 2h + 9*75s :).

May I quote from the document you mentioned, ip-sysctl.txt, and comment:

It's better to quote from the source, since ip-sysctl.txt can be inaccurate sometimes; not in this case though.

tcp_keepalive_time - INTEGER: How often TCP sends out keepalive messages
when keepalive is enabled. Default: 2hours.

-> The kernel tries every 2 hours to send probes and check if the
connection is still there?

tcp_keepalive_probes - INTEGER: How many keepalive probes TCP sends out,
until it decides that the connection is broken. Default value: 9.

-> The kernel tries 9 times to send such probes. The probes will be sent
75 secs (tcp_keepalive_intvl) apart. After all 9 have failed, the kernel
will drop the connection.

That brings me back to my 7200 + 9 * 75 secs. But it may be only 7200 +
8 * 75 secs, because the value says nothing about the timeout of the
last IP packet.... errrrh </confused> ...

You're still referring to sockets, whereas IPVS has nothing to do with sockets. I'm sorry, but due to time constraints I have to refer you to reading the source, Luke :):

Start with net/ipv4/tcp.c:tcp_setsockopt()

  case TCP_KEEPIDLE:
      if (val < 1 || val > MAX_TCP_KEEPIDLE)
              err = -EINVAL;
      else {
              tp->keepalive_time = val * HZ;
              if (sock_flag(sk, SOCK_KEEPOPEN) &&
                  !((1 << sk->sk_state) &
                    (TCPF_CLOSE | TCPF_LISTEN))) {
                      __u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
                      if (tp->keepalive_time > elapsed)
                              elapsed = tp->keepalive_time - elapsed;
                      else
                              elapsed = 0;
                      inet_csk_reset_keepalive_timer(sk, elapsed);
              }
      }

The important thing is inet_csk_reset_keepalive_timer(), which ends up calling sk_reset_timer() in net/core/sock.c, if I'm not mistaken.
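
For completeness, this is roughly how that kernel path gets triggered from
user space; a minimal sketch, assuming fd is an already connected TCP
socket, with error handling kept terse (the set_keepalive() helper name is
mine, not something from a library):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one socket and override the global /proc defaults
 * on a per-socket basis; returns 0 on success, -1 on error. */
static int set_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
        int on = 1;

        /* sets SOCK_KEEPOPEN, so the keepalive timer gets armed at all */
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
                return -1;
        /* idle time before the first probe; lands in tp->keepalive_time
         * via the TCP_KEEPIDLE case shown above */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) < 0)
                return -1;
        /* interval between probes, and number of probes before giving up */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs, sizeof(intvl_secs)) < 0)
                return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
                return -1;
        return 0;
}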

There's a lot of TCP timers in the Linux kernel and they all have
[...]
/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait:120

                     ^^^^^^^^^
Are you on drugs or am I too dumb :) ?

Depending on your perception and understanding of these, the answer ranges from neither to both ;).

What do the netfilter timeouts
have to do with tcp_keepalive in general?

Nothing; I hope I've never stated this in my previous emails.

Or did you respond to the
resource I mentioned (it was about netfilter, but I was only interested
in the little part about tcp_keepalive in general ...)?

Where IPVS is concerned, tcp_keepalive does not have much say. IPVS maintains its own state timers and is generally pretty unimpressed by socket timers. Of course, if the keepalive expires on the client's end, we get to see a RST, and from the state table you can figure out what happens next; here is an excerpt for your viewing pleasure (../ipvs/ip_vs_proto_tcp.c):

static struct tcp_states_t tcp_states [] = {
/*      INPUT */
/*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
/*fin*/ {{sCL, sCW, sSS, sTW, sTW, sTW, sCL, sCW, sLA, sLI, sTW }},
/*ack*/ {{sCL, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }},
/*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sSR }},

/*      OUTPUT */
/*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSS, sES, sSS, sSR, sSS, sSS, sSS, sSS, sSS, sLI, sSR }},
/*fin*/ {{sTW, sFW, sSS, sTW, sFW, sTW, sCL, sTW, sLA, sLI, sTW }},
/*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sLA, sES, sES }},
/*rst*/ {{sCL, sCL, sSS, sCL, sCL, sTW, sCL, sCL, sCL, sCL, sCL }},

/*      INPUT-ONLY */
/*        sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
/*fin*/ {{sCL, sFW, sSS, sTW, sFW, sTW, sCL, sCW, sLA, sLI, sTW }},
/*ack*/ {{sCL, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }},
/*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
};
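
To read the table: the row is picked by packet direction (INPUT, OUTPUT,
INPUT-ONLY) and by which TCP flag is set, the column by the connection's
current state, and the cell is the next state. A rough sketch of the lookup;
the flag priority mirrors tcp_state_idx() in ip_vs_proto_tcp.c, the rest is
simplified and not the verbatim kernel code:

#include <netinet/tcp.h>        /* struct tcphdr with fin/syn/rst/ack bit fields */

enum { DIR_INPUT = 0, DIR_OUTPUT = 4, DIR_INPUT_ONLY = 8 };  /* row group offsets */

/* Which row within a group applies to this segment; rst wins over syn,
 * syn over fin, fin over ack. */
static int tcp_state_idx(const struct tcphdr *th)
{
        if (th->rst) return 3;
        if (th->syn) return 0;
        if (th->fin) return 1;
        if (th->ack) return 2;
        return -1;              /* nothing interesting set: leave the state alone */
}

/* new_state = tcp_states[direction + tcp_state_idx(th)].next_state[cp->state]; */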

And of course we have the IPVS TCP settings, which would look as follows (if
they weren't disabled in the core :)):

               ^^^^^ disabled? why should we?

No one has complained before. I have re-instated them in the 2.4.x kernel because I needed them. I don't need the fine-grained settings Julian is proposing, but I can envision someone having a use for them. However, not having the ability to set state timeouts at all is not a good tradeoff to me.

timeouts. Since there is no socket (as in an endpoint) involved when
doing either netfilter or IPVS, you have to guess what the TCP flow
in between (where your machine is "standing") is doing, so you can
continue to forward, rewrite, mangle, whatever, the flow _without_
disturbing it. The timers are used for table mapping timeouts of TCP
states. If we didn't have them, mappings would stay in the kernel
forever and eventually we'd run out of memory. If we get them wrong, it
might happen that a connection is aborted prematurely by our host, for
example yielding those infamous ssh hangs when connecting through a
packet filter.

Yes, what I am asking myself is: if we see a FIN or RST flying through a TCP
connection, could we lower the timer significantly or even drop the
entry, because we may know what's happening?

In triangulation mode (LVS_DR) this is impossible to find out, unless you use the forward shared approach written by Julian. I'm actually also not very happy about the current way of detecting TCP state changes; however, this is how it is. If you're interested, ipvs/ip_vs_proto_tcp.c contains the related code that decides which timeout a connection state gets; just check out set_tcp_state().
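
In essence (a simplified sketch with approximate names, not the verbatim
kernel code; the real thing is set_tcp_state() plus the tcp_timeouts table
in ip_vs_proto_tcp.c), the transition table gives the new state, and the
new state indexes a per-state timeout table that reloads the connection
entry's timer:

enum ip_vs_tcp_state { sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA, S_LAST };

struct conn {
        enum ip_vs_tcp_state state;     /* current TCP state of this mapping */
        unsigned long timeout;          /* time until the table entry expires */
};

/* per-state timeouts; ipvsadm --set tunes the most important ones
 * (ESTABLISHED, FIN_WAIT, UDP), the disabled proc entries would cover the rest */
static unsigned long tcp_timeouts[S_LAST];

static void set_state(struct conn *cp, enum ip_vs_tcp_state new_state)
{
        cp->state = new_state;
        cp->timeout = tcp_timeouts[new_state];  /* e.g. 15 min for sES, 2 min for sFW */
}

That is also why --set matters for your lunch-break scenario below: the
ESTABLISHED entry of that table decides how long an idle NAT mapping survives.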

My current problem is a client connection with some keepalive foo to a DB,
balanced (via NAT) to two real servers. The client opens a form, types,
then parks his browser and goes to lunch for an hour or two. Then he
comes back and presses "submit".... validation error....

Umm, why do your clients eat? Maybe you should print a warning in your forms that lunch is forbidden, under penalty of cleaning up the database afterwards. Seriously though, there's really nothing IPVS can do for you at this stage, since the client gets a TCP RST. Increase the TCP keepalive settings and set an equally high persistence timeout.

IMVHO the best solution would be a JavaScript on the client that tries
to get 1 byte from the DB every few minutes, but ....

Won't work on most clients' browsers nowadays, since most use HTTP/1.1 with keepalive and pipelining and thus have at least 2 concurrent sockets open, and JS does not know anything about sockets. Try fiddling with the Apache keepalive settings.

The tcp keepalive timer setting you've mentioned, on the other hand, is
per socket, and as such only has an influence on locally created or
terminated sockets. A quick skimming of socket(2) and socket(7) reveals:

[....]

If SO_KEEPALIVE is not enabled, will the session cease to exist after
2h? Is anyone here aware of what the default values/ranges on Windows machines are?

Read again, tcp(7):

       tcp_keepalive_time
              The number of seconds a connection needs to be idle
              before TCP begins sending out keep-alive probes.
              Keep-alives are only sent when the SO_KEEPALIVE
              socket option is enabled. The default value is
              7200 seconds (2 hours). An idle connection is
              terminated after approximately an additional 11
              minutes (9 probes an interval of 75 seconds apart)
              when keep-alive is enabled.

              Note that underlying connection tracking mechanisms
              and application timeouts may be much shorter.

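Doing the arithmetic with the defaults: 9 probes * 75 s = 675 s, which is
the "approximately 11 minutes" from the man page, so a completely idle
keepalive-enabled connection is torn down after about 7200 s + 675 s =
7875 s, i.e. roughly 2 h 11 min; that is the 2h + 9*75s from above.
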
Of course you can also use tcpdump to study the wickedness of tcp_keepalive ...

On top of that, the keepalive timer has two different meanings, depending on whether we are in a minisocket or in a full socket in state ESTABLISHED.

I'm somewhat missing this view in your cited reference; I did not read it
through thoroughly. My apologies for not being more specific, however I
don't have more time right now.

Well, first of all thanks a lot for your expertise!

Sorry, I can't take more time right now to explain those complex issues. If you're interested in more of the TCP/IP stuff being done in Linux, I suggest you get a copy of "The Linux TCP/IP Stack: Networking for Embedded Systems" by Thomas F. Herbert, ISBN 1-58450-284-3; or read the source, which contains all the original swearing of the Linux TCP stack creators in a director's cut version.

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
