Dear Andreas,
I'm currently not able to reply in-depth to your email because of
special work I'm doing, which requires my full attention. I'll be back
in April, if you're still interested.
implemented. Of course I could fly down to Julian's place over the
week-end and we could implement it together; want to sponsor it? ;).
Well, currently I'm not in a position to sponsor such development,
but things may change. If that's the case, I'll contact you via p-mail.
Fair enough.
(Ryanair does not fly to .bg .... :/ )
It's not easy to get to Julian's home. One has to fly to Sofia and then
bargain for a ticket to fly down to Varna :).
As you are from .ch (as your domain tells me ;)) ... will you be at
LinuxTag (Wiesbaden, Germany)?
Unlikely; it depends on how many other kernel developers I know will
attend. The conference is too user-space centric :). Also, Wiesbaden is
in the middle of nowhere, and during that time I will probably be in
Morocco surfing.
And I looked at Julian's proposal. Ufff. I'm not a fluent C speaker, so
if I tried to do that, I would recommend not using it :)
To be fair, we should probably just re-enable the proc-fs based timer
settings until we have a replacement. Maybe if I find my vim in time, I
can cook something up for 2.6.
Eh, I got it from /proc:
# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
Ahh, but that's not 2h + 9*75s :).
May I quote from the document you mentioned (ip-sysctl.txt) and comment:
It's better to quote from the source, since ip-sysctl.txt can be
inaccurate sometimes; not in this case though.
tcp_keepalive_time - INTEGER: How often TCP sends out keepalive messages
when keepalive is enabled. Default: 2hours.
-> The kernel tries every 2 hours to send probes to check whether the
connection is still there?
tcp_keepalive_probes - INTEGER: How many keepalive probes TCP sends out,
until it decides that the connection is broken. Default value: 9.
-> The kernel tries 9 times to send such probes. The probes are spaced
75 secs (tcp_keepalive_intvl) apart. After all 9 have failed, the kernel
will drop the connection.
That brings me back to my 7200 + 9 * 75 secs. But it may be only 7200 +
8 * 75 secs, because the value says nothing about the timeout of the
last IP packet.... errrrh </confused> ...
You're still referring to sockets, whereas IPVS has nothing to do with
sockets. I'm sorry, but due to time constraints I have to refer you to
reading the source, Luke :).
Start with net/ipv4/tcp.c:tcp_setsockopt():
case TCP_KEEPIDLE:
        if (val < 1 || val > MAX_TCP_KEEPIDLE)
                err = -EINVAL;
        else {
                tp->keepalive_time = val * HZ;
                if (sock_flag(sk, SOCK_KEEPOPEN) &&
                    !((1 << sk->sk_state) &
                      (TCPF_CLOSE | TCPF_LISTEN))) {
                        __u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
                        if (tp->keepalive_time > elapsed)
                                elapsed = tp->keepalive_time - elapsed;
                        else
                                elapsed = 0;
                        inet_csk_reset_keepalive_timer(sk, elapsed);
                }
        }
The important thing is inet_csk_reset_keepalive_timer(), which ends up
calling sk_reset_timer() in net/core/sock.c, if I'm not mistaken.
There are a lot of TCP timers in the Linux kernel and they all have
[...]
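If it helps to see the same knob from user space: this is roughly what
an application does to end up in that code path, namely enable keepalive
on the socket and override the per-socket values that otherwise come
from the tcp_keepalive_* sysctls. A minimal sketch; the values 600/60/5
are just made-up examples:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int on = 1, idle = 600, intvl = 60, cnt = 5;

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        /* turn keepalive probing on for this socket ... */
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
                perror("SO_KEEPALIVE");
        /* ... and override idle time, probe interval and probe count,
         * which is where the kernel's TCP_KEEPIDLE branch above kicks in */
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
                perror("TCP_KEEPIDLE");
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
                perror("TCP_KEEPINTVL");
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
                perror("TCP_KEEPCNT");

        printf("keepalive: idle %ds, interval %ds, %d probes\n", idle, intvl, cnt);
        return 0;
}

Of course none of this matters on the director itself, since there is no
socket there for the forwarded flow; it only matters on the two endpoints.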
/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait:120
^^^^^^^^^
Are you on drugs or am I too dumb :) ?
Depending on your perception and understanding of these the answer
ranges from neither to both ;).
What do the netfilter timeouts
have to do with tcp_keepalive in general?
Nothing, I hope I've never stated this in my previous emails.
Or did you respond to the
resource I mentioned (it was about netfilter, but I was only interested
in the little part about tcp_keepalive in general ...)?
Regarding IPVS, tcp_keepalive does not have much to say. IPVS maintains
its own state timers and is generally pretty unimpressed by socket
timers. Of course, if the connection expires on the client's end, we get
to see an RST, and according to the state table you can figure out what
happens next; here is an excerpt for your viewing pleasure
(../ipvs/ip_vs_proto_tcp.c):
static struct tcp_states_t tcp_states [] = {
/* INPUT */
/* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
/*fin*/ {{sCL, sCW, sSS, sTW, sTW, sTW, sCL, sCW, sLA, sLI, sTW }},
/*ack*/ {{sCL, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }},
/*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sSR }},
/* OUTPUT */
/* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSS, sES, sSS, sSR, sSS, sSS, sSS, sSS, sSS, sLI, sSR }},
/*fin*/ {{sTW, sFW, sSS, sTW, sFW, sTW, sCL, sTW, sLA, sLI, sTW }},
/*ack*/ {{sES, sES, sSS, sES, sFW, sTW, sCL, sCW, sLA, sES, sES }},
/*rst*/ {{sCL, sCL, sSS, sCL, sCL, sTW, sCL, sCL, sCL, sCL, sCL }},
/* INPUT-ONLY */
/* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
/*fin*/ {{sCL, sFW, sSS, sTW, sFW, sTW, sCL, sCW, sLA, sLI, sTW }},
/*ack*/ {{sCL, sES, sSS, sES, sFW, sTW, sCL, sCW, sCL, sLI, sES }},
/*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
};
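In case the layout is not obvious: the rows are grouped per direction
(INPUT, OUTPUT, INPUT-ONLY), one row per inspected TCP flag, and each
row maps the current connection state to the next one. Here is a
cut-down, self-contained sketch of how such a table gets consulted; the
three states and all transitions are invented for the illustration, so
don't mistake it for the real ip_vs_proto_tcp.c code:

#include <stdio.h>

/* Toy version of the table above: only three states, and both the
 * transitions and the state names are invented for the example. */
enum tcp_state { sNO, sES, sCL, S_LAST };       /* none, established, close */
enum tcp_flag  { F_SYN, F_FIN, F_ACK, F_RST, F_LAST };
enum pkt_dir   { DIR_INPUT, DIR_OUTPUT, DIR_LAST };

struct states_t {
        enum tcp_state next_state[S_LAST];
};

/* one row per (direction, flag) pair; each row is indexed by the
 * current state, exactly the layout of tcp_states[] quoted above */
static const struct states_t states[DIR_LAST * F_LAST] = {
        /* INPUT                           sNO  sES  sCL */
        [DIR_INPUT  * F_LAST + F_SYN] = {{ sES, sES, sES }},
        [DIR_INPUT  * F_LAST + F_FIN] = {{ sCL, sCL, sCL }},
        [DIR_INPUT  * F_LAST + F_ACK] = {{ sNO, sES, sCL }},
        [DIR_INPUT  * F_LAST + F_RST] = {{ sCL, sCL, sCL }},
        /* OUTPUT                          sNO  sES  sCL */
        [DIR_OUTPUT * F_LAST + F_SYN] = {{ sES, sES, sES }},
        [DIR_OUTPUT * F_LAST + F_FIN] = {{ sCL, sCL, sCL }},
        [DIR_OUTPUT * F_LAST + F_ACK] = {{ sES, sES, sCL }},
        [DIR_OUTPUT * F_LAST + F_RST] = {{ sCL, sCL, sCL }},
};

static enum tcp_state next_state(enum pkt_dir d, enum tcp_flag f,
                                 enum tcp_state cur)
{
        return states[d * F_LAST + f].next_state[cur];
}

int main(void)
{
        enum tcp_state cur = sNO;

        cur = next_state(DIR_INPUT, F_SYN, cur);   /* client sends a SYN   */
        printf("after SYN: %d\n", cur);
        cur = next_state(DIR_INPUT, F_RST, cur);   /* ... and tears it down */
        printf("after RST: %d\n", cur);
        return 0;
}

The real table does the same with the eleven states above, plus the
INPUT-ONLY group for setups where we only ever see one direction of the
flow (LVS_DR/TUN), if I remember correctly.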
And of course we have the IPVS TCP settings, which look as follows (if
they weren't disabled in the core :)):
^^^^^ disabled? why should we ?
No one has complained before. I have reinstated them in the 2.4.x kernel
because I needed them. I don't need the fine-grained settings Julian is
proposing, but I can envision someone having a use for them. However, not
having the ability to set state timeouts is not a good tradeoff to me.
timeouts. Since there is no socket (as in an endpoint) involved when
doing either netfilter or IPVS, you have to guess what the TCP flow
in-between (where your machine is "standing") is doing, so you can
continue to forward, rewrite, mangle, whatever, the flow, _without_
disturbing it. The timers are used for table mapping timeouts of TCP
states. If we didn't have them, mappings would stay in the kernel
forever and eventually we'd run out of memory. If we get them wrong, a
connection might be aborted prematurely by our host, for example
yielding those infamous ssh hangs when connecting through a packet
filter.
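If you want the bookkeeping spelled out in toy form (nothing IPVS or
netfilter specific, just the idea of a mapping entry that expires; the
names and numbers are made up):

#include <stdio.h>
#include <time.h>

/* Toy mapping entry: if no packet refreshes it before last_seen +
 * timeout, the garbage collector drops it and the memory is reusable.
 * Real IPVS/netfilter entries carry far more state than this. */
struct mapping {
        time_t last_seen;       /* when we last saw a packet of this flow   */
        unsigned int timeout;   /* seconds; depends on the tracked TCP state */
};

static int expired(const struct mapping *m, time_t now)
{
        return now - m->last_seen > (time_t)m->timeout;
}

int main(void)
{
        time_t now = time(NULL);
        struct mapping idle_flow = { .last_seen = now - 7300, .timeout = 7200 };
        struct mapping busy_flow = { .last_seen = now - 10,   .timeout = 7200 };

        printf("idle flow expired: %s\n", expired(&idle_flow, now) ? "yes" : "no");
        printf("busy flow expired: %s\n", expired(&busy_flow, now) ? "yes" : "no");
        return 0;
}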
Yes, what I am asking myself is: if we see a FIN or RST flying through a
TCP connection, could we lower the timer significantly or even drop the
entry, because we may know what's happening?
In triangulation mode (LVS_DR) this is impossible to find out, unless
you use the forward shared approach written by Julian. I'm actually also
not very happy about the current way of detecting TCP state changes;
however, this is how it is. If you're interested, ipvs/ip_vs_proto_tcp.c
contains the related code which decides which timeout a connection state
gets; just check out set_tcp_state().
My current problem is a client connection with some keepalive foo to a
DB, balanced (via NAT) to two real servers. The client opens a form,
types, then parks his browser and goes to lunch for an hour or two. Then
he comes back and presses "submit".... validation error....
Umm, why do your clients eat? Maybe you should print a warning in your
forms that lunch is forbidden under penalty of cleaning up the database
afterwards. Seriously though, there's really nothing IPVS can do for you
at this stage, since the client gets a TCP RST. Increase the TCP
keepalive settings and set an equally high persistence timeout.
IMVHO the best solution would be a JavaScript on the client that tries
to get 1 byte from the DB every few minutes, but ....
Won't work on most clients' browsers nowadays, since most use HTTP/1.1
with keepalive and pipelining and thus have at least 2 concurrent
sockets open, and JS does not know anything about sockets. Try fiddling
with the Apache keepalive settings.
The TCP keepalive timer setting you've mentioned, on the other hand, is
per socket, and as such only has an influence on locally created or
terminated sockets. A quick skimming of socket(2) and socket(7) reveals:
[....]
If SO_KEEPALIVE is not enabled, will the session cease to exist after
2h? Is anyone here aware of what the default values/ranges on Windows
machines are?
Read again, tcp(7):
tcp_keepalive_time
The number of seconds a connection needs to be idle
before TCP begins sending out keep-alive probes.
Keep-alives are only sent when the SO_KEEPALIVE
socket option is enabled. The default value is
7200 seconds (2 hours). An idle connection is
terminated after approximately an additional 11
minutes (9 probes an interval of 75 seconds apart)
when keep-alive is enabled.
Note that underlying connection tracking mechanisms
and application timeouts may be much shorter.
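For what it's worth, that also settles the arithmetic we were juggling
earlier, assuming the default sysctl values:

    7200 s idle + 9 probes * 75 s interval = 7200 s + 675 s = 7875 s

which is the two hours plus the "approximately an additional 11 minutes"
from the man page (675 s is 11.25 minutes).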
Of course you can also use tcpdump to study the wickedness of
tcp_keepalive ...
On top of that, the keepalive timer has two different meanings,
depending on whether we are in a minisocket or in a full socket in state
ESTABLISHED.
I somewhat miss this view in the reference you cited. I did not read it
through thoroughly. My apologies for not being more specific; however, I
don't have more time right now.
Well, first of all thanks a lot for your expertise!
Sorry, I can't take more time right now to explain those complex issues.
If you're interested in more of the TCP/IP stuff going on in Linux, I
suggest you get a copy of "The Linux TCP/IP Stack: Networking for
Embedded Systems" by Thomas F. Herbert, ISBN 1-58450-284-3; or read the
source, which
contains all the original swearing of the Linux TCP stack creators in a
director's cut version.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc