...
> The idea behind this is that if we see the same SYN sent multiple times,
> we know that for some reason, the person attempting to create a connection
> is never receiving a SYN-ACK -- or at least is claiming something to that
> effect.
> What could cause this?
> 1. Real server is never receiving the SYN, and/or is never sending a
> SYN-ACK.
> 2. Client is never receiving a SYN-ACK because it's being lost in
> transit.
> 3. Client is doing broken things -- either intentionally (syn flood) or
> unintentionally (bugs)
In your original message you were talking specifically about seeing the same
sequence number over and over... not the same SYN. A SYN carries only the
ISN (initial sequence number) and is concerned only with opening the
connection.
Responding to each case:
1. If the real server never received the SYN and the client is trying to
open up a connection again, there is probably a problem at the IP layer with
the real server that can be detected more quickly some other way.
2. If the real server's SYN-ACK was lost in transit, there may be nothing
that can be done. It could be getting lost somewhere on the Internet that
you cannot touch. If it is a local problem, it is probably a static one,
whereas I think the sort of monitoring you're suggesting is meant to detect
dynamic problems... real servers going down... things like that.
Unless an administrator has suddenly made a routing change (which he should
investigate anyway to be sure that things are still working), a SYN-ACK
lost in transit isn't something worth worrying about.
3. There's already SYN flooding support that can be compiled into Linux as
well as into the routers ahead of your LinuxDirector. I would worry that
building extra SYN flooding security into LVS might be concentrating on the
wrong place. If a better method of avoiding SYN flooding exists, perhaps
someone should work on adding that to Linux in general.
I just don't see how keeping track of each SYN will provide accurate
monitoring. At best it will verify that there is something at a lower level
which is down. However, by the time the LVS has enough information to know
from the repeated SYNs that a lower layer MIGHT be down, another monitor
(ICMP echo request/response, for example) would have already concluded that.
> I wouldn't mind knowing about it in any of these cases.
>
> There's another possibility:
> 4. Server requested retransmission of SYN
>
> When would that happen though? Checksum error? Any other cases?
A server wouldn't ever actively request retransmission of a SYN.
A SYN contains the initial sequence number... The way PAR (positive
acknowledgement with retransmission) works is that the transmitting host
will retransmit if it doesn't receive an acknowledgement within a certain
amount of time. A segment with a bad checksum is simply dropped, so the
client's timer covers that case too. The only positive acknowledgement to a
SYN is a SYN-ACK, which says, "Hey, I heard you and you told me you start
here (ACK). Well, I start here (SYN)."
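To put rough numbers on the retransmission timing, here is a minimal sketch. It assumes an initial timeout of 3 seconds that doubles after every unanswered SYN; both values are assumptions for illustration, since the real figures vary by TCP stack and version. The function name is made up for this example.

```python
def syn_retransmit_schedule(initial_rto=3.0, retries=5):
    """Return seconds after the first SYN at which each retransmission goes out.

    Assumption: the retransmission timeout starts at initial_rto and
    doubles after every SYN that goes unanswered.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto          # wait out the current timeout
        times.append(elapsed)   # the SYN is resent at this point
        rto *= 2                # back off before the next attempt
    return times


if __name__ == "__main__":
    for n, t in enumerate(syn_retransmit_schedule(), start=1):
        print(f"retransmission {n}: {t:.0f}s after the original SYN")
```

Under these assumptions, a monitor that waits for even two repeated SYNs from one client has already waited several seconds.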
> The idea is to cut down on the time it takes for me to notice that there's
> a problem, irrespective of where that problem may be. I think this could
> do that. LVS isn't just being used for load-balancing, it's being used
> for high availability, and the name of the game with high-availability is
> detecting problems -- fast -- so you can solve or work around them. This
> could be another tool to do that.
I really don't think that it would help detect problems that fast because
resending packets is just a part of TCP. It would take a while for enough
errors to rack up to accurately (and it wouldn't necessarily be that
accurate) conclude that something (which could be a number of things) is
down.
> If it's a routing problem, I can route around it. If it's a downed
> server, I can remove it from the pool. If it's a syn-flood, I can start
> adding some firewall rules to stop the attack up stream. In any case, I
> know that something is amiss, and I can do something about it.
But what do you gain by looking specifically at SYN segments? SYNs are only
sent at connection time. What do you gain by specifically looking at
sequence numbers? Sequence numbers are all over the board all the time...
And do you expect to keep a large table of all of these values inside the
kernel?
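To make the table question concrete, here is a rough user-space sketch of the state such tracking would need: one entry per (source, destination, ISN) combination, plus some way of expiring old entries. The names `SynKey` and `SynTracker` and the 60-second lifetime are hypothetical; nothing like this exists in LVS as far as I know.

```python
from collections import namedtuple

# One entry per connection attempt; the ISN distinguishes a retransmitted
# SYN from a fresh attempt on the same address/port pair.
SynKey = namedtuple("SynKey", "src_ip src_port dst_ip dst_port isn")


class SynTracker:
    def __init__(self, max_age=60.0):
        self.max_age = max_age   # assumed lifetime of an entry, in seconds
        self.entries = {}        # SynKey -> [first_seen, count]

    def observe(self, key, now):
        """Record one SYN and return how many times this exact SYN was seen."""
        entry = self.entries.get(key)
        if entry is None or now - entry[0] > self.max_age:
            entry = [now, 0]
            self.entries[key] = entry
        entry[1] += 1
        return entry[1]

    def expire(self, now):
        """Drop entries older than max_age; return how many were dropped."""
        stale = [k for k, (first, _) in self.entries.items()
                 if now - first > self.max_age]
        for k in stale:
            del self.entries[k]
        return len(stale)
```

Every new connection attempt adds an entry, so the table grows with the connection rate until entries expire. On a busy director that is a lot of state to hold.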
If you're just talking about monitoring, you could probably put a TCP
sniffer on your LVS boxes that monitors all traffic and sends you a
notification when certain things occur... I don't think this needs to be a
part of LVS specifically.
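As a sketch of what the sniffer side might check, the snippet below picks a connection-opening SYN out of a raw TCP header and reads its ISN. It assumes the capture tool has already stripped the link and IP headers and hands over the TCP header as bytes; the function names are made up for illustration.

```python
import struct

TCP_SYN = 0x02  # SYN bit in the TCP flags byte
TCP_ACK = 0x10  # ACK bit in the TCP flags byte


def is_initial_syn(tcp_header):
    """True for a connection-opening SYN (SYN set, ACK clear)."""
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    flags = tcp_header[13]  # flags live in byte 13 of the TCP header
    return bool(flags & TCP_SYN) and not (flags & TCP_ACK)


def initial_sequence_number(tcp_header):
    """Read the 32-bit sequence number (bytes 4..7, network byte order)."""
    (seq,) = struct.unpack("!I", tcp_header[4:8])
    return seq
```

A monitor built this way runs entirely outside the kernel and outside LVS, and can send its notifications however you like.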
And again about the SYN flooding, there are already things which can be
compiled into the Linux kernel which count the number of SYNs per second and
will stop listening to a specific host if that host appears to be hostile.
There are also patches which implement SYN cookies, which basically allow
for a method of low-level authentication that keeps flooding from affecting
a machine. I just think that we would have to be careful that we're not
reinventing the wheel and then putting it on top of the car. :) Maybe we
should leave the SYN flooding stuff to the guys already involved in SYN
flooding projects.
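For illustration only, the SYNs-per-second idea amounts to something like the sketch below. This is not the kernel's code; the class name, the threshold and the window length are all assumptions.

```python
from collections import defaultdict, deque


class SynRateMonitor:
    def __init__(self, max_syns=100, window=1.0):
        self.max_syns = max_syns         # assumed SYNs allowed per window
        self.window = window             # window length in seconds
        self.seen = defaultdict(deque)   # source IP -> recent SYN timestamps

    def record(self, src_ip, now):
        """Record one SYN; return True if src_ip now exceeds the allowed rate."""
        stamps = self.seen[src_ip]
        stamps.append(now)
        # Forget SYNs that have fallen out of the sliding window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        return len(stamps) > self.max_syns
```

A host that trips the threshold would then be ignored or filtered upstream. The point is that this logic is independent of load balancing, so it doesn't need to live inside LVS.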
I dunno... It's just hard for me to see how such support could be built into
LVS to make it a better HA solution.
All the best --
Ted