Re: random SYN-drop function

To: Wensong Zhang <wensong@xxxxxxxxxxxx>
Subject: Re: random SYN-drop function
Cc: Ratz <ratz@xxxxxx>, lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 19 Mar 2000 21:48:51 +0200 (EET)
        Hello,

On Sat, 18 Mar 2000, Wensong Zhang wrote:

>
>
>
> On Fri, 17 Mar 2000, Julian Anastasov wrote:
>
> >
> >     I have thought about something like this:
> >
> >     if (rate) {
> >             if (!--counter) {
> >                     counter = rate;
> >                     drop packet
> >             }
> >     }
> >     accept this packet
> >
> >     sltimer_handler() {
> >             counter = rate = the_big_formula
> >     }
> >
> >     Currently, the formula is not complex and can be put
> > in the packet handler. But as in above example we can evaluate the
> > rate in the time handler too, as in the LVS 0.9.[89]. We can use
> > rate=0 in normal situations and to put a free memory as value
> > for the rate after some checks, of course.
> >
> >     So, if we decide to drop packets before forwarding them,
> > we can use such simple drop mechanism. Rate means: "drop 1/rate
> > packets". If the rate is evaluated in the time handler it is valid
> > for one second. rate=1 is total block. counter and rate are global
> > for all kind of the entries: TCP/UDP
> >
>
> Yeah, it could work, we would need design a good the_big_formula.
>
> Before I implemented randomly scanning the table to drop syn entries, I
> thought that it is very much more likely to pick stale one
> (syn-flooding) than a live one. Normal connection might have more chances
> to connect the services. Actually, we need do deep investigation on this
> issue.
>
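
The counter scheme quoted above could look roughly like this as
standalone C. It is only a sketch outside the kernel: the names
drop_ctl, drop_ctl_set_rate and should_drop are made up for
illustration, and the_big_formula is left out entirely. Note that
the drop branch returns before the accept path is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state, global for all kinds of entries (TCP/UDP). */
struct drop_ctl {
    unsigned int rate;     /* 0 = never drop, 1 = drop all, N = drop 1 in N */
    unsigned int counter;  /* packets left until the next drop */
};

/* What the timer handler would do once per second with the formula result. */
static void drop_ctl_set_rate(struct drop_ctl *c, unsigned int rate)
{
    assert(c != NULL);
    c->rate = rate;
    c->counter = rate;
}

/* Returns true if this packet should be dropped, false to accept it. */
static bool should_drop(struct drop_ctl *c)
{
    assert(c != NULL);
    if (c->rate == 0)
        return false;           /* normal situation: nothing is dropped */
    if (--c->counter == 0) {
        c->counter = c->rate;   /* re-arm for the next 1-in-rate drop */
        return true;
    }
    return false;
}
```

With rate=3 every third packet is dropped; with rate=1 every packet
is dropped, which matches "rate=1 is total block".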

        Yes, maybe it is better to pass the SYN packets and to
remove the entries when there is not enough memory. This is the
better way whether or not the real servers are protected with the
SYN-cookies mechanism.

        I have attached a patch (not compiled) with some proposals
but there is more that must be done.

        We must consider these things:

A>>> Entries not in the tables

        Currently, we can break valid connections which are in
handshake; this is always true when the real servers support
SYN-cookies. Once the entry is deleted, the Director itself
answers with RST (ICMP_PORT_UNREACH) frames, because LVS
currently delivers locally all packets which are not registered
in the table. This is a _BUG_ which must be corrected. I.e. in
ip_fw_demasquerade, just before the happy statement:

/* sorry, all this trouble for a no-hit :) */

we must add:

#ifdef CONFIG_IP_MASQUERADE_VS
if (svc) {
        return -1;
}
#endif

This change is in the attached patch.

I.e. we must drop this packet. It can be an ACK from the client
for an entry just removed by sysctl_ip_vs_randomdrop.
The rule here must be: "If a packet is for an LVS service it
must be answered or dropped. Not delivered locally!". We
have to wait for the packet from the real server to create the
entry in the proper state (point B). Otherwise we are going to
reset this connection, which is very bad. Let's give the real
server a chance (point B) to resurrect the entry.
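
The rule can be written down as a small decision function. This is
an illustration only, not the code from the attached patch: the
struct pkt_info, the verdict names and demasq_verdict are
hypothetical, and the two booleans stand in for the table lookup and
the "svc" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the lookups found for one packet. */
struct pkt_info {
    bool entry_found;    /* a masq entry exists for this packet */
    bool is_vs_service;  /* destination matches an LVS virtual service */
};

enum verdict {
    VERDICT_FORWARD,        /* entry found: demasquerade and forward */
    VERDICT_DROP,           /* LVS service but no entry: drop silently */
    VERDICT_LOCAL_DELIVER   /* not ours: let the local stack handle it */
};

static enum verdict demasq_verdict(const struct pkt_info *p)
{
    assert(p != NULL);
    if (p->entry_found)
        return VERDICT_FORWARD;
    if (p->is_vs_service)
        return VERDICT_DROP;   /* never answer from the Director itself */
    return VERDICT_LOCAL_DELIVER;
}
```

The middle branch is the one the "if (svc) return -1;" change adds:
without it a no-hit packet for a virtual service falls through to
local delivery.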

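        It is very bad if we remove entries in modes other than
VS/NAT: there the Director does not see the real server's replies,
so nothing can resurrect the entry and these connections are lost.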

B>>> Resurrecting entries: the new dream for the VS/NAT mode

        For this, ip_fw_masquerade() must be patched
too (I have not implemented it yet; first we must decide what to do).
LVS must resurrect the entry just like the MASQ, i.e.
ip_masq_new_vs() must be called with all consequences:
templates, etc. If we drop entries for which we are not sure
about the state (maybe we drop entries which are in ES state in
the real server) we must have a way to resurrect them, i.e. the
OUTPUT(NO->ES via ACK) transition. We rely on the real server's
state: if we drop entries in SYN_RECV state we know that the other
possible state is ES. When we see an ACK from the real server we can
assume that it is a normal data ACK (not an ACK for a FIN). I.e. if
we don't drop entries in FW/TW state we know that the ACK from the
real server is about the established state. So, we can create the
entry in ES state (just like the MASQ). The OUTPUT table for the NO
state is correct: we resurrect the entries in SS, TW, ES or CL
state according to the flags. We must implement it for LVS/NAT.
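
A sketch of "according to the flags" for the NO state follows. The
mapping (SYN to SS, FIN to TW, ACK to ES, RST to CL) and the order in
which the flags are tested are assumptions made for illustration;
the authoritative values are in the kernel's OUTPUT state table, and
resurrect_state and the S_* names are hypothetical.

```c
#include <assert.h>

/* Standard TCP header flag bits. */
#define F_FIN 0x01u
#define F_SYN 0x02u
#define F_RST 0x04u
#define F_ACK 0x10u

enum masq_state { S_NO, S_ES, S_SS, S_TW, S_CL };

/*
 * State for an entry re-created from a real server's outgoing packet
 * when no entry exists (state NO). Test order is an assumption.
 */
static enum masq_state resurrect_state(unsigned int tcp_flags)
{
    assert(tcp_flags <= 0xFFu);   /* flags fit in one header byte */
    if (tcp_flags & F_RST)
        return S_CL;
    if (tcp_flags & F_SYN)
        return S_SS;              /* covers SYN+ACK from the real server */
    if (tcp_flags & F_FIN)
        return S_TW;
    if (tcp_flags & F_ACK)
        return S_ES;              /* plain data ACK: established */
    return S_NO;                  /* nothing usable: do not resurrect */
}
```

Under this sketch an ACK-only segment from the real server brings the
entry back directly in ES state, which is the case the text relies on.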

        What is the consequence if we can resurrect the entries?
We can drop even entries in established state. But when?
Only when we expect a reply from the real server, from which we
create the new entry. Currently, we rely on the real server's ACK
just after the handshake. This is true when the traffic
starts just after the connection is made (most of the services:
http, ftp, etc.). I.e. we think we drop entries in SYN_RECV
state but the connection can already be established. In this case
we rely on the first ACK from the real server. Why not on the next
ACKs too? So, we can drop entries in ES state too.

        So, under siege, LVS can drop entries from the table
after SYN and/or ACK packets coming from the client. In this
case we expect the real server to reply with ACK-only. The rule is
that if we expect an answer from the real server we can drop some
frames from the client and rely on the retransmission. So,
when we receive a SYN, SYN+ACK or ACK frame from the client we
forward it to the real server and drop the entry. Yes, this is
bad but we are under attack. If the real server ACKs, we resurrect
the entry, so nothing is lost. The drawback is that the ACK
can be delayed (0.5 seconds?). We have to check what the
consequences are for the persistent connections in this case too.

        In the demasq direction we still create entries only on
SYN packets.

C>>> Handshake ACK

        In 0.9.9 we switch to ES state when the first ACK
from the real server is received. This mechanism can break
some services (I don't know of any such services) which rely on
the first packet from the client, i.e. if the protocol requires
the first packet to be transferred from the client. In this
case the real server is in established state (just after the
handshake), the client is in established state too, its first
packet is delayed, but LVS removes this entry because it is
in SR state, i.e. there is no first ACK from the server.

        Once the entry is removed we can receive SYN+ACK
from the real server. In this case we just switch to SS
state, which is unexpected for LVS. We have to treat the SR and
SS states the same: we have to wait for an ACK from the real
server before changing the state to ES. I patched the SS
state too. If not patched, the transfer can proceed in SS state,
i.e. with a shorter timeout.

        The question is whether we need this ACK every time. Maybe
we can use this change only under attack. I.e. we can switch
to a mode where the ACK from the real server is required for
changing the state from SR or SS to ES. In normal mode we
can just switch to ES state (same as before 0.9.9). But if
we can resurrect the entries we can drop entries from the
table not just after the initial SYN from the client but
after each ACK too. We wait for an answer from the real server.
So, maybe we can enable the SR change (added in 0.9.9) and
the change in the SS state (in the attached patch) only
when the Director is under attack. Or restore the table
as before 0.9.9 and rely on resurrection. We are no
longer afraid to enter ES state when we can drop entries in
this state. Good, Yeah :)
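
The two modes can be compared in a few lines of C. This is a sketch
under stated assumptions, not the state table from the patch: the
under_attack switch, on_server_packet and the S_* names are
hypothetical, only packets coming from the real server are handled,
and "normal mode" is taken to mean that any segment carrying ACK
(including SYN+ACK) moves SR or SS to ES.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard TCP header flag bits. */
#define F_FIN 0x01u
#define F_SYN 0x02u
#define F_RST 0x04u
#define F_ACK 0x10u

enum masq_state { S_SR, S_SS, S_ES };

/* Next state after a packet from the real server, handshake states only. */
static enum masq_state on_server_packet(enum masq_state cur,
                                        unsigned int flags,
                                        bool under_attack)
{
    assert(flags <= 0xFFu);
    if (cur != S_SR && cur != S_SS)
        return cur;             /* only SR and SS are gated here */
    if (flags & (F_RST | F_FIN))
        return cur;             /* teardown is outside this sketch */
    if (under_attack) {
        /* strict mode: require an ACK-only segment from the real server */
        if ((flags & F_ACK) && !(flags & F_SYN))
            return S_ES;
        return cur;
    }
    /* normal mode: the server's SYN+ACK (or any ACK) is enough */
    if (flags & F_ACK)
        return S_ES;
    return cur;
}
```

In strict mode SR and SS are treated the same and both wait for the
server's ACK-only segment; in normal mode the SYN+ACK already
promotes the entry.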

If there is no memory to resurrect the entry we drop this
packet from the real server.

For example (VS/NAT 0.9.9):

        CLIENT          SERVER  SERVER STATE            NEW MASQ STATE

1.      SYN     ------>                                 => SR
2.              <------ SYN+ACK SYN_RECV                => SR
3.      ACK     ------>         EST                     => SR
4.      data+ACK------>         EST                     => SR
5.              <------ ACK     EST                     => ES

        We can drop even established connections if we pass
the initial SYN (1) because we drop everything in SR state.
This is possible even for a real server without SYN cookies
support. We don't know which initial SYNs are dropped by
the real server if its backlog queue is full. Even if the
initial SYN is accepted by the real server we can drop
the entry before receiving the SYN+ACK answer from the real
server. So, there is no difference whether we switch to ES state
in point 2 or in point 5. In all cases, if we pass the
initial SYN, the entry can be deleted while
the connection is established at both ends.

        I want to say that the OUTPUT table change in 0.9.9
is dangerous. We give 60 seconds for the handshake, for
the first data to be received from the client and ACK-ed
by the real server.

        Why not just restore the old behavior and enter ES
from the SR and SS states after the client's SYN or SYN+ACK? When
we have entry resurrection this is not a problem.

        I think there is another problem. The masquerade can
be fooled into dropping connections even if these connections are
not sniffed. Maybe LVS needs to forget about the current
INPUT and OUTPUT tables, which are for non-trusted environments,
and to build new tables where we trust the real
servers. We must consider the real servers as trusted hosts
and change the states based on the packets from the real
servers. I.e. maybe we have to ignore the RST and FIN packets
coming from the client and rely on the real server's response.
But this is another issue.

> Maybe we can combine these two methods in the system too. ;-)
>

        We must decide how to combine all these techniques:

- drop frames before passing them or after passing them

- change the state table under attack and rely on the entry
resurrection from the outgoing packet (from the real server),
i.e. when the entries are dropped from the table (after passing
them). Warning: we need to drop entries until there is enough
memory (a free limit is reached). The resurrection on each
outgoing packet is very expensive.

- do we always need the real server to ACK the connection,
or can we mark it as ES just after the SYN/SYN+ACK from the real
server (as before 0.9.9) and rely on the resurrection? In this
case we don't care if the client tries to flood us. We just
follow the real server's state.
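
The warning about the free limit can be illustrated with a toy
model. Everything here is hypothetical (struct table, its fields,
shrink_until_free); real code would pick victims from the masq table
and read the actual free memory. The point is only that removal
stops as soon as the limit is reached, so that resurrection is not
triggered for more entries than necessary.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the entry table and the memory it holds. */
struct table {
    size_t used;        /* live entries */
    size_t entry_cost;  /* bytes given back per removed entry */
    size_t free_mem;    /* current free memory, in bytes */
};

/*
 * Remove entries until free_mem reaches free_limit or the table is
 * empty. Returns the number of entries removed.
 */
static size_t shrink_until_free(struct table *t, size_t free_limit)
{
    size_t removed = 0;

    assert(t != NULL && t->entry_cost > 0);
    while (t->free_mem < free_limit && t->used > 0) {
        t->used--;                      /* victim selection is elided */
        t->free_mem += t->entry_cost;
        removed++;
    }
    return removed;
}
```

A second call with the same limit removes nothing, because the
limit is already satisfied.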



        The current variants for the user are:

1> Modify the timeouts with ipchains -M -S

        EST/FIN states

        With a timeout of 1 minute for the established state we
        reach the same defense level as the entry removal.
        This means that after that point we have to drop frames
        without passing them to the real server.

2> Any other variants?

3> The entry resurrection. Is it only a dream?

        So, after 0.9.9 we can:

- restore the OUTPUT(SR->ES) state (as in the mainstream kernel)
- drop the client's packet if we can't find an entry for it (point A)
- implement entry resurrection


Regards

--
Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>

Attachment: syn-099-1.diff
Description: Changes
