Re: random SYN-drop function

To: Wensong Zhang <wensong@xxxxxxxxxxxx>
Subject: Re: random SYN-drop function
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 26 Mar 2000 19:11:27 +0300 (EEST)
        Hello,

On Sun, 26 Mar 2000, Wensong Zhang wrote:

> >     It seems that we can't drop entries in SR state after passing
> > the SYN packet to the real server. Currently, when the real server answers
> > with SYN+ACK ip_fw_masquerade() creates new entry with ip_masq_new().
> > Is that good?
> >
>
> That's not good. I think it is simple to notify the real server that
> destination is not reachable and drop the packet. ;)

        That is easy. We can add a hash table for the destinations,
used only by ip_fw_masquerade(). We don't even need the
struct ip_masq_dest.svc pointer. We can just check whether the packet
comes from one of our real services and return ICMP_PORT_UNREACH to
the real server when the entry has been removed.
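
        A rough user-space sketch of that check, in Python rather than
kernel C. All names here (OutgoingFilter, handle_outgoing, the returned
strings) are hypothetical and only model the decision; this is not the
ip_fw_masquerade() code.

```python
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]           # (address, port)
ConnKey = Tuple[Endpoint, Endpoint]  # (internal endpoint, client endpoint)


class OutgoingFilter:
    """Toy model of the proposed destination hash table (hypothetical)."""

    def __init__(self) -> None:
        # Hash table of real services (raddr, rport); no svc pointer needed.
        self.real_services: Set[Endpoint] = set()
        # Existing MASQ/LVS entries, keyed by (internal, client).
        self.entries: Dict[ConnKey, str] = {}

    def add_real_service(self, addr: str, port: int) -> None:
        self.real_services.add((addr, port))

    def handle_outgoing(self, src: Endpoint, dst: Endpoint) -> str:
        """Decide what to do with a packet leaving the internal network."""
        if (src, dst) in self.entries:
            return "forward"              # entry still exists
        if src in self.real_services:
            # The entry was removed: tell the real server instead of
            # creating a zombie MASQ entry with a new mport.
            return "icmp_port_unreach"
        return "new_masq_entry"           # ordinary masqueraded host
```

With a real service registered at 10.0.0.2:80 and no entry left for the
connection, a SYN+ACK from that service yields "icmp_port_unreach",
while the same packet from an unrelated internal host still creates a
normal MASQ entry.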

>
> >     In fact, if we start to drop entries (any kind), we have
> > to modify ip_fw_masquerade to check if the packet comes from our
> > real server. Else, we start to create MASQ entries to the real
> > servers with mport=MASQ_port (first free>61000) which is never used.
> > I.e. we create zombie entries in SS or ES state. May be we can notify
> > the real server here but only if we support uniq real services under
> > VS/NAT, i.e. one raddr/rport to be used from one virtual service.
> > May be we can just drop packets coming from our real service.
> >
> >     Usually the SYN packets are retransmitted if not answered
> > soon. This is a very bad situation for VS/DR and VS/TUN methods. If
> > we drop the SR entry for a real server after passing the packet, the
> > next SYN is sent to another real server and the client is confused by
> > two different SYN+ACK packets/cookies coming from two real servers.
> >
>
> Yeah, it is possible, but it should rarely arise. Even if the problem
> arises in some situations, I think the TCP protocol of client machines
> should be reliable against to this problem, or just drop the connection,
> it doesn't hurt too much to the whole situation. ;)

        Currently, the Linux client will send the 2nd SYN packet
3 seconds after the 1st. So the client retransmits whenever the
SYN+ACK from the LVS does not arrive within 3 seconds of the initial
SYN. I think this occurs very often: on slow links we usually
don't receive the SYN+ACK within these 3 seconds.

        But we can solve this problem for LVS/NAT if we return
ICMP_PORT_UNREACH in ip_fw_masquerade. In this case we are
going to send the same number of ICMP replies to the real server
as the number of dropped SYN flood requests when the real server
supports SYN cookie protection. But the problem is not solved for
the other 3 methods: DR, TUN, LOCAL. We will confuse the client.
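
        To illustrate the confusion, here is a small Python simulation.
It assumes a round-robin scheduler and two real servers named rs1 and
rs2; the Director class and its methods are hypothetical and only model
the scheduling, not any LVS code.

```python
from itertools import cycle
from typing import Dict, Iterable, Set, Tuple

Client = Tuple[str, int]  # (client address, client port)


class Director:
    """Toy director: round-robin scheduling plus a connection table."""

    def __init__(self, real_servers: Iterable[str]) -> None:
        self._next_server = cycle(list(real_servers))
        self.entries: Dict[Client, str] = {}

    def on_syn(self, client: Client) -> str:
        # A SYN without an entry is scheduled to the next real server.
        if client not in self.entries:
            self.entries[client] = next(self._next_server)
        return self.entries[client]

    def drop_entry(self, client: Client) -> None:
        self.entries.pop(client, None)


def servers_answering(drop_between_syns: bool) -> Set[str]:
    """Return the set of real servers that see a SYN from one client."""
    director = Director(["rs1", "rs2"])
    client = ("198.51.100.7", 40000)
    first = director.on_syn(client)       # initial SYN
    if drop_between_syns:
        director.drop_entry(client)       # SR entry dropped under attack
    second = director.on_syn(client)      # retransmitted SYN, 3 s later
    return {first, second}
```

Without the drop, both SYNs reach rs1. With the drop, the retransmitted
SYN is scheduled to rs2, so the client can receive SYN+ACK packets from
two different real servers.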

>
> >     May be the SYN packets must be dropped without passing them to
> > the real server, i.e. by using a drop rate.
> >
>
> If there was only a 1/rate drop before forwarding under the syn-flooding
> attack, it might soon reach the drop all situation (the system is nearly
> filled with entries), then no new clients might access the services until
> the attack stops. However, the random drop entry method can periodically
> drop some entries to get memory for new connections, so that most of users
> can still access the service and some need to resend their request because
> their entries might be dropped. If there is only random drop entry method,
> the dropping speed might not keep up with the generation speed, especially
> under the SYN following ACK attack. So, maybe we can combine the two
> methods.

        OK, we can add many independent sysctls (in
/proc/sys/net/ipv4/vs/ ?):

- ip_masq_ignore_requests

        - To drop MASQ/LVS TCP/UDP requests without passing the
        requests to the internal host

                [I'll send soon an example implementation]

- ip_vs_randomdrop

        - To drop entries from the LVS table

                [Currently in LVS]

- ip_vs_secure_tcp

        - To switch to delayed TCP state transitions, i.e.
        using many state tables/timeouts. This way we
        can follow the real server's TCP flags.

                [I'm currently trying to attach/detach state
                tables to the LVS entries using
                ip_masq_timeout_attach/detach]

        Each sysctl var has to allow switching the working mode
manually or automatically (0, 1 or 2?).
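
        One possible reading of the three values, sketched in Python.
The meaning of 0, 1 and 2 is taken from the working modes quoted below;
the function names, the constant names, and the use of free memory with
a 10MB default threshold as the load signal are assumptions made for
the example.

```python
import random

MODE_AUTO_OFF = 0   # defence inactive, may switch to 1 under load
MODE_AUTO_ON = 1    # defence active, may return to 0 when load drops
MODE_ALWAYS = 2     # set by the user, never returns to 0 automatically


def next_mode(mode: int, free_mem_mb: float, threshold_mb: float = 10) -> int:
    """Automatic switching between modes 0 and 1; mode 2 is pinned."""
    if mode == MODE_ALWAYS:
        return MODE_ALWAYS
    return MODE_AUTO_ON if free_mem_mb < threshold_mb else MODE_AUTO_OFF


def should_drop_request(mode: int, rate: int, rng=random) -> bool:
    """Drop on average one request in `rate` while the defence is active."""
    if mode == MODE_AUTO_OFF:
        return False
    return rng.randrange(rate) == 0
```

With rate = 1 every request is dropped while the defence is active;
larger values drop proportionally fewer requests.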

>
> >     For the resurrection of the entries. The only problem is that
> > we don't know when to drop the ES entries. We are not sure if the
> > real server will ACK soon. It is possible the connection to freeze.
> >
> >     Currently, I see 3 working modes as useful for VS/NAT (for
> > example, via ip_vs_defense_level):
> >
> > mode 0      -       default mode
> >
> >     No packets are dropped.
> >     Under load we switch automatically to mode 1 and then back
> >     to mode 0 when the system is not busy
> >
> > mode 1      -       we are in dangerous area
> >
> >     A> We start to ACK the connection setup
> >
> >             May be when there is less than 10MB left (or configured
> >     by user)?
> >
> >             We have to use other timeouts and states (tables):
> >
> >             We have to wait 10 seconds for example in SR state.
> >     When/if the real server replies with SYN+ACK we switch
> >     to a new state SA (abbreviated from SYN+ACK). If the real
> >     server doesn't use SYN cookie protection we don't see this
> >     SYN+ACK and the entry is dropped after 10 seconds. So,
> >     we expect SYN+ACK from the real server for 10 seconds. This
> >     is our support for all kinds of OSes which don't
> >     support SYN cookies, i.e. when they just ignore the extra
> >     SYNs when their backlog is full. In fact, this is not a bad
> >     mode for the real server if it is overloaded. But may be
> >     the SYN cookie support is still preferred.
> >
> >             The timeout for the new SA state can be 60 seconds,
> >     same as the old SR state. Or 75? The rule here is that we
> >     must stay in SA state until the ACK is received from the real
> >     server to allow the transition to ES state. We can't trust the
> >     client, so we can't switch to ES after its ACK. This is OK for
> >     most of the services.
> >
> >     B> We start to drop SYN packets using rate and without passing
> >     them to the real server.
> >
> >             Yep, if the above protection doesn't work it is a
> >     time to switch to a faster Director. Buy more RAM, to feed your
> >     real servers. They accept more connections than the Director
> >     can handle.
> >
> > mode 2      -       This is same as mode 1 but when set from the user,
> >     LVS can't return automatically to mode 0. Very useful when
> >     the user thinks that he is permanently under attack or just
> >     for debugging.
> >
>
>
> Yeah, I like your idea of ip_vs_defence_level, where we can add more
> defence strategies there, and let users to choose the one they like. ;)

        I like the idea of many independent sysctl vars.
We have to choose correct names for the above sysctl vars.
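
        For the delayed state transitions quoted above (SR, the new SA
state, then ES), a table-driven sketch in Python could look like this.
The 10 and 60 second timeouts are the example values from the quoted
text; the table layout and function names are hypothetical, and no
timeout is modelled for ES because none is given here.

```python
from typing import Dict, Tuple

# Example timeouts in seconds, taken from the quoted proposal.
TIMEOUTS: Dict[str, int] = {"SR": 10, "SA": 60}

# (current state, sender, TCP flags) -> next state.
# Only packets from the real server advance the entry; the client
# is not trusted, so its ACK does not move SA to ES.
TRANSITIONS: Dict[Tuple[str, str, str], str] = {
    ("SR", "server", "SYN+ACK"): "SA",
    ("SA", "server", "ACK"): "ES",
}


def next_state(state: str, sender: str, flags: str) -> str:
    """Return the new state, or the old one if the packet changes nothing."""
    return TRANSITIONS.get((state, sender, flags), state)


def expired(state: str, idle_seconds: int) -> bool:
    """True if an entry idle for `idle_seconds` should be dropped."""
    timeout = TIMEOUTS.get(state)
    return timeout is not None and idle_seconds >= timeout
```

An entry whose real server never answers with SYN+ACK stays in SR and
expires after 10 seconds; an ACK from the client alone leaves the entry
in SA.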

>
> >
> > For the BUGS:
> >
> > ip_fw_masquerade() incorrectly continues to send the packet after
> > ip_route_output() has failed. This is a recent MASQ bug. We must
> > return -1; and not use the default gateway with
> > inet_select_addr(). We have to drop this packet, maybe the routing
> > cache needs tuning, so don't try to send this packet.
> >
>
> I agree. And, for the performance reason of VS/NAT, we probably need to
> move the determination of maddr to where is really needed.

        Yes, under DoS attack when we drop entries after passing the
TCP/UDP requests to the real servers, we can move maddr selection
for TCP/UDP after checking whether we need to send ICMP_PORT_UNREACH
back to the real server or to pass the outgoing packet to the client.
icmp_send() has its own route decisions.
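
        The ordering I mean, as a Python sketch. The function and its
return strings are hypothetical; select_maddr stands in for the route
lookup that picks maddr and returns None when the lookup fails.

```python
from typing import Callable, Optional


def handle_outgoing(has_entry: bool,
                    from_real_service: bool,
                    select_maddr: Callable[[], Optional[str]]) -> str:
    """Order the checks so maddr is selected only when it is needed."""
    if has_entry:
        return "forward"              # maddr is already in the entry
    if from_real_service:
        # The ICMP reply does its own routing, so no maddr is needed.
        return "icmp_port_unreach"
    maddr = select_maddr()            # only now pay for the route lookup
    if maddr is None:
        return "drop"                 # route lookup failed: do not send
    return "new_entry:" + maddr
```

The route lookup runs only in the last case, so packets answered with
ICMP_PORT_UNREACH never pay for it, and a failed lookup drops the
packet instead of falling back to another address.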


        As for the UDP entries: is the check in ip_vs_random_drop()
for IP_MASQ_S_NONE correct? Isn't the state IP_MASQ_S_UDP? Maybe
I'm missing something. Maybe it is better to control the UDP
entries with ipchains rather than drop entries from the table?
We can still implement dropping UDP requests without passing them on.


Regards

--
Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
