Re: random SYN-drop function

To: Wensong Zhang <wensong@xxxxxxxxxxxx>
Subject: Re: random SYN-drop function
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 25 Mar 2000 13:44:54 +0200 (EET)

On Sat, 25 Mar 2000, Wensong Zhang wrote:

> On Fri, 24 Mar 2000, Julian Anastasov wrote:
> > > We cannot resurrect entries for LVS/NAT, because we cannot get more
> > > information from packets from the real servers, we don't know which
> > > virtual service the services belong to (the IP address and port number of
> > > the virtual service).
> >
> >     Do we support the same raddr,rport for many virtual services?
> > If so, we really can't restore the virtual service. But
> > it is useless to add one real service to many virtual services
> I don't know whether there is such an application, but there is no problem
> adding one real service to many virtual services for VS/NAT. ;-)

        Yes, currently it is possible. But I don't know why it would
be useful :)

> > for VS/NAT. For VS/DR and VS/TUN it is useful for a real service to
> > belong to many virtual services. Yep, there is a local node feature
> > too that must be considered when we drop entries. Maybe it is
> > better not to drop IP_MASQ_F_VS_LOCALNODE entries.
> >
> Maybe that is not good: if we don't drop IP_MASQ_F_VS_LOCALNODE entries
> while under attack, there may be a large number of IP_MASQ_F_VS_LOCALNODE
> entries in SYN_RCV state.

> ...

> >     But for VS/NAT we receive the packet from the real service,
> > i.e. saddr=raddr, sport=rport, daddr=caddr, dport=cport. We
> > can search for the real server by saddr/sport. If we reorganize
> > the tables we can achieve that.
> >
> >     It can't work for ftp sessions, i.e. we can't
> > resurrect them if we can't find the service. When we resurrect
> > the entries in ip_fw_masquerade, if the packet doesn't belong
> > to a real service (or MASQ) we can drop it.
> >
> >     In fact, the MASQ can't resurrect entries but LVS/NAT
> > can, if the v/r service still exists.
> >
> Since the entries that need resurrecting in LVS/NAT may be just a tiny
> portion of the whole, maybe the load balancer can simply send an ICMP
> packet to the real server saying that the client is not reachable, so the
> server can reclaim resources quickly. For normal users of those entries, we
> are sorry that their connections are broken because we are under attack;
> they need to establish the connections again and they should have a
> reasonable chance of accessing the service. It is simple to handle.

        Yes, maybe we can notify the real server from ip_fw_masquerade.
But only if we drop entries after passing the packets. Read the
following notes.

> > be fooled to set the ES state. But currently, MASQ can be fooled
> > by a third "client" to:
> >
> > - set the state to SR or ES (flood attacks)
> > - set the state to CW/CL via FIN/RST (hijacking, even without being
> > a man-in-the-middle)
> >
> >     This is because the MASQ box checks only the flags and
> > not the protocol data. We can at least check the flags from
> > the real server but this leads to delayed transitions to ES state:
> > we can stay in SR if there is no data transferred, e.g. the last
> > SR/SS->ES state changes.
> >
> Yeah, it is true. The MASQ box just checks the flags (SYN, ACK, RST and
> FIN) to do TCP state transitions, without checking the sequence number. It
> is vulnerable to the SYN-followed-by-ACK attack.
> For VS/NAT, we may record the sequence number of the SYN+ACK packet from
> the real server, then check the sequence number of the ACK packet from the
> client before entering the ES state. Or, we may delay the transition to the
> ES state until data is transferred; it does not quite fit the TCP finite
> state machine, but it might work.

        I think it is better to delay the transition to ES state.
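
A minimal sketch of what "delaying the transition to ES" could look like for VS/NAT, where the director sees both directions. It is not the MASQ state table: the enum names and the function next_state() are hypothetical, and SA is the SYN+ACK state proposed for mode 1 later in this message. The only point shown is that a client ACK alone never moves the entry to ES; only an ACK from the real server does.

```c
#include <assert.h>

/* Hypothetical state and event names for this sketch only. */
enum st { ST_NONE, ST_SR, ST_SA, ST_ES };
enum ev { EV_CLIENT_SYN, EV_SERVER_SYNACK, EV_CLIENT_ACK, EV_SERVER_ACK };

/* Return the next state for one observed packet.
 * The client is not trusted: its ACK never promotes the entry to ES. */
enum st next_state(enum st cur, enum ev e)
{
    assert(cur <= ST_ES);

    switch (cur) {
    case ST_NONE:
        /* Only a client SYN creates an entry, in SR. */
        return e == EV_CLIENT_SYN ? ST_SR : ST_NONE;
    case ST_SR:
        /* Wait for the real server's SYN+ACK before leaving SR. */
        return e == EV_SERVER_SYNACK ? ST_SA : ST_SR;
    case ST_SA:
        /* Stay in SA until the real server itself sends an ACK. */
        return e == EV_SERVER_ACK ? ST_ES : ST_SA;
    case ST_ES:
        return ST_ES;
    }
    return cur;
}
```

A forged client ACK sent right after a forged SYN leaves the entry in SR here, so it still expires with the short SR timeout instead of the long ES one.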

> However, for VS/TUN and VS/DR, the load balancer sees only the
> client-to-server half of the connection, so it cannot get the sequence
> number of the SYN+ACK packet from the real server as in VS/NAT, and it
> cannot delay the transition to the ES state. So, it is still vulnerable to
> the SYN-followed-by-ACK attack.

        Someone please help with VS/DR and VS/TUN. We can't :)
We can only drop SYN packets without passing them, I think.

> So, what is the solution that applies to all the situations? I think that
> maybe we can combine dropping entries and dropping 1/rate packets (as you
> proposed), just in order to let the system keep memory for new
> connections. Anyway, the more memory the box has, the better. ;-) And, we
> can tell users to use "ipchains -M -S ..." to set the timeouts to the
> smallest possible values too. ;-)

        It seems that we can't drop entries in SR state after passing
the SYN packet to the real server. Currently, when the real server answers
with SYN+ACK, ip_fw_masquerade() creates a new entry with ip_masq_new().
Is that good?

        In fact, if we start to drop entries (of any kind), we have
to modify ip_fw_masquerade to check if the packet comes from our
real server. Otherwise, we start to create MASQ entries to the real
servers with mport=MASQ_port (first free>61000) which is never used.
I.e. we create zombie entries in SS or ES state. Maybe we can notify
the real server here, but only if we support unique real services under
VS/NAT, i.e. one raddr/rport used by only one virtual service.
Maybe we can just drop packets coming from our real service.
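
A rough sketch of the check described above, written as user-space C rather than kernel code. Everything named here (struct real_service, classify_outgoing(), the verdict names) is hypothetical; the sketch only shows the decision: a packet whose source is one of our real services, but which matches no existing entry, is dropped instead of getting a fresh MASQ entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one VS/NAT real service (raddr, rport). */
struct real_service {
    uint32_t raddr;
    uint16_t rport;
};

enum out_verdict {
    OUT_FORWARD,   /* a matching entry exists: translate and send */
    OUT_DROP,      /* from a real service, but its entry was dropped */
    OUT_NEW_MASQ   /* ordinary masqueraded traffic: create an entry */
};

/* Return nonzero if (saddr, sport) is one of our real services. */
int is_real_service(const struct real_service *tbl, size_t n,
                    uint32_t saddr, uint16_t sport)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].raddr == saddr && tbl[i].rport == sport)
            return 1;
    return 0;
}

/* Decide what to do with an outgoing packet. have_entry says whether a
 * connection entry was found for it by the normal lookup. */
enum out_verdict classify_outgoing(const struct real_service *tbl, size_t n,
                                   uint32_t saddr, uint16_t sport,
                                   int have_entry)
{
    if (have_entry)
        return OUT_FORWARD;
    if (is_real_service(tbl, n, saddr, sport))
        return OUT_DROP;       /* avoids a zombie MASQ entry */
    return OUT_NEW_MASQ;
}
```

The linear scan is only for illustration; a real lookup by saddr/sport would need the table reorganization mentioned earlier in the thread.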

        Usually the SYN packets are retransmitted if not answered
soon. This is a very bad situation for the VS/DR and VS/TUN methods. If
we drop the SR entry for a real server after passing the packet, the next
SYN is sent to another real server and the client is confused by
two different SYN+ACK packets/cookies coming from two real servers.

        Maybe the SYN packets must be dropped without passing them to
the real server, i.e. by using a drop rate.
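
One way to picture "a drop rate" is a simple counter that drops one SYN out of every rate SYNs, before any entry is created and before anything is forwarded. This is only an illustration: struct syn_dropper and syn_should_drop() are hypothetical names, and a deterministic counter is used here in place of a random choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical drop state: drop one SYN out of every `rate`. */
struct syn_dropper {
    unsigned int rate;     /* 0 disables dropping */
    unsigned int counter;  /* SYNs seen since the last drop */
};

/* Return 1 if this SYN should be dropped without creating an entry and
 * without passing it to a real server, 0 if it should be handled normally. */
int syn_should_drop(struct syn_dropper *d)
{
    assert(d != NULL);

    if (d->rate == 0)
        return 0;
    if (++d->counter >= d->rate) {
        d->counter = 0;
        return 1;
    }
    return 0;
}
```

Because the SYN is dropped before scheduling, a retransmitted SYN from a real client is simply scheduled later, and no real server has answered it in the meantime.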

        As for the resurrection of the entries: the only problem is that
we don't know when to drop the ES entries. We are not sure if the
real server will ACK soon. It is possible for the connection to freeze.

        Currently, I see 3 working modes as useful for VS/NAT (for
example, via ip_vs_defense_level):

mode 0  -       default mode

        No packets are dropped.
        Under load we switch automatically to mode 1 and then back
        to mode 0 when the system is not busy

mode 1  -       we are in a dangerous area

        A> We start to ACK the connection setup

                Maybe when there is less than 10MB left (or a value
        configured by the user)?

                We have to use other timeouts and states (tables):

                We have to wait 10 seconds, for example, in SR state.
        When/if the real server replies with SYN+ACK we switch
        to a new state SA (abbreviated from SYN+ACK). If the real
        server doesn't use SYN cookie protection we don't see this
        SYN+ACK and the entry is dropped after 10 seconds. So,
        we expect SYN+ACK from the real server for 10 seconds. This
        is our support for all kinds of OSes which don't
        support SYN cookies, i.e. when they just ignore the extra
        SYNs when their backlog is full. In fact, this is not a bad
        mode for the real server if it is overloaded. But maybe
        the SYN cookie support is still preferred.

                The timeout for the new SA state can be 60 seconds,
        same as the old SR state. Or 75? The rule here is that we
        must stay in SA state until the ACK is received from the real
        server to allow the transition to ES state. We can't trust the
        client, so we can't switch to ES after its ACK. This is OK for
        most services.

        B> We start to drop SYN packets using rate and without passing
        them to the real server.

                Yep, if the above protection doesn't work it is
        time to switch to a faster Director. Buy more RAM to feed your
        real servers. They accept more connections than the Director
        can handle.

mode 2  -       This is the same as mode 1, but when set by the user,
        LVS can't return automatically to mode 0. Very useful when
        the user thinks that he is permanently under attack or just
        for debugging.
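
The three modes above can be sketched as a small level-update function. This is an assumption-laden illustration, not ip_vs code: the enum names, update_defense_level() and sr_timeout_secs() are hypothetical, "busy" is approximated by free memory falling below a threshold (10MB in the example above), and the timeouts are the 10 and 60 second figures mentioned for mode 1.

```c
#include <assert.h>

/* Hypothetical names for the three working modes. */
enum defense_level {
    DEFENSE_OFF    = 0,  /* mode 0: default, nothing dropped */
    DEFENSE_AUTO   = 1,  /* mode 1: entered and left automatically */
    DEFENSE_FORCED = 2   /* mode 2: like mode 1, but set by the user */
};

/* Recompute the level from the amount of free memory.
 * free_kb and low_kb are in kilobytes; low_kb is the threshold below
 * which the system is treated as being in the dangerous area. */
enum defense_level update_defense_level(enum defense_level cur,
                                        unsigned long free_kb,
                                        unsigned long low_kb)
{
    assert(cur <= DEFENSE_FORCED);

    if (cur == DEFENSE_FORCED)
        return cur;               /* never returns to mode 0 by itself */
    return free_kb < low_kb ? DEFENSE_AUTO : DEFENSE_OFF;
}

/* SR timeout in seconds: short while defending, the old value otherwise. */
int sr_timeout_secs(enum defense_level level)
{
    return level == DEFENSE_OFF ? 60 : 10;
}
```

Called periodically with the current free memory, this moves between mode 0 and mode 1 on its own and leaves mode 2 untouched until the user changes it.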

For the BUGS:

ip_fw_masquerade() incorrectly continues to send the packet after
ip_route_output() has failed. This is a recent MASQ bug. We must
return -1 and not use the default gateway with
inet_select_addr(). We have to drop this packet; maybe the routing
cache needs tuning, so don't try to send this packet.
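
The intended control flow can be shown with stand-ins for the kernel functions. Nothing below is the real ip_fw_masquerade() or ip_route_output(): route_lookup_fn, route_ok(), route_fail() and masq_send_sketch() are hypothetical and exist only to show "fail the lookup, return -1, send nothing".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a route lookup: 0 on success, nonzero on failure. */
typedef int (*route_lookup_fn)(uint32_t daddr);

/* Stub lookups used to exercise both paths. */
int route_ok(uint32_t daddr)   { (void)daddr; return 0; }
int route_fail(uint32_t daddr) { (void)daddr; return 1; }

/* Return -1 and send nothing when the route lookup fails; otherwise count
 * the packet as sent and return 0. There is deliberately no fallback that
 * picks another source address and sends anyway. */
int masq_send_sketch(route_lookup_fn lookup, uint32_t daddr, int *sent)
{
    assert(lookup != NULL && sent != NULL);

    if (lookup(daddr) != 0)
        return -1;             /* drop the packet */
    (*sent)++;
    return 0;
}
```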


Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
