Re: random SYN-drop function

To: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: random SYN-drop function
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Wensong Zhang <wensong@xxxxxxxxxxxx>
Date: Sun, 26 Mar 2000 20:33:18 +0800 (CST)


On Sat, 25 Mar 2000, Julian Anastasov wrote:

> 
>       Hello,
> 
> On Sat, 25 Mar 2000, Wensong Zhang wrote:
> 
> >
> >
> > On Fri, 24 Mar 2000, Julian Anastasov wrote:
> >
> > > > We cannot resurrect entries for LVS/NAT, because we cannot get more
> > > > information from packets from the real servers, we don't know which
> > > > virtual service the services belong to (the IP address and port number
> > > > of the virtual service).
> > >
> > >   Do we support same raddr,rport for many virtual services?
> > > If this is true, we really can't restore the virtual service. But
> > > it is useless to add one real service to many virtual services
> >
> > I don't know whether there is such an application, but there is no problem
> > to add one real service to many virtual services for VS/NAT. ;-)
> 
>       Yes, currently it is possible. But I don't know why it can
> be useful :)
> 
> >
> > > for VS/NAT. For VS/DR and VS/TUN it is useful a real service to
> > > belong to many virtual services. Yep, there is a local node feature
> > > too that must be considered when we drop entries. May be it is
> > > better not to drop IP_MASQ_F_VS_LOCALNODE entries.
> > >
> >
> > Maybe it is not good, if we don't drop IP_MASQ_F_VS_LOCALNODE entries when
> > it is under attack, there may be a large number of IP_MASQ_F_VS_LOCALNODE
> > entries in SYN_RCV state.
> >
> 
> > ...
> 
> > >   But for VS/NAT we receive packet from the real service,
> > > i.e. saddr=raddr, sport=rport, daddr=caddr, dport=cport. We
> > > can search the real server by saddr/sport. If we reorganize
> > > the tables we can achieve that.
> > >
> > >   It can't work for ftp sessions, i.e. we can't
> > > resurrect them if we can't find the service. When we resurrect
> > > the entries in ip_fw_masquerade if the packet doesn't belong
> > > to a real service (or MASQ) we can drop it.
> > >
> > >   In fact, the MASQ can't resurrect entries but LVS/NAT
> > > can: if the v/r service still exist.
> > >
> >
> > Since the entries that need resurrecting in LVS/NAT may be just a tiny
> > portion of the whole, maybe the load balancer can simply send an ICMP
> > packet telling the real server that the client is not reachable, so the
> > server can reclaim resources quickly. For the normal users of those
> > entries, we are sorry that their connections are broken because we are
> > under attack; they need to establish the connections again, and they
> > should have a good chance of accessing the service. It is simple to handle.
> 
>       Yes, may be we can notify the real server from ip_fw_masquerade.
> But only if we drop entries after passing the packets. Read the
> following notes.
> 
> > > be fooled to set the ES state. But currently, MASQ can be fooled
> > > > by a third "client" to:
> > >
> > > - set the state to SR or ES (flood attacks)
> > > - set the state to CW/CL via FIN/RST (hijacking, even not from
> > > a man-in-the-middle)
> > >
> > >   This is because the MASQ box checks only the flags and
> > > not the protocol data. We can at least check the flags from
> > > the real server but this leads to delayed transitions to ES state:
> > > > we can stay in SR if there is no data transferred, e.g. the last
> > > SR/SS->ES state changes.
> > >
> >
> > Yeah, it is true. The MASQ box just checks the flags (SYN, ACK, RST and
> > FIN) to do TCP state transition, without checking the sequence number. It
> > > is vulnerable to the SYN following ACK attack.
> >
> > For VS/NAT, we may record the sequence number of SYN+ACK packet from real
> > server, then check the sequence number of ACK packet from client before
> > > entering the ES state. Or, we may delay transition to the ES state until
> > > data is transferred; it does not fit the TCP finite state machine, but it
> > > might work.
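
A minimal sketch of the sequence-number check suggested above for VS/NAT, written in Python rather than kernel C. The Entry class, its field names and the state strings are hypothetical; the only protocol fact relied on is that the ACK completing the handshake acknowledges the server's initial sequence number plus one (modulo 2^32).

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


@dataclass
class Entry:
    """Hypothetical per-connection entry kept by the VS/NAT director."""
    state: str = "SR"                  # SYN_RECV
    server_isn: Optional[int] = None   # seq of the SYN+ACK seen from the real server


def on_server_synack(entry: Entry, seq: int) -> None:
    # Record the real server's initial sequence number while still in SR.
    if entry.state == "SR":
        entry.server_isn = seq


def on_client_ack(entry: Entry, ack: int) -> bool:
    """Move SR -> ES only if the ACK acknowledges the recorded SYN+ACK."""
    if entry.state != "SR" or entry.server_isn is None:
        return False
    if ack != (entry.server_isn + 1) % SEQ_MOD:
        return False  # ACK does not match the server's SYN+ACK: stay in SR
    entry.state = "ES"
    return True
```

A blind ACK sent right after a spoofed SYN would leave the entry in SR here, because the sender cannot know the real server's sequence number.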
> 
>       I think, it is better to delay the transition to ES state.
> 
> >
> > However, for VS/TUN and VS/DR, the load balancer is on the
> > client-to-server half connection, it cannot get the sequence number of
> > SYN+ACK packet from real server like that in VS/NAT, and it cannot delay
> > > transition to the ES state. So, it is still vulnerable to the SYN
> > following ACK attack.
> 
>       Someone to help VS/DR and VS/TUN, please. We can't :)
> We can only drop SYN packets without passing them, I think.
> 
> >
> > So, what is the solution that applies to all the situations? I think that
> > maybe we can combine dropping entries and dropping 1/rate packets (as you
> > proposed), just in order to let the system have memory for new
> > connections. Anyway, the more memory the box has, the better. ;-) And, we
> > can tell users to use "ipchains -M -S ..." to set the timeout values as
> > small as possible too. ;-)
> 
> 
>       It seems that we can't drop entries in SR state after passing
> the SYN packet to the real server. Currently, when the real server answers
> with SYN+ACK ip_fw_masquerade() creates new entry with ip_masq_new().
> Is that good?
> 

That's not good. I think it is simple to notify the real server that
destination is not reachable and drop the packet. ;)

>       In fact, if we start to drop entries (any kind), we have
> to modify ip_fw_masquerade to check if the packet comes from our
> real server. Else, we start to create MASQ entries to the real
> servers with mport=MASQ_port (first free>61000) which is never used.
> I.e. we create zombie entries in SS or ES state. May be we can notify
> the real server here but only if we support unique real services under
> VS/NAT, i.e. one raddr/rport to be used from one virtual service.
> May be we can just drop packets coming from our real service.
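
A rough Python sketch of the check described above: before a new MASQ entry is created for an outgoing packet that matches no existing entry, look up its source in the set of real services. The function name, arguments and return strings are hypothetical, and the sketch assumes each raddr/rport is unique under VS/NAT.

```python
def classify_outgoing(saddr, sport, has_entry, real_services):
    """Decide what to do with a packet leaving through the director.

    real_services is a set of (raddr, rport) pairs configured for VS/NAT.
    """
    if has_entry:
        return "forward"      # an existing entry already covers this packet
    if (saddr, sport) in real_services:
        return "drop"         # reply for a dropped LVS entry: no zombie MASQ entry
    return "masquerade"       # ordinary MASQ traffic gets a new entry
```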
> 
>       Usually the SYN packets are retransmitted if not answered
> soon. This is a very bad situation for VS/DR and VS/TUN methods. If
> we drop the SR entry for a real server after passing the packet, the next
> SYN is sent to another real server and the client is confused by
> two different SYN+ACK packets/cookies coming from two real servers.
> 

Yeah, it is possible, but it should rarely arise. Even if the problem does
arise in some situations, I think the TCP stack on the client machine
should either be robust against it or just drop the connection; it
doesn't hurt the overall situation too much. ;)

>       May be the SYN packets must be dropped without passing them to
> the real server, i.e. by using a drop rate.
> 

If there were only a 1/rate drop before forwarding, then under a
SYN-flooding attack the system might soon reach the drop-all situation (the
system is nearly filled with entries), and no new clients could access the
services until the attack stops. The random drop-entry method, however,
periodically drops some entries to free memory for new connections, so
that most users can still access the service, although some need to resend
their requests because their entries were dropped. If there were only the
random drop-entry method, the dropping speed might not keep up with the
generation speed, especially under the SYN following ACK attack. So, maybe
we can combine the two methods.
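
To make the combination concrete, here is a toy Python model, not kernel code. The Director class, its capacity/rate parameters and the fraction passed to random_drop() are all illustrative assumptions; it only shows how a 1/rate drop before forwarding and a periodic random drop of SR entries can work together.

```python
import random


class Director:
    """Toy connection table; all names and defaults are illustrative."""

    def __init__(self, capacity, rate, rng=None):
        self.capacity = capacity   # maximum number of entries in the table
        self.rate = rate           # drop one SYN in every `rate`
        self.syn_count = 0
        self.entries = {}          # connection id -> state ("SR" or "ES")
        self.rng = rng or random.Random()

    def on_syn(self, conn_id):
        """Return True if the SYN is forwarded, False if it is dropped."""
        self.syn_count += 1
        if self.syn_count % self.rate == 0:
            return False           # 1/rate drop, before forwarding
        if len(self.entries) >= self.capacity:
            return False           # table full: the drop-all situation
        self.entries[conn_id] = "SR"
        return True

    def random_drop(self, fraction):
        """Evict a random share of the half-open (SR) entries."""
        half_open = [c for c, s in self.entries.items() if s == "SR"]
        victims = self.rng.sample(half_open, int(len(half_open) * fraction))
        for conn_id in victims:
            del self.entries[conn_id]
        return len(victims)
```

With the rate drop alone, the table in this model fills up and every later SYN is refused; calling random_drop() from a timer frees room so that new clients get in again.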

>       For the resurrection of the entries. The only problem is that
> we don't know when to drop the ES entries. We are not sure if the
> real server will ACK soon. It is possible the connection to freeze.
> 
>       Currently, I see 3 working modes as useful for VS/NAT (for
> example, via ip_vs_defense_level):
> 
> mode 0        -       default mode
> 
>       No packets are dropped.
>       Under load we switch automatically to mode 1 and then back
>       to mode 0 when the system is not busy
> 
> mode 1        -       we are in dangerous area
> 
>       A> We start to ACK the connection setup
> 
>               May be when there is less than 10MB left (or configured
>       by user)?
> 
>               We have to use other timeouts and states (tables):
> 
>               We have to wait 10 seconds for example in SR state.
>       When/if the real server replies with SYN+ACK we switch
>       to a new state SA (abbreviated from SYN+ACK). If the real
>       server doesn't use SYN cookie protection we don't see this
>       SYN+ACK and the entry is dropped after 10 seconds. So,
>       we expect SYN+ACK from the real server for 10 seconds. This
>       is our support for all kinds of OSes which don't
>       support SYN cookies, i.e. when they just ignore the extra
>       SYNs when their backlog is full. In fact, this is not a bad
>       mode for the real server if it is overloaded. But may be
>       the SYN cookie support is still preferred.
> 
>               The timeout for the new SA state can be 60 seconds,
>       same as the old SR state. Or 75? The rule here is that we
>       must stay in SA state until the ACK is received from the real
>       server to allow the transition to ES state. We can't trust the
>       client, so we can't switch to ES after its ACK. This is OK for
>       the most of the services.
> 
>       B> We start to drop SYN packets using rate and without passing
>       them to the real server.
> 
>               Yep, if the above protection doesn't work it is a
>       time to switch to a faster Director. Buy more RAM, to feed your
>       real servers. They accept more connections than the Director
>       can handle.
> 
> mode 2        -       This is same as mode 1 but when set from the user,
>       LVS can't return automatically to mode 0. Very useful when
>       the user thinks that he is permanently under attack or just
>       for debugging.
> 
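
The state handling proposed for mode 1 can be written as a small table, sketched here in Python. The SR and SA timeouts are the example values from the proposal above; the ES timeout and the function name are assumptions for illustration only.

```python
# Timeouts in seconds. SR and SA come from the proposal; ES is an assumed value.
TIMEOUTS = {"SR": 10, "SA": 60, "ES": 900}

# (current state, packet source, TCP flags) -> next state
TRANSITIONS = {
    ("SR", "server", "SYN+ACK"): "SA",  # real server answered the SYN
    ("SA", "server", "ACK"): "ES",      # only the real server's ACK is trusted
}


def next_state(state, source, flags):
    """Return (new_state, timeout); unknown events leave the state unchanged."""
    new_state = TRANSITIONS.get((state, source, flags), state)
    return new_state, TIMEOUTS[new_state]
```

An ACK from the client does not appear in the table, so it never moves an entry to ES.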


Yeah, I like your idea of ip_vs_defense_level; we can add more defense
strategies there and let users choose the one they like. ;)
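
A small Python sketch of how the three modes could switch, assuming free memory is the load signal. The update_mode() name and the threshold_mb parameter are hypothetical; the 10 MB default is only the example figure mentioned in the proposal.

```python
def update_mode(mode, free_mb, threshold_mb=10):
    """Return the defense mode to use after checking free memory.

    mode 0: no defense; mode 1: defense switched on automatically;
    mode 2: defense pinned on by the user.
    """
    if mode == 2:
        return 2                  # set by the user: never falls back to 0
    if free_mb < threshold_mb:
        return 1                  # dangerous area: enable the defenses
    return 0                      # system is not busy: back to default
```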

> 
> For the BUGS:
> 
> ip_fw_masquerade() incorrectly continues to send the packet after
> ip_route_output() has failed. This is a recent MASQ bug. We must
> return -1; and not use the default gateway with
> inet_select_addr(). We have to drop this packet; maybe the routing
> cache needs tuning, so don't try to send this packet.
> 

I agree. And, for the performance of VS/NAT, we probably need to
move the determination of maddr to where it is really needed.

Thanks,

Wensong



