Re: random SYN-drop function

To: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: random SYN-drop function
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Wensong Zhang <wensong@xxxxxxxxxxxx>
Date: Mon, 27 Mar 2000 09:59:46 +0800 (CST)

Hi Julian,

On Sun, 26 Mar 2000, Julian Anastasov wrote:

> 
> On Sun, 26 Mar 2000, Wensong Zhang wrote:
> 
> > >   It seems that we can't drop entries in SR state after passing
> > > the SYN packet to the real server. Currently, when the real server answers
> > > with SYN+ACK, ip_fw_masquerade() creates a new entry with ip_masq_new().
> > > Is that good?
> > >
> >
> > That's not good. I think it is simple to notify the real server that
> > the destination is not reachable and drop the packet. ;)
> 
>       That is easy. We can add a hash table for the destinations
> which will be used only by ip_fw_masquerade. We don't even need the
> struct ip_masq_dest.svc pointer. We can just check if this is our
> real service and return ICMP_PORT_UNREACH to the real server
> when the entry is removed.
> 

Yup.

> >
> > >   In fact, if we start to drop entries (any kind), we have
> > > to modify ip_fw_masquerade to check if the packet comes from our
> > > real server. Else, we start to create MASQ entries to the real
> > > servers with mport=MASQ_port (first free>61000) which is never used.
> > > I.e. we create zombie entries in SS or ES state. Maybe we can notify
> > > the real server here but only if we support unique real services under
> > > VS/NAT, i.e. one raddr/rport to be used from one virtual service.
> > > Maybe we can just drop packets coming from our real service.
> > >
> > >   Usually the SYN packets are retransmitted if not answered
> > > soon. This is a very bad situation for VS/DR and VS/TUN methods. If
> > > we drop SR entry for a real server after passing the packet, the next
> > > SYN is sent to another real server and the client is confused by
> > > two different SYN+ACK packets/cookies coming from two real servers.
> > >
> >
> > Yeah, it is possible, but it should rarely arise. Even if the problem
> > arises in some situations, I think the TCP implementation on client
> > machines should either be robust against this problem or just drop the
> > connection; it doesn't hurt the overall situation too much. ;)
> 
>       Currently, the Linux client will send the 2nd SYN packet
> 3 seconds after the 1st. So, the client waits 3 seconds after
> sending the initial SYN and before receiving the SYN+ACK from the
> LVS. I think it occurs very often. On slow links, we usually
> don't receive SYN+ACK in these 3 seconds.
> 

The probability that the first SYN packet is passed and its entry then
dropped, and that the 2nd SYN packet is passed as well, is low. Even for
clients that resend the SYN packet after 3 seconds on slow links, suppose
the probability that an entry is dropped in each once-per-second random scan
is 1/16. Then the probability that this happens is
(1/16 + (15/16)*(1/16) + (15/16)^2*(1/16))*15/16, which is less than
17%. ;-)
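
The arithmetic above can be checked with a short sketch. The function name and parameters are made up for illustration. Reading the trailing 15/16 factor as "the entry created by the retransmitted SYN survives one more scan" is an assumption, since the formula itself does not say what that factor stands for.

```python
from fractions import Fraction


def double_syn_probability(p_drop=Fraction(1, 16), scans_before_retry=3):
    """Evaluate the formula from the message with exact fractions.

    p_drop: chance that one random scan drops a given entry (1/16 above).
    scans_before_retry: scans that run before the client resends its SYN
    (3, for a once-per-second scan and a 3 second retransmit timer).
    """
    survive = 1 - p_drop
    # Entry is dropped in the 1st, 2nd or 3rd scan: p + s*p + s^2*p.
    dropped_before_retry = sum(
        survive**k * p_drop for k in range(scans_before_retry)
    )
    # Trailing 15/16 factor, kept exactly as written in the formula.
    return dropped_before_retry * survive


p = double_syn_probability()
print(p, float(p))
```

With the default arguments this evaluates to 10815/65536, roughly 16.5%, which agrees with the "less than 17%" bound.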

>       But we can solve this problem for LVS/NAT if we return
> ICMP_PORT_UNREACH in ip_fw_masquerade. In this case we are
> going to send the same number of ICMP replies to the real server
> as the number of dropped SYN flood requests when the real server
> supports SYN cookie protection. But the problem is not solved for
> the other 3 methods: DR, TUN, LOCAL. We will confuse the client.
> 
> >
> > >   Maybe the SYN packets must be dropped without passing them to
> > > the real server, i.e. by using a drop rate.
> > >
> >
> > If there was only a 1/rate drop before forwarding under the syn-flooding
> > attack, it might soon reach the drop all situation (the system is nearly
> > filled with entries), then no new clients might access the services until
> > the attack stops. However, the random drop entry method can periodically
> > drop some entries to get memory for new connections, so that most of users
> > can still access the service and some need to resend their requests because
> > their entries might be dropped. If there is only the random drop entry method,
> > the dropping speed might not keep up with the generation speed, especially
> > under the SYN following ACK attack. So, maybe we can combine the two
> > methods.
> 
>       OK, we can add many independent sysctls (in
> /proc/sys/net/ipv4/vs/ ?):
> 
> - ip_masq_ignore_requests
> 
>       - To drop MASQ/LVS TCP/UDP requests without passing the
>       requests to the internal host
> 
>               [I'll send soon an example implementation]
> 
> - ip_vs_randomdrop
> 
>       - To drop entries from the LVS table
> 
>               [Currently in LVS]
> 
> - ip_vs_secure_tcp
> 
>       - To switch to delayed TCP state transitions, i.e.
>       using many state tables/timeouts. This way we
>       can follow the real server's TCP flags.
> 
>               [I'm currently trying to attach/detach state
>               tables to the LVS entries using
>               ip_masq_timeout_attach/detach]
> 
>       Each sysctl var has to allow the working mode to be switched
> manually or automatically (0, 1 or 2?).
> 

OK, thanks!
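
To make the combination of the two dropping methods concrete, here is a toy model. It is not LVS code: the class, its methods and its parameters are all hypothetical, and "1/rate drop" is assumed to mean that every rate-th SYN is discarded before it is forwarded.

```python
import random


class ConnTable:
    """Toy connection table combining rate drop and random entry drop."""

    def __init__(self, capacity, drop_rate, rng=None):
        self.capacity = capacity    # stands in for available memory
        self.drop_rate = drop_rate  # drop every drop_rate-th SYN; 0 disables
        self.entries = set()
        self.syn_count = 0
        self.rng = rng or random.Random()

    def on_syn(self, key):
        """Return True if the SYN is forwarded and an entry is created."""
        self.syn_count += 1
        # Method 1: drop before forwarding, so no entry is ever created.
        if self.drop_rate and self.syn_count % self.drop_rate == 0:
            return False
        # Table full: the "drop all" situation described in the thread.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.add(key)
        return True

    def random_drop(self, p):
        """Method 2: periodic scan that drops each entry with probability p."""
        victims = [k for k in self.entries if self.rng.random() < p]
        self.entries.difference_update(victims)
        return len(victims)
```

In this model the rate drop only slows down how fast the table fills, while random_drop() is the only thing that frees room again, which is the argument for running both together.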

> >
> > >   For the resurrection of the entries. The only problem is that
> > > we don't know when to drop the ES entries. We are not sure if the
> > > real server will ACK soon. It is possible for the connection to freeze.
> > >
> > >   Currently, I see 3 working modes as useful for VS/NAT (for
> > > example, via ip_vs_defense_level):
> > >
> > > mode 0    -       default mode
> > >
> > >   No packets are dropped.
> > >   Under load we switch automatically to mode 1 and then back
> > >   to mode 0 when the system is not busy
> > >
> > > mode 1    -       we are in dangerous area
> > >
> > >   A> We start to ACK the connection setup
> > >
> > >           Maybe when there is less than 10MB left (or configured
> > >   by user)?
> > >
> > >           We have to use other timeouts and states (tables):
> > >
> > >           We have to wait 10 seconds for example in SR state.
> > >   When/if the real server replies with SYN+ACK we switch
> > >   to a new state SA (abbreviated from SYN+ACK). If the real
> > >   server doesn't use SYN cookie protection we don't see this
> > >   SYN+ACK and the entry is dropped after 10 seconds. So,
> > >   we expect SYN+ACK from the real server for 10 seconds. This
> > >   is our support for all kinds of OSes which don't
> > >   support SYN cookies, i.e. when they just ignore the extra
> > >   SYNs when their backlog is full. In fact, this is not a bad
> > >   mode for the real server if it is overloaded. But maybe
> > >   the SYN cookie support is still preferred.
> > >
> > >           The timeout for the new SA state can be 60 seconds,
> > >   same as the old SR state. Or 75? The rule here is that we
> > >   must stay in SA state until the ACK is received from the real
> > >   server to allow the transition to ES state. We can't trust the
> > >   client, so we can't switch to ES after its ACK. This is OK for
> > >   most of the services.
> > >
> > >   B> We start to drop SYN packets using rate and without passing
> > >   them to the real server.
> > >
> > >           Yep, if the above protection doesn't work it is
> > >   time to switch to a faster Director. Buy more RAM, to feed your
> > >   real servers. They accept more connections than the Director
> > >   can handle.
> > >
> > > mode 2    -       This is same as mode 1 but when set from the user,
> > >   LVS can't return automatically to mode 0. Very useful when
> > >   the user thinks that he is permanently under attack or just
> > >   for debugging.
> > >
> >
> >
> > Yeah, I like your idea of ip_vs_defence_level, where we can add more
> > defence strategies, and let users choose the one they like. ;)
> 
>       I like the idea of many independent sysctl vars.
> We have to choose correct names for the above sysctl vars.
> 

It is good to have many independent sysctl vars. However, it might be good
to have an ip_vs_defence_level sysctl whose value can be 0,1,2,3,4...,
where each level means a different defence strategy, so that there is only
one sysctl for this. ;)
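
One way to reconcile a single level with the independent switches is to let each level select a set of them. The sketch below is only an illustration: the flag names follow the three sysctls proposed in the quoted text, but the mapping from level to strategies is invented here and was not agreed anywhere in the thread.

```python
from enum import Flag


class Defence(Flag):
    """Independent defence switches, one bit each."""
    NONE = 0
    IGNORE_REQUESTS = 1  # cf. ip_masq_ignore_requests
    RANDOM_DROP = 2      # cf. ip_vs_randomdrop
    SECURE_TCP = 4       # cf. ip_vs_secure_tcp


# Hypothetical mapping from a single defence level to a set of switches.
LEVELS = {
    0: Defence.NONE,
    1: Defence.RANDOM_DROP,
    2: Defence.RANDOM_DROP | Defence.IGNORE_REQUESTS,
    3: Defence.RANDOM_DROP | Defence.IGNORE_REQUESTS | Defence.SECURE_TCP,
}


def strategies_for(level):
    """Return the switches enabled at the given defence level."""
    if level not in LEVELS:
        raise ValueError(f"unknown defence level: {level}")
    return LEVELS[level]
```

A single level is simpler for the user to set, but it can only express the combinations listed in the table, whereas independent variables allow any combination.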

> >
> > >
> > > For the BUGS:
> > >
> > > ip_fw_masquerade() incorrectly continues to send the packet after
> > > ip_route_output() has failed. This is a recent MASQ bug. We must
> > > return -1; and not use the default gateway with
> > > inet_select_addr(). We have to drop this packet; maybe the routing
> > > cache needs tuning, so don't try to send this packet.
> > >
> >
> > I agree. And, for the performance of VS/NAT, we probably need to
> > move the determination of maddr to where it is really needed.
> 
>       Yes, under DoS attack when we drop entries after passing the
> TCP/UDP requests to the real servers, we can move maddr selection
> for TCP/UDP after checking whether we need to send ICMP_PORT_UNREACH
> back to the real server or to pass the outgoing packet to the client.
> icmp_send() has its own route decisions.
> 
> 
>       For the UDP entries. Is the checking in ip_vs_random_drop()
> for IP_MASQ_S_NONE correct? Isn't the state IP_MASQ_S_UDP? Maybe
> I'm missing something? Maybe it is better to control the UDP
> entries by using ipchains and not to drop entries from the table?
> We still can implement dropping UDP entries without passing them.
> 

That is my mistake; the state should be IP_MASQ_S_UDP. But I cannot see any
reason why we cannot drop UDP entries: UDP itself is unreliable and
connectionless, and UDP packets can be lost, duplicated, or arrive out of
order in transit.

Thanks,

Wensong


