Hi Julian,
On Sun, 26 Mar 2000, Julian Anastasov wrote:
>
> On Sun, 26 Mar 2000, Wensong Zhang wrote:
>
> > > It seems that we can't drop entries in SR state after passing
> > > the SYN packet to the real server. Currently, when the real server answers
> > > with SYN+ACK, ip_fw_masquerade() creates a new entry with ip_masq_new().
> > > Is that good?
> > >
> >
> > That's not good. I think it is simple to notify the real server that
> > the destination is not reachable and drop the packet. ;)
>
> That is easy. We can add a hash table for the destinations
> which will be used only by ip_fw_masquerade. We don't even need the
> struct ip_masq_dest.svc pointer. We can just check if this is our
> real service and return ICMP_PORT_UNREACH to the real server
> when the entry is removed.
>
Yup.
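
To make the lookup concrete, here is a minimal sketch in Python (not kernel code). The names real_services, add_real_service and action_for_unmatched_packet are hypothetical and only illustrate the idea: when an outgoing packet matches no entry, check whether its source is one of our real services and, if so, answer with ICMP_PORT_UNREACH instead of creating a new MASQ entry.

```python
# Hash table (a Python set) of real services, keyed by (protocol, raddr, rport).
# All names here are illustrative, not taken from the LVS source.
real_services = set()

def add_real_service(protocol, raddr, rport):
    """Register a real service so that its replies can be recognised."""
    real_services.add((protocol, raddr, rport))

def action_for_unmatched_packet(protocol, saddr, sport):
    """Decide what to do with an outgoing packet that matches no entry."""
    if (protocol, saddr, sport) in real_services:
        # The entry was removed: tell the real server instead of
        # creating a zombie MASQ entry.
        return "ICMP_PORT_UNREACH"
    # Not one of our real services: ordinary masquerading applies.
    return "MASQUERADE"

add_real_service("tcp", "10.0.0.2", 80)
print(action_for_unmatched_packet("tcp", "10.0.0.2", 80))
print(action_for_unmatched_packet("tcp", "10.0.0.7", 1025))
```

The set stands in for the hash table; no pointer back to the virtual service is needed for this check.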
> >
> > > In fact, if we start to drop entries (any kind), we have
> > > to modify ip_fw_masquerade to check if the packet comes from our
> > > real server. Otherwise, we start to create MASQ entries to the real
> > > servers with mport=MASQ_port (first free>61000) which is never used.
> > > I.e. we create zombie entries in SS or ES state. Maybe we can notify
> > > the real server here but only if we support unique real services under
> > > VS/NAT, i.e. one raddr/rport to be used from one virtual service.
> > > Maybe we can just drop packets coming from our real service.
> > >
> > > Usually the SYN packets are retransmitted if not answered
> > > soon. This is a very bad situation for VS/DR and VS/TUN methods. If
> > > we drop the SR entry for a real server after passing the packet, the next
> > > SYN is sent to another real server and the client is confused by
> > > two different SYN+ACK packets/cookies coming from two real servers.
> > >
> >
> > Yeah, it is possible, but it should rarely arise. Even if the problem
> > arises in some situations, I think the TCP implementation on client machines
> > should be robust against this problem, or just drop the connection;
> > it doesn't hurt the whole situation too much. ;)
>
> Currently, the Linux client will send the 2nd SYN packet
> 3 seconds after the 1st. So, the client waits 3 seconds after
> sending the initial SYN and before receiving the SYN+ACK from the
> LVS. I think it occurs very often. On slow links, we usually
> don't receive SYN+ACK in these 3 seconds.
>
The probability that the first SYN packet is passed, its entry is then
dropped, and the 2nd SYN packet is passed as well is low. Even for clients
that resend the SYN packet after 3 seconds on slow links, suppose the
probability that an entry is dropped in each per-second random scan is 1/16.
Then the probability that it happens is
(1/16 + (15/16)*(1/16) + (15/16)^2*(1/16))*15/16, which is less than 17%. ;-)
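
As a quick check of the arithmetic, here is a small Python sketch that evaluates the same expression exactly. The function name and its parameters are only illustrative; the 1/16 per-scan drop probability and the three scans in the 3-second retransmit window are the assumptions stated above.

```python
from fractions import Fraction

def second_syn_mismatch_probability(p_drop=Fraction(1, 16), scans=3):
    """Evaluate (p + q*p + ... + q^(scans-1)*p) * q with q = 1 - p.

    The sum is the chance that the entry is dropped in one of the
    per-second random scans before the client resends its SYN; the
    trailing factor q is the final 15/16 term of the expression.
    """
    q = 1 - p_drop
    dropped_within_window = sum(q**k * p_drop for k in range(scans))
    return dropped_within_window * q

prob = second_syn_mismatch_probability()
print(prob, float(prob))
```

The exact value is 10815/65536, roughly 16.5%, which agrees with the "less than 17%" bound.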
> But we can solve this problem for LVS/NAT if we return
> ICMP_PORT_UNREACH in ip_fw_masquerade. In this case we are
> going to send the same number of ICMP replies to the real server
> as the number of dropped SYN flood requests when the real server
> supports SYN cookie protection. But the problem is not solved for
> the other 3 methods: DR, TUN, LOCAL. We will confuse the client.
>
> >
> > > Maybe the SYN packets must be dropped without passing them to
> > > the real server, i.e. by using a drop rate.
> > >
> >
> > If there was only a 1/rate drop before forwarding under the syn-flooding
> > attack, it might soon reach the drop all situation (the system is nearly
> > filled with entries), then no new clients might access the services until
> > the attack stops. However, the random drop entry method can periodically
> > drop some entries to get memory for new connections, so that most users
> > can still access the service and some need to resend their request because
> > their entries might be dropped. If there is only the random drop entry method,
> > the dropping speed might not keep up with the generation speed, especially
> > under the SYN following ACK attack. So, maybe we can combine the two
> > methods.
>
> OK, we can add many independent sysctls (in
> /proc/sys/net/ipv4/vs/ ?):
>
> - ip_masq_ignore_requests
>
> - To drop MASQ/LVS TCP/UDP requests without passing the
> requests to the internal host
>
> [I'll send soon an example implementation]
>
> - ip_vs_randomdrop
>
> - To drop entries from the LVS table
>
> [Currently in LVS]
>
> - ip_vs_secure_tcp
>
> - To switch to delayed TCP state transitions, i.e.
> using many state tables/timeouts. This way we
> can follow the real servers' TCP flags.
>
> [I'm currently trying to attach/detach state
> tables to the LVS entries using
> ip_masq_timeout_attach/detach]
>
> Each sysctl var has to allow the working mode (0, 1 or 2?) to be
> switched manually or automatically.
>
OK, thanks!
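
To illustrate the first of these (dropping requests by rate without passing them to the internal host), here is a minimal Python sketch of a 1-in-rate dropper. The class name RateDropper and its interface are hypothetical; this is only one possible reading of the "1/rate drop", not the implementation you are going to send.

```python
class RateDropper:
    """Drop one request out of every `rate` requests (hypothetical helper)."""

    def __init__(self, rate):
        if rate < 1:
            raise ValueError("rate must be >= 1")
        self.rate = rate
        self.counter = 0

    def should_drop(self):
        # Count requests and drop the one that completes each group of `rate`.
        self.counter += 1
        if self.counter >= self.rate:
            self.counter = 0
            return True
        return False

dropper = RateDropper(rate=4)
decisions = [dropper.should_drop() for _ in range(8)]
print(decisions)
```

With rate=4, every fourth request is dropped before it reaches the real server; rate=1 would be the "drop all" situation.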
> >
> > > For the resurrection of the entries. The only problem is that
> > > we don't know when to drop the ES entries. We are not sure if the
> > > real server will ACK soon. It is possible for the connection to freeze.
> > >
> > > Currently, I see 3 working modes as useful for VS/NAT (for
> > > example, via ip_vs_defense_level):
> > >
> > > mode 0 - default mode
> > >
> > > No packets are dropped.
> > > Under load we switch automatically to mode 1 and then back
> > > to mode 0 when the system is not busy
> > >
> > > mode 1 - we are in dangerous area
> > >
> > > A> We start to ACK the connection setup
> > >
> > > Maybe when there is less than 10MB left (or configured
> > > by user)?
> > >
> > > We have to use other timeouts and states (tables):
> > >
> > > We have to wait 10 seconds for example in SR state.
> > > When/if the real server replies with SYN+ACK we switch
> > > to a new state SA (abbreviated from SYN+ACK). If the real
> > > server doesn't use SYN cookie protection we don't see this
> > > SYN+ACK and the entry is dropped after 10 seconds. So,
> > > we expect SYN+ACK from the real server for 10 seconds. This
> > > is our support for all kinds of OSes which don't
> > > support SYN cookies, i.e. when they just ignore the extra
> > > SYNs when their backlog is full. In fact, this is not a bad
> > > mode for the real server if it is overloaded. But maybe
> > > the SYN cookie support is still preferred.
> > >
> > > The timeout for the new SA state can be 60 seconds,
> > > same as the old SR state. Or 75? The rule here is that we
> > > must stay in SA state until the ACK is received from the real
> > > server to allow the transition to ES state. We can't trust the
> > > client, so we can't switch to ES after its ACK. This is OK for
> > > most of the services.
> > >
> > > B> We start to drop SYN packets using rate and without passing
> > > them to the real server.
> > >
> > > Yep, if the above protection doesn't work it is
> > > time to switch to a faster Director. Buy more RAM, to feed your
> > > real servers. They accept more connections than the Director
> > > can handle.
> > >
> > > mode 2 - This is same as mode 1 but when set from the user,
> > > LVS can't return automatically to mode 0. Very useful when
> > > the user thinks that he is permanently under attack or just
> > > for debugging.
> > >
> >
> >
> > Yeah, I like your idea of ip_vs_defence_level, where we can add more
> > defence strategies there, and let users choose the one they like. ;)
>
> I like the idea of many independent sysctl vars.
> We have to choose correct names for the above sysctl vars.
>
It is good to have many independent sysctl vars. However, it might be good
to have an ip_vs_defence_level sysctl, whose value can be 0,1,2,3,4...,
where different levels mean different defence strategies, so that there is
only one sysctl for this. ;)
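
One way to read the level idea, using only the three modes you described (0, 1, 2), is the small Python sketch below. The function names and the periodic-check model are hypothetical, and the 10MB threshold is just the example value from your mail; further levels for other strategies could be added in the same way.

```python
LOW_MEMORY_BYTES = 10 * 1024 * 1024  # the "less than 10MB left" example value

def next_defence_level(level, free_bytes, threshold=LOW_MEMORY_BYTES):
    """Return the level to use after one periodic check (hypothetical helper).

    0: no packets dropped; raised to 1 automatically under load
    1: defensive mode; lowered to 0 again when the system is not busy
    2: same behaviour as 1, but set by the user and never lowered automatically
    """
    if level == 2:
        return 2
    return 1 if free_bytes < threshold else 0

def is_defending(level):
    # Levels 1 and 2 behave identically for packet handling.
    return level >= 1

print(next_defence_level(0, 5 * 1024 * 1024))
print(next_defence_level(1, 50 * 1024 * 1024))
print(next_defence_level(2, 50 * 1024 * 1024))
```

The only difference between levels 1 and 2 in this sketch is whether the automatic fallback to level 0 is allowed.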
> >
> > >
> > > For the BUGS:
> > >
> > > ip_fw_masquerade() incorrectly continues to send the packet after
> > > ip_route_output() has failed. This is a recent MASQ bug. We must
> > > return -1; and not use the default gateway with
> > > inet_select_addr(). We have to drop this packet; maybe the routing
> > > cache needs tuning, so don't try to send this packet.
> > >
> >
> > I agree. And, for the performance of VS/NAT, we probably need to
> > move the determination of maddr to where it is really needed.
>
> Yes, under DoS attack when we drop entries after passing the
> TCP/UDP requests to the real servers, we can move maddr selection
> for TCP/UDP after checking whether we need to send ICMP_PORT_UNREACH
> back to the real server or to pass the outgoing packet to the client.
> icmp_send() has its own route decisions.
>
>
> For the UDP entries. Is the checking in ip_vs_random_drop()
> for IP_MASQ_S_NONE correct? Isn't the state IP_MASQ_S_UDP? Maybe
> I'm missing something? Maybe it is better to control the UDP
> entries by using ipchains and not to drop entries from the table?
> We can still implement dropping UDP entries without passing them.
>
It is my mistake, the state should be IP_MASQ_S_UDP. But I cannot see any
reason why we cannot drop UDP entries: UDP itself is unreliable and
connectionless, and UDP packets can be lost, duplicated, or delivered out of
order in transit.
Thanks,
Wensong