Hi Julian,
Sorry for the delay; I have been busy with other things these past few days.
On Sun, 19 Mar 2000, Julian Anastasov wrote:
>
> Yes, maybe it is better to pass the SYN packets and to
> remove the entries when there is not enough memory. This is the better
> way when the real servers are protected with SYN-cookies mechanism
> and even if they are not protected.
>
> I have attached a patch (not compiled) with some proposals
> but there is more that must be done.
>
> We must consider these things:
>
> A>>> Entries not in the tables
>
> Currently, we can break valid connections which are in
> handshake, this is always true when the real servers support
> SYN-cookies. Once the entry is deleted we are going to send
> RST (ICMP_PORT_UNREACH) frames originated from the Director
> because currently, LVS delivers locally all packets which are
> not registered in the table. This is a _BUG_ which must be
> corrected. I.e. in ip_fw_demasquerade, just before the happy
> statement:
>
> /* sorry, all this trouble for a no-hit :) */
>
> we must add:
>
> #ifdef CONFIG_IP_MASQUERADE_VS
> if (svc) {
> return -1;
> }
> #endif
>
> This change is in the attached patch.
>
> I.e. we must drop this packet. This can be an ACK from the client
> but for an entry just removed by sysctl_ip_vs_randomdrop.
> The rule here must be: "If a packet is for a LVS service it
> must be answered or dropped. Not delivered locally!". We
Agreed: "If a packet is for an LVS service it must be answered or dropped.
Not delivered locally!" We will fix it.
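
A minimal sketch of that rule as a standalone C function, just to pin
down the intended behaviour. This is not the actual patch: the enum and
the function name are made up for illustration, and only the -1 ("drop")
return value is taken from your snippet above.

```c
#include <stdbool.h>

/* Hypothetical verdicts; only -1 ("drop") mirrors the quoted patch. */
enum verdict {
    VERDICT_DROP    = -1,  /* discard the packet */
    VERDICT_LOCAL   = 0,   /* no-hit: leave it to the local stack */
    VERDICT_FORWARD = 1    /* entry found: pass it on to the real server */
};

/*
 * has_entry: a connection entry exists for this packet
 * is_vs_svc: the destination matches a configured virtual service
 */
static enum verdict vs_no_hit_verdict(bool has_entry, bool is_vs_svc)
{
    if (has_entry)
        return VERDICT_FORWARD;
    if (is_vs_svc)
        return VERDICT_DROP;   /* never deliver LVS traffic locally */
    return VERDICT_LOCAL;
}
```

So a client ACK whose entry was just removed gets VERDICT_DROP instead of
a reset generated by the Director.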
> have to wait the packet from the real server to create the
> entry in the proper state (point B). Else, we are going to
> reset this connection which is very bad. Let's give the real
> server a chance (point B) to resurrect the entry.
>
> It is very bad if we remove entries not in VS/NAT mode,
> these connections are lost.
>
> B>>> Resurrecting entries: the new dream for the VS/NAT mode
>
> For this, ip_fw_masquerade() must be patched
> too (I didn't implement it before we decide what to do).
> LVS must resurrect the entry just like the MASQ, i.e.
> ip_masq_new_vs() must be called with all consequences:
> templates, etc. If we drop entries for which we are not sure
> about the state (maybe we drop entries in ES state in the real
> server) we must have a way to resurrect them. I.e. the
> OUTPUT(NO->ES via ACK). We rely on the real servers state and
> if we drop entries in SYN_RECV state we know that the possible
> state is ES. When we see ACK from the real server we can assume
> that it is normal data ACK (not ACK for FIN). I.e. if we don't
> drop entries in FW/TW state we know that the ACK from the real
> server is about the established state. So, we can create the
> entry in ES state (just like the MASQ). The OUTPUT table for NO
> state is correct: we resurrect the entries in SS, TW, ES or CL
> state according to the flags. We must implement it for LVS/NAT.
>
We cannot resurrect entries for LVS/NAT, because packets from the real
servers do not carry enough information: we don't know which virtual
service the packets belong to (the IP address and port number of the
virtual service).
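
One way to see the problem, assuming the same real server address and
port can sit behind more than one virtual service. The structure and
function below are hypothetical and much simpler than the real IPVS
data structures; they only illustrate the reverse lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping: one virtual service -> one real server. */
struct vs_dest {
    uint32_t vaddr; uint16_t vport;   /* virtual service */
    uint32_t raddr; uint16_t rport;   /* real server behind it */
};

/* Count the virtual services a packet from raddr:rport could belong to. */
static size_t vs_candidates(const struct vs_dest *tbl, size_t n,
                            uint32_t raddr, uint16_t rport)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (tbl[i].raddr == raddr && tbl[i].rport == rport)
            hits++;
    return hits;
}
```

Whenever this returns more than 1 for the source of an outgoing packet,
the packet alone does not tell us which virtual address and port to put
into the resurrected entry.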
> What is the consequence if we can resurrect the entries:
> we can drop even entries in established state. But when?
> Only when we expect reply from the real server from which we
> create the new entry. Currently, we rely on real servers ACK
> just after the handshake. This is true when the traffic is
> started just after the connection (most of the services:
> http, ftp, etc.). I.e. we think we drop entries in SYN_RECV
> state but it can be established. In this case we rely on the
> first ACK from the real server. Why not on the next ACKs too?
> So, we can drop entries in ES state too.
>
We need to fix the INPUT table too, so that it switches from the SYN_RECV
state to EST on receiving an ACK from the client.
> So, under siege, LVS can drop entries from the table
> after SYN and/or ACK packets coming from the client. In this
> case we expect real server to reply with ACK-only. The rule is
> that if we expect answer from the real server we can drop some
> frames from the client and to rely on the retransmission. So,
> when we receive SYN, SYN+ACK or ACK frame from the client we
> forward it to the real server and drop the entry. Yes, this is
> bad but we are under attack. If the real server ACKs we resurrect
> the entry, so nothing is lost. The drawback is that the ACK
> can be delayed (0.5 seconds?). We have to check what are the
> consequences for the persistent connections in this case too.
>
> In the demasq direction we still create entries only on
> SYN packets.
>
> C>>> Handshake ACK
>
> In 0.9.9 we switch to ES state when the first ACK
> from the real server is received. This mechanism can break
> some services (I don't know such services) which rely on the
> first packet from the client, i.e. if the protocol requires
> the first packet to be transferred from the client. In this
> case the real server is in established state (just after the
> handshake), client is in established state too, his first
> packet is delayed but LVS removes this entry because it is
> in SR state, i.e. there is no first ACK from the server.
>
> Once the entry is removed we can receive SYN+ACK
> from the real server. In this case we just switch to SS
> state which is unexpected for LVS. We have to treat SR and
> SS states as same: we have to wait for ACK from the real
> server before changing the state to ES. I patched the SS
> state too. If not patched the transfer can go in SS state,
> i.e. with less timeout.
>
> The question is if we need this ACK every time. Maybe
> we can use this change only under attack. I.e. we can switch
> to mode when the ACK from the real server is required for
> changing the state from SR or SS to ES. In normal mode we
> can just switch to ES state (same as before 0.9.9). But if
> we can resurrect the entries we can drop entries from the
> table not just after the initial SYN from the client but
> after each ACK too. We wait for an answer from the real server.
> So, maybe we can enable the SR change (added in 0.9.9) and
> the change in the SS state (in the attached patch) only
> when the Director is under attack. Or to restore the table
> as before 0.9.9 and to rely on resurrection. We are no
> more afraid to enter ES state when we can drop entries in
> this state. Good, Yeah :)
>
> If there is no memory to resurrect the entry we drop this
> packet from the real server.
>
> For example (VS/NAT 0.9.9):
>
>    CLIENT            SERVER    SERVER STATE   NEW MASQ STATE
>
> 1. SYN      ------>                           => SR
> 2.          <------  SYN+ACK   SYN_RECV       => SR
> 3. ACK      ------>            EST            => SR
I think that we need to fix the INPUT table here.
> 4. data+ACK ------>            EST            => SR
> 5.          <------  ACK       EST            => ES
>
> We can drop even established connections if we pass
> the initial SYN (1) because we drop everything in SR state.
> This is possible even for real server without SYN cookies
> support. We don't know which initial SYNs are dropped from
> the real server if its backlog queue is full. Even if the
> initial SYN is accepted from the real server we can drop
> the entry before receiving the SYN+ACK answer from the real
> server. So, there is no difference if we switch to ES state
> in point 2 or in point 5. In all cases if we pass the
> initial SYN it is possible the entry to be deleted while
> the connection is established in both ends.
>
> I want to say that the OUTPUT table change in 0.9.9
> is dangerous. We give 60 seconds for the handshake, for
> the first data to be received from the client and ACK-ed
> from the real server.
>
> Why not just to restore the old behavior to enter ES
> from SR and SS states after the clients SYN or SYN+ACK. When
> we have entry resurrection this is not a problem.
>
> I think, there is another problem. The masquerade can
> be fooled to drop connections even if these connections are
> not sniffed. Maybe LVS needs to forget about the current
> INPUT and OUTPUT tables which are for non-trusted environments
> and to build new tables, i.e. where we trust the real
> servers. We must consider the real servers as trusted hosts
> and to change the states based on the packets from the real
> servers. I.e. maybe we have to ignore the RST and FIN packets
> coming from the client and to rely on the real servers response.
> But this is another issue.
>
> > Maybe we can combine these two methods in the system too. ;-)
> >
>
> We must decide how to combine all these techniques:
>
> - drop frames before passing them or after passing them
>
> - change the state table under attack and to rely on the entry
> resurrection from the outgoing packet (from the real server),
> i.e. when the entries are dropped from the table (after passing
> them). Warning: we need to drop entries until there is enough
> memory (a free limit is reached). The resurrection on each
> outgoing packet is very expensive.
>
> - do we need always to ACK the connection from the real servers
> or we can mark it as ES just after SYN/SYN+ACK from the real
> server (as before 0.9.9) and to rely on the resurrection. In this
> case we don't care if the client tries to flood us. We just
> follow the real server's state.
>
>
>
> The current variants for the user are:
>
> 1> Modify the timeouts with ipchains -M -S
>
> EST/FIN states
>
> with timeout 1 minute for the established state we
> reach the same defense level as the entry removal.
> This means that after that point we have to drop frames
> without passing them to the real server.
>
> 2> Any other variants?
>
> 3> The entry resurrection. Is it only a dream?
>
> So, after 0.9.9 we can:
>
> - restore the OUTPUT(SR->ES) state (as in the mainstream kernel)
> - drop all entries if we can't find entry for the client's packet
> - implement entry resurrection
>
I think we had better verify the whole INPUT and OUTPUT state transition
tables for both IP Masquerading and IPVS.

/*        INPUT */
/*        mNO, mES, mSS, mSR, mFW, mTW, mCL, mCW, mLA, mLI */
/*syn*/ {{mSR, mES, mES, mSR, mSR, mSR, mSR, mSR, mSR, mSR }},
/*fin*/ {{mCL, mCW, mSS, mTW, mTW, mTW, mCL, mCW, mLA, mLI }},
/*ack*/ {{mCL, mES, mSS, mES, mFW, mTW, mCL, mCW, mCL, mLI }},
/*rst*/ {{mCL, mCL, mCL, mSR, mCL, mCL, mCL, mCL, mLA, mLI }},

/*        OUTPUT */
/*        mNO, mES, mSS, mSR, mFW, mTW, mCL, mCW, mLA, mLI */
/*syn*/ {{mSS, mES, mSS, mSR, mSS, mSS, mSS, mSS, mSS, mLI }},
/*fin*/ {{mTW, mFW, mSS, mTW, mFW, mTW, mCL, mTW, mLA, mLI }},
/*ack*/ {{mES, mES, mSS, mES, mFW, mTW, mCL, mCW, mLA, mES }},
/*rst*/ {{mCL, mCL, mSS, mCL, mCL, mTW, mCL, mCL, mCL, mCL }},

Are they right now?
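
To make checking easier, here is a small standalone sketch that holds the
two tables above and steps a connection through them. The enum, array and
function names are mine, not the kernel's. I am also assuming the flags
are classified with RST checked first, then SYN, FIN and ACK, so a
SYN+ACK counts as "syn"; please correct me if the real code differs.

```c
#include <assert.h>

enum masq_state { mNO, mES, mSS, mSR, mFW, mTW, mCL, mCW, mLA, mLI, mMAX };
enum tcp_event  { EV_SYN, EV_FIN, EV_ACK, EV_RST, EV_MAX };
enum direction  { DIR_INPUT, DIR_OUTPUT, DIR_MAX };

/* Rows are events, columns are the current state (mNO ... mLI). */
static const enum masq_state next_tbl[DIR_MAX][EV_MAX][mMAX] = {
    [DIR_INPUT] = {
        /*syn*/ { mSR, mES, mES, mSR, mSR, mSR, mSR, mSR, mSR, mSR },
        /*fin*/ { mCL, mCW, mSS, mTW, mTW, mTW, mCL, mCW, mLA, mLI },
        /*ack*/ { mCL, mES, mSS, mES, mFW, mTW, mCL, mCW, mCL, mLI },
        /*rst*/ { mCL, mCL, mCL, mSR, mCL, mCL, mCL, mCL, mLA, mLI },
    },
    [DIR_OUTPUT] = {
        /*syn*/ { mSS, mES, mSS, mSR, mSS, mSS, mSS, mSS, mSS, mLI },
        /*fin*/ { mTW, mFW, mSS, mTW, mFW, mTW, mCL, mTW, mLA, mLI },
        /*ack*/ { mES, mES, mSS, mES, mFW, mTW, mCL, mCW, mLA, mES },
        /*rst*/ { mCL, mCL, mSS, mCL, mCL, mTW, mCL, mCL, mCL, mCL },
    },
};

/* Assumed flag priority: RST, SYN, FIN, ACK. Returns -1 if no flag is set. */
static int classify(int syn, int fin, int ack, int rst)
{
    if (rst) return EV_RST;
    if (syn) return EV_SYN;
    if (fin) return EV_FIN;
    if (ack) return EV_ACK;
    return -1;
}

/* Look up the next state for one packet in the given direction. */
static enum masq_state next_state(enum direction d, enum tcp_event e,
                                  enum masq_state s)
{
    assert(d < DIR_MAX && e < EV_MAX && s < mMAX);
    return next_tbl[d][e][s];
}
```

With these tables the handshake from your example goes NO -> SR on the
client's SYN, stays in SR on the server's SYN+ACK, and moves SR -> ES on
the client's ACK, so step 3 already reaches ES without waiting for the
first ACK from the real server.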
Thanks,
Wensong