On Mon, 19 Jun 2000, Wensong Zhang wrote:
> > Yes, in 2.2 the packets must be restored, which is
> > not good practice, but this can hurt only masq applications
> > which change data (which may not be fatal, I'm not sure).
> It depends. We have to make sure that ip_fw_unmasq_icmp doesn't restore
> packets that don't need to be restored. Otherwise, it might hurt some
> other programs.
Yes, this is already ok.
> > > Yeah, I agree with you. Netfilter probably has problems in calling
> > > icmp_send for already-mangled packets. I think it is necessary to
> > > restore the IP header of the packet before calling icmp_send.
> > I think with Netfilter many things look well
> > structured, but I'm not sure about the ICMP_FRAG_NEEDED
> > generation. Packet restoring must be avoided, if
> > possible, when it involves changes in the protocol data, i.e.
> > it is not very good for masq apps. I assume this is not planned
> > in the 2.3 world without masq apps.
> Yup, restoring packets is difficult, especially restoring data (the
> first 64 data bits of the datagram). Packet restoring should be
> avoided as far as possible, but I am not sure that all the errors can
> be detected before mangling packets.
For now only the PMTU raises this problem.
> I see that mangling packets will introduce many other problems, I now
> like the VS/DR and VS/TUN more. :)
> > I think we have to answer these questions for 2.3:
> > - should we use header restoring or not? It must be planned
> > together with the masq apps support, if any, i.e. whether the data
> > will be changed too. It is very difficult to restore data.
> > For the header it is easy.
> > - how can we call each hook from PRE_ROUTING to revert its
> > header or data changes if each such hook returns NF_ACCEPT
> > instead of NF_STOLEN? It is not possible.
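A user-space sketch of why restoring the header is easy (put maddr back and recompute the checksum) while restoring changed data is not (it would need a saved copy of the original payload). The struct layout, field names and helpers here are illustrative, not the kernel's real struct iphdr or masquerading code:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative header with only the fields that matter here; the real
 * struct iphdr has more fields and uses network byte order. The
 * explicit _pad keeps the checksum loop over fully defined bytes. */
struct iph {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t check;
    uint16_t _pad;
};

/* 16-bit one's-complement sum over the header with check zeroed,
 * the same scheme the IP checksum uses. */
static uint16_t ip_csum(struct iph h)
{
    h.check = 0;
    const uint16_t *p = (const uint16_t *)&h;
    uint32_t sum = 0;
    for (size_t i = 0; i < sizeof(h) / 2; i++)
        sum += p[i];
    while (sum >> 16)                      /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Restoring the header: put the masq address back, recompute check.
 * Restoring mangled payload data has no such cheap equivalent. */
static void restore_header(struct iph *h, uint32_t maddr)
{
    h->daddr = maddr;
    h->check = ip_csum(*h);
}
```

The point is only that the header restore is two assignments, while the 64 payload bits embedded in an ICMP error cannot be recomputed once a masq app has rewritten them.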
> I think we had better try to avoid restoring packets if possible,
> because if the modified data is in the first 64 data bits and we cannot
> restore it to the original data, the icmp message still cannot notify
> the client correctly.
In fact, our address is ignored when sending the packet
to the client. With a little magic icmp_send delivers the packet
back to the client, but with the wrong encapsulated address. The result:
the client may not be able to determine the related connection. We send
raddr instead of maddr, sometimes 192.168.X.Y :)
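To illustrate why the wrong encapsulated address breaks things: a client demultiplexes an ICMP error by the original IP header plus first 64 data bits (the ports, for TCP/UDP) embedded in the error payload. A minimal user-space sketch, with illustrative struct and function names rather than any real kernel or libc interfaces:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout of the data an ICMP error carries back: the
 * original IP addresses plus the first 8 payload bytes, which for
 * TCP/UDP cover the source and destination ports. */
struct embedded_hdr {
    uint32_t saddr;     /* original source address */
    uint32_t daddr;     /* original destination address */
    uint16_t sport;     /* from the first 64 data bits */
    uint16_t dport;
};

/* A client matches an incoming ICMP error to a connection by
 * comparing the embedded header against its own socket state. */
static int client_matches(const struct embedded_hdr *e,
                          uint32_t local, uint16_t lport,
                          uint32_t peer, uint16_t pport)
{
    return e->saddr == local && e->sport == lport &&
           e->daddr == peer  && e->dport == pport;
}
```

If the embedded daddr is raddr (often a private 192.168.X.Y address) instead of the maddr the client actually connected to, `client_matches` fails and the error is dropped on the floor.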
> If we cannot avoid restoring packets, the only method seems to be to
> duplicate the original packet before the mangled packet is sent out
> correctly. But it will introduce too much overhead.
> > The result:
> > - don't try to restore header from icmp_send
> > - if something is changed the hook must return NF_STOLEN
> > and process the packet: routing, mtu check, mangling and
> > forwarding
> > - return ICMP_FRAG_NEEDED before mangling. Here is the
> > problem (not for LVS): we must know the output device MTU
> > before mangling. But the kernel calls ip_forward() when
> > the packets are ready to send, i.e. after mangling, without a
> > way to restore them.
> > At least, these thoughts don't correspond with the
> > current packet filter hooks and the packet forwarding.
> > But maybe I'm missing something. If the above is
> > correct, the "ext_mtu > int_mtu" problem can break any
> > design. LVS has to do the steps ignoring the current kernel
> > structure. This will improve the speed, though. The other
> > way is just not to solve this problem. This can be bad for
> > some guys with external gigabits and many internal megabits.
> > Is that true?
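The decision proposed above (do the MTU check before mangling, so ICMP_FRAG_NEEDED can still carry the original header; otherwise steal the packet and handle routing, mangling and forwarding ourselves) can be sketched in user space like this. The verdict constants only mirror the kernel's netfilter names, and `lvs_nat_verdict` is a made-up helper, not a real LVS function:

```c
#include <assert.h>

/* Stand-ins for the netfilter verdicts; values are arbitrary here. */
enum { NF_ACCEPT = 1, NF_STOLEN = 2 };
/* Local marker meaning: send ICMP_FRAG_NEEDED on the ORIGINAL packet. */
enum { VERDICT_ICMP_FRAG_NEEDED = -1 };

/* Decide what an LVS/NAT hook should do with a packet of pkt_len
 * bytes headed for a device with MTU out_mtu, when the IP header
 * has the DF bit set.  The MTU check happens BEFORE mangling, so
 * icmp_send would still embed the original (unmangled) header. */
static int lvs_nat_verdict(int pkt_len, int out_mtu, int df_set)
{
    if (df_set && pkt_len > out_mtu)
        return VERDICT_ICMP_FRAG_NEEDED;
    /* Otherwise take the packet over: route, mangle, forward. */
    return NF_STOLEN;
}
```

Once the hook returns NF_STOLEN there is nothing for a later hook to revert, which is exactly why the steps above avoid NF_ACCEPT for mangled packets.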
> I agree that we had better locate errors before mangling the packet if
> possible. For VS/TUN and VS/DR in IPVS for kernel 2.3, we have already
> skipped some hooks and sent the packet immediately. For LVS/NAT, maybe
> we can try to detect all the errors before mangling the packet; if
> there is no problem, we can also send the mangled packet immediately,
> so the hooks such as ip_forward will not do the error detection again.
> Anyway, we will see.
I'm not sure if our in_get() call can replace the
routing cache lookup in ip_route_input for the LVS connections.
We would skip a hash table lookup for the subsequent packets and only
the first packet would be checked with ip_route_input. Maybe
the time to look up the route cache is the same as the time to look up
with in_get (which we always call), i.e. if we play in the
pre_routing hook we can save (n-1)/n of the calls to ip_route_input.
But we can't call ip_forward without calling ip_route_input.
This can result in skipping the FORWARD hook. I have a very
bad idea: a sysctl var, module option or compile-time define to
select FAST and SLOW mode for LVS in the Netfilter framework.
FAST: do all the work (skip ip_route_input, play as a FORWARD filter,
etc.); SLOW: stay in LOCAL_IN, call ip_output to check the MTU,
maybe before ip_forward. But there must be a flag, because we have
to skip connection table lookups in the FORWARD chain. Very
bad. All these troubles for the MTU. If we want to forward
gigabits we must make some compromises with the packet
filtering. Or maybe we can call the NF_HOOKs? Maybe
we can't register in the Netfilter hooks: better to
hook ip_forward just like the dumb NAT? And to change the
header after the MTU check and before the NF_HOOK?
This is the only good place to demasq and masq. And
we had better not hook pre_routing but call ip_route_input
with daddr=raddr for our connections. The result: maybe
we can hook ip_rcv_finish() and call ip_route_input
with a different daddr (after scheduling)? We can use skb
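The (n-1)/n saving above amounts to caching the routing decision in the connection entry so only the first packet pays for the full lookup. A user-space sketch of that idea; the names are illustrative stand-ins, not the real in_get()/ip_route_input() or struct rtable interfaces:

```c
#include <assert.h>
#include <stddef.h>

struct route { int oif; };          /* stand-in for struct rtable */

struct lvs_conn {
    struct route *cached_rt;        /* NULL until the first packet */
};

static int route_lookups;           /* count the expensive calls */

/* Mock of the full route lookup whose cost we want to avoid. */
static struct route *ip_route_input_mock(void)
{
    static struct route rt = { .oif = 1 };
    route_lookups++;
    return &rt;
}

/* FAST-path idea: after the first packet of a connection, every
 * later packet reuses the route stored in the connection entry,
 * so (n-1)/n of the packets skip the lookup entirely. */
static struct route *conn_route(struct lvs_conn *cp)
{
    if (!cp->cached_rt)
        cp->cached_rt = ip_route_input_mock();
    return cp->cached_rt;
}
```

The catch, as noted above, is that skipping ip_route_input this way also skips the FORWARD hook, which is what forces the FAST/SLOW compromise in the first place.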
Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>