Re: PMTU-D: remember, your load balancer is broken (fwd)

To: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: PMTU-D: remember, your load balancer is broken (fwd)
Cc: Kyle Sparger <ksparger@xxxxxxxxxxxxxxxxxxxx>, lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Wensong Zhang <wensong@xxxxxxxxxxxx>
Date: Mon, 19 Jun 2000 22:44:24 +0800 (CST)


On Fri, 16 Jun 2000, Julian Anastasov wrote:

>       Hello,
> On Fri, 16 Jun 2000, Wensong Zhang wrote:
> > >   For  VS/NAT demasq  is broken.  We  must restore the
> > > original   packet  in  ip_fw_unmasq_icmp  when  called  from
> > > icmp_send  (called from ip_forward after the packet mangling
> > > and  after the 2nd ip_route_input). (I hope nobody complains that the
> > 
> > I am sorry that I forgot to restore the IP header of the mangled packet
> > before icmp_send; we have to hook ip_fw_unmasq_icmp to restore the packet. You
> > are right.
>       Yes, in 2.2 the packets must be restored, which is
> not good practice, but this can hurt only masq applications
> which change data (which may not be fatal, I'm not sure).

It depends. We have to make sure that ip_fw_unmasq_icmp doesn't restore
packets that don't need to be restored. Otherwise, it might hurt some
other programs.
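A minimal sketch of the guard described above, assuming a simplified header struct (not the kernel's `struct iphdr`) and a hypothetical NAT table entry `struct nat_entry`; `restore_if_mangled` is an illustrative name, not the real ip_fw_unmasq_icmp code. The header is rewritten back to the virtual address/port only when it matches a connection that VS/NAT actually demasqueraded, so unrelated packets reaching icmp_send are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of the fields VS/NAT rewrites (host byte order for clarity). */
struct pkt_hdr {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical NAT entry: client connected to vaddr:vport, demasq sent it to raddr:rport. */
struct nat_entry {
    uint32_t caddr; uint16_t cport;   /* client */
    uint32_t vaddr; uint16_t vport;   /* virtual service seen by the client */
    uint32_t raddr; uint16_t rport;   /* real server (internal) */
};

/* Restore the client-visible destination only if this packet was mangled by us.
 * Returns 1 if the header was restored, 0 if it was left untouched.
 * Checksum updates are omitted in this sketch. */
static int restore_if_mangled(struct pkt_hdr *h, const struct nat_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct nat_entry *e = &tbl[i];
        if (h->saddr == e->caddr && h->sport == e->cport &&
            h->daddr == e->raddr && h->dport == e->rport) {
            h->daddr = e->vaddr;      /* back to the virtual IP */
            h->dport = e->vport;      /* back to the virtual port */
            return 1;
        }
    }
    return 0;   /* not one of ours: other programs' packets stay as they are */
}
```

A real implementation would also have to fix the IP header checksum and the transport checksum after rewriting.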

> > 
> > > 576 data bytes are not restored). There is no such thing in
> > > 2.3; packet restoring is a big pain and it was removed,
> > > but that doesn't mean 2.3 is correct when sending
> > > ICMP_FRAG_NEEDED from ip_forward(). Can we be sure the
> > > packet has not already been changed in PRE_ROUTING? The
> > > result: the rewritten iph->daddr (internal address) is returned
> > > in the ICMP reply. Is my interpretation correct? I haven't
> > > tested it yet.
> > > 
> > 
> > Yeah, I agree with you. Netfilter probably has problems calling
> > icmp_send for already-mangled packets. I think we need to restore the IP
> > header of the packet before calling icmp_send.
>       I think, with Netfilter, many things look well
> structured, but I'm not sure about the ICMP_FRAG_NEEDED
> generation. Packet restoring must be avoided, if
> possible,  if it involves changes in the protocol data, i.e.
> not  very good for  masq apps. I assume  this is not planned
> in the 2.3 world without masq apps.

Yup, restoring packets is difficult, especially restoring data (the
first 64 data bits of the datagram). Packet restoring should be
avoided as far as possible, but I am not sure that all errors can be
detected before packets are mangled.
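To see why the mangled header leaks into the reply, here is a minimal sketch of how a "fragmentation needed" message is laid out, using plain byte buffers rather than sk_buffs; `build_frag_needed` and `inet_csum` are hypothetical helpers, not kernel functions. The body quotes the IP header plus the first 64 bits of data exactly as they are at that moment (RFC 792), and bytes 6-7 carry the next-hop MTU (RFC 1191). For TCP those 64 bits are the source port, destination port and sequence number, so rewritten ports leak just like a rewritten daddr. The sketch quotes only the RFC 792 minimum; the kernel's icmp_send quotes more (the 576-byte limit mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_DEST_UNREACH 3   /* ICMP type: destination unreachable */
#define ICMP_FRAG_NEEDED  4   /* code: fragmentation needed and DF set */

/* RFC 1071 Internet checksum over 16-bit big-endian words. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build the ICMP message for IPv4 datagram `pkt` as it is *right now*.
 * Returns the ICMP message length, or 0 if `out` is too small. */
static size_t build_frag_needed(const uint8_t *pkt, size_t pkt_len,
                                uint16_t next_hop_mtu,
                                uint8_t *out, size_t out_len)
{
    assert(pkt_len >= 20);
    size_t quote = (size_t)(pkt[0] & 0x0F) * 4 + 8;   /* IHL*4 + first 64 bits */
    if (quote > pkt_len)
        quote = pkt_len;
    if (out_len < 8 + quote)
        return 0;

    memset(out, 0, 8);
    out[0] = ICMP_DEST_UNREACH;
    out[1] = ICMP_FRAG_NEEDED;
    out[6] = (uint8_t)(next_hop_mtu >> 8);
    out[7] = (uint8_t)(next_hop_mtu & 0xFF);
    memcpy(out + 8, pkt, quote);       /* mangled bytes are copied verbatim */

    uint16_t c = inet_csum(out, 8 + quote);
    out[2] = (uint8_t)(c >> 8);
    out[3] = (uint8_t)(c & 0xFF);
    return 8 + quote;
}
```

If demasq has already set the destination to a real server such as 172.16.0.5, that internal address is what appears at offset 8 + 16 of the ICMP message.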

I see that mangling packets introduces many other problems; I now
like VS/DR and VS/TUN even more. :)


>       I think, we have to answer these questions for 2.3:
> - should we use header restoring or not? It must be planned
> with  the masq  apps support if  any, i.e.  whether the data
> will  be changed too. It is  very difficult to restore data.
> For the header it is easy.
> - how can we call each hook from PRE_ROUTING to revert its
> header or data changes if each of these hooks returns NF_ACCEPT
> instead of NF_STOLEN? It is not possible.

I think we had better try to avoid restoring packets if possible,
because if the modified data is in the first 64 data bits and we cannot
restore it to the original data, the ICMP message still cannot notify
the client correctly.
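The reason an unrestored quote is useless: the client's stack looks up the affected connection from the quoted addresses and ports, roughly as a TCP error handler does. A minimal sketch, assuming hypothetical structs `quoted_tuple` and `client_sock` and a hypothetical `handle_frag_needed`; a quote that still names the real server's address and port matches no client socket, so the MTU hint is silently dropped.

```c
#include <assert.h>
#include <stdint.h>

/* What a client can recover from the ICMP quote: addresses from the quoted
 * IP header, ports from the quoted first 64 bits of data. */
struct quoted_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Client socket: its own address/port and the peer it believes it talks to. */
struct client_sock {
    uint32_t laddr, raddr;
    uint16_t lport, rport;
    uint16_t pmtu;
};

/* Apply a "fragmentation needed" hint only if the quote names this socket.
 * Returns 1 if the path MTU was updated, 0 if the error was ignored. */
static int handle_frag_needed(struct client_sock *sk, const struct quoted_tuple *q,
                              uint16_t mtu)
{
    if (q->saddr != sk->laddr || q->sport != sk->lport ||
        q->daddr != sk->raddr || q->dport != sk->rport)
        return 0;                     /* unknown flow: drop the ICMP error */
    if (mtu < sk->pmtu)
        sk->pmtu = mtu;
    return 1;
}
```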

If we cannot avoid restoring packets, the only method seems to be to
duplicate the original packet and keep the copy until the mangled packet
has been sent out correctly. But that would introduce too much overhead.
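A narrower variant of that duplication, sketched here under the assumption that the copy is only ever needed to build an ICMP error: save just the bytes the error would quote (IP header plus the first 64 bits, at most 68 bytes) before mangling. `struct icmp_snapshot`, `snapshot_before_mangle` and `mangle_daddr` are hypothetical names. This still adds a copy on every forwarded packet, so the overhead concern is reduced, not removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUOTE_MAX (60 + 8)   /* largest IPv4 header + first 64 bits of data */

/* Hypothetical per-packet copy of exactly what an ICMP error would quote. */
struct icmp_snapshot {
    uint8_t bytes[QUOTE_MAX];
    size_t len;
};

/* Save header + first 8 data bytes before any NAT rewriting. The original
 * header checksum is saved too, so the quote needs no recomputation. */
static void snapshot_before_mangle(struct icmp_snapshot *s, const uint8_t *pkt, size_t len)
{
    assert(len >= 20);
    size_t want = (size_t)(pkt[0] & 0x0F) * 4 + 8;
    s->len = want < len ? want : len;
    memcpy(s->bytes, pkt, s->len);
}

/* Example mangling step: VS/NAT rewrites the destination address (bytes 16-19). */
static void mangle_daddr(uint8_t *pkt, const uint8_t new_daddr[4])
{
    memcpy(pkt + 16, new_daddr, 4);
}
```

If the later MTU check fails, the ICMP error would be built from the snapshot instead of from the mangled packet.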

>       The result:
> - don't try to restore header from icmp_send
> - if  something is  changed the  hook must  return NF_STOLEN
> and  process the  packet: routing,  mtu check,  mangling and
> forwarding
> - return ICMP_FRAG_NEEDED before mangling. Here is the
> problem (not for LVS): we must know the output device MTU
> before  mangling.   But the  kernels call  ip_forward() when
> the packets are ready to send, i.e. after mangling without a
> way to restore them.
>       At  least, these thoughts  don't correspond with the
> current packet filter hooks and the packet forwarding.
>       But maybe I'm missing something. If the above is
> correct, the "ext_mtu > int_mtu" problem can break any
> design. LVS has to do these steps, ignoring the current kernel
> structure.   This will improve the  speed, though. The other
> way  is just not to solve this  problem. This can be bad for
> some guys with external gigabits and many internal megabits.
> Is that true?

I agree that we had better detect errors before mangling packets, if possible.
For VS/TUN and VS/DR in IPVS for kernel 2.3, we already skip
some hooks and send the packet immediately. For LVS/NAT, maybe we can
try to detect all errors before mangling the packet; if there is no
problem, we can also send the mangled packet immediately, so that later
stages such as ip_forward will not repeat the error detection. Anyway,
we will see.
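The check-before-mangle order discussed in this thread (route, MTU check, mangle, send) might look roughly like this for LVS/NAT. This is a sketch only: `nat_xmit`, `struct nat_pkt` and the verdict enum are hypothetical, the route lookup is assumed to have already produced `out_mtu`, and fragmentation of non-DF packets and checksum updates are not shown. Because the MTU check runs while the packet is untouched, an ICMP_FRAG_NEEDED built from it still quotes the virtual IP, covering the "ext_mtu > int_mtu" case without any restoring.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified packet: only what the forwarding decision needs. */
struct nat_pkt {
    uint32_t daddr;     /* still the virtual IP on entry */
    uint16_t tot_len;   /* IP total length */
    int df;             /* Don't Fragment bit set? */
};

enum fwd_verdict { FWD_SENT, FWD_FRAG_NEEDED };

/* Hypothetical VS/NAT transmit path: check first, mangle last. */
static enum fwd_verdict nat_xmit(struct nat_pkt *p, uint32_t real_server,
                                 uint16_t out_mtu, uint16_t *icmp_mtu)
{
    /* out_mtu is assumed to come from the route lookup to real_server. */
    if (p->tot_len > out_mtu && p->df) {
        *icmp_mtu = out_mtu;    /* ICMP would quote p while daddr is still the VIP */
        return FWD_FRAG_NEEDED;
    }
    p->daddr = real_server;     /* mangle only after all checks have passed */
    /* ...send immediately, skipping the later ip_forward checks
       (fragmentation of non-DF packets not shown)... */
    return FWD_SENT;
}
```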



> Regards
> --
> Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
