Re: [PATCH][RFC]: followup ...

To: Roberto Nibali <ratz@xxxxxx>
Subject: Re: [PATCH][RFC]: followup ...
Cc: "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Sun, 18 Feb 2001 16:40:27 +0000 (GMT)
        Hello Ratz,

On Sun, 18 Feb 2001, Roberto Nibali wrote:

> Hi Julian,
>
> Basically I have to agree with your statements; the problem
> is that we just see the network setup of an LVS cluster
> differently. I tend to use a good firewall architecture that
> assures me that the load balancer will not get hit too badly
> in case of a [D]DoS attack. You assume the worst case, where
> the load balancer stands right on the Internet and is
> fighting alone against all the maliciously forged packets. For
> me the LVS is part of the firewall design; I'd personally
> never build a net with only a load balancer and without some
> filter and/or proxy. Your mileage may vary.

        I agree, some firewalling can be done before the balancer,
but when the normal-looking traffic arrives only the balancer knows
about open/closed ports, related ICMP, etc. The main things
you can do before the balancer are to block source address spoofing,
some malformed packets, maybe some ICMP types. But the balancer can be
attacked even with normal traffic. The request rate can be limited
before the balancer, but it is better if that limiting is tied to such
things as the virtual service, the real server load, etc.
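
        As one concrete example of filtering that can sit in front of
the balancer: reverse-path filtering against spoofed sources is just a
standard sysctl. The small sketch below is an illustration only, not
part of any LVS patch; the /proc path is the stock Linux rp_filter
entry.

/* Illustration only: enable reverse-path filtering on the box in
 * front of (or on) the director, to drop packets with spoofed source
 * addresses.  The /proc path is the standard Linux rp_filter sysctl. */
#include <stdio.h>

int main(void)
{
    const char *path = "/proc/sys/net/ipv4/conf/all/rp_filter";
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return 1;
    }
    fputs("1\n", f);        /* 1 = source validation by reversed path */
    fclose(f);
    return 0;
}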

> >         Yep, maybe we need a universal netlink transport to get data
> > from LVS, but this must be carefully analyzed.
>
> I'm currently talking with laforge (Harald Welte) from the
> iptables team about the universal netlink interface. He's
> trying to synchronize the state tables via the netlink
> interface too. An intelligent netlink framework, maybe even as
> some kind of API, could help people a lot.

        Yes, we need a NETLINK_LVS kernel socket or something similar.
I don't think it will be easy for netfilter, but for LVS it can be
easier. If we use full state replication (yes, Netfilter has "real
stateful connection tracking") we can flood the internal links. There
are ideas to implement the state replication only for long-living
connections. And yes, we can use this universal transport
for many things, not only for connection state replication.
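
        Just to show what the user space end of such a transport could
look like: below is a minimal sketch of opening and binding a netlink
socket. NETLINK_LVS is only the name proposed in this thread, so the
protocol number used here is a placeholder, not an existing kernel
constant.

/* Minimal sketch of the user space end of the proposed LVS netlink
 * transport.  NETLINK_LVS does not exist in the kernel yet; the
 * protocol number below is a placeholder for illustration only. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_LVS 19          /* assumption: not a real kernel constant */

int main(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_LVS);

    if (fd < 0) {
        perror("socket");       /* expected to fail until the kernel side exists */
        return 1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;            /* let the kernel assign our address */
    addr.nl_groups = 0;         /* no multicast groups */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    /* From here a daemon would recvmsg() connection state updates
     * and sendmsg() commands or weights back to the director. */
    return 0;
}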

> > sees another one. And under attack we observe a big gap between the
> > active/inactive counters and the used threshold values. In this case
> > we just exclude all real servers. This is the reason I prefer the
> > more informed approach of using agents.
>
> Then just set the threshold to a value high enough
> that the real server will not die.

        We would need an admin to sit there and change the range :)
Of course, the range you propose can be tuned once and left alone until
the parameters have to be changed under attack. OK, the user space tool
can change the values under attack :)

> >         No, we have two choices:
> >
> > - use SYN cookies and much memory for open requests, accept more
> > valid requests
> >
> > - don't use SYN cookies, drop the requests exceeding the backlog length,
> > drop many valid requests but the real servers are not overloaded
>
> So, what you're saying is that in the beginning you set the value
> for the backlog queue high, and if you experience a high request rate
> that never completes the 3-way handshake you reduce the backlog number?
> IMO you should enable both SYN cookies and a well-chosen backlog
> queue number, because if you disable SYN cookies and exceed the amount
> of allowed connections in the backlog queue, new TCP_SYN requests
> will be dropped no matter what. If you however enable SYN cookies
> and the amount of SYN_RECV states exceeds the backlog queue number,
> new incoming requests trying to finish the 3-way handshake will still
> get the possibility to do so.

        Yes, the user must select a backlog size according to the
connection rate; we don't want dropped requests even while not under
attack. Of course, SYN cookies help, for the OSes that support
them, though not very much if our link is full of invalid requests,
because we can flood our output pipe too. But I don't know how often
DDoS SYN attacks happen these days.
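
        For reference, both knobs are plain sysctls on Linux. The
snippet below is only an illustration of setting them from a small C
helper; the /proc paths are the standard tcp_syncookies and
tcp_max_syn_backlog entries, and the backlog value of 1024 is an
arbitrary example, not a recommendation.

/* Illustration: turn on SYN cookies and pick a backlog size.
 * The /proc paths are standard; the value 1024 is only an example
 * and should be chosen from the expected connection rate. */
#include <stdio.h>

static int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", value);
    fclose(f);
    return 0;
}

int main(void)
{
    write_sysctl("/proc/sys/net/ipv4/tcp_syncookies", "1");
    write_sysctl("/proc/sys/net/ipv4/tcp_max_syn_backlog", "1024");
    return 0;
}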

> >         Yes, nobody claims the defense strategies guard the real
> > servers. This is not their goal. They just keep more free memory
> > on the director, nothing more :) Only drop_packet can control the
> > request rate, and only for the new requests.
>
> Hmm, so why can't I enable drop_packet and set thresholds?
> Excuse my ignorance, but drop_packet would randomly drop some
> connections and the threshold would guard against the server
> crashing. I just provided an additional feature that doesn't
> affect the functionality or flow control of the rest. Do you

        Agreed, drop_packet and RS limits are different things.
The question is how effective the RS limits will be, but if they
are an option the users can select, I don't see a problem. It can
be an option just like people use wlc, for example - no
guarantee for the real server load :) But while under attack
wlc is not affected (except if the flood comes over one connection),
whereas the RS limits are. And this is the problem I see.

> want the RS to be exhausted (with drop_packet enabled) or do
> you want the possibility to act before this happens? If
> you let the RS get flooded by forged packets it's IMHO the same
> as if you set their value to zero. Existing connections in the
> connection table will still be able to communicate with the server;
> I just want to prevent the LB from sending another request to the
> RS. Whether this has to be done in kernel space is a question per se.

        Yes, these RS limits are a simple control we can add,
and of course it will be used by many users. My doubts concern
the moment when all the real servers disappear and no longer
accept new connections: how fast do we raise these limits, or
start scheduling connections to these real servers again?
It again appears to be a user space problem :)
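
        To make the mechanism we are discussing concrete, here is a
rough user space style sketch of the upper/lower limit idea. The
struct and field names are modelled on the dest->activeconns and
dest->inactconns counters mentioned above, but they are illustrative,
not the actual RFC patch.

/* Sketch of the per-real-server limit idea, not the actual RFC patch:
 * above the upper threshold the server is quiesced (no new connections)
 * and it is only re-enabled once the sum drops to the lower threshold. */
#include <stdio.h>

struct real_server {
    int activeconns;    /* connections in ESTABLISHED state */
    int inactconns;     /* connections in other states */
    int u_threshold;    /* upper limit; 0 means no limit */
    int l_threshold;    /* lower limit for re-enabling the server */
    int quiesced;       /* sticky flag while between the two limits */
};

static int may_schedule(struct real_server *rs)
{
    int conns = rs->activeconns + rs->inactconns;

    if (rs->u_threshold && conns >= rs->u_threshold)
        rs->quiesced = 1;       /* too loaded: stop handing it new connections */
    else if (conns <= rs->l_threshold)
        rs->quiesced = 0;       /* load dropped enough: resume scheduling */

    return !rs->quiesced;
}

int main(void)
{
    struct real_server rs = { 180, 400, 500, 300, 0 };

    printf("schedule new connections to this RS: %s\n",
           may_schedule(&rs) ? "yes" : "no");
    return 0;
}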

> >         Yes, but then the clients can't work; you exclude all servers
> > in this case because the LVS spreads the requests to all servers
> > and the rain becomes a deluge :)
>
> No, in quiesced mode all existing connections will still be
> handled. If the (dest->inactconns + dest->activeconns) for
> this RS drops below the lower threshold, it will get new requests.
> I don't see the deluge. In case the LVS kernel sets all the
> RS weights to 0, it will act as a sink by dropping all connections
> not in the template state table. Of course no new legitimate
> connection will be processed, but in your case, where you drop
> packets randomly, you can have the same amount of loss of 'good'
> packets. Am I wrong? :)

        Yes, but drop_packet can be activated when we see a very
big connection rate that would occupy all the memory for connections
in the director. If we don't run other user space software we
can simply ignore the defense strategies and leave the packets
to be dropped after a memory allocation error.
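
        For readers who have not looked at it: the drop_packet defense
drops a fraction of incoming connection requests when director memory
gets tight. The sketch below only illustrates that "drop one of every
N new requests" idea with a made-up rate; it is not the kernel
implementation, which derives the rate from the free memory.

/* Illustration of the drop_packet idea: when the director is short on
 * connection memory, drop roughly one of every 'rate' new requests.
 * The rate value and the random source are placeholders; the real
 * defense derives the rate from the amount of free memory. */
#include <stdio.h>
#include <stdlib.h>

static int drop_rate = 10;      /* example: drop ~1 of every 10 requests */

static int should_drop(void)
{
    if (drop_rate <= 0)
        return 0;               /* defense disabled */
    return (rand() % drop_rate) == 0;
}

int main(void)
{
    int i, dropped = 0;

    for (i = 0; i < 1000; i++)
        if (should_drop())
            dropped++;

    printf("dropped %d of 1000 simulated requests\n", dropped);
    return 0;
}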

> >         Yes, we have to find a way to configure all these features
> > that will be implemented later on. The main problem here is how
> > we can provide one binary ipvsadm tool to all Linux distributions.
>
> Huh? Adjusting the ipvsadm tool to the distros is the problem
> of the maintainers themselves. We just provide the functionality
> to be able to handle all the tasks we support. I have built my
> own distribution too, and I also have to maintain my diffs to
> the LVS code and the ipvsadm app. I'd disagree if ipvsadm were to
> become distro-related.

        Yes, maybe we can implement a better mechanism that will
allow the different options to be supported without hurting all
users. Who knows, maybe we can create more sockopts? But
things can get very complex. BTW, netfilter has such shared libs,
for example.

> > We know how many problems the users have when they use the ipvsadm
> > supplied in their distrib.
>
> So the distributions can handle it. It can't be our task to
> adjust the binary tool to every distro; it's our task to keep
> it clean and independent of any distro.

        This is true, but does it mean they have to put all the
features in? Currently, for LVS we have the following methods in hand:

- create a new scheduler

        That is a total of one method for adding new, separate features
(maybe I'm missing something). Things can get very complex if a new
feature wants to touch some of the functions in the fast path or the
user space structures. What can be the solution? Putting hooks inside
LVS? IMO, we must already be thinking about such needs.

> > > Hmm, I just want to limit the number of concurrent connections
> > > per real server and, in the future, maybe per service. This saved
> > > me quite a few lines of code in my user space health-checking
> > > daemon.
> >
> >         Yes, you vote for moving some features from user to the
> > kernel space. We must find the right balance: what can be done in
> > LVS and what must be implemented in the user space tools.
>
> I absolutely agree, even considering the fact that in the
> future (although I don't think so) this very clean patch
> could become part of the mainstream kernel. I also disagree with
> putting every cool feature one thinks he needs into the kernel just
> because it's faster and it saves him 2500+ lines of code. My

        No doubt there will be some nice features that can't be
done in user space, and exactly those features are not used by
other users. An example is the cp->fwmark support proposed by
Henrik Nordstrom: a feature that is hard to call purely user
space material, but which touches two things: it changes internal
functions and adds another hook that can delay the processing for
some users. I'm not sure what will happen if we start to think in
"hooks" just like netfilter. Even if that looks good in user space,
I'm not sure we can say the same for kernel space. Any
ideas here, maybe for a new topic?

> ugly patch doesn't, if implemented in the correct way, affect
> the normal kernel control path in case you don't use the
> feature. Anyway, we will find a cool solution to the problem,
> because admittedly both solutions are not the best of all worlds.
> I would also like to hear from other people what experiences
> they've made with DDoS and the way the LVS was working under
> an attack. So far I've not seen more than an academic proof
> (stress tests not reflecting a real-world example)
> of the designed defense strategies. I think Anoush was working
> on something too, but I haven't heard from him in ages ;)

        Hm, it seems nobody has such problems :)))

> >         Maybe we can limit the SYN rate. Of course, that does not cover
> > all cases, so my thought was to limit the packet rate for all states
> > or per connection; not sure, this is an open topic. It is easy to open
>
> Uih! This is strong tobacco. You can screw up quite a lot if you
> start doing modifications in all states. But we should discuss
> such an approach because it sounds challenging.

        No, a counter which is reset on state change. But this is
another issue and I haven't started to think more about such things.
Maybe I won't :)
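
        Purely to illustrate the idea (it is not part of LVS at this
point): the sketch below keeps a per-connection packet counter that a
rate check can look at, and the counter is cleared whenever the
connection changes state. The struct, states and limit are invented
for the example.

/* Sketch only: a per-connection packet counter that is reset on every
 * state change, so a rate limit applies to the current state only.
 * The connection struct, states and limit are invented for illustration. */
#include <stdio.h>

enum conn_state { CS_SYN_RECV, CS_ESTABLISHED, CS_FIN_WAIT, CS_CLOSED };

struct conn {
    enum conn_state state;
    unsigned long   state_pkts;     /* packets seen in the current state */
};

static void conn_set_state(struct conn *cp, enum conn_state new_state)
{
    if (cp->state != new_state) {
        cp->state = new_state;
        cp->state_pkts = 0;         /* reset the counter on state change */
    }
}

/* Returns 1 if the packet exceeds the per-state budget and could be dropped. */
static int conn_packet_exceeds(struct conn *cp, unsigned long limit)
{
    return ++cp->state_pkts > limit;
}

int main(void)
{
    struct conn c = { CS_SYN_RECV, 0 };
    unsigned long i;

    for (i = 0; i < 5; i++)
        printf("SYN_RECV pkt %lu over budget: %d\n", i + 1,
               conn_packet_exceeds(&c, 3));

    conn_set_state(&c, CS_ESTABLISHED);     /* counter starts over */
    printf("after state change, counter = %lu\n", c.state_pkts);
    return 0;
}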

> > combine this with the other kinds of attacks, the distributed ones,
> > we have better control. Of course, some QoS implementations can
> > cover such problems, not sure. And this can be a simple implementation,
> > of course; nobody wants to reinvent the wheel :)
>
> Yes, I doubt however that existing QoS schedulers already
> provide such functionality.

        Yes, that defense can be connection state related; LVS is
a connection scheduler, though, not a packet scheduler.

> > a state where we can drop some of the requests and keep the
> > real servers busy but responsive. This can be a difficult task, but
> > not when we have the help of our agents. We expect that many
> > valid requests may be dropped, but if we keep the real server in
> > good health we can still handle some valid requests, because nobody
> > knows when the flood will stop. The link is busy but it carries valid
> > requests. And the service does not see the invalid ones.
>
> This is the biggest problem with LVS in DR mode: the control of
> the states and the packets. We just don't yet have a reliable
> way of weighting an incoming connection, and this is IMHO also
> impossible.

        Yes, it is a job for the agents to represent the real server
load as weights.

> >         The requests themselves are not meaningful; we care how much
> > load they introduce, and we report this load to the director. It can
> > look, for example, like one value (weight) for the real host that can
> > be set for all real services running on that host. We don't need to
> > generate 10 weights for the 10 real services running on our real host. And
>
> I don't know, this could be desirable unless we have an
> intelligent enough scheduler. In lots of projects I've seen
> or implemented, the application or database behind such an LVS
> cluster was crap, or the tier architecture was extremely clumsy,
> so that already after a day I had a huge load imbalance even
> with wlc and non-persistence.

        Yes, wlc is not my preferred scheduler when it comes to
connections dealing with a database :)

        I don't think we need a more intelligent scheduler if we
are talking about the current set of information used by the LVS
schedulers. Only the users know what kind of connections are
scheduled, and they can instruct a user space tool how to set the
WRR weights according to the load.

> > we change the weight every 2 seconds, for example. We need two
> > syscalls (lseek and read) to get most of the values from the /proc fs,
> > though maybe from 2-3 files. This is in Linux, of course. Not sure
> > how this behaves under attack. We will see :)
>
> Are you going for it?

        Yes, when my user space libs are ready we will test them
for different setups and services.
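
        A rough sketch of what such an agent's sampling loop could
look like is below: it keeps /proc/loadavg open, rereads it with the
lseek-and-read pair mentioned above, and derives a WRR weight from the
1-minute load average. The weight mapping and the 2-second interval
are just example choices, and actually pushing the weight to the
director (via ipvsadm or the discussed netlink transport) is left out.

/* Sketch of a real-server agent: reread /proc/loadavg with lseek+read
 * every few seconds and map the 1-minute load average to a WRR weight.
 * The mapping and the interval are arbitrary example choices; sending
 * the weight to the director is not shown. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    int fd = open("/proc/loadavg", O_RDONLY);
    char buf[128];
    ssize_t n;

    if (fd < 0) {
        perror("/proc/loadavg");
        return 1;
    }

    for (;;) {
        lseek(fd, 0, SEEK_SET);         /* rewind: /proc files are regenerated */
        n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            break;
        buf[n] = '\0';

        double load1 = atof(buf);       /* first field: 1-minute load average */
        int weight = 100 - (int)(load1 * 25.0);   /* example mapping */
        if (weight < 1)
            weight = 1;                 /* keep a minimal nonzero weight */

        printf("load %.2f -> weight %d\n", load1, weight);
        sleep(2);                       /* the 2-second interval from the mail */
    }
    close(fd);
    return 0;
}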

> >         The only problem we have with this scheme is the ipvsadm
> > binary. It must be changed (the user structure in the kernel :))
>
> This is not your only problem :)
>
> > The last change dates from 0.9.10, and that is a long time :)
> > But you know what a change in the user structures means :)
>
> Indeed.
>
> >         Yes, the picture is complex and there are so many details
> > we can consider. IMO, there is no simple solution :) But if we
> > combine all the useful ideas in a piece of user space software, I
> > think we can have a useful tool.
>
> Definitely true, you already started on a very promising
> user space tool which is extremely open to extension.
>
> > > A packet filter, the router (most of us do have a Cisco, don't we?)
> >
> >         Yes, the question is how Cisco will know what packet rate
> > overloads the real servers :)
>
> :) The router in my example is just configured to drop non-net-related
> packets, and those are already enough (judging by the huge logfile that
> comes in every day).

        Yes, there are packets with sources from the private networks
too :)

> >         No, no :) I'm never happy, I always look for better ideas (a
> > joke :)) Maybe I'm thinking about things that are too complex. And
> > there is never enough time :)
>
> Well, I'm happy to hear this, so I know we're both pulling on the
> same rope. I'm also not happy until a proper solution I
> can live with is implemented. That's the way the IT business should
> work (it doesn't, however).

        I hope other people will express their ideas on this
topic. Maybe I'm too pedantic in some cases :) And right now I'm
talking without "showing the code" :) I hope that will change soon :)

> Thank you again for the interesting comments and thoughts,
> Roberto Nibali, ratz


Regards

--
Julian Anastasov <ja@xxxxxx>


