Re: [PATCH][RFC]: add threshhold per RS (dirty hospital version)

To: Roberto Nibali <ratz@xxxxxx>
Subject: Re: [PATCH][RFC]: add threshhold per RS (dirty hospital version)
Cc: "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Fri, 16 Feb 2001 01:39:30 +0000 (GMT)
        Hello Ratz,

On Thu, 15 Feb 2001, Roberto Nibali wrote:

> Hi Julian,
>
> [Back from hospital]
>
> > > What is it good for? Yeah well, I don't know exactly, use your
> > > imagination, but first of all this is a proposal and I wanted to ask
> > > for a discussion about a possible inclusion of such a feature, or even
> > > a derived one, into the main code (of course after fixing the race
> > > conditions and bugs and cleaning up the code) and second, I found out
> > > from tons of talks with customers that such a feature is needed,
> > > because commercial lbs also have this and managers always like to have
> > > a nice comparison of all features to decide which product they take.
> > > Doing all this in user-space is unfortunately just not atomic enough.
> >
> >         Yes, grepping for the connection counters from the /proc fs
> > is not good :)
>
> Don't laugh, I've been doing this lately. On a heavily loaded server
> you get a shift of +-150 connections. :)

        Yep, maybe we need a universal netlink transport to get data
from LVS, but this must be carefully analyzed.
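        For reference, the /proc polling we are talking about is roughly
this; a minimal sketch assuming the /proc/net/ip_vs layout from the 2.4
code (the hex address in the awk filter is made up):

    # per-real-server counters as LVS exports them; the "->" lines carry
    # Forward, Weight, ActiveConn and InActConn for each real server
    cat /proc/net/ip_vs

    # pull just the ActiveConn column for one real server; addresses are
    # shown in hex, e.g. 0A000001:0050 is 10.0.0.1 port 80
    awk '$1 == "->" && $2 == "0A000001:0050" { print $5 }' /proc/net/ip_vs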

> >         How will we defend against DDoS? Using the number of active
>
> I'm using a packetfilter and in special zones a firewall after the
> packetfilter ;) No seriously, I personally don't think the LVS should
> take too much part in securing the realservers. It's just another part
> of the firewall setup.

        The problem is that LVS has a different view of the real server
load. The director sees one number of connections, the real server
sees another. And under attack we observe a big gap between the
active/inactive counters and the configured threshold values. In this
case we just exclude all real servers. This is the reason I prefer the
more informed approach of using agents.

> > or inactive connections to assign a new weight is _very_ dangerous.
>
> I know, but OTOH, if you set a threshold and my code takes the
> server out because of a well-crafted DDoS attack, I think that
> is even better than allowing the DDoS and maybe killing the
> realserver's http-listener. BTW, what if you enable the defense

        No, we have two choices:

- use SYN cookies and much memory for open requests, and accept more
valid requests

- don't use SYN cookies, drop the requests exceeding the backlog length,
and drop many valid requests, but the real servers are not overloaded

In both cases the listeners don't see requests until the handshake is
completed (Linux).
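        On the real servers these two choices map roughly onto the standard
sysctls below; a sketch, with the backlog value purely illustrative:

    # choice 1: SYN cookies, accept more valid requests
    echo 1 > /proc/sys/net/ipv4/tcp_syncookies

    # choice 2: no cookies, just bound the open-request queue; SYNs
    # beyond the backlog are dropped before any listener sees them
    echo 0 > /proc/sys/net/ipv4/tcp_syncookies
    echo 1024 > /proc/sys/net/ipv4/tcp_max_syn_backlog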

> strategies of the loadbalancer? I've done some tests and I was
> able to flood the realservers by sending forged SYNs and timeshifted
> SYN-ACKs with the expected seq-nr. It was impossible to work on
> the realservers unless of course I enabled the TCP_SYNCOOKIES. I

        Yes, nobody claims the defense strategies guard the real
servers. This is not their goal. They just keep more free memory
on the director, nothing more :) Only drop_packet can control the
request rate, and only for the new requests.
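        Just as a pointer, the defense strategies are toggled through /proc
on the director; a sketch, with only illustrative values (see the LVS
documentation for the exact modes):

    # on the director: LVS defense strategies
    echo 1 > /proc/sys/net/ipv4/vs/drop_entry    # drop connection entries under memory pressure
    echo 1 > /proc/sys/net/ipv4/vs/drop_packet   # drop part of the incoming packets
    echo 1 > /proc/sys/net/ipv4/vs/secure_tcp    # stricter TCP state transitions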

> then enabled my patch and after the connections exceeded the
> threshold, the kernel took the server out temporarily by setting
> the weight to 0. In that way the server was usable and I could
> work on the server.

        Yes, but then the clients can't work; you exclude all servers
in this case because the LVS spreads the requests across all servers
and the rain becomes a deluge :)

> > In theory, the number of connections is related to the load, but
> > this is only true when the world is ideal. The inactive counter can
> > be driven to very high values when we are under attack. Even the WLC
> > method loads the real servers proportionally, but they are never
> > excluded from operation.
>
> True, but as I already said, I think LVS shouldn't replace a fw.
> I normally have a router configured properly, then a packetfilter,
> then a firewall or even another, but stateful, packetfilter. See,
> the patch itself is not even mandatory. In a normal setup, my code
> is not even touched (except the ``if'' :).

        Yes, we have to find a way to configure all these features
that will be implemented later. The main problem here is how
we can provide one binary ipvsadm tool to all Linux distributions.
We know how many problems the users have when they use the ipvsadm
supplied with their distribution.

> >         I have some thoughts about limiting the traffic per
> > connection but this idea must be analyzed. The other alternatives
>
> Hmm, I just want to limit the number of concurrent connections
> per realserver and in the future maybe per service. This saved
> me quite some lines of code in my userspace healthchecking
> daemon.

        Yes, you vote for moving some features from user space to
kernel space. We must find the right balance: what can be done in
LVS and what must be implemented in the user space tools.

> > are to use the Netfilter's "limit" target or QoS to limit the
> > traffic to the real servers.
>
> But then you have to add quite some code. The limit target has
> no idea about the LVS tables. How should this work, e.g. if you
> would like to rate-limit the number of connections to a realserver?

        Maybe we can limit the SYN rate. Of course, that doesn't cover
all cases, so my thought was to limit the packet rate for all states
or per connection; not sure, this is an open topic. It is easy to open
a connection through the director (especially in LVS-DR) and then
to flood this connection with packets. This is one of the cases where
LVS can really guard the real servers from packet floods. If we
combine this with the other kinds of attacks, the distributed ones,
we have better control. Of course, some QoS implementations can
cover such problems, not sure. And this can be a simple implementation;
of course, nobody wants to reinvent the wheel :)
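        To make the netfilter variant concrete, here is a rough sketch with
the stock `limit' match; the VIP 192.168.1.100 and the rates are made up,
and it assumes a 2.4 director where VIP traffic traverses the INPUT chain
before ip_vs. As discussed above, the limit match knows nothing about the
LVS tables:

    # accept at most ~50 new connections per second towards the virtual
    # service, with a burst of 100, and drop the remaining SYNs before
    # LVS ever sees them
    iptables -A INPUT -p tcp --syn -d 192.168.1.100 --dport 80 \
             -m limit --limit 50/second --limit-burst 100 -j ACCEPT
    iptables -A INPUT -p tcp --syn -d 192.168.1.100 --dport 80 -j DROP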

> > > I already implemented a dozen of such setups and they work all
> > > pretty well.
> >
> >         Let's analyze the problem. If we move new connections away from
> > an "overloaded" real server and redirect them to the other real servers,
> > we will overload them too. IMO, the problem is that there are
>
> No, unless you use an old machine. This is maybe a requirement of
> an e-commerce application. They have some servers, and if the servers
> are overloaded (taken out by my user-space healthchecking daemon
> because the response time is too high or the application daemon is
> not listening on the port anymore) they will be taken out. Now I
> have found out that by setting thresholds I could reduce the down-
> time of a flooded server significantly. In case all servers were
> taken out or their weights were set to 0, the userspace application
> sets up a temporary new realserver (either a local route or another
> server) that has nothing else to do than push a static webpage
> saying that the service is currently unavailable due to high
> server load or a DDoS attack or whatever. Put this page behind a

        Yes, I know that this is a working solution. But see, you
exclude all real servers :) You are giving up. My idea is to find
a state where we can drop some of the requests and keep the
real servers busy but responsive. This can be a difficult task, but
not with the help of our agents. We expect that many
valid requests can be dropped, but if we keep the real servers in
good health we can handle some valid requests, because nobody knows
when the flood will stop. The link is busy, but it contains valid
requests. And the service does not see the invalid ones.

> TUX 2.0 and try to overflow it. If you can, apply the zero-copy
> patches of DaveM. No way will you find such a fast (88MBit/s of
> requests!!) link to saturate the server.

        Nobody overflows the service :) You would need so many clients
for that. The easiest way for the attackers is to flood the link.
And they prefer to reach the service because this makes more
trouble. More hops reached, more links busy, more trouble.

> > more connection requests than the cluster can handle. Solutions
> > that try to move the traffic between the real servers can only cause
> > more problems. If at the same time we set the weights to 0, this
> > leads to more delay in the processing. Maybe it is more useful to
> > start reducing the weights first, but this again returns us to
> > the theory of the smart cluster software.
>
> Mhh, I was thinking of this first. But do you think a cluster
> software is fast (atomic) enough to react to such a huge amount of
> requests?

        The requests themselves are not meaningful; we care how much
load they introduce, and we report this load to the director. It can
look, for example, like one value (weight) for the real host that can
be set for all real services running on this host. We don't need to
generate 10 weights for the 10 real services running on our real host.
And we change the weight every 2 seconds, for example. We need two
syscalls (lseek and read) to get most of the values from the /proc fs,
though maybe from 2-3 files. This is in Linux, of course. Not sure
how this behaves under attack. We will see :)
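        A minimal sketch of such an agent in shell, just to fix the idea;
the 2-second interval, the load-to-weight mapping and the addresses are
made up, and a real agent would send the value to the director over the
network instead of calling ipvsadm locally:

    #!/bin/sh
    # toy load agent: derive a weight from the 1-minute load average
    # and apply it every 2 seconds; calling ipvsadm directly only
    # works if this runs on the director itself
    VIP=192.168.1.100:80      # virtual service (made up)
    RIP=10.0.0.1:80           # real server (made up)

    while true; do
        # crude mapping: weight 10 when idle, 0 when loadavg >= 10
        weight=$(awk '{ w = int(10 - $1); if (w < 0) w = 0; print w }' \
                 /proc/loadavg)
        ipvsadm -e -t "$VIP" -r "$RIP" -w "$weight"
        sleep 2
    done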

> >         So, we can't exit from this situation without dropping
> > requests. There is more traffic that can't be served from the cluster.
>
> Obviously yes, but if you also include the practical problem of
> SLAs with customers and guaranteed downtime per month, I still
> have to say that for my deployment I am better off with my patch
> in case of a DDoS and enabled LVS defense strategies than without.
>
> > If there is no cluster software to keep the real servers equally
> > loaded, some of them can go offline too early.
>
> The scheduler should keep them equally loaded IMO, even in the case
> of, let's say, 70% forged packets. Again, if you don't want to
> set a threshold, leave it. The patch is open enough. If you like
> to set it, set it, maybe set it very high. It's up to you.
> [See, I'm still fighting, although your arguments are better :)]

        The only problem we have with this scheme is the ipvsadm
binary. It must be changed (the user structure in the kernel :)).
The last change dates from 0.9.10 and that is a long time :)
But you know what a change in the user structures means :)

> >         The cluster software can take on the role of monitoring the load
> > instead of relying on the connection counters. I agree, changing the
>
> I think this has to be done additionally.
>
> > weights and deciding how much traffic to drop can be explained
> > with a complex formula. But I can see it only as a complete solution:
>
> Agreed. Well, I come over to your place and we discuss this whole
> thing :)

        Yes, the picture is complex and there are so many details
we can consider. IMO, there is no simple solution :) But if we
combine all the useful ideas in user space software, I think we
can have a useful tool.

> > to balance the load and to drop the exceeding requests, serve as many
> > requests as possible. Even the drop_packet strategy can help here,
>
> See, not too many tests have been made yet. Most of them were done
> by Joe, and not all of them reflect the real world.
>
> > we can explicitly enable it specifying the proper drop rate. We don't
>
> Again, additionally.

        Yes, because the easiest way is to control the LVS from
user space and to leave in LVS only the basic needed support.
This allows us to have more ways to control LVS.

> > need to use it only to defend the LVS box but to drop the exceeding
> > traffic. But someone has to control the drop rate :) If there is no
>
> A packetfilter, the router (most of us do have a Cisco, don't we?)

        Yes, the question is how Cisco will know what packet rate
overloads the real servers :)

> > exceeding traffic, what problems can we expect? Only from bad load
> > balancing :)
>
> Null pointer dereferences :)
>
> Thank you, Julian, for having had a look at it and for the interesting
> points. I'm sure in the future we will find a way to make all of us happy.

        No, no :) I'm never happy, always looking for better ideas (a
joke :)). Maybe I'm thinking of too complex things. And there is
never enough time :)

> Best regards,
> Roberto Nibali, ratz


Regards

--
Julian Anastasov <ja@xxxxxx>


