Re: [PATCH][RFC]: followup ...

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH][RFC]: followup ...
Cc: "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Roberto Nibali <ratz@xxxxxx>
Date: Sun, 18 Feb 2001 12:31:59 +0100
Hi Julian,

Basically I have to agree with your statements; the problem
is that we just see the network setup of an LVS cluster
differently. I tend to use a good firewall architecture that
assures me the load balancer will not get hit too badly
in case of a [D]DoS attack. You assume the worst case, where
the load balancer sits directly on the Internet and fights
all the maliciously forged packets alone. For me the LVS is
part of the firewall design; I'd personally never build a
network with only a load balancer and without some filter
and/or proxy in front. Your mileage may vary.

> > Don't laugh, I've been doing this lately. On a heavy loaded server
> > you get a shift of +-150 connections. :)
> 
>         Yep, may be we need an universal netlink transport to get data
> from LVS but this must be carefully analyzed.

I'm currently talking with laforge (Harald Welte) from the
iptables team about a universal netlink interface. He's
trying to synchronize his state tables via the netlink
interface too. An intelligent netlink framework, maybe even
exposed as some kind of API, could help people a lot.
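
To give an idea what I mean, a tiny userspace sketch of a consumer
of such an interface; the protocol number NETLINK_LVSSTATE is
completely made up, nothing like it has been assigned yet:

/* Hypothetical sketch only: NETLINK_LVSSTATE stands for whatever
 * protocol number an LVS/state-sync channel would get assigned. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_LVSSTATE 12    /* made-up protocol number */

int main(void)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_LVSSTATE);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_groups = 1;    /* listen to one broadcast group */
    if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) {
        perror("bind");
        return 1;
    }
    /* a recvmsg() loop reading state/counter updates would go here */
    return 0;
}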
 
>         The problem is that LVS has another view for the real server
> load. The director sees one number of connections the real server

Yes, if (amount of forged packets << correctly crafted requests).

> sees another one. And under attack we observe big gap between the
> active/inactive counters and the used threshold values. In this case
> we just exclude all real servers. This is the reason I prefer the
> more informed approach of using agents.

Then set the threshold just high enough that the real server
will not die.
 
>         No, we have two choices:
> 
> - use SYN cookies and much memory for open requests, accept more
> valid requests
> 
> - don't use SYN cookies, drop the requests exceeding the backlog length,
> drop many valid requests but the real servers are not overloaded

So what you're saying is that in the beginning you set the value
for the backlog queue high, and if you experience a high rate of
requests that never complete the 3-way handshake you reduce the
backlog number? IMO you should enable both, SYN cookies and a
well-chosen backlog queue length, because if you disable SYN cookies
and exceed the number of allowed connections in the backlog queue,
new TCP SYN requests will be dropped no matter what. If you enable
SYN cookies, however, and the number of SYN_RECV states exceeds the
backlog queue length, new incoming requests trying to finish the
3-way handshake still get the possibility to do so.
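
Just so we're talking about the same queues, a quick sketch (plain
socket code, nothing LVS-specific): the backlog argument to listen()
and the SYN queue controlled by /proc/sys/net/ipv4/tcp_max_syn_backlog
are two different things, and tcp_syncookies only affects the latter.

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(80);
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
        perror("bind");
        return 1;
    }
    /* the backlog here caps the queue of completed handshakes waiting
     * for accept(); the half-open SYN_RECV entries are limited by
     * tcp_max_syn_backlog, and SYN cookies (tcp_syncookies=1) take
     * over once that queue overflows */
    listen(fd, 128);
    /* accept() loop omitted */
    return 0;
}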
 
> In both cases the listeners don't see requests until the handshake is
> completed (Linux).

Yep, correct.
 
>         Yes, nobody claims the defense strategies guard the real
> servers. This is not their goal. They keep the director with more
> free memory and nothing more :) Only drop_packet can control the
> request rate but only for the new requests.

Hmm, so why can't I enable drop_packet and set thresholds?
Excuse my ignorance, but drop_packet would randomly drop some
connections and the threshold would guard against the server
crashing. I just provided an additional feature that doesn't
affect the functionality or flow control of the rest. Do you
want the RS to be exhausted (with drop_packet enabled), or do
you want the possibility to act before this happens? If you
let the RS get flooded by forged packets it's IMHO the same
as if you set its weight to zero. Existing connections in the
transition table will still be able to communicate with the
server; I just want to prevent the LB from sending another
request to the RS. Whether this has to be done in kernel space
is a question per se.
 
>         Yes but the clients can't work, you exclude all servers
> in this case because the LVS spreads the requests to all servers
> and the rain becomes deluge :)

No, in quiesced mode all existing connections will still be
handled. If (dest->inactconns + dest->activeconns) for this RS
drops below the lower threshold, it will get new requests again.
I don't see the deluge. In case the LVS kernel sets all the
RS weights to 0, it acts as a sink, dropping all connections
not in the template state table. Of course no new legitimate
connection will be processed, but in your case, where you drop
packets randomly, you can have the same amount of loss of 'good'
packets. Am I wrong? :)
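
Stripped down to the idea, the per-destination check boils down to
roughly this (a sketch, not the literal patch code; u_threshold,
l_threshold and saved_weight stand in for the per-RS values the
patch introduces, the counters correspond to dest->activeconns and
dest->inactconns):

struct dest_limits {
    unsigned int activeconns;   /* established connections       */
    unsigned int inactconns;    /* e.g. SYN_RECV, FIN_WAIT, ...  */
    unsigned int u_threshold;   /* upper limit, 0 = disabled     */
    unsigned int l_threshold;   /* lower (re-enable) limit       */
    int          weight;        /* current scheduling weight     */
    int          saved_weight;  /* weight to restore             */
};

static void update_dest(struct dest_limits *d)
{
    unsigned int conns = d->activeconns + d->inactconns;

    if (d->u_threshold && conns >= d->u_threshold)
        d->weight = 0;              /* quiesce: no new connections */
    else if (conns <= d->l_threshold)
        d->weight = d->saved_weight; /* back below the low mark    */
}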
 
>         Yes, we have to find a way to configure all these features
> that will be implemented further. The main problem here is how
> we can provide one binary ipvsadm tool to all Linux distributions.

Huh? Adjusting the ipvsadm tool to the distros is the problem
of the distro maintainers themselves. We just provide the
functionality needed to cover all the tasks we support. I have
built my own distribution too, and I also have to maintain my
diffs to the LVS code and the ipvsadm app. I'd disagree if
ipvsadm became distro-specific.

> We know how many problems the users have when they use the ipvsadm
> supplied in their distrib.

So the distributions can handle it. It can't be our task to
adjust the binary tool to every distro; it's our task to keep
it clean and independent of any distro.
 
> > Hmm, I just want to limit the amount of concurrent connections
> > per realserver and in the future maybe per service. This saved
> > me quite some lines of code in my userspace healthchecking
> > daemon.
> 
>         Yes, you vote for moving some features from user to the
> kernel space. We must find the right balance: what can be done in
> LVS and what must be implemented in the user space tools.

I absolutely agree, even considering that in the future
(although I don't think so) this very clean patch could become
part of the mainstream kernel. I also disagree with putting
every cool feature one thinks one needs into the kernel just
because it's faster and saves 2500+ lines of code. My ugly
patch, if implemented the correct way, doesn't affect the
normal kernel control path in case you don't use the feature.
Anyway, we will find a cool solution to the problem, because
admittedly neither approach is the best of all worlds. I would
also like to hear from other people what experiences they've
made with DDoS and how the LVS behaved under an attack. So far
I've not seen more than academic proof (stress tests not
reflecting a real-world example) of the designed defense
strategies. I think Anoush was working on something too, but
I haven't heard from him in ages ;)
 
>         May be we can limit the SYN rate. Of course, that not covers
> all cases, so my thought was to limit the packet rate for all states
> or per connection, not sure, this is an open topic. It is easy to open

Whew! That's strong stuff. You can screw up quite a lot if you
start modifying all states. But we should discuss such an
approach, because it sounds challenging.
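
Just to be sure we mean the same thing, something along these lines,
I guess (purely hypothetical, nothing like this exists in LVS today):
every connection entry would carry a small token bucket, and packets
above the allowed rate get dropped.

struct conn_rate {
    unsigned long last_refill;  /* last refill time, in seconds  */
    unsigned int  tokens;       /* packets we may still accept   */
    unsigned int  rate;         /* refill rate, packets/second   */
    unsigned int  burst;        /* bucket depth                  */
};

/* returns 1 if the packet may pass, 0 if it should be dropped */
static int conn_rate_ok(struct conn_rate *cr, unsigned long now)
{
    unsigned long elapsed = now - cr->last_refill;

    if (elapsed) {
        cr->tokens += elapsed * cr->rate;
        if (cr->tokens > cr->burst)
            cr->tokens = cr->burst;
        cr->last_refill = now;
    }
    if (cr->tokens == 0)
        return 0;
    cr->tokens--;
    return 1;
}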

> a connection through the director (especially in LVS-DR) and then
> to flood with packets this connection. This is one of the cases where
> LVS can really guard the real servers from packet floods. If we

True.

> combine this with the other kind of attacks, the distributed ones,
> we have better control. Of course, some QoS implementations can
> cover such problems, not sure. And this can be a simple implementation,
> of course, nobody wants to invent the wheel :)

Yes, though I doubt that existing QoS schedulers already
provide such functionality.
 
>         Yes, I know that this is a working solution. But see, you
> exclude all real servers :) You are giving up. My idea is we to find

No. :) :) I allow all existing connections for a template entered
into the transition table to finish their requests; that's just the
quiesced mode. I was very happy when Wensong introduced it ;) The
RSs just won't get new requests until at least one of them falls
below the low threshold.

> a state when we can drop some of the requests and to keep the
> real servers busy but responsive. This can be a difficult task but
> not when we have the help from our agents. We expect that many
> valid requests can be dropped but if we keep the real server in
> good health we can handle some valid requests because nobody knows
> when the flood will stop. The link is busy but it contains valid
> requests. And the service does not see the invalid ones.

This is the biggest problem with LVS in DR mode: the control of
the states and the packets. We just don't yet have a reliable
way of weighting an incoming connection, and IMHO this is also
impossible.
 
> > TUX 2.0 and try to overflow it. If you can, apply the zero-copy
> > patches of DaveM. No way you will find such a fast (88MBit/s
> > requests!!) Link to saturate the server.
> 
>         Nobody overflows the service :) You need so many clients
> for this. The easiest way the attackers use is to flood the link.
> And they prefer to reach the service because this makes more
> troubles. More hops reached, more links busy, more troubles.

I can't follow you here, sorry ;)
 
>         The requests are not meaningful, we care how much load they
> introduce and we report this load to the director. It can look, for
> example, as one value (weight) for the real host that can be set
> for all real services running on this host. We don't need to generate
> 10 weights for the 10 real services running in our real host. And

I don't know; this could be desirable unless we have an
intelligent enough scheduler. In lots of projects I've seen
or implemented, the application or database behind such an LVS
cluster was crap, or the tier architecture was extremely clumsy,
so that already after a day I had huge load imbalances even
with wlc and non-persistence.

> we change the weight on each 2 seconds for example. We need two
> syscalls (lseek and read) to get most of the values from /proc fs.
> But may be from 2-3 files. This is in Linux, of course. Not sure
> how this behaves under attack. We will see it :)

Are you going for it?
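
Such an agent really can be tiny; a sketch of what I have in mind
(the load-to-weight mapping below is completely arbitrary, and how
the weight gets pushed to the director is left out):

/* Re-read /proc/loadavg every two seconds with lseek()+read() and
 * turn the 1-minute load average into a scheduling weight. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd = open("/proc/loadavg", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        int n, weight;
        double load;

        lseek(fd, 0, SEEK_SET);
        n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            break;
        buf[n] = '\0';
        load = atof(buf);                 /* 1-minute average */
        weight = load < 10.0 ? 100 - (int)(load * 10.0) : 1;
        printf("new weight: %d\n", weight);
        /* here we would report the weight to the director */
        sleep(2);
    }
    close(fd);
    return 0;
}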
 
>         The only problem we have with this scheme is the ipvsadm
> binary. It must be changed (the user structure in the kernel :))

This is not your only problem :)

> The last change is dated from 0.9.10 and this is a big period :)
> But you know what means a change in the user structures :)

Indeed.
 
>         Yes, the picture is complex and there are so many details
> we can consider. IMO, there is no simple solution :) But if we
> combine all useful ideas in a user space software, I think, we can
> have an useful tool.

Definitely true; you already started with a very promising
user space tool that is extremely open to extension.
 
> > A packetfilter, the router (most of use do have a CISCO, don't the?)
> 
>         Yes, the question is how Cisco will know what packet rate
> overloads the real servers :)

:) In my example the router is just configured to drop packets not
related to our nets, and those are already plenty (judging by the
huge logfile that comes in every day).
 
>         No, no :) I'm never happy, always look for better ideas (a
> joke :)) May be I'm thinking for too complex things. And the time is
> always not enough :)

Well, I'm happy to hear this, so I know we're both pulling on the
same rope. I'm also not happy until a proper solution I can live
with is implemented. That's the way the IT business should work
(it doesn't, however).
 
Thank you again for the interesting comments and thoughts,
Roberto Nibali, ratz

-- 
mailto: `echo NrOatSz@xxxxxxxxx | sed 's/[NOSPAM]//g'`

