
To: Roberto Nibali <ratz@xxxxxx>
Subject: Re: [PATCH][RFC]: followup ...
Cc: <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Sat, 24 Feb 2001 00:15:52 +0000 (GMT)
        Hello Ratz,

On Thu, 22 Feb 2001, Roberto Nibali wrote:

> > > >         In total, one method to add new separate features (maybe
> > > > I'm missing something). Things can become very complex if a new
> > > > feature wants to touch some parts of the functions in the fast
> > > > path or the user space structures. What can be the solution?
> > > > Putting hooks inside LVS?
> > >
> > > Yes, but I don't think Wensong likes that idea :)
> >
> >         Because this idea is not clear :)
>
> Maybe. But I see that the defense_level is triggered via a sysctl
> and invoked in the sltimer_handler as well as in the *_dropentry
> functions. If we pushed those functions one level higher and
> introduced a meta layer that registers the defense_strategy, which
> would be selectable via sysctl and would currently contain
> update_defense_level, we would have the possibility to register
> other defense strategies, e.g. a limiting threshold. Is this
> feasible? I mean, instead of calling update_defense_level() and
> ip_vs_random_dropentry() in the sltimer_handler, we would just call
> the registered defense_strategy[sysctl_value] function. In the
> existing case defense_strategy[0] = update_defense_level(), which
> also merges the ip_vs_dropentry. Do I make myself sound stupid? ;)

        The different strategies work in different places and it is
difficult to use one hook. The current implementation allows them to
work together. But maybe there is another solution, considering how
LVS is called: to drop packets or to drop entries. There are not many
places for such hooks, so maybe something can be done. But first let's
see what other kinds of defense strategies will come.
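
        Just to make the idea concrete, here is a rough sketch of such
a strategy table (all names are made up for illustration, this is not
existing LVS code, and update_defense_level()'s signature is assumed):

        /* hypothetical table of defense strategies, selected via sysctl */
        extern void update_defense_level(void);   /* signature assumed */

        typedef void (*defense_strategy_t)(void);

        static defense_strategy_t defense_strategy[] = {
                update_defense_level,   /* 0: the current behaviour */
                /* other strategies, e.g. a threshold limiter, go here */
        };

        static int sysctl_defense_method = 0;      /* set via /proc */

        static void sltimer_handler_sketch(unsigned long data)
        {
                /* instead of calling update_defense_level() directly: */
                defense_strategy[sysctl_defense_method]();
                /* ... drop entries, re-arm the timer, etc. */
        }

The sysctl would only select the index; each strategy stays a normal
function and new ones can be added to the table.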

> > > Yes, the project has grown larger and gained more reputation than
> > > some of us initially thought. The code is very clear and stable;
> > > it's time to enhance it. The only very big problem that I see is
> > > that it looks like we're going to have two separate code paths:
> > > one patch for 2.2.x kernels and one for 2.4.x.
> >
> >         Yes, this is the reality. We can try to keep things from
> > looking different to user space.
>
> This would be a pain in the ass if we had two ipvsadm binaries. IMHO
> the userspace tool should recognize (at compile time) which kernel it
> is working with and enable the feature set accordingly. This will of
> course bloat it up in the future, the more feature differences we
> get between the 2.2.x and 2.4.x series.

        Not possible, the sockopts are different in 2.4

> Could you point me to a sketch where I could see how the control
> path for a packet looks in kernel 2.4? I mean something like what
> I would draw for 2.2.x kernels:
>
>           ----------------------------------------------------------------
>           |            ACCEPT/                              lo interface |
>           v           REDIRECT                  _______                  |
>   --> C --> S --> ______ --> D --> ~~~~~~~~ -->|forward|----> _______ -->
>       h     a    |input |    e    {Routing }   |Chain  |     |output |ACCEPT
>       e     n    |Chain |    m    {Decision}   |_______| --->|Chain  |
>       c     i    |______|    a     ~~~~~~~~        |     | ->|_______|
>       k     t       |        s       |             |     | |     |
>       s     y       |        q       |             v     | |     |
>       u     |       v        e       v            DENY/  | |     v
>       m     |     DENY/      r   Local Process   REJECT  | |   DENY/
>       |     v    REJECT      a       |                   | |  REJECT
>       |   DENY               d       --------------------- |
>       v                      e -----------------------------
>      DENY


        Here is some info I maintain (it may not be current; the new ICMP
hooks are missing). Look for "LVS" to see where LVS is placed.

The Netfilter hooks:

Priorities:
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,


PRE_ROUTING (ip_input.c:ip_rcv):
        CONNTRACK=-200, ip_conntrack_core.c:ip_conntrack_in
        MANGLE=-150, iptable_mangle.c:ipt_hook
        NAT_DST=-100, ip_nat_standalone.c:ip_nat_fn
        FILTER=0, ip_fw_compat.c:fw_in, defrag, firewall, demasq, redirect
        FILTER+1=1, net/sched/sch_ingress.c:ing_hook

LOCAL_IN (ip_input.c:ip_local_deliver):
        FILTER=0, iptable_filter.c:ipt_hook
        LVS=100, ip_vs_in
        LAST-1, ip_fw_compat.c:fw_confirm
        CONNTRACK=LAST-1, ip_conntrack_standalone.c:ip_confirm

FORWARD (ip_forward.c:ip_forward):
        FILTER=0, iptable_filter.c:ipt_hook
        FILTER=0, ip_fw_compat.c:fw_in, firewall, LVS:check_for_ip_vs_out,
                masquerade
        LVS=100, ip_vs_out

LOCAL_OUT (ip_output.c):
        CONNTRACK=-200, ip_conntrack_standalone.c:ip_conntrack_local
        MANGLE=-150, iptable_mangle.c:ipt_local_out_hook
        NAT_DST=-100, ip_nat_standalone.c:ip_nat_local_fn
        FILTER=0, iptable_filter.c:ipt_local_out_hook

POST_ROUTING (ip_output.c:ip_finish_output):
        FILTER=0, ip_fw_compat.c:fw_in, firewall, unredirect,
                mangle ICMP replies
        LVS=NAT_SRC-1, ip_vs_post_routing
        NAT_SRC=100, ip_nat_standalone.c:ip_nat_out
        CONNTRACK=LAST, ip_conntrack_standalone.c:ip_refrag


CONNTRACK:
        PRE_ROUTING, LOCAL_IN, LOCAL_OUT, POST_ROUTING

FILTER:
        LOCAL_IN, FORWARD, LOCAL_OUT

MANGLE:
        PRE_ROUTING, LOCAL_OUT

NAT:
        PRE_ROUTING, LOCAL_OUT, POST_ROUTING


Running variants:

1. Only lvs - the fastest
2. lvs + ipfw NAT
3. lvs + iptables NAT

Where is LVS placed:

LOCAL_IN:100 ip_vs_in

FORWARD:100 ip_vs_out

POST_ROUTING:NF_IP_PRI_NAT_SRC-1 ip_vs_post_routing
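
For reference, each of the entries above is a netfilter hook registered
with nf_register_hook(). A simplified sketch of how ip_vs_in is attached
at LOCAL_IN with priority 100 in 2.4 (the real registration in the LVS
code differs in detail; the function body here is only a placeholder):

        #include <linux/netfilter.h>
        #include <linux/netfilter_ipv4.h>

        static unsigned int ip_vs_in(unsigned int hooknum,
                                     struct sk_buff **skb,
                                     const struct net_device *in,
                                     const struct net_device *out,
                                     int (*okfn)(struct sk_buff *))
        {
                /* placeholder: the real function schedules the packet
                   to a real server or lets it continue */
                return NF_ACCEPT;
        }

        static struct nf_hook_ops ip_vs_in_ops = {
                { NULL, NULL },         /* list                        */
                ip_vs_in,               /* hook function               */
                PF_INET,                /* protocol family             */
                NF_IP_LOCAL_IN,         /* hook point                  */
                100                     /* priority, after FILTER=0    */
        };

        /* at module init: nf_register_hook(&ip_vs_in_ops); */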



The chains:

The out->in LVS packets (for any forwarding method) walk:

PRE_ROUTING -> LOCAL_IN -> ip_route_output or dst cache -> POST_ROUTING


        LOCAL_IN
        ip_vs_in        -> ip_route_output/dst cache
                        -> set skb->nfmark with special value
                        -> ip_send -> POST_ROUTING

        POST_ROUTING
        ip_vs_post_routing
                        - check skb->nfmark and exit from the
                        chain


The in->out LVS packets (for LVS/NAT) walk:

PRE_ROUTING -> FORWARD -> POST_ROUTING

        FORWARD
        ip_vs_out       -> NAT -> NF_ACCEPT

        POST_ROUTING
        ip_vs_post_routing
                        - check skb->nfmark and exit from the
                        chain
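
        A rough sketch of this nfmark trick (illustrative only; the mark
value is made up, and exiting the chain via okfn()/NF_STOLEN is just one
way to do it, the real ip_vs_post_routing may differ):

        #define IPVS_MARK_DONE 0x1234           /* made-up value */

        static unsigned int ip_vs_post_routing(unsigned int hooknum,
                                               struct sk_buff **skb,
                                               const struct net_device *in,
                                               const struct net_device *out,
                                               int (*okfn)(struct sk_buff *))
        {
                if ((*skb)->nfmark != IPVS_MARK_DONE)
                        return NF_ACCEPT;       /* not an LVS packet */

                /* already handled by ip_vs_in/ip_vs_out: leave the
                   chain here, before NAT_SRC and conntrack see it */
                okfn(*skb);
                return NF_STOLEN;
        }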

        I hope there is a nice ASCII diagram in the netfilter docs.
But the above info may be more useful if you already know what
each hook means.


> > > The biggest problem I see here is that maybe the user space daemons
> > > don't get enough scheduling time to be accurate enough.
> >
> >         That is definitely true. When the CPU(s) are busy
> > transferring packets, the processes can be delayed. So the director
> > had better not spend many cycles in user space. This is the reason
> > I prefer all these health checks to run on the real servers, but
> > this is not always good/possible.
>
> No, considering that not all RS are running Linux. We would need
> to port the health checks to every possible RS architecture.

        Yes, this is a drawback.

> > > Tell me, which scheduler should I take? None of the existing ones
> > > currently gives me good enough results with persistence. We have
> > > to accept the fact that 3-tier application programmers don't know
> > > about load balancing or clustering, mostly using Java, and that is
> > > just about the end of trying to load balance the application
> > > smoothly.
> >
> >         WRR + load-informed cluster software. But I'm not sure that
> > persistence can do very bad things. Maybe yes, but only for a small
> > number of clients or when wlc is used (we don't talk about the
> > other dumb schedulers).
>
> I currently get some values via a daemon coded in Perl on the RS,
> started via xinetd. The LB connects to the healthcheck port and
> gets some prepared results. It then puts this stuff into a DB and
> starts calculating the next steps to reconfigure the LVS cluster to
> smooth out the imbalance. The longer you let it run, the more data
> you get and the fewer adjustments you have to make. I reckon some
> guy who showed up on this list once had this idea, going in the
> direction of fuzzy logic. Hey Julian, maybe we should accept the
> fact that the wlc scheduler also isn't a very advanced one:
> loh = atomic_read(&least->activeconns)*50+atomic_read(&least->inactconns);
> What do you think would change if we made this 50 dynamic?

        Not sure :) I don't have results from experiments with wlc :)
You can put it in /proc and run different experiments, for example :)
But warning: ip_vs_wlc can be a module; check how the lblc* schedulers
register their /proc vars.
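
        For example, a sketch of such an experiment (the sysctl name and
helper are made up; activeconns/inactconns are the existing counters
from the formula you quoted):

        static int sysctl_ip_vs_wlc_active_weight = 50; /* via /proc */

        /* weighted "load" of a destination: higher means more loaded */
        static inline unsigned int ip_vs_wlc_overhead(struct ip_vs_dest *dest)
        {
                return atomic_read(&dest->activeconns)
                               * sysctl_ip_vs_wlc_active_weight
                       + atomic_read(&dest->inactconns);
        }

Then loh would be computed through this helper in the wlc scheduling
function, and the factor could be changed at run time while measuring
the effect on the connection distribution.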

> Later,
> Roberto Nibali, ratz


Regards

--
Julian Anastasov <ja@xxxxxx>


