Re: [PATCH 1/3] IPVS: add wlib & wlip schedulers

To: Chris Caputo <ccaputo@xxxxxxx>
Subject: Re: [PATCH 1/3] IPVS: add wlib & wlip schedulers
Cc: Wensong Zhang <wensong@xxxxxxxxxxxx>, Simon Horman <horms@xxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Tue, 27 Jan 2015 10:36:34 +0200 (EET)

On Fri, 23 Jan 2015, Julian Anastasov wrote:

> On Tue, 20 Jan 2015, Chris Caputo wrote:
> > My application consists of incoming TCP streams being load balanced to 
> > servers which receive the feeds. These are long lived multi-gigabyte 
> > streams, and so I believe the estimator's 2-second timer is fine. As an 
> > example:
> > 
> > # cat /proc/net/ip_vs_stats
> >    Total Incoming Outgoing         Incoming         Outgoing
> >    Conns  Packets  Packets            Bytes            Bytes
> >      9AB  58B7C17        0      1237CA2C325                0
> > 
> >  Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
> >        1     387C        0          B16C4AE                0
>       Not sure, may be everything here should be u64 because
> we have shifted values. I'll need some days to investigate
> this issue...

        For now I don't see hope in using schedulers that rely
on IPVS byte/packet stats, due to the slow update (2 seconds).
If we reduce this period we can cause performance problems to
other users.

Every *-LEAST-* (eg. LC, WLC) algorithm needs actual information
to take decision on every new connection. OTOH, all *-ROUND-ROBIN-*
algorithms (RR, WRR) use information (weights) from user space,
by this way kernel performs as expected.

        Currently, LC/WLC use feedback from the 3-way TCP handshake,
see ip_vs_dest_conn_overhead() where the established connections
have large preference. Such feedback from real servers is delayed
usually with microseconds, up to milliseconds. More time if
depends on clients.

        The proposed schedulers have round-robin function but
only among least loaded servers, so it is not dominant
and we suffer from slow feedback from the estimator.

        For load information that is not present in kernel
an user space daemon is needed to determine weights to use
with WRR. It can take actual stats from real server, for
example, it can take into account non-IPVS traffic.

        As alternative, it is possible to implement some new svc
method that can be called for every packet, for example, in
ip_vs_in_stats(). It does not look fatal to add some fields in
struct ip_vs_dest that only specific schedulers will update,
for example, byte/packet counters. Of course, the spin_locks
the scheduler must use will suffer on many CPUs. Such info can
be also attached as allocated structure in RCU pointer
dest->sched_info where data and corresponding methods can be
stored. It will need careful RCU-kind of update, especially when
scheduler is updated in svc. If you think such idea can work
we can discuss the RCU and scheduler changes that are needed.
The proposed schedulers have to implement counters, their
own estimator and WRR function.

        Another variant can be to extend WRR with some
support for automatic dynamic-weight update depending on 
parameters: -s wrr --sched-flags {wlip,wlib,...}

        or using new option --sched-param that can also
provide info for wrr estimator, etc. In any case, the
extended WRR scheduler will need above support to check
every packet.


Julian Anastasov <ja@xxxxxx>
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at

<Prev in Thread] Current Thread [Next in Thread>