LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

Re: [PATCH] netfilter: ipvs: Add Maglev consistent hashing scheduler

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH] netfilter: ipvs: Add Maglev consistent hashing scheduler
Cc: lvs-devel@xxxxxxxxxxxxxxx,  Wensong Zhang <wensong@xxxxxxxxxxxx>,  Simon Horman <horms@xxxxxxxxxxxx>
From: Inju Song <inju.song@xxxxxxxxxxxxx>
Date: Tue, 21 Nov 2017 01:36:08 +0900
Hello, Julian

Thank you for your a lot of comments.

On Sun, Nov 19, 2017 at 06:04:35PM +0200, Julian Anastasov wrote:
> 
>       Hello,
> 
> On Sun, 19 Nov 2017, Inju Song wrote:
> 
> > Implements the Google's Maglev hashing algorithm as a IPVS scheduler.
> > 
> > Basically it provides consistent hashing but offers some special
> > features about disruption and load balancing.
> > 
> >   1) minimal disruption: when the set of destinations changes,
> >      a connection will likely be sent to the same destination
> >      as it was before.
> > 
> >   2) load balancing: each destination will receive an almost
> >      equal number of connections.
> > 
> > Seel also for detail: [3.4 Consistent Hasing] in
> > https://static.googleusercontent.com/media/research.google.com/ko//pubs/archive/44824.pdf
> 
>        I see that the PDF shows implementation only for servers
> with same weight. It would be better to see implementation that
> considers dest->weight.
> 

I agree. In this case, weight value is same weight or zero.
So I will consider this case when I patch the source.

I have read your comments about memory allocation fail,
caching table, setting the entry table with NULLs and so on.

I will do fix it and send v2 patchsets.

> > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx>
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 402 
> > ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 402 insertions(+)
> >  create mode 100644 net/netfilter/ipvs/ip_vs_mh.c
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > new file mode 100644
> > index 0000000..534a9f5
> > --- /dev/null
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> 
> > +static inline int
> > +ip_vs_mh_populate(struct ip_vs_mh_state *s, struct ip_vs_service *svc,
> > +             unsigned int **permutation)
> > +{
> > +   int i;
> > +   unsigned int *next;
> > +   struct ip_vs_mh_lookup *entry, *l;
> > +   struct list_head *p;
> > +   struct ip_vs_dest *dest;
> > +   int dcnt;
> > +   unsigned int n, c;
> > +
> > +   dcnt = svc->num_dests;
> 
>       While calling del_dest svc->num_dests can be 0, this should
> be considered in ip_vs_mh_permutate and ip_vs_mh_populate. We should
> set the table with NULLs in such case.
> 
> > +   next = kcalloc(dcnt, sizeof(unsigned int), GFP_KERNEL);
> > +   entry = kcalloc(IP_VS_MH_LOOKUP_SIZE, sizeof(*entry),
> > +                   GFP_KERNEL);
> 
>       It is current drawback that add_dest, upd_dest and del_dest
> are considered as non-failing methods, we can not fail the operation
> if memory allocation fails, for example. This is bad for add_dest,
> not so for upd_dest and del_dest. In this regard, I think, the MH
> scheduler should not crash if allocation fails.
> 
> > +/* Assign all the hash buckets of the specified table with the service.
> > + */
> > +static int
> > +ip_vs_mh_reassign(struct ip_vs_mh_state *s, struct ip_vs_service *svc)
> > +{
> > +   int dcnt;
> > +   unsigned int **permutation;
> > +
> > +   /* flush all the hash entry before assigning mh entry */
> > +   ip_vs_mh_flush(s);
> 
>       ip_vs_mh_reassign should not call ip_vs_mh_flush.
> ip_vs_mh_populate while copying the temp table to s->lookup[]
> should put the refcnt to old dests (as in SH:ip_vs_sh_reassign)
> and then to assign new value. NULL should be assigned if
> svc->num_dests = 0.
> 
> > +/* IPVS MH Scheduler structure */
> > +static struct ip_vs_scheduler ip_vs_mh_scheduler = {
> > +   .name =                 "mh",
> > +   .refcnt =               ATOMIC_INIT(0),
> > +   .module =               THIS_MODULE,
> > +   .n_list  =              LIST_HEAD_INIT(ip_vs_mh_scheduler.n_list),
> > +   .init_service =         ip_vs_mh_init_svc,
> > +   .done_service =         ip_vs_mh_done_svc,
> > +   .add_dest =             ip_vs_mh_dest_changed,
> > +   .del_dest =             ip_vs_mh_dest_changed,
> > +   .upd_dest =             ip_vs_mh_dest_changed,
> 
>       As upd_dest is called only when weight changes, the SH and MH
> schedulers should not handle this method. Their goal is to fill
> the table with all added dests by ignoring health state (it is
> considered only on dest selection).
> 
>       I guess, the correct configuration for SH and MH would be:
> 
> 1.1 Add all dests with specific weight, so that the schedulers can
> fill their table based on the weights.
> 1.2 Set weight to 0 for all servers that are not healthy. SH and MH
> should ignore these configuration events.
> 
> after the initial configuration:
> 
> 2. Traffic will ignore non-healthy servers, fallbacks to healthy ones.
> 3. If other servers become unavailable, change their weight to 0
> 
> Regards
> 
> --
> Julian Anastasov <ja@xxxxxx>

Regards

-- 
Inju Song
NAVER Corporation

--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

<Prev in Thread] Current Thread [Next in Thread>