LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

Re: [PATCHv3 ipvs-next 2/3] netfilter: ipvs: Add Maglev consistent hashi

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCHv3 ipvs-next 2/3] netfilter: ipvs: Add Maglev consistent hashing scheduler
Cc: lvs-devel@xxxxxxxxxxxxxxx, Simon Horman <horms@xxxxxxxxxxxx>,  Wensong Zhang <wensong@xxxxxxxxxxxx>
From: Inju Song <inju.song@xxxxxxxxxxxxx>
Date: Mon, 26 Mar 2018 14:49:37 +0900
        Hello, 

        Thank you for your comments.

On Sat, Mar 24, 2018 at 01:26:04PM +0200, Julian Anastasov wrote:
> 
>       Hello,
> 
>       Only few notes below. If all goes well, v4 can be
> the last version... :)
> 
> On Wed, 21 Mar 2018, Inju Song wrote:
> 
> > Implements the Google's Maglev hashing algorithm as a IPVS scheduler.
> > 
> > Basically it provides consistent hashing but offers some special
> > features about disruption and load balancing.
> > 
> >   1) minimal disruption: when the set of destinations changes,
> >      a connection will likely be sent to the same destination
> >      as it was before.
> > 
> >   2) load balancing: each destination will receive an almost
> >      equal number of connections.
> > 
> > Seel also for detail: [3.4 Consistent Hasing] in
> > https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
> 
>       Above commit message is properly indented but comments in
> patches 1 and 3 should not be indented, there should not be
> spaces at left.
> 

        ok. I will remove indentions in patch 1 and 3.

> > Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx>
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 532 
> > ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 532 insertions(+)
> >  create mode 100644 net/netfilter/ipvs/ip_vs_mh.c
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > new file mode 100644
> > index 0000000..e531100
> > --- /dev/null
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -0,0 +1,532 @@
> 
> > +           IP_VS_DBG_BUF(6,
> > +                         "MH: selected unavailable server %s:%d (offset 
> > %d), reselecting",
> > +                         IP_VS_DBG_ADDR(dest->af, &dest->addr),
> > +                         ntohs(dest->port), roffset);
> 
>       As some vars are unsigned int, make sure all %d in
> debug messages are changed to %u, for consistency. This
> also includes all ports. I guess, these %d come from the
> SH scheduler.
> 

        Yes, these printing format was from SH scheduler..
I will fix these properly.

> > +static void ip_vs_mh_state_free(struct rcu_head *head)
> > +{
> > +   struct ip_vs_mh_state *s;
> > +
> > +   s = container_of(head, struct ip_vs_mh_state, rcu_head);
> > +   kfree(s->lookup);
> > +   kfree(s);
> > +}
> > +
> > +static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
> > +{
> > +   struct ip_vs_mh_state *s;
> > +
> > +   /* allocate the MH table for this service */
> > +   s = kzalloc(sizeof(*s), GFP_KERNEL);
> > +   if (!s)
> > +           return -ENOMEM;
> > +
> > +   s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
> > +                       GFP_KERNEL);
> > +   if (!s->lookup) {
> > +           kfree(s);
> > +           return -ENOMEM;
> > +   }
> > +
> > +   generate_hash_secret(&s->hash1, &s->hash2);
> > +   s->gcd = ip_vs_mh_gcd_weight(svc);
> > +   s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
> > +
> > +   svc->sched_data = s;
> > +   IP_VS_DBG(6,
> > +             "MH lookup table (memory=%zdbytes) allocated for current 
> > service\n",
> > +             sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
> > +
> > +   /* permutate & populate with current dests */
> > +   return ip_vs_mh_reassign(s, svc);
> 
>       Looks like we must not set svc->sched_data on error.
> We can do it like this:
> 
>       int ret;
> 
>       ...
>       IP_VS_DBG()
>       ret = ip_vs_mh_reassign(s, svc);
>       if (ret < 0) {
>               ip_vs_mh_reset(s); /* This is here for completeness */
>               ip_vs_mh_state_free(&s->rcu_head);
>               return ret;
>       }
>       /* No more failures, attach state */
>       svc->sched_data = s;
>       return 0;
> 
>       It should be ok because we use svc->sched_data only
> from entry points, the scheduler does not use it internally.
> 

        OK. Above case will be handled.

> > +}
> > +
> > +static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
> > +{
> > +   struct ip_vs_mh_state *s = svc->sched_data;
> > +
> > +   /* got to clean up hash buckets here */
> > +   ip_vs_mh_reset(s);
> > +
> > +   /* release the table itself */
> > +   ip_vs_mh_state_free(&s->rcu_head);
> 
>       Above should be:
> 
>       call_rcu(&s->rcu_head, ip_vs_mh_state_free);
> 
>       As result, the RCU callback will be called after
> RCU grace period giving chance to RCU readers to stop
> using the svc->sched_data state.
> 

        Ok, I missed adding to schedule rcu callback for
freeing the state. It will be fixed. Thanks.

> > +   IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
> > +             sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
> 

Regards

-- 
Inju Song
NAVER Corporation

--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

<Prev in Thread] Current Thread [Next in Thread>