Hello,
Only a few notes below. If all goes well, v4 can be
the last version... :)
On Wed, 21 Mar 2018, Inju Song wrote:
> Implements Google's Maglev hashing algorithm as an IPVS scheduler.
>
> Basically, it provides consistent hashing but offers some special
> features regarding disruption and load balancing:
>
> 1) minimal disruption: when the set of destinations changes,
> a connection will likely be sent to the same destination
> as it was before.
>
> 2) load balancing: each destination will receive an almost
> equal number of connections.
>
> For details, see [3.4 Consistent Hashing] in
> https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
The commit message above is properly indented, but comments
in patches 1 and 3 should not be indented; there should be
no spaces on the left.
> Signed-off-by: Inju Song <inju.song@xxxxxxxxxxxxx>
> ---
> net/netfilter/ipvs/ip_vs_mh.c | 532 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 532 insertions(+)
> create mode 100644 net/netfilter/ipvs/ip_vs_mh.c
>
> diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> new file mode 100644
> index 0000000..e531100
> --- /dev/null
> +++ b/net/netfilter/ipvs/ip_vs_mh.c
> @@ -0,0 +1,532 @@
> +		IP_VS_DBG_BUF(6,
> +			      "MH: selected unavailable server %s:%d (offset %d), reselecting",
> +			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
> +			      ntohs(dest->port), roffset);
As some vars are unsigned int, make sure all %d specifiers
in the debug messages are changed to %u, for consistency. This
also includes all ports. I guess these %d come from the
SH scheduler.
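With that change, the message quoted above would read
something like:

	IP_VS_DBG_BUF(6,
		      "MH: selected unavailable server %s:%u (offset %u), reselecting",
		      IP_VS_DBG_ADDR(dest->af, &dest->addr),
		      ntohs(dest->port), roffset);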
> +static void ip_vs_mh_state_free(struct rcu_head *head)
> +{
> +	struct ip_vs_mh_state *s;
> +
> +	s = container_of(head, struct ip_vs_mh_state, rcu_head);
> +	kfree(s->lookup);
> +	kfree(s);
> +}
> +
> +static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
> +{
> +	struct ip_vs_mh_state *s;
> +
> +	/* allocate the MH table for this service */
> +	s = kzalloc(sizeof(*s), GFP_KERNEL);
> +	if (!s)
> +		return -ENOMEM;
> +
> +	s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
> +			    GFP_KERNEL);
> +	if (!s->lookup) {
> +		kfree(s);
> +		return -ENOMEM;
> +	}
> +
> +	generate_hash_secret(&s->hash1, &s->hash2);
> +	s->gcd = ip_vs_mh_gcd_weight(svc);
> +	s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
> +
> +	svc->sched_data = s;
> +	IP_VS_DBG(6,
> +		  "MH lookup table (memory=%zdbytes) allocated for current service\n",
> +		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
> +
> +	/* permutate & populate with current dests */
> +	return ip_vs_mh_reassign(s, svc);
Looks like we must not set svc->sched_data on error.
We can do it like this:
	int ret;
	...
	IP_VS_DBG()
	ret = ip_vs_mh_reassign(s, svc);
	if (ret < 0) {
		ip_vs_mh_reset(s);	/* This is here for completeness */
		ip_vs_mh_state_free(&s->rcu_head);
		return ret;
	}

	/* No more failures, attach state */
	svc->sched_data = s;
	return 0;
It should be ok because we use svc->sched_data only
from the entry points; the scheduler does not use it internally.
And calling ip_vs_mh_state_free() directly in this error path is
fine: the state was never attached to the service, so there are
no RCU readers that could still see it.
> +}
> +
> +static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
> +{
> +	struct ip_vs_mh_state *s = svc->sched_data;
> +
> +	/* got to clean up hash buckets here */
> +	ip_vs_mh_reset(s);
> +
> +	/* release the table itself */
> +	ip_vs_mh_state_free(&s->rcu_head);
The above should be:

	call_rcu(&s->rcu_head, ip_vs_mh_state_free);

As a result, the RCU callback will be called after the
RCU grace period, giving RCU readers a chance to stop
using the svc->sched_data state.
> +	IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
> +		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
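So, with call_rcu applied, the whole function would look
something like this (just a sketch based on the quoted code):

static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
{
	struct ip_vs_mh_state *s = svc->sched_data;

	/* got to clean up hash buckets here */
	ip_vs_mh_reset(s);

	/* release the table itself after the RCU grace period */
	call_rcu(&s->rcu_head, ip_vs_mh_state_free);

	IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
		  sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
}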
Regards
--
Julian Anastasov <ja@xxxxxx>