To: menglong8.dong@xxxxxxxxx
Subject: Re: [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
Cc: Simon Horman <horms@xxxxxxxxxxxx>, pablo@xxxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx, Menglong Dong <imagedong@xxxxxxxxxxx>, Jiang Biao <benbjiang@xxxxxxxxxxx>, Hao Peng <flyingpeng@xxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Wed, 11 May 2022 00:51:21 +0300 (EEST)
        Hello,

On Tue, 10 May 2022, menglong8.dong@xxxxxxxxx wrote:

> From: Menglong Dong <imagedong@xxxxxxxxxxx>
> 
> Currently, the starting destination of the RR/WRR scheduler follows
> the order in which destinations were added, which can result in
> imbalance when the director is local to the clients and the number of
> connections is small.
> 
> For example, suppose we have client1, client2, ..., client100 and real
> servers service1, service2, ..., service10. All clients run a local
> director with the same ipvs config, and each of them creates 2
> long-lived TCP connections to the virtual service. As a result, all
> the clients connect to service1 and service2, leaving the others idle
> and making service1 and service2 overloaded.

        The more time I spend on this topic, the less
convinced I am that it is worth the effort. Randomness can come
from another place: the client address/port. Schedulers
like MH and SH can probably help in such cases.
RR is such a simple scheduler that I doubt it is used much in
practice. People prefer WLC, WRR and, recently, MH, which
has many features:

- lockless
- supports server weights
- safer on dest adding/removal or weight changes
- fallback, optional
- suitable for sloppy TCP/SCTP mode to avoid the
SYNC mechanism in active-active setups
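
        For example, switching such a service to MH needs no new
flags (a sketch, assuming ipvsadm and a kernel that provide the
mh scheduler module; the addresses below are placeholders):

	ipvsadm -A -t 10.0.0.1:80 -s mh
	ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.1:80 -m -w 1
	ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.2:80 -m -w 2

Different clients then spread over the dests by hashing the
source address, without any randomized starting point.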

> Fix this by randomizing the starting destination when
> IP_VS_SVC_F_SCHED_RR_RANDOM/IP_VS_SVC_F_SCHED_WRR_RANDOM is set.
> 
> I start the randomizing from svc->destinations: as we choose the
> starting destination from all of the destinations, it makes no
> difference whether we start from svc->sched_data or svc->destinations.
> If we started from svc->sched_data, we would have to avoid the 'head'
> node of the list becoming the next node of svc->sched_data, to make
> the choice truly random.

        Yes, the first dest has twice the chance if we do
not account for the head.
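
        To make that concrete (a worked example, not from the
patch): with num_dests = 3 the ring is head -> d1 -> d2 -> d3 ->
head, and the next dest scheduled is the node after sched_data,
skipping the head. Placing sched_data uniformly on all 4 nodes
makes d1 next for both sched_data = head and sched_data = d3,
i.e. with probability 2/4 against 1/4 for d2 and d3. Walking
0..num_dests-1 steps from the head, as the patch does, lands on
head, d1 or d2, so each dest is next with probability 1/3.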

> Reviewed-by: Jiang Biao <benbjiang@xxxxxxxxxxx>
> Reviewed-by: Hao Peng <flyingpeng@xxxxxxxxxxx>
> Signed-off-by: Menglong Dong <imagedong@xxxxxxxxxxx>
> ---
> v2:
> - Randomize the starting destination of the WRR scheduler too
> - Replace '|' with '&' in ip_vs_rr_random_start (Julian Anastasov)
> - Replace get_random_u32() with prandom_u32_max() (Julian Anastasov)
> ---
>  include/uapi/linux/ip_vs.h     |  3 +++
>  net/netfilter/ipvs/ip_vs_rr.c  | 25 ++++++++++++++++++++++++-
>  net/netfilter/ipvs/ip_vs_wrr.c | 20 ++++++++++++++++++++
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> index 4102ddcb4e14..9543906dae7d 100644
> --- a/include/uapi/linux/ip_vs.h
> +++ b/include/uapi/linux/ip_vs.h
> @@ -28,6 +28,9 @@
>  #define IP_VS_SVC_F_SCHED_SH_FALLBACK        IP_VS_SVC_F_SCHED1 /* SH fallback */
>  #define IP_VS_SVC_F_SCHED_SH_PORT    IP_VS_SVC_F_SCHED2 /* SH use port */
>  
> +#define IP_VS_SVC_F_SCHED_WRR_RANDOM IP_VS_SVC_F_SCHED1 /* random start */
> +#define IP_VS_SVC_F_SCHED_RR_RANDOM  IP_VS_SVC_F_SCHED1 /* random start */
> +
>  /*
>   *      Destination Server Flags
>   */
> diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
> index 38495c6f6c7c..d53bfaf7aadf 100644
> --- a/net/netfilter/ipvs/ip_vs_rr.c
> +++ b/net/netfilter/ipvs/ip_vs_rr.c
> @@ -22,13 +22,36 @@
>  
>  #include <net/ip_vs.h>
>  
> +static void ip_vs_rr_random_start(struct ip_vs_service *svc)
> +{
> +     struct list_head *cur;
> +     u32 start;
> +
> +     if (!(svc->flags & IP_VS_SVC_F_SCHED_RR_RANDOM) ||
> +         svc->num_dests <= 1)
> +             return;
> +
> +     start = prandom_u32_max(svc->num_dests);
> +     spin_lock_bh(&svc->sched_lock);
> +     cur = &svc->destinations;
> +     while (start--)
> +             cur = cur->next;
> +     svc->sched_data = cur;
> +     spin_unlock_bh(&svc->sched_lock);
> +}
>  
>  static int ip_vs_rr_init_svc(struct ip_vs_service *svc)
>  {
>       svc->sched_data = &svc->destinations;
> +     ip_vs_rr_random_start(svc);
>       return 0;
>  }
>  
> +static int ip_vs_rr_add_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
> +{
> +     ip_vs_rr_random_start(svc);
> +     return 0;
> +}
>  
>  static int ip_vs_rr_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
>  {
> @@ -104,7 +127,7 @@ static struct ip_vs_scheduler ip_vs_rr_scheduler = {
>       .module =               THIS_MODULE,
>       .n_list =               LIST_HEAD_INIT(ip_vs_rr_scheduler.n_list),
>       .init_service =         ip_vs_rr_init_svc,
> -     .add_dest =             NULL,
> +     .add_dest =             ip_vs_rr_add_dest,
>       .del_dest =             ip_vs_rr_del_dest,
>       .schedule =             ip_vs_rr_schedule,
>  };
> diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
> index 1bc7a0789d85..ed6230976379 100644
> --- a/net/netfilter/ipvs/ip_vs_wrr.c
> +++ b/net/netfilter/ipvs/ip_vs_wrr.c
> @@ -102,6 +102,24 @@ static int ip_vs_wrr_max_weight(struct ip_vs_service *svc)
>       return weight;
>  }
>  
> +static void ip_vs_wrr_random_start(struct ip_vs_service *svc)
> +{
> +     struct ip_vs_wrr_mark *mark = svc->sched_data;
> +     struct list_head *cur;
> +     u32 start;
> +
> +     if (!(svc->flags & IP_VS_SVC_F_SCHED_WRR_RANDOM) ||
> +         svc->num_dests <= 1)
> +             return;
> +
> +     start = prandom_u32_max(svc->num_dests);
> +     spin_lock_bh(&svc->sched_lock);
> +     cur = &svc->destinations;
> +     while (start--)
> +             cur = cur->next;
> +     mark->cl = list_entry(cur, struct ip_vs_dest, n_list);

        The problem with WRR is that mark->cl and mark->cw
work together; we cannot change just cl.
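
        Roughly, the coupling looks like this (a minimal
user-space sketch of the same idea with arrays instead of the
kernel list, not the actual ip_vs_wrr.c code):

	#include <stdio.h>

	/* cw is the current weight threshold; it is lowered by the
	 * gcd di each time the position cl wraps, so the (cl, cw)
	 * pair together encodes where in the weighted cycle we are.
	 */
	static int weight[] = { 3, 2, 1 };
	static int n = 3, cl = -1, cw = 0, di = 1, mw = 3;

	static int wrr_next(void)
	{
		while (1) {
			cl = (cl + 1) % n;
			if (cl == 0) {		/* wrapped around */
				cw -= di;
				if (cw <= 0)
					cw = mw;
			}
			if (weight[cl] >= cw)
				return cl;
		}
	}

	int main(void)
	{
		/* prints the cycle 0 0 1 0 1 2, repeated */
		for (int i = 0; i < 12; i++)
			printf("%d ", wrr_next());
		printf("\n");
		return 0;
	}

Resetting cl to a random dest while cw keeps its old value starts
mid-list at an unrelated threshold and skews the weighted
sequence.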

> +     spin_unlock_bh(&svc->sched_lock);
> +}
>  
>  static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
>  {
> @@ -119,6 +137,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
>       mark->mw = ip_vs_wrr_max_weight(svc) - (mark->di - 1);
>       mark->cw = mark->mw;
>       svc->sched_data = mark;
> +     ip_vs_wrr_random_start(svc);
>  
>       return 0;
>  }
> @@ -149,6 +168,7 @@ static int ip_vs_wrr_dest_changed(struct ip_vs_service *svc,
>       else if (mark->di > 1)
>               mark->cw = (mark->cw / mark->di) * mark->di + 1;
>       spin_unlock_bh(&svc->sched_lock);
> +     ip_vs_wrr_random_start(svc);

        This will be called even on upd_dest (i.e. while packets
are being processed), so the region under the lock should be
short and the cl/cw state should not be damaged.
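
        In other words (illustrative only, not a tested fix):
the random draw can stay outside the lock, as the patch already
does, but inside the lock cl and cw would have to be updated as
a pair, e.g.

	start = prandom_u32_max(svc->num_dests);
	spin_lock_bh(&svc->sched_lock);
	/* ... walk 'start' nodes from the head into cur ... */
	mark->cl = list_entry(cur, struct ip_vs_dest, n_list);
	mark->cw = mark->mw;	/* restart the weight cycle with cl */
	spin_unlock_bh(&svc->sched_lock);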

Regards

--
Julian Anastasov <ja@xxxxxx>

