Re: ipvsadm: One-packet scheduling with UDP service is unstable

To: Drunkard Zhang <gongfan193@xxxxxxxxx>
Subject: Re: ipvsadm: One-packet scheduling with UDP service is unstable
Cc: lvs-devel@xxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Mon, 26 Aug 2013 12:55:54 +0300 (EEST)

On Mon, 26 Aug 2013, Drunkard Zhang wrote:

> Good news, I finally found the crap source, it's keepalived. I tested
> several times without keepalived in runlevel 3, after kernel boots I
> add the ipvs service by hand:

        OK, I was worried that my recent RCU changes broke
something in the WRR scheduler and the configuration process.

> ./ipvsadm -C
> # Clear previous log
> > /var/log/kern.log
> sleep 1
> # Start debug
> echo 20 > /proc/sys/net/ipv4/vs/debug_level
> ./ipvsadm -R < /etc/keepalived/rules-with-ops
> usleep 30000
> # Stop debug
> echo 0 > /proc/sys/net/ipv4/vs/debug_level
> Then add VIP manually, then do ARP announce manually:
> vs3 ~/pkgs # ip a add dev eno1
> vs3 ~/pkgs # arp-sk -i eno1 -S -d
> After these actions, traffic starts come in. and all ipvsadm checks
> are fine, OPS is fine too. So I figured that maybe outdated libipvs in
> keepalived broke the ipvs in kernel. I'll try to report this to
> upstream.

        OK, I have no more doubts. To summarize,
here is what I think happened:

- a packet is scheduled while the virtual service lacks
the --ops flag. The result is that a UDP connection is
created that expires after 5 minutes by default, if there
are no more packets.

- traffic does not stop, so it keeps hitting the connection
and restarting its timer. As a result, this connection stays
forever and forwards traffic to a single server.

- as a single connection is used, the stats for Conns and
the CPS rate do not move because we do not create
connections anymore: all traffic comes from a single client
address and the scheduler is not called.

- there is one variation here: ipvsadm -C is called,
dests are moved to the trash list, and new rules are
added before the RCU grace period has expired.
In that case IP_VS_DEST_STATE_REMOVING is still set and
prevents the old dest from being reused when a dest with
the same parameters is added. The connection will then point
to an unavailable dest for 5 minutes and the traffic that
hits it will not restart its timer. After 5 minutes the
connection will be removed and the first packet that comes
will use the --ops flag. There is a chance that everything
works. So, if new rules are added we have 2 cases:

        1. rules reuse the old dests and traffic goes to a single
        server. This happens if the new rules are added after at
        least 10ms (the RCU grace period, in fact), e.g. with
        usleep 10000 after ipvsadm -C. We have CPS=0 and
        InPPS above 0 for a single server.

        2. rules allocate a new dest and traffic is stopped
        for 5 minutes. This happens if the rules are added
        immediately after ipvsadm -C (while in the RCU grace
        period). After 5 minutes everything works.

- CPS 0 means we are reusing an existing connection

- even if you replace the service or set --ops, the
existing connection is still used; even ipvsadm -C
cannot remove it. There is only one option: set
expire_nodest_conn=1, call ipvsadm -C and wait for the
next packet to remove the connection. Then add all rules
again, but not before the connection is removed.
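
        The last step above can be sketched roughly as follows.
This is only an illustration, not a tested procedure: the run,
conn_count and flush_stale_conns helpers are hypothetical names,
the rules file path is the one from your test, and the script
only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to run them as root.
DRY_RUN=${DRY_RUN:-1}
RULES=${RULES:-/etc/keepalived/rules-with-ops}  # rules file from the quoted test

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical helper: count IPVS connection entries.
# "ipvsadm -L -n -c" prints two header lines before the entries.
conn_count() {
    ipvsadm -L -n -c 2>/dev/null | awk 'NR > 2 { n++ } END { print n + 0 }'
}

flush_stale_conns() {
    # Let a packet that hits a connection without a dest expire that connection.
    run sysctl -w net.ipv4.vs.expire_nodest_conn=1
    # Remove all services; existing connections survive this.
    run ipvsadm -C
    # Wait until traffic (or the timers) has removed the old connections.
    if [ "$DRY_RUN" != 1 ]; then
        while [ "$(conn_count)" -gt 0 ]; do sleep 1; done
    fi
    # Only now add the rules again, this time with --ops.
    run sh -c "ipvsadm -R < $RULES"
}

flush_stale_conns
```

The wait loop has no upper bound; without incoming traffic the
connections go away only when their timers expire.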

> On the other hand, ipvs didn't recovery from ipvsadm -C, rmmod ip_vs
> && ./ipvsadm -R < rules-with-ops is needed (I tested, reload ip_vs
> module could make OPS work). So robustness of IPVS needs improvement.

        Some problem? Maybe you refer to the fact that
connections survive ipvsadm -C and that is what prevented
your traffic from being scheduled.

        So, I see two problems here:

- tools do not set --ops, so a connection is created and is
reused by all packets from the same client. The trick
of adding --ops later cannot work. Idea: drop traffic
before it reaches IPVS (-j DROP) until --ops is applied;
this way no connections should be created.

- there is no way to flush connections in IPVS without
removing the module because expire_nodest_conn works only
when traffic is received. I think your remark above points
to this.
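
        A rough sketch of the DROP idea from the first point,
under some assumptions: the VIP and port below are placeholders
(not your real values), the rule is put in the raw table's
PREROUTING chain so that it matches before the packet can reach
IPVS, and run and guarded_restore are hypothetical helper names.
It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to run them as root.
DRY_RUN=${DRY_RUN:-1}
VIP=${VIP:-192.0.2.10}   # placeholder VIP (documentation address range)
PORT=${PORT:-5000}       # placeholder UDP service port
RULES=${RULES:-/etc/keepalived/rules-with-ops}  # rules file from the quoted test

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

guarded_restore() {
    # Drop client packets for the service before they can reach IPVS,
    # so that no connection is created while --ops is not yet set.
    run iptables -t raw -I PREROUTING -d "$VIP" -p udp --dport "$PORT" -j DROP
    # Load the rules that already carry --ops.
    run sh -c "ipvsadm -R < $RULES"
    # Let traffic in again; the first packet is now scheduled with --ops.
    run iptables -t raw -D PREROUTING -d "$VIP" -p udp --dport "$PORT" -j DROP
}

guarded_restore
```

This helps only if no old connection for the client is still
present when the DROP rule is removed.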


Julian Anastasov <ja@xxxxxx>