Re: [*v3 PATCH 00/22] IPVS Network Name Space aware

To: hans@xxxxxxxxxxxxxxx
Subject: Re: [*v3 PATCH 00/22] IPVS Network Name Space aware
Cc: horms@xxxxxxxxxxxx, daniel.lezcano@xxxxxxx, wensong@xxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, netdev@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx, Hans Schillstrom <hans.schillstrom@xxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Sat, 1 Jan 2011 14:27:00 +0200 (EET)

        Hello,

On Thu, 30 Dec 2010, hans@xxxxxxxxxxxxxxx wrote:

From: Hans Schillstrom <hans.schillstrom@xxxxxxxxxxxx>

This patch series adds network name space support to the LVS.

REVISION

This is version 3

OVERVIEW

The patch series doesn't remove or add any functionality except netns
support. For users that don't use network name spaces (netns), this
patch series is completely transparent.

Now it's possible to run LVS in a Linux container (see lxc-tools),
i.e. lightweight virtualization. For example, it's possible to run one
or several LVS instances on a real server, each in its own network
name space. From the LVS point of view, each instance looks like it
runs on its own machine.

        Only some comments for two of the patches:

v3 PATCH 10/22 use ip_vs_proto_data as param

        - Can ip_vs_protocol_timeout_change() walk
        proto_data_table instead of ip_vs_proto_table to avoid the
        __ipvs_proto_data_get call?

v3 PATCH 15/22 ip_vs_stats

        - Is ustats_seq allocated with alloc_percpu?

        Such reader sections should be changed to use temporary
        variables, because on a retry we risk adding the values
        multiple times. For example:

                do {
                        start = read_seqcount_begin(seq_count);
                        ipvs->ctl_stats->ustats.inbytes += u->inbytes;
                        ipvs->ctl_stats->ustats.outbytes += u->outbytes;
                } while (read_seqcount_retry(seq_count, start));

        should be changed as follows:

                u64 inbytes, outbytes;

                do {
                        start = read_seqcount_begin(seq_count);
                        inbytes = u->inbytes;
                        outbytes = u->outbytes;
                } while (read_seqcount_retry(seq_count, start));
                ipvs->ctl_stats->ustats.inbytes += inbytes;
                ipvs->ctl_stats->ustats.outbytes += outbytes;

        Or, better, create a new structure for the percpu stats with
        its own syncp, because we cannot change struct
        ip_vs_stats_user. The syncp should be percpu because we are
        removing the locks.

        For example:

        struct ip_vs_cpu_stats {
                struct ip_vs_stats_user ustats;
                struct u64_stats_sync   syncp;
        };

        Then we can add this in struct netns_ipvs:

        struct ip_vs_cpu_stats __percpu *stats; /* Statistics */

        without the seqcount_t * ustats_seq;

        Then syncp does not need any initialization, since it seems
        alloc_percpu returns a zeroed area.

        When we use percpu stats in all places (dest and svc), we can
        create a new struct ip_vs_counters, so that we can reduce the
        memory used for percpu data. Currently the stats include both
        counters and estimated values; the estimated values should not
        be percpu. Then ip_vs_cpu_stats will be shorter (it is not
        visible to user space):

        struct ip_vs_cpu_stats {
                struct ip_vs_counters   ustats;
                struct u64_stats_sync   syncp;
        };

        On the writer side, in softirq context, we should protect the
        whole section that updates u64 counters. For example, this
        code:

        spin_lock(&ip_vs_stats.lock);
        ip_vs_stats.ustats.outpkts++;
        ip_vs_stats.ustats.outbytes += skb->len;
        spin_unlock(&ip_vs_stats.lock);

        should be changed to:

        struct ip_vs_cpu_stats *s = this_cpu_ptr(ipvs->stats);

        u64_stats_update_begin(&s->syncp);
        s->ustats.outpkts++;
        s->ustats.outbytes += skb->len;
        u64_stats_update_end(&s->syncp);

        Readers should look similar, except that the _bh fetch
        variants should be used for the u64 counters, because the
        writer runs in softirq context:

        u64 inbytes, outbytes;

        for_each_possible_cpu(i) {
                const struct ip_vs_cpu_stats *s = per_cpu_ptr(ipvs->stats, i);
                unsigned int start;

                ...
                do {
                        start = u64_stats_fetch_begin_bh(&s->syncp);
                        inbytes = s->ustats.inbytes;
                        outbytes = s->ustats.outbytes;
                } while (u64_stats_fetch_retry_bh(&s->syncp, start));
                ipvs->ctl_stats->ustats.inbytes += inbytes;
                ipvs->ctl_stats->ustats.outbytes += outbytes;
        }

        Then IPVS_STAT_ADD and IPVS_STAT_INC cannot access the syncp.
        They do not look generic enough in case we decide to use
        struct ip_vs_cpu_stats for dest and svc stats, so I think
        these macros are not needed. It is possible to have other
        macros for the write side, for percpu stats:

        #define IP_VS_STATS_UPDATE_BEGIN(stats)         \
                { struct ip_vs_cpu_stats *s = this_cpu_ptr(stats); \
                u64_stats_update_begin(&s->syncp)

        #define IP_VS_STATS_UPDATE_END()                \
                u64_stats_update_end(&s->syncp); \
                }

        Then usage can be:

        IP_VS_STATS_UPDATE_BEGIN(ipvs->stats);
        s->ustats.outpkts++;
        s->ustats.outbytes += skb->len;
        IP_VS_STATS_UPDATE_END();

        Then we can rename ctl_stats to total_stats or full_stats.
        get_stats() can be changed to ip_vs_read_cpu_stats(); it will
        be used to read all percpu stats, summed, into the
        total_stats/full_stats structure or into temporary space:

        /* Get sum of percpu stats */
        ... ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum,
                struct ip_vs_cpu_stats *stats)
        {
                ...
                for_each_possible_cpu(i) {
                        const struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i);
                        unsigned int start;

                        if (i) {...
                        ...
                        do {
                                start = u64_stats_fetch_begin_bh(&s->syncp);
                                inbytes = s->ustats.inbytes;
                                outbytes = s->ustats.outbytes;
                        } while (u64_stats_fetch_retry_bh(&s->syncp, start));
                        sum->inbytes += inbytes;
                        sum->outbytes += outbytes;
                }
        }

        As a result, ip_vs_read_cpu_stats() will be used in
        estimation_timer() instead of get_stats():

        ip_vs_read_cpu_stats(&ipvs->total_stats->ustats, ipvs->stats);

        but also in ip_vs_stats_show():

        static int ip_vs_stats_show(struct seq_file *seq, void *v)
        {
                /* I hope struct ip_vs_stats_user can fit on the stack: */
                struct ip_vs_stats_user sum;

                /*
                 * No spin_lock_bh(&ctl_stats->lock) is needed here:
                 * the stats are now lockless and estimation_timer()
                 * does not take ctl_stats->lock, so we must call
                 * ip_vs_read_cpu_stats() instead of reading possibly
                 * torn u64 values directly from
                 * ipvs->total_stats->ustats.
                 */
                ip_vs_read_cpu_stats(&sum, ipvs->stats);
                /* ... seq_printf() the fields of sum ... */
        }

        ip_vs_read_cpu_stats() can also be used in ip_vs_copy_stats(),
        which copies values to user space and needs proper u64
        reading. But it is used only for svc and dest stats, which are
        not percpu yet.

        Now this code does not look correct:

        ip_vs_zero_stats(net_ipvs(net)->ctl_stats);

        Maybe we need a new function, ip_vs_zero_percpu_stats(), that
        will reset the stats for all CPUs, i.e. ipvs->stats in our
        case? Even then, the operation is not entirely safe when done
        from user space, because a softirq can be modifying the u64
        counters on another CPU.


Happy New Year!
