Hello,
On Thu, 30 Dec 2010, hans@xxxxxxxxxxxxxxx wrote:
From: Hans Schillstrom <hans.schillstrom@xxxxxxxxxxxx>
This patch series adds network name space support to the LVS.
REVISION
This is version 3
OVERVIEW
The patch doesn't remove or add any functionality except for netns.
For users that don't use network name space (netns) this patch is
completely transparent.
Now it's possible to run LVS in a Linux container (see lxc-tools),
i.e. lightweight virtualization. For example, it is possible to run
one or several LVS instances on a real server, each in its own network
name space. From the LVS point of view it looks like it runs on its
own machine.
Only some comments for two of the patches:
v3 PATCH 10/22 use ip_vs_proto_data as param
- Can ip_vs_protocol_timeout_change() walk
proto_data_table instead of ip_vs_proto_table to avoid the
__ipvs_proto_data_get call?
v3 PATCH 15/22 ip_vs_stats
- Is ustats_seq allocated with alloc_percpu?
Such reader sections should use temporary variables because
on a retry we risk adding the values multiple times. For example:
	do {
		start = read_seqcount_begin(seq_count);
		ipvs->ctl_stats->ustats.inbytes += u->inbytes;
		ipvs->ctl_stats->ustats.outbytes += u->outbytes;
	} while (read_seqcount_retry(seq_count, start));
should be changed as follows:
	u64 inbytes, outbytes;

	do {
		start = read_seqcount_begin(seq_count);
		inbytes = u->inbytes;
		outbytes = u->outbytes;
	} while (read_seqcount_retry(seq_count, start));
	ipvs->ctl_stats->ustats.inbytes += inbytes;
	ipvs->ctl_stats->ustats.outbytes += outbytes;
Better yet, we can create a new structure for the percpu stats
with its own syncp, because we cannot change
struct ip_vs_stats_user. The syncp should be percpu because
we are removing the locks.
For example:
	struct ip_vs_cpu_stats {
		struct ip_vs_stats_user	ustats;
		struct u64_stats_sync	syncp;
	};
Then we can add this to struct netns_ipvs:

	struct ip_vs_cpu_stats __percpu *stats;	/* Statistics */

replacing the seqcount_t *ustats_seq.
Then syncp does not need any initialization; it seems
alloc_percpu returns a zeroed area.
Once we use percpu stats in all places (dest and svc), we can
create a new struct ip_vs_counters to reduce the memory used by
the percpu data: the stats currently include both counters and
estimated values, and the estimated values should not be percpu.
Then ip_vs_cpu_stats will be shorter (it is not visible to user
space):
	struct ip_vs_cpu_stats {
		struct ip_vs_counters	ustats;
		struct u64_stats_sync	syncp;
	};
On the writer side, in softirq context, we should protect the whole
section that updates u64 counters. For example, this code:
	spin_lock(&ip_vs_stats.lock);
	ip_vs_stats.ustats.outpkts++;
	ip_vs_stats.ustats.outbytes += skb->len;
	spin_unlock(&ip_vs_stats.lock);
should be changed to:
	struct ip_vs_cpu_stats *s = this_cpu_ptr(ipvs->stats);

	u64_stats_update_begin(&s->syncp);
	s->ustats.outpkts++;
	s->ustats.outbytes += skb->len;
	u64_stats_update_end(&s->syncp);
Readers should look similar, except that the _bh fetch variants
must be used for the u64 counters because the writer runs in
softirq context:
	u64 inbytes, outbytes;

	for_each_possible_cpu(i) {
		const struct ip_vs_cpu_stats *s = per_cpu_ptr(ipvs->stats, i);
		unsigned int start;

		...
		do {
			start = u64_stats_fetch_begin_bh(&s->syncp);
			inbytes = s->ustats.inbytes;
			outbytes = s->ustats.outbytes;
		} while (u64_stats_fetch_retry_bh(&s->syncp, start));
		ipvs->ctl_stats->ustats.inbytes += inbytes;
		ipvs->ctl_stats->ustats.outbytes += outbytes;
	}
Then IPVS_STAT_ADD and IPVS_STAT_INC cannot access the syncp.
They do not look generic enough in case we decide to use
struct ip_vs_cpu_stats for the dest and svc stats too;
I think these macros are not needed.
It is possible to have other macros for the write side of the
percpu stats:
	#define IP_VS_STATS_UPDATE_BEGIN(stats)				\
	{								\
		struct ip_vs_cpu_stats *s = this_cpu_ptr(stats);	\
		u64_stats_update_begin(&s->syncp)

	#define IP_VS_STATS_UPDATE_END()				\
		u64_stats_update_end(&s->syncp);			\
	}
Then the usage can be:

	IP_VS_STATS_UPDATE_BEGIN(ipvs->stats);
	s->ustats.outpkts++;
	s->ustats.outbytes += skb->len;
	IP_VS_STATS_UPDATE_END();
Then we can rename ctl_stats to total_stats or full_stats.
get_stats() can be changed to ip_vs_read_cpu_stats(); it will
be used to read all percpu stats, summed into the
total_stats/full_stats structure or a temporary:
	/* Get sum of percpu stats */
	... ip_vs_read_cpu_stats(struct ip_vs_stats_user *sum,
				 struct ip_vs_cpu_stats *stats)
	{
		...
		for_each_possible_cpu(i) {
			const struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i);
			unsigned int start;

			if (i) {...
			...
			do {
				start = u64_stats_fetch_begin_bh(&s->syncp);
				inbytes = s->ustats.inbytes;
				outbytes = s->ustats.outbytes;
			} while (u64_stats_fetch_retry_bh(&s->syncp, start));
			sum->inbytes += inbytes;
			sum->outbytes += outbytes;
		}
	}
As a result, ip_vs_read_cpu_stats() will be used in
estimation_timer() instead of get_stats():

	ip_vs_read_cpu_stats(&ipvs->total_stats->ustats, ipvs->stats);
but also in ip_vs_stats_show():

	{
		/* I hope struct ip_vs_stats_user can fit on the stack: */
		struct ip_vs_stats_user sum;

		/*
		 * We do not need spin_lock_bh(&ctl_stats->lock) here:
		 * the stats are now lockless and estimation_timer does
		 * not use ctl_stats->lock, so we have to call
		 * ip_vs_read_cpu_stats to avoid reading incorrect u64
		 * values directly from ipvs->total_stats->ustats.
		 */
		ip_vs_read_cpu_stats(&sum, ipvs->stats);
		/* seq_printf for sum */
	}
ip_vs_read_cpu_stats() can also be used in ip_vs_copy_stats(),
which copies values to user space and needs proper u64 reading.
But it is used only for the svc and dest stats, which are not
percpu yet.
Now this code does not look ok:

	ip_vs_zero_stats(net_ipvs(net)->ctl_stats);
Maybe we need a new function ip_vs_zero_percpu_stats() that
resets the stats for all CPUs, i.e. ipvs->stats in our case?
Though this operation is not entirely safe when done from
user space, since softirq context can be modifying the u64
counters on another CPU at the same time.
Happy New Year!
--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html