Hello,
On Thu, 7 Mar 2024, Michael Weiß wrote:
> Configuring ipvs in a non-initial user namespace using the genl
> netlink interface, e.g., by 'ipvsadm' is currently resulting in an
> '-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in
> 'ip_vs_ctl.c'.
>
> Similarly to other genl interfaces, we switch to the use of
> GENL_UNS_ADMIN_PERM flag which allows connection from non-initial
> user namespace. Thus, it would be feasible to configure ipvs using
> the genl interface also from within an unprivileged system container.
>
> Since adding of new services and new dests are triggered from
> userspace, accounting for the corresponding memory allocations in
> ip_vs_new_dest() and ip_vs_add_service() is activated.
>
> We tested this by simply running some samples from "man ipvsadm"
> within an unprivileged user namespaced system container in GyroidOS.
> Further, we successfully passed an adapted version of the ipvs
> selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using
> preliminary created network namespaces from unprivileged GyroidOS
> containers.

I planned such a change, but as a follow-up patchset to other
work which converts many structures to be per-netns.
There is an RFC v2 patchset for reference:

https://archive.linuxvirtualserver.org/html/lvs-devel/2023-12/index.html

My goal was to isolate the different namespaces as much as
possible (different structures, different kthreads, etc.) in order
to reduce the security risks of giving power to unprivileged roots.
Such isolation should also help when namespaces are served from
different CPUs.

Maybe I should push a fresh v3 soon, so that we can later use
GFP_KERNEL_ACCOUNT not only for services and dests but also
for allocations by schedulers, estimators, etc. Access to the
sysctl vars should be enabled too, around the comment
"Don't export sysctls to unprivileged users". The other changes
would be alloc_percpu => alloc_percpu_gfp(,GFP_KERNEL_ACCOUNT) and
SLAB_ACCOUNT for kmem_cache_create. I'm also not sure about the
__GFP_NOWARN and __GFP_NORETRY usage.

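To make the two conversions above concrete, here is a rough kernel-side sketch. The struct and function names (example_entry, example_cpu_stats, example_setup, example_teardown) are hypothetical and are not IPVS code; only kmem_cache_create(), alloc_percpu_gfp(), SLAB_ACCOUNT and GFP_KERNEL_ACCOUNT are existing kernel interfaces. It builds only inside a kernel tree and is meant as an illustration, not as the actual change:

```c
/* Kernel-side sketch only: builds inside a kernel tree, not in userspace. */
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Hypothetical types, used only for illustration (not IPVS structures). */
struct example_cpu_stats {
	u64 pkts;
	u64 bytes;
};

struct example_entry {
	u32 id;
};

static struct kmem_cache *example_cachep;
static struct example_cpu_stats __percpu *example_stats;

static int example_setup(void)
{
	/* SLAB_ACCOUNT: objects allocated from this cache are memcg-accounted. */
	example_cachep = kmem_cache_create("example_entry",
					   sizeof(struct example_entry), 0,
					   SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
					   NULL);
	if (!example_cachep)
		return -ENOMEM;

	/* alloc_percpu(type) becomes alloc_percpu_gfp(type, gfp). */
	example_stats = alloc_percpu_gfp(struct example_cpu_stats,
					 GFP_KERNEL_ACCOUNT);
	if (!example_stats) {
		kmem_cache_destroy(example_cachep);
		example_cachep = NULL;
		return -ENOMEM;
	}
	return 0;
}

static void example_teardown(void)
{
	free_percpu(example_stats);
	kmem_cache_destroy(example_cachep);
}
```

Whether __GFP_NOWARN or __GFP_NORETRY should be OR-ed into the gfp mask in such places is left open here, as in the text above.
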
I'm not sure about the sysctl vars either: currently they are
cloned from init_net. Do we give full access for writing, should
some of them stay privileged, etc.?

I haven't pushed such changes yet because I'm not sure what
is needed: it looks like, so far, what was needed is for root in
init_net to control rules in different netns, and there was no demand
from the virtualization world to extend this. If we can clearly define
what is good and what is bad from a security perspective, we can go
with such changes after pushing the above patchset, i.e. the
GENL_UNS_ADMIN_PERM change should follow all other changes.

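For readers less familiar with the two flags, the difference can be modeled in plain userspace C. This is a simplified toy model, not kernel code: every name prefixed with model_ or MODEL_ is hypothetical, and the capability rule is reduced to "CAP_NET_ADMIN held in a user namespace also applies to its descendants". As far as I understand it, the real checks are done by the genetlink core with netlink_capable() for GENL_ADMIN_PERM and netlink_ns_capable() against the user namespace owning the netns for GENL_UNS_ADMIN_PERM:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters for this model. */
struct model_user_ns {
	const struct model_user_ns *parent;
};

/* Toy sender: the user namespace in which it holds CAP_NET_ADMIN. */
struct model_sender {
	const struct model_user_ns *cap_ns;	/* NULL: no capability at all */
};

/* Hypothetical stand-ins for the two genl permission flags. */
enum model_perm_flag {
	MODEL_ADMIN_PERM,	/* models GENL_ADMIN_PERM */
	MODEL_UNS_ADMIN_PERM,	/* models GENL_UNS_ADMIN_PERM */
};

/* The initial user namespace has no parent. */
const struct model_user_ns model_init_user_ns = { NULL };

/* True if the sender's capability covers 'target' or one of its ancestors. */
bool model_has_cap(const struct model_sender *s,
		   const struct model_user_ns *target)
{
	const struct model_user_ns *ns;

	if (!s->cap_ns)
		return false;
	for (ns = target; ns; ns = ns->parent)
		if (ns == s->cap_ns)
			return true;
	return false;
}

/*
 * Returns 0 if the command may proceed, -EPERM otherwise.
 * 'netns_owner' is the user namespace that owns the target netns.
 */
int model_genl_perm_check(enum model_perm_flag flag,
			  const struct model_sender *s,
			  const struct model_user_ns *netns_owner)
{
	/* ADMIN_PERM always checks against the initial user namespace. */
	const struct model_user_ns *against =
		flag == MODEL_ADMIN_PERM ? &model_init_user_ns : netns_owner;

	return model_has_cap(s, against) ? 0 : -EPERM;
}
```

In this model, a container root whose cap_ns is a child of model_init_user_ns gets -EPERM with MODEL_ADMIN_PERM and 0 with MODEL_UNS_ADMIN_PERM for a netns owned by its own user namespace, which is the behaviour change the patch is after. It does not gain access to a netns owned by the initial user namespace.
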
> Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
> ---
> net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++-----------------
> 1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 143a341bbc0a..d39120c64207 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1080,7 +1080,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
> return -EINVAL;
> }
>
> - dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
> + dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT);
> if (dest == NULL)
> return -ENOMEM;
>
> @@ -1421,7 +1421,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
> ret_hooks = ret;
> }
>
> - svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
> + svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT);
> if (svc == NULL) {
> IP_VS_DBG(1, "%s(): no memory\n", __func__);
> ret = -ENOMEM;
> @@ -4139,98 +4139,98 @@ static const struct genl_small_ops ip_vs_genl_ops[] = {
> {
> .cmd = IPVS_CMD_NEW_SERVICE,
> .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> - .flags = GENL_ADMIN_PERM,
> + .flags = GENL_UNS_ADMIN_PERM,
> .doit = ip_vs_genl_set_cmd,
...
Regards
--
Julian Anastasov <ja@xxxxxx>