LVS
lvs-devel
Google
 
Web LinuxVirtualServer.org

[PATCH v3 00/14] sysctl: Add a size argument to register functions in sy

To: mcgrof@xxxxxxxxxx
Subject: [PATCH v3 00/14] sysctl: Add a size argument to register functions in sysctl
Cc: rds-devel@xxxxxxxxxxxxxx, "David S. Miller" <davem@xxxxxxxxxxxxx>, Florian Westphal <fw@xxxxxxxxx>, willy@xxxxxxxxxxxxx, Jan Karcher <jaka@xxxxxxxxxxxxx>, Wen Gu <guwen@xxxxxxxxxxxxxxxxx>, Simon Horman <horms@xxxxxxxxxxxx>, Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>, linux-wpan@xxxxxxxxxxxxxxx, Matthieu Baerts <matthieu.baerts@xxxxxxxxxxxx>, Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>, mptcp@xxxxxxxxxxxxxxx, Heiko Carstens <hca@xxxxxxxxxxxxx>, Stefan Schmidt <stefan@xxxxxxxxxxxxxxxxxx>, Will Deacon <will@xxxxxxxxxx>, Julian Anastasov <ja@xxxxxx>, netfilter-devel@xxxxxxxxxxxxxxx, Joerg Reuter <jreuter@xxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Alexander Gordeev <agordeev@xxxxxxxxxxxxx>, linux-sctp@xxxxxxxxxxxxxxx, Xin Long <lucien.xin@xxxxxxxxx>, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>, linux-hams@xxxxxxxxxxxxxxx, Vasily Gorbik <gor@xxxxxxxxxxxxx>, coreteam@xxxxxxxxxxxxx, Ralf Baechle <ralf@xxxxxxxxxxxxxx>, Steffen Klassert <steffen.klassert@xxxxxxxxxxx>, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>, keescook@xxxxxxxxxxxx, Roopa Prabhu <roopa@xxxxxxxxxx>, David Ahern <dsahern@xxxxxxxxxx>, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, Catalin Marinas <catalin.marinas@xxxxxxx>, Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx>, Wenjia Zhang <wenjia@xxxxxxxxxxxxx>, josh@xxxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, Alexander Aring <alex.aring@xxxxxxxxx>, Nikolay Aleksandrov <razor@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxxxxxx, Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>, linux-s390@xxxxxxxxxxxxxxx, Sven Schnelle <svens@xxxxxxxxxxxxx>, "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>, Eric Dumazet <edumazet@xxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, Paolo Abeni <pabeni@xxxxxxxxxx>, Iurii Zaikin <yzaikin@xxxxxxxxxx>, Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>, bridge@xxxxxxxxxxxxxxxxxxxxxxxxxx, Karsten Graul <kgraul@xxxxxxxxxxxxx>, Mat Martineau <martineau@xxxxxxxxxx>, Miquel Raynal <miquel.raynal@xxxxxxxxxxx>, Jakub Kicinski <kuba@xxxxxxxxxx>, Joel Granados <j.granados@xxxxxxxxxxx>
From: Joel Granados <joel.granados@xxxxxxxxx>
Date: Wed, 9 Aug 2023 12:49:52 +0200
What?
These commits set things up so we can start removing the sentinel elements.
They modify sysctl and net_sysctl internals so that registering a ctl_table
that contains a sentinel gives the same result as passing a table_size
calculated from the ctl_table array without a sentinel. We accomplish this by
introducing a table_size argument in the same place where procname is checked
for NULL. The idea is for it to keep stopping when it hits ->procname == NULL,
while the sentinel is still present. And when the sentinel is removed, it will
stop on the table_size (thx to jani.nikula@xxxxxxxxxxxxxxx for the discussion
that led to this). This allows us to remove sentinels from one (or several)
files at a time.

These commits are part of a bigger set containing the removal of ctl_table 
sentinel
(https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3).
The idea is to make the review process easier by chunking the 65+ commits into
manageable pieces.

My idea is to send out one chunk at a time so it can be reviewed separately
from the others without the noise from parallel related sets. After this first
chunk will come 6 that remove the sentinel element from "arch/*, drivers/*,
fs/*, kernel/*, net/* and miscellaneous. And then a final one that removes the
->procname == NULL check. You can see all commits here
(https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3).

Why?
This is a preparation patch set that will make it easier for us to apply
subsequent patches that will remove the sentinel element (last empty element)
in the ctl_table arrays.

In itself, it does not remove any sentinels but it is needed to bring all the
advantages of the removal to fruition which is to help reduce the overall build
time size of the kernel and run time memory bloat by about ~64 bytes per
declared ctl_table array. It also ensures that future moves of sysctl arrays
out from kernel/sysctl.c to their own subsystem won't penalize in enlarging the
kernel build size or run time memory consumption. Without this patch set we
would have to put everything into one big commit making the review process that
much longer and harder for everyone.

Since it is so related to the removal of the sentinel element, its worth while
to give a bit of context on this:
* Good summary from Luis about why we want to remove the sentinels.
  https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@xxxxxxxxxxxxxxxxxxxxxx/
* This is a patch set that replaces register_sysctl_table with register_sysctl
  https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@xxxxxxxxxx/
* Patch set to deprecate register_sysctl_paths()
  https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@xxxxxxxxxx/
* Here there is an explicit expectation for the removal of the sentinel element.
  https://lore.kernel.org/all/20230321130908.6972-1-frank.li@xxxxxxxx
* The "ARRAY_SIZE" approach was mentioned (proposed?) in this thread
  https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@xxxxxxxxxxxxx

Commits in this chunk:
* Preparation commits:
    start : sysctl: Prefer ctl_table_header in proc_sysct
    end   : sysctl: Add size argument to init_header
  These are preparation commits that make sure that we have the
  ctl_table_header where we need the array size.

* Add size to __register_sysctl_table, __register_sysctl_init and 
register_sysctl
    start : sysctl: Add a size arg to __register_sysctl_table
    end   : sysctl: Add size arg to __register_sysctl_init
  Here we replace the existing register functions with macros that add the
  ARRAY_SIZE automatically. Unfortunately these macros cannot be used for the
  register calls that pass a pointer; in this situation we add register
  functions with an table_size argument (thx to greg@xxxxxxxxx for bringing
  this to my attention)

* Add size to register_net_sysctl
    start : sysctl: Add size to register_net_sysctl function
    end   : sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
  register_net_sysctl is an indirection function to the sysctl registrations
  and needed a several commits to add table_size to all its callers. We
  temporarily use SIZE_MAX to avoid compiler warnings while we change to
  register_net_sysctl to register_net_sysctl_sz; we remove it with the
  penultimate patch of this set. Finally, we make sure to adjust the calculated
  size every time there is a check for unprivileged users.

* Add size as additional stopping criteria
    commit : sysctl: Use ctl_table_size as stopping criteria for list macro
  We add table_size check in the main macro within proc_sysctl.c. This commit
  allows the removal of the sentinel element by chunks.

Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Successfully ran this through 0-day

Size saving estimates:
A consequence of eventually removing all the sentinels (64 bytes per sentinel)
is the bytes we save. These are *not* numbers that we will get after this patch
set; these are the numbers that we will get after removing all the sentinels. I
included them here because they are relevant and to get an idea of just how
much memory we are talking about.
  * bloat-o-meter:
    The "yesall" configuration results save 9158 bytes (you can see the output 
here
    https://lore.kernel.org/all/20230621091000.424843-1-j.granados@xxxxxxxxxxx/.
    The "tiny" configuration + CONFIG_SYSCTL save 1215 bytes (you can see the
    output here [2])
  * memory usage:
    As we no longer need the sentinel element within proc_sysctl.c, we save some
    bytes in main memory as well. In my testing kernel I measured a difference 
of
    6720 bytes. I include the way to measure this in [1]

Comments/feedback greatly appreciated

V3:
* Updated tags from mailing list
* Corrected an off-by-one error in
  https://lore.kernel.org/all/22e0e672-f9f6-6afe-6ce6-63de264e7b6d@xxxxxxxxx
* Fixed a bug where we would have erroneously registered ctl_table to
  unprivileged ipv6 users
* Rebased on v6.5-rc5
* Rebase the bigger patchset located at
  https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V3 on
  top of this version

V2:
* Dropped moving mpls_table up the af_mpls.c file. We don't need it any longer
  as it is not really used before its current location.
* Added/Clarified the why in several commit messages that were missing it.
* Clarified the why in the cover letter to be "to make it easier to apply
  subsequent patches that will remove the sentinels"
* Added documentation for table_size
* Added suggested by tags (Greg and Jani) to relevant commits

Best
Joel

[1]
To measure the in memory savings apply this patch on top of
https://github.com/Joelgranados/linux/tree/tag/sysctl_remove_empty_elem_V1
"
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 5f413bfd6271..9aa8374c0ef1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -975,6 +975,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
        table[0].procname = new_name;
        table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
        init_header(&new->header, set->dir.header.root, set, node, table, 1);
+       printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table));

        return new;
 }
@@ -1202,6 +1203,7 @@ static struct ctl_table_header *new_links(struct ctl_dir 
*dir, struct ctl_table_
                    head->ctl_table_size);
        links->nreg = head->ctl_table_size;

+       printk("%ld sysctl saved mem kzalloc \n", sizeof(struct ctl_table));
        return links;
 }

"
and then run the following bash script in the kernel:

accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
    echo $n
    accum=$(calc "$accum + $n")
done
echo $accum

[2]
bloat-o-meter with "tiny" config:
add/remove: 0/2 grow/shrink: 33/24 up/down: 470/-1685 (-1215)
Function                                     old     new   delta
insert_header                                831     966    +135
__register_sysctl_table                      971    1092    +121
get_links                                    177     226     +49
put_links                                    167     186     +19
erase_header                                  55      66     +11
sysctl_init_bases                             59      69     +10
setup_sysctl_set                              65      73      +8
utsname_sysctl_init                           26      31      +5
sld_mitigate_sysctl_init                      33      38      +5
setup_userns_sysctls                         158     163      +5
sched_rt_sysctl_init                          33      38      +5
sched_fair_sysctl_init                        33      38      +5
sched_dl_sysctl_init                          33      38      +5
random_sysctls_init                           33      38      +5
page_writeback_init                          122     127      +5
oom_init                                      73      78      +5
kernel_panic_sysctls_init                     33      38      +5
kernel_exit_sysctls_init                      33      38      +5
init_umh_sysctls                              33      38      +5
init_signal_sysctls                           33      38      +5
init_pipe_fs                                  94      99      +5
init_fs_sysctls                               33      38      +5
init_fs_stat_sysctls                          33      38      +5
init_fs_namespace_sysctls                     33      38      +5
init_fs_namei_sysctls                         33      38      +5
init_fs_inode_sysctls                         33      38      +5
init_fs_exec_sysctls                          33      38      +5
init_fs_dcache_sysctls                        33      38      +5
register_sysctl                               22      25      +3
__register_sysctl_init                         9      12      +3
user_namespace_sysctl_init                   149     151      +2
sched_core_sysctl_init                        38      40      +2
register_sysctl_mount_point                   13      15      +2
vm_table                                    1344    1280     -64
vm_page_writeback_sysctls                    512     448     -64
vm_oom_kill_table                            256     192     -64
uts_kern_table                               448     384     -64
usermodehelper_table                         192     128     -64
user_table                                   576     512     -64
sld_sysctls                                  128      64     -64
signal_debug_table                           128      64     -64
sched_rt_sysctls                             256     192     -64
sched_fair_sysctls                           128      64     -64
sched_dl_sysctls                             192     128     -64
sched_core_sysctls                            64       -     -64
root_table                                   128      64     -64
random_table                                 448     384     -64
namei_sysctls                                320     256     -64
kern_table                                  1792    1728     -64
kern_panic_table                             128      64     -64
kern_exit_table                              128      64     -64
inodes_sysctls                               192     128     -64
fs_stat_sysctls                              256     192     -64
fs_shared_sysctls                            192     128     -64
fs_pipe_sysctls                              256     192     -64
fs_namespace_sysctls                         128      64     -64
fs_exec_sysctls                              128      64     -64
fs_dcache_sysctls                            128      64     -64
init_header                                   85       -     -85
Total: Before=1877669, After=1876454, chg -0.06%

base:  fdf0eaf11452

Joel Granados (14):
  sysctl: Prefer ctl_table_header in proc_sysctl
  sysctl: Use ctl_table_header in list_for_each_table_entry
  sysctl: Add ctl_table_size to ctl_table_header
  sysctl: Add size argument to init_header
  sysctl: Add a size arg to __register_sysctl_table
  sysctl: Add size to register_sysctl
  sysctl: Add size arg to __register_sysctl_init
  sysctl: Add size to register_net_sysctl function
  ax.25: Update to register_net_sysctl_sz
  netfilter: Update to register_net_sysctl_sz
  networking: Update to register_net_sysctl_sz
  vrf: Update to register_net_sysctl_sz
  sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
  sysctl: Use ctl_table_size as stopping criteria for list macro

 arch/arm64/kernel/armv8_deprecated.c    |  2 +-
 arch/s390/appldata/appldata_base.c      |  2 +-
 drivers/net/vrf.c                       |  3 +-
 fs/proc/proc_sysctl.c                   | 90 +++++++++++++------------
 include/linux/sysctl.h                  | 31 +++++++--
 include/net/ipv6.h                      |  2 +
 include/net/net_namespace.h             | 10 +--
 ipc/ipc_sysctl.c                        |  4 +-
 ipc/mq_sysctl.c                         |  4 +-
 kernel/ucount.c                         |  5 +-
 net/ax25/sysctl_net_ax25.c              |  3 +-
 net/bridge/br_netfilter_hooks.c         |  3 +-
 net/core/neighbour.c                    |  8 ++-
 net/core/sysctl_net_core.c              |  3 +-
 net/ieee802154/6lowpan/reassembly.c     |  8 ++-
 net/ipv4/devinet.c                      |  3 +-
 net/ipv4/ip_fragment.c                  |  3 +-
 net/ipv4/route.c                        |  8 ++-
 net/ipv4/sysctl_net_ipv4.c              |  3 +-
 net/ipv4/xfrm4_policy.c                 |  3 +-
 net/ipv6/addrconf.c                     |  3 +-
 net/ipv6/icmp.c                         |  5 ++
 net/ipv6/netfilter/nf_conntrack_reasm.c |  3 +-
 net/ipv6/reassembly.c                   |  3 +-
 net/ipv6/route.c                        |  9 +++
 net/ipv6/sysctl_net_ipv6.c              | 16 +++--
 net/ipv6/xfrm6_policy.c                 |  3 +-
 net/mpls/af_mpls.c                      |  6 +-
 net/mptcp/ctrl.c                        |  3 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |  8 ++-
 net/netfilter/ipvs/ip_vs_lblc.c         | 10 ++-
 net/netfilter/ipvs/ip_vs_lblcr.c        | 10 ++-
 net/netfilter/nf_conntrack_standalone.c |  4 +-
 net/netfilter/nf_log.c                  |  7 +-
 net/rds/tcp.c                           |  3 +-
 net/sctp/sysctl.c                       |  4 +-
 net/smc/smc_sysctl.c                    |  3 +-
 net/sysctl_net.c                        | 26 ++++---
 net/unix/sysctl_net_unix.c              |  3 +-
 net/xfrm/xfrm_sysctl.c                  |  8 ++-
 40 files changed, 222 insertions(+), 113 deletions(-)

-- 
2.30.2


<Prev in Thread] Current Thread [Next in Thread>