Message ID | 20211020083854.1101670-1-atenart@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: sysctl data could be in .bss | expand |
Widening the CC list a little. On Wed, 20 Oct 2021 10:38:54 +0200 Antoine Tenart wrote: > A check is made when registering non-init netns sysctl files to ensure > their data pointer does not point to a global data section. This works > well for modules as the check is made against the whole module address > space (is_module_address). But when built-in, the check is made against > the .data section. However global variables initialized to 0 can be in > .bss (-fzero-initialized-in-bss). > > Add an extra check to make sure the sysctl data does not point to the > .bss section either. > > Signed-off-by: Antoine Tenart <atenart@kernel.org> > Reviewed-by: Jonathon Reinhart <jonathon.reinhart@gmail.com> > --- > Hello, > > This was previously sent as an RFC[1] waiting for a problematic sysctl > to be fixed. The fix was accepted and is now in the nf tree[2]. > > This is not sent as a fix to avoid possible new warnings in stable > kernels. (The actual fixes of sysctl files should go). > > I think this can go through the net-next tree as kernel/extable.c > doesn't seem to be under any subsystem and a conflict is unlikely to > happen. > [1] https://lore.kernel.org/all/20211012155542.827631-1-atenart@kernel.org/T/ > [2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=174c376278949c44aad89c514a6b5db6cee8db59 > > include/linux/kernel.h | 1 + > kernel/extable.c | 8 ++++++++ > net/sysctl_net.c | 2 +- > 3 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 2776423a587e..beb61d0ab220 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -231,6 +231,7 @@ extern char *next_arg(char *args, char **param, char **val); > extern int core_kernel_text(unsigned long addr); > extern int init_kernel_text(unsigned long addr); > extern int core_kernel_data(unsigned long addr); > +extern int core_kernel_bss(unsigned long addr); Is the intention of these helpers to have strict section name semantics or higher level "is this global kernel data" semantics? If it's the latter we could make core_kernel_data() check bss instead, chances are all callers will either want that or not care. Steven? > extern int __kernel_text_address(unsigned long addr); > extern int kernel_text_address(unsigned long addr); > extern int func_ptr_is_kernel_text(void *ptr); > diff --git a/kernel/extable.c b/kernel/extable.c > index b0ea5eb0c3b4..477a4b6c8f63 100644 > --- a/kernel/extable.c > +++ b/kernel/extable.c > @@ -100,6 +100,14 @@ int core_kernel_data(unsigned long addr) > return 0; > } > > +int core_kernel_bss(unsigned long addr) > +{ > + if (addr >= (unsigned long)__bss_start && > + addr < (unsigned long)__bss_stop) > + return 1; > + return 0; > +} > + > int __kernel_text_address(unsigned long addr) > { > if (kernel_text_address(addr)) > diff --git a/net/sysctl_net.c b/net/sysctl_net.c > index f6cb0d4d114c..d883cf65029f 100644 > --- a/net/sysctl_net.c > +++ b/net/sysctl_net.c > @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path, > addr = (unsigned long)ent->data; > if (is_module_address(addr)) > where = "module"; > - else if (core_kernel_data(addr)) > + else if (core_kernel_data(addr) || core_kernel_bss(addr)) > where = "kernel"; > else > continue;
On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote: > > Widening the CC list a little. > > On Wed, 20 Oct 2021 10:38:54 +0200 Antoine Tenart wrote: > > A check is made when registering non-init netns sysctl files to ensure > > their data pointer does not point to a global data section. This works > > well for modules as the check is made against the whole module address > > space (is_module_address). But when built-in, the check is made against > > the .data section. However global variables initialized to 0 can be in > > .bss (-fzero-initialized-in-bss). > > > > Add an extra check to make sure the sysctl data does not point to the > > .bss section either. > > > > Signed-off-by: Antoine Tenart <atenart@kernel.org> > > Reviewed-by: Jonathon Reinhart <jonathon.reinhart@gmail.com> > > --- > > Hello, > > > > This was previously sent as an RFC[1] waiting for a problematic sysctl > > to be fixed. The fix was accepted and is now in the nf tree[2]. > > > > This is not sent as a fix to avoid possible new warnings in stable > > kernels. (The actual fixes of sysctl files should go). > > > > I think this can go through the net-next tree as kernel/extable.c > > doesn't seem to be under any subsystem and a conflict is unlikely to > > happen. > > > [1] https://lore.kernel.org/all/20211012155542.827631-1-atenart@kernel.org/T/ > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=174c376278949c44aad89c514a6b5db6cee8db59 > > > > include/linux/kernel.h | 1 + > > kernel/extable.c | 8 ++++++++ > > net/sysctl_net.c | 2 +- > > 3 files changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > > index 2776423a587e..beb61d0ab220 100644 > > --- a/include/linux/kernel.h > > +++ b/include/linux/kernel.h > > @@ -231,6 +231,7 @@ extern char *next_arg(char *args, char **param, char **val); > > extern int core_kernel_text(unsigned long addr); > > extern int init_kernel_text(unsigned long addr); > > extern int core_kernel_data(unsigned long addr); > > +extern int core_kernel_bss(unsigned long addr); > > Is the intention of these helpers to have strict section name semantics > or higher level "is this global kernel data" semantics? If it's the > latter we could make core_kernel_data() check bss instead, chances are > all callers will either want that or not care. Steven? The core_kernel_data() function was introduced in a2d063ac216c1, and the commit message says: "It may or may not return true for RO data... This utility function is used to determine if data is safe from ever being freed. Thus it should return true for all RW global data that is not in a module or has been allocated, or false otherwise." The intent of the function seems to be more in line with the higher-level "is this global kernel data" semantics you suggested. The purpose seems to be to differentiate between "part of the loaded kernel image" vs. a dynamic allocation (which would include a loaded module image). And given that it *might* return true for RO data (depending on the arch linker script, presumably), I think it would be safe to include .bss -- clearly, with that caveat in place, it isn't promising strict section semantics. There are only two existing in-tree consumers: 1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which denotes "dynamically allocated ftrace_ops which need special care". It would be unlikely (if not impossible) for the "ops" object in question to be all-zero and end up in the .bss, but if it were, then the current behavior would be wrong. IOW, it would be more correct to include .bss. 2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this thread) -- Trying to distinguish "global kernel data" (static/global variables) from kmalloc-allocated objects. More correct to include .bss. Both of these callers only seem to delineate between static and dynamic object allocations. Put another way, if core_kernel_bss(), all existing callers should be updated to check core_kernel_data() || core_kernel_bss(). Since Steven introduced it, and until I added ensure_safe_net_sysctl(), he / tracing was the only consumer. Thinking critically from the C language perspective, I can't come up with any case where one would actually expect core_kernel_data() to return true for 'int global = 1' and false for 'int global = 0'. In conclusion, I agree with your alternative proposal Jakub, and I think this patch is the right way forward: diff --git a/kernel/extable.c b/kernel/extable.c index b0ea5eb0c3b4..8b6f1d0bdaf6 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr) if (addr >= (unsigned long)_sdata && addr < (unsigned long)_edata) return 1; + if (addr >= (unsigned long)__bss_start && + addr < (unsigned long)__bss_stop) + return 1; return 0; } Jonathon Reinhart
On Mon, 8 Nov 2021 00:24:33 -0500 Jonathon Reinhart <jonathon.reinhart@gmail.com> wrote: > On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > Widening the CC list a little. Thanks! [..] > The core_kernel_data() function was introduced in a2d063ac216c1, and > the commit message says: > > "It may or may not return true for RO data... This utility function is > used to determine if data is safe from ever being freed. Thus it > should return true for all RW global data that is not in a module or > has been allocated, or false otherwise." > > The intent of the function seems to be more in line with the > higher-level "is this global kernel data" semantics you suggested. The > purpose seems to be to differentiate between "part of the loaded > kernel image" vs. a dynamic allocation (which would include a loaded > module image). And given that it *might* return true for RO data > (depending on the arch linker script, presumably), I think it would be > safe to include .bss -- clearly, with that caveat in place, it isn't > promising strict section semantics. > > There are only two existing in-tree consumers: > > 1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets > FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which > denotes "dynamically allocated ftrace_ops which need special care". It > would be unlikely (if not impossible) for the "ops" object in question > to be all-zero and end up in the .bss, but if it were, then the > current behavior would be wrong. IOW, it would be more correct to > include .bss. > > 2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this > thread) -- Trying to distinguish "global kernel data" (static/global > variables) from kmalloc-allocated objects. More correct to include > .bss. > > Both of these callers only seem to delineate between static and > dynamic object allocations. Put another way, if core_kernel_bss(), all > existing callers should be updated to check core_kernel_data() || > core_kernel_bss(). > > Since Steven introduced it, and until I added > ensure_safe_net_sysctl(), he / tracing was the only consumer. I agree with your analysis. The intent is that allocated ftrace_ops (things that function tracer uses to know what callbacks are called from function entry), must go through a very slow synchronization (rcu_synchronize_tasks). But this is not needed if the ftrace_ops is part of the core kernel (.data or .bss) as that will never be freed, and thus does not need to worry about it disappearing while they are still in use. > > Thinking critically from the C language perspective, I can't come up > with any case where one would actually expect core_kernel_data() to > return true for 'int global = 1' and false for 'int global = 0'. > > In conclusion, I agree with your alternative proposal Jakub, and I > think this patch is the right way forward: > > diff --git a/kernel/extable.c b/kernel/extable.c > index b0ea5eb0c3b4..8b6f1d0bdaf6 100644 > --- a/kernel/extable.c > +++ b/kernel/extable.c > @@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr) > if (addr >= (unsigned long)_sdata && > addr < (unsigned long)_edata) > return 1; > + if (addr >= (unsigned long)__bss_start && > + addr < (unsigned long)__bss_stop) > + return 1; > return 0; > } Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> -- Steve
Thanks Jonathon for the analysis and Steven for the review! I'll send the patch formally then. Antoine Quoting Steven Rostedt (2021-11-08 17:11:14) > On Mon, 8 Nov 2021 00:24:33 -0500 > Jonathon Reinhart <jonathon.reinhart@gmail.com> wrote: > > > On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > Widening the CC list a little. > > Thanks! > > [..] > > > The core_kernel_data() function was introduced in a2d063ac216c1, and > > the commit message says: > > > > "It may or may not return true for RO data... This utility function is > > used to determine if data is safe from ever being freed. Thus it > > should return true for all RW global data that is not in a module or > > has been allocated, or false otherwise." > > > > The intent of the function seems to be more in line with the > > higher-level "is this global kernel data" semantics you suggested. The > > purpose seems to be to differentiate between "part of the loaded > > kernel image" vs. a dynamic allocation (which would include a loaded > > module image). And given that it *might* return true for RO data > > (depending on the arch linker script, presumably), I think it would be > > safe to include .bss -- clearly, with that caveat in place, it isn't > > promising strict section semantics. > > > > There are only two existing in-tree consumers: > > > > 1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets > > FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which > > denotes "dynamically allocated ftrace_ops which need special care". It > > would be unlikely (if not impossible) for the "ops" object in question > > to be all-zero and end up in the .bss, but if it were, then the > > current behavior would be wrong. IOW, it would be more correct to > > include .bss. > > > > 2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this > > thread) -- Trying to distinguish "global kernel data" (static/global > > variables) from kmalloc-allocated objects. More correct to include > > .bss. > > > > Both of these callers only seem to delineate between static and > > dynamic object allocations. Put another way, if core_kernel_bss(), all > > existing callers should be updated to check core_kernel_data() || > > core_kernel_bss(). > > > > Since Steven introduced it, and until I added > > ensure_safe_net_sysctl(), he / tracing was the only consumer. > > I agree with your analysis. > > The intent is that allocated ftrace_ops (things that function tracer uses > to know what callbacks are called from function entry), must go through a > very slow synchronization (rcu_synchronize_tasks). But this is not needed > if the ftrace_ops is part of the core kernel (.data or .bss) as that will > never be freed, and thus does not need to worry about it disappearing while > they are still in use. > > > > > Thinking critically from the C language perspective, I can't come up > > with any case where one would actually expect core_kernel_data() to > > return true for 'int global = 1' and false for 'int global = 0'. > > > > In conclusion, I agree with your alternative proposal Jakub, and I > > think this patch is the right way forward: > > > > diff --git a/kernel/extable.c b/kernel/extable.c > > index b0ea5eb0c3b4..8b6f1d0bdaf6 100644 > > --- a/kernel/extable.c > > +++ b/kernel/extable.c > > @@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr) > > if (addr >= (unsigned long)_sdata && > > addr < (unsigned long)_edata) > > return 1; > > + if (addr >= (unsigned long)__bss_start && > > + addr < (unsigned long)__bss_stop) > > + return 1; > > return 0; > > } > > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > > -- Steve >
diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2776423a587e..beb61d0ab220 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -231,6 +231,7 @@ extern char *next_arg(char *args, char **param, char **val); extern int core_kernel_text(unsigned long addr); extern int init_kernel_text(unsigned long addr); extern int core_kernel_data(unsigned long addr); +extern int core_kernel_bss(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); extern int func_ptr_is_kernel_text(void *ptr); diff --git a/kernel/extable.c b/kernel/extable.c index b0ea5eb0c3b4..477a4b6c8f63 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -100,6 +100,14 @@ int core_kernel_data(unsigned long addr) return 0; } +int core_kernel_bss(unsigned long addr) +{ + if (addr >= (unsigned long)__bss_start && + addr < (unsigned long)__bss_stop) + return 1; + return 0; +} + int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) diff --git a/net/sysctl_net.c b/net/sysctl_net.c index f6cb0d4d114c..d883cf65029f 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path, addr = (unsigned long)ent->data; if (is_module_address(addr)) where = "module"; - else if (core_kernel_data(addr)) + else if (core_kernel_data(addr) || core_kernel_bss(addr)) where = "kernel"; else continue;