Message ID | YgEeQNdgBuHRyEWl@dumbo (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Fix regression due to "fs: move binfmt_misc sysctl to its own file" | expand |
On Mon, Feb 07, 2022 at 02:27:28PM +0100, Domenico Andreoli wrote: > Commit 3ba442d5331f did not go unnoticed, binfmt-support stopped to > work on my Debian system since v5.17-rc2 (did not check with -rc1). > > The existance of /proc/sys/fs/binfmt_misc is a precondition for > attempting to mount the binfmt_misc fs, which in turn triggers the > autoload of the binfmt_misc module. Without it, no module is loaded > and no binfmt is available at boot. > > Building as built-in or manually loading the module and mounting the fs > works fine, it's therefore only a matter of interaction with user-space. > > I could try to improve the Debian systemd configuration but I can't > say anything about the other distributions. > > In the meanwhile this patch restores a working system right after boot. > > Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") > Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Amir Goldstein <amir73il@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Cc: Antti Palosaari <crope@iki.fi> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Benjamin LaHaise <bcrl@kvack.org> > Cc: Clemens Ladisch <clemens@ladisch.de> > Cc: David Airlie <airlied@linux.ie> > Cc: Douglas Gilbert <dgilbert@interlog.com> > Cc: Eric Biederman <ebiederm@xmission.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Iurii Zaikin <yzaikin@google.com> > Cc: James E.J. Bottomley <jejb@linux.ibm.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Jani Nikula <jani.nikula@intel.com> > Cc: Jani Nikula <jani.nikula@linux.intel.com> > Cc: Joel Becker <jlbec@evilplan.org> > Cc: John Ogness <john.ogness@linutronix.de> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Joseph Qi <joseph.qi@linux.alibaba.com> > Cc: Julia Lawall <julia.lawall@inria.fr> > Cc: Kees Cook <keescook@chromium.org> > Cc: Lukas Middendorf <kernel@tuxforce.de> > Cc: Luis Chamberlain <mcgrof@kernel.org> > Cc: Mark Fasheh <mark@fasheh.com> > Cc: Martin K. Petersen <martin.petersen@oracle.com> > Cc: Paul Turner <pjt@google.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Phillip Potter <phil@philpotter.co.uk> > Cc: Qing Wang <wangqing@vivo.com> > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > Cc: Sebastian Reichel <sre@kernel.org> > Cc: Sergey Senozhatsky <senozhatsky@chromium.org> > Cc: Stephen Kitt <steve@sk2.org> > Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: "Theodore Ts'o" <tytso@mit.edu> > Cc: Xiaoming Ni <nixiaoming@huawei.com> > > --- > fs/binfmt_misc.c | 6 +----- > kernel/sysctl.c | 13 +++++++++++++ > 2 files changed, 14 insertions(+), 5 deletions(-) > > Index: b/fs/binfmt_misc.c > =================================================================== > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -817,20 +817,16 @@ static struct file_system_type bm_fs_typ > }; > MODULE_ALIAS_FS("binfmt_misc"); > > -static struct ctl_table_header *binfmt_misc_header; > - > static int __init init_misc_binfmt(void) > { > int err = register_filesystem(&bm_fs_type); > if (!err) > insert_binfmt(&misc_format); > - binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc"); > - return 0; > + return err; > } OK I think the issue here should have been that declaring this on fs/binfmt_misc.c creates the chicken and the egg issue, and so we need to do this on built-in code. Instead of your patch can you try this instead, it just always creates the sysctl mount always now so long as proc sysctl is enabled. It does not do the unregistration as we always want the path present as it used to be before as well. diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index c07f35719ee3..4b8f1b11a7c8 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -817,20 +817,16 @@ static struct file_system_type bm_fs_type = { }; MODULE_ALIAS_FS("binfmt_misc"); -static struct ctl_table_header *binfmt_misc_header; - static int __init init_misc_binfmt(void) { int err = register_filesystem(&bm_fs_type); if (!err) insert_binfmt(&misc_format); - binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc"); return 0; } static void __exit exit_misc_binfmt(void) { - unregister_sysctl_table(binfmt_misc_header); unregister_binfmt(&misc_format); unregister_filesystem(&bm_fs_type); } diff --git a/fs/file_table.c b/fs/file_table.c index 57edef16dce4..4f4739c9405c 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -119,6 +119,7 @@ static struct ctl_table fs_stat_sysctls[] = { static int __init init_fs_stat_sysctls(void) { register_sysctl_init("fs", fs_stat_sysctls); + register_sysctl_mount_point("fs/binfmt_misc"); return 0; } fs_initcall(init_fs_stat_sysctls);
On Mon, Feb 7, 2022 at 1:46 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > > OK I think the issue here should have been that declaring this on > fs/binfmt_misc.c creates the chicken and the egg issue, and so we > need to do this on built-in code. Instead of your patch can you try > this instead, it just always creates the sysctl mount always now > so long as proc sysctl is enabled. It does not do the unregistration > as we always want the path present as it used to be before as well. > > diff --git a/fs/file_table.c b/fs/file_table.c > index 57edef16dce4..4f4739c9405c 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -119,6 +119,7 @@ static struct ctl_table fs_stat_sysctls[] = { > static int __init init_fs_stat_sysctls(void) > { > register_sysctl_init("fs", fs_stat_sysctls); > + register_sysctl_mount_point("fs/binfmt_misc"); > return 0; > } > fs_initcall(init_fs_stat_sysctls); I'm looking at the original code, and it seems that if we don't select CONFIG_BINFMT_MISC at all, this file won't be created. This would suggest an IF MACRO around > + register_sysctl_mount_point("fs/binfmt_misc"); and it should looks like +#if IS_ENABLED(CONFIG_BINFMT_MISC) + register_sysctl_mount_point("fs/binfmt_misc"); +#endif or if you prefer original style: #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) - Tong
[TLDR: I'm adding the regression report below to regzbot, the Linux kernel regression tracking bot; all text you find below is compiled from a few templates paragraphs you might have encountered already already from similar mails.] Hi, this is your Linux kernel regression tracker speaking. CCing the regression mailing list, as it should be in the loop for all regressions, as explained here: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html On 07.02.22 14:27, Domenico Andreoli wrote: > Commit 3ba442d5331f did not go unnoticed, binfmt-support stopped to > work on my Debian system since v5.17-rc2 (did not check with -rc1). > > The existance of /proc/sys/fs/binfmt_misc is a precondition for > attempting to mount the binfmt_misc fs, which in turn triggers the > autoload of the binfmt_misc module. Without it, no module is loaded > and no binfmt is available at boot. > > Building as built-in or manually loading the module and mounting the fs > works fine, it's therefore only a matter of interaction with user-space. > > I could try to improve the Debian systemd configuration but I can't > say anything about the other distributions. > > In the meanwhile this patch restores a working system right after boot. > [...] To be sure this issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced 3ba442d5331f #regzbot title binfmt-support stopped to work #regzbot ignore-activity Reminder for developers: when fixing the issue, please add a 'Link:' tags pointing to the report (the mail quoted above) using lore.kernel.org/r/, as explained in 'Documentation/process/submitting-patches.rst' and 'Documentation/process/5.Posting.rst'. This allows the bot to connect the report with any patches posted or committed to fix the issue; this again allows the bot to show the current status of regressions and automatically resolve the issue when the fix hits the right tree. I'm sending this to everyone that got the initial report, to make them aware of the tracking. I also hope that messages like this motivate people to directly get at least the regression mailing list and ideally even regzbot involved when dealing with regressions, as messages like this wouldn't be needed then. Don't worry, I'll send further messages wrt to this regression just to the lists (with a tag in the subject so people can filter them away), if they are relevant just for regzbot. With a bit of luck no such messages will be needed anyway. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them and lack knowledge about most of the areas they concern. I thus unfortunately will sometimes get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On Mon, Feb 07, 2022 at 02:53:21PM -0800, Tong Zhang wrote: > On Mon, Feb 7, 2022 at 1:46 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > > OK I think the issue here should have been that declaring this on > > fs/binfmt_misc.c creates the chicken and the egg issue, and so we > > need to do this on built-in code. Instead of your patch can you try > > this instead, it just always creates the sysctl mount always now > > so long as proc sysctl is enabled. It does not do the unregistration > > as we always want the path present as it used to be before as well. > > > > diff --git a/fs/file_table.c b/fs/file_table.c > > index 57edef16dce4..4f4739c9405c 100644 > > --- a/fs/file_table.c > > +++ b/fs/file_table.c > > @@ -119,6 +119,7 @@ static struct ctl_table fs_stat_sysctls[] = { > > static int __init init_fs_stat_sysctls(void) > > { > > register_sysctl_init("fs", fs_stat_sysctls); > > + register_sysctl_mount_point("fs/binfmt_misc"); > > return 0; > > } > > fs_initcall(init_fs_stat_sysctls); > > I'm looking at the original code, and it seems that if we don't select > CONFIG_BINFMT_MISC at all, > this file won't be created. This would suggest an IF MACRO around > > + register_sysctl_mount_point("fs/binfmt_misc"); > and it should looks like > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > + register_sysctl_mount_point("fs/binfmt_misc"); > +#endif > or if you prefer original style: > #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) Or better yet using IS_ENABLED() to avoid the ifdef cruft: diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index c07f35719ee3..4b8f1b11a7c8 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -817,20 +817,16 @@ static struct file_system_type bm_fs_type = { }; MODULE_ALIAS_FS("binfmt_misc"); -static struct ctl_table_header *binfmt_misc_header; - static int __init init_misc_binfmt(void) { int err = register_filesystem(&bm_fs_type); if (!err) insert_binfmt(&misc_format); - binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc"); return 0; } static void __exit exit_misc_binfmt(void) { - unregister_sysctl_table(binfmt_misc_header); unregister_binfmt(&misc_format); unregister_filesystem(&bm_fs_type); } diff --git a/fs/file_table.c b/fs/file_table.c index 57edef16dce4..4969021fa676 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -119,6 +119,8 @@ static struct ctl_table fs_stat_sysctls[] = { static int __init init_fs_stat_sysctls(void) { register_sysctl_init("fs", fs_stat_sysctls); + if (IS_ENABLED(CONFIG_BINFMT_MISC)) + register_sysctl_mount_point("fs/binfmt_misc"); return 0; } fs_initcall(init_fs_stat_sysctls);
On Tue, Feb 08, 2022 at 09:20:42AM -0800, Luis Chamberlain wrote: > On Mon, Feb 07, 2022 at 02:53:21PM -0800, Tong Zhang wrote: > > On Mon, Feb 7, 2022 at 1:46 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > > > > OK I think the issue here should have been that declaring this on > > > fs/binfmt_misc.c creates the chicken and the egg issue, and so we > > > need to do this on built-in code. Instead of your patch can you try > > > this instead, it just always creates the sysctl mount always now > > > so long as proc sysctl is enabled. It does not do the unregistration > > > as we always want the path present as it used to be before as well. > > > > > > diff --git a/fs/file_table.c b/fs/file_table.c > > > index 57edef16dce4..4f4739c9405c 100644 > > > --- a/fs/file_table.c > > > +++ b/fs/file_table.c > > > @@ -119,6 +119,7 @@ static struct ctl_table fs_stat_sysctls[] = { > > > static int __init init_fs_stat_sysctls(void) > > > { > > > register_sysctl_init("fs", fs_stat_sysctls); > > > + register_sysctl_mount_point("fs/binfmt_misc"); > > > return 0; > > > } > > > fs_initcall(init_fs_stat_sysctls); > > > > I'm looking at the original code, and it seems that if we don't select > > CONFIG_BINFMT_MISC at all, > > this file won't be created. This would suggest an IF MACRO around > > > + register_sysctl_mount_point("fs/binfmt_misc"); > > and it should looks like > > +#if IS_ENABLED(CONFIG_BINFMT_MISC) > > + register_sysctl_mount_point("fs/binfmt_misc"); > > +#endif > > or if you prefer original style: > > #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) > > Or better yet using IS_ENABLED() to avoid the ifdef cruft: > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > index c07f35719ee3..4b8f1b11a7c8 100644 > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -817,20 +817,16 @@ static struct file_system_type bm_fs_type = { > }; > MODULE_ALIAS_FS("binfmt_misc"); > > -static struct ctl_table_header *binfmt_misc_header; > - > static int __init init_misc_binfmt(void) > { > int err = register_filesystem(&bm_fs_type); > if (!err) > insert_binfmt(&misc_format); > - binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc"); > return 0; > } > > static void __exit exit_misc_binfmt(void) > { > - unregister_sysctl_table(binfmt_misc_header); > unregister_binfmt(&misc_format); > unregister_filesystem(&bm_fs_type); > } > diff --git a/fs/file_table.c b/fs/file_table.c > index 57edef16dce4..4969021fa676 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -119,6 +119,8 @@ static struct ctl_table fs_stat_sysctls[] = { > static int __init init_fs_stat_sysctls(void) > { > register_sysctl_init("fs", fs_stat_sysctls); > + if (IS_ENABLED(CONFIG_BINFMT_MISC)) > + register_sysctl_mount_point("fs/binfmt_misc"); > return 0; > } > fs_initcall(init_fs_stat_sysctls); Apologies for the late reply, I was notified the patch is added to the mm patchset and thought to be late for any update but now I'm not seeing it anywhere. Maybe it's been dropped? Forgot of IS_ENABLED(), I would surely use it in the v2. Not clear what's the benefit in registering the mount point in fs/file_table.c vs. kernel/sysctl.c. I'd keep it close to where people was used to find it, unless there is a good reason otherwise. Could you please elaborate? Thanks for your review! Dom
Index: b/fs/binfmt_misc.c =================================================================== --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -817,20 +817,16 @@ static struct file_system_type bm_fs_typ }; MODULE_ALIAS_FS("binfmt_misc"); -static struct ctl_table_header *binfmt_misc_header; - static int __init init_misc_binfmt(void) { int err = register_filesystem(&bm_fs_type); if (!err) insert_binfmt(&misc_format); - binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc"); - return 0; + return err; } static void __exit exit_misc_binfmt(void) { - unregister_sysctl_table(binfmt_misc_header); unregister_binfmt(&misc_format); unregister_filesystem(&bm_fs_type); } Index: b/kernel/sysctl.c =================================================================== --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2806,6 +2806,17 @@ static struct ctl_table vm_table[] = { { } }; +static struct ctl_table fs_table[] = { +#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) + { + .procname = "binfmt_misc", + .mode = 0555, + .child = sysctl_mount_point, + }, +#endif + { } +}; + static struct ctl_table debug_table[] = { #ifdef CONFIG_SYSCTL_EXCEPTION_TRACE { @@ -2825,6 +2836,7 @@ static struct ctl_table dev_table[] = { DECLARE_SYSCTL_BASE(kernel, kern_table); DECLARE_SYSCTL_BASE(vm, vm_table); +DECLARE_SYSCTL_BASE(fs, fs_table); DECLARE_SYSCTL_BASE(debug, debug_table); DECLARE_SYSCTL_BASE(dev, dev_table); @@ -2832,6 +2844,7 @@ int __init sysctl_init_bases(void) { register_sysctl_base(kernel); register_sysctl_base(vm); + register_sysctl_base(fs); register_sysctl_base(debug); register_sysctl_base(dev);
Commit 3ba442d5331f did not go unnoticed, binfmt-support stopped to work on my Debian system since v5.17-rc2 (did not check with -rc1). The existance of /proc/sys/fs/binfmt_misc is a precondition for attempting to mount the binfmt_misc fs, which in turn triggers the autoload of the binfmt_misc module. Without it, no module is loaded and no binfmt is available at boot. Building as built-in or manually loading the module and mounting the fs works fine, it's therefore only a matter of interaction with user-space. I could try to improve the Debian systemd configuration but I can't say anything about the other distributions. In the meanwhile this patch restores a working system right after boot. Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Xiaoming Ni <nixiaoming@huawei.com> --- fs/binfmt_misc.c | 6 +----- kernel/sysctl.c | 13 +++++++++++++ 2 files changed, 14 insertions(+), 5 deletions(-)