Message ID | 20230114003409.1168311-6-mcgrof@kernel.org |
---|---|
State | New, archived |
Series | vfs: provide automatic kernel freeze / resume |
On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> +	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
> +		return false;

This check is used.

> +	/*
> +	 * We don't freeze virtual filesystems, we skip those filesystems with
> +	 * no backing device.
> +	 */
> +	if (sb->s_bdi == &noop_backing_dev_info)
> +		return false;

However, I had dropped this check and forgot to update my branch.

> +
> +	return true;
> +}

So the call to super_should_freeze() is removed and the check for the
flag is open coded.

  Luis
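As a rough illustration of what Luis describes (no super_should_freeze() helper, no noop_backing_dev_info check, and the FS_AUTOFREEZE test open coded in the freeze callback), here is a userspace model. It is not the code from Luis's updated branch. The structs are hypothetical stand-ins for the kernel's struct file_system_type and struct super_block, model_suspend_freeze_sb() is a made-up name, and locking and freeze_super() are elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
#define FS_AUTOFREEZE (1 << 16) /* same bit value as the flag in the patch */

struct file_system_type {
	const char *name;
	int fs_flags;
};

struct super_block {
	const struct file_system_type *s_type;
	bool frozen;
};

/*
 * Freeze callback with the FS_AUTOFREEZE test open coded. Locking,
 * freeze_super() and error reporting are elided; "freezing" only sets
 * a flag in this model.
 */
static int model_suspend_freeze_sb(struct super_block *sb)
{
	assert(sb != NULL && sb->s_type != NULL);

	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
		return 0; /* not converted yet: left to the kthread freezer */

	sb->frozen = true;
	return 0;
}
```

Only superblocks whose filesystem type has opted in with FS_AUTOFREEZE change state; every other superblock is skipped and the callback still returns 0, so an unconverted filesystem never aborts the walk.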
On Fri 13-01-23 16:33:50, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
[...]
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Looks good to me.
Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
>
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
>
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.

Hooray!

One evil question though --

Say you have dm devices A and B.  Each has a distinct fs on it.
If you mount A and then B and initiate a suspend, that should result in
first B and then A freezing, right?

After resuming, you then change A's dm-table definition to point it
at a loop device backed by a file on B.

What happens now when you initiate a suspend?  B freezes, then A tries
to flush data to the loop-mounted file on B, but it's too late for that.
That sounds like a deadlock?

Though I don't know how much we care about this corner case, since (a)
freezing has been busted on xfs for years and (b) one can think up all
sorts of horrid ouroboros scenarios like:

Change A's dm-table to point to a loop-mounted file on B, and change
B to point to a loop-mounted file on A.  Then try to write to either
filesystem and see what kind of storm you get.

Anyway, just wondering if you'd thought about that kind of doomsday
scenario that a nutty sysadmin could set up.
The only way I can think of to solve that kind of thing would be to hook
filesystems and loop devices into the device model, make fs "device"
suspend actually freeze, hope the suspend code suspends from the leaves
inward, and hope I actually understand how the device model works (I
don't.)

--D
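Darrick's dm/loop scenario can be modeled in a few lines of userspace C. Everything below is hypothetical: struct mount_model stands in for a superblock plus its "writes through" relationship, and freeze_reverse_mount_order() only mimics the newest-first walk the patch gets from iterate_supers_reverse_excl(). Freezing is modeled as "flush, then mark frozen"; a flush that must go through an already frozen filesystem is reported instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a mounted filesystem. backed_by is non-NULL when
 * its block device is really a loop device backed by a file on another
 * mounted filesystem (e.g. after a dm table change).
 */
struct mount_model {
	const char *name;
	struct mount_model *backed_by;
	bool frozen;
};

/*
 * Walk mounts[] (kept in mount order) backwards, freezing newest first.
 * Freezing flushes dirty data; if that data must be written through a
 * filesystem that is already frozen, the flush cannot complete. Returns
 * the mount-order index of the first filesystem that would block, or -1
 * if every filesystem froze.
 */
static int freeze_reverse_mount_order(struct mount_model *mounts[], size_t n)
{
	for (size_t i = n; i-- > 0;) {
		struct mount_model *m = mounts[i];

		assert(m != NULL);
		if (m->backed_by && m->backed_by->frozen)
			return (int)i; /* would wait forever on a frozen fs */
		m->frozen = true;
	}
	return -1;
}
```

With two independent mounts the walk freezes B and then A without trouble. Once A is backed by a file on B, B is frozen first and A (index 0) is reported as blocked, which is the deadlock Darrick asks about.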
On Thu, Feb 23, 2023 at 07:08:37PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 13, 2023 at 04:33:50PM -0800, Luis Chamberlain wrote:
> > Add support to automatically handle freezing and thawing filesystems
> > during the kernel's suspend/resume cycle.
[...]
>
> Hooray!
>
> One evil question though --
>
> Say you have dm devices A and B.  Each has a distinct fs on it.
> If you mount A and then B and initiate a suspend, that should result in
> first B and then A freezing, right?
>
> After resuming, you then change A's dm-table definition to point it
> at a loop device backed by a file on B.
>
> What happens now when you initiate a suspend?  B freezes, then A tries
> to flush data to the loop-mounted file on B, but it's too late for that.
> That sounds like a deadlock?
>
> Though I don't know how much we care about this corner case,

As you suggest, this is not the only corner case one could come up
with. There was that evil ioctl added years ago to allow flipping an
installed system booted from a USB or ISO over to the real, freshly
installed root mount point.

To make this bulletproof we'll eventually need to add a simple graph
implementation to keep track of ordering requirements between
superblocks. I have some C code which tries to implement a graph
Linux-style, but since these are all corner cases at this time, I think
it's best we first fix suspend for the common case and later address a
proper graph solution.

> Anyway, just wondering if you'd thought about that kind of doomsday
> scenario that a nutty sysadmin could set up.
>
> The only way I can think of to solve that kind of thing would be to hook
> filesystems and loop devices into the device model, make fs "device"
> suspend actually freeze, hope the suspend code suspends from the leaves
> inward, and hope I actually understand how the device model works (I
> don't.)

There are probably really odd things one can do, and one thing I think
we can do later is simply annotate those cases and, over time, *not*
allow auto-freeze for those horrible situations. A real long-term
solution, I think, will involve a graph.

  Luis
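One way the ordering graph Luis mentions could work is a topological sort over "writes through" dependencies. The sketch below uses Kahn's algorithm. It is a hypothetical illustration, not the C code Luis refers to: freeze_order() and MAX_SB are made-up names, and a dependency cycle (Darrick's ouroboros case) is reported as -1, which a caller could treat as "do not auto-freeze", in the spirit of annotating and refusing those setups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SB 8 /* hypothetical limit for this sketch */

/*
 * deps[u][v] == true means superblock u writes through superblock v
 * (e.g. via a loop device), so u must be frozen before v. Fills order[]
 * with a safe freeze order using Kahn's algorithm and returns 0, or
 * returns -1 if the dependencies form a cycle, in which case no freeze
 * order is safe.
 */
static int freeze_order(size_t n, bool deps[MAX_SB][MAX_SB], size_t order[])
{
	size_t indeg[MAX_SB] = { 0 };
	size_t queue[MAX_SB], head = 0, tail = 0, count = 0;

	assert(n <= MAX_SB);
	for (size_t u = 0; u < n; u++)
		for (size_t v = 0; v < n; v++)
			if (deps[u][v])
				indeg[v]++;

	/* Start with superblocks that no other superblock writes through. */
	for (size_t v = 0; v < n; v++)
		if (indeg[v] == 0)
			queue[tail++] = v;

	while (head < tail) {
		size_t u = queue[head++];

		order[count++] = u;
		/* Once u is frozen, whatever it writes through may follow. */
		for (size_t v = 0; v < n; v++)
			if (deps[u][v] && --indeg[v] == 0)
				queue[tail++] = v;
	}
	return count == n ? 0 : -1;
}
```

For Darrick's A-on-B setup (A = 0, B = 1, deps[0][1] set) the sketch yields A before B, the opposite of reverse mount order. Adding the B-on-A edge as well makes it return -1.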
diff --git a/fs/super.c b/fs/super.c
index 2f77fcb6e555..e8af4c8269ad 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1853,3 +1853,72 @@ int thaw_super(struct super_block *sb, bool usercall)
 	return 0;
 }
 EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
+		return false;
+	/*
+	 * We don't freeze virtual filesystems, we skip those filesystems with
+	 * no backing device.
+	 */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return false;
+
+	return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!grab_lock_super(sb)) {
+		pr_err("%s (%s): freezing failed to grab_super()\n",
+		       sb->s_type->name, sb->s_id);
+		return -ENOTTY;
+	}
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+	error = freeze_super(sb, false);
+	if (!error)
+		lockdep_sb_freeze_release(sb);
+	else if (error != -EBUSY)
+		pr_notice("%s (%s): Unable to freeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+
+out:
+	deactivate_locked_super(sb);
+	return error;
+}
+
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!grab_lock_super(sb)) {
+		pr_err("%s (%s): thawing failed to grab_super()\n",
+		       sb->s_type->name, sb->s_id);
+		return -ENOTTY;
+	}
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+	error = thaw_super(sb, false);
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to unfreeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+
+out:
+	deactivate_locked_super(sb);
+	return error;
+}
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f168e72f6ca1..e5bee359e804 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2231,6 +2231,7 @@ struct file_system_type {
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
+#define FS_AUTOFREEZE (1<<16)	/* temporary as we phase kthread freezer out */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
 	struct dentry *(*mount) (struct file_system_type *, int,
@@ -2306,6 +2307,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 extern int freeze_super(struct super_block *super, bool usercall);
 extern int thaw_super(struct super_block *super, bool usercall);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
+#else
+static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+#endif
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 6c1c7e566d35..1dd6b0b6b4e5 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -140,6 +140,16 @@ int freeze_processes(void)

 	BUG_ON(in_atomic());

+	pr_info("Freezing filesystems ... ");
+	error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+	if (error) {
+		pr_cont("failed\n");
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+		thaw_processes();
+		return error;
+	}
+	pr_cont("done.\n");
+
 	/*
 	 * Now that the whole userspace is frozen we need to disable
 	 * the OOM killer to disallow any further interference with
@@ -149,8 +159,10 @@ int freeze_processes(void)
 	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
 		error = -EBUSY;

-	if (error)
+	if (error) {
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
 		thaw_processes();
+	}
 	return error;
 }

@@ -188,6 +200,7 @@ void thaw_processes(void)
 	pm_nosig_freezing = false;

 	oom_killer_enable();
+	iterate_supers_excl(fs_suspend_thaw_sb, NULL);

 	pr_info("Restarting tasks ... ");
Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.

This is needed so that we reliably stop IO in flight, without races,
after userspace has been frozen. Without this we rely on kthread
freezing, whose semantics are loose and error prone. For instance,
even though a kthread may use try_to_freeze() and end up being frozen,
we have no way of being sure that everything that has been spawned
asynchronously from it (such as timers) has also been stopped.

A long-term advantage of also adding filesystem freeze / thaw support
during suspend / hibernation is that we may eventually be able to drop
the kernel's thread freezing completely, as it was originally added to
stop disk IO in flight as we hibernate or suspend.

This does not remove the superfluous freezer calls from all
filesystems. Each filesystem must remove all of its kthread freezer
usage and set the FS_AUTOFREEZE flag in its fs_type flags to indicate
that it supports auto-freezing.

Subsequent patches remove the kthread freezer usage from each
filesystem, one at a time, to keep all this work bisectable. Once all
filesystems have stopped using the kthread freezer we can remove the
FS_AUTOFREEZE flag.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/super.c             | 69 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     | 14 +++++++++
 kernel/power/process.c | 15 ++++++++-
 3 files changed, 97 insertions(+), 1 deletion(-)
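The freeze_processes() hunk in this patch freezes superblocks in reverse list order (roughly newest mount first) and, if that fails, thaws them and restarts tasks. Below is a minimal userspace model of that rollback shape. The struct, the -EIO failure, and the assumption that the reverse walk stops at the first non-zero return are all hypothetical, and thawing a superblock that was never frozen is treated as a no-op in this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a superblock. */
struct sb_model {
	bool can_freeze; /* false makes the model's freeze fail */
	bool frozen;
};

static int model_freeze(struct sb_model *sb)
{
	if (!sb->can_freeze)
		return -EIO; /* arbitrary error chosen for the model */
	sb->frozen = true;
	return 0;
}

/* Thawing a superblock that is not frozen is a no-op in this model. */
static void model_thaw(struct sb_model *sb)
{
	sb->frozen = false;
}

/*
 * Same shape as the filesystem step added to freeze_processes(): freeze
 * in reverse list order, and on the first error thaw every superblock in
 * forward order before returning that error. Assumes the reverse walk
 * stops at the first non-zero return.
 */
static int model_freeze_filesystems(struct sb_model sbs[], size_t n)
{
	int error = 0;

	assert(sbs != NULL || n == 0);
	for (size_t i = n; i-- > 0;) {
		error = model_freeze(&sbs[i]);
		if (error)
			break;
	}
	if (error) {
		for (size_t i = 0; i < n; i++)
			model_thaw(&sbs[i]);
	}
	return error;
}
```

If the oldest entry fails after the two newer ones have frozen, the function returns the error and leaves all three thawed, which is the state the rollback path is meant to restore before tasks restart.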