Message ID | 20171003185313.1017-3-mcgrof@kernel.org (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote: > +static bool super_allows_freeze(struct super_block *sb) > +{ > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > +} A minor comment: if "!!" would be left out the compiler will perform the conversion from int to bool implicitly so I propose to leave out the "!!" and parentheses. Anyway, I agree with the approach of this patch and I think that freezing filesystems before processes are frozen would be a big step forward. Bart.
On Tue, 3 Oct 2017, Luis R. Rodriguez wrote: > This uses the existing filesystem freeze and thaw callbacks to > freeze each filesystem on suspend/hibernation and thaw upon resume. > > This is needed so that we properly really stop IO in flight without > races after userspace has been frozen. Without this we rely on > kthread freezing and its semantics are loose and error prone. > For instance, even though a kthread may use try_to_freeze() and end > up being frozen we have no way of being sure that everything that > has been spawned asynchronously from it (such as timers) have also > been stopped as well. > > A long term advantage of also adding filesystem freeze / thawing > supporting durign suspend / hibernation is that long term we may > be able to eventually drop the kernel's thread freezing completely > as it was originally added to stop disk IO in flight as we hibernate > or suspend. > > This also implies that many kthread users exist which have been > adding freezer semantics onto its kthreads without need. These also > will need to be reviewed later. > > This is based on prior work originally by Rafael Wysocki and later by > Jiri Kosina. > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Thanks a lot for picking this up; I never found time to actually finalize it. Acked-by: Jiri Kosina <jkosina@suse.cz> for patches 2 and 5 (the fs agnostic code), which were in the core of my original work.
On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote: > On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote: > > +static bool super_allows_freeze(struct super_block *sb) > > +{ > > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > > +} > > A minor comment: if "!!" would be left out the compiler will perform the > conversion from int to bool implicitly For all compilers? > so I propose to leave out the "!!" and parentheses. OK! > Anyway, I agree with the approach of this patch and I think > that freezing filesystems before processes are frozen would be a big step > forward. Great! But please note, the current implementation calls fs_suspend_freeze() *after* try_to_freeze_tasks(), ie: this implementation freezes userspace and only after then filesystems. Order will be *critical* here to get right, so we should definitely figure out if this is definitely the right place (TM) to call fs_suspend_freeze(). Lastly, a final minor implementation note: I think using a PM notifier would have been much cleaner, in fact it was my the way I originally implemented this orthogonally to Jiri's work, however to get this right the semantics of __pm_notifier_call_chain() would need to be expanded with another state, perhaps PM_USERSPACE_FROZEN, for example. I decided in the end a new state was not worth it give we would just have one user: fs freezing. So to be clear using a notifier to wrap this code into the fs code and not touching kernel/power/process.c is not possible with today's semantics nor do I think its worth it to expand on these semantics. This approach is explicit about order and requirements for those that should care: those that will maintain kernel/power/process.c and friends. Having this in a notifier would shift this implicitly. Luis
On Tue, 2017-10-03 at 22:23 +0200, Luis R. Rodriguez wrote: > On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote: > > On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote: > > > +static bool super_allows_freeze(struct super_block *sb) > > > +{ > > > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > > > +} > > > > A minor comment: if "!!" would be left out the compiler will perform the > > conversion from int to bool implicitly > > For all compilers? Let's have a look at the output of the following commands: $ PAGER= git grep 'typedef.*[[:blank:]]bool;' include include/linux/types.h:typedef _Bool bool; $ PAGER= git grep std= Makefile Makefile: -fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS) Makefile: -std=gnu89 $(call cc-option,-fno-PIE) From https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/C-Dialect-Options.html#C-Dialect-Options: ‘gnu89’ GNU dialect of ISO C90 (including some C99 features). I think this means that the Linux kernel tree can only be compiled correctly by compilers that support the C11 type _Bool. > > Anyway, I agree with the approach of this patch and I think > > that freezing filesystems before processes are frozen would be a big step > > forward. > > Great! But please note, the current implementation calls fs_suspend_freeze() > *after* try_to_freeze_tasks(), ie: this implementation freezes userspace and > only after then filesystems. What will the impact be of that choice on filesystems implemented in userspace? Thanks, Bart.
On Tue, Oct 03, 2017 at 08:32:39PM +0000, Bart Van Assche wrote: > On Tue, 2017-10-03 at 22:23 +0200, Luis R. Rodriguez wrote: > > On Tue, Oct 03, 2017 at 08:02:22PM +0000, Bart Van Assche wrote: > > > On Tue, 2017-10-03 at 11:53 -0700, Luis R. Rodriguez wrote: > > > > +static bool super_allows_freeze(struct super_block *sb) > > > > +{ > > > > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > > > > +} > > > > > > A minor comment: if "!!" would be left out the compiler will perform the > > > conversion from int to bool implicitly > > > > For all compilers? > > Let's have a look at the output of the following commands: > > $ PAGER= git grep 'typedef.*[[:blank:]]bool;' include > include/linux/types.h:typedef _Bool bool; > $ PAGER= git grep std= Makefile > Makefile: -fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS) > Makefile: -std=gnu89 $(call cc-option,-fno-PIE) > > From https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/C-Dialect-Options.html#C-Dialect-Options: > ‘gnu89’ > GNU dialect of ISO C90 (including some C99 features). > > I think this means that the Linux kernel tree can only be compiled correctly > by compilers that support the C11 type _Bool. :*) beautiful, thanks. > > > Anyway, I agree with the approach of this patch and I think > > > that freezing filesystems before processes are frozen would be a big step > > > forward. > > > > Great! But please note, the current implementation calls fs_suspend_freeze() > > *after* try_to_freeze_tasks(), ie: this implementation freezes userspace and > > only after then filesystems. > > What will the impact be of that choice on filesystems implemented in userspace? Depends on what kernel hooks those use? Also now is a good time for those working on userspace filesystems to chime in :) Its why I am stating -- I am not saying I have found the right order, I have find the right order that works for me, and we need consensus on what the right order should be. Luis
On Tue, Oct 03, 2017 at 11:53:10AM -0700, Luis R. Rodriguez wrote: > This uses the existing filesystem freeze and thaw callbacks to > freeze each filesystem on suspend/hibernation and thaw upon resume. > > This is needed so that we properly really stop IO in flight without > races after userspace has been frozen. Without this we rely on > kthread freezing and its semantics are loose and error prone. > For instance, even though a kthread may use try_to_freeze() and end > up being frozen we have no way of being sure that everything that > has been spawned asynchronously from it (such as timers) have also > been stopped as well. > > A long term advantage of also adding filesystem freeze / thawing > supporting durign suspend / hibernation is that long term we may > be able to eventually drop the kernel's thread freezing completely > as it was originally added to stop disk IO in flight as we hibernate > or suspend. > > This also implies that many kthread users exist which have been > adding freezer semantics onto its kthreads without need. These also > will need to be reviewed later. > > This is based on prior work originally by Rafael Wysocki and later by > Jiri Kosina. > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> > --- > fs/super.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/fs.h | 13 +++++++++ > kernel/power/process.c | 14 ++++++++- > 3 files changed, 105 insertions(+), 1 deletion(-) > > diff --git a/fs/super.c b/fs/super.c > index d45e92d9a38f..ce8da8b187b1 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -1572,3 +1572,82 @@ int thaw_super(struct super_block *sb) > return 0; > } > EXPORT_SYMBOL(thaw_super); > + > +#ifdef CONFIG_PM_SLEEP > +static bool super_allows_freeze(struct super_block *sb) > +{ > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > +} That's a completely misleading function name. All superblocks can be frozen - freeze_super() is filesystem independent. And given that, I don't see why these super_should_freeze() hoops need to be jumped through... > + > +static bool super_should_freeze(struct super_block *sb) > +{ > + if (!sb->s_root) > + return false; > + if (!(sb->s_flags & MS_BORN)) > + return false; > + /* > + * We don't freeze virtual filesystems, we skip those filesystems with > + * no backing device. > + */ > + if (sb->s_bdi == &noop_backing_dev_info) > + return false; > + /* No need to freeze read-only filesystems */ > + if (sb->s_flags & MS_RDONLY) > + return false; > + if (!super_allows_freeze(sb)) > + return false; > + > + return true; > +} > + > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) > +{ > + int error = 0; > + > + spin_lock(&sb_lock); > + if (!super_should_freeze(sb)) > + goto out; > + > + up_read(&sb->s_umount); > + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); > + error = freeze_super(sb); > + down_read(&sb->s_umount); > +out: > + if (error && error != -EBUSY) > + pr_notice("%s (%s): Unable to freeze, error=%d", > + sb->s_type->name, sb->s_id, error); > + spin_unlock(&sb_lock); > + return error; > +} I don't think this was ever tested. Calling freeze_super() with a spinlock held with through "sleeping in atomic" errors all over the place. Also, the s_umount lock juggling is nasty. Your new copy+pasted iterate_supers_reverse() takes the lock in read mode, yet all the freeze/thaw callers here want to take it in write mode. So, really, iterate_supers_reverse() needs to be iterate_supers_reverse_excl() and take the write lock, and freeze_super/thaw_super need to be factored into locked and unlocked versions. In which case, we end up with: int fs_suspend_freeze_sb(struct super_block *sb, void *priv) { return freeze_locked_super(sb); } int fs_suspend_thaw_sb(struct super_block *sb, void *priv) { return thaw_locked_super(sb); } Cheers, Dave.
On Wed, Oct 04, 2017 at 07:58:41AM +1100, Dave Chinner wrote: > On Tue, Oct 03, 2017 at 11:53:10AM -0700, Luis R. Rodriguez wrote: > > diff --git a/fs/super.c b/fs/super.c > > index d45e92d9a38f..ce8da8b187b1 100644 > > --- a/fs/super.c > > +++ b/fs/super.c > > @@ -1572,3 +1572,82 @@ int thaw_super(struct super_block *sb) > > return 0; > > } > > EXPORT_SYMBOL(thaw_super); > > + > > +#ifdef CONFIG_PM_SLEEP > > +static bool super_allows_freeze(struct super_block *sb) > > +{ > > + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); > > +} > > That's a completely misleading function name. All superblocks can be > frozen - freeze_super() is filesystem independent. And given that, I > don't see why these super_should_freeze() hoops need to be jumped > through... I added this given ext4's current thaw implementation stalls on resume as it implicates a bio_submit() and this never completes. So I refactored ext4 thaw skip a sync on thaw. This requires more eyeballs. This may be an underlying issue elsewhere. If its not an bug elsewhere or on ordering, then there may be some restrictions on thaw when used on resume. Refer to my notes to Ted on patch #4 for ext4. > > +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) > > +{ > > + int error = 0; > > + > > + spin_lock(&sb_lock); > > + if (!super_should_freeze(sb)) > > + goto out; > > + > > + up_read(&sb->s_umount); > > + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); > > + error = freeze_super(sb); > > + down_read(&sb->s_umount); > > +out: > > + if (error && error != -EBUSY) > > + pr_notice("%s (%s): Unable to freeze, error=%d", > > + sb->s_type->name, sb->s_id, error); > > + spin_unlock(&sb_lock); > > + return error; > > +} > > I don't think this was ever tested. Calling freeze_super() with a > spinlock held with through "sleeping in atomic" errors all over the > place. No, I run time tested it with a rootfs with ext4 and xfs with my development tree and hammering on both while I loop on suspend and resume. > Also, the s_umount lock juggling is nasty. Your new copy+pasted > iterate_supers_reverse() takes the lock in read mode, yet all the > freeze/thaw callers here want to take it in write mode. Yeap, yuk! > So, really, > iterate_supers_reverse() needs to be iterate_supers_reverse_excl() > and take the write lock, and freeze_super/thaw_super need to be > factored into locked and unlocked versions. > > In which case, we end up with: > > int fs_suspend_freeze_sb(struct super_block *sb, void *priv) > { > return freeze_locked_super(sb); > } > > int fs_suspend_thaw_sb(struct super_block *sb, void *priv) > { > return thaw_locked_super(sb); > } Groovy, thanks, I suspected that locking was convoluted and we could come up with something better. Will do it this way. Its really what I hoped we could do :) I just needed to get it in writing. Luis
diff --git a/fs/super.c b/fs/super.c index d45e92d9a38f..ce8da8b187b1 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1572,3 +1572,82 @@ int thaw_super(struct super_block *sb) return 0; } EXPORT_SYMBOL(thaw_super); + +#ifdef CONFIG_PM_SLEEP +static bool super_allows_freeze(struct super_block *sb) +{ + return !!(sb->s_type->fs_flags & FS_FREEZE_ON_SUSPEND); +} + +static bool super_should_freeze(struct super_block *sb) +{ + if (!sb->s_root) + return false; + if (!(sb->s_flags & MS_BORN)) + return false; + /* + * We don't freeze virtual filesystems, we skip those filesystems with + * no backing device. + */ + if (sb->s_bdi == &noop_backing_dev_info) + return false; + /* No need to freeze read-only filesystems */ + if (sb->s_flags & MS_RDONLY) + return false; + + if (!super_allows_freeze(sb)) + return false; + + return true; +} + +int fs_suspend_freeze_sb(struct super_block *sb, void *priv) +{ + int error = 0; + + spin_lock(&sb_lock); + if (!super_should_freeze(sb)) + goto out; + + up_read(&sb->s_umount); + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id); + error = freeze_super(sb); + down_read(&sb->s_umount); +out: + if (error && error != -EBUSY) + pr_notice("%s (%s): Unable to freeze, error=%d", + sb->s_type->name, sb->s_id, error); + spin_unlock(&sb_lock); + return error; +} + +int fs_suspend_freeze(void) +{ + return iterate_supers_reverse(fs_suspend_freeze_sb, NULL); +} + +void fs_resume_unfreeze_sb(struct super_block *sb, void *priv) +{ + int error = 0; + + spin_lock(&sb_lock); + if (!super_should_freeze(sb)) + goto out; + + up_read(&sb->s_umount); + pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id); + error = thaw_super(sb); + down_read(&sb->s_umount); +out: + if (error && error != -EBUSY) + pr_notice("%s (%s): Unable to unfreeze, error=%d", + sb->s_type->name, sb->s_id, error); + spin_unlock(&sb_lock); +} + +void fs_resume_unfreeze(void) +{ + iterate_supers(fs_resume_unfreeze_sb, NULL); +} + +#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index cd084792cf39..7754230d6afc 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2071,6 +2071,7 @@ struct file_system_type { #define FS_BINARY_MOUNTDATA 2 #define FS_HAS_SUBTYPE 4 #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ +#define FS_FREEZE_ON_SUSPEND 16 /* should filesystem freeze on suspend? */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); @@ -2172,6 +2173,18 @@ extern int fd_statfs(int, struct kstatfs *); extern int vfs_ustat(dev_t, struct kstatfs *); extern int freeze_super(struct super_block *super); extern int thaw_super(struct super_block *super); +#ifdef CONFIG_PM_SLEEP +int fs_suspend_freeze(void); +void fs_resume_unfreeze(void); +#else +static inline int fs_suspend_freeze(void) +{ + return 0; +} +static inline void fs_resume_unfreeze(void) +{ +} +#endif extern bool our_mnt(struct vfsmount *mnt); extern __printf(2, 3) int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); diff --git a/kernel/power/process.c b/kernel/power/process.c index 50f25cb370c6..9d1277768312 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -144,6 +144,15 @@ int freeze_processes(void) pr_cont("\n"); BUG_ON(in_atomic()); + pr_info("Freezing filesystems ... "); + error = fs_suspend_freeze(); + if (error) { + pr_cont("failed\n"); + fs_resume_unfreeze(); + return error; + } + pr_cont("done.\n"); + /* * Now that the whole userspace is frozen we need to disbale * the OOM killer to disallow any further interference with @@ -153,8 +162,10 @@ int freeze_processes(void) if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs))) error = -EBUSY; - if (error) + if (error) { thaw_processes(); + fs_resume_unfreeze(); + } return error; } @@ -197,6 +208,7 @@ void thaw_processes(void) pm_nosig_freezing = false; oom_killer_enable(); + fs_resume_unfreeze(); pr_info("Restarting tasks ... ");
This uses the existing filesystem freeze and thaw callbacks to freeze each filesystem on suspend/hibernation and thaw upon resume. This is needed so that we properly really stop IO in flight without races after userspace has been frozen. Without this we rely on kthread freezing and its semantics are loose and error prone. For instance, even though a kthread may use try_to_freeze() and end up being frozen we have no way of being sure that everything that has been spawned asynchronously from it (such as timers) have also been stopped as well. A long term advantage of also adding filesystem freeze / thawing supporting durign suspend / hibernation is that long term we may be able to eventually drop the kernel's thread freezing completely as it was originally added to stop disk IO in flight as we hibernate or suspend. This also implies that many kthread users exist which have been adding freezer semantics onto its kthreads without need. These also will need to be reviewed later. This is based on prior work originally by Rafael Wysocki and later by Jiri Kosina. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> --- fs/super.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 13 +++++++++ kernel/power/process.c | 14 ++++++++- 3 files changed, 105 insertions(+), 1 deletion(-)