Message ID | 20250402-work-freeze-v2-2-6719a97b52ac@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | power: wire-up filesystem freeze/thaw with suspend/resume | expand |
On Wed, Apr 02, 2025 at 04:07:32PM +0200, Christian Brauner wrote: > During freeze/thaw we need to be able to freeze all writers during > suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a > file and write to it will not be frozen after we've already frozen the > filesystem. > > This has some risk of not being able to freeze processes in case a > process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or > SB_FREEZE_INTERNAL under some other filesytem specific lock. If the > filesystem is frozen, a task can block on the frozen filesystem with > e.g., mmap_sem held. If some other task then blocks on grabbing that > mmap_sem, hibernation ill fail because it is unable to hibernate a task > holding mmap_sem. This could be fixed by making a range of filesystem > related locks use freezable sleeping. That's impractical and not > warranted just for suspend/hibernate. Assume that this is an infrequent > problem and we've given userspace a way to skip filesystem freezing > through a sysfs file. > > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > include/linux/fs.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b379a46b5576..1edcba3cd68e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level) > > static inline void __sb_start_write(struct super_block *sb, int level) > { > - percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, > - level == SB_FREEZE_WRITE); > + percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true); > } Jan, one more thought about freezability here. We know that there will can be at least one process during hibernation that ends up generating page faults and that's systemd-journald. When systemd-sleep requests writing a hibernation image via /sys/power/ files it will inevitably end up freezing systemd-journald and it may be generating a page fault with ->mmap_lock held. systemd-journald is now sleeping with SB_FREEZE_PAGEFAULT and TASK_FREEZABLE. We know this can cause hibernation to fail. That part is fine. What isn't is that we will very likely always trigger: #ifdef CONFIG_LOCKDEP /* * It's dangerous to freeze with locks held; there be dragons there. */ if (!(state & __TASK_FREEZABLE_UNSAFE)) WARN_ON_ONCE(debug_locks && p->lockdep_depth); #endif with lockdep enabled. So we really actually need percpu_rswem_read_freezable_unsafe(), i.e., TASK_FREEZABLE_UNSAFE.
On Wed, 2025-04-02 at 17:32 +0200, Christian Brauner wrote: [...] > Jan, one more thought about freezability here. We know that there > will can be at least one process during hibernation that ends up > generating page faults and that's systemd-journald. When systemd- > sleep requests writing a hibernation image via /sys/power/ files it > will inevitably end up freezing systemd-journald and it may be > generating a page fault with ->mmap_lock held. systemd-journald is > now sleeping with SB_FREEZE_PAGEFAULT and TASK_FREEZABLE. We know > this can cause hibernation to fail. That part is fine. What isn't is > that we will very likely always trigger: > > #ifdef CONFIG_LOCKDEP > /* > * It's dangerous to freeze with locks held; there be dragons > there. > */ > if (!(state & __TASK_FREEZABLE_UNSAFE)) > WARN_ON_ONCE(debug_locks && p->lockdep_depth); > #endif > > with lockdep enabled. > > So we really actually need percpu_rswem_read_freezable_unsafe(), > i.e., TASK_FREEZABLE_UNSAFE. The sched people have pretty strong views about people not doing this, expressed in the comment in sched.h and commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic") where most of the _unsafe variants got removed with prejudice. If we do get into this situation the worst that can happen is that another upper lock acquisition triggers a hibernate failure and we thaw everything, thus we can never truly deadlock, which is the fear, so perhaps they might be OK with this. Note that Rafael's solution to this was to disable lockdep around hibernate/suspend and resume, which is another possibility. Regards, James
On Wed, Apr 02, 2025 at 12:03:24PM -0400, James Bottomley wrote: > On Wed, 2025-04-02 at 17:32 +0200, Christian Brauner wrote: > [...] > > Jan, one more thought about freezability here. We know that there > > will can be at least one process during hibernation that ends up > > generating page faults and that's systemd-journald. When systemd- > > sleep requests writing a hibernation image via /sys/power/ files it > > will inevitably end up freezing systemd-journald and it may be > > generating a page fault with ->mmap_lock held. systemd-journald is > > now sleeping with SB_FREEZE_PAGEFAULT and TASK_FREEZABLE. We know > > this can cause hibernation to fail. That part is fine. What isn't is > > that we will very likely always trigger: > > > > #ifdef CONFIG_LOCKDEP > > /* > > * It's dangerous to freeze with locks held; there be dragons > > there. > > */ > > if (!(state & __TASK_FREEZABLE_UNSAFE)) > > WARN_ON_ONCE(debug_locks && p->lockdep_depth); > > #endif > > > > with lockdep enabled. > > > > So we really actually need percpu_rswem_read_freezable_unsafe(), > > i.e., TASK_FREEZABLE_UNSAFE. > > The sched people have pretty strong views about people not doing this, > expressed in the comment in sched.h and commit f5d39b020809 > ("freezer,sched: Rewrite core freezer logic") where most of the _unsafe > variants got removed with prejudice. > > If we do get into this situation the worst that can happen is that > another upper lock acquisition triggers a hibernate failure and we thaw > everything, thus we can never truly deadlock, which is the fear, so Yes, I know that it's harmless but we need to not generate misleading lockdep splats when lockdep is turned on and confuse users. > perhaps they might be OK with this. Note that Rafael's solution to > this was to disable lockdep around hibernate/suspend and resume, which > is another possibility. It can be done as a follow-up. I'm just saying it needs treatment.
On Wed 02-04-25 16:07:32, Christian Brauner wrote: > During freeze/thaw we need to be able to freeze all writers during > suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a > file and write to it will not be frozen after we've already frozen the > filesystem. > > This has some risk of not being able to freeze processes in case a > process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or > SB_FREEZE_INTERNAL under some other filesytem specific lock. If the > filesystem is frozen, a task can block on the frozen filesystem with > e.g., mmap_sem held. If some other task then blocks on grabbing that > mmap_sem, hibernation ill fail because it is unable to hibernate a task > holding mmap_sem. This could be fixed by making a range of filesystem > related locks use freezable sleeping. That's impractical and not > warranted just for suspend/hibernate. Assume that this is an infrequent > problem and we've given userspace a way to skip filesystem freezing > through a sysfs file. > > Signed-off-by: Christian Brauner <brauner@kernel.org> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> BTW, I agree about the need to silence lockdep somehow. Honza > --- > include/linux/fs.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b379a46b5576..1edcba3cd68e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level) > > static inline void __sb_start_write(struct super_block *sb, int level) > { > - percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, > - level == SB_FREEZE_WRITE); > + percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true); > } > > static inline bool __sb_start_write_trylock(struct super_block *sb, int level) > > -- > 2.47.2 >
diff --git a/include/linux/fs.h b/include/linux/fs.h index b379a46b5576..1edcba3cd68e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level) static inline void __sb_start_write(struct super_block *sb, int level) { - percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, - level == SB_FREEZE_WRITE); + percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true); } static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
During freeze/thaw we need to be able to freeze all writers during suspend/hibernate. Otherwise tasks such as systemd-journald that mmap a file and write to it will not be frozen after we've already frozen the filesystem. This has some risk of not being able to freeze processes in case a process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or SB_FREEZE_INTERNAL under some other filesytem specific lock. If the filesystem is frozen, a task can block on the frozen filesystem with e.g., mmap_sem held. If some other task then blocks on grabbing that mmap_sem, hibernation ill fail because it is unable to hibernate a task holding mmap_sem. This could be fixed by making a range of filesystem related locks use freezable sleeping. That's impractical and not warranted just for suspend/hibernate. Assume that this is an infrequent problem and we've given userspace a way to skip filesystem freezing through a sysfs file. Signed-off-by: Christian Brauner <brauner@kernel.org> --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)