Message ID | 20200413193718.956985775@gmail.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | Preparing to phase out uswsusp | expand |
On Monday, April 13, 2020 9:08:45 PM CEST Domenico Andreoli wrote: > From: Domenico Andreoli <domenico.andreoli@linux.com> > > uswsusp is no longer the preferred way to suspend/hibernate and the > userspace tools have not received any update in years. > > Make it possible to enable the uswsusp support only if needed, prepare > for future phase out. > > Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > Cc: Pavel Machek <pavel@ucw.cz> > Cc: Linux PM <linux-pm@vger.kernel.org> > > --- > kernel/power/Kconfig | 5 +++++ > kernel/power/Makefile | 3 ++- > kernel/power/power.h | 5 +++++ > 3 files changed, 12 insertions(+), 1 deletion(-) > > Index: b/kernel/power/Kconfig > =================================================================== > --- a/kernel/power/Kconfig > +++ b/kernel/power/Kconfig > @@ -80,6 +80,11 @@ config HIBERNATION > > For more information take a look at <file:Documentation/power/swsusp.rst>. > > +config HIBERNATION_USER > + bool "Userspace software suspend interface (DEPRECATED)" > + depends on HIBERNATION > + default n This needs to be "default y" for the time being. Also, I would call the option HIBERNATION_SNAPSHOT_DEV, because it effectively controls whether or not the snapshot device is available. > + > config PM_STD_PARTITION > string "Default resume partition" > depends on HIBERNATION > Index: b/kernel/power/Makefile > =================================================================== > --- a/kernel/power/Makefile > +++ b/kernel/power/Makefile > @@ -10,7 +10,8 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += consol > obj-$(CONFIG_FREEZER) += process.o > obj-$(CONFIG_SUSPEND) += suspend.o > obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o > -obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o > +obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o > +obj-$(CONFIG_HIBERNATION_USER) += user.o > obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o > obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o > > Index: b/kernel/power/power.h > =================================================================== > --- a/kernel/power/power.h > +++ b/kernel/power/power.h > @@ -158,8 +158,13 @@ extern sector_t alloc_swapdev_block(int > extern void free_all_swap_pages(int swap); > extern int swsusp_swap_in_use(void); > > +#ifdef CONFIG_HIBERNATION_USER > bool swsusp_try_enter(void); > void swsusp_leave(void); > +#else > +static inline bool swsusp_try_enter(void) { return 1; } > +static inline void swsusp_leave(void) {} > +#endif It is possible in theory that two processes write "disk" to /sys/power/state concurrently. Is there enough mutual exclusion in place to handle this gracefully after the above change? > > /* > * Flags that can be passed from the hibernatig hernel to the "boot" kernel in > >
On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > On Monday, April 13, 2020 9:08:45 PM CEST Domenico Andreoli wrote: > > From: Domenico Andreoli <domenico.andreoli@linux.com> > > > > uswsusp is no longer the preferred way to suspend/hibernate and the > > userspace tools have not received any update in years. > > > > Make it possible to enable the uswsusp support only if needed, prepare > > for future phase out. > > > > Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > > Cc: Pavel Machek <pavel@ucw.cz> > > Cc: Linux PM <linux-pm@vger.kernel.org> > > > > --- > > kernel/power/Kconfig | 5 +++++ > > kernel/power/Makefile | 3 ++- > > kernel/power/power.h | 5 +++++ > > 3 files changed, 12 insertions(+), 1 deletion(-) > > > > Index: b/kernel/power/Kconfig > > =================================================================== > > --- a/kernel/power/Kconfig > > +++ b/kernel/power/Kconfig > > @@ -80,6 +80,11 @@ config HIBERNATION > > > > For more information take a look at <file:Documentation/power/swsusp.rst>. > > > > +config HIBERNATION_USER > > + bool "Userspace software suspend interface (DEPRECATED)" > > + depends on HIBERNATION > > + default n > > This needs to be "default y" for the time being. > > Also, I would call the option HIBERNATION_SNAPSHOT_DEV, because it effectively > controls whether or not the snapshot device is available. > > > + > > config PM_STD_PARTITION > > string "Default resume partition" > > depends on HIBERNATION > > Index: b/kernel/power/Makefile > > =================================================================== > > --- a/kernel/power/Makefile > > +++ b/kernel/power/Makefile > > @@ -10,7 +10,8 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += consol > > obj-$(CONFIG_FREEZER) += process.o > > obj-$(CONFIG_SUSPEND) += suspend.o > > obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o > > -obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o > > +obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o > > +obj-$(CONFIG_HIBERNATION_USER) += user.o > > obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o > > obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o > > > > Index: b/kernel/power/power.h > > =================================================================== > > --- a/kernel/power/power.h > > +++ b/kernel/power/power.h > > @@ -158,8 +158,13 @@ extern sector_t alloc_swapdev_block(int > > extern void free_all_swap_pages(int swap); > > extern int swsusp_swap_in_use(void); > > > > +#ifdef CONFIG_HIBERNATION_USER > > bool swsusp_try_enter(void); > > void swsusp_leave(void); > > +#else > > +static inline bool swsusp_try_enter(void) { return 1; } > > +static inline void swsusp_leave(void) {} > > +#endif > > It is possible in theory that two processes write "disk" to /sys/power/state > concurrently. > > Is there enough mutual exclusion in place to handle this gracefully after the > above change? No, indeed. It looks like hibernate.c needs the mutual exclusion and user.c could just use it. Should I move snapshot_device_available to hibernate.c and rename it hibernate_available?
On Mon, Apr 27, 2020 at 11:48 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > > On Monday, April 13, 2020 9:08:45 PM CEST Domenico Andreoli wrote: > > > From: Domenico Andreoli <domenico.andreoli@linux.com> > > > > > > uswsusp is no longer the preferred way to suspend/hibernate and the > > > userspace tools have not received any update in years. > > > > > > Make it possible to enable the uswsusp support only if needed, prepare > > > for future phase out. > > > > > > Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > > > Cc: Pavel Machek <pavel@ucw.cz> > > > Cc: Linux PM <linux-pm@vger.kernel.org> > > > > > > --- > > > kernel/power/Kconfig | 5 +++++ > > > kernel/power/Makefile | 3 ++- > > > kernel/power/power.h | 5 +++++ > > > 3 files changed, 12 insertions(+), 1 deletion(-) > > > > > > Index: b/kernel/power/Kconfig > > > =================================================================== > > > --- a/kernel/power/Kconfig > > > +++ b/kernel/power/Kconfig > > > @@ -80,6 +80,11 @@ config HIBERNATION > > > > > > For more information take a look at <file:Documentation/power/swsusp.rst>. > > > > > > +config HIBERNATION_USER > > > + bool "Userspace software suspend interface (DEPRECATED)" > > > + depends on HIBERNATION > > > + default n > > > > This needs to be "default y" for the time being. > > > > Also, I would call the option HIBERNATION_SNAPSHOT_DEV, because it effectively > > controls whether or not the snapshot device is available. > > > > > + > > > config PM_STD_PARTITION > > > string "Default resume partition" > > > depends on HIBERNATION > > > Index: b/kernel/power/Makefile > > > =================================================================== > > > --- a/kernel/power/Makefile > > > +++ b/kernel/power/Makefile > > > @@ -10,7 +10,8 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += consol > > > obj-$(CONFIG_FREEZER) += process.o > > > obj-$(CONFIG_SUSPEND) += suspend.o > > > obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o > > > -obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o > > > +obj-$(CONFIG_HIBERNATION_USER) += user.o > > > obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o > > > obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o > > > > > > Index: b/kernel/power/power.h > > > =================================================================== > > > --- a/kernel/power/power.h > > > +++ b/kernel/power/power.h > > > @@ -158,8 +158,13 @@ extern sector_t alloc_swapdev_block(int > > > extern void free_all_swap_pages(int swap); > > > extern int swsusp_swap_in_use(void); > > > > > > +#ifdef CONFIG_HIBERNATION_USER > > > bool swsusp_try_enter(void); > > > void swsusp_leave(void); > > > +#else > > > +static inline bool swsusp_try_enter(void) { return 1; } > > > +static inline void swsusp_leave(void) {} > > > +#endif > > > > It is possible in theory that two processes write "disk" to /sys/power/state > > concurrently. > > > > Is there enough mutual exclusion in place to handle this gracefully after the > > above change? > > No, indeed. > > It looks like hibernate.c needs the mutual exclusion and user.c could > just use it. Should I move snapshot_device_available to hibernate.c > and rename it hibernate_available? There is hibernation_available() already. Maybe switch over to the refcount_t API, call the variable hibernate_refcount and use refcount_add_not_zero() on it for the mutual exclusion.
On Wed, Apr 29, 2020 at 01:20:53PM +0200, Rafael J. Wysocki wrote: > On Mon, Apr 27, 2020 at 11:48 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > > > On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > > > [...] > > > > > > It is possible in theory that two processes write "disk" to /sys/power/state > > > concurrently. > > > > > > Is there enough mutual exclusion in place to handle this gracefully after the > > > above change? > > > > No, indeed. > > > > It looks like hibernate.c needs the mutual exclusion and user.c could > > just use it. Should I move snapshot_device_available to hibernate.c > > and rename it hibernate_available? > > There is hibernation_available() already. > > Maybe switch over to the refcount_t API, call the variable > hibernate_refcount and use refcount_add_not_zero() on it for the > mutual exclusion. I'm doing as you ask but I'm not understanding what we actually gain from using the refcount_t API. I'm reading about relaxation of memory ordering and there is no mention on what this implies for the add_not_zero operation that we use. Honestly I would stick with the atomic_t, we just need the mutual exclusion and no saturation semantics. Dom
On Fri, May 1, 2020 at 9:10 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > On Wed, Apr 29, 2020 at 01:20:53PM +0200, Rafael J. Wysocki wrote: > > On Mon, Apr 27, 2020 at 11:48 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > > > > > On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > > > > > > [...] > > > > > > > > > It is possible in theory that two processes write "disk" to /sys/power/state > > > > concurrently. > > > > > > > > Is there enough mutual exclusion in place to handle this gracefully after the > > > > above change? > > > > > > No, indeed. > > > > > > It looks like hibernate.c needs the mutual exclusion and user.c could > > > just use it. Should I move snapshot_device_available to hibernate.c > > > and rename it hibernate_available? > > > > There is hibernation_available() already. > > > > Maybe switch over to the refcount_t API, call the variable > > hibernate_refcount and use refcount_add_not_zero() on it for the > > mutual exclusion. > > I'm doing as you ask but I'm not understanding what we actually gain > from using the refcount_t API. > > I'm reading about relaxation of memory ordering and there is no mention > on what this implies for the add_not_zero operation that we use. The details probably don't matter, but what we use here effectively is a refcount which is not allowed to grow above 1, That's why it'd be reasonable to explicitly define it as a refcount, now that there is a suitable API.
On Fri, May 01, 2020 at 04:54:13PM +0200, Rafael J. Wysocki wrote: > On Fri, May 1, 2020 at 9:10 AM Domenico Andreoli > <domenico.andreoli@linux.com> wrote: > > > > On Wed, Apr 29, 2020 at 01:20:53PM +0200, Rafael J. Wysocki wrote: > > > On Mon, Apr 27, 2020 at 11:48 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > > > > > > > On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > > > > > > > > > [...] > > > > > > > > > > > > It is possible in theory that two processes write "disk" to /sys/power/state > > > > > concurrently. > > > > > > > > > > Is there enough mutual exclusion in place to handle this gracefully after the > > > > > above change? > > > > > > > > No, indeed. > > > > > > > > It looks like hibernate.c needs the mutual exclusion and user.c could > > > > just use it. Should I move snapshot_device_available to hibernate.c > > > > and rename it hibernate_available? > > > > > > There is hibernation_available() already. > > > > > > Maybe switch over to the refcount_t API, call the variable > > > hibernate_refcount and use refcount_add_not_zero() on it for the > > > mutual exclusion. > > > > I'm doing as you ask but I'm not understanding what we actually gain > > from using the refcount_t API. > > > > I'm reading about relaxation of memory ordering and there is no mention > > on what this implies for the add_not_zero operation that we use. > > The details probably don't matter, but what we use here effectively is > a refcount which is not allowed to grow above 1, > > That's why it'd be reasonable to explicitly define it as a refcount, > now that there is a suitable API. The logic above looks fine to me and AFICT I implemented it in https://lore.kernel.org/linux-pm/20200501152304.523890160@gmail.com/. What I noticed only after I posted the patch, it triggers a warning (the ">>>>>>" traces are only in my local code): | May 3 15:06:10 dumbo kernel: [ 318.272438] >>>>>>>>>> release refcount-pre 3221225472 | May 3 15:06:10 dumbo kernel: [ 318.272441] ------------[ cut here ]------------ | May 3 15:06:10 dumbo kernel: [ 318.272442] refcount_t: saturated; leaking memory. | ... | May 3 15:06:10 dumbo kernel: [ 318.272531] Call Trace: | May 3 15:06:10 dumbo kernel: [ 318.272537] hibernate_release+0x52/0x64 | May 3 15:06:10 dumbo kernel: [ 318.272540] snapshot_release+0x47/0x70 | May 3 15:06:10 dumbo kernel: [ 318.272545] __fput+0xe1/0x250 | May 3 15:06:10 dumbo kernel: [ 318.272547] task_work_run+0x76/0xb0 | May 3 15:06:10 dumbo kernel: [ 318.272551] exit_to_usermode_loop+0xeb/0xf0 | May 3 15:06:10 dumbo kernel: [ 318.272554] do_syscall_64+0x162/0x180 | May 3 15:06:10 dumbo kernel: [ 318.272558] entry_SYSCALL_64_after_hwframe+0x44/0xa9 | May 3 15:06:10 dumbo kernel: [ 318.272560] RIP: 0033:0x7fc0c064eb54 | ... | May 3 15:06:10 dumbo kernel: [ 318.272570] ---[ end trace 9b4a89152f05edb2 ]--- | May 3 15:06:10 dumbo kernel: [ 318.272571] >>>>>>>>>> release refcount-port 3221225472 If I switch back to atomic_t, I get the expected values (my traces of two hibernation cycles): | [ 42.836678] >>>>>>>>>> acquire refcount-pre 1 | [ 42.836683] >>>>>>>>>> acquire refcount-post 0 | [ 47.313636] >>>>>>>>>> release refcount-pre 0 | [ 47.313638] >>>>>>>>>> release refcount-post 1 | [ 58.069508] >>>>>>>>>> acquire refcount-pre 1 | [ 58.069513] >>>>>>>>>> acquire refcount-post 0 | [ 63.661207] >>>>>>>>>> release refcount-pre 0 | [ 63.661209] >>>>>>>>>> release refcount-post 1 I'm still trying to understand why this difference between refcount_t and atomic_t in our context. I must be missing something big. Thanks, Dom
On Sun, May 03, 2020 at 03:31:05PM +0200, Domenico Andreoli wrote: > On Fri, May 01, 2020 at 04:54:13PM +0200, Rafael J. Wysocki wrote: > > On Fri, May 1, 2020 at 9:10 AM Domenico Andreoli > > <domenico.andreoli@linux.com> wrote: > > > > > > On Wed, Apr 29, 2020 at 01:20:53PM +0200, Rafael J. Wysocki wrote: > > > > On Mon, Apr 27, 2020 at 11:48 AM Domenico Andreoli <domenico.andreoli@linux.com> wrote: > > > > > > > > > > On Sun, Apr 26, 2020 at 06:16:29PM +0200, Rafael J. Wysocki wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > It is possible in theory that two processes write "disk" to /sys/power/state > > > > > > concurrently. > > > > > > > > > > > > Is there enough mutual exclusion in place to handle this gracefully after the > > > > > > above change? > > > > > > > > > > No, indeed. > > > > > > > > > > It looks like hibernate.c needs the mutual exclusion and user.c could > > > > > just use it. Should I move snapshot_device_available to hibernate.c > > > > > and rename it hibernate_available? > > > > > > > > There is hibernation_available() already. > > > > > > > > Maybe switch over to the refcount_t API, call the variable > > > > hibernate_refcount and use refcount_add_not_zero() on it for the > > > > mutual exclusion. > > > > > > I'm doing as you ask but I'm not understanding what we actually gain > > > from using the refcount_t API. > > > > > > I'm reading about relaxation of memory ordering and there is no mention > > > on what this implies for the add_not_zero operation that we use. > > > > The details probably don't matter, but what we use here effectively is > > a refcount which is not allowed to grow above 1, > > > > That's why it'd be reasonable to explicitly define it as a refcount, > > now that there is a suitable API. > > The logic above looks fine to me and AFICT I implemented it in > https://lore.kernel.org/linux-pm/20200501152304.523890160@gmail.com/. > > What I noticed only after I posted the patch, it triggers a warning > (the ">>>>>>" traces are only in my local code): > > | May 3 15:06:10 dumbo kernel: [ 318.272438] >>>>>>>>>> release refcount-pre 3221225472 > | May 3 15:06:10 dumbo kernel: [ 318.272441] ------------[ cut here ]------------ > | May 3 15:06:10 dumbo kernel: [ 318.272442] refcount_t: saturated; leaking memory. > | ... > | May 3 15:06:10 dumbo kernel: [ 318.272531] Call Trace: > | May 3 15:06:10 dumbo kernel: [ 318.272537] hibernate_release+0x52/0x64 > | May 3 15:06:10 dumbo kernel: [ 318.272540] snapshot_release+0x47/0x70 > | May 3 15:06:10 dumbo kernel: [ 318.272545] __fput+0xe1/0x250 > | May 3 15:06:10 dumbo kernel: [ 318.272547] task_work_run+0x76/0xb0 > | May 3 15:06:10 dumbo kernel: [ 318.272551] exit_to_usermode_loop+0xeb/0xf0 > | May 3 15:06:10 dumbo kernel: [ 318.272554] do_syscall_64+0x162/0x180 > | May 3 15:06:10 dumbo kernel: [ 318.272558] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > | May 3 15:06:10 dumbo kernel: [ 318.272560] RIP: 0033:0x7fc0c064eb54 > | ... > | May 3 15:06:10 dumbo kernel: [ 318.272570] ---[ end trace 9b4a89152f05edb2 ]--- > | May 3 15:06:10 dumbo kernel: [ 318.272571] >>>>>>>>>> release refcount-port 3221225472 > > If I switch back to atomic_t, I get the expected values (my traces of > two hibernation cycles): > > | [ 42.836678] >>>>>>>>>> acquire refcount-pre 1 > | [ 42.836683] >>>>>>>>>> acquire refcount-post 0 > | [ 47.313636] >>>>>>>>>> release refcount-pre 0 > | [ 47.313638] >>>>>>>>>> release refcount-post 1 > | [ 58.069508] >>>>>>>>>> acquire refcount-pre 1 > | [ 58.069513] >>>>>>>>>> acquire refcount-post 0 > | [ 63.661207] >>>>>>>>>> release refcount-pre 0 > | [ 63.661209] >>>>>>>>>> release refcount-post 1 > > I'm still trying to understand why this difference between refcount_t > and atomic_t in our context. I must be missing something big. The problem is in refcount_add(): | static inline void refcount_add(int i, refcount_t *r) | { | int old = atomic_fetch_add_relaxed(i, &r->refs); | | if (unlikely(!old)) | refcount_warn_saturate(r, REFCOUNT_ADD_UAF); | else if (unlikely(old < 0 || old + i < 0)) | refcount_warn_saturate(r, REFCOUNT_ADD_OVF); | } She clearly does not like to add anything to zero. Which does not make much sense to me. Dom
Index: b/kernel/power/Kconfig =================================================================== --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -80,6 +80,11 @@ config HIBERNATION For more information take a look at <file:Documentation/power/swsusp.rst>. +config HIBERNATION_USER + bool "Userspace software suspend interface (DEPRECATED)" + depends on HIBERNATION + default n + config PM_STD_PARTITION string "Default resume partition" depends on HIBERNATION Index: b/kernel/power/Makefile =================================================================== --- a/kernel/power/Makefile +++ b/kernel/power/Makefile @@ -10,7 +10,8 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += consol obj-$(CONFIG_FREEZER) += process.o obj-$(CONFIG_SUSPEND) += suspend.o obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o -obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o +obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o +obj-$(CONFIG_HIBERNATION_USER) += user.o obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o Index: b/kernel/power/power.h =================================================================== --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -158,8 +158,13 @@ extern sector_t alloc_swapdev_block(int extern void free_all_swap_pages(int swap); extern int swsusp_swap_in_use(void); +#ifdef CONFIG_HIBERNATION_USER bool swsusp_try_enter(void); void swsusp_leave(void); +#else +static inline bool swsusp_try_enter(void) { return 1; } +static inline void swsusp_leave(void) {} +#endif /* * Flags that can be passed from the hibernatig hernel to the "boot" kernel in