diff mbox

[BUG] 3.7-rc regression bisected: s2disk fails to resume image: Processes could not be frozen, cannot continue resuming

Message ID 20130918191607.GO13318@ZenIV.linux.org.uk (mailing list archive)
State RFC, archived
Headers show

Commit Message

Al Viro Sept. 18, 2013, 7:16 p.m. UTC
On Wed, Sep 18, 2013 at 10:40:32PM +0400, Andrew Savchenko wrote:

> And from suspend_ioctls.h:
> #define SNAPSHOT_IOC_MAGIC      '3'
> #define SNAPSHOT_FREEZE                 _IO(SNAPSHOT_IOC_MAGIC, 1)
> 
> My mistake, should be '3' instead of 3.

OK...  The thing to test, then, is what does __usermodehelper_disable()
return to freeze_processes().  If that's where this -EAGAIN comes from,
we at least have a plausible theory re what's going on.

freeze_processes() uses __usermodehelper_disable() to stop any new userland
processes spawned by UMH (modprobe, etc.) and waits for ones it might be
waiting for to complete.  Then it does try_to_freeze_tasks(), which
freezes remaining userland, carefully skipping the current thread.
However, it misses the possibility that current thread might have been
spawned by something that had been launched by UMH, with UMH waiting
for it.  Which is the case of everything spawned by linuxrc.

I'd try something like diff below, but I'm *NOT* familiar with swsusp at
all; it's not for mainline until ACKed by swsusp folks.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andrew Savchenko Sept. 18, 2013, 10:13 p.m. UTC | #1
On Wed, 18 Sep 2013 20:16:07 +0100 Al Viro wrote:
> On Wed, Sep 18, 2013 at 10:40:32PM +0400, Andrew Savchenko wrote:
> 
> > And from suspend_ioctls.h:
> > #define SNAPSHOT_IOC_MAGIC      '3'
> > #define SNAPSHOT_FREEZE                 _IO(SNAPSHOT_IOC_MAGIC, 1)
> > 
> > My mistake, should be '3' instead of 3.
> 
> OK...  The thing to test, then, is what does __usermodehelper_disable()
> return to freeze_processes().  If that's where this -EAGAIN comes from,
> we at least have a plausible theory re what's going on.
> 
> freeze_processes() uses __usermodehelper_disable() to stop any new userland
> processes spawned by UMH (modprobe, etc.) and waits for ones it might be
> waiting for to complete.  Then it does try_to_freeze_tasks(), which
> freezes remaining userland, carefully skipping the current thread.
> However, it misses the possibility that current thread might have been
> spawned by something that had been launched by UMH, with UMH waiting
> for it.  Which is the case of everything spawned by linuxrc.
> 
> I'd try something like diff below, but I'm *NOT* familiar with swsusp at
> all; it's not for mainline until ACKed by swsusp folks.
> 
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index fb32636..d968882 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
>  	DECLARE_COMPLETION_ONSTACK(done);
>  	int retval = 0;
>  
> -	helper_lock();
> +	if (!(current->flags & PF_FREEZER_SKIP))
> +		helper_lock();
>  	if (!khelper_wq || usermodehelper_disabled) {
>  		retval = -EBUSY;
>  		goto out;
> @@ -611,7 +612,8 @@ wait_done:
>  out:
>  	call_usermodehelper_freeinfo(sub_info);
>  unlock:
> -	helper_unlock();
> +	if (!(current->flags & PF_FREEZER_SKIP))
> +		helper_unlock();
>  	return retval;
>  }
>  EXPORT_SYMBOL(call_usermodehelper_exec);

With this patch and 3.11.1 kernel resume works fine.

Best regards,
Andrew Savchenko
Pavel Machek Sept. 24, 2013, 12:21 a.m. UTC | #2
Hi!

> > And from suspend_ioctls.h:
> > #define SNAPSHOT_IOC_MAGIC      '3'
> > #define SNAPSHOT_FREEZE                 _IO(SNAPSHOT_IOC_MAGIC, 1)
> > 
> > My mistake, should be '3' instead of 3.
> 
> OK...  The thing to test, then, is what does __usermodehelper_disable()
> return to freeze_processes().  If that's where this -EAGAIN comes from,
> we at least have a plausible theory re what's going on.
> 
> freeze_processes() uses __usermodehelper_disable() to stop any new userland
> processes spawned by UMH (modprobe, etc.) and waits for ones it might be
> waiting for to complete.  Then it does try_to_freeze_tasks(), which
> freezes remaining userland, carefully skipping the current thread.
> However, it misses the possibility that current thread might have been
> spawned by something that had been launched by UMH, with UMH waiting
> for it.  Which is the case of everything spawned by linuxrc.
> 
> I'd try something like diff below, but I'm *NOT* familiar with swsusp at
> all; it's not for mainline until ACKed by swsusp folks.
> 
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index fb32636..d968882 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
>  	DECLARE_COMPLETION_ONSTACK(done);
>  	int retval = 0;
>  
> -	helper_lock();
> +	if (!(current->flags & PF_FREEZER_SKIP))
> +		helper_lock();
>  	if (!khelper_wq || usermodehelper_disabled) {
>  		retval = -EBUSY;
>  		goto out;
> @@ -611,7 +612,8 @@ wait_done:
>  out:
>  	call_usermodehelper_freeinfo(sub_info);
>  unlock:
> -	helper_unlock();
> +	if (!(current->flags & PF_FREEZER_SKIP))
> +		helper_unlock();
>  	return retval;
>  }
>  EXPORT_SYMBOL(call_usermodehelper_exec);

PF_FREEZER_SKIP flag is manipulated at about 1000 places, so I'm not
sure this will nest correctly. They seem to be in form of 

|= FREEZER_SKIP
schedule()
&= ~FREEZER_SKIP

so this should be safe, but...

								Pavel
Rafael Wysocki Oct. 17, 2013, 9:35 p.m. UTC | #3
Sorry for the huge delay.

On Tuesday, September 24, 2013 02:21:11 AM Pavel Machek wrote:
> Hi!
> 
> > > And from suspend_ioctls.h:
> > > #define SNAPSHOT_IOC_MAGIC      '3'
> > > #define SNAPSHOT_FREEZE                 _IO(SNAPSHOT_IOC_MAGIC, 1)
> > > 
> > > My mistake, should be '3' instead of 3.
> > 
> > OK...  The thing to test, then, is what does __usermodehelper_disable()
> > return to freeze_processes().  If that's where this -EAGAIN comes from,
> > we at least have a plausible theory re what's going on.
> > 
> > freeze_processes() uses __usermodehelper_disable() to stop any new userland
> > processes spawned by UMH (modprobe, etc.) and waits for ones it might be
> > waiting for to complete.  Then it does try_to_freeze_tasks(), which
> > freezes remaining userland, carefully skipping the current thread.
> > However, it misses the possibility that current thread might have been
> > spawned by something that had been launched by UMH, with UMH waiting
> > for it.  Which is the case of everything spawned by linuxrc.
> > 
> > I'd try something like diff below, but I'm *NOT* familiar with swsusp at
> > all; it's not for mainline until ACKed by swsusp folks.
> > 
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index fb32636..d968882 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -571,7 +571,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
> >  	DECLARE_COMPLETION_ONSTACK(done);
> >  	int retval = 0;
> >  
> > -	helper_lock();
> > +	if (!(current->flags & PF_FREEZER_SKIP))
> > +		helper_lock();
> >  	if (!khelper_wq || usermodehelper_disabled) {
> >  		retval = -EBUSY;
> >  		goto out;
> > @@ -611,7 +612,8 @@ wait_done:
> >  out:
> >  	call_usermodehelper_freeinfo(sub_info);
> >  unlock:
> > -	helper_unlock();
> > +	if (!(current->flags & PF_FREEZER_SKIP))
> > +		helper_unlock();
> >  	return retval;
> >  }
> >  EXPORT_SYMBOL(call_usermodehelper_exec);
> 
> PF_FREEZER_SKIP flag is manipulated at about 1000 places, so I'm not
> sure this will nest correctly.

This is not exactly correct unless 1000 is about 50.  And none of them leads to
call_usermodehelper_exec() as far as I can say.

> They seem to be in form of 
> 
> |= FREEZER_SKIP
> schedule()
> &= ~FREEZER_SKIP
> 
> so this should be safe, but...

I think the patch is correct, so

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks!
diff mbox

Patch

diff --git a/kernel/kmod.c b/kernel/kmod.c
index fb32636..d968882 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -571,7 +571,8 @@  int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 	DECLARE_COMPLETION_ONSTACK(done);
 	int retval = 0;
 
-	helper_lock();
+	if (!(current->flags & PF_FREEZER_SKIP))
+		helper_lock();
 	if (!khelper_wq || usermodehelper_disabled) {
 		retval = -EBUSY;
 		goto out;
@@ -611,7 +612,8 @@  wait_done:
 out:
 	call_usermodehelper_freeinfo(sub_info);
 unlock:
-	helper_unlock();
+	if (!(current->flags & PF_FREEZER_SKIP))
+		helper_unlock();
 	return retval;
 }
 EXPORT_SYMBOL(call_usermodehelper_exec);