diff mbox

[v2,03/10] freezer: add new freezable helpers using freezer_do_not_count()

Message ID 1367458508-9133-4-git-send-email-ccross@android.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Colin Cross May 2, 2013, 1:35 a.m. UTC
Freezing tasks will wake up almost every userspace task from
where it is blocking and force it to run until it hits a
call to try_to_sleep(), generally on the exit path from the syscall
it is blocking in.  On resume each task will run again, usually
restarting the syscall and running until it hits the same
blocking call as it was originally blocked in.

To allow tasks to avoid running on every suspend/resume cycle,
this patch adds additional freezable wrappers around blocking calls
that call freezer_do_not_count().  Combined with the previous patch,
these tasks will not run during suspend or resume unless they wake
up for another reason, in which case they will run until they hit
the try_to_freeze() in freezer_count(), and then continue processing
the wakeup after tasks are thawed.  This patch also converts the
existing wait_event_freezable* wrappers to use freezer_do_not_count().

Additional patches will convert the most common locations that
userspace blocks in to use freezable helpers.

Signed-off-by: Colin Cross <ccross@android.com>
---
 include/linux/freezer.h | 72 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 14 deletions(-)

Comments

Tejun Heo May 2, 2013, 11:55 p.m. UTC | #1
Hello,

On Wed, May 01, 2013 at 06:35:01PM -0700, Colin Cross wrote:
> To allow tasks to avoid running on every suspend/resume cycle,
> this patch adds additional freezable wrappers around blocking calls
> that call freezer_do_not_count().  Combined with the previous patch,
> these tasks will not run during suspend or resume unless they wake
> up for another reason, in which case they will run until they hit
> the try_to_freeze() in freezer_count(), and then continue processing
> the wakeup after tasks are thawed.  This patch also converts the
> existing wait_event_freezable* wrappers to use freezer_do_not_count().

I'd much prefer the latter part being a separate patch because the
change isn't really trivial and it isn't exactly equivalent - before
we prioritized meeting the condition over freezing, now it's the other
way around.  It's fine but does deserve description in the log so that
if something gets tracked down to the commit, it's easy to tell how
the behavior flipped and why.

> +/* Like schedule_timeout(), but should not block the freezer. */
> +#define freezable_schedule_timeout(timeout)				\
> +({									\
> +	long __retval;							\
> +	freezer_do_not_count();						\
> +	__retval = schedule_timeout(timeout);				\
> +	freezer_count();						\
> +	__retval;							\
> +})
> +
> +/* Like schedule_timeout_interruptible(), but should not block the freezer. */
> +#define freezable_schedule_timeout_interruptible(timeout)		\
> +({									\
> +	long __retval;							\
> +	freezer_do_not_count();						\
> +	__retval = schedule_timeout_interruptible(timeout);		\
> +	freezer_count();						\
> +	__retval;							\
> +})
...
> +/* Like schedule_hrtimeout_range(), but should not block the freezer. */
> +#define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
> +({									\
> +	int __retval;							\
> +	freezer_do_not_count();						\
> +	__retval = schedule_hrtimeout_range(expires, delta, mode);	\
> +	freezer_count();						\
> +	__retval;							\
> +})

(cc'ing Jeff Layton)

So, one worry that I have about this is that freezer essentially
behaves like a huge lock that everyone grabs and while scattering
try_to_freeze() calls around might seem innocuous, it effectively
adding a dependency to that single giant lock to that place, so
whenever you're adding try_to_freeze() call while holding any kind of
locks, it substiantially increases the chance of possible deadlock
scenarios.  NFS recently got bitten by it and now is trying to get rid
of them as it's fundamentally broken.

So, the freezable interface can't be something that people can use
casually.  It is something which should be carefully and strategically
deployed where we *know* that lock dependency risks don't exist or at
least are acceptable.  I'm a bit weary that this patch is expanding
the interface a lot that they now look like the equivalents of normal
schedule calls.  Not exactly sure what to do here but can we please at
least have RED BOLD BLINKING comments which scream to people not to
use these unless they know what they're doing?

Thanks.
Tejun Heo May 3, 2013, 12:03 a.m. UTC | #2
On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote:
> So, the freezable interface can't be something that people can use
> casually.  It is something which should be carefully and strategically
> deployed where we *know* that lock dependency risks don't exist or at
> least are acceptable.  I'm a bit weary that this patch is expanding
> the interface a lot that they now look like the equivalents of normal
> schedule calls.  Not exactly sure what to do here but can we please at
> least have RED BOLD BLINKING comments which scream to people not to
> use these unless they know what they're doing?

Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by
default and have ugly variants which can be used if the caller is sure
that it's okay possibly with list of locks which are held?
Colin Cross May 3, 2013, 2:16 a.m. UTC | #3
This sounds the same as what ended up getting reverted in
https://lkml.org/lkml/2013/3/4/221
I can add the WARN_ON_ONCE to all my new calls, and leave them out of
existing calls, but that seems a little odd, and will be redundant if
the lockdep call in try_to_freeze goes back in in 3.11.  Do you still
want it in the new apis?

On Thu, May 2, 2013 at 5:03 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote:
>> So, the freezable interface can't be something that people can use
>> casually.  It is something which should be carefully and strategically
>> deployed where we *know* that lock dependency risks don't exist or at
>> least are acceptable.  I'm a bit weary that this patch is expanding
>> the interface a lot that they now look like the equivalents of normal
>> schedule calls.  Not exactly sure what to do here but can we please at
>> least have RED BOLD BLINKING comments which scream to people not to
>> use these unless they know what they're doing?
>
> Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by
> default and have ugly variants which can be used if the caller is sure
> that it's okay possibly with list of locks which are held?
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Colin Cross May 3, 2013, 2:41 a.m. UTC | #4
On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote:
> This sounds the same as what ended up getting reverted in
> https://lkml.org/lkml/2013/3/4/221
> I can add the WARN_ON_ONCE to all my new calls, and leave them out of
> existing calls, but that seems a little odd, and will be redundant if
> the lockdep call in try_to_freeze goes back in in 3.11.  Do you still
> want it in the new apis?
>
> On Thu, May 2, 2013 at 5:03 PM, Tejun Heo <tj@kernel.org> wrote:
>> On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote:
>>> So, the freezable interface can't be something that people can use
>>> casually.  It is something which should be carefully and strategically
>>> deployed where we *know* that lock dependency risks don't exist or at
>>> least are acceptable.  I'm a bit weary that this patch is expanding
>>> the interface a lot that they now look like the equivalents of normal
>>> schedule calls.  Not exactly sure what to do here but can we please at
>>> least have RED BOLD BLINKING comments which scream to people not to
>>> use these unless they know what they're doing?
>>
>> Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by
>> default and have ugly variants which can be used if the caller is sure
>> that it's okay possibly with list of locks which are held?
>>
>> --
>> tejun

(sorry for the top post)

I could also put the lockdep check that was reveted back into
try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
for use in the known-unsafe users in nfs, with a big comment not to
add new users of it.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo May 3, 2013, 4:09 a.m. UTC | #5
Hello,

On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote:
> On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote:
> > This sounds the same as what ended up getting reverted in
> > https://lkml.org/lkml/2013/3/4/221
> > I can add the WARN_ON_ONCE to all my new calls, and leave them out of
> > existing calls, but that seems a little odd, and will be redundant if
> > the lockdep call in try_to_freeze goes back in in 3.11.  Do you still
> > want it in the new apis?
...
> I could also put the lockdep check that was reveted back into
> try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
> for use in the known-unsafe users in nfs, with a big comment not to
> add new users of it.

Oh yeah, that sounds like a good idea, and as for the non-ugly
variants, at least in the mid/long term, I think it'd be best to add
the lockdep annotation to try_to_freeze() with
__try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for
the existing users which should be gradually converted, but if that's
too burdensome, adding warnings to the wrappers should do for now, I
guess.

And I *hope* the lockdep annotation is stricter than what was added
before.  I think it better be "no lock ever should be held at this
point" rather than "consider this a big lock".  I'm not sure how much
consensus we have on this but I'd like to, in the longer term, merge
freezer into job control stop so that frozen tasks don't appear as
being in uninterruptible sleep.  Currently, the frozen tasks are
essentially in UNINTERRUPTIBLE sleep which is okay for suspend but for
cgroup freezer, it sucks.  You end up with tasks which you can't even
kill.

Combined with the locking problems, I was planning to update the
freezer such that the frozen state is implemented as a form of jobctl
stop, so that things like ptrace / kill -9 could work on them and we
also have the clear definition of the frozen state rather than the
current "it may get stuck somewhere in the kernel".

But that conflicts with what you're doing here which seems pretty
useful, so, to satisfy both goals, when somebody needs to put a
pseudo-frozen task into the actual frozen jobctl stop, those spots
which are currently using try_to_stop() would have to return an error,
most likely -EINTR with TIF_SIGPENDING set, and the control should
return towards userland so that signal handling path can be invoked.
ie. It should be possible to steer the tasks which are considered
frozen but not in the frozen jobctl stop into the jobctl stop without
any side effect.  To do that, those spots basically have to be pretty
close to the userland boundary where it can easily leave the kernel
with -EINTR and AFAICS all the spots that you converted are like that
(which I think is natural).  While not holding any locks doesn't
guarantee that, I think there'd be a fairly high correlation at least
and it'd be able to drive people towards finding out what's going on.

So, that's my agenda.  Not sure how many actually agree with it.
Rafael, Oleg, Jeff?

Thanks.
Tejun Heo May 3, 2013, 4:12 a.m. UTC | #6
On Thu, May 02, 2013 at 09:09:34PM -0700, Tejun Heo wrote:
> But that conflicts with what you're doing here which seems pretty
> useful, so, to satisfy both goals, when somebody needs to put a
> pseudo-frozen task into the actual frozen jobctl stop, those spots
> which are currently using try_to_stop() would have to return an error,
> most likely -EINTR with TIF_SIGPENDING set, and the control should

Let's make that -ERESTART*.
Colin Cross May 3, 2013, 4:17 a.m. UTC | #7
On Thu, May 2, 2013 at 9:09 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote:
>> On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote:
>> > This sounds the same as what ended up getting reverted in
>> > https://lkml.org/lkml/2013/3/4/221
>> > I can add the WARN_ON_ONCE to all my new calls, and leave them out of
>> > existing calls, but that seems a little odd, and will be redundant if
>> > the lockdep call in try_to_freeze goes back in in 3.11.  Do you still
>> > want it in the new apis?
> ...
>> I could also put the lockdep check that was reveted back into
>> try_to_freeze(), and add a freezable_schedule_unsafe() that skips it
>> for use in the known-unsafe users in nfs, with a big comment not to
>> add new users of it.
>
> Oh yeah, that sounds like a good idea, and as for the non-ugly
> variants, at least in the mid/long term, I think it'd be best to add
> the lockdep annotation to try_to_freeze() with
> __try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for
> the existing users which should be gradually converted, but if that's
> too burdensome, adding warnings to the wrappers should do for now, I
> guess.
>
> And I *hope* the lockdep annotation is stricter than what was added
> before.  I think it better be "no lock ever should be held at this
> point" rather than "consider this a big lock".

The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus'
tree) already required that no locks be held, it wasn't using a lock
annotation.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo May 3, 2013, 4:20 a.m. UTC | #8
On Thu, May 02, 2013 at 09:17:21PM -0700, Colin Cross wrote:
> > And I *hope* the lockdep annotation is stricter than what was added
> > before.  I think it better be "no lock ever should be held at this
> > point" rather than "consider this a big lock".
> 
> The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus'
> tree) already required that no locks be held, it wasn't using a lock
> annotation.

Ooh, cool.  Thanks for the pointer.  Forget about my rambling and
let's just add an unsafe version of try_to_freeze() and be done with
it.  :)
Alan Stern May 3, 2013, 2:18 p.m. UTC | #9
On Thu, 2 May 2013, Tejun Heo wrote:

> Combined with the locking problems, I was planning to update the
> freezer such that the frozen state is implemented as a form of jobctl
> stop, so that things like ptrace / kill -9 could work on them and we
> also have the clear definition of the frozen state rather than the
> current "it may get stuck somewhere in the kernel".
> 
> But that conflicts with what you're doing here which seems pretty
> useful, so, to satisfy both goals, when somebody needs to put a
> pseudo-frozen task into the actual frozen jobctl stop, those spots
> which are currently using try_to_stop() would have to return an error,
> most likely -EINTR with TIF_SIGPENDING set, and the control should
> return towards userland so that signal handling path can be invoked.
> ie. It should be possible to steer the tasks which are considered
> frozen but not in the frozen jobctl stop into the jobctl stop without
> any side effect.  To do that, those spots basically have to be pretty
> close to the userland boundary where it can easily leave the kernel
> with -EINTR and AFAICS all the spots that you converted are like that
> (which I think is natural).  While not holding any locks doesn't
> guarantee that, I think there'd be a fairly high correlation at least
> and it'd be able to drive people towards finding out what's going on.

Don't forget about freezable kernel threads.  They never cross the 
kernel/user boundary.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo May 3, 2013, 4:54 p.m. UTC | #10
On Fri, May 03, 2013 at 10:18:17AM -0400, Alan Stern wrote:
> Don't forget about freezable kernel threads.  They never cross the 
> kernel/user boundary.

Doesn't really matter for this purpose.  They first of all can't be
moved into !root cgroups and thus can't be cgroup-frozen and even if
they could they can't be killed or ptraced, so they just don't play
the same game.

Thanks.
diff mbox

Patch

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e70df40..b239f45 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -152,6 +152,26 @@  static inline bool freezer_should_skip(struct task_struct *p)
 	freezer_count();						\
 })
 
+/* Like schedule_timeout(), but should not block the freezer. */
+#define freezable_schedule_timeout(timeout)				\
+({									\
+	long __retval;							\
+	freezer_do_not_count();						\
+	__retval = schedule_timeout(timeout);				\
+	freezer_count();						\
+	__retval;							\
+})
+
+/* Like schedule_timeout_interruptible(), but should not block the freezer. */
+#define freezable_schedule_timeout_interruptible(timeout)		\
+({									\
+	long __retval;							\
+	freezer_do_not_count();						\
+	__retval = schedule_timeout_interruptible(timeout);		\
+	freezer_count();						\
+	__retval;							\
+})
+
 /* Like schedule_timeout_killable(), but should not block the freezer. */
 #define freezable_schedule_timeout_killable(timeout)			\
 ({									\
@@ -162,6 +182,17 @@  static inline bool freezer_should_skip(struct task_struct *p)
 	__retval;							\
 })
 
+/* Like schedule_hrtimeout_range(), but should not block the freezer. */
+#define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
+({									\
+	int __retval;							\
+	freezer_do_not_count();						\
+	__retval = schedule_hrtimeout_range(expires, delta, mode);	\
+	freezer_count();						\
+	__retval;							\
+})
+
+
 /*
  * Freezer-friendly wrappers around wait_event_interruptible(),
  * wait_event_killable() and wait_event_interruptible_timeout(), originally
@@ -180,30 +211,32 @@  static inline bool freezer_should_skip(struct task_struct *p)
 #define wait_event_freezable(wq, condition)				\
 ({									\
 	int __retval;							\
-	for (;;) {							\
-		__retval = wait_event_interruptible(wq, 		\
-				(condition) || freezing(current));	\
-		if (__retval || (condition))				\
-			break;						\
-		try_to_freeze();					\
-	}								\
+	freezer_do_not_count();						\
+	__retval = wait_event_interruptible(wq, (condition));		\
+	freezer_count();						\
 	__retval;							\
 })
 
 #define wait_event_freezable_timeout(wq, condition, timeout)		\
 ({									\
 	long __retval = timeout;					\
-	for (;;) {							\
-		__retval = wait_event_interruptible_timeout(wq,		\
-				(condition) || freezing(current),	\
+	freezer_do_not_count();						\
+	__retval = wait_event_interruptible_timeout(wq,	(condition),	\
 				__retval); 				\
-		if (__retval <= 0 || (condition))			\
-			break;						\
-		try_to_freeze();					\
-	}								\
+	freezer_count();						\
 	__retval;							\
 })
 
+#define wait_event_freezable_exclusive(wq, condition)		\
+({									\
+	int __retval;							\
+	freezer_do_not_count();						\
+	__retval = wait_event_interruptible_exclusive(wq, condition);	\
+	freezer_count();						\
+	__retval;							\
+})
+
+
 #else /* !CONFIG_FREEZER */
 static inline bool frozen(struct task_struct *p) { return false; }
 static inline bool freezing(struct task_struct *p) { return false; }
@@ -225,15 +258,26 @@  static inline void set_freezable(void) {}
 
 #define freezable_schedule()  schedule()
 
+#define freezable_schedule_timeout(timeout)  schedule_timeout(timeout)
+
+#define freezable_schedule_timeout_interruptible(timeout)		\
+	schedule_timeout_interruptible(timeout)
+
 #define freezable_schedule_timeout_killable(timeout)			\
 	schedule_timeout_killable(timeout)
 
+#define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
+	schedule_hrtimeout_range(expires, delta, mode)
+
 #define wait_event_freezable(wq, condition)				\
 		wait_event_interruptible(wq, condition)
 
 #define wait_event_freezable_timeout(wq, condition, timeout)		\
 		wait_event_interruptible_timeout(wq, condition, timeout)
 
+#define wait_event_freezable_exclusive(wq, condition)			\
+		wait_event_interruptible_exclusive(wq, condition)
+
 #define wait_event_freezekillable(wq, condition)		\
 		wait_event_killable(wq, condition)