Message ID | 1367458508-9133-4-git-send-email-ccross@android.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Hello, On Wed, May 01, 2013 at 06:35:01PM -0700, Colin Cross wrote: > To allow tasks to avoid running on every suspend/resume cycle, > this patch adds additional freezable wrappers around blocking calls > that call freezer_do_not_count(). Combined with the previous patch, > these tasks will not run during suspend or resume unless they wake > up for another reason, in which case they will run until they hit > the try_to_freeze() in freezer_count(), and then continue processing > the wakeup after tasks are thawed. This patch also converts the > existing wait_event_freezable* wrappers to use freezer_do_not_count(). I'd much prefer the latter part being a separate patch because the change isn't really trivial and it isn't exactly equivalent - before we prioritized meeting the condition over freezing, now it's the other way around. It's fine but does deserve description in the log so that if something gets tracked down to the commit, it's easy to tell how the behavior flipped and why. > +/* Like schedule_timeout(), but should not block the freezer. */ > +#define freezable_schedule_timeout(timeout) \ > +({ \ > + long __retval; \ > + freezer_do_not_count(); \ > + __retval = schedule_timeout(timeout); \ > + freezer_count(); \ > + __retval; \ > +}) > + > +/* Like schedule_timeout_interruptible(), but should not block the freezer. */ > +#define freezable_schedule_timeout_interruptible(timeout) \ > +({ \ > + long __retval; \ > + freezer_do_not_count(); \ > + __retval = schedule_timeout_interruptible(timeout); \ > + freezer_count(); \ > + __retval; \ > +}) ... > +/* Like schedule_hrtimeout_range(), but should not block the freezer. */ > +#define freezable_schedule_hrtimeout_range(expires, delta, mode) \ > +({ \ > + int __retval; \ > + freezer_do_not_count(); \ > + __retval = schedule_hrtimeout_range(expires, delta, mode); \ > + freezer_count(); \ > + __retval; \ > +}) (cc'ing Jeff Layton) So, one worry that I have about this is that freezer essentially behaves like a huge lock that everyone grabs and while scattering try_to_freeze() calls around might seem innocuous, it effectively adding a dependency to that single giant lock to that place, so whenever you're adding try_to_freeze() call while holding any kind of locks, it substiantially increases the chance of possible deadlock scenarios. NFS recently got bitten by it and now is trying to get rid of them as it's fundamentally broken. So, the freezable interface can't be something that people can use casually. It is something which should be carefully and strategically deployed where we *know* that lock dependency risks don't exist or at least are acceptable. I'm a bit weary that this patch is expanding the interface a lot that they now look like the equivalents of normal schedule calls. Not exactly sure what to do here but can we please at least have RED BOLD BLINKING comments which scream to people not to use these unless they know what they're doing? Thanks.
On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote: > So, the freezable interface can't be something that people can use > casually. It is something which should be carefully and strategically > deployed where we *know* that lock dependency risks don't exist or at > least are acceptable. I'm a bit weary that this patch is expanding > the interface a lot that they now look like the equivalents of normal > schedule calls. Not exactly sure what to do here but can we please at > least have RED BOLD BLINKING comments which scream to people not to > use these unless they know what they're doing? Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by default and have ugly variants which can be used if the caller is sure that it's okay possibly with list of locks which are held?
This sounds the same as what ended up getting reverted in https://lkml.org/lkml/2013/3/4/221 I can add the WARN_ON_ONCE to all my new calls, and leave them out of existing calls, but that seems a little odd, and will be redundant if the lockdep call in try_to_freeze goes back in in 3.11. Do you still want it in the new apis? On Thu, May 2, 2013 at 5:03 PM, Tejun Heo <tj@kernel.org> wrote: > On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote: >> So, the freezable interface can't be something that people can use >> casually. It is something which should be carefully and strategically >> deployed where we *know* that lock dependency risks don't exist or at >> least are acceptable. I'm a bit weary that this patch is expanding >> the interface a lot that they now look like the equivalents of normal >> schedule calls. Not exactly sure what to do here but can we please at >> least have RED BOLD BLINKING comments which scream to people not to >> use these unless they know what they're doing? > > Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by > default and have ugly variants which can be used if the caller is sure > that it's okay possibly with list of locks which are held? > > -- > tejun -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote: > This sounds the same as what ended up getting reverted in > https://lkml.org/lkml/2013/3/4/221 > I can add the WARN_ON_ONCE to all my new calls, and leave them out of > existing calls, but that seems a little odd, and will be redundant if > the lockdep call in try_to_freeze goes back in in 3.11. Do you still > want it in the new apis? > > On Thu, May 2, 2013 at 5:03 PM, Tejun Heo <tj@kernel.org> wrote: >> On Thu, May 02, 2013 at 04:55:05PM -0700, Tejun Heo wrote: >>> So, the freezable interface can't be something that people can use >>> casually. It is something which should be carefully and strategically >>> deployed where we *know* that lock dependency risks don't exist or at >>> least are acceptable. I'm a bit weary that this patch is expanding >>> the interface a lot that they now look like the equivalents of normal >>> schedule calls. Not exactly sure what to do here but can we please at >>> least have RED BOLD BLINKING comments which scream to people not to >>> use these unless they know what they're doing? >> >> Maybe we should trigger WARN_ON_ONCE() if lockdep_depth() > 0 by >> default and have ugly variants which can be used if the caller is sure >> that it's okay possibly with list of locks which are held? >> >> -- >> tejun (sorry for the top post) I could also put the lockdep check that was reveted back into try_to_freeze(), and add a freezable_schedule_unsafe() that skips it for use in the known-unsafe users in nfs, with a big comment not to add new users of it. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote: > On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote: > > This sounds the same as what ended up getting reverted in > > https://lkml.org/lkml/2013/3/4/221 > > I can add the WARN_ON_ONCE to all my new calls, and leave them out of > > existing calls, but that seems a little odd, and will be redundant if > > the lockdep call in try_to_freeze goes back in in 3.11. Do you still > > want it in the new apis? ... > I could also put the lockdep check that was reveted back into > try_to_freeze(), and add a freezable_schedule_unsafe() that skips it > for use in the known-unsafe users in nfs, with a big comment not to > add new users of it. Oh yeah, that sounds like a good idea, and as for the non-ugly variants, at least in the mid/long term, I think it'd be best to add the lockdep annotation to try_to_freeze() with __try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for the existing users which should be gradually converted, but if that's too burdensome, adding warnings to the wrappers should do for now, I guess. And I *hope* the lockdep annotation is stricter than what was added before. I think it better be "no lock ever should be held at this point" rather than "consider this a big lock". I'm not sure how much consensus we have on this but I'd like to, in the longer term, merge freezer into job control stop so that frozen tasks don't appear as being in uninterruptible sleep. Currently, the frozen tasks are essentially in UNINTERRUPTIBLE sleep which is okay for suspend but for cgroup freezer, it sucks. You end up with tasks which you can't even kill. Combined with the locking problems, I was planning to update the freezer such that the frozen state is implemented as a form of jobctl stop, so that things like ptrace / kill -9 could work on them and we also have the clear definition of the frozen state rather than the current "it may get stuck somewhere in the kernel". But that conflicts with what you're doing here which seems pretty useful, so, to satisfy both goals, when somebody needs to put a pseudo-frozen task into the actual frozen jobctl stop, those spots which are currently using try_to_stop() would have to return an error, most likely -EINTR with TIF_SIGPENDING set, and the control should return towards userland so that signal handling path can be invoked. ie. It should be possible to steer the tasks which are considered frozen but not in the frozen jobctl stop into the jobctl stop without any side effect. To do that, those spots basically have to be pretty close to the userland boundary where it can easily leave the kernel with -EINTR and AFAICS all the spots that you converted are like that (which I think is natural). While not holding any locks doesn't guarantee that, I think there'd be a fairly high correlation at least and it'd be able to drive people towards finding out what's going on. So, that's my agenda. Not sure how many actually agree with it. Rafael, Oleg, Jeff? Thanks.
On Thu, May 02, 2013 at 09:09:34PM -0700, Tejun Heo wrote: > But that conflicts with what you're doing here which seems pretty > useful, so, to satisfy both goals, when somebody needs to put a > pseudo-frozen task into the actual frozen jobctl stop, those spots > which are currently using try_to_stop() would have to return an error, > most likely -EINTR with TIF_SIGPENDING set, and the control should Let's make that -ERESTART*.
On Thu, May 2, 2013 at 9:09 PM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Thu, May 02, 2013 at 07:41:39PM -0700, Colin Cross wrote: >> On Thu, May 2, 2013 at 7:16 PM, Colin Cross <ccross@android.com> wrote: >> > This sounds the same as what ended up getting reverted in >> > https://lkml.org/lkml/2013/3/4/221 >> > I can add the WARN_ON_ONCE to all my new calls, and leave them out of >> > existing calls, but that seems a little odd, and will be redundant if >> > the lockdep call in try_to_freeze goes back in in 3.11. Do you still >> > want it in the new apis? > ... >> I could also put the lockdep check that was reveted back into >> try_to_freeze(), and add a freezable_schedule_unsafe() that skips it >> for use in the known-unsafe users in nfs, with a big comment not to >> add new users of it. > > Oh yeah, that sounds like a good idea, and as for the non-ugly > variants, at least in the mid/long term, I think it'd be best to add > the lockdep annotation to try_to_freeze() with > __try_to_freeze_unsafe_youre_gonna_burn_in_hell_if_you_use_this() for > the existing users which should be gradually converted, but if that's > too burdensome, adding warnings to the wrappers should do for now, I > guess. > > And I *hope* the lockdep annotation is stricter than what was added > before. I think it better be "no lock ever should be held at this > point" rather than "consider this a big lock". The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus' tree) already required that no locks be held, it wasn't using a lock annotation. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 02, 2013 at 09:17:21PM -0700, Colin Cross wrote: > > And I *hope* the lockdep annotation is stricter than what was added > > before. I think it better be "no lock ever should be held at this > > point" rather than "consider this a big lock". > > The previous patch (6aa9707099c4b25700940eb3d016f16c4434360d in Linus' > tree) already required that no locks be held, it wasn't using a lock > annotation. Ooh, cool. Thanks for the pointer. Forget about my rambling and let's just add an unsafe version of try_to_freeze() and be done with it. :)
On Thu, 2 May 2013, Tejun Heo wrote: > Combined with the locking problems, I was planning to update the > freezer such that the frozen state is implemented as a form of jobctl > stop, so that things like ptrace / kill -9 could work on them and we > also have the clear definition of the frozen state rather than the > current "it may get stuck somewhere in the kernel". > > But that conflicts with what you're doing here which seems pretty > useful, so, to satisfy both goals, when somebody needs to put a > pseudo-frozen task into the actual frozen jobctl stop, those spots > which are currently using try_to_stop() would have to return an error, > most likely -EINTR with TIF_SIGPENDING set, and the control should > return towards userland so that signal handling path can be invoked. > ie. It should be possible to steer the tasks which are considered > frozen but not in the frozen jobctl stop into the jobctl stop without > any side effect. To do that, those spots basically have to be pretty > close to the userland boundary where it can easily leave the kernel > with -EINTR and AFAICS all the spots that you converted are like that > (which I think is natural). While not holding any locks doesn't > guarantee that, I think there'd be a fairly high correlation at least > and it'd be able to drive people towards finding out what's going on. Don't forget about freezable kernel threads. They never cross the kernel/user boundary. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 03, 2013 at 10:18:17AM -0400, Alan Stern wrote: > Don't forget about freezable kernel threads. They never cross the > kernel/user boundary. Doesn't really matter for this purpose. They first of all can't be moved into !root cgroups and thus can't be cgroup-frozen and even if they could they can't be killed or ptraced, so they just don't play the same game. Thanks.
diff --git a/include/linux/freezer.h b/include/linux/freezer.h index e70df40..b239f45 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -152,6 +152,26 @@ static inline bool freezer_should_skip(struct task_struct *p) freezer_count(); \ }) +/* Like schedule_timeout(), but should not block the freezer. */ +#define freezable_schedule_timeout(timeout) \ +({ \ + long __retval; \ + freezer_do_not_count(); \ + __retval = schedule_timeout(timeout); \ + freezer_count(); \ + __retval; \ +}) + +/* Like schedule_timeout_interruptible(), but should not block the freezer. */ +#define freezable_schedule_timeout_interruptible(timeout) \ +({ \ + long __retval; \ + freezer_do_not_count(); \ + __retval = schedule_timeout_interruptible(timeout); \ + freezer_count(); \ + __retval; \ +}) + /* Like schedule_timeout_killable(), but should not block the freezer. */ #define freezable_schedule_timeout_killable(timeout) \ ({ \ @@ -162,6 +182,17 @@ static inline bool freezer_should_skip(struct task_struct *p) __retval; \ }) +/* Like schedule_hrtimeout_range(), but should not block the freezer. */ +#define freezable_schedule_hrtimeout_range(expires, delta, mode) \ +({ \ + int __retval; \ + freezer_do_not_count(); \ + __retval = schedule_hrtimeout_range(expires, delta, mode); \ + freezer_count(); \ + __retval; \ +}) + + /* * Freezer-friendly wrappers around wait_event_interruptible(), * wait_event_killable() and wait_event_interruptible_timeout(), originally @@ -180,30 +211,32 @@ static inline bool freezer_should_skip(struct task_struct *p) #define wait_event_freezable(wq, condition) \ ({ \ int __retval; \ - for (;;) { \ - __retval = wait_event_interruptible(wq, \ - (condition) || freezing(current)); \ - if (__retval || (condition)) \ - break; \ - try_to_freeze(); \ - } \ + freezer_do_not_count(); \ + __retval = wait_event_interruptible(wq, (condition)); \ + freezer_count(); \ __retval; \ }) #define wait_event_freezable_timeout(wq, condition, timeout) \ ({ \ long __retval = timeout; \ - for (;;) { \ - __retval = wait_event_interruptible_timeout(wq, \ - (condition) || freezing(current), \ + freezer_do_not_count(); \ + __retval = wait_event_interruptible_timeout(wq, (condition), \ __retval); \ - if (__retval <= 0 || (condition)) \ - break; \ - try_to_freeze(); \ - } \ + freezer_count(); \ __retval; \ }) +#define wait_event_freezable_exclusive(wq, condition) \ +({ \ + int __retval; \ + freezer_do_not_count(); \ + __retval = wait_event_interruptible_exclusive(wq, condition); \ + freezer_count(); \ + __retval; \ +}) + + #else /* !CONFIG_FREEZER */ static inline bool frozen(struct task_struct *p) { return false; } static inline bool freezing(struct task_struct *p) { return false; } @@ -225,15 +258,26 @@ static inline void set_freezable(void) {} #define freezable_schedule() schedule() +#define freezable_schedule_timeout(timeout) schedule_timeout(timeout) + +#define freezable_schedule_timeout_interruptible(timeout) \ + schedule_timeout_interruptible(timeout) + #define freezable_schedule_timeout_killable(timeout) \ schedule_timeout_killable(timeout) +#define freezable_schedule_hrtimeout_range(expires, delta, mode) \ + schedule_hrtimeout_range(expires, delta, mode) + #define wait_event_freezable(wq, condition) \ wait_event_interruptible(wq, condition) #define wait_event_freezable_timeout(wq, condition, timeout) \ wait_event_interruptible_timeout(wq, condition, timeout) +#define wait_event_freezable_exclusive(wq, condition) \ + wait_event_interruptible_exclusive(wq, condition) + #define wait_event_freezekillable(wq, condition) \ wait_event_killable(wq, condition)
Freezing tasks will wake up almost every userspace task from where it is blocking and force it to run until it hits a call to try_to_sleep(), generally on the exit path from the syscall it is blocking in. On resume each task will run again, usually restarting the syscall and running until it hits the same blocking call as it was originally blocked in. To allow tasks to avoid running on every suspend/resume cycle, this patch adds additional freezable wrappers around blocking calls that call freezer_do_not_count(). Combined with the previous patch, these tasks will not run during suspend or resume unless they wake up for another reason, in which case they will run until they hit the try_to_freeze() in freezer_count(), and then continue processing the wakeup after tasks are thawed. This patch also converts the existing wait_event_freezable* wrappers to use freezer_do_not_count(). Additional patches will convert the most common locations that userspace blocks in to use freezable helpers. Signed-off-by: Colin Cross <ccross@android.com> --- include/linux/freezer.h | 72 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 58 insertions(+), 14 deletions(-)