Message ID | 1314988070-12244-7-git-send-email-tj@kernel.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On 09/03, Tejun Heo wrote: > > This patch removes set_freezable_with_signal() along with > PF_FREEZER_NOSIG Great. I never understood PF_FREEZER_NOSIG logic ;) One question, > @@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop) > schedule(); > } > > - spin_lock_irq(¤t->sighand->siglock); > - recalc_sigpending(); /* We sent fake signal, clean it up */ > - spin_unlock_irq(¤t->sighand->siglock); > - Why? This recalc_sigpending() makes sense imho. Otherwise the user-space tasks (almost) always return with TIF_SIGPENDING. May be we can do this under "if (PF_KTRHREAD)". For example. Suppose the user-space task does wait_event_freezable()... Hmm. OTOH, wait_event_freezable() looks wrong anyway... So probably this doesn't matter. ptrace_stop/get_signal_to_deliver doesn't need this, probably we do not care about the other callers. It seems, a lot of get_signal_to_deliver() calles also call try_to_freeze() for no reason. So, yes, I am starting to think this change is fine too ;) Oleg.
Hello, Oleg. On Sun, Sep 04, 2011 at 08:46:26PM +0200, Oleg Nesterov wrote: > > @@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop) > > schedule(); > > } > > > > - spin_lock_irq(¤t->sighand->siglock); > > - recalc_sigpending(); /* We sent fake signal, clean it up */ > > - spin_unlock_irq(¤t->sighand->siglock); > > - > > Why? This recalc_sigpending() makes sense imho. Otherwise the user-space > tasks (almost) always return with TIF_SIGPENDING. May be we can do this > under "if (PF_KTRHREAD)". PF_KTHREAD no longer gets TIF_SIGPENDING set, so... > For example. Suppose the user-space task does wait_event_freezable()... > > Hmm. OTOH, wait_event_freezable() looks wrong anyway... So probably > this doesn't matter. ptrace_stop/get_signal_to_deliver doesn't need > this, probably we do not care about the other callers. Can you elaborate on it being wrong? Do you mean the possibility of leaking spurious TIF_SIGPENDING? > It seems, a lot of get_signal_to_deliver() calles also call > try_to_freeze() for no reason. Yeap, it doesn't really matter. In most cases userland tasks would get parked in the signal delivery path anyway and if a task is deep in the kernel, it should and usually does hit syscall restart path. Except for few special cases (maybe some of unfailable NFS calls), there'userland tasks shouldn't take try_to_freeze() path outside of signal delivery / job control path. Also, in general, I don't think it's a good idea to add non-trivial optimization like above to code path which is as cold as it gets without any backing data. The PM freezer used to busy loop until all tasks enter refrigerator. Nobody cares whether some tasks enter signal delivery path one more time. > So, yes, I am starting to think this change is fine too ;) Yeay. :) Thanks.
On Mon, Sep 05, 2011 at 11:33:15AM +0900, Tejun Heo wrote: > > So, yes, I am starting to think this change is fine too ;) > > Yeay. :) Ooh, can I add your Acked-by before sending pull request to Rafael? Thanks.
On 09/05, Tejun Heo wrote: > > Hello, Oleg. > > On Sun, Sep 04, 2011 at 08:46:26PM +0200, Oleg Nesterov wrote: > > > @@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop) > > > schedule(); > > > } > > > > > > - spin_lock_irq(¤t->sighand->siglock); > > > - recalc_sigpending(); /* We sent fake signal, clean it up */ > > > - spin_unlock_irq(¤t->sighand->siglock); > > > - > > > > Why? This recalc_sigpending() makes sense imho. Otherwise the user-space > > tasks (almost) always return with TIF_SIGPENDING. May be we can do this > > under "if (PF_KTRHREAD)". > > PF_KTHREAD no longer gets TIF_SIGPENDING set, so... Yes, > > For example. Suppose the user-space task does wait_event_freezable()... > > > > Hmm. OTOH, wait_event_freezable() looks wrong anyway... So probably > > this doesn't matter. ptrace_stop/get_signal_to_deliver doesn't need > > this, probably we do not care about the other callers. > > Can you elaborate on it being wrong? Do you mean the possibility of > leaking spurious TIF_SIGPENDING? Perhaps it is correct... Just I do not understand what it should do. I thought it is "wait_for_event && do_not_block_freezer". And at first glance the code looks as if it tries to do this. Say, in the "likely" case we restart wait_event_interruptible() after refrigerator(). But this looks racy. Suppose that freezing() is already false when try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case? And if it can be used by the userspace thread, then we should probably do recalc_sigpending() somewhere, otherwise wait_event_freezable() will always return -ERESTARTSYS after __refrigerator(). Oleg.
On 09/05, Tejun Heo wrote: > > Ooh, can I add your Acked-by before sending pull request to Rafael? Yes, thanks, please feel free. Oleg.
Hello, On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote: > Perhaps it is correct... Just I do not understand what it should do. > I thought it is "wait_for_event && do_not_block_freezer". And at first > glance the code looks as if it tries to do this. Say, in the "likely" > case we restart wait_event_interruptible() after refrigerator(). > > But this looks racy. Suppose that freezing() is already false when > try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does > freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case? It may return -ERESTARTSYS when not strictly necessary but given that it's supposed to trigger restart anyway I don't think it's actually broken (it doesn't break the contract of the wait). Another thing to note is that kthread freezing via cgroup is almost fundamentally broken. With the removal of freezable_with_signal, this shouldn't be an issue anymore although cgroup_freezer still needs to be fixed to account for kthreads for xstate transitions (it currently triggers BUG_ON). > And if it can be used by the userspace thread, then we should probably > do recalc_sigpending() somewhere, otherwise wait_event_freezable() will > always return -ERESTARTSYS after __refrigerator(). Thankfully, currently, all the few users are kthreads. I don't think it makes sense for userland tasks at all. Interruptible sleeps for userland tasks necessiate the ability to return to signal delivery path and restart or abort the current operation. There's no point in using wait_event_freezable() instead of wait_event_interruptible(). Thanks.
Hi, On 09/06, Tejun Heo wrote: > > Hello, > > On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote: > > Perhaps it is correct... Just I do not understand what it should do. > > I thought it is "wait_for_event && do_not_block_freezer". And at first > > glance the code looks as if it tries to do this. Say, in the "likely" > > case we restart wait_event_interruptible() after refrigerator(). > > > > But this looks racy. Suppose that freezing() is already false when > > try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does > > freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case? > > It may return -ERESTARTSYS when not strictly necessary but given that > it's supposed to trigger restart anyway I don't think it's actually > broken (it doesn't break the contract of the wait). OK, but still this doesn't look right. > Another thing to > note is that kthread freezing via cgroup is almost fundamentally > broken. OK, let it be freeze_processes()+thaw_tasks(). Of course this is mostly theoretical. > > And if it can be used by the userspace thread, then we should probably > > do recalc_sigpending() somewhere, otherwise wait_event_freezable() will > > always return -ERESTARTSYS after __refrigerator(). > > Thankfully, currently, all the few users are kthreads. I don't think > it makes sense for userland tasks at all. Yes, agreed. In this case I think it should be #define wait_event_freezable(wq, condition) \ ({ \ int __retval; \ for (;;) { \ __retval = wait_event_interruptible(wq, \ (condition) || freezing(current)); \ if (__retval || (condition)) \ break; \ try_to_freeze(); \ } \ __retval; \ }) __retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(), probably nobody should do this. Oleg.
On 09/06, Oleg Nesterov wrote: > > Yes, agreed. In this case I think it should be > > #define wait_event_freezable(wq, condition) \ > ({ \ > int __retval; \ > for (;;) { \ > __retval = wait_event_interruptible(wq, \ > (condition) || freezing(current)); \ > if (__retval || (condition)) \ > break; \ > try_to_freeze(); \ > } \ > __retval; \ > }) > > __retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(), > probably nobody should do this. I meant, unless the caller plays with allow_signal(), it has all rights to do if (wait_event_freezable(...)) BUG(); This becomes correct with the code above. Oleg.
Hello, On Tue, Sep 06, 2011 at 05:25:39PM +0200, Oleg Nesterov wrote: > On 09/06, Oleg Nesterov wrote: > > > > Yes, agreed. In this case I think it should be > > > > #define wait_event_freezable(wq, condition) \ > > ({ \ > > int __retval; \ > > for (;;) { \ > > __retval = wait_event_interruptible(wq, \ > > (condition) || freezing(current)); \ > > if (__retval || (condition)) \ > > break; \ > > try_to_freeze(); \ > > } \ > > __retval; \ > > }) > > > > __retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(), > > probably nobody should do this. > > I meant, unless the caller plays with allow_signal(), it has all rights to do > > if (wait_event_freezable(...)) > BUG(); > > This becomes correct with the code above. Yeap, sure, w/ freezable_with_signal gone, the above should work fine. Care to create a patch? Thanks.
Hi, On 09/07, Tejun Heo wrote: > > > > I meant, unless the caller plays with allow_signal(), it has all rights to do > > > > if (wait_event_freezable(...)) > > BUG(); > > > > This becomes correct with the code above. > > Yeap, sure, w/ freezable_with_signal gone, the above should work fine. > Care to create a patch? Sure. But. Tejun, Rafael, I have to rely on your review. I have no idea how can I test this change. Hopfully it is simple enough, though. And. wait_event_freezable_timeout() obviously has another problem, although I hope this is fine. The caller can spend a lot of time in refrigerator, shouldn't we update "timeout" in this case? I hope we shouldn't. Oleg.
diff --git a/include/linux/freezer.h b/include/linux/freezer.h index 9193bae..7ffbf04 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -48,7 +48,7 @@ static inline bool try_to_freeze(void) } extern bool freeze_task(struct task_struct *p); -extern bool __set_freezable(bool with_signal); +extern bool set_freezable(void); #ifdef CONFIG_CGROUP_FREEZER extern bool cgroup_freezing(struct task_struct *task); @@ -104,23 +104,6 @@ static inline int freezer_should_skip(struct task_struct *p) } /* - * Tell the freezer that the current task should be frozen by it - */ -static inline bool set_freezable(void) -{ - return __set_freezable(false); -} - -/* - * Tell the freezer that the current task should be frozen by it and that it - * should send a fake signal to the task to freeze it. - */ -static inline bool set_freezable_with_signal(void) -{ - return __set_freezable(true); -} - -/* * Freezer-friendly wrappers around wait_event_interruptible() and * wait_event_interruptible_timeout(), originally defined in <linux/wait.h> */ @@ -164,7 +147,6 @@ static inline void freezer_do_not_count(void) {} static inline void freezer_count(void) {} static inline int freezer_should_skip(struct task_struct *p) { return 0; } static inline void set_freezable(void) {} -static inline void set_freezable_with_signal(void) {} #define wait_event_freezable(wq, condition) \ wait_event_interruptible(wq, condition) diff --git a/include/linux/sched.h b/include/linux/sched.h index 1bb3356..6d45ce7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1784,7 +1784,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t * #define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */ #define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */ #define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */ -#define PF_FREEZER_NOSIG 0x80000000 /* Freezer won't send signals to it */ /* * Only the _current_ task can read/write to tsk->flags, but other diff --git a/kernel/freezer.c b/kernel/freezer.c index da76b64..770e6f5 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -39,7 +39,7 @@ bool freezing_slow_path(struct task_struct *p) if (pm_nosig_freezing || cgroup_freezing(p)) return true; - if (pm_freezing && !(p->flags & PF_FREEZER_NOSIG)) + if (pm_freezing && !(p->flags & PF_KTHREAD)) return true; return false; @@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop) schedule(); } - spin_lock_irq(¤t->sighand->siglock); - recalc_sigpending(); /* We sent fake signal, clean it up */ - spin_unlock_irq(¤t->sighand->siglock); - pr_debug("%s left refrigerator\n", current->comm); /* @@ -120,7 +116,7 @@ bool freeze_task(struct task_struct *p) return false; } - if (!(p->flags & PF_FREEZER_NOSIG)) { + if (!(p->flags & PF_KTHREAD)) { fake_signal_wake_up(p); /* * fake_signal_wake_up() goes through p's scheduler @@ -145,28 +141,19 @@ void __thaw_task(struct task_struct *p) * be visible to @p as waking up implies wmb. Waking up inside * freezer_lock also prevents wakeups from leaking outside * refrigerator. - * - * If !FROZEN, @p hasn't reached refrigerator, recalc sigpending to - * avoid leaving dangling TIF_SIGPENDING behind. */ spin_lock_irqsave(&freezer_lock, flags); - if (frozen(p)) { + if (frozen(p)) wake_up_process(p); - } else { - spin_lock(&p->sighand->siglock); - recalc_sigpending_and_wake(p); - spin_unlock(&p->sighand->siglock); - } spin_unlock_irqrestore(&freezer_lock, flags); } /** - * __set_freezable - make %current freezable - * @with_signal: do we want %TIF_SIGPENDING for notification too? + * set_freezable - make %current freezable * * Mark %current freezable and enter refrigerator if necessary. */ -bool __set_freezable(bool with_signal) +bool set_freezable(void) { might_sleep(); @@ -177,10 +164,8 @@ bool __set_freezable(bool with_signal) */ spin_lock_irq(&freezer_lock); current->flags &= ~PF_NOFREEZE; - if (with_signal) - current->flags &= ~PF_FREEZER_NOSIG; spin_unlock_irq(&freezer_lock); return try_to_freeze(); } -EXPORT_SYMBOL(__set_freezable); +EXPORT_SYMBOL(set_freezable); diff --git a/kernel/kthread.c b/kernel/kthread.c index a6cbeea..7a4c862 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -282,7 +282,7 @@ int kthreadd(void *unused) set_cpus_allowed_ptr(tsk, cpu_all_mask); set_mems_allowed(node_states[N_HIGH_MEMORY]); - current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG; + current->flags |= PF_NOFREEZE; for (;;) { set_current_state(TASK_INTERRUPTIBLE);
There's no in-kernel user of set_freezable_with_signal() and there's no plan to add one. Mixing TIF_SIGPENDING with kernel threads can lead to nasty corner cases as kernel threads never travel signal delivery path on their own. e.g. the current implementation is buggy in the cancelation path of __thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to clear TIF_SIGPENDING but the function never clears it regardless of sigpending state. This means that signallable freezable kthreads may continue executing with !freezing() && stuck TIF_SIGPENDING, which can be troublesome. This patch removes set_freezable_with_signal() along with PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious sigpending is dealt with in the usual signal delivery path. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/freezer.h | 20 +------------------- include/linux/sched.h | 1 - kernel/freezer.c | 27 ++++++--------------------- kernel/kthread.c | 2 +- 4 files changed, 8 insertions(+), 42 deletions(-)