
fs: process fput task_work with TWA_SIGNAL

Message ID d6ddf6c2-3789-2e10-ba71-668cba03eb35@kernel.dk (mailing list archive)
State New, archived
Series: fs: process fput task_work with TWA_SIGNAL

Commit Message

Jens Axboe Jan. 5, 2021, 6:29 p.m. UTC
Song reported a boot regression in a kvm image with 5.11-rc, and bisected
it down to the patch referenced in the Fixes tag below. Debugging this
issue, it turns out that the boot stalled because a task was waiting on a
pipe being released. As we no longer run task_work from get_signal()
unless it's queued with TWA_SIGNAL, the task goes idle without running the
task_work. This prevents ->release() from being called on the pipe, which
another boot task is waiting on.

Use TWA_SIGNAL for the file fput work to ensure it's run before the task
goes idle.

Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

The other alternative here is obviously to re-instate the:

if (unlikely(current->task_works))
	task_work_run();

in get_signal() that we had before this change. That might be safer in
case there are other cases that need the work to run in a timely fashion,
though I do think it's cleaner, long term, to correctly mark task_work
with the needed notification type. Comments welcome...
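
For reference, a minimal sketch of what re-instating that call could look
like (placement at the top of get_signal() in kernel/signal.c is assumed
from the pre-5.11 code, not taken from an actual patch):

bool get_signal(struct ksignal *ksig)
{
	/*
	 * Re-instated: flush any queued task_work before we can block in
	 * the signal-delivery loop below, regardless of whether it was
	 * queued with TWA_RESUME or TWA_SIGNAL.
	 */
	if (unlikely(current->task_works))
		task_work_run();

	/* ... rest of get_signal() unchanged ... */
}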

Comments

Doug Anderson Jan. 7, 2021, 10:17 p.m. UTC | #1
Hi,

On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> it down to the below patch. Debugging this issue, turns out that the boot
> stalled when a task is waiting on a pipe being released. As we no longer
> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> task goes idle without running the task_work. This prevents ->release()
> from being called on the pipe, which another boot task is waiting on.
>
> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> goes idle.
>
> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> Reported-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

I just spent a bit of time bisecting and landed on commit 98b89b649fce
("signal: kill JOBCTL_TASK_WORK") as the cause of my failure to boot
mainline.  Your patch fixes my problem.  I haven't done any analysis
of the code--just testing, thus:

Tested-by: Douglas Anderson <dianders@chromium.org>
Jens Axboe Jan. 8, 2021, 3:52 a.m. UTC | #2
On 1/7/21 3:17 PM, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
>> it down to the below patch. Debugging this issue, turns out that the boot
>> stalled when a task is waiting on a pipe being released. As we no longer
>> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
>> task goes idle without running the task_work. This prevents ->release()
>> from being called on the pipe, which another boot task is waiting on.
>>
>> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
>> goes idle.
>>
>> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
>> Reported-by: Song Liu <songliubraving@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> I just spend a bit of time bisecting and landed on commit 98b89b649fce
> ("signal: kill JOBCTL_TASK_WORK") causing my failure to bootup
> mainline.  Your patch fixes my problem.  I haven't done any analysis
> of the code--just testing, thus:
> 
> Tested-by: Douglas Anderson <dianders@chromium.org>

Thanks, adding your Tested-by.
Al Viro Jan. 8, 2021, 5:26 a.m. UTC | #3
On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> it down to the below patch. Debugging this issue, turns out that the boot
> stalled when a task is waiting on a pipe being released. As we no longer
> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> task goes idle without running the task_work. This prevents ->release()
> from being called on the pipe, which another boot task is waiting on.
> 
> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> goes idle.
> 
> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> Reported-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> The other alternative here is obviously to re-instate the:
> 
> if (unlikely(current->task_works))
> 	task_work_run();
> 
> in get_signal() that we had before this change. Might be safer in case
> there are other cases that need to ensure the work is run in a timely
> fashion, though I do think it's cleaner to long term to correctly mark
> task_work with the needed notification type. Comments welcome...

Interesting...  I think I've missed the discussion of that thing; could
you forward the relevant thread my way or give an archive link to it?
Sedat Dilek Jan. 8, 2021, 6:20 a.m. UTC | #4
On Fri, Jan 8, 2021 at 4:56 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/7/21 3:17 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jan 5, 2021 at 10:30 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> >> it down to the below patch. Debugging this issue, turns out that the boot
> >> stalled when a task is waiting on a pipe being released. As we no longer
> >> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> >> task goes idle without running the task_work. This prevents ->release()
> >> from being called on the pipe, which another boot task is waiting on.
> >>
> >> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> >> goes idle.
> >>
> >> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> >> Reported-by: Song Liu <songliubraving@fb.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > I just spend a bit of time bisecting and landed on commit 98b89b649fce
> > ("signal: kill JOBCTL_TASK_WORK") causing my failure to bootup
> > mainline.  Your patch fixes my problem.  I haven't done any analysis
> > of the code--just testing, thus:
> >
> > Tested-by: Douglas Anderson <dianders@chromium.org>
>
> Thanks, adding your Tested-by.
>

I have had this in my patch series since it appeared in [1].

Feel free to add my:

   Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang version 11.0.1

[1] https://git.kernel.dk/cgit/linux-block/log/?h=task_work

- Sedat -
Sedat Dilek Jan. 8, 2021, 6:21 a.m. UTC | #5
On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > it down to the below patch. Debugging this issue, turns out that the boot
> > stalled when a task is waiting on a pipe being released. As we no longer
> > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > task goes idle without running the task_work. This prevents ->release()
> > from being called on the pipe, which another boot task is waiting on.
> >
> > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > goes idle.
> >
> > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > Reported-by: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > ---
> >
> > The other alternative here is obviously to re-instate the:
> >
> > if (unlikely(current->task_works))
> >       task_work_run();
> >
> > in get_signal() that we had before this change. Might be safer in case
> > there are other cases that need to ensure the work is run in a timely
> > fashion, though I do think it's cleaner to long term to correctly mark
> > task_work with the needed notification type. Comments welcome...
>
> Interesting...  I think I've missed the discussion of that thing; could
> you forward the relevant thread my way or give an archive link to it?

See [1].

- Sedat -

[1] https://marc.info/?t=160987156600001&r=1&w=2
Al Viro Jan. 8, 2021, 6:46 a.m. UTC | #6
On Fri, Jan 08, 2021 at 05:26:51AM +0000, Al Viro wrote:
> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > it down to the below patch. Debugging this issue, turns out that the boot
> > stalled when a task is waiting on a pipe being released. As we no longer
> > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > task goes idle without running the task_work. This prevents ->release()
> > from being called on the pipe, which another boot task is waiting on.
> > 
> > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > goes idle.
> > 
> > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > Reported-by: Song Liu <songliubraving@fb.com>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > ---
> > 
> > The other alternative here is obviously to re-instate the:
> > 
> > if (unlikely(current->task_works))
> > 	task_work_run();
> > 
> > in get_signal() that we had before this change. Might be safer in case
> > there are other cases that need to ensure the work is run in a timely
> > fashion, though I do think it's cleaner to long term to correctly mark
> > task_work with the needed notification type. Comments welcome...
> 
> Interesting...  I think I've missed the discussion of that thing; could
> you forward the relevant thread my way or give an archive link to it?

Actually, why do we need TWA_RESUME at all?  OK, a while ago you've added
a way for task_work_add() to do signal_wake_up().  Fine, so if the sucker
had been asleep in get_signal(), it gets woken up and the work gets run
fast.  Irrelevant for those who did task_work_add() for themselves.
With that commit, though, you've suddenly changed the default behaviour -
now if you do that task_work_add() for current *and* go to sleep in
get_signal(), the queued work gets delayed - potentially for a very
long time.

Now the default (TWA_RESUME) has changed semantics; matter of fact,
TWA_SIGNAL seems to be a lot closer to what we used to have.  I'm
too sleepy right now to check if there are valid use cases for your
new TWA_RESUME behaviour, but I very much doubt that old callers
(before the TWA_RESUME/TWA_SIGNAL split) want that.

In particular, for mntput_no_expire() we definitely do *not* want
that, same as with fput().  Same, AFAICS, for YAMA report_access().
And for binder_deferred_fd_close().  And task_tick_numa() looks
that way as well...

Anyway, bedtime for me; right now it looks like at least for task == current
we always want TWA_SIGNAL.  I'll look into that more tomorrow when I get
up, but so far it smells like switching everything to TWA_SIGNAL would
be the right thing to do, if not going back to bool notify for
task_work_add()...
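
To make that split concrete, here is a rough sketch of the notification
step in task_work_add() as it stands in 5.11-rc (simplified; the lockless
queueing is elided and the details are from memory, not the exact source):

int task_work_add(struct task_struct *task, struct callback_head *work,
		  enum task_work_notify_mode notify)
{
	/* ... lockless push of 'work' onto task->task_works ... */

	switch (notify) {
	case TWA_NONE:
		/* No wakeup; the work runs whenever the task next flushes it. */
		break;
	case TWA_RESUME:
		/*
		 * Sets TIF_NOTIFY_RESUME: the work runs on return to
		 * userspace, but a task that blocks in the kernel (e.g.
		 * sleeps in get_signal()) will not get to it.
		 */
		set_notify_resume(task);
		break;
	case TWA_SIGNAL:
		/*
		 * Sets TIF_NOTIFY_SIGNAL and wakes the task as if a signal
		 * had arrived, so a sleeping task runs the work promptly.
		 */
		set_notify_signal(task);
		break;
	}
	return 0;
}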
Al Viro Jan. 8, 2021, 6:47 a.m. UTC | #7
On Fri, Jan 08, 2021 at 07:21:52AM +0100, Sedat Dilek wrote:
> On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > > it down to the below patch. Debugging this issue, turns out that the boot
> > > stalled when a task is waiting on a pipe being released. As we no longer
> > > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > > task goes idle without running the task_work. This prevents ->release()
> > > from being called on the pipe, which another boot task is waiting on.
> > >
> > > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > > goes idle.
> > >
> > > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > > Reported-by: Song Liu <songliubraving@fb.com>
> > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > > ---
> > >
> > > The other alternative here is obviously to re-instate the:
> > >
> > > if (unlikely(current->task_works))
> > >       task_work_run();
> > >
> > > in get_signal() that we had before this change. Might be safer in case
> > > there are other cases that need to ensure the work is run in a timely
> > > fashion, though I do think it's cleaner to long term to correctly mark
> > > task_work with the needed notification type. Comments welcome...
> >
> > Interesting...  I think I've missed the discussion of that thing; could
> > you forward the relevant thread my way or give an archive link to it?
> 
> See [1].
> 
> - Sedat -
> 
> [1] https://marc.info/?t=160987156600001&r=1&w=2

Thanks; will check tomorrow.
Al Viro Jan. 8, 2021, 6:52 a.m. UTC | #8
On Fri, Jan 08, 2021 at 07:21:52AM +0100, Sedat Dilek wrote:
> On Fri, Jan 8, 2021 at 6:30 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
> > > Song reported a boot regression in a kvm image with 5.11-rc, and bisected
> > > it down to the below patch. Debugging this issue, turns out that the boot
> > > stalled when a task is waiting on a pipe being released. As we no longer
> > > run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
> > > task goes idle without running the task_work. This prevents ->release()
> > > from being called on the pipe, which another boot task is waiting on.
> > >
> > > Use TWA_SIGNAL for the file fput work to ensure it's run before the task
> > > goes idle.
> > >
> > > Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
> > > Reported-by: Song Liu <songliubraving@fb.com>
> > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > > ---
> > >
> > > The other alternative here is obviously to re-instate the:
> > >
> > > if (unlikely(current->task_works))
> > >       task_work_run();
> > >
> > > in get_signal() that we had before this change. Might be safer in case
> > > there are other cases that need to ensure the work is run in a timely
> > > fashion, though I do think it's cleaner to long term to correctly mark
> > > task_work with the needed notification type. Comments welcome...
> >
> > Interesting...  I think I've missed the discussion of that thing; could
> > you forward the relevant thread my way or give an archive link to it?
> 
> See [1].
> 
> - Sedat -
> 
> [1] https://marc.info/?t=160987156600001&r=1&w=2

Wait, that's this very thread, starting with the posting I'd been replying
to.  Really confused now...  Was that a private bug report and an equally
private discussion?  That's what I wanted to see...

Anyway, I'm more than half-asleep right now; will get back to that in the
morning.
Jens Axboe Jan. 8, 2021, 3:13 p.m. UTC | #9
On 1/7/21 11:46 PM, Al Viro wrote:
> On Fri, Jan 08, 2021 at 05:26:51AM +0000, Al Viro wrote:
>> On Tue, Jan 05, 2021 at 11:29:11AM -0700, Jens Axboe wrote:
>>> Song reported a boot regression in a kvm image with 5.11-rc, and bisected
>>> it down to the below patch. Debugging this issue, turns out that the boot
>>> stalled when a task is waiting on a pipe being released. As we no longer
>>> run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
>>> task goes idle without running the task_work. This prevents ->release()
>>> from being called on the pipe, which another boot task is waiting on.
>>>
>>> Use TWA_SIGNAL for the file fput work to ensure it's run before the task
>>> goes idle.
>>>
>>> Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK")
>>> Reported-by: Song Liu <songliubraving@fb.com>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> ---
>>>
>>> The other alternative here is obviously to re-instate the:
>>>
>>> if (unlikely(current->task_works))
>>> 	task_work_run();
>>>
>>> in get_signal() that we had before this change. Might be safer in case
>>> there are other cases that need to ensure the work is run in a timely
>>> fashion, though I do think it's cleaner to long term to correctly mark
>>> task_work with the needed notification type. Comments welcome...
>>
>> Interesting...  I think I've missed the discussion of that thing; could
>> you forward the relevant thread my way or give an archive link to it?

The initial report from Song was off list, and I just worked from that
to understand the issue. Most of it is in the commit message, but the
debugging basically involved figuring out what the stuck task was doing
(it was idle), and that it still had pending task_work. The task_work was
5 entries of ____fput, with 4 being ext4 files and 1 being a pipe. That
led to the theory that the pipe was not being released, and hence why we
were stuck.

> Actually, why do we need TWA_RESUME at all?  OK, a while ago you've added
> a way for task_work_add() to do wake_up_signal().  Fine, so if the sucker
> had been asleep in get_signal(), it gets woken up and the work gets run
> fast.  Irrelevant for those who did task_work_add() for themselves.
> With that commit, though, you've suddenly changed the default behaviour -
> now if you do that task_work_add() for current *and* get asleep in
> get_signal(), task_work_add() gets delayed - potentially for a very
> long time.

Right, this is why I brought up that we could re-instate get_signal()
running task_work unconditionally as another way of fixing this, since
that behavior change was made inadvertently as part of the commit that
killed off JOBCTL_TASK_WORK.

> Now the default (TWA_RESUME) has changed semantics; matter of fact,
> TWA_SIGNAL seems to be a lot closer than what we used to have.  I'm
> too sleepy right now to check if there are valid usecases for your new
> TWA_RESUME behaviour, but I very much doubt that old callers (before
> the TWA_RESUME/TWA_SIGNAL split) want that.
> 
> In particular, for mntput_no_expire() we definitely do *not* want
> that, same as with fput().  Same, AFAICS, for YAMA report_access().
> And for binder_deferred_fd_close().  And task_tick_numa() looks that
> way as well...
> 
> Anyway, bedtime for me; right now it looks like at least for task ==
> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
> when I get up, but so far it smells like switching everything to
> TWA_SIGNAL would be the right thing to do, if not going back to bool
> notify for task_work_add()...

Before the change, the fact that we ran task_work off get_signal(), and
thus processed even non-notify work in that path, was a bit of a mess,
imho. If you have work that needs processing now, in the same manner as
signals, then you really should be using TWA_SIGNAL. For this pipe case
(I'd need to set up and reproduce it again), the task must have had a
signal pending; that would previously have caused the task_work to run,
and now it does not. TWA_RESUME technically didn't change its behavior,
it's still the same notification type; we just no longer run task_work
unconditionally (regardless of notification type) from get_signal().

I think the main question here is whether we want to re-instate the
behavior of running task_work off get_signal(). I'm leaning towards not
doing that, and instead ensuring that callers that DO need it are using
TWA_SIGNAL.
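
For completeness, a sketch of the path where TWA_RESUME work does still
run, i.e. the return-to-userspace hook (simplified from the 5.11-era
tracehook code; the housekeeping calls and exact barriers are from
memory):

/* Called on return to userspace when TIF_NOTIFY_RESUME is set. */
static inline void tracehook_notify_resume(struct pt_regs *regs)
{
	clear_thread_flag(TIF_NOTIFY_RESUME);
	/* Pairs with the flag being set via task_work_add()/set_notify_resume(). */
	smp_mb__after_atomic();
	if (unlikely(current->task_works))
		task_work_run();
	/* ... memcg/blkcg housekeeping elided ... */
}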
Al Viro Jan. 8, 2021, 3:58 p.m. UTC | #10
On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
> > Anyway, bedtime for me; right now it looks like at least for task ==
> > current we always want TWA_SIGNAL.  I'll look into that more tomorrow
> > when I get up, but so far it smells like switching everything to
> > TWA_SIGNAL would be the right thing to do, if not going back to bool
> > notify for task_work_add()...
> 
> Before the change, the fact that we ran task_work off get_signal() and
> thus processed even non-notify work in that path was a bit of a mess,
> imho. If you have work that needs processing now, in the same manner as
> signals, then you really should be using TWA_SIGNAL. For this pipe case,
> and I'd need to setup and reproduce it again, the task must have a
> signal pending and that would have previously caused the task_work to
> run, and now it does not. TWA_RESUME technically didn't change its
> behavior, it's still the same notification type, we just don't run
> task_work unconditionally (regardless of notification type) from
> get_signal().

It sure as hell did change behaviour.  Think of the effect of getting
hit with SIGSTOP.  That's what that "bit of a mess" had been about.
Work done now vs. possibly several days later when SIGCONT finally
gets sent.

> I think the main question here is if we want to re-instate the behavior
> of running task_work off get_signal(). I'm leaning towards not doing
> that and ensuring that callers that DO need that are using TWA_SIGNAL.

Can you show the callers that DO NOT need it?
Jens Axboe Jan. 8, 2021, 4:10 p.m. UTC | #11
On 1/8/21 8:58 AM, Al Viro wrote:
> On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
>>> Anyway, bedtime for me; right now it looks like at least for task ==
>>> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
>>> when I get up, but so far it smells like switching everything to
>>> TWA_SIGNAL would be the right thing to do, if not going back to bool
>>> notify for task_work_add()...
>>
>> Before the change, the fact that we ran task_work off get_signal() and
>> thus processed even non-notify work in that path was a bit of a mess,
>> imho. If you have work that needs processing now, in the same manner as
>> signals, then you really should be using TWA_SIGNAL. For this pipe case,
>> and I'd need to setup and reproduce it again, the task must have a
>> signal pending and that would have previously caused the task_work to
>> run, and now it does not. TWA_RESUME technically didn't change its
>> behavior, it's still the same notification type, we just don't run
>> task_work unconditionally (regardless of notification type) from
>> get_signal().
> 
> It sure as hell did change behaviour.  Think of the effect of getting
> hit with SIGSTOP.  That's what that "bit of a mess" had been about.
> Work done now vs. possibly several days later when SIGCONT finally
> gets sent.
> 
>> I think the main question here is if we want to re-instate the behavior
>> of running task_work off get_signal(). I'm leaning towards not doing
>> that and ensuring that callers that DO need that are using TWA_SIGNAL.
> 
> Can you show the callers that DO NOT need it?

OK, so here's my suggestion:

1) For 5.11, we just re-instate the task_work run in get_signal(). This
   will make TWA_RESUME have the exact same behavior as before.

2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
   turning it into a bool again (notify or no notify).

How does that sound?
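
As a rough illustration of step 2, the collapsed API might look like the
pre-5.10 boolean form again (a hypothetical sketch, not the actual
patches; the fput_many() call site below is just an example):

/* Hypothetical: 'notify' == true always means "wake the task like a signal". */
int task_work_add(struct task_struct *task, struct callback_head *work,
		  bool notify);

/* fput_many() would then simply do: */
if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
	return;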
Jens Axboe Jan. 8, 2021, 4:26 p.m. UTC | #12
On 1/8/21 9:10 AM, Jens Axboe wrote:
> On 1/8/21 8:58 AM, Al Viro wrote:
>> On Fri, Jan 08, 2021 at 08:13:25AM -0700, Jens Axboe wrote:
>>>> Anyway, bedtime for me; right now it looks like at least for task ==
>>>> current we always want TWA_SIGNAL.  I'll look into that more tomorrow
>>>> when I get up, but so far it smells like switching everything to
>>>> TWA_SIGNAL would be the right thing to do, if not going back to bool
>>>> notify for task_work_add()...
>>>
>>> Before the change, the fact that we ran task_work off get_signal() and
>>> thus processed even non-notify work in that path was a bit of a mess,
>>> imho. If you have work that needs processing now, in the same manner as
>>> signals, then you really should be using TWA_SIGNAL. For this pipe case,
>>> and I'd need to setup and reproduce it again, the task must have a
>>> signal pending and that would have previously caused the task_work to
>>> run, and now it does not. TWA_RESUME technically didn't change its
>>> behavior, it's still the same notification type, we just don't run
>>> task_work unconditionally (regardless of notification type) from
>>> get_signal().
>>
>> It sure as hell did change behaviour.  Think of the effect of getting
>> hit with SIGSTOP.  That's what that "bit of a mess" had been about.
>> Work done now vs. possibly several days later when SIGCONT finally
>> gets sent.
>>
>>> I think the main question here is if we want to re-instate the behavior
>>> of running task_work off get_signal(). I'm leaning towards not doing
>>> that and ensuring that callers that DO need that are using TWA_SIGNAL.
>>
>> Can you show the callers that DO NOT need it?
> 
> OK, so here's my suggestion:
> 
> 1) For 5.11, we just re-instate the task_work run in get_signal(). This
>    will make TWA_RESUME have the exact same behavior as before.
> 
> 2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
>    turning it into a bool again (notify or no notify).
> 
> How does that sound?

Attached the patches - #1 is proposed for 5.11 to fix the current issue,
and then 2-4 can get queued for 5.12 to totally remove the difference
between TWA_RESUME and TWA_SIGNAL.

Totally untested, but pretty straightforward.
Al Viro Jan. 8, 2021, 6:05 p.m. UTC | #13
On Fri, Jan 08, 2021 at 09:26:40AM -0700, Jens Axboe wrote:
> >> Can you show the callers that DO NOT need it?
> > 
> > OK, so here's my suggestion:
> > 
> > 1) For 5.11, we just re-instate the task_work run in get_signal(). This
> >    will make TWA_RESUME have the exact same behavior as before.
> > 
> > 2) For 5.12, I'll prepare a patch that collapses TWA_RESUME and TWA_SIGNAL,
> >    turning it into a bool again (notify or no notify).
> > 
> > How does that sound?
> 
> Attached the patches - #1 is proposed for 5.11 to fix the current issue,
> and then 2-4 can get queued for 5.12 to totally remove the difference
> between TWA_RESUME and TWA_SIGNAL.
> 
> Totally untested, but pretty straight forward.

	Umm...  I'm looking at the callers of get_signal() and I really wonder
how your support for TIF_NOTIFY_SIGNAL interacts with the saved sigmask
handling in the various do_signal() implementations (the calls of
restore_saved_sigmask()).  Could you give pointers to the relevant
discussion, or a braindump on the same?  I realize that it was months
ago, but...

	Do we even need restore_saved_sigmask_unless() now?  Could
set_user_sigmask() simply set TIF_NOTIFY_SIGNAL?  Oleg, could you comment
on that?

	Another fun question is how that thing interacts with the
single-stepping logic; it's been about 8 years since I looked into those
horrors, but they used to be bloody awful...

	What I'm trying to figure out is how costly TIF_NOTIFY_SIGNAL is
on the work execution side; the task_work_add() side is cheap enough, it's
the delivery that is interesting.
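
For the delivery-side question, the 5.11-era handling is roughly the
following (a simplified sketch from memory; the exact call sites are in
the arch/entry code):

/* TIF_NOTIFY_SIGNAL breaks out of wait loops the same way a signal does. */
static inline int signal_pending(struct task_struct *p)
{
	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
		return 1;
	return task_sigpending(p);
}

/* The signal-delivery path then runs the queued work via this helper. */
static inline void tracehook_notify_signal(void)
{
	clear_thread_flag(TIF_NOTIFY_SIGNAL);
	smp_mb__after_atomic();
	if (current->task_works)
		task_work_run();
}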

Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index 45437f8e1003..7c76b611c95b 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -338,7 +338,13 @@  void fput_many(struct file *file, unsigned int refs)
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
+			/*
+			 * We could be dependent on the fput task_work running,
+			 * eg for pipes where someone is waiting on release
+			 * being called. Use TWA_SIGNAL to ensure it's run
+			 * before the task goes idle.
+			 */
+			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_SIGNAL))
 				return;
 			/*
 			 * After this task has run exit_task_work(),