mbox series

[PATCHSET,v4,0/5] Add io_uring support for waitid

Message ID 20230909151124.1229695-1-axboe@kernel.dk (mailing list archive)
Headers show
Series Add io_uring support for waitid | expand

Message

Jens Axboe Sept. 9, 2023, 3:11 p.m. UTC
Hi,

This adds support for IORING_OP_WAITID, which is an async variant of
the waitid(2) syscall. Rather than have a parent need to block waiting
on a child task state change, it can now simply get an async notication
when the requested state change has occured.

Patches 1..4 are purely prep patches, and should not have functional
changes. They split out parts of do_wait() into __do_wait(), so that
the prepare-to-wait and sleep parts are contained within do_wait().

Patch 5 adds io_uring support.

I wrote a few basic tests for this, which can be found in the
'waitid' branch of liburing:

https://git.kernel.dk/cgit/liburing/log/?h=waitid

Also spun a custom kernel for someone to test it, and no issues reported
so far.

The code can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-waitid

 include/linux/io_uring_types.h |   2 +
 include/uapi/linux/io_uring.h  |   2 +
 io_uring/Makefile              |   3 +-
 io_uring/cancel.c              |   5 +
 io_uring/io_uring.c            |   3 +
 io_uring/opdef.c               |  10 +-
 io_uring/waitid.c              | 372 +++++++++++++++++++++++++++++++++
 io_uring/waitid.h              |  15 ++
 kernel/exit.c                  | 131 ++++++------
 kernel/exit.h                  |  30 +++
 10 files changed, 512 insertions(+), 61 deletions(-)

Changes since v3:
- Rebase on current tree
- Move it before the futex changes. Note that if you're testing this,
  this means that the opcode values have changed. The liburing repo
  has been rebased as a result as well, you'll want to update that too.
  liburing also has update test cases.
- Fix races between cancelation and wakeup trigger. This follows a
  scheme similar to the internal poll io_uring handling

Comments

Jens Axboe Sept. 12, 2023, 5:06 p.m. UTC | #1
On 9/9/23 9:11 AM, Jens Axboe wrote:
> Hi,
> 
> This adds support for IORING_OP_WAITID, which is an async variant of
> the waitid(2) syscall. Rather than have a parent need to block waiting
> on a child task state change, it can now simply get an async notication
> when the requested state change has occured.
> 
> Patches 1..4 are purely prep patches, and should not have functional
> changes. They split out parts of do_wait() into __do_wait(), so that
> the prepare-to-wait and sleep parts are contained within do_wait().
> 
> Patch 5 adds io_uring support.
> 
> I wrote a few basic tests for this, which can be found in the
> 'waitid' branch of liburing:
> 
> https://git.kernel.dk/cgit/liburing/log/?h=waitid
> 
> Also spun a custom kernel for someone to test it, and no issues reported
> so far.

Forget to mention that I also ran all the ltp testcases for any wait*
syscall test, and everything still passes just fine.
Christian Brauner Sept. 19, 2023, 2:45 p.m. UTC | #2
On Tue, Sep 12, 2023 at 11:06:39AM -0600, Jens Axboe wrote:
> On 9/9/23 9:11 AM, Jens Axboe wrote:
> > Hi,
> > 
> > This adds support for IORING_OP_WAITID, which is an async variant of
> > the waitid(2) syscall. Rather than have a parent need to block waiting
> > on a child task state change, it can now simply get an async notication
> > when the requested state change has occured.
> > 
> > Patches 1..4 are purely prep patches, and should not have functional
> > changes. They split out parts of do_wait() into __do_wait(), so that
> > the prepare-to-wait and sleep parts are contained within do_wait().
> > 
> > Patch 5 adds io_uring support.
> > 
> > I wrote a few basic tests for this, which can be found in the
> > 'waitid' branch of liburing:
> > 
> > https://git.kernel.dk/cgit/liburing/log/?h=waitid
> > 
> > Also spun a custom kernel for someone to test it, and no issues reported
> > so far.
> 
> Forget to mention that I also ran all the ltp testcases for any wait*
> syscall test, and everything still passes just fine.

I think the struct that this ends up exposing to io_uring is pretty ugly
and it would warrant a larger cleanup. I wouldn't be surprised if you
get some people complain about this.

Other than that I don't have any complaints about the series.
Jens Axboe Sept. 19, 2023, 2:57 p.m. UTC | #3
On 9/19/23 8:45 AM, Christian Brauner wrote:
> On Tue, Sep 12, 2023 at 11:06:39AM -0600, Jens Axboe wrote:
>> On 9/9/23 9:11 AM, Jens Axboe wrote:
>>> Hi,
>>>
>>> This adds support for IORING_OP_WAITID, which is an async variant of
>>> the waitid(2) syscall. Rather than have a parent need to block waiting
>>> on a child task state change, it can now simply get an async notication
>>> when the requested state change has occured.
>>>
>>> Patches 1..4 are purely prep patches, and should not have functional
>>> changes. They split out parts of do_wait() into __do_wait(), so that
>>> the prepare-to-wait and sleep parts are contained within do_wait().
>>>
>>> Patch 5 adds io_uring support.
>>>
>>> I wrote a few basic tests for this, which can be found in the
>>> 'waitid' branch of liburing:
>>>
>>> https://git.kernel.dk/cgit/liburing/log/?h=waitid
>>>
>>> Also spun a custom kernel for someone to test it, and no issues reported
>>> so far.
>>
>> Forget to mention that I also ran all the ltp testcases for any wait*
>> syscall test, and everything still passes just fine.
> 
> I think the struct that this ends up exposing to io_uring is pretty ugly
> and it would warrant a larger cleanup. I wouldn't be surprised if you
> get some people complain about this.
> 
> Other than that I don't have any complaints about the series.

io_uring only really needs child_wait and wo_pid on the wait_opts side,
for waitid_info it needs all of it. I'm assuming your worry is about the
former rather than the latter.

I think we could only make this smaller if we had a separate entry point
for io_uring, which would then make the code reuse a lot smaller. Right
now we just have __do_wait() abstracted out, and if we added a third
struct that has child_wait/wo_pid and exposed just that, we could not
share this infrastructure.

So as far as I can tell, there's no way to make the sharing less than it
is, at least not without adding cost of more code and less reuse.

Shrug?
Jens Axboe Sept. 20, 2023, 2:54 p.m. UTC | #4
On 9/19/23 8:57 AM, Jens Axboe wrote:
> On 9/19/23 8:45 AM, Christian Brauner wrote:
>> On Tue, Sep 12, 2023 at 11:06:39AM -0600, Jens Axboe wrote:
>>> On 9/9/23 9:11 AM, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> This adds support for IORING_OP_WAITID, which is an async variant of
>>>> the waitid(2) syscall. Rather than have a parent need to block waiting
>>>> on a child task state change, it can now simply get an async notication
>>>> when the requested state change has occured.
>>>>
>>>> Patches 1..4 are purely prep patches, and should not have functional
>>>> changes. They split out parts of do_wait() into __do_wait(), so that
>>>> the prepare-to-wait and sleep parts are contained within do_wait().
>>>>
>>>> Patch 5 adds io_uring support.
>>>>
>>>> I wrote a few basic tests for this, which can be found in the
>>>> 'waitid' branch of liburing:
>>>>
>>>> https://git.kernel.dk/cgit/liburing/log/?h=waitid
>>>>
>>>> Also spun a custom kernel for someone to test it, and no issues reported
>>>> so far.
>>>
>>> Forget to mention that I also ran all the ltp testcases for any wait*
>>> syscall test, and everything still passes just fine.
>>
>> I think the struct that this ends up exposing to io_uring is pretty ugly
>> and it would warrant a larger cleanup. I wouldn't be surprised if you
>> get some people complain about this.
>>
>> Other than that I don't have any complaints about the series.
> 
> io_uring only really needs child_wait and wo_pid on the wait_opts side,
> for waitid_info it needs all of it. I'm assuming your worry is about the
> former rather than the latter.
> 
> I think we could only make this smaller if we had a separate entry point
> for io_uring, which would then make the code reuse a lot smaller. Right
> now we just have __do_wait() abstracted out, and if we added a third
> struct that has child_wait/wo_pid and exposed just that, we could not
> share this infrastructure.
> 
> So as far as I can tell, there's no way to make the sharing less than it
> is, at least not without adding cost of more code and less reuse.
> 
> Shrug?

Took a closer look, and I don't think it's really possible to split much
out of wait_opts. You may only need child_wait/wo_pid for setup, but on
the wakeup side you type and flags as well. We could probably add that
third struct and move wo_rusage and notask_error out, but seems very
pointless at that point just to avoid those two. And if we do wire up
rusage at some point, then we're left with just the one.