[PATCHSET,v4,0/7] io_uring epoll wait support

Message ID 20250219172552.1565603-1-axboe@kernel.dk (mailing list archive)

Message

Jens Axboe Feb. 19, 2025, 5:22 p.m. UTC
Hi,

One issue people consistently run into when converting legacy epoll
event loops to io_uring is that parts of the event loop still need to
use epoll. And since event loops generally need to wait in one spot,
they add the io_uring fd to the epoll set and continue to use
epoll_wait(2) to wait on events. This is suboptimal on the io_uring
front as there's now an active poller on the ring, and it's suboptimal
as it doesn't give the application the batch waiting (with fine grained
timeouts) that io_uring provides.
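
For reference, a minimal sketch of that legacy pattern: one epoll set is the
single wait point, and the ring is added to it as just another fd. To keep
the sketch runnable without a ring, an eventfd stands in for the io_uring fd.
That substitution is an assumption for illustration only, and
legacy_wait_once() is a hypothetical name that does not appear in the patches.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Legacy pattern: the event loop waits in epoll_wait(2) only, so any other
 * event source has to be added to the epoll set. An eventfd stands in for
 * the io_uring fd here (assumption, for illustration).
 * Returns the number of ready events, or -1 on error.
 */
static int legacy_wait_once(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out[8];
	uint64_t one = 1;
	int epfd, ring_standin, ret = -1;

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;

	ring_standin = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (ring_standin < 0)
		goto out_ep;

	/* Add the stand-in "ring fd" to the epoll set. */
	ev.data.fd = ring_standin;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, ring_standin, &ev) < 0)
		goto out_fd;

	/* Simulate a completion being posted by making the fd readable. */
	if (write(ring_standin, &one, sizeof(one)) != (ssize_t)sizeof(one))
		goto out_fd;

	/* The single wait point of the event loop, with a 1000 ms timeout. */
	ret = epoll_wait(epfd, out, 8, 1000);
	if (ret == 1 && out[0].data.fd != ring_standin)
		ret = -1;

out_fd:
	close(ring_standin);
out_ep:
	close(epfd);
	return ret;
}
```

A caller would invoke legacy_wait_once() from its loop and then go reap
completions whenever the stand-in fd is reported readable; that extra hop
through epoll is the part this series aims to remove.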

This patchset adds support for IORING_OP_EPOLL_WAIT, which does an async
epoll_wait() operation. No sleeping or thread offload is involved; it
relies on the internal poll infrastructure that io_uring uses to drive
retries on pollable entities. With that, the above event loops can
continue to use epoll for certain parts, but bundle it all under waiting
on the ring itself rather than add the ring fd to the epoll set.

Patches 1..2 are just prep patches, and patch 3 adds the epoll change
that allows io_uring to queue a callback if no events are available.
Patch 4 is just prep on the io_uring side, and patch 5 finally adds
IORING_OP_EPOLL_WAIT support.

Patches can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-epoll-wait

and are against 6.14-rc3 + already pending io_uring patches.

 fs/eventpoll.c                | 87 +++++++++++++++++++++++++----------
 include/linux/eventpoll.h     |  4 ++
 include/uapi/linux/io_uring.h |  1 +
 io_uring/Makefile             |  9 ++--
 io_uring/epoll.c              | 35 +++++++++++++-
 io_uring/epoll.h              |  2 +
 io_uring/opdef.c              | 14 ++++++
 7 files changed, 122 insertions(+), 30 deletions(-)

Since v3:
- Base on poll infrastructure rather than rolling our own, thanks to
  Pavel's suggestion.
- Rebase on top of 6.15 changes, which shifted the opcode value due
  to the addition of zc rx.

Comments

Christian Brauner Feb. 20, 2025, 9:21 a.m. UTC | #1
On Wed, 19 Feb 2025 10:22:23 -0700, Jens Axboe wrote:
> One issue people consistently run into when converting legacy epoll
> event loops to io_uring is that parts of the event loop still need to
> use epoll. And since event loops generally need to wait in one spot,
> they add the io_uring fd to the epoll set and continue to use
> epoll_wait(2) to wait on events. This is suboptimal on the io_uring
> front as there's now an active poller on the ring, and it's suboptimal
> as it doesn't give the application the batch waiting (with fine grained
> timeouts) that io_uring provides.
> 
> [...]

Preparatory patches in vfs-6.15.eventpoll with tag vfs-6.15-rc1.eventpoll.
Stable now.

---

Applied to the vfs-6.15.eventpoll branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.eventpoll branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.eventpoll

[1/5] eventpoll: abstract out parameter sanity checking
      https://git.kernel.org/vfs/vfs/c/6b47d35d4d9e
[2/5] eventpoll: abstract out ep_try_send_events() helper
      https://git.kernel.org/vfs/vfs/c/38d203560118
[3/5] eventpoll: add epoll_sendevents() helper
      https://git.kernel.org/vfs/vfs/c/ae3a4f1fdc2c

Jens Axboe Feb. 20, 2025, 3:15 p.m. UTC | #2
On 2/20/25 2:21 AM, Christian Brauner wrote:
> On Wed, 19 Feb 2025 10:22:23 -0700, Jens Axboe wrote:
>> One issue people consistently run into when converting legacy epoll
>> event loops to io_uring is that parts of the event loop still need to
>> use epoll. And since event loops generally need to wait in one spot,
>> they add the io_uring fd to the epoll set and continue to use
>> epoll_wait(2) to wait on events. This is suboptimal on the io_uring
>> front as there's now an active poller on the ring, and it's suboptimal
>> as it doesn't give the application the batch waiting (with fine grained
>> timeouts) that io_uring provides.
>>
>> [...]
> 
> Preparatory patches in vfs-6.15.eventpoll with tag vfs-6.15-rc1.eventpoll.
> Stable now.

Thanks, I'll rebase on your branch.