[PATCHSET,v2,0/4] Add support for batched min timeout

Message ID 20240215161002.3044270-1-axboe@kernel.dk (mailing list archive)

Jens Axboe Feb. 15, 2024, 4:06 p.m. UTC
Hi,

Normal CQE waiting is generally either done with a timeout, or without
one. Outside of the timeout, the other key parameter is how many events
to wait for. If we ask for N events and we get that within the timeout,
then we return successfully. If we do not, then we return with -ETIME
and the application can then check how many CQEs are actually available,
if any.

This works fine, but we're increasingly using smaller timeouts in
applications for targeted batch waiting, e.g. "give me N requests in T
usec". If the application has other things to do every T usec, this
works fine. But if it's an event loop that wants to process completions
to make progress, it's pointless to return after T usec if there's
nothing to do. The application can't really make T bigger reliably, as
this may be the target it has to meet at busier times of the day.

This patchset adds support for min timeout waiting, which adds a third
parameter to how waits are done. The N and T timeout remain, but we add
a min_timeout option, M. The batch is now defined by N and M. The
application can now say "give me N requests in M usec, but if none have
arrived, just sleep until T has passed". This allows for using a sane
N+M, while avoiding waking and returning all the time if nothing happens.

The semantics are as follows:

- If M expires and no events are available, keep waiting until T has
  expired. This is identical to using N+T without setting M at all,
  except if an event arrives after M has expired, we return immediately.

- If M expires and events are available, return those even if there
  are fewer than N.

- If N events arrive before M expires, return those events. This is
  identical to a wait with T == M and M not set.
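
To make the three cases above concrete, here is a small userspace model of
the decision logic. It is only a sketch and not the kernel code: the names
(struct wait_result, arrived_by(), model_wait()) are made up for
illustration, all times are in usec relative to the start of the wait, M == 0
is taken to mean "min timeout not set", and T is assumed to be measured from
the start of the wait with T >= M. It also assumes N >= 1 and a sorted list
of arrival times.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type, not part of any kernel or liburing API. */
struct wait_result {
	unsigned long when_usec;	/* time at which the wait returns */
	size_t nr_events;		/* CQEs available at that time */
	bool timed_out;			/* true if T expired (the -ETIME case) */
};

/* Count the events that have arrived at or before time 't'. */
static size_t arrived_by(const unsigned long *arrivals, size_t nr,
			 unsigned long t)
{
	size_t i = 0;

	while (i < nr && arrivals[i] <= t)
		i++;
	return i;
}

/*
 * Model a wait for 'n' events with min timeout 'm' and timeout 't'.
 * 'arrivals' holds the sorted arrival times of 'nr' events.
 */
static struct wait_result model_wait(const unsigned long *arrivals, size_t nr,
				     size_t n, unsigned long m,
				     unsigned long t)
{
	struct wait_result res = { 0, 0, false };
	/* The full batch of N is only waited for until M, or T if M is unset */
	unsigned long batch_end = m ? m : t;

	/* N events arrive inside the batch window: return on the Nth one */
	if (n > 0 && n <= nr && arrivals[n - 1] <= batch_end) {
		res.when_usec = arrivals[n - 1];
		res.nr_events = n;
		return res;
	}

	if (m) {
		size_t got = arrived_by(arrivals, nr, m);

		/* M expired and events are available: return them, even if < N */
		if (got) {
			res.when_usec = m;
			res.nr_events = got;
			return res;
		}
		/* M expired with nothing: the first event before T ends the wait */
		if (nr > 0 && arrivals[0] <= t) {
			res.when_usec = arrivals[0];
			res.nr_events = arrived_by(arrivals, nr, arrivals[0]);
			return res;
		}
	}

	/* T expired: report whatever is available, as with a plain N+T wait */
	res.when_usec = t;
	res.nr_events = arrived_by(arrivals, nr, t);
	res.timed_out = true;
	return res;
}
```

With N=4, M=50 and T=1000, a single event arriving at 10 usec makes the
model return at 50 usec with one event, while a single event arriving at
300 usec makes it return at 300 usec rather than sleeping until T.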

There's a liburing branch with test cases here:

https://git.kernel.dk/cgit/liburing/log/?h=min-wait

and the patches are on top of the current for-6.9/io_uring branch. They
can also be viewed here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-min-wait

 include/uapi/linux/io_uring.h |   3 +-
 io_uring/io_uring.c           | 156 ++++++++++++++++++++++++++++------
 io_uring/io_uring.h           |   4 +
 3 files changed, 134 insertions(+), 29 deletions(-)

Changes since v1:
- Fix issue with both min_wait and timeout, and transitioning to the long
  timeout. We'd add the current time potentially more than once, causing
  much longer waits than what was asked for. Test case has been added for
  that as well.