mbox series

[v3,0/4] open/accept directly into io_uring fixed file table

Message ID cover.1629559905.git.asml.silence@gmail.com (mailing list archive)
Headers show
Series open/accept directly into io_uring fixed file table | expand

Message

Pavel Begunkov Aug. 21, 2021, 3:52 p.m. UTC
Add an optional feature to open/accept directly into io_uring's fixed
file table bypassing the normal file table. Same behaviour if as the
snippet below, but in one operation:

sqe = prep_[open,accept](...);
cqe = submit_and_wait(sqe);
io_uring_register_files_update(uring_idx, (fd = cqe->res));
close((fd = cqe->res));

The idea in pretty old, and was brough up and implemented a year ago
by Josh Triplett, though haven't sought the light for some reasons.

The behaviour is controlled by setting sqe->file_index, where 0 implies
the old behaviour. If non-zero value is specified, then it will behave
as described and place the file into a fixed file slot
sqe->file_index - 1. A file table should be already created, the slot
should be valid and empty, otherwise the operation will fail.

we can't use IOSQE_FIXED_FILE to switch between modes, because accept
takes a file, and it already uses the flag with a different meaning.

since RFC:
 - added attribution
 - updated descriptions
 - rebased

since v1:
 - EBADF if slot is already used (Josh Triplett)
 - alias index with splice_fd_in (Josh Triplett)
 - fix a bound check bug

Pavel Begunkov (4):
  net: add accept helper not installing fd
  io_uring: openat directly into fixed fd table
  io_uring: hand code io_accept() fd installing
  io_uring: accept directly into fixed file table

 fs/io_uring.c                 | 129 +++++++++++++++++++++++++++++-----
 include/linux/socket.h        |   3 +
 include/uapi/linux/io_uring.h |   5 +-
 net/socket.c                  |  71 ++++++++++---------
 4 files changed, 157 insertions(+), 51 deletions(-)

Comments

Jens Axboe Aug. 22, 2021, 2:18 a.m. UTC | #1
On 8/21/21 9:52 AM, Pavel Begunkov wrote:
> Add an optional feature to open/accept directly into io_uring's fixed
> file table bypassing the normal file table. Same behaviour if as the
> snippet below, but in one operation:
> 
> sqe = prep_[open,accept](...);
> cqe = submit_and_wait(sqe);
> io_uring_register_files_update(uring_idx, (fd = cqe->res));
> close((fd = cqe->res));
> 
> The idea in pretty old, and was brough up and implemented a year ago
> by Josh Triplett, though haven't sought the light for some reasons.
> 
> The behaviour is controlled by setting sqe->file_index, where 0 implies
> the old behaviour. If non-zero value is specified, then it will behave
> as described and place the file into a fixed file slot
> sqe->file_index - 1. A file table should be already created, the slot
> should be valid and empty, otherwise the operation will fail.
> 
> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
> takes a file, and it already uses the flag with a different meaning.
> 
> since RFC:
>  - added attribution
>  - updated descriptions
>  - rebased
> 
> since v1:
>  - EBADF if slot is already used (Josh Triplett)
>  - alias index with splice_fd_in (Josh Triplett)
>  - fix a bound check bug

With the prep series, this looks good to me now. Josh, what do you
think?

And we need the net folks to sign off on the first patch, of course.
Josh Triplett Aug. 23, 2021, 7:13 p.m. UTC | #2
On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
> > Add an optional feature to open/accept directly into io_uring's fixed
> > file table bypassing the normal file table. Same behaviour if as the
> > snippet below, but in one operation:
> > 
> > sqe = prep_[open,accept](...);
> > cqe = submit_and_wait(sqe);
> > io_uring_register_files_update(uring_idx, (fd = cqe->res));
> > close((fd = cqe->res));
> > 
> > The idea in pretty old, and was brough up and implemented a year ago
> > by Josh Triplett, though haven't sought the light for some reasons.
> > 
> > The behaviour is controlled by setting sqe->file_index, where 0 implies
> > the old behaviour. If non-zero value is specified, then it will behave
> > as described and place the file into a fixed file slot
> > sqe->file_index - 1. A file table should be already created, the slot
> > should be valid and empty, otherwise the operation will fail.
> > 
> > we can't use IOSQE_FIXED_FILE to switch between modes, because accept
> > takes a file, and it already uses the flag with a different meaning.
> > 
> > since RFC:
> >  - added attribution
> >  - updated descriptions
> >  - rebased
> > 
> > since v1:
> >  - EBADF if slot is already used (Josh Triplett)
> >  - alias index with splice_fd_in (Josh Triplett)
> >  - fix a bound check bug
> 
> With the prep series, this looks good to me now. Josh, what do you
> think?

I would still like to see this using a union with the `nofile` field in
io_open and io_accept, rather than overloading the 16-bit buf_index
field. That would avoid truncating to 16 bits, and make less work for
expansion to more than 16 bits of fixed file indexes.

(I'd also like that to actually use a union, rather than overloading the
meaning of buf_index/nofile.)

I personally still feel that using non-zero to signify index-plus-one is
both error-prone and not as future-compatible. I think we could do
better with no additional overhead. But I think the final call on that
interface is up to you, Jens. Do you think it'd be worth spending a flag
bit or using a different opcode, to get a cleaner interface? If you
don't, then I'd be fine with seeing this go in with just the io_open and
io_accept change.

- Josh Triplett
Jens Axboe Aug. 23, 2021, 7:40 p.m. UTC | #3
On 8/23/21 1:13 PM, Josh Triplett wrote:
> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>> Add an optional feature to open/accept directly into io_uring's fixed
>>> file table bypassing the normal file table. Same behaviour if as the
>>> snippet below, but in one operation:
>>>
>>> sqe = prep_[open,accept](...);
>>> cqe = submit_and_wait(sqe);
>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>> close((fd = cqe->res));
>>>
>>> The idea in pretty old, and was brough up and implemented a year ago
>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>
>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>> the old behaviour. If non-zero value is specified, then it will behave
>>> as described and place the file into a fixed file slot
>>> sqe->file_index - 1. A file table should be already created, the slot
>>> should be valid and empty, otherwise the operation will fail.
>>>
>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>> takes a file, and it already uses the flag with a different meaning.
>>>
>>> since RFC:
>>>  - added attribution
>>>  - updated descriptions
>>>  - rebased
>>>
>>> since v1:
>>>  - EBADF if slot is already used (Josh Triplett)
>>>  - alias index with splice_fd_in (Josh Triplett)
>>>  - fix a bound check bug
>>
>> With the prep series, this looks good to me now. Josh, what do you
>> think?
> 
> I would still like to see this using a union with the `nofile` field in
> io_open and io_accept, rather than overloading the 16-bit buf_index
> field. That would avoid truncating to 16 bits, and make less work for
> expansion to more than 16 bits of fixed file indexes.
> 
> (I'd also like that to actually use a union, rather than overloading the
> meaning of buf_index/nofile.)

Agree, and in fact there's room in the open and accept command parts, so
we can just make it a separate entry there instead of using ->buf_index.
Then just pass in the index to io_install_fixed_file() instead of having
it pull it from req->buf_index.

> I personally still feel that using non-zero to signify index-plus-one is
> both error-prone and not as future-compatible. I think we could do
> better with no additional overhead. But I think the final call on that
> interface is up to you, Jens. Do you think it'd be worth spending a flag
> bit or using a different opcode, to get a cleaner interface? If you
> don't, then I'd be fine with seeing this go in with just the io_open and
> io_accept change.

I'd be inclined to go the extra opcode route instead, as the flag only
really would make sense to requests that instantiate file descriptors.
For this particular case, we'd need 3 new opcodes for
openat/openat2/accept, which is probably a worthwhile expenditure.

Pavel, what do you think? Switch to using a different opcode for the new
requests, and just grab some space in io_open and io_accept for the fd
and pass it in to install.

I do think that'd end up being less hackish and easier to grok for a
user.
Pavel Begunkov Aug. 24, 2021, 9:48 a.m. UTC | #4
On 8/23/21 8:40 PM, Jens Axboe wrote:
> On 8/23/21 1:13 PM, Josh Triplett wrote:
>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>> file table bypassing the normal file table. Same behaviour if as the
>>>> snippet below, but in one operation:
>>>>
>>>> sqe = prep_[open,accept](...);
>>>> cqe = submit_and_wait(sqe);
>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>> close((fd = cqe->res));
>>>>
>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>
>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>> as described and place the file into a fixed file slot
>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>> should be valid and empty, otherwise the operation will fail.
>>>>
>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>> takes a file, and it already uses the flag with a different meaning.
>>>>
>>>> since RFC:
>>>>  - added attribution
>>>>  - updated descriptions
>>>>  - rebased
>>>>
>>>> since v1:
>>>>  - EBADF if slot is already used (Josh Triplett)
>>>>  - alias index with splice_fd_in (Josh Triplett)
>>>>  - fix a bound check bug
>>>
>>> With the prep series, this looks good to me now. Josh, what do you
>>> think?
>>
>> I would still like to see this using a union with the `nofile` field in
>> io_open and io_accept, rather than overloading the 16-bit buf_index
>> field. That would avoid truncating to 16 bits, and make less work for
>> expansion to more than 16 bits of fixed file indexes.
>>
>> (I'd also like that to actually use a union, rather than overloading the
>> meaning of buf_index/nofile.)
> 
> Agree, and in fact there's room in the open and accept command parts, so
> we can just make it a separate entry there instead of using ->buf_index.
> Then just pass in the index to io_install_fixed_file() instead of having
> it pull it from req->buf_index.

That's internal details, can be expanded at wish in the future, if we'd
ever need larger tables. ->buf_index already holds indexes to different
resources just fine.

Aliasing with nofile would rather be ugly, so the only option, as you
mentioned, is to grab some space from open/accept structs, but don't see
why we'd want it when there is a more convenient alternative.

>> I personally still feel that using non-zero to signify index-plus-one is
>> both error-prone and not as future-compatible. I think we could do
>> better with no additional overhead. But I think the final call on that
>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>> bit or using a different opcode, to get a cleaner interface? If you
>> don't, then I'd be fine with seeing this go in with just the io_open and
>> io_accept change.
> 
> I'd be inclined to go the extra opcode route instead, as the flag only
> really would make sense to requests that instantiate file descriptors.
> For this particular case, we'd need 3 new opcodes for
> openat/openat2/accept, which is probably a worthwhile expenditure.
> 
> Pavel, what do you think? Switch to using a different opcode for the new
> requests, and just grab some space in io_open and io_accept for the fd
> and pass it in to install.

I don't get it, why it's even called hackish? How that's anyhow better?
To me the feature looks like a natural extension to the operations, just
like a read can be tuned with flags, so and creating new opcodes seems
a bit ugly, unnecessary taking space from opcodes and adding duplication
(even if both versions call the same handler).

First, why it's not future-compatible? It's a serious argument, but I
don't see where it came from. Do I miss something?

It's u32 now, and so will easily cover all indexes. SQE fields should
always be zeroed, that's a rule, liburing follows it, and there would
have been already lots of problems for users not honoring it. And there
will be a helper hiding all the index conversions for convenience.

void io_uring_prep_open_direct(sqe, index, ...)
{
	io_uring_prep_open(sqe, ...);
	sqe->file_index = index + 1;
}
Jens Axboe Aug. 24, 2021, 2:02 p.m. UTC | #5
On 8/24/21 3:48 AM, Pavel Begunkov wrote:
> On 8/23/21 8:40 PM, Jens Axboe wrote:
>> On 8/23/21 1:13 PM, Josh Triplett wrote:
>>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>>> file table bypassing the normal file table. Same behaviour if as the
>>>>> snippet below, but in one operation:
>>>>>
>>>>> sqe = prep_[open,accept](...);
>>>>> cqe = submit_and_wait(sqe);
>>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>>> close((fd = cqe->res));
>>>>>
>>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>>
>>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>>> as described and place the file into a fixed file slot
>>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>>> should be valid and empty, otherwise the operation will fail.
>>>>>
>>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>>> takes a file, and it already uses the flag with a different meaning.
>>>>>
>>>>> since RFC:
>>>>>  - added attribution
>>>>>  - updated descriptions
>>>>>  - rebased
>>>>>
>>>>> since v1:
>>>>>  - EBADF if slot is already used (Josh Triplett)
>>>>>  - alias index with splice_fd_in (Josh Triplett)
>>>>>  - fix a bound check bug
>>>>
>>>> With the prep series, this looks good to me now. Josh, what do you
>>>> think?
>>>
>>> I would still like to see this using a union with the `nofile` field in
>>> io_open and io_accept, rather than overloading the 16-bit buf_index
>>> field. That would avoid truncating to 16 bits, and make less work for
>>> expansion to more than 16 bits of fixed file indexes.
>>>
>>> (I'd also like that to actually use a union, rather than overloading the
>>> meaning of buf_index/nofile.)
>>
>> Agree, and in fact there's room in the open and accept command parts, so
>> we can just make it a separate entry there instead of using ->buf_index.
>> Then just pass in the index to io_install_fixed_file() instead of having
>> it pull it from req->buf_index.
> 
> That's internal details, can be expanded at wish in the future, if we'd
> ever need larger tables. ->buf_index already holds indexes to different
> resources just fine.

Sure it's internal and can always be changed, doesn't change the fact
that it's a bit iffy that it's used differently in different spots. As
it costs us nothing to simply add a 'fixed_file' u32 for io_accept and
io_open, I really think that should be done instead.

> Aliasing with nofile would rather be ugly, so the only option, as you
> mentioned, is to grab some space from open/accept structs, but don't see
> why we'd want it when there is a more convenient alternative.

Because it's a lot more readable and less error prone imho. Agree on the
union, we don't have to resort to that.

>>> I personally still feel that using non-zero to signify index-plus-one is
>>> both error-prone and not as future-compatible. I think we could do
>>> better with no additional overhead. But I think the final call on that
>>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>>> bit or using a different opcode, to get a cleaner interface? If you
>>> don't, then I'd be fine with seeing this go in with just the io_open and
>>> io_accept change.
>>
>> I'd be inclined to go the extra opcode route instead, as the flag only
>> really would make sense to requests that instantiate file descriptors.
>> For this particular case, we'd need 3 new opcodes for
>> openat/openat2/accept, which is probably a worthwhile expenditure.
>>
>> Pavel, what do you think? Switch to using a different opcode for the new
>> requests, and just grab some space in io_open and io_accept for the fd
>> and pass it in to install.
> 
> I don't get it, why it's even called hackish? How that's anyhow better?
> To me the feature looks like a natural extension to the operations, just
> like a read can be tuned with flags, so and creating new opcodes seems
> a bit ugly, unnecessary taking space from opcodes and adding duplication
> (even if both versions call the same handler).

I agree that it's a natural extension, the problem is that we have to do
unnatural things (somewhat) to make it work. I'm fine with using the
union for the splice_fd_in to pass it in, I don't think it's a big deal.

I do wish that IORING_OP_CLOSE would work with them, though. I think we
should to that as a followup patch. It's a bit odd to be able to open a
file with IORING_OP_OPENAT and not being able to close it with
IORING_OP_CLOSE. For the latter, we should just give it fixed file
support, which would be pretty trivial.

> First, why it's not future-compatible? It's a serious argument, but I
> don't see where it came from. Do I miss something?
> 
> It's u32 now, and so will easily cover all indexes. SQE fields should
> always be zeroed, that's a rule, liburing follows it, and there would
> have been already lots of problems for users not honoring it. And there
> will be a helper hiding all the index conversions for convenience.
> 
> void io_uring_prep_open_direct(sqe, index, ...)
> {
> 	io_uring_prep_open(sqe, ...);
> 	sqe->file_index = index + 1;
> }

Let's keep it the way that it is, but I do want to see the buf_index
thing go away and just req->open.fixed_file or whatever being used for
open and accept. We should fold that in.
Pavel Begunkov Aug. 24, 2021, 2:43 p.m. UTC | #6
On 8/24/21 3:02 PM, Jens Axboe wrote:
> On 8/24/21 3:48 AM, Pavel Begunkov wrote:
>> On 8/23/21 8:40 PM, Jens Axboe wrote:
>>> On 8/23/21 1:13 PM, Josh Triplett wrote:
>>>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>>>> file table bypassing the normal file table. Same behaviour if as the
>>>>>> snippet below, but in one operation:
>>>>>>
>>>>>> sqe = prep_[open,accept](...);
>>>>>> cqe = submit_and_wait(sqe);
>>>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>>>> close((fd = cqe->res));
>>>>>>
>>>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>>>
>>>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>>>> as described and place the file into a fixed file slot
>>>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>>>> should be valid and empty, otherwise the operation will fail.
>>>>>>
>>>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>>>> takes a file, and it already uses the flag with a different meaning.
>>>>>>
>>>>>> since RFC:
>>>>>>  - added attribution
>>>>>>  - updated descriptions
>>>>>>  - rebased
>>>>>>
>>>>>> since v1:
>>>>>>  - EBADF if slot is already used (Josh Triplett)
>>>>>>  - alias index with splice_fd_in (Josh Triplett)
>>>>>>  - fix a bound check bug
>>>>>
>>>>> With the prep series, this looks good to me now. Josh, what do you
>>>>> think?
>>>>
>>>> I would still like to see this using a union with the `nofile` field in
>>>> io_open and io_accept, rather than overloading the 16-bit buf_index
>>>> field. That would avoid truncating to 16 bits, and make less work for
>>>> expansion to more than 16 bits of fixed file indexes.
>>>>
>>>> (I'd also like that to actually use a union, rather than overloading the
>>>> meaning of buf_index/nofile.)
>>>
>>> Agree, and in fact there's room in the open and accept command parts, so
>>> we can just make it a separate entry there instead of using ->buf_index.
>>> Then just pass in the index to io_install_fixed_file() instead of having
>>> it pull it from req->buf_index.
>>
>> That's internal details, can be expanded at wish in the future, if we'd
>> ever need larger tables. ->buf_index already holds indexes to different
>> resources just fine.
> 
> Sure it's internal and can always be changed, doesn't change the fact
> that it's a bit iffy that it's used differently in different spots. As
> it costs us nothing to simply add a 'fixed_file' u32 for io_accept and
> io_open, I really think that should be done instead.
> 
>> Aliasing with nofile would rather be ugly, so the only option, as you
>> mentioned, is to grab some space from open/accept structs, but don't see
>> why we'd want it when there is a more convenient alternative.
> 
> Because it's a lot more readable and less error prone imho. Agree on the
> union, we don't have to resort to that.

Ok, I don't have a strong opinion on that. Will resend



>>>> I personally still feel that using non-zero to signify index-plus-one is
>>>> both error-prone and not as future-compatible. I think we could do
>>>> better with no additional overhead. But I think the final call on that
>>>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>>>> bit or using a different opcode, to get a cleaner interface? If you
>>>> don't, then I'd be fine with seeing this go in with just the io_open and
>>>> io_accept change.
>>>
>>> I'd be inclined to go the extra opcode route instead, as the flag only
>>> really would make sense to requests that instantiate file descriptors.
>>> For this particular case, we'd need 3 new opcodes for
>>> openat/openat2/accept, which is probably a worthwhile expenditure.
>>>
>>> Pavel, what do you think? Switch to using a different opcode for the new
>>> requests, and just grab some space in io_open and io_accept for the fd
>>> and pass it in to install.
>>
>> I don't get it, why it's even called hackish? How that's anyhow better?
>> To me the feature looks like a natural extension to the operations, just
>> like a read can be tuned with flags, so and creating new opcodes seems
>> a bit ugly, unnecessary taking space from opcodes and adding duplication
>> (even if both versions call the same handler).
> 
> I agree that it's a natural extension, the problem is that we have to do
> unnatural things (somewhat) to make it work. I'm fine with using the
> union for the splice_fd_in to pass it in, I don't think it's a big deal.
> 
> I do wish that IORING_OP_CLOSE would work with them, though. I think we
> should to that as a followup patch. It's a bit odd to be able to open a
> file with IORING_OP_OPENAT and not being able to close it with
> IORING_OP_CLOSE. For the latter, we should just give it fixed file
> support, which would be pretty trivial.
> 
>> First, why it's not future-compatible? It's a serious argument, but I
>> don't see where it came from. Do I miss something?
>>
>> It's u32 now, and so will easily cover all indexes. SQE fields should
>> always be zeroed, that's a rule, liburing follows it, and there would
>> have been already lots of problems for users not honoring it. And there
>> will be a helper hiding all the index conversions for convenience.
>>
>> void io_uring_prep_open_direct(sqe, index, ...)
>> {
>> 	io_uring_prep_open(sqe, ...);
>> 	sqe->file_index = index + 1;
>> }
> 
> Let's keep it the way that it is, but I do want to see the buf_index
> thing go away and just req->open.fixed_file or whatever being used for
> open and accept. We should fold that in.