mbox series

[0/2] Remove kiocb ki_complete

Message ID 20200708222637.23046-1-willy@infradead.org (mailing list archive)
Headers show
Series Remove kiocb ki_complete | expand

Message

Matthew Wilcox (Oracle) July 8, 2020, 10:26 p.m. UTC
Save a pointer in the kiocb by using a few bits in ki_flags to index
a table of completion functions.

Matthew Wilcox (Oracle) (2):
  fs: Abstract calling the kiocb completion function
  fs: Remove kiocb->ki_complete

 crypto/af_alg.c                    |  2 +-
 drivers/block/loop.c               | 14 +++++++--
 drivers/nvme/target/core.c         | 10 +++++-
 drivers/nvme/target/io-cmd-file.c  | 10 +++---
 drivers/nvme/target/nvmet.h        |  2 ++
 drivers/target/target_core_file.c  | 20 ++++++++++--
 drivers/usb/gadget/function/f_fs.c |  2 +-
 drivers/usb/gadget/legacy/inode.c  |  4 +--
 fs/aio.c                           | 50 ++++++++++++++++--------------
 fs/block_dev.c                     |  2 +-
 fs/ceph/file.c                     |  2 +-
 fs/cifs/file.c                     |  8 ++---
 fs/direct-io.c                     |  2 +-
 fs/fuse/file.c                     |  2 +-
 fs/io_uring.c                      | 14 ++++++---
 fs/iomap/direct-io.c               |  3 +-
 fs/nfs/direct.c                    |  2 +-
 fs/ocfs2/file.c                    |  7 +++--
 fs/overlayfs/file.c                | 17 +++++++---
 fs/read_write.c                    | 36 +++++++++++++++++++++
 include/linux/fs.h                 | 23 ++++++++++++--
 21 files changed, 168 insertions(+), 64 deletions(-)

Comments

Jens Axboe July 8, 2020, 10:33 p.m. UTC | #1
On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote:
> Save a pointer in the kiocb by using a few bits in ki_flags to index
> a table of completion functions.

I ran polled and regular IO testing through io_uring, which exercises
both completions that we have in there, and it works just fine for
me.

Didn't test anything beyond that, outside of ensuring the kernel also
booted ;-)
Christoph Hellwig July 9, 2020, 10:17 a.m. UTC | #2
I really don't like this series at all.  If saves a single pointer
but introduces a complicated machinery that just doesn't follow any
natural flow.  And there doesn't seem to be any good reason for it to
start with.
Matthew Wilcox (Oracle) July 9, 2020, 11:10 a.m. UTC | #3
On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
> I really don't like this series at all.  If saves a single pointer
> but introduces a complicated machinery that just doesn't follow any
> natural flow.  And there doesn't seem to be any good reason for it to
> start with.

Jens doesn't want the kiocb to grow beyond a single cacheline, and we
want the ability to set the loff_t in userspace for an appending write,
so the plan was to replace the ki_complete member in kiocb with an
loff_t __user *ki_posp.

I don't think it's worth worrying about growing kiocb, personally,
but this seemed like the easiest way to make room for a new pointer.
Christoph Hellwig July 9, 2020, 1:26 p.m. UTC | #4
On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
> > I really don't like this series at all.  If saves a single pointer
> > but introduces a complicated machinery that just doesn't follow any
> > natural flow.  And there doesn't seem to be any good reason for it to
> > start with.
> 
> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
> want the ability to set the loff_t in userspace for an appending write,
> so the plan was to replace the ki_complete member in kiocb with an
> loff_t __user *ki_posp.
> 
> I don't think it's worth worrying about growing kiocb, personally,
> but this seemed like the easiest way to make room for a new pointer.

The user offset pointer has absolutely no business in the the kiocb
itself - it is a io_uring concept which needs to go into the io_kiocb,
which has 14 bytes left in the last cache line in my build.  It would
fit in very well there right next to the result and user pointer.
Matthew Wilcox (Oracle) July 9, 2020, 1:32 p.m. UTC | #5
On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote:
> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
> > > I really don't like this series at all.  If saves a single pointer
> > > but introduces a complicated machinery that just doesn't follow any
> > > natural flow.  And there doesn't seem to be any good reason for it to
> > > start with.
> > 
> > Jens doesn't want the kiocb to grow beyond a single cacheline, and we
> > want the ability to set the loff_t in userspace for an appending write,
> > so the plan was to replace the ki_complete member in kiocb with an
> > loff_t __user *ki_posp.
> > 
> > I don't think it's worth worrying about growing kiocb, personally,
> > but this seemed like the easiest way to make room for a new pointer.
> 
> The user offset pointer has absolutely no business in the the kiocb
> itself - it is a io_uring concept which needs to go into the io_kiocb,
> which has 14 bytes left in the last cache line in my build.  It would
> fit in very well there right next to the result and user pointer.

I agree.  Jens doesn't.
Pavel Begunkov July 9, 2020, 1:37 p.m. UTC | #6
On 09/07/2020 16:26, Christoph Hellwig wrote:
> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
>>> I really don't like this series at all.  If saves a single pointer
>>> but introduces a complicated machinery that just doesn't follow any
>>> natural flow.  And there doesn't seem to be any good reason for it to
>>> start with.
>>
>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
>> want the ability to set the loff_t in userspace for an appending write,
>> so the plan was to replace the ki_complete member in kiocb with an
>> loff_t __user *ki_posp.
>>
>> I don't think it's worth worrying about growing kiocb, personally,
>> but this seemed like the easiest way to make room for a new pointer.
> 
> The user offset pointer has absolutely no business in the the kiocb
> itself - it is a io_uring concept which needs to go into the io_kiocb,
> which has 14 bytes left in the last cache line in my build.  It would
> fit in very well there right next to the result and user pointer.

After getting a valid offset, io_uring shouldn't do anything but
complete the request. And as io_kiocb implicitly contains a CQE entry,
not sure we need @append_offset in the first place.

Kanchan, could you take a look if you can hide it in req->cflags?
Matthew Wilcox (Oracle) July 9, 2020, 1:43 p.m. UTC | #7
On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote:
> On 09/07/2020 16:26, Christoph Hellwig wrote:
> > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
> >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
> >>> I really don't like this series at all.  If saves a single pointer
> >>> but introduces a complicated machinery that just doesn't follow any
> >>> natural flow.  And there doesn't seem to be any good reason for it to
> >>> start with.
> >>
> >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
> >> want the ability to set the loff_t in userspace for an appending write,
> >> so the plan was to replace the ki_complete member in kiocb with an
> >> loff_t __user *ki_posp.
> >>
> >> I don't think it's worth worrying about growing kiocb, personally,
> >> but this seemed like the easiest way to make room for a new pointer.
> > 
> > The user offset pointer has absolutely no business in the the kiocb
> > itself - it is a io_uring concept which needs to go into the io_kiocb,
> > which has 14 bytes left in the last cache line in my build.  It would
> > fit in very well there right next to the result and user pointer.
> 
> After getting a valid offset, io_uring shouldn't do anything but
> complete the request. And as io_kiocb implicitly contains a CQE entry,
> not sure we need @append_offset in the first place.
> 
> Kanchan, could you take a look if you can hide it in req->cflags?

No, that's not what cflags are for.  And besides, there's only 32 bits
there.
Pavel Begunkov July 9, 2020, 1:49 p.m. UTC | #8
On 09/07/2020 16:43, Matthew Wilcox wrote:
> On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote:
>> On 09/07/2020 16:26, Christoph Hellwig wrote:
>>> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
>>>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
>>>>> I really don't like this series at all.  If saves a single pointer
>>>>> but introduces a complicated machinery that just doesn't follow any
>>>>> natural flow.  And there doesn't seem to be any good reason for it to
>>>>> start with.
>>>>
>>>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
>>>> want the ability to set the loff_t in userspace for an appending write,
>>>> so the plan was to replace the ki_complete member in kiocb with an
>>>> loff_t __user *ki_posp.
>>>>
>>>> I don't think it's worth worrying about growing kiocb, personally,
>>>> but this seemed like the easiest way to make room for a new pointer.
>>>
>>> The user offset pointer has absolutely no business in the the kiocb
>>> itself - it is a io_uring concept which needs to go into the io_kiocb,
>>> which has 14 bytes left in the last cache line in my build.  It would
>>> fit in very well there right next to the result and user pointer.
>>
>> After getting a valid offset, io_uring shouldn't do anything but
>> complete the request. And as io_kiocb implicitly contains a CQE entry,
>> not sure we need @append_offset in the first place.
>>
>> Kanchan, could you take a look if you can hide it in req->cflags?
> 
> No, that's not what cflags are for.  And besides, there's only 32 bits
> there.

It's there to temporarily store cqe->cflags, if a request can't completed
right away. And req->{result,user_data,cflags} are basically an CQE inside
io_kiocb.

So, it is there exactly for that reason, and whatever way it's going to be
encoded in an CQE, io_kiocb can fit it. That was my point.
Jens Axboe July 9, 2020, 1:53 p.m. UTC | #9
On 7/9/20 7:32 AM, Matthew Wilcox wrote:
> On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote:
>> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
>>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
>>>> I really don't like this series at all.  If saves a single pointer
>>>> but introduces a complicated machinery that just doesn't follow any
>>>> natural flow.  And there doesn't seem to be any good reason for it to
>>>> start with.
>>>
>>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
>>> want the ability to set the loff_t in userspace for an appending write,
>>> so the plan was to replace the ki_complete member in kiocb with an
>>> loff_t __user *ki_posp.
>>>
>>> I don't think it's worth worrying about growing kiocb, personally,
>>> but this seemed like the easiest way to make room for a new pointer.
>>
>> The user offset pointer has absolutely no business in the the kiocb
>> itself - it is a io_uring concept which needs to go into the io_kiocb,
>> which has 14 bytes left in the last cache line in my build.  It would
>> fit in very well there right next to the result and user pointer.
> 
> I agree.  Jens doesn't.

Stop putting words in my mouth, especially when they are totally untrue.

I was opposed to growing struct io_rw in io_uring, which is where the
extra append variable belonds, beyond a cacheline. You mentioned you
could probably shave some bits out of struct kiocb, which is how this
completion handling business came about. If kiocb was shrunk, then io_rw
has room for the needed variable.

At no point have I said that whatever we need to shove in there for
io_uring should be in the kiocb, that would not make any sense. I'm
just opposed to growing the per-op data field in io_kiocb beyond
a cacheline. And that's especially true for something like append
writes, which I don't consider super interesting.
Matthew Wilcox (Oracle) July 9, 2020, 1:53 p.m. UTC | #10
On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote:
> On 09/07/2020 16:43, Matthew Wilcox wrote:
> > On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote:
> >> Kanchan, could you take a look if you can hide it in req->cflags?
> > 
> > No, that's not what cflags are for.  And besides, there's only 32 bits
> > there.
> 
> It's there to temporarily store cqe->cflags, if a request can't completed
> right away. And req->{result,user_data,cflags} are basically an CQE inside
> io_kiocb.
> 
> So, it is there exactly for that reason, and whatever way it's going to be
> encoded in an CQE, io_kiocb can fit it. That was my point.

But it's not going to be encoded in the CQE.  Perhaps you should go back to
the older thread and read the arguments there.
Jens Axboe July 9, 2020, 1:55 p.m. UTC | #11
On 7/9/20 7:26 AM, Christoph Hellwig wrote:
> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote:
>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote:
>>> I really don't like this series at all.  If saves a single pointer
>>> but introduces a complicated machinery that just doesn't follow any
>>> natural flow.  And there doesn't seem to be any good reason for it to
>>> start with.
>>
>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we
>> want the ability to set the loff_t in userspace for an appending write,
>> so the plan was to replace the ki_complete member in kiocb with an
>> loff_t __user *ki_posp.
>>
>> I don't think it's worth worrying about growing kiocb, personally,
>> but this seemed like the easiest way to make room for a new pointer.
> 
> The user offset pointer has absolutely no business in the the kiocb
> itself - it is a io_uring concept which needs to go into the io_kiocb,

Nobody disagrees on that.

> which has 14 bytes left in the last cache line in my build.  It would
> fit in very well there right next to the result and user pointer.

Per-op data should not spill into the io_kiocb itself. And I absolutely
hate arguments like "oh there's still 14 bytes in there", because then
there's 6, then there's none, and now we're going into the next
cacheline. io_kiocb is already too fat, it should be getting slimmer,
not bigger. And the append write stuff is not nearly interesting
enough to a) grow io_kiocb, b) warrant a special case for op private
data in the io_kiocb itself.
Pavel Begunkov July 9, 2020, 1:59 p.m. UTC | #12
On 09/07/2020 16:53, Matthew Wilcox wrote:
> On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote:
>> On 09/07/2020 16:43, Matthew Wilcox wrote:
>>> On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote:
>>>> Kanchan, could you take a look if you can hide it in req->cflags?
>>>
>>> No, that's not what cflags are for.  And besides, there's only 32 bits
>>> there.
>>
>> It's there to temporarily store cqe->cflags, if a request can't completed
>> right away. And req->{result,user_data,cflags} are basically an CQE inside
>> io_kiocb.
>>
>> So, it is there exactly for that reason, and whatever way it's going to be
>> encoded in an CQE, io_kiocb can fit it. That was my point.
> 
> But it's not going to be encoded in the CQE.  Perhaps you should go back to
> the older thread and read the arguments there.

Ok, if the thread is stopped on the version with indirection. I just looked the
last sent patch.

If so, we can also store offset before adding an CQE and getting by with local
vars only. Just a thought.