Message ID | 20200708222637.23046-1-willy@infradead.org (mailing list archive) |
---|---|
Headers | show |
Series | Remove kiocb ki_complete | expand |
On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > Save a pointer in the kiocb by using a few bits in ki_flags to index > a table of completion functions. I ran polled and regular IO testing through io_uring, which exercises both completions that we have in there, and it works just fine for me. Didn't test anything beyond that, outside of ensuring the kernel also booted ;-)
I really don't like this series at all. If saves a single pointer but introduces a complicated machinery that just doesn't follow any natural flow. And there doesn't seem to be any good reason for it to start with.
On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > I really don't like this series at all. If saves a single pointer > but introduces a complicated machinery that just doesn't follow any > natural flow. And there doesn't seem to be any good reason for it to > start with. Jens doesn't want the kiocb to grow beyond a single cacheline, and we want the ability to set the loff_t in userspace for an appending write, so the plan was to replace the ki_complete member in kiocb with an loff_t __user *ki_posp. I don't think it's worth worrying about growing kiocb, personally, but this seemed like the easiest way to make room for a new pointer.
On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > > I really don't like this series at all. If saves a single pointer > > but introduces a complicated machinery that just doesn't follow any > > natural flow. And there doesn't seem to be any good reason for it to > > start with. > > Jens doesn't want the kiocb to grow beyond a single cacheline, and we > want the ability to set the loff_t in userspace for an appending write, > so the plan was to replace the ki_complete member in kiocb with an > loff_t __user *ki_posp. > > I don't think it's worth worrying about growing kiocb, personally, > but this seemed like the easiest way to make room for a new pointer. The user offset pointer has absolutely no business in the the kiocb itself - it is a io_uring concept which needs to go into the io_kiocb, which has 14 bytes left in the last cache line in my build. It would fit in very well there right next to the result and user pointer.
On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > > On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > > > I really don't like this series at all. If saves a single pointer > > > but introduces a complicated machinery that just doesn't follow any > > > natural flow. And there doesn't seem to be any good reason for it to > > > start with. > > > > Jens doesn't want the kiocb to grow beyond a single cacheline, and we > > want the ability to set the loff_t in userspace for an appending write, > > so the plan was to replace the ki_complete member in kiocb with an > > loff_t __user *ki_posp. > > > > I don't think it's worth worrying about growing kiocb, personally, > > but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. I agree. Jens doesn't.
On 09/07/2020 16:26, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>> I really don't like this series at all. If saves a single pointer >>> but introduces a complicated machinery that just doesn't follow any >>> natural flow. And there doesn't seem to be any good reason for it to >>> start with. >> >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >> want the ability to set the loff_t in userspace for an appending write, >> so the plan was to replace the ki_complete member in kiocb with an >> loff_t __user *ki_posp. >> >> I don't think it's worth worrying about growing kiocb, personally, >> but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. After getting a valid offset, io_uring shouldn't do anything but complete the request. And as io_kiocb implicitly contains a CQE entry, not sure we need @append_offset in the first place. Kanchan, could you take a look if you can hide it in req->cflags?
On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: > On 09/07/2020 16:26, Christoph Hellwig wrote: > > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > >>> I really don't like this series at all. If saves a single pointer > >>> but introduces a complicated machinery that just doesn't follow any > >>> natural flow. And there doesn't seem to be any good reason for it to > >>> start with. > >> > >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we > >> want the ability to set the loff_t in userspace for an appending write, > >> so the plan was to replace the ki_complete member in kiocb with an > >> loff_t __user *ki_posp. > >> > >> I don't think it's worth worrying about growing kiocb, personally, > >> but this seemed like the easiest way to make room for a new pointer. > > > > The user offset pointer has absolutely no business in the the kiocb > > itself - it is a io_uring concept which needs to go into the io_kiocb, > > which has 14 bytes left in the last cache line in my build. It would > > fit in very well there right next to the result and user pointer. > > After getting a valid offset, io_uring shouldn't do anything but > complete the request. And as io_kiocb implicitly contains a CQE entry, > not sure we need @append_offset in the first place. > > Kanchan, could you take a look if you can hide it in req->cflags? No, that's not what cflags are for. And besides, there's only 32 bits there.
On 09/07/2020 16:43, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: >> On 09/07/2020 16:26, Christoph Hellwig wrote: >>> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >>>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>>>> I really don't like this series at all. If saves a single pointer >>>>> but introduces a complicated machinery that just doesn't follow any >>>>> natural flow. And there doesn't seem to be any good reason for it to >>>>> start with. >>>> >>>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >>>> want the ability to set the loff_t in userspace for an appending write, >>>> so the plan was to replace the ki_complete member in kiocb with an >>>> loff_t __user *ki_posp. >>>> >>>> I don't think it's worth worrying about growing kiocb, personally, >>>> but this seemed like the easiest way to make room for a new pointer. >>> >>> The user offset pointer has absolutely no business in the the kiocb >>> itself - it is a io_uring concept which needs to go into the io_kiocb, >>> which has 14 bytes left in the last cache line in my build. It would >>> fit in very well there right next to the result and user pointer. >> >> After getting a valid offset, io_uring shouldn't do anything but >> complete the request. And as io_kiocb implicitly contains a CQE entry, >> not sure we need @append_offset in the first place. >> >> Kanchan, could you take a look if you can hide it in req->cflags? > > No, that's not what cflags are for. And besides, there's only 32 bits > there. It's there to temporarily store cqe->cflags, if a request can't completed right away. And req->{result,user_data,cflags} are basically an CQE inside io_kiocb. So, it is there exactly for that reason, and whatever way it's going to be encoded in an CQE, io_kiocb can fit it. That was my point.
On 7/9/20 7:32 AM, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote: >> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>>> I really don't like this series at all. If saves a single pointer >>>> but introduces a complicated machinery that just doesn't follow any >>>> natural flow. And there doesn't seem to be any good reason for it to >>>> start with. >>> >>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >>> want the ability to set the loff_t in userspace for an appending write, >>> so the plan was to replace the ki_complete member in kiocb with an >>> loff_t __user *ki_posp. >>> >>> I don't think it's worth worrying about growing kiocb, personally, >>> but this seemed like the easiest way to make room for a new pointer. >> >> The user offset pointer has absolutely no business in the the kiocb >> itself - it is a io_uring concept which needs to go into the io_kiocb, >> which has 14 bytes left in the last cache line in my build. It would >> fit in very well there right next to the result and user pointer. > > I agree. Jens doesn't. Stop putting words in my mouth, especially when they are totally untrue. I was opposed to growing struct io_rw in io_uring, which is where the extra append variable belonds, beyond a cacheline. You mentioned you could probably shave some bits out of struct kiocb, which is how this completion handling business came about. If kiocb was shrunk, then io_rw has room for the needed variable. At no point have I said that whatever we need to shove in there for io_uring should be in the kiocb, that would not make any sense. I'm just opposed to growing the per-op data field in io_kiocb beyond a cacheline. And that's especially true for something like append writes, which I don't consider super interesting.
On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote: > On 09/07/2020 16:43, Matthew Wilcox wrote: > > On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: > >> Kanchan, could you take a look if you can hide it in req->cflags? > > > > No, that's not what cflags are for. And besides, there's only 32 bits > > there. > > It's there to temporarily store cqe->cflags, if a request can't completed > right away. And req->{result,user_data,cflags} are basically an CQE inside > io_kiocb. > > So, it is there exactly for that reason, and whatever way it's going to be > encoded in an CQE, io_kiocb can fit it. That was my point. But it's not going to be encoded in the CQE. Perhaps you should go back to the older thread and read the arguments there.
On 7/9/20 7:26 AM, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>> I really don't like this series at all. If saves a single pointer >>> but introduces a complicated machinery that just doesn't follow any >>> natural flow. And there doesn't seem to be any good reason for it to >>> start with. >> >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >> want the ability to set the loff_t in userspace for an appending write, >> so the plan was to replace the ki_complete member in kiocb with an >> loff_t __user *ki_posp. >> >> I don't think it's worth worrying about growing kiocb, personally, >> but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, Nobody disagrees on that. > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. Per-op data should not spill into the io_kiocb itself. And I absolutely hate arguments like "oh there's still 14 bytes in there", because then there's 6, then there's none, and now we're going into the next cacheline. io_kiocb is already too fat, it should be getting slimmer, not bigger. And the append write stuff is not nearly interesting enough to a) grow io_kiocb, b) warrant a special case for op private data in the io_kiocb itself.
On 09/07/2020 16:53, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote: >> On 09/07/2020 16:43, Matthew Wilcox wrote: >>> On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: >>>> Kanchan, could you take a look if you can hide it in req->cflags? >>> >>> No, that's not what cflags are for. And besides, there's only 32 bits >>> there. >> >> It's there to temporarily store cqe->cflags, if a request can't completed >> right away. And req->{result,user_data,cflags} are basically an CQE inside >> io_kiocb. >> >> So, it is there exactly for that reason, and whatever way it's going to be >> encoded in an CQE, io_kiocb can fit it. That was my point. > > But it's not going to be encoded in the CQE. Perhaps you should go back to > the older thread and read the arguments there. Ok, if the thread is stopped on the version with indirection. I just looked the last sent patch. If so, we can also store offset before adding an CQE and getting by with local vars only. Just a thought.