Message ID | 20190411150657.18480-1-axboe@kernel.dk (mailing list archive) |
---|---|
Series | io_uring: add sync_file_range and drains |
On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
> In continuation of the fsync barrier patch from the other day, I
> reworked that patch to turn it into a general primitive instead. This
> means that any command can be flagged with IOSQE_IO_DRAIN, which will
> insert a sequence point in the queue. If a request is marked with
> IOSQE_IO_DRAIN, then previous commands must complete before this one
> is issued. Subsequent requests are not started until the drain has
> completed. The latter is a necessity since we track this through the
> CQ index. If we allow later commands, then they could complete before
> earlier commands and we'd mistakenly think that we have satisfied the
> sequence point.

That's potentially going to cause quite the bubble in the pipeline of
commands being sent.

Do consumers know which writes they are going to want to fence?  We could
do something like tag each command with a stream ID and then fence a
particular stream.  We'd need one nr_pending counter per stream, but
that should be pretty cheap.
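[For reference, a minimal sketch of how a submitter might use the drain flag described in the cover letter above: two writes, an fsync flagged with IOSQE_IO_DRAIN as the sequence point, then a write that must wait for the drain. This assumes liburing and a kernel carrying this patch; the file name, queue depth, buffer sizes and lack of error handling are placeholders, not part of the posted series.]

```c
/*
 * Sketch only: IOSQE_IO_DRAIN as a sequence point in one ring.
 * Assumes liburing and a kernel with the drain patch; error
 * handling omitted for brevity.
 */
#include <fcntl.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	static char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd, i;

	io_uring_queue_init(8, &ring, 0);
	fd = open("testfile", O_WRONLY | O_CREAT, 0644);

	/* Two ordinary writes; these may complete in any order. */
	for (i = 0; i < 2; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_writev(sqe, fd, &iov, 1, i * sizeof(buf));
	}

	/*
	 * Sequence point: this fsync is not issued until the writes
	 * queued above have completed.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_fsync(sqe, fd, 0);
	sqe->flags |= IOSQE_IO_DRAIN;

	/* Not started until the drained fsync has completed. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_writev(sqe, fd, &iov, 1, 2 * sizeof(buf));

	io_uring_submit(&ring);

	/* Reap the four completions. */
	for (i = 0; i < 4; i++) {
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}
```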
On 4/11/19 9:16 AM, Matthew Wilcox wrote:
> On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
>> In continuation of the fsync barrier patch from the other day, I
>> reworked that patch to turn it into a general primitive instead. This
>> means that any command can be flagged with IOSQE_IO_DRAIN, which will
>> insert a sequence point in the queue. If a request is marked with
>> IOSQE_IO_DRAIN, then previous commands must complete before this one
>> is issued. Subsequent requests are not started until the drain has
>> completed. The latter is a necessity since we track this through the
>> CQ index. If we allow later commands, then they could complete before
>> earlier commands and we'd mistakenly think that we have satisfied the
>> sequence point.
>
> That's potentially going to cause quite the bubble in the pipeline of
> commands being sent.

Definitely.

> Do consumers know which writes they are going to want to fence?  We could
> do something like tag each command with a stream ID and then fence a
> particular stream.  We'd need one nr_pending counter per stream, but
> that should be pretty cheap.

Or you could just split your streams between io_urings. That has other
overhead of course in terms of resources, but it'd avoid having to do any
extra accounting on the kernel side. A pending counter is not necessarily
cheap, though it'd be acceptable if we required writes that you want to
fence to be tagged (hence it wouldn't happen for "normal" IO).
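[A rough sketch of the ring-per-stream alternative Jens describes, again assuming liburing; the submit_two_streams() helper, file descriptors and offsets are illustrative only. A drained fsync on ring_a fences only stream A, while ring_b keeps issuing independently.]

```c
/*
 * Sketch only: one io_uring per stream, so a drain on ring_a does
 * not stall submissions on ring_b.  Assumes liburing; error handling
 * and completion reaping omitted for brevity.
 */
#include <sys/types.h>
#include <sys/uio.h>
#include <liburing.h>

/* Hypothetical helper: queue a single writev on the given ring. */
static void queue_write(struct io_uring *ring, int fd, struct iovec *iov,
			off_t off)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_writev(sqe, fd, iov, 1, off);
}

void submit_two_streams(int fd_a, int fd_b, struct iovec *iov)
{
	struct io_uring ring_a, ring_b;
	struct io_uring_sqe *sqe;

	io_uring_queue_init(8, &ring_a, 0);
	io_uring_queue_init(8, &ring_b, 0);

	/* Stream A: one write, then a drained fsync acting as a fence. */
	queue_write(&ring_a, fd_a, iov, 0);
	sqe = io_uring_get_sqe(&ring_a);
	io_uring_prep_fsync(sqe, fd_a, 0);
	sqe->flags |= IOSQE_IO_DRAIN;

	/* Stream B: independent writes, not stalled by ring_a's fence. */
	queue_write(&ring_b, fd_b, iov, 0);
	queue_write(&ring_b, fd_b, iov, 4096);

	io_uring_submit(&ring_a);
	io_uring_submit(&ring_b);
	/* completion handling and io_uring_queue_exit() omitted */
}
```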
On 11 Apr 2019, at 11:16, Matthew Wilcox wrote:
> On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
>> In continuation of the fsync barrier patch from the other day, I
>> reworked that patch to turn it into a general primitive instead. This
>> means that any command can be flagged with IOSQE_IO_DRAIN, which will
>> insert a sequence point in the queue. If a request is marked with
>> IOSQE_IO_DRAIN, then previous commands must complete before this one
>> is issued. Subsequent requests are not started until the drain has
>> completed. The latter is a necessity since we track this through the
>> CQ index. If we allow later commands, then they could complete before
>> earlier commands and we'd mistakenly think that we have satisfied the
>> sequence point.
>
> That's potentially going to cause quite the bubble in the pipeline of
> commands being sent.
>
> Do consumers know which writes they are going to want to fence?  We
> could
> do something like tag each command with a stream ID and then fence a
> particular stream.  We'd need one nr_pending counter per stream, but
> that should be pretty cheap.

It'll be a bubble, but without the drain command, io_uring users would
still have the same bubble while they wait for IO in order to enforce the
ordering themselves.

I prefer Jens' suggestion to limit the drain's impact with multiple
io_urings instead of adding a stream id. I don't have a really solid
reason for this, but I'd hesitate to add complexity before we have more
data from users.

-chris