Message ID | 20190411150657.18480-1-axboe@kernel.dk (mailing list archive) |
---|---|
Series | io_uring: add sync_file_range and drains |
On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
> In continuation of the fsync barrier patch from the other day, I
> reworked that patch to turn it into a general primitive instead. This
> means that any command can be flagged with IOSQE_IO_DRAIN, which will
> insert a sequence point in the queue. If a request is marked with
> IOSQE_IO_DRAIN, then previous commands must complete before this one
> is issued. Subsequent requests are not started until the drain has
> completed. The latter is a necessity since we track this through the
> CQ index. If we allow later commands, then they could complete before
> earlier commands and we'd mistakenly think that we have satisfied the
> sequence point.

That's potentially going to cause quite the bubble in the pipeline of
commands being sent.

Do consumers know which writes they are going to want to fence?  We could
do something like tag each command with a stream ID and then fence a
particular stream.  We'd need one nr_pending counter per stream, but
that should be pretty cheap.
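[For reference, a minimal sketch of how a submitter might use the drain flag described in the cover letter above: two writes, an fsync flagged with IOSQE_IO_DRAIN as the sequence point, then a write that must wait for the drain. This assumes liburing and a kernel carrying this patch; the file name, queue depth, buffer sizes and lack of error handling are placeholders, not part of the posted series.]

```c
/*
 * Sketch only: IOSQE_IO_DRAIN as a sequence point in one ring.
 * Assumes liburing and a kernel with the drain patch; error
 * handling omitted for brevity.
 */
#include <fcntl.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	static char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd, i;

	io_uring_queue_init(8, &ring, 0);
	fd = open("testfile", O_WRONLY | O_CREAT, 0644);

	/* Two ordinary writes; these may complete in any order. */
	for (i = 0; i < 2; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_writev(sqe, fd, &iov, 1, i * sizeof(buf));
	}

	/*
	 * Sequence point: this fsync is not issued until the writes
	 * queued above have completed.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_fsync(sqe, fd, 0);
	sqe->flags |= IOSQE_IO_DRAIN;

	/* Not started until the drained fsync has completed. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_writev(sqe, fd, &iov, 1, 2 * sizeof(buf));

	io_uring_submit(&ring);

	/* Reap the four completions. */
	for (i = 0; i < 4; i++) {
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}
```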
On 4/11/19 9:16 AM, Matthew Wilcox wrote:
> On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
>> In continuation of the fsync barrier patch from the other day, I
>> reworked that patch to turn it into a general primitive instead. This
>> means that any command can be flagged with IOSQE_IO_DRAIN, which will
>> insert a sequence point in the queue. If a request is marked with
>> IOSQE_IO_DRAIN, then previous commands must complete before this one
>> is issued. Subsequent requests are not started until the drain has
>> completed. The latter is a necessity since we track this through the
>> CQ index. If we allow later commands, then they could complete before
>> earlier commands and we'd mistakenly think that we have satisfied the
>> sequence point.
>
> That's potentially going to cause quite the bubble in the pipeline of
> commands being sent.

Definitely.

> Do consumers know which writes they are going to want to fence?  We could
> do something like tag each command with a stream ID and then fence a
> particular stream.  We'd need one nr_pending counter per stream, but
> that should be pretty cheap.

Or you could just split your streams between io_urings. That has other
overhead of course in terms of resources, but it'd avoid having to do any
extra accounting on the kernel side. A pending counter is not necessarily
cheap, though it'd be acceptable if we required writes that you want to
fence to be tagged (hence it wouldn't happen for "normal" IO).
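[A rough sketch of the ring-per-stream alternative Jens describes, again assuming liburing; the submit_two_streams() helper, file descriptors and offsets are illustrative only. A drained fsync on ring_a fences only stream A, while ring_b keeps issuing independently.]

```c
/*
 * Sketch only: one io_uring per stream, so a drain on ring_a does
 * not stall submissions on ring_b.  Assumes liburing; error handling
 * and completion reaping omitted for brevity.
 */
#include <sys/types.h>
#include <sys/uio.h>
#include <liburing.h>

/* Hypothetical helper: queue a single writev on the given ring. */
static void queue_write(struct io_uring *ring, int fd, struct iovec *iov,
			off_t off)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_writev(sqe, fd, iov, 1, off);
}

void submit_two_streams(int fd_a, int fd_b, struct iovec *iov)
{
	struct io_uring ring_a, ring_b;
	struct io_uring_sqe *sqe;

	io_uring_queue_init(8, &ring_a, 0);
	io_uring_queue_init(8, &ring_b, 0);

	/* Stream A: one write, then a drained fsync acting as a fence. */
	queue_write(&ring_a, fd_a, iov, 0);
	sqe = io_uring_get_sqe(&ring_a);
	io_uring_prep_fsync(sqe, fd_a, 0);
	sqe->flags |= IOSQE_IO_DRAIN;

	/* Stream B: independent writes, not stalled by ring_a's fence. */
	queue_write(&ring_b, fd_b, iov, 0);
	queue_write(&ring_b, fd_b, iov, 4096);

	io_uring_submit(&ring_a);
	io_uring_submit(&ring_b);
	/* completion handling and io_uring_queue_exit() omitted */
}
```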
On 11 Apr 2019, at 11:16, Matthew Wilcox wrote:
> On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
>> In continuation of the fsync barrier patch from the other day, I
>> reworked that patch to turn it into a general primitive instead. This
>> means that any command can be flagged with IOSQE_IO_DRAIN, which will
>> insert a sequence point in the queue. If a request is marked with
>> IOSQE_IO_DRAIN, then previous commands must complete before this one
>> is issued. Subsequent requests are not started until the drain has
>> completed. The latter is a necessity since we track this through the
>> CQ index. If we allow later commands, then they could complete before
>> earlier commands and we'd mistakenly think that we have satisfied the
>> sequence point.
>
> That's potentially going to cause quite the bubble in the pipeline of
> commands being sent.
>
> Do consumers know which writes they are going to want to fence?  We
> could
> do something like tag each command with a stream ID and then fence a
> particular stream.  We'd need one nr_pending counter per stream, but
> that should be pretty cheap.

It'll be a bubble, but without the drain command, io_uring users would
still have the same bubble while they wait for IO in order to enforce the
ordering themselves.

I prefer Jens' suggestion to limit the drain's impact with multiple
io_urings instead of adding a stream id. I don't have a really solid
reason for this, but I'd hesitate to add complexity before we have more
data from users.

-chris