[v7,11/11] zonefs: use REQ_OP_ZONE_APPEND for sync DIO

Series

Introduce Zone Append for writing to zoned block devices

[v7,00/11] Introduce Zone Append for writing to zoned block devices
[v7,01/11] scsi: free sgtables in case command setup fails
[v7,02/11] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no
[v7,03/11] block: rename __bio_add_pc_page to bio_add_hw_page
[v7,04/11] block: Introduce REQ_OP_ZONE_APPEND
[v7,05/11] block: introduce blk_req_zone_write_trylock
[v7,06/11] block: Modify revalidate zones
[v7,07/11] scsi: sd_zbc: factor out sanity checks for zoned commands
[v7,08/11] scsi: sd_zbc: emulate ZONE_APPEND commands
[v7,09/11] null_blk: Support REQ_OP_ZONE_APPEND
[v7,10/11] block: export bio_release_pages and bio_iov_iter_get_pages
[v7,11/11] zonefs: use REQ_OP_ZONE_APPEND for sync DIO

Message ID

20200417121536.5393-12-johannes.thumshirn@wdc.com (mailing list archive)

State

New, archived

Headers

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>,
        linux-block <linux-block@vger.kernel.org>,
        Damien Le Moal <Damien.LeMoal@wdc.com>,
        Keith Busch <kbusch@kernel.org>,
        "linux-scsi @ vger . kernel . org" <linux-scsi@vger.kernel.org>,
        "Martin K . Petersen" <martin.petersen@oracle.com>,
        "linux-fsdevel @ vger . kernel . org" <linux-fsdevel@vger.kernel.org>,
        Daniel Wagner <dwagner@suse.de>,
        Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH v7 11/11] zonefs: use REQ_OP_ZONE_APPEND for sync DIO
Date: Fri, 17 Apr 2020 21:15:36 +0900
Message-Id: <20200417121536.5393-12-johannes.thumshirn@wdc.com>
In-Reply-To: <20200417121536.5393-1-johannes.thumshirn@wdc.com>
References: <20200417121536.5393-1-johannes.thumshirn@wdc.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk

Commit Message

Johannes Thumshirn April 17, 2020, 12:15 p.m. UTC

Synchronous direct I/O to a sequential write only zone can be issued using
the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple
BIOs can potentially result in reordering, we cannot support asynchronous
IO via this interface.

We also can only dispatch up to queue_max_zone_append_sectors() via the
new zone-append method and have to return a short write back to user-space
in case an IO larger than queue_max_zone_append_sectors() has been issued.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/zonefs/super.c | 80 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 8 deletions(-)
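
Seen from user space, the short-write behaviour described above means that one write() call on a sequential zone file may consume only part of the buffer. A minimal sketch of a caller-side retry loop follows, assuming a plain blocking write(2). `write_all` is a hypothetical helper name and not part of zonefs or this series, and any buffer alignment the file requires is left to the caller:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write(2) repeatedly until len bytes have been
 * written or an error occurs. Any single call may be short, for example
 * when the request is larger than one zone append can carry.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data, retry */
			return -1;
		}
		if (n == 0)
			break;		/* no progress, stop instead of spinning */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Because each successful write advances the file position, the next iteration continues at the new end of the written data, which is what an append-only file expects.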

Comments

Bart Van Assche April 18, 2020, 9:45 p.m. UTC | #1

On 4/17/20 5:15 AM, Johannes Thumshirn wrote:
> Synchronous direct I/O to a sequential write only zone can be issued using
> the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple
> BIOs can potentially result in reordering, we cannot support asynchronous
> IO via this interface.
> 
> We also can only dispatch up to queue_max_zone_append_sectors() via the
> new zone-append method and have to return a short write back to user-space
> in case an IO larger than queue_max_zone_append_sectors() has been issued.

Is this patch the only patch that adds a user space interface through 
which REQ_OP_ZONE_APPEND operations can be submitted? Has it been 
considered to make it possible to submit REQ_OP_ZONE_APPEND operations 
through the asynchronous I/O mechanism?

Thanks,

Bart.

Damien Le Moal April 20, 2020, 12:36 a.m. UTC | #2

On 2020/04/19 6:45, Bart Van Assche wrote:
> On 4/17/20 5:15 AM, Johannes Thumshirn wrote:
>> Synchronous direct I/O to a sequential write only zone can be issued using
>> the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple
>> BIOs can potentially result in reordering, we cannot support asynchronous
>> IO via this interface.
>>
>> We also can only dispatch up to queue_max_zone_append_sectors() via the
>> new zone-append method and have to return a short write back to user-space
>> in case an IO larger than queue_max_zone_append_sectors() has been issued.
> 
> Is this patch the only patch that adds a user space interface through 
> which REQ_OP_ZONE_APPEND operations can be submitted? Has it been 
> considered to make it possible to submit REQ_OP_ZONE_APPEND operations 
> through the asynchronous I/O mechanism?

Yes, we have looked into it, and we do have some hack-ish code working for that.
For the initial feature post though, we left that part out, both to facilitate
reviews and because more work is needed to cleanly handle zone append in the
aio code (that code is written assuming that BIOs can always be split and so
never returns short writes).

> 
> Thanks,
> 
> Bart.
>

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3ce9829a6936..0bf7009f50a2 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -20,6 +20,7 @@ 
 #include <linux/mman.h>
 #include <linux/sched/mm.h>
 #include <linux/crc32.h>
+#include <linux/task_io_accounting_ops.h>
 
 #include "zonefs.h"
 
@@ -596,6 +597,61 @@  static const struct iomap_dio_ops zonefs_write_dio_ops = {
 	.end_io			= zonefs_file_write_dio_end_io,
 };
 
+static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	struct block_device *bdev = inode->i_sb->s_bdev;
+	unsigned int max;
+	struct bio *bio;
+	ssize_t size;
+	int nr_pages;
+	ssize_t ret;
+
+	nr_pages = iov_iter_npages(from, BIO_MAX_PAGES);
+	if (!nr_pages)
+		return 0;
+
+	max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
+	max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
+	iov_iter_truncate(from, max);
+
+	bio = bio_alloc_bioset(GFP_NOFS, nr_pages, &fs_bio_set);
+	if (!bio)
+		return -ENOMEM;
+
+	bio_set_dev(bio, bdev);
+	bio->bi_iter.bi_sector = zi->i_zsector;
+	bio->bi_write_hint = iocb->ki_hint;
+	bio->bi_ioprio = iocb->ki_ioprio;
+	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	if (iocb->ki_flags & IOCB_DSYNC)
+		bio->bi_opf |= REQ_FUA;
+
+	ret = bio_iov_iter_get_pages(bio, from);
+	if (unlikely(ret)) {
+		bio_io_error(bio);
+		return ret;
+	}
+	size = bio->bi_iter.bi_size;
+	task_io_account_write(size);
+
+	if (iocb->ki_flags & IOCB_HIPRI)
+		bio_set_polled(bio, iocb);
+
+	ret = submit_bio_wait(bio);
+
+	bio_put(bio);
+
+	zonefs_file_write_dio_end_io(iocb, size, ret, 0);
+	if (ret >= 0) {
+		iocb->ki_pos += size;
+		return size;
+	}
+
+	return ret;
+}
+
 /*
  * Handle direct writes. For sequential zone files, this is the only possible
  * write path. For these files, check that the user is issuing writes
@@ -611,6 +667,8 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct zonefs_inode_info *zi = ZONEFS_I(inode);
 	struct super_block *sb = inode->i_sb;
+	bool sync = is_sync_kiocb(iocb);
+	bool append = false;
 	size_t count;
 	ssize_t ret;
 
@@ -619,7 +677,7 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	 * as this can cause write reordering (e.g. the first aio gets EAGAIN
 	 * on the inode lock but the second goes through but is now unaligned).
 	 */
-	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) &&
+	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !sync &&
 	    (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
@@ -643,16 +701,22 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	}
 
 	/* Enforce sequential writes (append only) in sequential zones */
-	mutex_lock(&zi->i_truncate_mutex);
-	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && iocb->ki_pos != zi->i_wpoffset) {
+	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
+		mutex_lock(&zi->i_truncate_mutex);
+		if (iocb->ki_pos != zi->i_wpoffset) {
+			mutex_unlock(&zi->i_truncate_mutex);
+			ret = -EINVAL;
+			goto inode_unlock;
+		}
 		mutex_unlock(&zi->i_truncate_mutex);
-		ret = -EINVAL;
-		goto inode_unlock;
+		append = sync;
 	}
-	mutex_unlock(&zi->i_truncate_mutex);
 
-	ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
-			   &zonefs_write_dio_ops, is_sync_kiocb(iocb));
+	if (append)
+		ret = zonefs_file_dio_append(iocb, from);
+	else
+		ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops,
+				   &zonefs_write_dio_ops, sync);
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
 	    (ret > 0 || ret == -EIOCBQUEUED)) {
 		if (ret > 0)
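
The size clamp at the top of zonefs_file_dio_append() converts the queue limit from 512-byte sectors to bytes and rounds it down to a multiple of the file system block size before truncating the iterator. A stand-alone user-space sketch of that arithmetic follows. `append_limit_bytes` and `clamp_request` are hypothetical names, and the two macros are re-defined locally (power-of-two alignment assumed) rather than taken from kernel headers:

```c
#include <stddef.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, matching the kernel's value */
/* Round x down to a multiple of a; a must be a power of two. */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/*
 * Largest byte count a single zone append may carry, in whole blocks.
 * The sketch widens to size_t before shifting to avoid overflow.
 */
static size_t append_limit_bytes(unsigned int max_sectors, size_t blocksize)
{
	return ALIGN_DOWN((size_t)max_sectors << SECTOR_SHIFT, blocksize);
}

/* Bytes accepted for a request of len bytes; the rest is left unwritten. */
static size_t clamp_request(size_t len, unsigned int max_sectors,
			    size_t blocksize)
{
	size_t max = append_limit_bytes(max_sectors, blocksize);

	return len < max ? len : max;
}
```

With an assumed limit of 1023 sectors and 4096-byte blocks, the clamp works out to 520192 bytes, so a 1 MiB request would be cut to that size and the caller would have to issue the remainder in a follow-up write.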
