From patchwork Tue May 12 08:55:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542353 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09B16186E for ; Tue, 12 May 2020 08:56:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3D6C214DB for ; Tue, 12 May 2020 08:56:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Od/t/O43" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729282AbgELI4F (ORCPT ); Tue, 12 May 2020 04:56:05 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16012 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727859AbgELI4F (ORCPT ); Tue, 12 May 2020 04:56:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273765; x=1620809765; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YWXANu2tzSwd2q0J4374R6vpioY6gmSMC2fMgiiszi8=; b=Od/t/O434QtETjOQz1jBQ/RPFHYA5oJfiiB+drW9QTZ4ggKljYntYTC4 qKT7LtSF8B+DEPEqq6krpTRevi3vvH0hSeqwrT5Oof0GUK0ATgdL1gN3Z Y5wTMNvyPvjDshoPFJ0ne2tBO2f2x9BRS6Z/MuseCOTbq5g7rX5hS5f/T yulxJ/1Si92w9uyctIWy52fOZd5jaGhs90eiTzcCz8tqmPLD6pAvFScLo F27RBQ4UWyBaCbBaucne9v/1hKGZYWtEwA2SrG8encR811Af7YZRpzYtY e6jN9TEK3xl+QvyxwgooYhcZT4BdvsdWFA13bcSY8+xckjzwDumCR+cHu A==; IronPort-SDR: ziwTGvV/u4nL3Iuv+zqgvDPrRY+Kq2PoWh2AZE+kzleTQcFRpw4bdrr2rNM73mMoxc7x9JMdQe iTp2Di4l81xJSuW0hV/ecgIIdVjXhNAMqM2njVwyw90jVareQ8i0sS7HAX5MgZJwEDk2qXEczL GCR8C6ki69myke1AnfAicZ5IpIx0NVuk1KF5A7MNhmAjMrmoQ6fNC/z/3YOWUNDDN448mMzGN3 97NMZFqRomhrBhc6rKr0oyOm8amQ/KL6BzfZX2NKOPZprqgWtzpH4hUQTrLhTDIIaIEAOxjwTD aKY= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823527" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:05 +0800 IronPort-SDR: 8FLTNUwhTf7AzU7uDbJM7NiFUDRQAZPUvSXicZWHqXg0p3nL1hL95c/BXFbNdMpTUBBwjHtWWF LFOKLj7nYJ320BnVdP7MjZLaOrWgkOAM4= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:47 -0700 IronPort-SDR: T/Il2f6XwNF00+XvPCJA4OUTvzFgNMT3F50auizYj+VWSaaVXaTigJDyflOVFnum+JmH7kxmlN HOMQYu4wnvWw== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:02 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . 
org" , Johannes Thumshirn , Christoph Hellwig , Bart Van Assche , Hannes Reinecke Subject: [PATCH v11 01/10] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Date: Tue, 12 May 2020 17:55:45 +0900 Message-Id: <20200512085554.26366-2-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org blk_queue_zone_is_seq() and blk_queue_zone_no() have not been called with CONFIG_BLK_DEV_ZONED disabled until now. The introduction of REQ_OP_ZONE_APPEND will change this, so we need to provide noop fallbacks for the !CONFIG_BLK_DEV_ZONED case. Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen --- include/linux/blkdev.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f00bd4042295..91c6e413bf6b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -721,6 +721,16 @@ static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { return 0; } +static inline bool blk_queue_zone_is_seq(struct request_queue *q, + sector_t sector) +{ + return false; +} +static inline unsigned int blk_queue_zone_no(struct request_queue *q, + sector_t sector) +{ + return 0; +} #endif /* CONFIG_BLK_DEV_ZONED */ static inline bool rq_is_sync(struct request *rq) From patchwork Tue May 12 08:55:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542365 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F697912 for ; Tue, 12 May 2020 08:56:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E62E020A8B for ; Tue, 12 May 2020 08:56:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="QZ9Z0lIn" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729364AbgELI4I (ORCPT ); Tue, 12 May 2020 04:56:08 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16012 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729358AbgELI4H (ORCPT ); Tue, 12 May 2020 04:56:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273768; x=1620809768; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=310oifaEsgscOtS7gBBrXefIuiyMnrBkU90sLOB8m90=; b=QZ9Z0lInsLe4gM/Jj4f+AZ2BQhx6nvfVsFkc4doLMa201NOWgr0XPU7O Fg4BWJ5pwLMtTO8iSf+QeBegYpdgnXEvJr62OGFMZsYSzCsOo2k6GAPJx pTiEzaZjUDI8Xj0jHASHwjANy0Ts3LDZTC7mO9hVJwbhsqpHNyagrAFPO lI5kNW+Ss7geel+akiVw4xEAu4e6g2psBv2ONO7CDUmjsvhEZuMykltDd x19A4gNkVdLrsmnjcjHVZryiVEwKIIt67SataG1TeP9tpGIRJR4xEFrjG 6J+aOa9znvaCk4gB+7GR6o2JTGUEpkQ+Tzd1Ss4qLPQpMVWAf5YOU/2uk Q==; IronPort-SDR: T3Ymlyt1GkM5WY9+Q7AO8BBO9e/pNOnRW4mpaqJP8gYej2n6FaKhGgfAsJjeXR+lGPnqsu8qBj bjgMbmv6ILTSDOtHhe0NnvUY+HgJOdCPbGBlnKZvml1QomtjyOfc/uPB19rkNbsKxI7WsaFKHu 
WnBMAcZgdTg50x88brEEU37XZYdaD9wra2m5uHo6UH+8H8JQRhBsnl45oScoF3XDJy87LmfUkV VQsbU3O5UezO++BYHARiB1/IMEI94l6+2c5pAXjQJHqEDGJ0ymTZdaQK/9ErRnI69pA+eK+g2i cXQ= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823535" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:07 +0800 IronPort-SDR: XGmSTRMzjUdb3Dax8Eb9WBhu/X8o01vvXMxFGXq4/18DDvTPN+gAS9Ipbu/D4pOppuJGSanguw fv1PZLkiPYwgcU2Ze8t0ughaJ/dDWHi4o= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:49 -0700 IronPort-SDR: mJhEXX3nQ+FL2UhHRYlr2qlY7kmHpyoWDy5tTwnPxleg79GENqORTexK9ACgF3FRYBjaQ1bzB/ QoQpr1fSgxFw== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:04 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Christoph Hellwig , Johannes Thumshirn , Daniel Wagner , Hannes Reinecke Subject: [PATCH v11 02/10] block: rename __bio_add_pc_page to bio_add_hw_page Date: Tue, 12 May 2020 17:55:46 +0900 Message-Id: <20200512085554.26366-3-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Christoph Hellwig Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a max_sectors argument. This max_sectors argument can be used to specify constraints from the hardware. Signed-off-by: Christoph Hellwig [ jth: rebased and made public for blk-map.c ] Signed-off-by: Johannes Thumshirn Reviewed-by: Daniel Wagner Reviewed-by: Martin K. Petersen Reviewed-by: Hannes Reinecke --- block/bio.c | 65 ++++++++++++++++++++++++++++++------------------- block/blk-map.c | 5 ++-- block/blk.h | 4 +-- 3 files changed, 45 insertions(+), 29 deletions(-) diff --git a/block/bio.c b/block/bio.c index 21cbaa6a1c20..aad0a6dad4f9 100644 --- a/block/bio.c +++ b/block/bio.c @@ -748,9 +748,14 @@ static inline bool page_is_mergeable(const struct bio_vec *bv, return true; } -static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio, - struct page *page, unsigned len, unsigned offset, - bool *same_page) +/* + * Try to merge a page into a segment, while obeying the hardware segment + * size limit. This is not for normal read/write bios, but for passthrough + * or Zone Append operations that we can't split. + */ +static bool bio_try_merge_hw_seg(struct request_queue *q, struct bio *bio, + struct page *page, unsigned len, + unsigned offset, bool *same_page) { struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1]; unsigned long mask = queue_segment_boundary(q); @@ -765,38 +770,32 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio, } /** - * __bio_add_pc_page - attempt to add page to passthrough bio - * @q: the target queue - * @bio: destination bio - * @page: page to add - * @len: vec entry length - * @offset: vec entry offset - * @same_page: return if the merge happen inside the same page - * - * Attempt to add a page to the bio_vec maplist. 
This can fail for a - * number of reasons, such as the bio being full or target block device - * limitations. The target block device must allow bio's up to PAGE_SIZE, - * so it is always possible to add a single page to an empty bio. + * bio_add_hw_page - attempt to add a page to a bio with hw constraints + * @q: the target queue + * @bio: destination bio + * @page: page to add + * @len: vec entry length + * @offset: vec entry offset + * @max_sectors: maximum number of sectors that can be added + * @same_page: return if the segment has been merged inside the same page * - * This should only be used by passthrough bios. + * Add a page to a bio while respecting the hardware max_sectors, max_segment + * and gap limitations. */ -int __bio_add_pc_page(struct request_queue *q, struct bio *bio, +int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, - bool *same_page) + unsigned int max_sectors, bool *same_page) { struct bio_vec *bvec; - /* - * cloned bio must not modify vec list - */ - if (unlikely(bio_flagged(bio, BIO_CLONED))) + if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) return 0; - if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q)) + if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors) return 0; if (bio->bi_vcnt > 0) { - if (bio_try_merge_pc_page(q, bio, page, len, offset, same_page)) + if (bio_try_merge_hw_seg(q, bio, page, len, offset, same_page)) return len; /* @@ -823,11 +822,27 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio, return len; } +/** + * bio_add_pc_page - attempt to add page to passthrough bio + * @q: the target queue + * @bio: destination bio + * @page: page to add + * @len: vec entry length + * @offset: vec entry offset + * + * Attempt to add a page to the bio_vec maplist. This can fail for a + * number of reasons, such as the bio being full or target block device + * limitations. The target block device must allow bio's up to PAGE_SIZE, + * so it is always possible to add a single page to an empty bio. + * + * This should only be used by passthrough bios. 
+ */ int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset) { bool same_page = false; - return __bio_add_pc_page(q, bio, page, len, offset, &same_page); + return bio_add_hw_page(q, bio, page, len, offset, + queue_max_hw_sectors(q), &same_page); } EXPORT_SYMBOL(bio_add_pc_page); diff --git a/block/blk-map.c b/block/blk-map.c index b6fa343fea9f..e3e4ac48db45 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -257,6 +257,7 @@ static struct bio *bio_copy_user_iov(struct request_queue *q, static struct bio *bio_map_user_iov(struct request_queue *q, struct iov_iter *iter, gfp_t gfp_mask) { + unsigned int max_sectors = queue_max_hw_sectors(q); int j; struct bio *bio; int ret; @@ -294,8 +295,8 @@ static struct bio *bio_map_user_iov(struct request_queue *q, if (n > bytes) n = bytes; - if (!__bio_add_pc_page(q, bio, page, n, offs, - &same_page)) { + if (!bio_add_hw_page(q, bio, page, n, offs, + max_sectors, &same_page)) { if (same_page) put_page(page); break; diff --git a/block/blk.h b/block/blk.h index 73bd3b1c6938..1ae3279df712 100644 --- a/block/blk.h +++ b/block/blk.h @@ -453,8 +453,8 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size) struct request_queue *__blk_alloc_queue(int node_id); -int __bio_add_pc_page(struct request_queue *q, struct bio *bio, +int bio_add_hw_page(struct request_queue *q, struct bio *bio, struct page *page, unsigned int len, unsigned int offset, - bool *same_page); + unsigned int max_sectors, bool *same_page); #endif /* BLK_INTERNAL_H */ From patchwork Tue May 12 08:55:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542375 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 793ED912 for ; Tue, 12 May 2020 08:56:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5BB77214DB for ; Tue, 12 May 2020 08:56:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="F+vTryIo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729374AbgELI4N (ORCPT ); Tue, 12 May 2020 04:56:13 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16012 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729365AbgELI4J (ORCPT ); Tue, 12 May 2020 04:56:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273769; x=1620809769; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=l0CUlCpeB/tUpQLgohs/Adxa9O463b3nVjgJfopnwvM=; b=F+vTryIoyS/dAOcZ56WQrjjvEOja46O347oexkfuk+8j9AyxcJqsd/fe 6nbUoIoYiL65//J6IkJKQJdIUfN3g+7H0k8RFgOfEtHdVWQA1TbDwMj+O p29wU1Kk/7HGUttvcRKfDElAj1w8X5IStAIFhMTG2ybsWTU3kcp713UMk wFjBZsxwymVS/W0sgyKyMTuvAL29duskIX9VWrBG/+4Cy5f9cLXWnfAFc EwclNiABaDSEVwQBMFWG2vWHxWD+qsCGzdtCi5NVulvp/Dq/q80WXvE2/ DUGzMmc+ZTSWAb0xRJgfZ/LEQSkVbQ37lCQ1JNyHuiNLIvQJz2UPB+YX3 A==; IronPort-SDR: ycIGLjJ7+VMoP6gtRLmVmMOVo+DTrdkfCBai69vLbO529Mv1SGIRoTRsrhyWJXOTdIv8Yu0lCi BfaoYDycUBXMjHlVV+PIaOHLlSIJa5a8i/pJUK+R1VqLWz9t6a53Z7u6f5OY70YdYQNGud0VrV t9NCJsM/EzxfLMiIwoAnkkYT5vK5ZwutrKpEZ81ZuN1YpJALZRS6g2wtupEFJTG8Snrpx0QSKV 
From: Johannes Thumshirn
To: Jens Axboe
Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch, linux-scsi@vger.kernel.org, "Martin K. Petersen", linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Christoph Hellwig, Hannes Reinecke
Subject: [PATCH v11 03/10] block: Introduce REQ_OP_ZONE_APPEND
Date: Tue, 12 May 2020 17:55:47 +0900
Message-Id: <20200512085554.26366-4-johannes.thumshirn@wdc.com>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com>
References: <20200512085554.26366-1-johannes.thumshirn@wdc.com>
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Keith Busch

Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* Target a sequential write zone
* Not cross a zone boundary
* Not be split, to ensure that a single range of LBAs is written with a single command

Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid zone append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO's size never exceeds this limit. Export this new limit through sysfs and check these limits in bio_full().

Also, when an LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE, indicating that the request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens, set the request aside in a local list and continue trying to dispatch requests such as READ requests or WRITE/ZONE_APPEND requests targeting other zones. This way we can still keep a high queue depth without starving other requests, even if one request can't be served due to zone write-locking.

Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion.

Signed-off-by: Keith Busch
[ jth: added zone-append specific add_page and merge_page helpers ]
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Martin K. Petersen
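For illustration only (not part of the patch): a minimal sketch of how an in-kernel submitter could issue a single-page zone append write under the rules above and pick up the actual write position on completion. The function names, bdev and zone_start_sector are assumptions made up for this sketch.

    /* Illustrative sketch only -- not part of this series. */
    static void example_zone_append_end_io(struct bio *bio)
    {
            /*
             * For REQ_OP_ZONE_APPEND, req_bio_endio() stores the sector the
             * device actually wrote to back into bi_sector on completion.
             */
            pr_info("zone append landed at sector %llu\n",
                    (unsigned long long)bio->bi_iter.bi_sector);
            bio_put(bio);
    }

    static int example_zone_append(struct block_device *bdev,
                                   sector_t zone_start_sector,
                                   struct page *page, unsigned int len)
    {
            struct bio *bio = bio_alloc(GFP_KERNEL, 1);

            if (!bio)
                    return -ENOMEM;

            bio_set_dev(bio, bdev);
            /* Must point at the start of a sequential write zone. */
            bio->bi_iter.bi_sector = zone_start_sector;
            bio->bi_opf = REQ_OP_ZONE_APPEND;
            /*
             * A single page is assumed to fit within the queue's
             * max_zone_append_sectors limit, so the bio cannot be split.
             */
            if (bio_add_page(bio, page, len, 0) != len) {
                    bio_put(bio);
                    return -EIO;
            }
            bio->bi_end_io = example_zone_append_end_io;
            submit_bio(bio);
            return 0;
    }

generic_make_request_checks() then applies blk_check_zone_append() to verify the zone alignment, zone type and size limit before the bio reaches the driver.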
--- block/bio.c | 62 ++++++++++++++++++++++++++++++++++++--- block/blk-core.c | 52 ++++++++++++++++++++++++++++++++ block/blk-mq.c | 27 +++++++++++++++++ block/blk-settings.c | 31 ++++++++++++++++++++ block/blk-sysfs.c | 13 ++++++++ drivers/scsi/scsi_lib.c | 1 + include/linux/blk_types.h | 14 +++++++++ include/linux/blkdev.h | 11 +++++++ 8 files changed, 207 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index aad0a6dad4f9..3aa3c4ce2e5e 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1025,6 +1025,50 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) return 0; } +static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter) +{ + unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt; + unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt; + struct request_queue *q = bio->bi_disk->queue; + unsigned int max_append_sectors = queue_max_zone_append_sectors(q); + struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; + struct page **pages = (struct page **)bv; + ssize_t size, left; + unsigned len, i; + size_t offset; + + if (WARN_ON_ONCE(!max_append_sectors)) + return 0; + + /* + * Move page array up in the allocated memory for the bio vecs as far as + * possible so that we can start filling biovecs from the beginning + * without overwriting the temporary page array. + */ + BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2); + pages += entries_left * (PAGE_PTRS_PER_BVEC - 1); + + size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset); + if (unlikely(size <= 0)) + return size ? size : -EFAULT; + + for (left = size, i = 0; left > 0; left -= len, i++) { + struct page *page = pages[i]; + bool same_page = false; + + len = min_t(size_t, PAGE_SIZE - offset, left); + if (bio_add_hw_page(q, bio, page, len, offset, + max_append_sectors, &same_page) != len) + return -EINVAL; + if (same_page) + put_page(page); + offset = 0; + } + + iov_iter_advance(iter, size); + return 0; +} + /** * bio_iov_iter_get_pages - add user or kernel pages to a bio * @bio: bio to add pages to @@ -1054,10 +1098,16 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) return -EINVAL; do { - if (is_bvec) - ret = __bio_iov_bvec_add_pages(bio, iter); - else - ret = __bio_iov_iter_get_pages(bio, iter); + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (WARN_ON_ONCE(is_bvec)) + return -EINVAL; + ret = __bio_iov_append_get_pages(bio, iter); + } else { + if (is_bvec) + ret = __bio_iov_bvec_add_pages(bio, iter); + else + ret = __bio_iov_iter_get_pages(bio, iter); + } } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0)); if (is_bvec) @@ -1460,6 +1510,10 @@ struct bio *bio_split(struct bio *bio, int sectors, BUG_ON(sectors <= 0); BUG_ON(sectors >= bio_sectors(bio)); + /* Zone append commands cannot be split */ + if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND)) + return NULL; + split = bio_clone_fast(bio, gfp, bs); if (!split) return NULL; diff --git a/block/blk-core.c b/block/blk-core.c index 538cbc725620..0f95fdfd3827 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -135,6 +135,7 @@ static const char *const blk_op_name[] = { REQ_OP_NAME(ZONE_OPEN), REQ_OP_NAME(ZONE_CLOSE), REQ_OP_NAME(ZONE_FINISH), + REQ_OP_NAME(ZONE_APPEND), REQ_OP_NAME(WRITE_SAME), REQ_OP_NAME(WRITE_ZEROES), REQ_OP_NAME(SCSI_IN), @@ -240,6 +241,17 @@ static void req_bio_endio(struct request *rq, struct bio *bio, bio_advance(bio, nbytes); + if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) { + /* + * Partial zone append completions cannot be supported
as the + * BIO fragments may end up not being written sequentially. + */ + if (bio->bi_iter.bi_size) + bio->bi_status = BLK_STS_IOERR; + else + bio->bi_iter.bi_sector = rq->__sector; + } + /* don't actually finish bio if it's part of flush sequence */ if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ)) bio_endio(bio); @@ -887,6 +899,41 @@ static inline int blk_partition_remap(struct bio *bio) return ret; } +/* + * Check write append to a zoned block device. + */ +static inline blk_status_t blk_check_zone_append(struct request_queue *q, + struct bio *bio) +{ + sector_t pos = bio->bi_iter.bi_sector; + int nr_sectors = bio_sectors(bio); + + /* Only applicable to zoned block devices */ + if (!blk_queue_is_zoned(q)) + return BLK_STS_NOTSUPP; + + /* The bio sector must point to the start of a sequential zone */ + if (pos & (blk_queue_zone_sectors(q) - 1) || + !blk_queue_zone_is_seq(q, pos)) + return BLK_STS_IOERR; + + /* + * Not allowed to cross zone boundaries. Otherwise, the BIO will be + * split and could result in non-contiguous sectors being written in + * different zones. + */ + if (nr_sectors > q->limits.chunk_sectors) + return BLK_STS_IOERR; + + /* Make sure the BIO is small enough and will not get split */ + if (nr_sectors > q->limits.max_zone_append_sectors) + return BLK_STS_IOERR; + + bio->bi_opf |= REQ_NOMERGE; + + return BLK_STS_OK; +} + static noinline_for_stack bool generic_make_request_checks(struct bio *bio) { @@ -959,6 +1006,11 @@ generic_make_request_checks(struct bio *bio) if (!q->limits.max_write_same_sectors) goto not_supported; break; + case REQ_OP_ZONE_APPEND: + status = blk_check_zone_append(q, bio); + if (status != BLK_STS_OK) + goto end_io; + break; case REQ_OP_ZONE_RESET: case REQ_OP_ZONE_OPEN: case REQ_OP_ZONE_CLOSE: diff --git a/block/blk-mq.c b/block/blk-mq.c index bc34d6b572b6..d743f0af5fb8 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1183,6 +1183,19 @@ static void blk_mq_handle_dev_resource(struct request *rq, __blk_mq_requeue_request(rq); } +static void blk_mq_handle_zone_resource(struct request *rq, + struct list_head *zone_list) +{ + /* + * If we end up here it is because we cannot dispatch a request to a + * specific zone due to LLD level zone-write locking or other zone + * related resource not being available. In this case, set the request + * aside in zone_list for retrying it later. + */ + list_add(&rq->queuelist, zone_list); + __blk_mq_requeue_request(rq); +} + /* * Returns true if we did some work AND can potentially do more. */ @@ -1195,6 +1208,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, int errors, queued; blk_status_t ret = BLK_STS_OK; bool no_budget_avail = false; + LIST_HEAD(zone_list); if (list_empty(list)) return false; @@ -1256,6 +1270,16 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_mq_handle_dev_resource(rq, list); break; + } else if (ret == BLK_STS_ZONE_RESOURCE) { + /* + * Move the request to zone_list and keep going through + * the dispatch list to find more requests the drive can + * accept. 
+ */ + blk_mq_handle_zone_resource(rq, &zone_list); + if (list_empty(list)) + break; + continue; } if (unlikely(ret != BLK_STS_OK)) { @@ -1267,6 +1291,9 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, queued++; } while (!list_empty(list)); + if (!list_empty(&zone_list)) + list_splice_tail_init(&zone_list, list); + hctx->dispatched[queued_to_index(queued)]++; /* diff --git a/block/blk-settings.c b/block/blk-settings.c index 2ab1967b9716..9a2c23cd9700 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim) lim->chunk_sectors = 0; lim->max_write_same_sectors = 0; lim->max_write_zeroes_sectors = 0; + lim->max_zone_append_sectors = 0; lim->max_discard_sectors = 0; lim->max_hw_discard_sectors = 0; lim->discard_granularity = 0; @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim) lim->max_dev_sectors = UINT_MAX; lim->max_write_same_sectors = UINT_MAX; lim->max_write_zeroes_sectors = UINT_MAX; + lim->max_zone_append_sectors = UINT_MAX; } EXPORT_SYMBOL(blk_set_stacking_limits); @@ -221,6 +223,33 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q, } EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors); +/** + * blk_queue_max_zone_append_sectors - set max sectors for a single zone append + * @q: the request queue for the device + * @max_zone_append_sectors: maximum number of sectors to write per command + **/ +void blk_queue_max_zone_append_sectors(struct request_queue *q, + unsigned int max_zone_append_sectors) +{ + unsigned int max_sectors; + + if (WARN_ON(!blk_queue_is_zoned(q))) + return; + + max_sectors = min(q->limits.max_hw_sectors, max_zone_append_sectors); + max_sectors = min(q->limits.chunk_sectors, max_sectors); + + /* + * Signal eventual driver bugs resulting in the max_zone_append sectors limit + * being 0 due to a 0 argument, the chunk_sectors limit (zone size) not set, + * or the max_hw_sectors limit not set. 
+ */ + WARN_ON(!max_sectors); + + q->limits.max_zone_append_sectors = max_sectors; +} +EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors); + /** * blk_queue_max_segments - set max hw segments for a request for this queue * @q: the request queue for the device @@ -470,6 +499,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, b->max_write_same_sectors); t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors, b->max_write_zeroes_sectors); + t->max_zone_append_sectors = min(t->max_zone_append_sectors, + b->max_zone_append_sectors); t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn); t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index fca9b158f4a0..02643e149d5e 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -218,6 +218,13 @@ static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page) (unsigned long long)q->limits.max_write_zeroes_sectors << 9); } +static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page) +{ + unsigned long long max_sectors = q->limits.max_zone_append_sectors; + + return sprintf(page, "%llu\n", max_sectors << SECTOR_SHIFT); +} + static ssize_t queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) { @@ -639,6 +646,11 @@ static struct queue_sysfs_entry queue_write_zeroes_max_entry = { .show = queue_write_zeroes_max_show, }; +static struct queue_sysfs_entry queue_zone_append_max_entry = { + .attr = {.name = "zone_append_max_bytes", .mode = 0444 }, + .show = queue_zone_append_max_show, +}; + static struct queue_sysfs_entry queue_nonrot_entry = { .attr = {.name = "rotational", .mode = 0644 }, .show = queue_show_nonrot, @@ -749,6 +761,7 @@ static struct attribute *queue_attrs[] = { &queue_discard_zeroes_data_entry.attr, &queue_write_same_max_entry.attr, &queue_write_zeroes_max_entry.attr, + &queue_zone_append_max_entry.attr, &queue_nonrot_entry.attr, &queue_zoned_entry.attr, &queue_nr_zones_entry.attr, diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index f0cb26b3da6a..af00e4a3f006 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1712,6 +1712,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, case BLK_STS_OK: break; case BLK_STS_RESOURCE: + case BLK_STS_ZONE_RESOURCE: if (atomic_read(&sdev->device_busy) || scsi_device_blocked(sdev)) ret = BLK_STS_DEV_RESOURCE; diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 90895d594e64..b90dca1fa430 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -63,6 +63,18 @@ typedef u8 __bitwise blk_status_t; */ #define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13) +/* + * BLK_STS_ZONE_RESOURCE is returned from the driver to the block layer if zone + * related resources are unavailable, but the driver can guarantee the queue + * will be rerun in the future once the resources become available again. + * + * This is different from BLK_STS_DEV_RESOURCE in that it explicitly references + * a zone specific resource and IO to a different zone on the same device could + * still be served. Examples of that are zones that are write-locked, but a read + * to the same zone could be served. 
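As an aside, a hypothetical sketch (not from this patch) of how a zoned LLDD's prepare path could produce this status, using the per-zone write trylock added later in this series; the function name is made up:

    /* Hypothetical driver fragment, for illustration only. */
    static blk_status_t example_prep_zoned_write(struct request *rq)
    {
            if (!blk_req_needs_zone_write_lock(rq))
                    return BLK_STS_OK;

            /*
             * Another write already owns the target zone: ask the block
             * layer to retry this request later instead of failing it,
             * while requests to other zones keep being dispatched.
             */
            if (!blk_req_zone_write_trylock(rq))
                    return BLK_STS_ZONE_RESOURCE;

            return BLK_STS_OK;
    }

The blk-mq dispatch change earlier in this patch then parks such requests on a local zone_list and splices them back for a later retry.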
+ */ +#define BLK_STS_ZONE_RESOURCE ((__force blk_status_t)14) + /** * blk_path_error - returns true if error may be path related * @error: status the request was completed with @@ -296,6 +308,8 @@ enum req_opf { REQ_OP_ZONE_CLOSE = 11, /* Transition a zone to full */ REQ_OP_ZONE_FINISH = 12, + /* write data at the current zone write pointer */ + REQ_OP_ZONE_APPEND = 13, /* SCSI passthrough using struct scsi_request */ REQ_OP_SCSI_IN = 32, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 91c6e413bf6b..158641fbc7cd 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -331,6 +331,7 @@ struct queue_limits { unsigned int max_hw_discard_sectors; unsigned int max_write_same_sectors; unsigned int max_write_zeroes_sectors; + unsigned int max_zone_append_sectors; unsigned int discard_granularity; unsigned int discard_alignment; @@ -749,6 +750,9 @@ static inline bool rq_mergeable(struct request *rq) if (req_op(rq) == REQ_OP_WRITE_ZEROES) return false; + if (req_op(rq) == REQ_OP_ZONE_APPEND) + return false; + if (rq->cmd_flags & REQ_NOMERGE_FLAGS) return false; if (rq->rq_flags & RQF_NOMERGE_FLAGS) @@ -1083,6 +1087,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q, extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q, unsigned int max_write_same_sectors); extern void blk_queue_logical_block_size(struct request_queue *, unsigned int); +extern void blk_queue_max_zone_append_sectors(struct request_queue *q, + unsigned int max_zone_append_sectors); extern void blk_queue_physical_block_size(struct request_queue *, unsigned int); extern void blk_queue_alignment_offset(struct request_queue *q, unsigned int alignment); @@ -1300,6 +1306,11 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q) return q->limits.max_segment_size; } +static inline unsigned int queue_max_zone_append_sectors(const struct request_queue *q) +{ + return q->limits.max_zone_append_sectors; +} + static inline unsigned queue_logical_block_size(const struct request_queue *q) { int retval = 512; From patchwork Tue May 12 08:55:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542371 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03DBC912 for ; Tue, 12 May 2020 08:56:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDBA02186A for ; Tue, 12 May 2020 08:56:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="JDngQPSj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729378AbgELI4O (ORCPT ); Tue, 12 May 2020 04:56:14 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16022 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729367AbgELI4K (ORCPT ); Tue, 12 May 2020 04:56:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273771; x=1620809771; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=h9S/KYy5dLV7OMmurO/YDlMGEqqAyiyisBAqUMQa/FM=; b=JDngQPSjjKartAbH28IOjC6urnd99D/W10ib3zn+W+T+nDX686LF5H0Q 
TIVFrRGeEVvovgj2yvPUpUUBM4DouGwQTWRlsl0bdrYKQ8nQyigfUWSzu SSVuTNx3BvGvYvVjgsZVEcF6uB3U07kwKdV/i62pA7O6UVnwdv7QEq0Hu cNrYN9tXwRIjw6o7xqR3efJGJ2+y4SHA7jNmMH/ciwh0DWFV0QriESHRW ycnRRQwAOQweKbpS6iCBfHesrXoKyGoDXEqM3UeVhZRW5qciK8o/doDy2 uaSCN0h4hhZl9gi62yg72vFkpDvmMEOYH3zkGekFR1LOI6kHobf3Smzag g==; IronPort-SDR: l4nNK0FkJ+GOm6L5u+7tNTnMm9YsuOy9eDeCi0BI2TdMFmkYZxw3+PLYCn92Kl7/nDwmq/tkAi tMyZyzID63lizrHhTlhQlWaaO9CfsNFP5GiUlsoWj/iMa01nfyRhjthxmmTcMUx4IJUW55i4Kf 9Q82CXq5PIpRsWilt3T8CS5+S8W8/XcxGupBz5g+v43LYqBVSoMHThy/dOgJLaIqdfz6yQayCq E2Daz7DGHiE6kAYv1S+/DXni2FnIsYxLrI6xjeyd6OY1x5w9Q6iBK4dKOjYNkEOa8CMeAFbt77 myY= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823547" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:11 +0800 IronPort-SDR: sGI1MCmrhWD3XdE9DM53bmQkUAsD5aJOOBd9sJl1pzFliT7r7iMIT62u3OVvwGjA/DIpa+X05t Oi+9bvV+NwfXVasH3CwM5E0W+/71+j5Y4= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:53 -0700 IronPort-SDR: r6+UklOx8F4TUqAimMRNYy4eIWtvZWjIRKQOfY7RcOKgNP+fGbACTI2UEmRIM4ogStazY1o8BQ RtgV1llrTD0w== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:08 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn , Christoph Hellwig , Hannes Reinecke Subject: [PATCH v11 04/10] block: introduce blk_req_zone_write_trylock Date: Tue, 12 May 2020 17:55:48 +0900 Message-Id: <20200512085554.26366-5-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce blk_req_zone_write_trylock(), which either grabs the write-lock for a sequential zone or returns false, if the zone is already locked. Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. 
Petersen --- block/blk-zoned.c | 14 ++++++++++++++ include/linux/blkdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index f87956e0dcaf..c822cfa7a102 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -82,6 +82,20 @@ bool blk_req_needs_zone_write_lock(struct request *rq) } EXPORT_SYMBOL_GPL(blk_req_needs_zone_write_lock); +bool blk_req_zone_write_trylock(struct request *rq) +{ + unsigned int zno = blk_rq_zone_no(rq); + + if (test_and_set_bit(zno, rq->q->seq_zones_wlock)) + return false; + + WARN_ON_ONCE(rq->rq_flags & RQF_ZONE_WRITE_LOCKED); + rq->rq_flags |= RQF_ZONE_WRITE_LOCKED; + + return true; +} +EXPORT_SYMBOL_GPL(blk_req_zone_write_trylock); + void __blk_req_zone_write_lock(struct request *rq) { if (WARN_ON_ONCE(test_and_set_bit(blk_rq_zone_no(rq), diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 158641fbc7cd..d6e6ce3dc656 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1737,6 +1737,7 @@ extern int bdev_write_page(struct block_device *, sector_t, struct page *, #ifdef CONFIG_BLK_DEV_ZONED bool blk_req_needs_zone_write_lock(struct request *rq); +bool blk_req_zone_write_trylock(struct request *rq); void __blk_req_zone_write_lock(struct request *rq); void __blk_req_zone_write_unlock(struct request *rq); From patchwork Tue May 12 08:55:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542385 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6B164912 for ; Tue, 12 May 2020 08:56:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52BFC2137B for ; Tue, 12 May 2020 08:56:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="G9IgS8uL" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729261AbgELI4S (ORCPT ); Tue, 12 May 2020 04:56:18 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16024 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729144AbgELI4M (ORCPT ); Tue, 12 May 2020 04:56:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273773; x=1620809773; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mmrsIuvT4emv5cGWG0aWsrxpc1/G5U077NvrBZdYN1I=; b=G9IgS8uLpAZj5wV7kw46chHmDWt18QX7CnU2kHcRL++itbpgai1NiNCv uaTEWY1++6pp1+CileGxZnIFvougdakLH3TUToSOGGfVP5Q6YF58L7r8M vIZNalJpgwXKsT2YGd+w7b+huFIaP5XzZFxJdT+UqGljb6cZA902+RgFW B6G0j0OKKnYcP0PTqKAvgDxM6GbPYyX07iD6/YsJgzTGAAYGzwjtG4hMp cOS4T+AsM3fSI48TE9hCF5NcSSNolBPx411NlXmTvx1mMr7rKGrbM+pG1 +gwd+DCASxepyl2sfaF25MlCOgx2RxfuD+lyXLBww5UdhXcxz3iRfpUI9 A==; IronPort-SDR: GXdosVKINLXF5lUTOy9bsZH6uuZQEvAMFXgk38xcQqlJHxhi3LA8HCzHa48g/75Hr/j5DqW9n9 NVf02BBa3o/kB7JalwfKEUA4O9BRT2MYmRjbHsMTDgYw7S6ERyXbmE79365wyo4CKBkb0b7MbC SQwHyhzlIIXDg1cifMIFbOhujFLwkOQcuKhTiGDIzCzyh+eLk7IGSKL+6FlgeIcZHTDO4oZdU6 diCQuq3+0berFX9xTKYlTgpvziIYZuJVAXOb80qnwPBBdoYi1JHMFvj6mUc2jMneZYD6AHhURs eZA= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823557" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 
2020 16:56:13 +0800 IronPort-SDR: yWAz9dZIb7jAP4mgsHRrqJmU3B5rXC+CGWiUY0UgvtK8BSq1IBYQjpzHQZR9LSN9++hE5eKRmX 3LQggDJiUjqHnZp4hMHq8QWRae+f5NoY0= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:55 -0700 IronPort-SDR: Z/Rdoo79UGAQoTBGKb4rIZZpQsoi1no8O4YTRoZCBnGr0ac/e+xi7LMs9tpzhWi0akhqxdsH+3 m1h5sER3mR0g== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:10 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Damien Le Moal , Johannes Thumshirn , Christoph Hellwig , Hannes Reinecke Subject: [PATCH v11 05/10] block: Modify revalidate zones Date: Tue, 12 May 2020 17:55:49 +0900 Message-Id: <20200512085554.26366-6-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Damien Le Moal Modify the interface of blk_revalidate_disk_zones() to add an optional driver callback function that a driver can use to extend processing done during zone revalidation. The callback, if defined, is executed with the device request queue frozen, after all zones have been inspected. Signed-off-by: Damien Le Moal Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen --- block/blk-zoned.c | 9 ++++++++- drivers/block/null_blk_zoned.c | 2 +- include/linux/blkdev.h | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/block/blk-zoned.c b/block/blk-zoned.c index c822cfa7a102..23831fa8701d 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -471,14 +471,19 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx, /** * blk_revalidate_disk_zones - (re)allocate and initialize zone bitmaps * @disk: Target disk + * @update_driver_data: Callback to update driver data on the frozen disk * * Helper function for low-level device drivers to (re) allocate and initialize * a disk request queue zone bitmaps. This functions should normally be called * within the disk ->revalidate method for blk-mq based drivers. For BIO based * drivers only q->nr_zones needs to be updated so that the sysfs exposed value * is correct. + * If the @update_driver_data callback function is not NULL, the callback is + * executed with the device request queue frozen after all zones have been + * checked. 
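For illustration, a minimal sketch (not part of this patch) of how a blk-mq zoned driver might use the new argument; struct example_dev and its field names are assumptions:

    /* Hypothetical driver code, for illustration only. */
    static void example_update_driver_data(struct gendisk *disk)
    {
            struct example_dev *dev = disk->private_data;

            /* Runs with the queue frozen, after all zones were reported. */
            swap(dev->zone_wp_cache, dev->staging_wp_cache);
    }

    static int example_revalidate_zones(struct example_dev *dev)
    {
            return blk_revalidate_disk_zones(dev->disk, example_update_driver_data);
    }

The SCSI disk driver later in this series uses the same hook to publish the zone write pointer offsets it gathers while reporting zones (see the rev_wp_offset handling in patch 07).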
*/ -int blk_revalidate_disk_zones(struct gendisk *disk) +int blk_revalidate_disk_zones(struct gendisk *disk, + void (*update_driver_data)(struct gendisk *disk)) { struct request_queue *q = disk->queue; struct blk_revalidate_zone_args args = { @@ -512,6 +517,8 @@ int blk_revalidate_disk_zones(struct gendisk *disk) q->nr_zones = args.nr_zones; swap(q->seq_zones_wlock, args.seq_zones_wlock); swap(q->conv_zones_bitmap, args.conv_zones_bitmap); + if (update_driver_data) + update_driver_data(disk); ret = 0; } else { pr_warn("%s: failed to revalidate zones\n", disk->disk_name); diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c index 9e4bcdad1a80..46641df2e58e 100644 --- a/drivers/block/null_blk_zoned.c +++ b/drivers/block/null_blk_zoned.c @@ -73,7 +73,7 @@ int null_register_zoned_dev(struct nullb *nullb) struct request_queue *q = nullb->q; if (queue_is_mq(q)) - return blk_revalidate_disk_zones(nullb->disk); + return blk_revalidate_disk_zones(nullb->disk, NULL); blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects); q->nr_zones = blkdev_nr_zones(nullb->disk); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d6e6ce3dc656..fd405dac8eb0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -357,7 +357,8 @@ unsigned int blkdev_nr_zones(struct gendisk *disk); extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op, sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask); -extern int blk_revalidate_disk_zones(struct gendisk *disk); +int blk_revalidate_disk_zones(struct gendisk *disk, + void (*update_driver_data)(struct gendisk *disk)); extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg); From patchwork Tue May 12 08:55:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B63CF912 for ; Tue, 12 May 2020 08:56:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F067214DB for ; Tue, 12 May 2020 08:56:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Wq8TONs4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729389AbgELI4S (ORCPT ); Tue, 12 May 2020 04:56:18 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16027 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729256AbgELI4P (ORCPT ); Tue, 12 May 2020 04:56:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273775; x=1620809775; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=g5AG8nHQvcBfTW7D00G9a2jNlsRVALeJYYmeP1Th5QM=; b=Wq8TONs41wcWDspoz0txZVmlnfdNKwW9lZd0QjDfKx/9X6zrk49w3B4v o7P2VY2REBUK5lAIVxNlOlFEcAEA5pxeTo9Vhhx/bgHE1pKGGWxoLtoxk QhpOd0WLidR38/ViYi7hmHjtmAOoI0g6DWLztBZTb8FvEKwmgRz/rSZPb Sf/ngFcI3YsHhciJuW3OxIADsLJSdzVKlt79+VFN3v8Z6QgtAc4P8e2RB GAzEDZbgB01Ujpt4YskocoodWuMAovKczyv2EI3ZWVpwpSysO+UAcMeLe Fld+lzQzrUMxy26QmxFoXpVErcyKK4SyN4AFqJfEWynxbfzd9WMLNvZkF g==; IronPort-SDR: XDr0ozXmhBtA+BHOa7HMBBjKLAvh4xwpzTpPah5CrPGPZhXXuzywPtdFA5Ruk3sQziK/f6498H 
+SP64eqfKncRg6ZrMj2mVTbit06Ma69x86MVkqVUcRA1It+p7j353IBUEby//CLeJfpitnma/7 IBV6jaWNUNwPaS14sxiI3j95u4Y0K9BuFKkI9wwSjsemDXEevh6Q1OS8V448ZKVdxPsH0BlC7R ONSDCkeNC8p/9ewv9eMnC3FEsmaMpxHoyVnGZF/1bfQm9OUtyotqsAvOyJQZCju5e8ZSMvas8t m1c= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823561" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:15 +0800 IronPort-SDR: nxkIwNqd6BTnLuy+ciBRqHRpUKLUcEZej5S57yQd8JTAdALLdVo8s0cEArutoy9T+mFvehpGku nAxLjoL9enn7CiwJgSAkrWMMZFY0sWxrE= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:57 -0700 IronPort-SDR: Cuz9BrMYQRKWCiVwtppbxEBe6Ov0o1Pc5XoQaTGlrvVqv3WOzToinuy3eA81LFN7ecuqwj5jQ3 ja6wYIIgp6sA== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:12 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn , Christoph Hellwig , Bart Van Assche , Hannes Reinecke Subject: [PATCH v11 06/10] scsi: sd_zbc: factor out sanity checks for zoned commands Date: Tue, 12 May 2020 17:55:50 +0900 Message-Id: <20200512085554.26366-7-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Factor sanity checks for zoned commands from sd_zbc_setup_zone_mgmt_cmnd(). This will help with the introduction of an emulated ZONE_APPEND command. Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen --- drivers/scsi/sd_zbc.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index f45c22b09726..ee156fbf3780 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -209,6 +209,26 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, return ret; } +static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd) +{ + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + sector_t sector = blk_rq_pos(rq); + + if (!sd_is_zoned(sdkp)) + /* Not a zoned device */ + return BLK_STS_IOERR; + + if (sdkp->device->changed) + return BLK_STS_IOERR; + + if (sector & (sd_zbc_zone_sectors(sdkp) - 1)) + /* Unaligned request */ + return BLK_STS_IOERR; + + return BLK_STS_OK; +} + /** * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH. 
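A brief worked example of the unaligned-request check factored out above (the numbers are purely illustrative): with a zone size of 524288 sectors (256 MiB at 512 B per sector), a request starting at sector 1572864 (3 * 524288) satisfies 1572864 & (524288 - 1) == 0 and passes, whereas a request starting at sector 1572872 leaves a remainder of 8 and is rejected with BLK_STS_IOERR. The bitmask test is valid because the block layer requires power-of-two zone sizes.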
@@ -223,20 +243,14 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, unsigned char op, bool all) { struct request *rq = cmd->request; - struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); sector_t sector = blk_rq_pos(rq); + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); sector_t block = sectors_to_logical(sdkp->device, sector); + blk_status_t ret; - if (!sd_is_zoned(sdkp)) - /* Not a zoned device */ - return BLK_STS_IOERR; - - if (sdkp->device->changed) - return BLK_STS_IOERR; - - if (sector & (sd_zbc_zone_sectors(sdkp) - 1)) - /* Unaligned request */ - return BLK_STS_IOERR; + ret = sd_zbc_cmnd_checks(cmd); + if (ret != BLK_STS_OK) + return ret; cmd->cmd_len = 16; memset(cmd->cmnd, 0, cmd->cmd_len); From patchwork Tue May 12 08:55:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542393 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5E7815E6 for ; Tue, 12 May 2020 08:56:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6F73F214DB for ; Tue, 12 May 2020 08:56:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="PusD1uQU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729399AbgELI4U (ORCPT ); Tue, 12 May 2020 04:56:20 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16030 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729367AbgELI4R (ORCPT ); Tue, 12 May 2020 04:56:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273777; x=1620809777; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bwNpJmLnnD5D6+GjfHw3GC+KlC1r6qPGbt/zpNDTH/c=; b=PusD1uQUunok+vRl8xlpuM5xenNe+uN5ZYSTVFARVmBMxSdN2YhYVHGM jUvR4LwBh+gBKiZBP9d4tnb53P6A5fekchBQYPIS8rSBKKbVuahJxs8KA 4y4JyJ7I4/hsUCKui59Sw5MBYun6nWeKhGT0skc2SBLo2OPW2RJhvsyUQ Hz+OeEVnnvBwZFYnohwCFtMAV9N9WNmQRYjHHvh9C2PUZTOPhSLADbTHR QJq1Pmx1CjQaqDps6jam8JSGxLGElkKMW8Ja6W/cnG3eadT8dC60ilBPo zafHhPB1jIAXexCm06Rg2OjfYXwYbl0Lq1h5l0axs0yLiGISXtOGf1vP+ Q==; IronPort-SDR: Fna4rjY4LXUnwaocyJET5pGu8H6gFlDe3VPNG40PuHdVTvK/s54DofK+JZ7IqCVnerVtIHroE8 v+hAh4jJPCIzLdNxl56Z2zLresyXhhSnwl9CKO49MLCS7s0SDQMnVwlNNmcEnVNI9mVamMxcd2 OXr5Wf0te1IOfe7i5SFEQJ6bnTghZBlHPOiG54hqgUTDPTL6hsdhiowPhKJTsBEyd0kgbgRWzA htDgEWf26KyVooXJ/8x/aYQviko2g+L79dJFDtULHONeu0WPZ7l90UXW8Pg2eaUo5CukjjnnTK C5M= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823565" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:17 +0800 IronPort-SDR: E1HbHAymry/+95Wf0oJgxtpfEjWsbDlcGWBd9GpI8VClnjEkEVI6Iz1UjTdYuWFLG94bmRUkEJ tLTwfi+0/6XhMIOgPchkB8d32xbv/Ehxg= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:45:59 -0700 IronPort-SDR: lI9A0kis1mFv2RVTSay3bcG4jH3TtRwYjiIGOvS2Lvok9Lr8/NTxjLyInhCtBFHPmGAUYAH+n/ itco3Hse8yew== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:14 -0700 From: Johannes 
Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn , Christoph Hellwig , Hannes Reinecke Subject: [PATCH v11 07/10] scsi: sd_zbc: emulate ZONE_APPEND commands Date: Tue, 12 May 2020 17:55:51 +0900 Message-Id: <20200512085554.26366-8-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command with a start LBA set to the target zone write pointer position. In order to always know the write pointer position of a sequential write zone, the write pointer of all zones is tracked using an array of 32bits zone write pointer offset attached to the scsi disk structure. Each entry of the array indicate a zone write pointer position relative to the zone start sector. The write pointer offsets are maintained in sync with the device as follows: 1) the write pointer offset of a zone is reset to 0 when a REQ_OP_ZONE_RESET command completes. 2) the write pointer offset of a zone is set to the zone size when a REQ_OP_ZONE_FINISH command completes. 3) the write pointer offset of a zone is incremented by the number of 512B sectors written when a write, write same or a zone append command completes. 4) the write pointer offset of all zones is reset to 0 when a REQ_OP_ZONE_RESET_ALL command completes. Since the block layer does not write lock zones for zone append commands, to ensure a sequential ordering of the regular write commands used for the emulation, the target zone of a zone append command is locked when the function sd_zbc_prepare_zone_append() is called from sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained (e.g. a zone append is in-flight or a regular write has already locked the zone), the zone append command dispatching is delayed by returning BLK_STS_ZONE_RESOURCE. To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and modifications of the zone write pointer offsets. This spinlock is initialized from sd_probe() using the new function sd_zbc_init(). Co-developed-by: Damien Le Moal Signed-off-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Reviewed-by: Hannes Reinecke --- drivers/scsi/sd.c | 16 +- drivers/scsi/sd.h | 43 ++++- drivers/scsi/sd_zbc.c | 363 +++++++++++++++++++++++++++++++++++++++--- 3 files changed, 392 insertions(+), 30 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index a793cb08d025..7b0383e42b4c 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1206,6 +1206,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd) } } + if (req_op(rq) == REQ_OP_ZONE_APPEND) { + ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks); + if (ret) + return ret; + } + fua = rq->cmd_flags & REQ_FUA ? 
0x8 : 0; dix = scsi_prot_sg_count(cmd); dif = scsi_host_dif_capable(cmd->device->host, sdkp->protection_type); @@ -1287,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd) return sd_setup_flush_cmnd(cmd); case REQ_OP_READ: case REQ_OP_WRITE: + case REQ_OP_ZONE_APPEND: return sd_setup_read_write_cmnd(cmd); case REQ_OP_ZONE_RESET: return sd_zbc_setup_zone_mgmt_cmnd(cmd, ZO_RESET_WRITE_POINTER, @@ -2055,7 +2062,7 @@ static int sd_done(struct scsi_cmnd *SCpnt) out: if (sd_is_zoned(sdkp)) - sd_zbc_complete(SCpnt, good_bytes, &sshdr); + good_bytes = sd_zbc_complete(SCpnt, good_bytes, &sshdr); SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt, "sd_done: completed %d of %d bytes\n", @@ -3372,6 +3379,10 @@ static int sd_probe(struct device *dev) sdkp->first_scan = 1; sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS; + error = sd_zbc_init_disk(sdkp); + if (error) + goto out_free_index; + sd_revalidate_disk(gd); gd->flags = GENHD_FL_EXT_DEVT; @@ -3409,6 +3420,7 @@ static int sd_probe(struct device *dev) out_put: put_disk(gd); out_free: + sd_zbc_release_disk(sdkp); kfree(sdkp); out: scsi_autopm_put_device(sdp); @@ -3485,6 +3497,8 @@ static void scsi_disk_release(struct device *dev) put_disk(disk); put_device(&sdkp->device->sdev_gendev); + sd_zbc_release_disk(sdkp); + kfree(sdkp); } diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 50fff0bf8c8e..3a74f4b45134 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -79,6 +79,12 @@ struct scsi_disk { u32 zones_optimal_open; u32 zones_optimal_nonseq; u32 zones_max_open; + u32 *zones_wp_offset; + spinlock_t zones_wp_offset_lock; + u32 *rev_wp_offset; + struct mutex rev_mutex; + struct work_struct zone_wp_offset_work; + char *zone_wp_update_buf; #endif atomic_t openers; sector_t capacity; /* size in logical blocks */ @@ -207,17 +213,35 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp) #ifdef CONFIG_BLK_DEV_ZONED +int sd_zbc_init_disk(struct scsi_disk *sdkp); +void sd_zbc_release_disk(struct scsi_disk *sdkp); extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer); extern void sd_zbc_print_zones(struct scsi_disk *sdkp); blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, unsigned char op, bool all); -extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, - struct scsi_sense_hdr *sshdr); +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, + struct scsi_sense_hdr *sshdr); int sd_zbc_report_zones(struct gendisk *disk, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks); + #else /* CONFIG_BLK_DEV_ZONED */ +static inline int sd_zbc_init(void) +{ + return 0; +} + +static inline int sd_zbc_init_disk(struct scsi_disk *sdkp) +{ + return 0; +} + +static inline void sd_zbc_exit(void) {} +static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {} + static inline int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { @@ -233,9 +257,18 @@ static inline blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, return BLK_STS_TARGET; } -static inline void sd_zbc_complete(struct scsi_cmnd *cmd, - unsigned int good_bytes, - struct scsi_sense_hdr *sshdr) {} +static inline unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, + unsigned int good_bytes, struct scsi_sense_hdr *sshdr) +{ + return 0; +} + +static inline blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, + sector_t *lba, + unsigned 
int nr_blocks) +{ + return BLK_STS_TARGET; +} #define sd_zbc_report_zones NULL diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index ee156fbf3780..bb87fbba2a09 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -11,6 +11,7 @@ #include #include #include +#include #include @@ -19,11 +20,36 @@ #include "sd.h" +static unsigned int sd_zbc_get_zone_wp_offset(struct blk_zone *zone) +{ + if (zone->type == ZBC_ZONE_TYPE_CONV) + return 0; + + switch (zone->cond) { + case BLK_ZONE_COND_IMP_OPEN: + case BLK_ZONE_COND_EXP_OPEN: + case BLK_ZONE_COND_CLOSED: + return zone->wp - zone->start; + case BLK_ZONE_COND_FULL: + return zone->len; + case BLK_ZONE_COND_EMPTY: + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + default: + /* + * Offline and read-only zones do not have a valid + * write pointer. Use 0 as for an empty zone. + */ + return 0; + } +} + static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf, unsigned int idx, report_zones_cb cb, void *data) { struct scsi_device *sdp = sdkp->device; struct blk_zone zone = { 0 }; + int ret; zone.type = buf[0] & 0x0f; zone.cond = (buf[1] >> 4) & 0xf; @@ -39,7 +65,14 @@ static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf, zone.cond == ZBC_ZONE_COND_FULL) zone.wp = zone.start + zone.len; - return cb(&zone, idx, data); + ret = cb(&zone, idx, data); + if (ret) + return ret; + + if (sdkp->rev_wp_offset) + sdkp->rev_wp_offset[idx] = sd_zbc_get_zone_wp_offset(&zone); + + return 0; } /** @@ -229,6 +262,116 @@ static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd) return BLK_STS_OK; } +#define SD_ZBC_INVALID_WP_OFST (~0u) +#define SD_ZBC_UPDATING_WP_OFST (SD_ZBC_INVALID_WP_OFST - 1) + +static int sd_zbc_update_wp_offset_cb(struct blk_zone *zone, unsigned int idx, + void *data) +{ + struct scsi_disk *sdkp = data; + + lockdep_assert_held(&sdkp->zones_wp_offset_lock); + + sdkp->zones_wp_offset[idx] = sd_zbc_get_zone_wp_offset(zone); + + return 0; +} + +static void sd_zbc_update_wp_offset_workfn(struct work_struct *work) +{ + struct scsi_disk *sdkp; + unsigned int zno; + int ret; + + sdkp = container_of(work, struct scsi_disk, zone_wp_offset_work); + + spin_lock_bh(&sdkp->zones_wp_offset_lock); + for (zno = 0; zno < sdkp->nr_zones; zno++) { + if (sdkp->zones_wp_offset[zno] != SD_ZBC_UPDATING_WP_OFST) + continue; + + spin_unlock_bh(&sdkp->zones_wp_offset_lock); + ret = sd_zbc_do_report_zones(sdkp, sdkp->zone_wp_update_buf, + SD_BUF_SIZE, + zno * sdkp->zone_blocks, true); + spin_lock_bh(&sdkp->zones_wp_offset_lock); + if (!ret) + sd_zbc_parse_report(sdkp, sdkp->zone_wp_update_buf + 64, + zno, sd_zbc_update_wp_offset_cb, + sdkp); + } + spin_unlock_bh(&sdkp->zones_wp_offset_lock); + + scsi_device_put(sdkp->device); +} + +/** + * sd_zbc_prepare_zone_append() - Prepare an emulated ZONE_APPEND command. + * @cmd: the command to set up + * @lba: the LBA to patch + * @nr_blocks: the number of LBAs to be written + * + * Called from sd_setup_read_write_cmnd() for REQ_OP_ZONE_APPEND. + * sd_zbc_prepare_zone_append() handles the necessary zone write locking and + * patching of the LBA for an emulated ZONE_APPEND command. + * + * In case the cached write pointer offset is %SD_ZBC_INVALID_WP_OFST it will + * schedule a REPORT ZONES command to refresh the offset and requeue the + * command by returning BLK_STS_DEV_RESOURCE. 
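 + * + * The target zone is write locked before the cached write pointer offset is + * used, so that the emulated append is serialized against other writes to the + * same zone. The lock is released again from sd_zbc_complete(). 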
+ */ +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba, + unsigned int nr_blocks) +{ + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + unsigned int wp_offset, zno = blk_rq_zone_no(rq); + blk_status_t ret; + + ret = sd_zbc_cmnd_checks(cmd); + if (ret != BLK_STS_OK) + return ret; + + if (!blk_rq_zone_is_seq(rq)) + return BLK_STS_IOERR; + + /* Unlock of the write lock will happen in sd_zbc_complete() */ + if (!blk_req_zone_write_trylock(rq)) + return BLK_STS_ZONE_RESOURCE; + + spin_lock_bh(&sdkp->zones_wp_offset_lock); + wp_offset = sdkp->zones_wp_offset[zno]; + switch (wp_offset) { + case SD_ZBC_INVALID_WP_OFST: + /* + * We are about to schedule work to update a zone write pointer + * offset, which will cause the zone append command to be + * requeued. So make sure that the scsi device does not go away + * while the work is being processed. + */ + if (scsi_device_get(sdkp->device)) { + ret = BLK_STS_IOERR; + break; + } + sdkp->zones_wp_offset[zno] = SD_ZBC_UPDATING_WP_OFST; + schedule_work(&sdkp->zone_wp_offset_work); + fallthrough; + case SD_ZBC_UPDATING_WP_OFST: + ret = BLK_STS_DEV_RESOURCE; + break; + default: + wp_offset = sectors_to_logical(sdkp->device, wp_offset); + if (wp_offset + nr_blocks > sdkp->zone_blocks) { + ret = BLK_STS_IOERR; + break; + } + + *lba += wp_offset; + } + spin_unlock_bh(&sdkp->zones_wp_offset_lock); + if (ret) + blk_req_zone_write_unlock(rq); + return ret; +} + /** * sd_zbc_setup_zone_mgmt_cmnd - Prepare a zone ZBC_OUT command. The operations * can be RESET WRITE POINTER, OPEN, CLOSE or FINISH. @@ -269,16 +412,105 @@ blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd, return BLK_STS_OK; } +static bool sd_zbc_need_zone_wp_update(struct request *rq) +{ + switch (req_op(rq)) { + case REQ_OP_ZONE_APPEND: + case REQ_OP_ZONE_FINISH: + case REQ_OP_ZONE_RESET: + case REQ_OP_ZONE_RESET_ALL: + return true; + case REQ_OP_WRITE: + case REQ_OP_WRITE_ZEROES: + case REQ_OP_WRITE_SAME: + return blk_rq_zone_is_seq(rq); + default: + return false; + } +} + +/** + * sd_zbc_zone_wp_update - Update cached zone write pointer upon cmd completion + * @cmd: Completed command + * @good_bytes: Command reply bytes + * + * Called from sd_zbc_complete() to handle the update of the cached zone write + * pointer value in case an update is needed. + */ +static unsigned int sd_zbc_zone_wp_update(struct scsi_cmnd *cmd, + unsigned int good_bytes) +{ + int result = cmd->result; + struct request *rq = cmd->request; + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); + unsigned int zno = blk_rq_zone_no(rq); + enum req_opf op = req_op(rq); + + /* + * If we got an error for a command that needs updating the write + * pointer offset cache, we must mark the zone wp offset entry as + * invalid to force an update from disk the next time a zone append + * command is issued. + */ + spin_lock_bh(&sdkp->zones_wp_offset_lock); + + if (result && op != REQ_OP_ZONE_RESET_ALL) { + if (op == REQ_OP_ZONE_APPEND) { + /* Force complete completion (no retry) */ + good_bytes = 0; + scsi_set_resid(cmd, blk_rq_bytes(rq)); + } + + /* + * Force an update of the zone write pointer offset on + * the next zone append access. 
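 + * If a zone report is already updating this entry, it is left marked as + * %SD_ZBC_UPDATING_WP_OFST. 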
+ */ + if (sdkp->zones_wp_offset[zno] != SD_ZBC_UPDATING_WP_OFST) + sdkp->zones_wp_offset[zno] = SD_ZBC_INVALID_WP_OFST; + goto unlock_wp_offset; + } + + switch (op) { + case REQ_OP_ZONE_APPEND: + rq->__sector += sdkp->zones_wp_offset[zno]; + fallthrough; + case REQ_OP_WRITE_ZEROES: + case REQ_OP_WRITE_SAME: + case REQ_OP_WRITE: + if (sdkp->zones_wp_offset[zno] < sd_zbc_zone_sectors(sdkp)) + sdkp->zones_wp_offset[zno] += + good_bytes >> SECTOR_SHIFT; + break; + case REQ_OP_ZONE_RESET: + sdkp->zones_wp_offset[zno] = 0; + break; + case REQ_OP_ZONE_FINISH: + sdkp->zones_wp_offset[zno] = sd_zbc_zone_sectors(sdkp); + break; + case REQ_OP_ZONE_RESET_ALL: + memset(sdkp->zones_wp_offset, 0, + sdkp->nr_zones * sizeof(unsigned int)); + break; + default: + break; + } + +unlock_wp_offset: + spin_unlock_bh(&sdkp->zones_wp_offset_lock); + + return good_bytes; +} + /** * sd_zbc_complete - ZBC command post processing. * @cmd: Completed command * @good_bytes: Command reply bytes * @sshdr: command sense header * - * Called from sd_done(). Process report zones reply and handle reset zone - * and write commands errors. + * Called from sd_done() to handle zone command errors and updates to the + * device queue zone write pointer offset cache. */ -void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, struct scsi_sense_hdr *sshdr) { int result = cmd->result; @@ -294,7 +526,13 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes, * so be quiet about the error. */ rq->rq_flags |= RQF_QUIET; - } + } else if (sd_zbc_need_zone_wp_update(rq)) + good_bytes = sd_zbc_zone_wp_update(cmd, good_bytes); + + if (req_op(rq) == REQ_OP_ZONE_APPEND) + blk_req_zone_write_unlock(rq); + + return good_bytes; } /** @@ -396,11 +634,67 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf, return 0; } +static void sd_zbc_revalidate_zones_cb(struct gendisk *disk) +{ + struct scsi_disk *sdkp = scsi_disk(disk); + + swap(sdkp->zones_wp_offset, sdkp->rev_wp_offset); +} + +static int sd_zbc_revalidate_zones(struct scsi_disk *sdkp, + u32 zone_blocks, + unsigned int nr_zones) +{ + struct gendisk *disk = sdkp->disk; + int ret = 0; + + /* + * Make sure zone revalidation is serialized to ensure exclusive + * updates of the scsi disk data. + */ + mutex_lock(&sdkp->rev_mutex); + + /* + * Revalidate the disk zones to update the device request queue zone + * bitmaps and the zone write pointer offset array. Do this only once + * the device capacity is set, on the second revalidate execution during + * a disk scan, or if something changed when executing a normal revalidate. 
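 + * On the first scan only the new zone geometry is recorded; the actual + * revalidation runs on the next scan, once the capacity is known. 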
+ */ + if (sdkp->first_scan) { + sdkp->zone_blocks = zone_blocks; + sdkp->nr_zones = nr_zones; + goto unlock; + } + + if (sdkp->zone_blocks == zone_blocks && + sdkp->nr_zones == nr_zones && + disk->queue->nr_zones == nr_zones) + goto unlock; + + sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO); + if (!sdkp->rev_wp_offset) { + ret = -ENOMEM; + goto unlock; + } + + ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb); + + kvfree(sdkp->rev_wp_offset); + sdkp->rev_wp_offset = NULL; + +unlock: + mutex_unlock(&sdkp->rev_mutex); + + return ret; +} + int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { struct gendisk *disk = sdkp->disk; + struct request_queue *q = disk->queue; unsigned int nr_zones; u32 zone_blocks = 0; + u32 max_append; int ret; if (!sd_is_zoned(sdkp)) @@ -421,35 +715,31 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) goto err; /* The drive satisfies the kernel restrictions: set it up */ - blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, sdkp->disk->queue); - blk_queue_required_elevator_features(sdkp->disk->queue, - ELEVATOR_F_ZBD_SEQ_WRITE); + blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q); + blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE); nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks); /* READ16/WRITE16 is mandatory for ZBC disks */ sdkp->device->use_16_for_rw = 1; sdkp->device->use_10_for_rw = 0; + ret = sd_zbc_revalidate_zones(sdkp, zone_blocks, nr_zones); + if (ret) + goto err; + /* - * Revalidate the disk zone bitmaps once the block device capacity is - * set on the second revalidate execution during disk scan and if - * something changed when executing a normal revalidate. + * On the first scan 'chunk_sectors' isn't setup yet, so calling + * blk_queue_max_zone_append_sectors() will result in a WARN(). Defer + * this setting to the second scan. 
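 + * The limit applied at that point is bounded by the zone size, the queue's + * max_segments and its max_hw_sectors limits. 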
*/ - if (sdkp->first_scan) { - sdkp->zone_blocks = zone_blocks; - sdkp->nr_zones = nr_zones; + if (sdkp->first_scan) return 0; - } - if (sdkp->zone_blocks != zone_blocks || - sdkp->nr_zones != nr_zones || - disk->queue->nr_zones != nr_zones) { - ret = blk_revalidate_disk_zones(disk); - if (ret != 0) - goto err; - sdkp->zone_blocks = zone_blocks; - sdkp->nr_zones = nr_zones; - } + max_append = min_t(u32, logical_to_sectors(sdkp->device, zone_blocks), + q->limits.max_segments << (PAGE_SHIFT - 9)); + max_append = min_t(u32, max_append, queue_max_hw_sectors(q)); + + blk_queue_max_zone_append_sectors(q, max_append); return 0; @@ -475,3 +765,28 @@ void sd_zbc_print_zones(struct scsi_disk *sdkp) sdkp->nr_zones, sdkp->zone_blocks); } + +int sd_zbc_init_disk(struct scsi_disk *sdkp) +{ + if (!sd_is_zoned(sdkp)) + return 0; + + sdkp->zones_wp_offset = NULL; + spin_lock_init(&sdkp->zones_wp_offset_lock); + sdkp->rev_wp_offset = NULL; + mutex_init(&sdkp->rev_mutex); + INIT_WORK(&sdkp->zone_wp_offset_work, sd_zbc_update_wp_offset_workfn); + sdkp->zone_wp_update_buf = kzalloc(SD_BUF_SIZE, GFP_KERNEL); + if (!sdkp->zone_wp_update_buf) + return -ENOMEM; + + return 0; +} + +void sd_zbc_release_disk(struct scsi_disk *sdkp) +{ + kvfree(sdkp->zones_wp_offset); + sdkp->zones_wp_offset = NULL; + kfree(sdkp->zone_wp_update_buf); + sdkp->zone_wp_update_buf = NULL; +} From patchwork Tue May 12 08:55:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542413 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 68FC615E6 for ; Tue, 12 May 2020 08:56:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4F60020A8B for ; Tue, 12 May 2020 08:56:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="Ns7PA6lG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729394AbgELI41 (ORCPT ); Tue, 12 May 2020 04:56:27 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16030 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729256AbgELI4T (ORCPT ); Tue, 12 May 2020 04:56:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273779; x=1620809779; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vC6fIcl8UNycAk5TzfaPQt3mtOYMSDj8SXxAWxcvzvc=; b=Ns7PA6lG99vwlgZlrVSqRNbClpV7y1gnwJBDWS0Mwyq9Qb8j7G2kNPRW PR7zEjUCHnJxXQWYchxNPXf0AIs2qT0xLIQBsfNHMZnZvioJB9ZTudJnp rVfFUg2Wqe4n1oBRf5W8RY+CxCYjb3eFJPXgOO6qx4OS5rLvQPCBZmSrs FEe+Wa5ENcsnQGIQ7KQgK/wxw2SNfK1oKtQpZEQ4x1NN7B1dTd/5Hb2zG M4fO+kebsv4FZ4cc7EghuueiEgfSXh9DiH2M1IrJpojT7ppSj1DaKX99J O85/7iXrQp90JM6aTDbAVYgxbtfr5HKSIZ8CsERj80IjmBjAmZ3n2ZU6S Q==; IronPort-SDR: GBdPTr2AD2KlxPbjrngsyjfBSmDEi8yGmg8/xP7zWQ6Ze0N/5Xdn68tXaXfwP+qLoKMTVUiWsR HcmKOZ+GCHpE38R7NExXEBWFaD14RW6hIfPH0mmYb47hv6QC8NtjaThoAo1F4La4J1PIMghSdo myri+r0HdNq5ruhn06FTMBneA57QBOjzmRVXMDYp8MpBhPByoEEaXpQFwJpumCXkbtv9I3EqP6 0clikxc+pItGHDz9144M14Bc753iQyO52WsEPOQUKBp8uxIv7lnch4TG37ZN5EC1JN4U8aCrUu SEI= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823569" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) 
([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:19 +0800 IronPort-SDR: b3/+sDJF7rG452guAAQ3a1Xm2TZwQDnaaBJd4hTQgtLClKiSBDz0c2srzW5BGXmu4p0NYvqRnM P9RVaw+13ZtKQ9DVo5un47PyXSEf7jelc= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:46:01 -0700 IronPort-SDR: Q6NfnjeDGmusNEvKwosiFCbvk/4Tf4DAeNFqZ1zFfe43aN/XU4/bNCW15TtqA6PFf5hh4uToOE t78W65S2nsIA== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:16 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Damien Le Moal , Hannes Reinecke , Christoph Hellwig , Johannes Thumshirn Subject: [PATCH v11 08/10] null_blk: Support REQ_OP_ZONE_APPEND Date: Tue, 12 May 2020 17:55:52 +0900 Message-Id: <20200512085554.26366-9-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Damien Le Moal Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned mode enabled. Use the internally tracked zone write pointer position as the actual write position and return it using the command request __sector field in the case of an mq device and using the command BIO sector in the case of a BIO device. Reviewed-by: Martin K. Petersen Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Damien Le Moal Signed-off-by: Johannes Thumshirn --- drivers/block/null_blk_zoned.c | 37 ++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c index 46641df2e58e..9c19f747f394 100644 --- a/drivers/block/null_blk_zoned.c +++ b/drivers/block/null_blk_zoned.c @@ -70,13 +70,20 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q) int null_register_zoned_dev(struct nullb *nullb) { + struct nullb_device *dev = nullb->dev; struct request_queue *q = nullb->q; - if (queue_is_mq(q)) - return blk_revalidate_disk_zones(nullb->disk, NULL); + if (queue_is_mq(q)) { + int ret = blk_revalidate_disk_zones(nullb->disk, NULL); + + if (ret) + return ret; + } else { + blk_queue_chunk_sectors(q, dev->zone_size_sects); + q->nr_zones = blkdev_nr_zones(nullb->disk); + } - blk_queue_chunk_sectors(q, nullb->dev->zone_size_sects); - q->nr_zones = blkdev_nr_zones(nullb->disk); + blk_queue_max_zone_append_sectors(q, dev->zone_size_sects); return 0; } @@ -138,7 +145,7 @@ size_t null_zone_valid_read_len(struct nullb *nullb, } static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, - unsigned int nr_sectors) + unsigned int nr_sectors, bool append) { struct nullb_device *dev = cmd->nq->dev; unsigned int zno = null_zone_no(dev, sector); @@ -158,9 +165,21 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, case BLK_ZONE_COND_IMP_OPEN: case BLK_ZONE_COND_EXP_OPEN: case BLK_ZONE_COND_CLOSED: - /* Writes must be at the write pointer position */ - if (sector != zone->wp) + /* + * Regular writes must be at the write pointer position. 
+ * Zone append writes are automatically issued at the write + * pointer and the position returned using the request or BIO + * sector. + */ + if (append) { + sector = zone->wp; + if (cmd->bio) + cmd->bio->bi_iter.bi_sector = sector; + else + cmd->rq->__sector = sector; + } else if (sector != zone->wp) { return BLK_STS_IOERR; + } if (zone->cond != BLK_ZONE_COND_EXP_OPEN) zone->cond = BLK_ZONE_COND_IMP_OPEN; @@ -242,7 +261,9 @@ blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op, { switch (op) { case REQ_OP_WRITE: - return null_zone_write(cmd, sector, nr_sectors); + return null_zone_write(cmd, sector, nr_sectors, false); + case REQ_OP_ZONE_APPEND: + return null_zone_write(cmd, sector, nr_sectors, true); case REQ_OP_ZONE_RESET: case REQ_OP_ZONE_RESET_ALL: case REQ_OP_ZONE_OPEN: From patchwork Tue May 12 08:55:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542401 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B76A4186E for ; Tue, 12 May 2020 08:56:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9FB05207FF for ; Tue, 12 May 2020 08:56:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="O3G8cn82" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729401AbgELI4X (ORCPT ); Tue, 12 May 2020 04:56:23 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16035 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729392AbgELI4V (ORCPT ); Tue, 12 May 2020 04:56:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273781; x=1620809781; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OxG58gmQxO/wsNKD46rcb3CCQQtQoICGm9rQW0sMuDY=; b=O3G8cn82pL52oYH6XS89JRNDArgz1jrp+oGrGc+icESMttRyxwAWQsf+ MLc8KQ/3QpH4tG45LuymEbTzn8c4HTHoWOOqdpNANt26uHPIUUwcgrX6c EgCITunfA9gXmbGUpp0Rnv5swt2CX4kxLDcwyv6oZvhIe1zdNSHhKnB7R /ktnoUaNo0OjcHJ2jjDvSn+ZMqehOBBmqYOeZAbQ0d8j6/KnWlFzvy6ca OFUHH6LsD9pyYLJ2N0hMvUdhP9hrhEYxx+CXHoqW3yFIGjmpLBpPQL56a 7IRtscD8uj3kXkCcugKvDiDBixQ1OIpwJXIpCRnbwtBeda88txzyUvmBA Q==; IronPort-SDR: p3i8mkwVJnYAP2euyf4jc3zLAd+5ogAsxdH6TJrE7aCsCN0ROLKRzkBrddIJVsVlTQJbEKZHrz Sl3jG3JQx8nr/CEcLE6isRc/+ln0GvxI9PmBGVIaO+x/Rabc/ki0xeR40rtrFJZXXVddZkjj0a fFfr/Irm23pDdHKIFV/bpsdAKVkefp66EqXkf9NXIDce2LPhbgxCjc0HFy6AYLxCdEjQTa7CUJ DPchm0vZkLLUelSWtHCfjpz9j3uI5DI0V5vIsps22z6vY/ZvOWfht5P3tjls6iICiFeCebbrmX zkU= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823571" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:21 +0800 IronPort-SDR: ScoxW/IOyA2Ag5sV+y7yfSF7wkD0nUabxiW17Cl3KeeLArSjimZhd/EzXDeYLcZAIgsShWqzMO TER2hG+qq40fugxuFFGIEP+6zL+b6r+yY= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:46:03 -0700 IronPort-SDR: iAAE5ms6Ynm5Lj5CMF2vunRHavFvSZF0pMySOoHCANAt+Cxomo4cCNENFCwGqVk/KYP2n8HSPY rpmxSf0X4n5g== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) 
by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:18 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn , Hannes Reinecke Subject: [PATCH v11 09/10] block: export bio_release_pages and bio_iov_iter_get_pages Date: Tue, 12 May 2020 17:55:53 +0900 Message-Id: <20200512085554.26366-10-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Export bio_release_pages and bio_iov_iter_get_pages, so they can be used from modular code. Signed-off-by: Johannes Thumshirn Reviewed-by: Martin K. Petersen Reviewed-by: Hannes Reinecke --- block/bio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/bio.c b/block/bio.c index 3aa3c4ce2e5e..e4c46e2bd5ba 100644 --- a/block/bio.c +++ b/block/bio.c @@ -951,6 +951,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty) put_page(bvec->bv_page); } } +EXPORT_SYMBOL_GPL(bio_release_pages); static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) { @@ -1114,6 +1115,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) bio_set_flag(bio, BIO_NO_PAGE_REF); return bio->bi_vcnt ? 0 : ret; } +EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); static void submit_bio_wait_endio(struct bio *bio) { From patchwork Tue May 12 08:55:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Thumshirn X-Patchwork-Id: 11542403 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0D1C017EA for ; Tue, 12 May 2020 08:56:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E96C92082E for ; Tue, 12 May 2020 08:56:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="WsWYco5L" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729407AbgELI4Y (ORCPT ); Tue, 12 May 2020 04:56:24 -0400 Received: from esa3.hgst.iphmx.com ([216.71.153.141]:16037 "EHLO esa3.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729135AbgELI4W (ORCPT ); Tue, 12 May 2020 04:56:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1589273783; x=1620809783; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=we2PS0L+kq3Y5Q7oruSHo5wZ/NQT+esEYAv00kJsoiI=; b=WsWYco5Lnz84Fvc35Eg3oiRbwo9XxkPfW6R9/3VzvaGe9KYzXGQ49Kf7 FFDtWhBJZcmOpKwr0jX4GnZqX2vk445ypyOelJRUNI9Uw0L9zGkFudIRi eZGvKw1sU2bCQXa78SBFUyqaa8Yghc4peK8IpYr2kK/3YdJYdfhPcahOq dZ7iUcCILeTMweoZkYpF6PdiaY9NzQMzPTxActHQNK31T9GGt6hUk3bDT OIkzjAqJEyCNxpJn9nqITif/sdtpfOTB/Riw3wE+J7c2RQKIXzt4IounE 4J/nSs4Y2mWqMFEmgI+ENpixHyr4uCLLtgt+l+xjQIF5+gQgqmWHkaQIk A==; IronPort-SDR: b8yHKoY1xrnUrhcXWBnoQeLVVQ84NU3Y/m6U0mV1WhxnKKgrk7eK4I+xoy8ZzAxWHptf1p75Jp JUCIlNJhtF8tFEXqvrBnJYf5XjsJY1dNLTrlz4AtjO9yN6DZMW6Xld5C/ptVEkXm6KiWbJ3MMt 
MJs1CIuLX44gjK5HvOFgQ/z73Fc+RuKYI/QXGxSMF1XoPSGnU7c1/qfLBaS70JZRNI6TitAyAd +q9IrpI7M8gHLLj4D89IT8IHjVAEi6NaLddm9itBFby5i6TmFzDo77zr0ZLhBDBMfN903E1CkB I1U= X-IronPort-AV: E=Sophos;i="5.73,383,1583164800"; d="scan'208";a="141823574" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 12 May 2020 16:56:23 +0800 IronPort-SDR: 493YmtxBDQhLwlH9yUs2wYEuVQobPIBqcNfMBQCLDyuv/XnNn2jN52hyvnScaKbonRNosKwvea yo+VG7DZNy9lTbWpJ4/XGl81wldpMwRRo= Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 May 2020 01:46:04 -0700 IronPort-SDR: lA+ig10vldVJTg4SSXDemIszuei6fSTAC6GZnbEI1D4qvvO0ow2M7yKB7goDKMQfr8Yvh4k+Hu zfHQOKAUV2Gg== WDCIronportException: Internal Received: from unknown (HELO redsun60.ssa.fujisawa.hgst.com) ([10.149.66.36]) by uls-op-cesaip01.wdc.com with ESMTP; 12 May 2020 01:56:20 -0700 From: Johannes Thumshirn To: Jens Axboe Cc: Christoph Hellwig , linux-block , Damien Le Moal , Keith Busch , "linux-scsi @ vger . kernel . org" , "Martin K . Petersen" , "linux-fsdevel @ vger . kernel . org" , Johannes Thumshirn , Damien Le Moal Subject: [PATCH v11 10/10] zonefs: use REQ_OP_ZONE_APPEND for sync DIO Date: Tue, 12 May 2020 17:55:54 +0900 Message-Id: <20200512085554.26366-11-johannes.thumshirn@wdc.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200512085554.26366-1-johannes.thumshirn@wdc.com> References: <20200512085554.26366-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Synchronous direct I/O to a sequential write only zone can be issued using the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple BIOs can potentially result in reordering, we cannot support asynchronous IO via this interface. We also can only dispatch up to queue_max_zone_append_sectors() via the new zone-append method and have to return a short write back to user-space in case an IO larger than queue_max_zone_append_sectors() has been issued. 
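As a concrete illustration of the short-write behaviour described above (an editorial sketch, not part of the patch): a user-space program doing synchronous O_DIRECT writes to a zonefs sequential zone file has to be prepared for write() returning less than requested and should resubmit the remainder from the new file position. The buffer is assumed to be suitably aligned for O_DIRECT, and any file descriptor used with this helper is hypothetical.

#include <unistd.h>

/*
 * Write 'len' bytes of 'buf' to a zonefs sequential zone file opened with
 * O_WRONLY | O_DIRECT. A short write typically means the kernel capped the
 * I/O at the zone append limit, so continue from where it stopped.
 */
static ssize_t write_full(int fd, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t ret = write(fd, buf + done, len - done);

		if (ret < 0)
			return -1;	/* caller inspects errno */
		done += ret;
	}
	return done;
}

Since the truncation performed by zonefs is aligned down to the filesystem block size, the remaining length stays properly aligned for O_DIRECT on the retry.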
Signed-off-by: Johannes Thumshirn Acked-by: Damien Le Moal --- fs/zonefs/super.c | 80 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 72 insertions(+), 8 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 3ce9829a6936..0bf7009f50a2 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -20,6 +20,7 @@ #include #include #include +#include #include "zonefs.h" @@ -596,6 +597,61 @@ static const struct iomap_dio_ops zonefs_write_dio_ops = { .end_io = zonefs_file_write_dio_end_io, }; +static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + struct zonefs_inode_info *zi = ZONEFS_I(inode); + struct block_device *bdev = inode->i_sb->s_bdev; + unsigned int max; + struct bio *bio; + ssize_t size; + int nr_pages; + ssize_t ret; + + nr_pages = iov_iter_npages(from, BIO_MAX_PAGES); + if (!nr_pages) + return 0; + + max = queue_max_zone_append_sectors(bdev_get_queue(bdev)); + max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize); + iov_iter_truncate(from, max); + + bio = bio_alloc_bioset(GFP_NOFS, nr_pages, &fs_bio_set); + if (!bio) + return -ENOMEM; + + bio_set_dev(bio, bdev); + bio->bi_iter.bi_sector = zi->i_zsector; + bio->bi_write_hint = iocb->ki_hint; + bio->bi_ioprio = iocb->ki_ioprio; + bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE; + if (iocb->ki_flags & IOCB_DSYNC) + bio->bi_opf |= REQ_FUA; + + ret = bio_iov_iter_get_pages(bio, from); + if (unlikely(ret)) { + bio_io_error(bio); + return ret; + } + size = bio->bi_iter.bi_size; + task_io_account_write(ret); + + if (iocb->ki_flags & IOCB_HIPRI) + bio_set_polled(bio, iocb); + + ret = submit_bio_wait(bio); + + bio_put(bio); + + zonefs_file_write_dio_end_io(iocb, size, ret, 0); + if (ret >= 0) { + iocb->ki_pos += size; + return size; + } + + return ret; +} + /* * Handle direct writes. For sequential zone files, this is the only possible * write path. For these files, check that the user is issuing writes @@ -611,6 +667,8 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(iocb->ki_filp); struct zonefs_inode_info *zi = ZONEFS_I(inode); struct super_block *sb = inode->i_sb; + bool sync = is_sync_kiocb(iocb); + bool append = false; size_t count; ssize_t ret; @@ -619,7 +677,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) * as this can cause write reordering (e.g. the first aio gets EAGAIN * on the inode lock but the second goes through but is now unaligned). 
*/ - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !is_sync_kiocb(iocb) && + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !sync && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; @@ -643,16 +701,22 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) } /* Enforce sequential writes (append only) in sequential zones */ - mutex_lock(&zi->i_truncate_mutex); - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && iocb->ki_pos != zi->i_wpoffset) { + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) { + mutex_lock(&zi->i_truncate_mutex); + if (iocb->ki_pos != zi->i_wpoffset) { + mutex_unlock(&zi->i_truncate_mutex); + ret = -EINVAL; + goto inode_unlock; + } mutex_unlock(&zi->i_truncate_mutex); - ret = -EINVAL; - goto inode_unlock; + append = sync; } - mutex_unlock(&zi->i_truncate_mutex); - ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, - &zonefs_write_dio_ops, is_sync_kiocb(iocb)); + if (append) + ret = zonefs_file_dio_append(iocb, from); + else + ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, + &zonefs_write_dio_ops, sync); if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && (ret > 0 || ret == -EIOCBQUEUED)) { if (ret > 0)