From patchwork Tue Jan 26 02:24:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048229
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn,
    Christoph Hellwig, Josef Bacik, Chaitanya Kulkarni
Subject: [PATCH v14 01/42] block: add bio_add_zone_append_page
Date: Tue, 26 Jan 2021 11:24:39 +0900
Message-Id: <2a0a587139de5586a2c563e4d43060b9abcbf1ed.1611627788.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Cc: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Johannes Thumshirn
---
 block/bio.c         | 33 +++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..2f21d2958b60 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -851,6 +851,39 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full or the target block device is not a zoned block device or
+ * other limitations of the target block device. The target block device must
+ * allow bio's up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..de62911473bb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -455,6 +455,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *,
 			   struct page *, unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
From patchwork Tue Jan 26 02:24:40 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048227
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota,
    Christoph Hellwig, Chaitanya Kulkarni
Subject: [PATCH v14 02/42] iomap: support REQ_OP_ZONE_APPEND
Date: Tue, 26 Jan 2021 11:24:40 +0900
X-Mailer: git-send-email 2.27.0

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() when
bio_op(bio) == REQ_OP_ZONE_APPEND. To use it, we need to set the bio_op
before calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that an iomap user can set
the flag to indicate it wants REQ_OP_ZONE_APPEND and a restricted bio.

Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Naohiro Aota
---
 fs/iomap/direct-io.c  | 43 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..2273120d8ed7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -201,6 +201,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		struct iomap_dio *dio, struct iomap *iomap)
@@ -208,6 +236,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
 	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	unsigned int bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
 	bool use_fua = false;
@@ -263,6 +292,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			iomap_dio_zero(dio, iomap, pos - pad, pad);
 	}
 
+	/*
+	 * Set the operation flags early so that bio_iov_iter_get_pages
+	 * can set up the page vector appropriately for a ZONE_APPEND
+	 * operation.
+	 */
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
 	do {
 		size_t n;
 		if (dio->error) {
@@ -278,6 +314,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
+		bio->bi_opf = bio_opf;
 
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
 		if (unlikely(ret)) {
@@ -293,14 +330,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		n = bio->bi_iter.bi_size;
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
 			task_io_account_write(n);
 		} else {
-			bio->bi_opf = REQ_OP_READ;
 			if (dio->flags & IOMAP_DIO_DIRTY)
 				bio_set_pages_dirty(bio);
 		}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..8ebb1fa6f3b7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -55,6 +55,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20
 
 /*
  * Flags set by the core iomap code during operations:
From patchwork Tue Jan 26 02:24:41 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048223
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 03/42] btrfs: defer loading zone info after opening trees
Date: Tue, 26 Jan 2021 11:24:41 +0900
X-Mailer: git-send-email 2.27.0

This is a preparation patch for implementing zone emulation on a regular
device.

To emulate zoned mode on a regular (non-zoned) device, we need to decide
an emulated zone size. Instead of making it a compile-time static value,
we'll make it configurable at mkfs time. Since we have the one zone ==
one device extent restriction, we can determine the emulated zone size
from the size of a device extent. We can extend
btrfs_get_dev_zone_info() to show a regular device filled with
conventional zones once the zone size is decided.

The current call site of btrfs_get_dev_zone_info() during the mount
process comes earlier than reading the trees, so at that point we cannot
slice a regular device into conventional zones. This patch defers the
loading of zone info to open_ctree() so that the emulated zone size can
be loaded from a device extent.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
Reviewed-by: Anand Jain
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/volumes.c |  4 ----
 fs/btrfs/zoned.c   | 24 ++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  7 +++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5473bed6a7e8..39cbe10a81b6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3257,6 +3257,19 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_tree_roots;
 
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of the zoned mode for btrfs if a regular device has
+	 * the zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to read device zone info: %d", ret);
+		goto fail_block_groups;
+	}
+
 	/*
 	 * If we have a uuid root and we're not being told to rescan we need to
 	 * check the generation here so we can set the
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index badb972919eb..bb3f341f6a22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,10 +669,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 	device->mode = flags;
 
-	ret = btrfs_get_dev_zone_info(device);
-	if (ret != 0)
-		goto error_free_page;
-
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c38846659019..bcabdb2c97f1 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -143,6 +143,30 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (!device->bdev)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	struct btrfs_zoned_device_info *zone_info = NULL;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8abe2f83272b..5e0e7de84a82 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -25,6 +25,7 @@ struct btrfs_zoned_device_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
@@ -42,6 +43,12 @@ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 	return 0;
 }
 
+static inline int btrfs_get_dev_zone_info_all_devices(
+				struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
+
 static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
 	return 0;
From patchwork Tue Jan 26 02:24:42 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048221
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 04/42] btrfs: use regular SB location on emulated zoned mode
Date: Tue, 26 Jan 2021 11:24:42 +0900
X-Mailer: git-send-email 2.27.0

Zoned btrfs puts a superblock at the beginning of the SB logging zones if
the zone is conventional. This difference causes a chicken-and-egg
problem for emulated zoned mode: since the device is a regular
(non-zoned) device, we cannot tell whether the filesystem is regular or
emulated zoned while reading the superblock, but to load the proper
superblock we need to know whether it is emulated zoned or not.

To solve the problem, we place the superblocks at the same locations as
regular btrfs in emulated zoned mode. This is possible because in
emulated zoned mode all the SB locations are guaranteed to be in
conventional zones.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/zoned.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index bcabdb2c97f1..87172ce7173b 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -553,7 +553,13 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 	struct btrfs_zoned_device_info *zinfo = device->zone_info;
 	u32 zone_num;
 
-	if (!zinfo) {
+	/*
+	 * With btrfs zoned mode on a non-zoned block device, use the same
+	 * super block locations as regular btrfs. Doing so, the super
+	 * block can always be retrieved and the zoned-mode of the volume
+	 * detected from the super block information.
+	 */
+	if (!bdev_is_zoned(device->bdev)) {
 		*bytenr_ret = btrfs_sb_offset(mirror);
 		return 0;
 	}
ESMTP id S1731710AbhAZC2G (ORCPT ); Mon, 25 Jan 2021 21:28:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1611628085; x=1643164085; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uqJc7j6oBZt9woy6vA4bYp+7GoqizOx2pcMwC5qjXG0=; b=PebN9baQhxfqxKm+0RGc241MU84MJu1vC7c1uS2gEUl/eaLUGDtwH/UX M8KiyMBflgRqWKTQOKRA7w3QxzHQbj7G7hqPK5Dh0NMSCwDTUqQWDgiqC 75jcrFWoU4v8z5wwSZVaZVAF3i81BEOyQQbia1yzVqRCAl8rW4WtQI/zA c+rogQwt+DUmKafuwb15mzW/4z8emtVixQ2EmdEaAU/wouDR8KkLXg5Oi Nriqj2cimT5zy0ltkUMNfXGzi9RZ0M0tm8lwBE3riD2B0J8py5+F6+QGA F+IIFU7tJ1yQV6YVvofxAYztaggr8lEE12oqI8UcLgXRb4v12lOVy+s1E Q==; IronPort-SDR: CWEbwP8nInw4dOQfp6X0vB16/35DTFJVkiYJandBlc2E2XuH1KYYkmCc1hLtARQ84Id35x4fOE HeuzTybe8Ixvt6Hi90Yn60RqxMQfXFnwFdbVmzlg7Yo/ZmBwlXNQjxg2WhzFFLelpGk3d0sxW8 6ONsazBMbgzwh8+OeA3sGRODrxOE0m84WLDS8kBxZZkyhd6eZtssTvGoCazxCyWM7G+R22sX8y cYgB4IpF5TOGrufIekRZcjOr6cWvBrj2tNzCuyZsP8oRT+pZnaggyFJlyw6tya9MXt+iYI7IyN NAU= X-IronPort-AV: E=Sophos;i="5.79,375,1602518400"; d="scan'208";a="159483508" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Jan 2021 10:26:08 +0800 IronPort-SDR: IPNcbEX85NVJ7KX5QzqTf7vwcm0nI35ptyQVwjRRphiNmeR/xJiR4bYFBhvuApEMI4MQN+sv+e OU3bxFhpTq9Du0qmPa4M7p3HDPsA23wI0g7kCmEQhbrTi00N6WLR1ravir4bP+ijWDFMAQvH5c jD+ltcqKXg2eqeoxEYr0U0etbbTc2cqjq74sFCSWdR+F8t/KIGxDPXVPrHnbsZFYx3Edip8G5k RbgcNcWHQ2RATcTwtdfUOV6erBe49dJhnVO6MaUFa3wotdRUF+U1qhdIwHRZhxbbVAYaBjAXif 7yfP+54OrVFmU/iIJnlrWUbV Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 18:10:34 -0800 IronPort-SDR: 858dfzxsVthZ3RJeur89u2hmLObh/+YUb9bjO7suOQvfpPGY1gtMyS4GpEghOp/ORrl0+tIDxd 7K3byeQ27Q2RQQcY3UNnCrl1txnInsh80RWjAxNOAJn26EJAEOalowPqm45MSKY0qz41tnRqtr CvRDvS5lF9P2zmjs8W8stpmvs359s13AKbivjxbyLc7OcuRXaoWnX0EXiT20QSPyArYXyB+Oif 
CHgVhHXasYHUz8ETI7mt5YwHtH5AUDC54TGpgc0Pb4QGMma989yqTiw97MUW4AOIiPJ7y4tTSM Lcs= WDCIronportException: Internal Received: from naota.dhcp.fujisawa.hgst.com ([10.149.52.155]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Jan 2021 18:26:06 -0800 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Johannes Thumshirn , Josef Bacik Subject: [PATCH v14 05/42] btrfs: release path before calling into btrfs_load_block_group_zone_info Date: Tue, 26 Jan 2021 11:24:43 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Johannes Thumshirn Since we have no write pointer in conventional zones, we cannot determine the allocation offset from it. Instead, we set the allocation offset after the highest addressed extent. This is done by reading the extent tree in btrfs_load_block_group_zone_info(). However, this function is called from btrfs_read_block_groups(), so the read lock for the tree node can recursively taken. To avoid this unsafe locking scenario, release the path before reading the extent tree to get the allocation offset. 
Signed-off-by: Johannes Thumshirn Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 39 ++++++++++++++++++--------------------- 1 file changed, 18 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 763a3671b7af..bdd20af69dde 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1805,24 +1805,8 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info) return ret; } -static void read_block_group_item(struct btrfs_block_group *cache, - struct btrfs_path *path, - const struct btrfs_key *key) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_block_group_item bgi; - int slot = path->slots[0]; - - cache->length = key->offset; - - read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), - sizeof(bgi)); - cache->used = btrfs_stack_block_group_used(&bgi); - cache->flags = btrfs_stack_block_group_flags(&bgi); -} - static int read_one_block_group(struct btrfs_fs_info *info, - struct btrfs_path *path, + struct btrfs_block_group_item *bgi, const struct btrfs_key *key, int need_clear) { @@ -1837,7 +1821,9 @@ static int read_one_block_group(struct btrfs_fs_info *info, if (!cache) return -ENOMEM; - read_block_group_item(cache, path, key); + cache->length = key->offset; + cache->used = btrfs_stack_block_group_used(bgi); + cache->flags = btrfs_stack_block_group_flags(bgi); set_free_space_tree_thresholds(cache); @@ -1996,19 +1982,30 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) need_clear = 1; while (1) { + struct btrfs_block_group_item bgi; + struct extent_buffer *leaf; + int slot; + ret = find_first_block_group(info, path, &key); if (ret > 0) break; if (ret != 0) goto error; - btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); - ret = read_one_block_group(info, path, &key, need_clear); + leaf = path->nodes[0]; + slot = path->slots[0]; + + read_extent_buffer(leaf, &bgi, + btrfs_item_ptr_offset(leaf, slot), + sizeof(bgi)); + + 
btrfs_item_key_to_cpu(leaf, &key, slot); + btrfs_release_path(path); + ret = read_one_block_group(info, &bgi, &key, need_clear); if (ret < 0) goto error; key.objectid += key.offset; key.offset = 0; - btrfs_release_path(path); } btrfs_release_path(path);

From patchwork Tue Jan 26 02:24:44 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Johannes Thumshirn , Josef Bacik
Subject: [PATCH v14 06/42] btrfs: do not load fs_info->zoned from incompat flag
Date: Tue, 26 Jan 2021 11:24:44 +0900
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Don't set the zoned flag in fs_info when encountering the BTRFS_FEATURE_INCOMPAT_ZONED flag on mount. The zoned flag in fs_info is in a union together with the zone_size, so setting it too early will result in setting an incorrect zone_size as well. Once the correct zone_size is read from the device, we can rely on the zoned flag in fs_info to determine if the filesystem is running in zoned mode.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/disk-io.c | 2 --
 fs/btrfs/zoned.c | 8 ++++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 39cbe10a81b6..76ab86dacc8d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3136,8 +3136,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) btrfs_info(fs_info, "has skinny extents"); - fs_info->zoned = (features & BTRFS_FEATURE_INCOMPAT_ZONED); - /* * flag our filesystem as having big metadata blocks if * they are bigger than the page size diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 87172ce7173b..315cd5189781 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -431,6 +431,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + /* + * Check mount options here, because we might change fs_info->zoned + * from fs_info->zone_size.
+ */ + ret = btrfs_check_mountopts_zoned(fs_info); + if (ret) + goto out; + btrfs_info(fs_info, "zoned mode enabled with zone size %llu", zone_size); out: return ret;

From patchwork Tue Jan 26 02:24:45 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik
Subject: [PATCH v14 07/42] btrfs: disallow fitrim in ZONED mode
Date: Tue, 26 Jan 2021 11:24:45 +0900
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

The implementation of fitrim depends on the space cache, which is not used and is disabled for the zoned btrfs extent allocator, so the current code does not work with zoned btrfs. In the future, we can implement fitrim for zoned btrfs by enabling the space cache (only for fitrim) or by scanning the extent tree at fitrim time. For now, disallow fitrim in ZONED mode.

Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/ioctl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7f2935ea8d3a..f05b0b8b1595 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -527,6 +527,14 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info, if (!capable(CAP_SYS_ADMIN)) return -EPERM; + /* + * btrfs_trim_block_group() is depending on space cache, which is + * not available in ZONED mode. So, disallow fitrim in ZONED mode + * for now.
+ */ + if (btrfs_is_zoned(fs_info)) + return -EOPNOTSUPP; + /* * If the fs is mounted with nologreplay, which requires it to be * mounted in RO mode as well, we can not allow discard on free space

From patchwork Tue Jan 26 02:24:46 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Johannes Thumshirn , Naohiro Aota , Josef Bacik
Subject: [PATCH v14 08/42] btrfs: allow zoned mode on non-zoned block devices
Date: Tue, 26 Jan 2021 11:24:46 +0900
Message-Id: <613da3120ca06ebf470352dbebcbdaa19bf57926.1611627788.git.naohiro.aota@wdc.com>
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Johannes Thumshirn

Run zoned btrfs mode on non-zoned devices. This is done by "slicing up" the block device into static-sized chunks and faking a conventional zone on each of them. The emulated zone size is determined from the size of a device extent. This is mainly aimed at testing parts of the zoned mode, i.e. the zoned chunk allocator, on regular block devices.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
---
 fs/btrfs/zoned.c | 149 +++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/zoned.h | 14 +++--
 2 files changed, 147 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 315cd5189781..f0af88d497c7 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -119,6 +119,37 @@ static inline u32 sb_zone_number(int shift, int mirror) return 0; } +/* + * Emulate blkdev_report_zones() for a non-zoned device. It slice up + * the block device into static sized chunks and fake a conventional zone + * on each of them.
+ */ +static int emulate_report_zones(struct btrfs_device *device, u64 pos, + struct blk_zone *zones, unsigned int nr_zones) +{ + const sector_t zone_sectors = + device->fs_info->zone_size >> SECTOR_SHIFT; + sector_t bdev_size = bdev_nr_sectors(device->bdev); + unsigned int i; + + pos >>= SECTOR_SHIFT; + for (i = 0; i < nr_zones; i++) { + zones[i].start = i * zone_sectors + pos; + zones[i].len = zone_sectors; + zones[i].capacity = zone_sectors; + zones[i].wp = zones[i].start + zone_sectors; + zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL; + zones[i].cond = BLK_ZONE_COND_NOT_WP; + + if (zones[i].wp >= bdev_size) { + i++; + break; + } + } + + return i; +} + static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, struct blk_zone *zones, unsigned int *nr_zones) { @@ -127,6 +158,12 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, if (!*nr_zones) return 0; + if (!bdev_is_zoned(device->bdev)) { + ret = emulate_report_zones(device, pos, zones, *nr_zones); + *nr_zones = ret; + return 0; + } + ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones, copy_zone_info_cb, zones); if (ret < 0) { @@ -143,6 +180,50 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, return 0; } +/* The emulated zone size is determined from the size of device extent. */ +static int calculate_emulated_zone_size(struct btrfs_fs_info *fs_info) +{ + struct btrfs_path *path; + struct btrfs_root *root = fs_info->dev_root; + struct btrfs_key key; + struct extent_buffer *leaf; + struct btrfs_dev_extent *dext; + int ret = 0; + + key.objectid = 1; + key.type = BTRFS_DEV_EXTENT_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + goto out; + + if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) { + ret = btrfs_next_item(root, path); + if (ret < 0) + goto out; + /* No dev extents at all? 
Not good */ + if (ret > 0) { + ret = -EUCLEAN; + goto out; + } + } + + leaf = path->nodes[0]; + dext = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_extent); + fs_info->zone_size = btrfs_dev_extent_length(leaf, dext); + ret = 0; + +out: + btrfs_free_path(path); + + return ret; +} + int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; @@ -169,6 +250,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) int btrfs_get_dev_zone_info(struct btrfs_device *device) { + struct btrfs_fs_info *fs_info = device->fs_info; struct btrfs_zoned_device_info *zone_info = NULL; struct block_device *bdev = device->bdev; struct request_queue *queue = bdev_get_queue(bdev); @@ -177,9 +259,14 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) struct blk_zone *zones = NULL; unsigned int i, nreported = 0, nr_zones; unsigned int zone_sectors; + char *model, *emulated; int ret; - if (!bdev_is_zoned(bdev)) + /* + * Cannot use btrfs_is_zoned here, since fs_info->zone_size might + * not be set yet. 
+ */ + if (!btrfs_fs_incompat(fs_info, ZONED)) return 0; if (device->zone_info) @@ -189,8 +276,20 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) if (!zone_info) return -ENOMEM; + if (!bdev_is_zoned(bdev)) { + if (!fs_info->zone_size) { + ret = calculate_emulated_zone_size(fs_info); + if (ret) + goto out; + } + + ASSERT(fs_info->zone_size); + zone_sectors = fs_info->zone_size >> SECTOR_SHIFT; + } else { + zone_sectors = bdev_zone_sectors(bdev); + } + nr_sectors = bdev_nr_sectors(bdev); - zone_sectors = bdev_zone_sectors(bdev); /* Check if it's power of 2 (see is_power_of_2) */ ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0); zone_info->zone_size = zone_sectors << SECTOR_SHIFT; @@ -296,12 +395,32 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) device->zone_info = zone_info; - /* device->fs_info is not safe to use for printing messages */ - btrfs_info_in_rcu(NULL, - "host-%s zoned block device %s, %u zones of %llu bytes", - bdev_zoned_model(bdev) == BLK_ZONED_HM ? 
"managed" : "aware", - rcu_str_deref(device->name), zone_info->nr_zones, - zone_info->zone_size); + switch (bdev_zoned_model(bdev)) { + case BLK_ZONED_HM: + model = "host-managed zoned"; + emulated = ""; + break; + case BLK_ZONED_HA: + model = "host-aware zoned"; + emulated = ""; + break; + case BLK_ZONED_NONE: + model = "regular"; + emulated = "emulated "; + break; + default: + /* Just in case */ + btrfs_err_in_rcu(fs_info, "Unsupported zoned model %d on %s", + bdev_zoned_model(bdev), + rcu_str_deref(device->name)); + ret = -EOPNOTSUPP; + goto out; + } + + btrfs_info_in_rcu(fs_info, + "%s block device %s, %u %szones of %llu bytes", + model, rcu_str_deref(device->name), zone_info->nr_zones, + emulated, zone_info->zone_size); return 0; @@ -348,7 +467,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) u64 nr_devices = 0; u64 zone_size = 0; u64 max_zone_append_size = 0; - const bool incompat_zoned = btrfs_is_zoned(fs_info); + const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; /* Count zoned devices */ @@ -359,9 +478,17 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) continue; model = bdev_zoned_model(device->bdev); + /* + * A Host-Managed zoned device msut be used as a zoned + * device. A Host-Aware zoned device and a non-zoned devices + * can be treated as a zoned device, if ZONED flag is + * enabled in the superblock. 
+ */ if (model == BLK_ZONED_HM || - (model == BLK_ZONED_HA && incompat_zoned)) { - struct btrfs_zoned_device_info *zone_info; + (model == BLK_ZONED_HA && incompat_zoned) || + (model == BLK_ZONED_NONE && incompat_zoned)) { + struct btrfs_zoned_device_info *zone_info = + device->zone_info; zone_info = device->zone_info; zoned_devices++; diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 5e0e7de84a82..058a57317c05 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -143,12 +143,16 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, u64 p static inline bool btrfs_check_device_zone_type(const struct btrfs_fs_info *fs_info, struct block_device *bdev) { - u64 zone_size; - if (btrfs_is_zoned(fs_info)) { - zone_size = bdev_zone_sectors(bdev) << SECTOR_SHIFT; - /* Do not allow non-zoned device */ - return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size; + /* + * We can allow a regular device on a zoned btrfs, because + * we will emulate zoned device on the regular device. 
+ */ + if (!bdev_is_zoned(bdev)) + return true; + + return fs_info->zone_size == + (bdev_zone_sectors(bdev) << SECTOR_SHIFT); } /* Do not allow Host Manged zoned device */

From patchwork Tue Jan 26 02:24:47 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik
Subject: [PATCH v14 09/42] btrfs: implement zoned chunk allocator
Date: Tue, 26 Jan 2021 11:24:47 +0900
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Implement a zoned chunk/dev_extent allocator. The zoned allocator aligns device extents to zone boundaries, so that a zone reset affects only its device extent and does not change the state of blocks in neighboring device extents. It also checks that a region allocation does not overlap any of the superblock zones, and ensures the region is empty.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/volumes.c | 169 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/volumes.h | 1 +
 fs/btrfs/zoned.c | 144 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h | 25 +++++++
 4 files changed, 323 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bb3f341f6a22..27208139d6e2 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1414,11 +1414,62 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) * make sure to start at an offset of at least 1MB. */ return max_t(u64, start, SZ_1M); + case BTRFS_CHUNK_ALLOC_ZONED: + /* + * We don't care about the starting region like regular + * allocator, because we anyway use/reserve the first two + * zones for superblock logging.
+ */ + return ALIGN(start, device->zone_info->zone_size); default: BUG(); } } +static bool dev_extent_hole_check_zoned(struct btrfs_device *device, + u64 *hole_start, u64 *hole_size, + u64 num_bytes) +{ + u64 zone_size = device->zone_info->zone_size; + u64 pos; + int ret; + int changed = 0; + + ASSERT(IS_ALIGNED(*hole_start, zone_size)); + + while (*hole_size > 0) { + pos = btrfs_find_allocatable_zones(device, *hole_start, + *hole_start + *hole_size, + num_bytes); + if (pos != *hole_start) { + *hole_size = *hole_start + *hole_size - pos; + *hole_start = pos; + changed = 1; + if (*hole_size < num_bytes) + break; + } + + ret = btrfs_ensure_empty_zones(device, pos, num_bytes); + + /* Range is ensured to be empty */ + if (!ret) + return changed; + + /* Given hole range was invalid (outside of device) */ + if (ret == -ERANGE) { + *hole_start += *hole_size; + *hole_size = 0; + return 1; + } + + *hole_start += zone_size; + *hole_size -= zone_size; + changed = 1; + } + + return changed; +} + /** * dev_extent_hole_check - check if specified hole is suitable for allocation * @device: the device which we have the hole @@ -1435,24 +1486,39 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start, bool changed = false; u64 hole_end = *hole_start + *hole_size; - /* - * Check before we set max_hole_start, otherwise we could end up - * sending back this offset anyway. - */ - if (contains_pending_extent(device, hole_start, *hole_size)) { - if (hole_end >= *hole_start) - *hole_size = hole_end - *hole_start; - else - *hole_size = 0; - changed = true; - } + for (;;) { + /* + * Check before we set max_hole_start, otherwise we could end up + * sending back this offset anyway. 
+ */ + if (contains_pending_extent(device, hole_start, *hole_size)) { + if (hole_end >= *hole_start) + *hole_size = hole_end - *hole_start; + else + *hole_size = 0; + changed = true; + } + + switch (device->fs_devices->chunk_alloc_policy) { + case BTRFS_CHUNK_ALLOC_REGULAR: + /* No extra check */ + break; + case BTRFS_CHUNK_ALLOC_ZONED: + if (dev_extent_hole_check_zoned(device, hole_start, + hole_size, num_bytes)) { + changed = true; + /* + * The changed hole can contain pending + * extent. Loop again to check that. + */ + continue; + } + break; + default: + BUG(); + } - switch (device->fs_devices->chunk_alloc_policy) { - case BTRFS_CHUNK_ALLOC_REGULAR: - /* No extra check */ break; - default: - BUG(); } return changed; @@ -1505,6 +1571,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device, search_start = dev_extent_search_start(device, search_start); + WARN_ON(device->zone_info && + !IS_ALIGNED(num_bytes, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -4899,6 +4968,37 @@ static void init_alloc_chunk_ctl_policy_regular( ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes; } +static void init_alloc_chunk_ctl_policy_zoned( + struct btrfs_fs_devices *fs_devices, + struct alloc_chunk_ctl *ctl) +{ + u64 zone_size = fs_devices->fs_info->zone_size; + u64 limit; + int min_num_stripes = ctl->devs_min * ctl->dev_stripes; + int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies; + u64 min_chunk_size = min_data_stripes * zone_size; + u64 type = ctl->type; + + ctl->max_stripe_size = zone_size; + if (type & BTRFS_BLOCK_GROUP_DATA) { + ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE, + zone_size); + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { + ctl->max_chunk_size = ctl->max_stripe_size; + } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) { + ctl->max_chunk_size = 2 * ctl->max_stripe_size; + ctl->devs_max = min_t(int, ctl->devs_max, + BTRFS_MAX_DEVS_SYS_CHUNK); + } + + /* We don't want a 
chunk larger than 10% of writable space */ + limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1), + zone_size), + min_chunk_size); + ctl->max_chunk_size = min(limit, ctl->max_chunk_size); + ctl->dev_extent_min = zone_size * ctl->dev_stripes; +} + static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl) { @@ -4919,6 +5019,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, case BTRFS_CHUNK_ALLOC_REGULAR: init_alloc_chunk_ctl_policy_regular(fs_devices, ctl); break; + case BTRFS_CHUNK_ALLOC_ZONED: + init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl); + break; default: BUG(); } @@ -5045,6 +5148,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl, return 0; } +static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl, + struct btrfs_device_info *devices_info) +{ + u64 zone_size = devices_info[0].dev->zone_info->zone_size; + /* Number of stripes that count for block group size */ + int data_stripes; + + /* + * It should hold because: + * dev_extent_min == dev_extent_want == zone_size * dev_stripes + */ + ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min); + + ctl->stripe_size = zone_size; + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + + /* stripe_size is fixed in ZONED. Reduce ndevs instead. 
*/ + if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) { + ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies, + ctl->stripe_size) + ctl->nparity, + ctl->dev_stripes); + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size); + } + + ctl->chunk_size = ctl->stripe_size * data_stripes; + + return 0; +} + static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl, struct btrfs_device_info *devices_info) @@ -5072,6 +5207,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, switch (fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: return decide_stripe_size_regular(ctl, devices_info); + case BTRFS_CHUNK_ALLOC_ZONED: + return decide_stripe_size_zoned(ctl, devices_info); default: BUG(); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 1997a4649a66..98a447badd6a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -213,6 +213,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); enum btrfs_chunk_allocation_policy { BTRFS_CHUNK_ALLOC_REGULAR, + BTRFS_CHUNK_ALLOC_ZONED, }; /* diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index f0af88d497c7..e829fa2df8ac 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1,11 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" +#include "disk-io.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -557,6 +559,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; /* * Check mount options here, because we might change fs_info->zoned @@ -779,3 +782,144 @@ int btrfs_reset_sb_log_zones(struct 
block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/* + * btrfs_find_allocatable_zones - find allocatable zones within a given region + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate the region in + * @hole_end: the end of the hole + * @num_bytes: the size of the wanted region + * @return: position of allocatable zones + * + * An allocatable region must not contain any superblock locations. + */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* + * We also need to exclude regular superblock + * positions + */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int
ret; + + *bytes = 0; + ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, end, begin) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, end, begin) == end && + find_next_zero_bit(zinfo->empty_zones, end, begin) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, + pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 058a57317c05..de5901f5ae66 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -36,6 +36,11 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct
block_device *bdev, int mirror); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -92,6 +97,26 @@ static inline int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror { return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Jan 26 02:24:48 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048193
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Anand Jain, Josef Bacik
Subject: [PATCH v14 10/42] btrfs: verify device extent is aligned to zone
Date: Tue, 26 Jan 2021 11:24:48 +0900
Message-Id: <12cbdd4b2a2d144a5053c93d972855d8bccd03cd.1611627788.git.naohiro.aota@wdc.com>

Add a check in verify_one_dev_extent() that a device extent on a zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota Reviewed-by: Anand Jain Reviewed-by: Josef Bacik --- fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 27208139d6e2..2d52330f26b5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7776,6 +7776,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret;

From patchwork Tue Jan 26 02:24:49 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048195
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik, Anand Jain
Subject: [PATCH v14 11/42] btrfs: load zone's allocation offset
Date: Tue, 26 Jan 2021 11:24:49 +0900
Message-Id: <43e1c678c7c1b10428c6b30c86c4ca3725452de1.1611627788.git.naohiro.aota@wdc.com>

Zoned btrfs must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. This commit adds "alloc_offset" to track that logical address, populated in btrfs_load_block_group_zone_info() from the write pointers of the corresponding zones.

For now, zoned btrfs supports only the SINGLE profile. Supporting non-SINGLE profiles with zone append writes is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of their zones differ, the copies end up at different logical addresses. Supporting such diverging physical addresses would need a fine-grained logical-to-physical mapping and thus an additional metadata type, so non-SINGLE profiles are disabled for now.

This commit handles the case where all the zones in a block group are sequential. The next patch will handle the case of a block group that contains a conventional zone.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 15 +++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 150 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 178 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index bdd20af69dde..0140fafedb6a 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1850,6 +1851,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2137,6 +2145,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->cached = BTRFS_CACHE_FINISHED; if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 8f74a96074f7..9d026ab1768d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only with ZONED mode enabled. 
+ */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index e829fa2df8ac..22c0665ee816 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -923,3 +929,147 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), + GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for 
(i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this + * zone non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. + */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case 
BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-SINGLE profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index de5901f5ae66..491b98c97f48 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -117,6 +118,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Jan 26 02:24:50 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048191
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v14 12/42] btrfs: calculate allocation offset for conventional zones
Date: Tue, 26 Jan 2021 11:24:50 +0900
Message-Id: <583b2d2e286c482f9bcd53c71043a1be1a1c3cec.1611627788.git.naohiro.aota@wdc.com>

Conventional zones do not have a write pointer, so we cannot use one to determine the allocation offset if a block group contains a conventional zone. Instead, we can take the end of the last allocated extent in the block group as the allocation offset.

For a new block group, we cannot calculate the allocation offset by consulting the extent tree, because doing so can deadlock by taking an extent buffer lock after the chunk mutex (which is already held in btrfs_make_block_group()). Since it is a new block group anyway, we can simply set the allocation offset to 0.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/block-group.c | 4 +- fs/btrfs/zoned.c | 99 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 4 +- 3 files changed, 98 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 0140fafedb6a..349b2a09bdf1 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1851,7 +1851,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, false); if (ret) { btrfs_err(info, "zoned: failed to load zone info of bg %llu", cache->start); @@ -2146,7 +2146,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, true); if (ret) { btrfs_put_block_group(cache); return ret; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 22c0665ee816..ca7aef252d33 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -930,7 +930,68 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +/* + * Calculate an allocation pointer from the extent allocation information + * for a block group consisting of conventional zones. It points to the + * end of the last allocated extent in the block group, used as the + * allocation offset.
+ */ +static int calculate_alloc_pointer(struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (!ret) + ret = -EUCLEAN; + if (ret < 0) + goto out; + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) { struct btrfs_fs_info *fs_info = cache->fs_info; struct extent_map_tree *em_tree = &fs_info->mapping_tree; @@ -944,6 +1005,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) int i; unsigned int nofs_flag; u64 *alloc_offsets = NULL; + u64 last_alloc = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -1042,11 +1104,30 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) if (num_conventional > 0) { /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer + * Avoid calling calculate_alloc_pointer() for new BG. It + * is no use for new BG. It must be always 0. 
+ * + * Also, we have a lock chain of extent buffer lock -> + * chunk mutex. For new BG, this function is called from + * btrfs_make_block_group() which is already taking the + * chunk mutex. Thus, we cannot call + * calculate_alloc_pointer() which takes extent buffer + * locks to avoid deadlock. */ - ret = -EINVAL; - goto out; + if (new) { + cache->alloc_offset = 0; + goto out; + } + ret = calculate_alloc_pointer(cache, &last_alloc); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = last_alloc; + else + btrfs_err(fs_info, + "zoned: failed to determine allocation offset of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -1068,6 +1149,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } out: + /* An extent is allocated after the write pointer */ + if (!ret && num_conventional && last_alloc > cache->alloc_offset) { + btrfs_err(fs_info, + "zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, last_alloc, cache->alloc_offset); + ret = -EIO; + } + kfree(alloc_offsets); free_extent_map(em); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 491b98c97f48..b53403ba0b10 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,7 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -119,7 +119,7 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, } static inline int btrfs_load_block_group_zone_info( - struct btrfs_block_group 
*cache)
+		struct btrfs_block_group *cache, bool new)
 {
 	return 0;
 }

From patchwork Tue Jan 26 02:24:51 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045403
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota
Subject: [PATCH v14 13/42] btrfs: track unusable bytes for zones
Date: Tue, 26 Jan 2021 11:24:51 +0900
X-Mailing-List: linux-fsdevel@vger.kernel.org

In zoned btrfs, a region that was once written and then freed is not usable again until the underlying zone is reset, so we need to distinguish such unusable space from usable free space. This commit therefore introduces the "zone_unusable" field in the block group structure and "bytes_zone_unusable" in the space_info structure to track the unusable space.

Pinned bytes are always reclaimed to the unusable space. But when an allocated region is returned before being used (e.g., when the block group becomes read-only between allocation time and reservation time), we can safely return the region to the block group. For that situation, this commit introduces btrfs_add_free_space_unused(). It behaves the same as btrfs_add_free_space() on regular btrfs; on zoned btrfs, it rewinds the allocation offset.
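The unusable/free split described above can be modeled outside the kernel. The following is a minimal user-space sketch, not the kernel API: `struct bg_model`, `zone_unusable()`, and `zone_free()` are hypothetical names, and only the arithmetic mirrors the patch (bytes before the allocation offset that are no longer referenced are unusable until a zone reset; only bytes beyond the allocation offset are free):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a zoned block group. */
struct bg_model {
	uint64_t length;       /* total size of the block group */
	uint64_t alloc_offset; /* write pointer, relative to the block group start */
	uint64_t used;         /* bytes still referenced by live extents */
};

/*
 * Bytes that were written (they lie before alloc_offset) but are no
 * longer referenced. They cannot be reused until the zone is reset.
 */
static uint64_t zone_unusable(const struct bg_model *bg)
{
	return bg->alloc_offset - bg->used;
}

/* Only the tail beyond the write pointer is allocatable free space. */
static uint64_t zone_free(const struct bg_model *bg)
{
	return bg->length - bg->alloc_offset;
}
```

For example, with length 256, alloc_offset 64, and used 48, the model gives 16 unusable bytes and 192 free bytes; ignoring superblock bytes, used + unusable + free always equals the block group length.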
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.c | 52 +++++++++++++++++++++------- fs/btrfs/block-group.h | 1 + fs/btrfs/extent-tree.c | 5 +++ fs/btrfs/free-space-cache.c | 67 +++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.h | 2 ++ fs/btrfs/space-info.c | 13 ++++--- fs/btrfs/space-info.h | 4 ++- fs/btrfs/sysfs.c | 2 ++ fs/btrfs/zoned.c | 24 +++++++++++++ fs/btrfs/zoned.h | 3 ++ 10 files changed, 155 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 349b2a09bdf1..dcc2a466c353 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1009,12 +1009,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->length); WARN_ON(block_group->space_info->bytes_readonly - < block_group->length); + < block_group->length - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); } block_group->space_info->total_bytes -= block_group->length; - block_group->space_info->bytes_readonly -= block_group->length; + block_group->space_info->bytes_readonly -= + (block_group->length - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->length * factor; spin_unlock(&block_group->space_info->lock); @@ -1158,7 +1163,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) } num_bytes = cache->length - cache->reserved - cache->pinned - - cache->bytes_super - cache->used; + cache->bytes_super - cache->zone_unusable - cache->used; /* * Data never overcommits, even in mixed mode, so do just the straight @@ -1189,6 +1194,12 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) if (!ret) { sinfo->bytes_readonly += num_bytes; + if 
(btrfs_is_zoned(cache->fs_info)) { + /* Migrate zone_unusable bytes to readonly */ + sinfo->bytes_readonly += cache->zone_unusable; + sinfo->bytes_zone_unusable -= cache->zone_unusable; + cache->zone_unusable = 0; + } cache->ro++; list_add_tail(&cache->ro_list, &sinfo->ro_bgs); } @@ -1871,12 +1882,20 @@ static int read_one_block_group(struct btrfs_fs_info *info, } /* - * Check for two cases, either we are full, and therefore don't need - * to bother with the caching work since we won't find any space, or we - * are empty, and we can just add all the space in and be done with it. - * This saves us _a_lot_ of time, particularly in the full case. + * For zoned btrfs, space after the allocation offset is the only + * free space for a block group. So, we don't need any caching + * work. btrfs_calc_zone_unusable() will set the amount of free + * space and zone_unusable space. + * + * For regular btrfs, check for two cases, either we are full, and + * therefore don't need to bother with the caching work since we + * won't find any space, or we are empty, and we can just add all + * the space in and be done with it. This saves us _a_lot_ of + * time, particularly in the full case. 
*/ - if (cache->length == cache->used) { + if (btrfs_is_zoned(info)) { + btrfs_calc_zone_unusable(cache); + } else if (cache->length == cache->used) { cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; btrfs_free_excluded_extents(cache); @@ -1895,7 +1914,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, } trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, - cache->used, cache->bytes_super, &space_info); + cache->used, cache->bytes_super, + cache->zone_unusable, &space_info); cache->space_info = space_info; @@ -1951,7 +1971,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) break; } btrfs_update_space_info(fs_info, bg->flags, em->len, em->len, - 0, &space_info); + 0, 0, &space_info); bg->space_info = space_info; link_block_group(bg); @@ -2193,7 +2213,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -2301,8 +2321,16 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_lock(&cache->lock); if (!--cache->ro) { num_bytes = cache->length - cache->reserved - - cache->pinned - cache->bytes_super - cache->used; + cache->pinned - cache->bytes_super - + cache->zone_unusable - cache->used; sinfo->bytes_readonly -= num_bytes; + if (btrfs_is_zoned(cache->fs_info)) { + /* Migrate zone_unusable bytes back */ + cache->zone_unusable = cache->alloc_offset - + cache->used; + sinfo->bytes_zone_unusable += cache->zone_unusable; + sinfo->bytes_readonly -= cache->zone_unusable; + } list_del_init(&cache->ro_list); } spin_unlock(&cache->lock); diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 9d026ab1768d..0f3c62c561bc 100644 --- a/fs/btrfs/block-group.h 
+++ b/fs/btrfs/block-group.h @@ -189,6 +189,7 @@ struct btrfs_block_group { * allocation. This is used only with ZONED mode enabled. */ u64 alloc_offset; + u64 zone_unusable; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 505abbc5120d..193dda1b83bb 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,7 @@ #include "block-group.h" #include "discard.h" #include "rcu-string.h" +#include "zoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -2762,6 +2763,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, if (cache->ro) { space_info->bytes_readonly += len; readonly = true; + } else if (btrfs_is_zoned(fs_info)) { + /* Need reset before reusing in a zoned block group */ + space_info->bytes_zone_unusable += len; + readonly = true; } spin_unlock(&cache->lock); if (!readonly && return_free_space && diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 0d6dcb5ff963..22a7a95088be 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2468,6 +2468,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, int ret = 0; u64 filter_bytes = bytes; + ASSERT(!btrfs_is_zoned(fs_info)); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2525,11 +2527,49 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, + u64 bytenr, u64 size, bool used) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->start; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (!used) + to_free = size; + else if (offset >= block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; 
+ + ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + if (!used) { + spin_lock(&block_group->lock); + block_group->alloc_offset -= size; + spin_unlock(&block_group->lock); + } + + /* All the region is now unusable. Mark it as unused and reclaim */ + if (block_group->zone_unusable == block_group->length) + btrfs_mark_bg_unused(block_group); + + return 0; +} + int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size) { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2538,6 +2578,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group, bytenr, size, trim_state); } +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size) +{ + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + false); + + return btrfs_add_free_space(block_group, bytenr, size); +} + /* * This is a subtle distinction because when adding free space back in general, * we want it to be added as untrimmed for async. 
But in the case where we add @@ -2548,6 +2598,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) || btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2565,6 +2619,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; + if (btrfs_is_zoned(block_group->fs_info)) + return 0; + spin_lock(&ctl->tree_lock); again: @@ -2659,6 +2716,16 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group, struct rb_node *n; int count = 0; + /* + * Zoned btrfs does not use free space tree and cluster. Just print + * out the free space after the allocation offset. + */ + if (btrfs_is_zoned(fs_info)) { + btrfs_info(fs_info, "free space %llu", + block_group->length - block_group->alloc_offset); + return; + } + spin_lock(&ctl->tree_lock); for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) { info = rb_entry(n, struct btrfs_free_space, offset_index); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index ecb09a02d544..1f23088d43f9 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -107,6 +107,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, enum btrfs_trim_state trim_state); int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size); +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size); int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, u64 bytenr, u64 size); int btrfs_remove_free_space(struct btrfs_block_group *block_group, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index fd8e79e3c10e..3185b9f7152c 100644 --- a/fs/btrfs/space-info.c 
+++ b/fs/btrfs/space-info.c @@ -163,6 +163,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? s_info->bytes_may_use : 0); } @@ -257,7 +258,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -273,6 +274,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_try_granting_tickets(info, found); @@ -422,10 +424,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? 
"" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); DUMP_BLOCK_RSV(fs_info, global_block_rsv); DUMP_BLOCK_RSV(fs_info, trans_block_rsv); @@ -454,9 +456,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s", cache->start, cache->length, cache->used, cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); spin_unlock(&cache->lock); btrfs_dump_free_space(cache, bytes); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index 74706f604bce..5731a13ba70e 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -119,7 +121,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 19b9fffa2c9c..6eb1c50fa98c 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -666,6 +666,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -679,6 +680,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index ca7aef252d33..f6b52de4abca 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ 
-1162,3 +1162,27 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	return ret;
 }
+
+void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
+{
+	u64 unusable, free;
+
+	if (!btrfs_is_zoned(cache->fs_info))
+		return;
+
+	WARN_ON(cache->bytes_super != 0);
+	unusable = cache->alloc_offset - cache->used;
+	free = cache->length - cache->alloc_offset;
+
+	/* We only need ->free_space in ALLOC_SEQ BGs */
+	cache->last_byte_to_unpin = (u64)-1;
+	cache->cached = BTRFS_CACHE_FINISHED;
+	cache->free_space_ctl->free_space = free;
+	cache->zone_unusable = unusable;
+
+	/*
+	 * Should not have any excluded extents. Just
+	 * in case, though.
+	 */
+	btrfs_free_excluded_extents(cache);
+}

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index b53403ba0b10..0cc0b27e9437 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -42,6 +42,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 			    u64 length, u64 *bytes);
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
+void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -124,6 +125,8 @@ static inline int btrfs_load_block_group_zone_info(
 	return 0;
 }
 
+static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Jan 26 02:24:52 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045401
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 14/42] btrfs: do sequential extent allocation in ZONED mode
Date: Tue, 26 Jan 2021 11:24:52 +0900

This commit implements a sequential extent allocator for the ZONED mode. This allocator only needs to check whether there is enough space in the block group; therefore, it never manages bitmaps or clusters. Also add ASSERTs to the corresponding functions.

Actually, with zone append writing, it is unnecessary to track the allocation offset; we only need to check space availability.
But, by tracking the offset and returning the offset as an allocated region, we can skip modification of ordered extents and checksum information when there is no IO reordering. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 ++ fs/btrfs/extent-tree.c | 92 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 6 +++ 3 files changed, 96 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index dcc2a466c353..f38817a82901 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -725,6 +725,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only struct btrfs_caching_control *caching_ctl = NULL; int ret = 0; + /* Allocator for ZONED btrfs does not use the cache at all */ + if (btrfs_is_zoned(fs_info)) + return 0; + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 193dda1b83bb..6d6feac90005 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3454,6 +3454,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache, enum btrfs_extent_allocation_policy { BTRFS_EXTENT_ALLOC_CLUSTERED, + BTRFS_EXTENT_ALLOC_ZONED, }; /* @@ -3706,6 +3707,65 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Simple allocator for sequential only block group. It only allows + * sequential allocation. No need to play with trees. This function + * also reserves the bytes as in btrfs_add_reserved_bytes. 
+ */ +static int do_allocation_zoned(struct btrfs_block_group *block_group, + struct find_free_extent_ctl *ffe_ctl, + struct btrfs_block_group **bg_ret) +{ + struct btrfs_space_info *space_info = block_group->space_info; + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->start; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + ASSERT(btrfs_is_zoned(block_group->fs_info)); + + spin_lock(&space_info->lock); + spin_lock(&block_group->lock); + + if (block_group->ro) { + ret = 1; + goto out; + } + + avail = block_group->length - block_group->alloc_offset; + if (avail < num_bytes) { + if (ffe_ctl->max_extent_size < avail) { + /* + * With sequential allocator, free space is always + * contiguous. + */ + ffe_ctl->max_extent_size = avail; + ffe_ctl->total_free_space = avail; + } + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + block_group->alloc_offset; + block_group->alloc_offset += num_bytes; + spin_lock(&ctl->tree_lock); + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + /* + * We do not check if found_offset is aligned to stripesize. The + * address is anyway rewritten when using zone append writing. 
+ */ + + ffe_ctl->search_start = ffe_ctl->found_offset; + +out: + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); + return ret; +} + static int do_allocation(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) @@ -3713,6 +3773,8 @@ static int do_allocation(struct btrfs_block_group *block_group, switch (ffe_ctl->policy) { case BTRFS_EXTENT_ALLOC_CLUSTERED: return do_allocation_clustered(block_group, ffe_ctl, bg_ret); + case BTRFS_EXTENT_ALLOC_ZONED: + return do_allocation_zoned(block_group, ffe_ctl, bg_ret); default: BUG(); } @@ -3727,6 +3789,9 @@ static void release_block_group(struct btrfs_block_group *block_group, ffe_ctl->retry_clustered = false; ffe_ctl->retry_unclustered = false; break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3755,6 +3820,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl, case BTRFS_EXTENT_ALLOC_CLUSTERED: found_extent_clustered(ffe_ctl, ins); break; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Nothing to do */ + break; default: BUG(); } @@ -3770,6 +3838,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl) */ ffe_ctl->loop = LOOP_NO_EMPTY_SIZE; return 0; + case BTRFS_EXTENT_ALLOC_ZONED: + /* Give up here */ + return -ENOSPC; default: BUG(); } @@ -3938,6 +4009,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, case BTRFS_EXTENT_ALLOC_CLUSTERED: return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); + case BTRFS_EXTENT_ALLOC_ZONED: + /* nothing to do */ + return 0; default: BUG(); } @@ -4001,6 +4075,9 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.last_ptr = NULL; ffe_ctl.use_cluster = true; + if (btrfs_is_zoned(fs_info)) + ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED; + ins->type = BTRFS_EXTENT_ITEM_KEY; ins->objectid = 0; ins->offset = 0; @@ -4143,20 +4220,23 @@ static noinline int find_free_extent(struct btrfs_root *root, /* move 
on to the next group */ if (ffe_ctl.search_start + num_bytes > block_group->start + block_group->length) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } if (ffe_ctl.found_offset < ffe_ctl.search_start) - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - ffe_ctl.search_start - ffe_ctl.found_offset); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + ffe_ctl.search_start - ffe_ctl.found_offset); ret = btrfs_add_reserved_bytes(block_group, ram_bytes, num_bytes, delalloc); if (ret == -EAGAIN) { - btrfs_add_free_space(block_group, ffe_ctl.found_offset, - num_bytes); + btrfs_add_free_space_unused(block_group, + ffe_ctl.found_offset, + num_bytes); goto loop; } btrfs_inc_block_group_reservations(block_group); diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 22a7a95088be..19c00118917a 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2919,6 +2919,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group, u64 align_gap_len = 0; enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&ctl->tree_lock); entry = find_free_space(ctl, &offset, &bytes_search, block_group->full_stripe_len, max_extent_size); @@ -3050,6 +3052,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group, struct rb_node *node; u64 ret = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + spin_lock(&cluster->lock); if (bytes > cluster->max_size) goto out; @@ -3826,6 +3830,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group, int ret; u64 rem = 0; + ASSERT(!btrfs_is_zoned(block_group->fs_info)); + *trimmed = 0; spin_lock(&block_group->lock); From patchwork Tue Jan 26 02:24:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045405
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 15/42] btrfs: redirty released extent buffers in ZONED mode
Date: Tue, 26 Jan 2021 11:24:53 +0900
Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out.
On ZONED volumes, however, this optimization blocks the following IOs, because cancelling the write-out of the freed blocks breaks the sequential write order expected by the device. This patch introduces a list of clean and unwritten extent buffers that have been released in a transaction. Btrfs redirties the buffers so that btree_write_cache_pages() can send proper bios to the devices. It also clears the entire content of the extent buffer so as not to confuse raw block scanners such as btrfsck. Since the cleared content makes csum_dirty_buffer() complain about a bytenr mismatch, skip the check and checksum by using the newly introduced buffer flag EXTENT_BUFFER_NO_CHECK. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 8 ++++++++ fs/btrfs/extent-tree.c | 12 +++++++++++- fs/btrfs/extent_io.c | 4 ++++ fs/btrfs/extent_io.h | 2 ++ fs/btrfs/transaction.c | 10 ++++++++++ fs/btrfs/transaction.h | 3 +++ fs/btrfs/tree-log.c | 6 ++++++ fs/btrfs/zoned.c | 37 +++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 +++++++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 76ab86dacc8d..d530bceb8f9b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec return 0; found_start = btrfs_header_bytenr(eb); + + if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) { + WARN_ON(found_start != 0); + return 0; + } + /* * Please do not consolidate these warnings into a single if. * It is useful to know what went wrong. 
@@ -4697,6 +4703,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans, EXTENT_DIRTY); btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents); + btrfs_free_redirty_list(cur_trans); + cur_trans->state = TRANS_STATE_COMPLETED; wake_up(&cur_trans->commit_wait); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 6d6feac90005..36a105697781 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3317,8 +3317,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { ret = check_ref_cleanup(trans, buf->start); - if (!ret) + if (!ret) { + btrfs_redirty_list_add(trans->transaction, buf); goto out; + } } cache = btrfs_lookup_block_group(fs_info, buf->start); @@ -3329,6 +3331,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, goto out; } + if (btrfs_is_zoned(fs_info)) { + btrfs_redirty_list_add(trans->transaction, buf); + pin_down_extent(trans, cache, buf->start, buf->len, 1); + btrfs_put_block_group(cache); + goto out; + } + WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)); btrfs_add_free_space(cache, buf->start, buf->len); @@ -4663,6 +4672,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root, __btrfs_tree_lock(buf, nest); btrfs_clean_tree_block(buf); clear_bit(EXTENT_BUFFER_STALE, &buf->bflags); + clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags); set_extent_buffer_uptodate(buf); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2fa563da65bd..d80b3a96ae49 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -24,6 +24,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "zoned.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -5043,6 +5044,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start, btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list, &fs_info->allocated_ebs); + 
INIT_LIST_HEAD(&eb->release_list); spin_lock_init(&eb->refs_lock); atomic_set(&eb->refs, 1); @@ -5830,6 +5832,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv, char *src = (char *)srcv; unsigned long i = get_eb_page_index(start); + WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)); + if (check_eb_range(eb, start, len)) return; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 19221095c635..5a81268c4d8c 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -31,6 +31,7 @@ enum { EXTENT_BUFFER_IN_TREE, /* write IO error */ EXTENT_BUFFER_WRITE_ERR, + EXTENT_BUFFER_NO_CHECK, }; /* these are flags for __process_pages_contig */ @@ -93,6 +94,7 @@ struct extent_buffer { struct rw_semaphore lock; struct page *pages[INLINE_EXTENT_BUFFER_PAGES]; + struct list_head release_list; #ifdef CONFIG_BTRFS_DEBUG struct list_head leak_list; #endif diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3bcb5444536e..ef4fcb925cb7 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -21,6 +21,7 @@ #include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" #define BTRFS_ROOT_TRANS_TAG 0 @@ -375,6 +376,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info, spin_lock_init(&cur_trans->dirty_bgs_lock); INIT_LIST_HEAD(&cur_trans->deleted_bgs); spin_lock_init(&cur_trans->dropped_roots_lock); + INIT_LIST_HEAD(&cur_trans->releasing_ebs); + spin_lock_init(&cur_trans->releasing_ebs_lock); list_add_tail(&cur_trans->list, &fs_info->trans_list); extent_io_tree_init(fs_info, &cur_trans->dirty_pages, IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode); @@ -2336,6 +2339,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) goto scrub_continue; } + /* + * At this point, we should have written all the tree blocks + * allocated in this transaction. So it's now safe to free the + * redirtied extent buffers. 
+ */ + btrfs_free_redirty_list(cur_trans); + ret = write_all_supers(fs_info, 0); /* * the super is written, we can safely allow the tree-loggers diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 31ca81bad822..660b4e1f1181 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -92,6 +92,9 @@ struct btrfs_transaction { */ atomic_t pending_ordered; wait_queue_head_t pending_wait; + + spinlock_t releasing_ebs_lock; + struct list_head releasing_ebs; }; #define __TRANS_FREEZABLE (1U << 0) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 8ee0700a980f..930e752686b4 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -19,6 +19,7 @@ #include "qgroup.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" /* magic values for the inode_only field in btrfs_log_inode: * @@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans, free_extent_buffer(next); return ret; } + btrfs_redirty_list_add( + trans->transaction, next); } else { if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags)) clear_extent_buffer_dirty(next); @@ -3296,6 +3299,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans, clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); extent_io_tree_release(&log->log_csum_range); + + if (trans && log->node) + btrfs_redirty_list_add(trans->transaction, log->node); btrfs_put_root(log); } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index f6b52de4abca..db6cb0070220 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -10,6 +10,7 @@ #include "rcu-string.h" #include "disk-io.h" #include "block-group.h" +#include "transaction.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1186,3 +1187,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) */ btrfs_free_excluded_extents(cache); } + +void btrfs_redirty_list_add(struct 
btrfs_transaction *trans, + struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = eb->fs_info; + + if (!btrfs_is_zoned(fs_info) || + btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) || + !list_empty(&eb->release_list)) + return; + + set_extent_buffer_dirty(eb); + set_extent_bits_nowait(&trans->dirty_pages, eb->start, + eb->start + eb->len - 1, EXTENT_DIRTY); + memzero_extent_buffer(eb, 0, eb->len); + set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags); + + spin_lock(&trans->releasing_ebs_lock); + list_add_tail(&eb->release_list, &trans->releasing_ebs); + spin_unlock(&trans->releasing_ebs_lock); + atomic_inc(&eb->refs); +} + +void btrfs_free_redirty_list(struct btrfs_transaction *trans) +{ + spin_lock(&trans->releasing_ebs_lock); + while (!list_empty(&trans->releasing_ebs)) { + struct extent_buffer *eb; + + eb = list_first_entry(&trans->releasing_ebs, + struct extent_buffer, release_list); + list_del_init(&eb->release_list); + free_extent_buffer(eb); + } + spin_unlock(&trans->releasing_ebs_lock); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 0cc0b27e9437..b2ce16de0c22 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -43,6 +43,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); +void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct extent_buffer *eb); +void btrfs_free_redirty_list(struct btrfs_transaction *trans); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -127,6 +130,10 @@ static inline int btrfs_load_block_group_zone_info( static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } +static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans, + struct 
extent_buffer *eb) { } +static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From patchwork Tue Jan 26 02:24:54 2021
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048189
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 16/42] btrfs: advance allocation pointer after tree log node
Date: Tue, 26 Jan 2021 11:24:54 +0900
Message-Id: <69b443b9aa59b80de9442a0f5f81c685fd326b15.1611627788.git.naohiro.aota@wdc.com>
Since the allocation information of a tree log node is not recorded in the extent tree, calculate_alloc_pointer() cannot detect such nodes, so the allocation pointer can end up pointing into a tree node. Replaying the log calls btrfs_remove_free_space() for each node in the log tree, so advance the pointer past the node there. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/free-space-cache.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 19c00118917a..c4ccfcb98aed 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2619,8 +2619,22 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; - if (btrfs_is_zoned(block_group->fs_info)) + if (btrfs_is_zoned(block_group->fs_info)) { + /* + * This can happen with conventional zones when replaying + * log. Since the allocation info of tree-log nodes is + * not recorded to the extent-tree, calculate_alloc_pointer() + * fails to advance the allocation pointer after the last + * allocated tree log node blocks. + * + * This function is called from + * btrfs_pin_extent_for_log_replay() when replaying the + * log. Advance the pointer not to overwrite the tree-log nodes. 
+ */ + if (block_group->alloc_offset < offset + bytes) + block_group->alloc_offset = offset + bytes; return 0; + } spin_lock(&ctl->tree_lock);
From patchwork Tue Jan 26 02:24:55 2021
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045407
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 17/42] btrfs: enable to mount ZONED incompat flag
Date: Tue, 26 Jan 2021 11:24:55 +0900
Message-Id: <51183faaa8afba3858bb48be627ef5072d268fc1.1611627788.git.naohiro.aota@wdc.com>
This patch adds the ZONED incompat flag to BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged file system. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik Reviewed-by: Anand Jain --- fs/btrfs/ctree.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ed6bb46a2572..29976d37f4f9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -298,7 +298,8 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ - BTRFS_FEATURE_INCOMPAT_RAID1C34) + BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ + BTRFS_FEATURE_INCOMPAT_ZONED) #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ (BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
From patchwork Tue Jan 26 02:24:56 2021
X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045409
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 18/42] btrfs: reset zones of unused block groups
Date: Tue, 26 Jan 2021 11:24:56 +0900
Message-Id: <10c505102feb1c7bd352057ed528b1a04b36bb57.1611627788.git.naohiro.aota@wdc.com>
For a ZONED volume, a block group maps to a zone of the device. For deleted unused block groups, the zone of the block group can be reset to rewind the zone write pointer to the start of the zone. 
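[Editor's sketch] The reset condition this patch introduces (see btrfs_can_zone_reset() in the zoned.h hunk below) can be modeled in a few lines of user-space C: a discarded range is converted into a zone reset only if it lies on a sequential-write-required region and covers whole zones. The fixed 256 MiB zone size, the `dev_is_sequential` flag parameter, and the helper names here are illustrative assumptions, not the kernel's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed zone size for illustration; real devices report their own. */
#define ZONE_SIZE (256ULL * 1024 * 1024)

bool is_zone_aligned(uint64_t value)
{
	/* ZONE_SIZE is a power of two, so masking tests alignment */
	return (value & (ZONE_SIZE - 1)) == 0;
}

/* dev_is_sequential stands in for btrfs_dev_is_sequential() */
bool can_zone_reset(bool dev_is_sequential, uint64_t physical, uint64_t length)
{
	if (!dev_is_sequential)
		return false; /* conventional zones are rewritable; discard instead */

	/* Only whole zones can be reset: both start and length must align */
	return is_zone_aligned(physical) && is_zone_aligned(length);
}
```

Partial-zone or unaligned ranges fall through to a regular discard (or are skipped), which is exactly the branch structure of the btrfs_discard_extent() hunk in this patch.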
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 8 ++++++-- fs/btrfs/extent-tree.c | 17 ++++++++++++----- fs/btrfs/zoned.h | 16 ++++++++++++++++ 3 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index f38817a82901..9801df6cbfd8 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1403,8 +1403,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC)) goto flip_async; - /* DISCARD can flip during remount */ - trimming = btrfs_test_opt(fs_info, DISCARD_SYNC); + /* + * DISCARD can flip during remount. In ZONED mode, we need + * to reset sequential required zones. + */ + trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) || + btrfs_is_zoned(fs_info); /* Implicit trim during transaction commit. */ if (trimming) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 36a105697781..4c126e4ada27 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1298,6 +1298,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { + struct btrfs_device *dev = stripe->dev; + u64 physical = stripe->physical; + u64 length = stripe->length; u64 bytes; struct request_queue *req_q; @@ -1305,14 +1308,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + req_q = bdev_get_queue(stripe->dev->bdev); - if (!blk_queue_discard(req_q)) + /* Zone reset in ZONED mode */ + if (btrfs_can_zone_reset(dev, physical, length)) + ret = btrfs_reset_device_zone(dev, physical, + length, &bytes); + else if (blk_queue_discard(req_q)) + ret = btrfs_issue_discard(dev->bdev, physical, + length, &bytes); + else continue; - ret = btrfs_issue_discard(stripe->dev->bdev, - stripe->physical, - stripe->length, - &bytes); if (!ret) { discarded_bytes += 
bytes; } else if (ret != -EOPNOTSUPP) { diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index b2ce16de0c22..331951978487 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -210,4 +210,20 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, u64 p return device->zone_info == NULL || !btrfs_dev_is_sequential(device, pos); } +static inline bool btrfs_can_zone_reset(struct btrfs_device *device, + u64 physical, u64 length) +{ + u64 zone_size; + + if (!btrfs_dev_is_sequential(device, physical)) + return false; + + zone_size = device->zone_info->zone_size; + if (!IS_ALIGNED(physical, zone_size) || + !IS_ALIGNED(length, zone_size)) + return false; + + return true; +} + #endif From patchwork Tue Jan 26 02:24:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048185 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 123CBC433DB for ; Tue, 26 Jan 2021 19:59:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DC8B522A83 for ; Tue, 26 Jan 2021 19:59:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733059AbhAZF3P (ORCPT ); Tue, 26 Jan 2021 00:29:15 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:33029 "EHLO esa6.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732027AbhAZCfa (ORCPT ); Mon, 25 Jan 2021 21:35:30 -0500 DKIM-Signature: v=1; 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 19/42] btrfs: extract page adding function
Date: Tue, 26 Jan 2021 11:24:57 +0900
Message-Id: <96945d7bd401af14e03525c0c4fe3557ab9441f9.1611627788.git.naohiro.aota@wdc.com>

This commit extracts the page-adding part of submit_extent_page() into a
helper, btrfs_bio_add_page(). A page is added only when the bio_flags match,
the page is contiguous with the bio, and it fits in the same stripe as the
pages already in the bio. The condition checks are reordered to allow an
early return, avoiding a potentially expensive call to
btrfs_bio_fits_in_stripe().

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 58 ++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d80b3a96ae49..df434f9ba774 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3060,6 +3060,46 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
+/**
+ * btrfs_bio_add_page - attempt to add a page to bio
+ * @bio:	destination bio
+ * @page:	page to add to the bio
+ * @disk_bytenr: offset of the new bio or to check whether we are adding
+ *		 a contiguous page to the previous one
+ * @pg_offset:	starting offset in the page
+ * @size:	portion of page that we want to write
+ * @prev_bio_flags: flags of previous bio to see if we can merge the current one
+ * @bio_flags:	flags of the current bio to see if we can merge them
+ * @return:	true if page was added, false otherwise
+ *
+ * Attempt to add a page to bio considering stripe alignment etc. Return
+ * true if the page was successfully added, false otherwise.
+ */
+static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
+			       u64 disk_bytenr, unsigned int size,
+			       unsigned int pg_offset,
+			       unsigned long prev_bio_flags,
+			       unsigned long bio_flags)
+{
+	sector_t sector = disk_bytenr >> SECTOR_SHIFT;
+	bool contig;
+
+	if (prev_bio_flags != bio_flags)
+		return false;
+
+	if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
+		contig = bio->bi_iter.bi_sector == sector;
+	else
+		contig = bio_end_sector(bio) == sector;
+	if (!contig)
+		return false;
+
+	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
+		return false;
+
+	return bio_add_page(bio, page, size, pg_offset) == size;
+}
+
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
@@ -3088,27 +3128,15 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = disk_bytenr >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
-		bool contig;
-		bool can_merge = true;
-
 		bio = *bio_ret;
-		if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
-			contig = bio->bi_iter.bi_sector == sector;
-		else
-			contig = bio_end_sector(bio) == sector;
-
-		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
-			can_merge = false;
-
-		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
-		    force_bio_submit ||
-		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+		if (force_bio_submit ||
+		    !btrfs_bio_add_page(bio, page, disk_bytenr, io_size,
+					pg_offset, prev_bio_flags, bio_flags)) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;

From patchwork Tue Jan 26 02:24:58 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048187
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 20/42] btrfs: use bio_add_zone_append_page for zoned btrfs
Date: Tue, 26 Jan 2021 11:24:58 +0900
Message-Id: <51b1f298d964cbe07c3714c3361fdbf596dbf52e.1611627788.git.naohiro.aota@wdc.com>

A zoned device has its own hardware restrictions, e.g. max_zone_append_size
when using REQ_OP_ZONE_APPEND.
To follow these restrictions, use bio_add_zone_append_page() instead of
bio_add_page(). Since bio_add_zone_append_page() needs the target device,
this commit reads the chunk information to memoize the target device in
btrfs_io_bio(bio)->device.

Currently, zoned btrfs only supports the SINGLE profile. In the future,
btrfs_io_bio can hold an extent_map and check the restrictions for all the
devices the bio will be mapped to.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index df434f9ba774..ad19757d685d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3083,6 +3083,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 {
 	sector_t sector = disk_bytenr >> SECTOR_SHIFT;
 	bool contig;
+	int ret;
 
 	if (prev_bio_flags != bio_flags)
 		return false;
@@ -3097,7 +3098,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	return bio_add_page(bio, page, size, pg_offset) == size;
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
+	else
+		ret = bio_add_page(bio, page, size, pg_offset);
+
+	return ret == size;
 }
 
 /*
@@ -3128,7 +3134,9 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_io_tree *tree = &inode->io_tree;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	ASSERT(bio_ret);
 
@@ -3159,11 +3167,27 @@ static int submit_extent_page(unsigned int opf,
 	if (wbc) {
 		struct block_device *bdev;
 
-		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+		bdev = fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
+	if (btrfs_is_zoned(fs_info) &&
+	    bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct extent_map *em;
+		struct map_lookup *map;
+
+		em = btrfs_get_chunk_map(fs_info, disk_bytenr, io_size);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+
+		map = em->map_lookup;
+		/* We only support SINGLE profile for now */
+		ASSERT(map->num_stripes == 1);
+		btrfs_io_bio(bio)->device = map->stripes[0].dev;
+
+		free_extent_map(em);
+	}
 
 	*bio_ret = bio;

From patchwork Tue Jan 26 02:24:59 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045413
From: Naohiro Aota
To:
linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 21/42] btrfs: handle REQ_OP_ZONE_APPEND as writing
Date: Tue, 26 Jan 2021 11:24:59 +0900
Message-Id: <52be26da1c7d2521665c385744dd7115e09dc644.1611627788.git.naohiro.aota@wdc.com>

ZONED btrfs uses REQ_OP_ZONE_APPEND bios for writing to actual devices.
Make btrfs_end_bio() and btrfs_op() aware of them.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c |  4 ++--
 fs/btrfs/inode.c   | 10 +++++-----
 fs/btrfs/volumes.c |  8 ++++----
 fs/btrfs/volumes.h |  1 +
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d530bceb8f9b..ba0ca953f7e5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -652,7 +652,7 @@ static void end_workqueue_bio(struct bio *bio)
 	fs_info = end_io_wq->info;
 	end_io_wq->status = bio->bi_status;
 
-	if (bio_op(bio) == REQ_OP_WRITE) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
 			wq = fs_info->endio_meta_write_workers;
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
@@ -828,7 +828,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	int async = check_async_write(fs_info, BTRFS_I(inode));
 	blk_status_t ret;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
 		 * can happen in the async kernel threads
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0dbe1aaa0b71..04b9efe4ca5a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2252,7 +2252,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
 			goto out;
@@ -7684,7 +7684,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	if (!refcount_dec_and_test(&dip->refs))
 		return;
 
-	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
+	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->logical_offset,
 					     dip->bytes,
@@ -7850,7 +7850,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
-	bool write = bio_op(bio) == REQ_OP_WRITE;
+	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
 	/* Check btrfs_submit_bio_hook() for rules about async submit. */
@@ -7900,7 +7900,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
 							  loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	size_t dip_size;
 	struct btrfs_dio_private *dip;
@@ -7930,7 +7930,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		struct bio *dio_bio, loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d52330f26b5..e69754af2eba 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6455,7 +6455,7 @@ static void btrfs_end_bio(struct bio *bio)
 			struct btrfs_device *dev = btrfs_io_bio(bio)->device;
 
 			ASSERT(dev->bdev);
-			if (bio_op(bio) == REQ_OP_WRITE)
+			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 				btrfs_dev_stat_inc_and_print(dev,
 						BTRFS_DEV_STAT_WRITE_ERRS);
 			else if (!(bio->bi_opf & REQ_RAHEAD))
@@ -6568,10 +6568,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	atomic_set(&bbio->stripes_pending, bbio->num_stripes);
 
 	if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) {
+	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
 		/* In this case, map_length has been set to the length of
 		   a single stripe; not the whole write */
-		if (bio_op(bio) == REQ_OP_WRITE) {
+		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 			ret = raid56_parity_write(fs_info, bio, bbio,
 						  map_length);
 		} else {
@@ -6594,7 +6594,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		dev = bbio->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
-		    (bio_op(first_bio) == REQ_OP_WRITE &&
+		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
 		     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			bbio_error(bbio, first_bio, logical);
 			continue;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 98a447badd6a..0bcf87a9e594 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -423,6 +423,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 	case REQ_OP_DISCARD:
 		return BTRFS_MAP_DISCARD;
 	case REQ_OP_WRITE:
+	case REQ_OP_ZONE_APPEND:
 		return BTRFS_MAP_WRITE;
 	default:
 		WARN_ON_ONCE(1);

From patchwork Tue Jan 26 02:25:00 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045415
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, kernel test robot
Subject: [PATCH v14 22/42] btrfs: split ordered extent when bio is sent
Date: Tue, 26 Jan 2021 11:25:00 +0900
Message-Id: <4293f37cdedd93b58df550eb0cdbea44e05e1280.1611627788.git.naohiro.aota@wdc.com>

For a zone append write, the device decides the location the data is
written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent
maps to a contiguous region on disk, we need to maintain a "one bio ==
one ordered extent" rule.

This commit implements the splitting of an ordered extent and its extent
map on bio submission to adhere to that rule.
[testbot] made extract_ordered_extent static

Reported-by: kernel test robot
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c        | 95 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.c | 85 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.h |  2 +
 3 files changed, 182 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 04b9efe4ca5a..92fae7654a3a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2217,6 +2217,92 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
+static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
+					   struct bio *bio, loff_t file_offset)
+{
+	struct btrfs_ordered_extent *ordered;
+	struct extent_map *em = NULL, *em_new = NULL;
+	struct extent_map_tree *em_tree = &inode->extent_tree;
+	u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 len = bio->bi_iter.bi_size;
+	u64 end = start + len;
+	u64 ordered_end;
+	u64 pre, post;
+	int ret = 0;
+
+	ordered = btrfs_lookup_ordered_extent(inode, file_offset);
+	if (WARN_ON_ONCE(!ordered))
+		return BLK_STS_IOERR;
+
+	/* No need to split */
+	if (ordered->disk_num_bytes == len)
+		goto out;
+
+	/* We cannot split once end_bio'd ordered extent */
+	if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* We cannot split a compressed ordered extent */
+	if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
+	/* bio must be in one ordered extent */
+	if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Checksum list should be empty */
+	if (WARN_ON_ONCE(!list_empty(&ordered->list))) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	pre = start - ordered->disk_bytenr;
+	post = ordered_end - end;
+
+	ret = btrfs_split_ordered_extent(ordered, pre, post);
+	if (ret)
+		goto out;
+
+	read_lock(&em_tree->lock);
+	em = lookup_extent_mapping(em_tree, ordered->file_offset, len);
+	if (!em) {
+		read_unlock(&em_tree->lock);
+		ret = -EIO;
+		goto out;
+	}
+	read_unlock(&em_tree->lock);
+
+	ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
+	/*
+	 * We cannot reuse em_new here but have to create a new one, as
+	 * unpin_extent_cache() expects the start of the extent map to be the
+	 * logical offset of the file, which does not hold true anymore after
+	 * splitting.
+	 */
+	em_new = create_io_em(inode, em->start + pre, len,
+			      em->start + pre, em->block_start + pre, len,
+			      len, len, BTRFS_COMPRESS_NONE,
+			      BTRFS_ORDERED_REGULAR);
+	if (IS_ERR(em_new)) {
+		ret = PTR_ERR(em_new);
+		goto out;
+	}
+	free_extent_map(em_new);
+
+out:
+	free_extent_map(em);
+	btrfs_put_ordered_extent(ordered);
+
+	return errno_to_blk_status(ret);
+}
+
 /*
  * extent_io.c submission hook. This does the right thing for csum calculation
  * on write, or reading the csums from the tree before a read.
@@ -2252,6 +2338,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct page *page = bio_first_bvec_all(bio)->bv_page;
+		loff_t file_offset = page_offset(page);
+
+		ret = extract_ordered_extent(BTRFS_I(inode), bio, file_offset);
+		if (ret)
+			goto out;
+	}
+
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b4e6500548a2..23aae67fe9e9 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -921,6 +921,91 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 	}
 }
 
+static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
+				u64 len)
+{
+	struct inode *inode = ordered->inode;
+	u64 file_offset = ordered->file_offset + pos;
+	u64 disk_bytenr = ordered->disk_bytenr + pos;
+	u64 num_bytes = len;
+	u64 disk_num_bytes = len;
+	int type;
+	unsigned long flags_masked =
+		ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT);
+	int compress_type = ordered->compress_type;
+	unsigned long weight;
+	int ret;
+
+	weight = hweight_long(flags_masked);
+	WARN_ON_ONCE(weight > 1);
+	if (!weight)
+		type = 0;
+	else
+		type = __ffs(flags_masked);
+
+	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) {
+		WARN_ON_ONCE(1);
+		ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode),
+							file_offset,
+							disk_bytenr, num_bytes,
+							disk_num_bytes,
+							compress_type);
+	} else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) {
+		ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset,
+						   disk_bytenr, num_bytes,
+						   disk_num_bytes, type);
+	} else {
+		ret = btrfs_add_ordered_extent(BTRFS_I(inode), file_offset,
+					       disk_bytenr, num_bytes,
+					       disk_num_bytes, type);
+	}
+
+	return ret;
+}
+
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+			       u64 post)
+{
+	struct inode *inode = ordered->inode;
+	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
+	struct rb_node *node;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	int ret = 0;
+
+	spin_lock_irq(&tree->lock);
+	/* Remove from tree once */
+	node = &ordered->rb_node;
+	rb_erase(node, &tree->tree);
+	RB_CLEAR_NODE(node);
+	if (tree->last == node)
+		tree->last = NULL;
+
+	ordered->file_offset += pre;
+	ordered->disk_bytenr += pre;
+	ordered->num_bytes -= (pre + post);
+	ordered->disk_num_bytes -= (pre + post);
+	ordered->bytes_left -= (pre + post);
+
+	/* Re-insert the node */
+	node = tree_insert(&tree->tree, ordered->file_offset,
+			   &ordered->rb_node);
+	if (node)
+		btrfs_panic(fs_info, -EEXIST,
+			"zoned: inconsistency in ordered tree at offset %llu",
+			    ordered->file_offset);
+
+	spin_unlock_irq(&tree->lock);
+
+	if (pre)
+		ret = clone_ordered_extent(ordered, 0, pre);
+	if (post)
+		ret = clone_ordered_extent(ordered,
+					   pre + ordered->disk_num_bytes,
+					   post);
+
+	return ret;
+}
+
 int __init ordered_data_init(void)
 {
 	btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent",
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index cca3307807e8..c400be75a3f1 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -201,6 +201,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+			       u64 post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);

From patchwork Tue Jan 26 02:25:01 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045411
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn,
    Naohiro Aota, Josef Bacik
Subject: [PATCH v14 23/42] btrfs: check if bio spans across an ordered extent
Date: Tue, 26 Jan 2021 11:25:01 +0900
Message-Id: <430fac31a56a9d251c42f1e3036d7614abe56be4.1611627788.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

To ensure that an ordered extent maps to a contiguous region on disk, we
need to maintain a "one bio == one ordered extent" rule.

This commit ensures that a bio under construction does not span across an
ordered extent.
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h     |  2 ++
 fs/btrfs/extent_io.c |  9 +++++++--
 fs/btrfs/inode.c     | 29 +++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 29976d37f4f9..6c4ff56eeb5e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3119,6 +3119,8 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split);
 int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
 			     unsigned long bio_flags);
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size);
 void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end);
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ad19757d685d..6092ca6edc86 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3098,10 +3098,15 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct page *first_page = bio_first_bvec_all(bio)->bv_page;
+
+		if (!btrfs_bio_fits_in_ordered_extent(first_page, bio, size))
+			return false;
 		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
-	else
+	} else {
 		ret = bio_add_page(bio, page, size, pg_offset);
+	}
 
 	return ret == size;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 92fae7654a3a..419f4290bdf8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2217,6 +2217,35 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size)
+{
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_ordered_extent *ordered;
+	u64 len = bio->bi_iter.bi_size + size;
+	bool ret = true;
+
+	ASSERT(btrfs_is_zoned(fs_info));
+	ASSERT(fs_info->max_zone_append_size > 0);
+	ASSERT(bio_op(bio) == REQ_OP_ZONE_APPEND);
+
+	/* Ordered extent not yet created, so we're good */
+	ordered = btrfs_lookup_ordered_extent(inode, page_offset(page));
+	if (!ordered)
+		return ret;
+
+	if ((bio->bi_iter.bi_sector << SECTOR_SHIFT) + len >
+	    ordered->disk_bytenr + ordered->disk_num_bytes)
+		ret = false;
+
+	btrfs_put_ordered_extent(ordered);
+
+	return ret;
+}
+
 static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
 					   struct bio *bio, loff_t file_offset)
 {

From patchwork Tue Jan 26 02:25:02 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12045417
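The boundary test in btrfs_bio_fits_in_ordered_extent() above boils down to byte arithmetic: the bio's start (in sectors) plus its current size plus the candidate page must not pass the end of the ordered extent's disk range. Here is a minimal userspace model of that check; the names are made up for illustration (only SECTOR_SHIFT matches the kernel's value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Would adding 'size' more bytes make a bio starting at 'bio_sector'
 * and currently 'bio_size' bytes long cross the end of an extent that
 * starts at 'disk_bytenr' and is 'disk_num_bytes' long? */
static bool fits_in_extent(uint64_t bio_sector, uint32_t bio_size,
			   uint32_t size, uint64_t disk_bytenr,
			   uint64_t disk_num_bytes)
{
	uint64_t end = (bio_sector << SECTOR_SHIFT) + bio_size + size;

	return end <= disk_bytenr + disk_num_bytes;
}
```

A page that lands exactly on the extent's end still fits; one byte beyond does not, and the caller must then start a new bio for the next ordered extent.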
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 24/42] btrfs: extend btrfs_rmap_block for specifying a device
Date: Tue, 26 Jan 2021 11:25:02 +0900

btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses.

This commit extends the function to match only a specified device. The old
behavior of querying all devices is left intact by specifying NULL as the
target device.

We pass a block_device instead of a btrfs_device to __btrfs_rmap_block.
This function is intended to reverse-map the result of a bio, which only
has a block_device.

This commit also exports the function for later use.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c            | 17 ++++++++++++-----
 fs/btrfs/block-group.h            |  8 +++-----
 fs/btrfs/tests/extent-map-tests.c |  2 +-
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9801df6cbfd8..7facc4439116 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1583,6 +1583,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  *
  * @fs_info:     the filesystem
  * @chunk_start: logical address of block group
+ * @bdev:        physical device to resolve. Can be NULL to indicate any
+ *               device.
  * @physical:    physical address to map to logical addresses
  * @logical:     return array of logical addresses which map to @physical
  * @naddrs:      length of @logical
@@ -1592,9 +1594,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * Used primarily to exclude those portions of a block group that contain super
  * block copies.
  */
-EXPORT_FOR_TESTS
 int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len)
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -1612,6 +1614,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	map = em->map_lookup;
 	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
+	chunk_start = em->start;
 
 	/* For RAID5/6 adjust to a full IO stripe length */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -1626,14 +1629,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	for (i = 0; i < map->num_stripes; i++) {
 		bool already_inserted = false;
 		u64 stripe_nr;
+		u64 offset;
 		int j;
 
 		if (!in_range(physical, map->stripes[i].physical,
 			      data_stripe_length))
 			continue;
 
+		if (bdev && map->stripes[i].dev->bdev != bdev)
+			continue;
+
 		stripe_nr = physical - map->stripes[i].physical;
-		stripe_nr = div64_u64(stripe_nr, map->stripe_len);
+		stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset);
 
 		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 			stripe_nr = stripe_nr * map->num_stripes + i;
@@ -1647,7 +1654,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 		 * instead of map->stripe_len
 		 */
 
-		bytenr = chunk_start + stripe_nr * io_stripe_size;
+		bytenr = chunk_start + stripe_nr * io_stripe_size + offset;
 
 		/* Ensure we don't add duplicate addresses */
 		for (j = 0; j < nr; j++) {
@@ -1689,7 +1696,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = btrfs_rmap_block(fs_info, cache->start,
+		ret = btrfs_rmap_block(fs_info, cache->start, NULL,
 				       bytenr, &logical, &nr, &stripe_len);
 		if (ret)
 			return ret;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0f3c62c561bc..9df00ada09f9 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -277,6 +277,9 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
 				struct btrfs_caching_control *caching_ctl);
+int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len);
 
 static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info)
 {
@@ -303,9 +306,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
 void btrfs_freeze_block_group(struct btrfs_block_group *cache);
 void btrfs_unfreeze_block_group(struct btrfs_block_group *cache);
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len);
-#endif
-
 #endif /* BTRFS_BLOCK_GROUP_H */
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 57379e96ccc9..c0aefe6dee0b 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info,
 		goto out_free;
 	}
 
-	ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1),
+	ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1),
 			       &logical, &out_ndaddrs, &out_stripe_len);
 	if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) {
 		test_err("didn't rmap anything but expected %d",

From patchwork Tue Jan 26 02:25:03 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048139
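The stripe arithmetic changed by patch 24 — div64_u64() to div64_u64_rem() — matters because keeping the intra-stripe remainder lets btrfs_rmap_block() resolve physical addresses that are not stripe aligned. A single-stripe userspace sketch of that mapping follows; the names are illustrative and RAID striping is deliberately left out, so io_stripe_size equals stripe_len here.

```c
#include <assert.h>
#include <stdint.h>

/* Map a physical address back to a logical address within one chunk,
 * keeping the remainder within the stripe so that unaligned addresses
 * resolve exactly (the div64_u64_rem() idea from the patch). */
static uint64_t rmap(uint64_t chunk_start, uint64_t stripe_physical,
		     uint64_t stripe_len, uint64_t physical)
{
	uint64_t off = physical - stripe_physical;
	uint64_t stripe_nr = off / stripe_len;	/* quotient */
	uint64_t rem = off % stripe_len;	/* intra-stripe remainder */

	return chunk_start + stripe_nr * stripe_len + rem;
}
```

With plain division the `rem` term would be lost and any address inside a stripe would snap back to the stripe's start.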
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Josef Bacik
Subject: [PATCH v14 25/42] btrfs: cache if block-group is on a sequential zone
Date: Tue, 26 Jan 2021 11:25:03 +0900

From: Johannes Thumshirn

In zoned mode, cache whether a block group is on a sequential-write-only
zone. On such zones we can use REQ_OP_ZONE_APPEND for writing data, so
provide btrfs_use_zone_append() to determine whether an I/O targets a
sequential-write-only zone and can therefore use REQ_OP_ZONE_APPEND.

Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.h |  2 ++
 fs/btrfs/zoned.c       | 29 +++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  5 +++++
 3 files changed, 36 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9df00ada09f9..a1d96c4cfa3b 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -184,6 +184,8 @@ struct btrfs_block_group {
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
 
+	/* Flag indicating this block-group is placed on a sequential zone */
+	bool seq_zone;
 	/*
 	 * Allocation offset for the block group to implement sequential
 	 * allocation. This is used only with ZONED mode enabled.
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index db6cb0070220..abe6b415de98 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1103,6 +1103,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
+	if (num_sequential > 0)
+		cache->seq_zone = true;
+
 	if (num_conventional > 0) {
 		/*
 		 * Avoid calling calculate_alloc_pointer() for new BG. It
@@ -1223,3 +1226,29 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_block_group *cache;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	if (!fs_info->max_zone_append_size)
+		return false;
+
+	if (!is_data_inode(&inode->vfs_inode))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, em->block_start);
+	ASSERT(cache);
+	if (!cache)
+		return false;
+
+	ret = cache->seq_zone;
+	btrfs_put_block_group(cache);
+
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 331951978487..92888eb86055 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -46,6 +46,7 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -134,6 +135,10 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 					  struct extent_buffer *eb) { }
 static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
+static inline bool btrfs_use_zone_append(struct btrfs_inode *inode,
+					 struct extent_map *em)
+{
+	return false;
+}
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Jan 26 02:25:04 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048141
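For readers unfamiliar with zone append: the host submits the write against the start of the zone and the device picks the actual location, reporting it back on completion. A toy userspace model of that contract is sketched below; nothing here is kernel API, the struct and function names are invented, and the "device" is just an in-memory buffer with a write pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy sequential zone: every write is an "append" and the device
 * reports back where the data actually landed, just as a completed
 * REQ_OP_ZONE_APPEND bio reports its real sector in bi_sector. */
struct zone {
	uint8_t data[4096];
	uint64_t wp;		/* write pointer, bytes from zone start */
};

static uint64_t zone_append(struct zone *z, const void *buf, uint64_t len)
{
	uint64_t where = z->wp;	/* location chosen by the "device" */

	assert(where + len <= sizeof(z->data));
	memcpy(z->data + where, buf, len);
	z->wp += len;
	return where;		/* caller must record this address */
}
```

Because the returned location is only known at completion, the filesystem has to fix up any metadata that assumed the originally allocated address — which is exactly what the later "use ZONE_APPEND write" patch does for ordered extents and checksums.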
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Johannes Thumshirn, Josef Bacik
Subject: [PATCH v14 26/42] btrfs: save irq flags when looking up an ordered extent
Date: Tue, 26 Jan 2021 11:25:04 +0900

From: Johannes Thumshirn

A following patch will add another caller of
btrfs_lookup_ordered_extent() from a bio endio context.

btrfs_lookup_ordered_extent() uses spin_lock_irq() which unconditionally
disables interrupts. Change this to spin_lock_irqsave() so interrupts
aren't disabled and re-enabled unnecessarily.

Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/ordered-data.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 23aae67fe9e9..7c061146ead9 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -768,9 +768,10 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
 	struct btrfs_ordered_extent *entry = NULL;
+	unsigned long flags;
 
 	tree = &inode->ordered_tree;
-	spin_lock_irq(&tree->lock);
+	spin_lock_irqsave(&tree->lock, flags);
 	node = tree_search(tree, file_offset);
 	if (!node)
 		goto out;
@@ -781,7 +782,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	if (entry)
 		refcount_inc(&entry->refs);
 out:
-	spin_unlock_irq(&tree->lock);
+	spin_unlock_irqrestore(&tree->lock, flags);
 	return entry;
 }

From patchwork Tue Jan 26 02:25:05 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048137
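The irq vs. irqsave distinction above can be shown with a toy model: spin_unlock_irq() unconditionally re-enables interrupts, which would be wrong if the caller entered with them already disabled, while the irqsave/irqrestore pair preserves the caller's state. In the userspace sketch below the boolean merely simulates the CPU interrupt-enable state; the function names are made up and this is not how the kernel implements it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable state. */
static bool irqs_enabled = true;

/* Analog of local_irq_save(): remember the current state, then
 * disable interrupts. */
static unsigned long irq_save(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Analog of local_irq_restore(): put back whatever state the caller
 * had, instead of blindly re-enabling. */
static void irq_restore(unsigned long flags)
{
	irqs_enabled = flags;
}
```

A caller that already holds interrupts disabled (such as a bio endio path) stays disabled across save/restore, whereas an unconditional re-enable would silently break its assumptions.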
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota,
    Johannes Thumshirn, Josef Bacik
Subject: [PATCH v14 27/42] btrfs: use ZONE_APPEND write for ZONED btrfs
Date: Tue, 26 Jan 2021 11:25:05 +0900

This commit enables zone append writing for zoned btrfs. When using zone
append, a bio is issued to the start of a target zone and the device
decides where to place it inside the zone. Upon completion the device
reports the actual written position back to the host.

Three parts are necessary to enable zone append in btrfs. First, modify
the bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust
the bi_sector to point to the beginning of the zone.

Second, record the returned physical address (and disk/partno) in the
ordered extent in end_bio_extent_writepage() after the bio completes. We
cannot resolve the physical address to the logical address there because
we can neither take locks nor allocate a buffer in this end_bio context.
So we record the physical address and resolve it later, in
btrfs_finish_ordered_io().

Finally, rewrite the logical addresses of the extent mapping and checksum
data according to the physical address (using __btrfs_rmap_block). If the
returned address matches the originally allocated address, we can skip
this rewriting step.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/extent_io.c    | 15 +++++++--
 fs/btrfs/file.c         |  6 +++-
 fs/btrfs/inode.c        |  4 +++
 fs/btrfs/ordered-data.c |  3 ++
 fs/btrfs/ordered-data.h |  8 +++++
 fs/btrfs/volumes.c      | 15 +++++++++
 fs/btrfs/zoned.c        | 73 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h        | 12 +++++++
 8 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6092ca6edc86..75df05193eb8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2734,6 +2734,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 	u64 start;
 	u64 end;
 	struct bvec_iter_all iter_all;
+	bool first_bvec = true;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
@@ -2760,6 +2761,11 @@ static void end_bio_extent_writepage(struct bio *bio)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
+		if (first_bvec) {
+			btrfs_record_physical_zoned(inode, start, bio);
+			first_bvec = false;
+		}
+
 		end_extent_writepage(page, error, start, end);
 		end_page_writeback(page);
 	}
@@ -3582,6 +3588,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	struct extent_map *em;
 	int ret = 0;
 	int nr = 0;
+	int opf = REQ_OP_WRITE;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
 	bool compressed;
 
@@ -3628,6 +3635,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		/* Note that em_end from extent_map_end() is exclusive */
 		iosize = min(em_end, end + 1) - cur;
+
+		if (btrfs_use_zone_append(inode, em))
+			opf = REQ_OP_ZONE_APPEND;
+
 		free_extent_map(em);
 		em = NULL;
 
@@ -3653,8 +3664,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 					 page->index, cur, end);
 		}
 
-		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, disk_bytenr, iosize,
+		ret = submit_extent_page(opf | write_flags, wbc, page,
+					 disk_bytenr, iosize,
 					 cur - page_offset(page), &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d81ae1f518f2..eaa1e473e75e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2174,8 +2174,12 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * commit waits for their completion, to avoid data loss if we fsync,
 	 * the current transaction commits before the ordered extents complete
 	 * and a power failure happens right after that.
+	 *
+	 * For zoned btrfs, if a write IO uses a ZONE_APPEND command, the
+	 * logical address recorded in the ordered extent may change. We
+	 * need to wait for the IO to stabilize the logical address.
 	 */
-	if (full_sync) {
+	if (full_sync || btrfs_is_zoned(fs_info)) {
 		ret = btrfs_wait_ordered_range(inode, start, len);
 	} else {
 		/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 419f4290bdf8..e3e4b4f7c0d7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -50,6 +50,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -2878,6 +2879,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
+	if (ordered_extent->disk)
+		btrfs_rewrite_logical_zoned(ordered_extent);
+
 	btrfs_free_io_failure_record(inode, start, end);
 
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 7c061146ead9..4d36cc87a3f7 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset
 	entry->compress_type = compress_type;
 	entry->truncated_len = (u64)-1;
 	entry->qgroup_rsv = ret;
+	entry->physical = (u64)-1;
+	entry->disk = NULL;
+	entry->partno = (u8)-1;
 
 	ASSERT(type == BTRFS_ORDERED_REGULAR ||
 	       type == BTRFS_ORDERED_NOCOW ||
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 
c400be75a3f1..6cb35df00d21 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -139,6 +139,14 @@ struct btrfs_ordered_extent { struct completion completion; struct btrfs_work flush_work; struct list_head work_list; + + /* + * used to reverse-map physical address returned from ZONE_APPEND + * write command in a workqueue context. + */ + u64 physical; + struct gendisk *disk; + u8 partno; }; /* diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e69754af2eba..4cb5e940356e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6507,6 +6507,21 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_io_bio(bio)->device = dev; bio->bi_end_io = btrfs_end_bio; bio->bi_iter.bi_sector = physical >> 9; + /* + * For zone append writing, bi_sector must point the beginning of the + * zone + */ + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (btrfs_dev_is_sequential(dev, physical)) { + u64 zone_start = round_down(physical, + fs_info->zone_size); + + bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; + } else { + bio->bi_opf &= ~REQ_OP_ZONE_APPEND; + bio->bi_opf |= REQ_OP_WRITE; + } + } btrfs_debug_in_rcu(fs_info, "btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u", bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector, diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index abe6b415de98..4f1801b71458 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1252,3 +1252,76 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em) return ret; } + +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio) +{ + struct btrfs_ordered_extent *ordered; + u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + + if (bio_op(bio) != REQ_OP_ZONE_APPEND) + return; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON(!ordered)) + return; + + ordered->physical = physical; + ordered->disk = bio->bi_disk; + ordered->partno = 
bio->bi_partno; + + btrfs_put_ordered_extent(ordered); +} + +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered) +{ + struct extent_map_tree *em_tree; + struct extent_map *em; + struct inode *inode = ordered->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_ordered_sum *sum; + struct block_device *bdev; + u64 orig_logical = ordered->disk_bytenr; + u64 *logical = NULL; + int nr, stripe_len; + + /* + * Zoned devices should not have partitions. So, we can assume it + * is 0. + */ + ASSERT(ordered->partno == 0); + bdev = bdgrab(ordered->disk->part0); + if (WARN_ON(!bdev)) + return; + + if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev, + ordered->physical, &logical, &nr, + &stripe_len))) + goto out; + + WARN_ON(nr != 1); + + if (orig_logical == *logical) + goto out; + + ordered->disk_bytenr = *logical; + + em_tree = &BTRFS_I(inode)->extent_tree; + write_lock(&em_tree->lock); + em = search_extent_mapping(em_tree, ordered->file_offset, + ordered->num_bytes); + em->block_start = *logical; + free_extent_map(em); + write_unlock(&em_tree->lock); + + list_for_each_entry(sum, &ordered->list, list) { + if (*logical < orig_logical) + sum->bytenr -= orig_logical - *logical; + else + sum->bytenr += *logical - orig_logical; + } + +out: + kfree(logical); + bdput(bdev); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 92888eb86055..cf420964305f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -47,6 +47,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio); +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct 
blk_zone *zone) @@ -139,6 +142,15 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em) { return false; } + +static inline void btrfs_record_physical_zoned(struct inode *inode, + u64 file_offset, struct bio *bio) +{ +} + +static inline void btrfs_rewrite_logical_zoned( + struct btrfs_ordered_extent *ordered) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Tue Jan 26 02:25:06 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 28/42] btrfs: enable zone append writing for direct IO
Date: Tue, 26 Jan 2021 11:25:06 +0900
Message-Id: <537b04a8699749b53431f875692c18bd1e7b379c.1611627788.git.naohiro.aota@wdc.com>

As with buffered IO, enable zone append writing for direct IO when it is used on a zoned block device.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e3e4b4f7c0d7..a9bf78eaed42 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7743,6 +7743,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_use_zone_append(BTRFS_I(inode), em)) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7969,6 +7972,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -8121,6 +8126,19 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + status = extract_ordered_extent(BTRFS_I(inode), bio, + file_offset); + if (status) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len;

From patchwork Tue Jan 26 02:25:07 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 29/42] btrfs: introduce dedicated data write path for ZONED mode
Date: Tue, 26 Jan 2021 11:25:07 +0900
Message-Id: <698bfc6446634e06a9399fa819d0f19aba3b4196.1611627788.git.naohiro.aota@wdc.com>

If more than one IO is issued for one file extent, these IOs can be written to separate regions on a device.
Since we cannot map one file extent to such separate areas, we need to follow the "one IO == one ordered extent" rule.

The normal buffered, uncompressed, not pre-allocated write path (used by cow_file_range()) sometimes does not follow this rule. It can write a part of an ordered extent when a region to write is specified, e.g. when it is called from fdatasync().

Introduce a dedicated (uncompressed, buffered) data write path for ZONED mode. This write path CoWs the region and writes it at once.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a9bf78eaed42..6d43aaa1f537 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1400,6 +1400,29 @@ static int cow_file_range_async(struct btrfs_inode *inode, return 0; } +static noinline int run_delalloc_zoned(struct btrfs_inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + int ret; + + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 0); + if (ret) + return ret; + + if (*page_started) + return 0; + + __set_page_dirty_nobuffers(locked_page); + account_page_redirty(locked_page); + extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + *page_started = 1; + + return 0; +} + static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes) { @@ -1879,17 +1902,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page { int ret; int force_cow = need_force_cow(inode, start, end); + const bool do_compress = inode_can_compress(inode) && + inode_need_compress(inode, start, end); + const bool zoned = btrfs_is_zoned(inode->root->fs_info); if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1,
nr_written); } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!inode_can_compress(inode) || - !inode_need_compress(inode, start, end)) { + } else if (!do_compress && !zoned) { ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); + } else if (!do_compress && zoned) { + ret = run_delalloc_zoned(inode, locked_page, start, end, + page_started, nr_written); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); ret = cow_file_range_async(inode, wbc, locked_page, start, end,

From patchwork Tue Jan 26 02:25:08 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe, Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik
Subject: [PATCH v14 30/42] btrfs: serialize meta IOs on ZONED mode
Date: Tue, 26 Jan 2021 11:25:08 +0900
Message-Id: <50c5a35ef64d4b6d58a1c928acceb5e40b09f523.1611627788.git.naohiro.aota@wdc.com>

We cannot use zone append for writing metadata, because the B-tree nodes refer to each other by logical address. Without knowing the address in advance, we cannot construct the tree in the first place. So we need to serialize metadata write IOs.

We cannot simply add a mutex around allocation and submission, because metadata blocks are allocated at an earlier stage to build up the B-trees. So, add a zoned_meta_io_lock and hold it during metadata IO submission in btree_write_cache_pages() to serialize IOs. Furthermore, this adds a per-block-group metadata IO submission pointer, "meta_write_pointer", to ensure sequential writing, which otherwise can be broken when writing back blocks in an unfinished transaction.
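The meta_write_pointer rule described above can be modeled in a few lines: a metadata block may only be submitted when it starts exactly at the block group's current write pointer, and on success the pointer advances past the block. This is a hypothetical standalone sketch whose names merely echo the patch, not the actual kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct btrfs_block_group: only the metadata
 * submission pointer matters for this sketch. */
struct block_group {
	uint64_t meta_write_pointer;
};

/* Return true and advance the pointer if the extent buffer at
 * [eb_start, eb_start + eb_len) is the next sequential write;
 * otherwise leave the pointer untouched so the caller can retry
 * the buffer later (e.g. after a transaction commit). */
bool check_meta_write_pointer(struct block_group *bg,
			      uint64_t eb_start, uint64_t eb_len)
{
	if (bg->meta_write_pointer != eb_start)
		return false;
	bg->meta_write_pointer = eb_start + eb_len;
	return true;
}
```

An in-order buffer is accepted and the pointer moves forward; an out-of-order buffer is rejected without side effects, mirroring the -EAGAIN/retry behavior in the patch.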
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.h | 1 + fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 1 + fs/btrfs/extent_io.c | 25 ++++++++++++++++++++- fs/btrfs/zoned.c | 50 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 32 +++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index a1d96c4cfa3b..19a22bf930c6 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -192,6 +192,7 @@ struct btrfs_block_group { */ u64 alloc_offset; u64 zone_unusable; + u64 meta_write_pointer; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6c4ff56eeb5e..37afe3f49045 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -975,6 +975,7 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ba0ca953f7e5..a41bdf9312d6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2704,6 +2704,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) mutex_init(&fs_info->delete_unused_bgs_mutex); mutex_init(&fs_info->reloc_mutex); mutex_init(&fs_info->delalloc_root_mutex); + mutex_init(&fs_info->zoned_meta_io_lock); seqlock_init(&fs_info->profiles_lock); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 75df05193eb8..8de609d1897a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -25,6 +25,7 @@ #include "backref.h" #include "disk-io.h" #include "zoned.h" +#include "block-group.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4075,6 +4076,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, struct extent_buffer 
**eb_context) { struct address_space *mapping = page->mapping; + struct btrfs_block_group *cache = NULL; struct extent_buffer *eb; int ret; @@ -4107,13 +4109,31 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, if (!ret) return 0; + if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) { + /* + * If for_sync, this hole will be filled with + * transaction commit. + */ + if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) + ret = -EAGAIN; + else + ret = 0; + free_extent_buffer(eb); + return ret; + } + *eb_context = eb; ret = lock_extent_buffer_for_io(eb, epd); if (ret <= 0) { + btrfs_revert_meta_write_pointer(cache, eb); + if (cache) + btrfs_put_block_group(cache); free_extent_buffer(eb); return ret; } + if (cache) + btrfs_put_block_group(cache); ret = write_one_eb(eb, wbc, epd); free_extent_buffer(eb); if (ret < 0) @@ -4159,6 +4179,7 @@ int btree_write_cache_pages(struct address_space *mapping, tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; + btrfs_zoned_meta_io_lock(fs_info); retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); @@ -4199,7 +4220,7 @@ int btree_write_cache_pages(struct address_space *mapping, } if (ret < 0) { end_write_bio(&epd, ret); - return ret; + goto out; } /* * If something went wrong, don't allow any metadata write bio to be @@ -4234,6 +4255,8 @@ int btree_write_cache_pages(struct address_space *mapping, ret = -EROFS; end_write_bio(&epd, ret); } +out: + btrfs_zoned_meta_io_unlock(fs_info); return ret; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 4f1801b71458..7cf7d74247c7 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1161,6 +1161,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) ret = -EIO; } + if (!ret) + cache->meta_write_pointer = cache->alloc_offset + cache->start; + kfree(alloc_offsets); free_extent_map(em); @@ -1325,3 +1328,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent
*ordered) kfree(logical); bdput(bdev); } + +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + struct btrfs_block_group *cache; + bool ret = true; + + if (!btrfs_is_zoned(fs_info)) + return true; + + cache = *cache_ret; + + if (cache && (eb->start < cache->start || + cache->start + cache->length <= eb->start)) { + btrfs_put_block_group(cache); + cache = NULL; + *cache_ret = NULL; + } + + if (!cache) + cache = btrfs_lookup_block_group(fs_info, eb->start); + + if (cache) { + if (cache->meta_write_pointer != eb->start) { + btrfs_put_block_group(cache); + cache = NULL; + ret = false; + } else { + cache->meta_write_pointer = eb->start + eb->len; + } + + *cache_ret = cache; + } + + return ret; +} + +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ + if (!btrfs_is_zoned(eb->fs_info) || !cache) + return; + + ASSERT(cache->meta_write_pointer == eb->start + eb->len); + cache->meta_write_pointer = eb->start; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index cf420964305f..a42e120158ab 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -50,6 +50,11 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, struct bio *bio); void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret); +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -151,6 +156,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode, static inline void btrfs_rewrite_logical_zoned( struct btrfs_ordered_extent *ordered) { } 
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + return true; +} + +static inline void btrfs_revert_meta_write_pointer( + struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -243,4 +261,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device, return true; } +static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_lock(&fs_info->zoned_meta_io_lock); +} + +static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_unlock(&fs_info->zoned_meta_io_lock); +} + #endif

From patchwork Tue Jan 26 02:25:09 2021
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 31/42] btrfs: wait existing extents before truncating Date: Tue, 26 Jan 2021 11:25:09 +0900 Message-Id: <43dc46a32f060ab1e76cb0a7d98e517de2a73356.1611627788.git.naohiro.aota@wdc.com> When truncating a file, file buffers which have already been allocated but not yet written may be truncated. Truncating these buffers could break the sequential write pattern in a block group if the truncated blocks are, for example, followed by blocks allocated to another file. To avoid this problem, always wait for write out of all unwritten buffers before proceeding with the truncate execution.
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6d43aaa1f537..5b8f97469964 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5171,6 +5171,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) btrfs_drew_write_unlock(&root->snapshot_lock); btrfs_end_transaction(trans); } else { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + if (btrfs_is_zoned(fs_info)) { + ret = btrfs_wait_ordered_range( + inode, + ALIGN(newsize, fs_info->sectorsize), + (u64)-1); + if (ret) + return ret; + } /* * We're truncating a file that used to have good data down to From patchwork Tue Jan 26 02:25:10 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048127
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 32/42] btrfs: avoid async metadata checksum on ZONED mode Date: Tue, 26 Jan 2021 11:25:10 +0900 Message-Id: <13728adcc4f433c928b00be73ea5466f62ccb4b9.1611627788.git.naohiro.aota@wdc.com> In ZONED mode, btrfs uses the per-FS zoned_meta_io_lock to serialize metadata write IOs. Even with this serialization, write bios sent from btree_write_cache_pages can be reordered by the async checksum workers, as these workers are per-CPU and not per-zone. To preserve write bio ordering, disable async metadata checksum on ZONED mode. This does not result in lower performance with HDDs, as a single CPU core is fast enough to checksum a single zone write stream at the maximum possible bandwidth of the device. If multiple zones are written simultaneously, HDD seek overhead lowers the achievable maximum bandwidth, so per-zone checksum serialization again does not affect performance.
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a41bdf9312d6..5d14100ecf72 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -814,6 +814,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio, static int check_async_write(struct btrfs_fs_info *fs_info, struct btrfs_inode *bi) { + if (btrfs_is_zoned(fs_info)) + return 0; if (atomic_read(&bi->sync_writers)) return 0; if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags)) From patchwork Tue Jan 26 02:25:11 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048125
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 33/42] btrfs: mark block groups to copy for device-replace Date: Tue, 26 Jan 2021 11:25:11 +0900 Message-Id: <6ccb85309fdc67f70909546165d960d315db4d4c.1611627788.git.naohiro.aota@wdc.com> This is the first of four patches to support device-replace in ZONED mode. We have two types of I/Os during the device-replace process. One is an I/O to "copy" (by the scrub functions) all the device extents on the source device to the destination device. The other one is an I/O to "clone" (by handle_ops_on_dev_replace()) new incoming write I/Os from users to the source device into the target device. Cloning incoming I/Os can break the sequential write rule on the target device. When a write is mapped to the middle of a block group, the cloned I/O lands in the middle of a target device zone, which breaks the sequential write rule. However, the cloning function cannot simply be disabled, since incoming I/Os targeting already copied device extents must be cloned so that the I/O is executed on the target device. We cannot use dev_replace->cursor_{left,right} to determine whether a bio targets a not-yet-copied region. Since there is a time gap between finishing btrfs_scrub_dev() and rewriting the mapping tree in btrfs_dev_replace_finishing(), we can have a newly allocated device extent which is never cloned nor copied. So the point is to copy only already existing device extents. This patch introduces mark_block_group_to_copy() to mark existing block groups as a target of copying. Then, handle_ops_on_dev_replace() and dev-replace can check the flag to do their jobs.
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.h | 1 + fs/btrfs/dev-replace.c | 182 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/dev-replace.h | 3 + fs/btrfs/scrub.c | 17 ++++ 4 files changed, 203 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 19a22bf930c6..3dec66ed36cb 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -95,6 +95,7 @@ struct btrfs_block_group { unsigned int iref:1; unsigned int has_caching_ctl:1; unsigned int removed:1; + unsigned int to_copy:1; int disk_cache_state; diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index bc73f798ce3a..b7f84fe45368 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -22,6 +22,7 @@ #include "dev-replace.h" #include "sysfs.h" #include "zoned.h" +#include "block-group.h" /* * Device replace overview @@ -459,6 +460,183 @@ static char* btrfs_dev_name(struct btrfs_device *device) return rcu_str_deref(device->name); } +static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info, + struct btrfs_device *src_dev) +{ + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_root *root = fs_info->dev_root; + struct btrfs_dev_extent *dev_extent = NULL; + struct btrfs_block_group *cache; + struct btrfs_trans_handle *trans; + int ret = 0; + u64 chunk_offset; + + /* Do not use "to_copy" on non-ZONED for now */ + if (!btrfs_is_zoned(fs_info)) + return 0; + + mutex_lock(&fs_info->chunk_mutex); + + /* Ensure we don't have pending new block group */ + spin_lock(&fs_info->trans_lock); + while (fs_info->running_transaction && + !list_empty(&fs_info->running_transaction->dev_update_list)) { + spin_unlock(&fs_info->trans_lock); + mutex_unlock(&fs_info->chunk_mutex); + trans = btrfs_attach_transaction(root); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + mutex_lock(&fs_info->chunk_mutex); + if (ret == -ENOENT) + continue; + else + goto unlock; + } + + ret = 
btrfs_commit_transaction(trans); + mutex_lock(&fs_info->chunk_mutex); + if (ret) + goto unlock; + + spin_lock(&fs_info->trans_lock); + } + spin_unlock(&fs_info->trans_lock); + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto unlock; + } + + path->reada = READA_FORWARD; + path->search_commit_root = 1; + path->skip_locking = 1; + + key.objectid = src_dev->devid; + key.offset = 0; + key.type = BTRFS_DEV_EXTENT_KEY; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + goto free_path; + if (ret > 0) { + if (path->slots[0] >= + btrfs_header_nritems(path->nodes[0])) { + ret = btrfs_next_leaf(root, path); + if (ret < 0) + goto free_path; + if (ret > 0) { + ret = 0; + goto free_path; + } + } else { + ret = 0; + } + } + + while (1) { + struct extent_buffer *l = path->nodes[0]; + int slot = path->slots[0]; + + btrfs_item_key_to_cpu(l, &found_key, slot); + + if (found_key.objectid != src_dev->devid) + break; + + if (found_key.type != BTRFS_DEV_EXTENT_KEY) + break; + + if (found_key.offset < key.offset) + break; + + dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent); + + chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent); + + cache = btrfs_lookup_block_group(fs_info, chunk_offset); + if (!cache) + goto skip; + + spin_lock(&cache->lock); + cache->to_copy = 1; + spin_unlock(&cache->lock); + + btrfs_put_block_group(cache); + +skip: + ret = btrfs_next_item(root, path); + if (ret != 0) { + if (ret > 0) + ret = 0; + break; + } + } + +free_path: + btrfs_free_path(path); +unlock: + mutex_unlock(&fs_info->chunk_mutex); + + return ret; +} + +bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev, + struct btrfs_block_group *cache, + u64 physical) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map *em; + struct map_lookup *map; + u64 chunk_offset = cache->start; + int num_extents, cur_extent; + int i; + + /* Do not use "to_copy" on non-ZONED for now */ + if (!btrfs_is_zoned(fs_info)) + return 
true; + + spin_lock(&cache->lock); + if (cache->removed) { + spin_unlock(&cache->lock); + return true; + } + spin_unlock(&cache->lock); + + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1); + ASSERT(!IS_ERR(em)); + map = em->map_lookup; + + num_extents = cur_extent = 0; + for (i = 0; i < map->num_stripes; i++) { + /* We have more device extent to copy */ + if (srcdev != map->stripes[i].dev) + continue; + + num_extents++; + if (physical == map->stripes[i].physical) + cur_extent = i; + } + + free_extent_map(em); + + if (num_extents > 1 && cur_extent < num_extents - 1) { + /* + * Has more stripes on this device. Keep this BG + * readonly until we finish all the stripes. + */ + return false; + } + + /* Last stripe on this device */ + spin_lock(&cache->lock); + cache->to_copy = 0; + spin_unlock(&cache->lock); + + return true; +} + static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info, const char *tgtdev_name, u64 srcdevid, const char *srcdev_name, int read_src) @@ -500,6 +678,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info, if (ret) return ret; + ret = mark_block_group_to_copy(fs_info, src_device); + if (ret) + return ret; + down_write(&dev_replace->rwsem); switch (dev_replace->replace_state) { case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED: diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h index 60b70dacc299..3911049a5f23 100644 --- a/fs/btrfs/dev-replace.h +++ b/fs/btrfs/dev-replace.h @@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info); void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info); int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info); int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace); +bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev, + struct btrfs_block_group *cache, + u64 physical); #endif diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 3a0a6b8ed6f2..b57c1184f330 100644 --- a/fs/btrfs/scrub.c +++ 
b/fs/btrfs/scrub.c @@ -3564,6 +3564,17 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, if (!cache) goto skip; + + if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) { + spin_lock(&cache->lock); + if (!cache->to_copy) { + spin_unlock(&cache->lock); + ro_set = 0; + goto done; + } + spin_unlock(&cache->lock); + } + /* * Make sure that while we are scrubbing the corresponding block * group doesn't get its logical address and its device extents @@ -3695,6 +3706,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, scrub_pause_off(fs_info); + if (sctx->is_dev_replace && + !btrfs_finish_block_group_to_copy(dev_replace->srcdev, + cache, found_key.offset)) + ro_set = 0; + +done: down_write(&dev_replace->rwsem); dev_replace->cursor_left = dev_replace->cursor_right; dev_replace->item_needs_writeback = 1; From patchwork Tue Jan 26 02:25:12 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048123
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 34/42] btrfs: implement cloning for ZONED device-replace Date: Tue, 26 Jan 2021 11:25:12 +0900 This is the second of four patches to implement device-replace for ZONED mode. On zoned mode, a block group must be either copied (from the source device to the destination device) or cloned (to both devices). This commit implements the cloning part. If a block group targeted by an IO is marked to copy, we should not clone the IO to the destination device, because the block group will eventually be copied by the replace process. This commit also handles cloning of device zone resets.
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++----------- fs/btrfs/volumes.c | 33 ++++++++++++++++++++++-- fs/btrfs/zoned.c | 11 ++++++++ 3 files changed, 84 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4c126e4ada27..f73f39bd68c0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -35,6 +35,7 @@ #include "discard.h" #include "rcu-string.h" #include "zoned.h" +#include "dev-replace.h" #undef SCRAMBLE_DELAYED_REFS @@ -1265,6 +1266,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len, return ret; } +static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes) +{ + struct btrfs_device *dev = stripe->dev; + struct btrfs_fs_info *fs_info = dev->fs_info; + struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; + u64 phys = stripe->physical; + u64 len = stripe->length; + u64 discarded = 0; + int ret = 0; + + /* Zone reset in ZONED mode */ + if (btrfs_can_zone_reset(dev, phys, len)) { + u64 src_disc; + + ret = btrfs_reset_device_zone(dev, phys, len, &discarded); + if (ret) + goto out; + + if (!btrfs_dev_replace_is_ongoing(dev_replace) || + dev != dev_replace->srcdev) + goto out; + + src_disc = discarded; + + /* send to replace target as well */ + ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len, + &discarded); + discarded += src_disc; + } else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) { + ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded); + } else { + ret = 0; + *bytes = 0; + } + +out: + *bytes = discarded; + return ret; +} + int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes, u64 *actual_bytes) { @@ -1298,28 +1339,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { - struct btrfs_device *dev = stripe->dev; - u64 physical = 
stripe->physical; - u64 length = stripe->length; u64 bytes; - struct request_queue *req_q; if (!stripe->dev->bdev) { ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } - req_q = bdev_get_queue(stripe->dev->bdev); - /* Zone reset in ZONED mode */ - if (btrfs_can_zone_reset(dev, physical, length)) - ret = btrfs_reset_device_zone(dev, physical, - length, &bytes); - else if (blk_queue_discard(req_q)) - ret = btrfs_issue_discard(dev->bdev, physical, - length, &bytes); - else - continue; - + ret = do_discard_extent(stripe, &bytes); if (!ret) { discarded_bytes += bytes; } else if (ret != -EOPNOTSUPP) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4cb5e940356e..a99735dda515 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5973,9 +5973,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info, return ret; } +static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + bool ret; + + /* non-ZONED mode does not use "to_copy" flag */ + if (!btrfs_is_zoned(fs_info)) + return false; + + cache = btrfs_lookup_block_group(fs_info, logical); + + spin_lock(&cache->lock); + ret = cache->to_copy; + spin_unlock(&cache->lock); + + btrfs_put_block_group(cache); + return ret; +} + static void handle_ops_on_dev_replace(enum btrfs_map_op op, struct btrfs_bio **bbio_ret, struct btrfs_dev_replace *dev_replace, + u64 logical, int *num_stripes_ret, int *max_errors_ret) { struct btrfs_bio *bbio = *bbio_ret; @@ -5988,6 +6008,15 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op, if (op == BTRFS_MAP_WRITE) { int index_where_to_add; + /* + * a block group which have "to_copy" set will + * eventually copied by dev-replace process. We can + * avoid cloning IO here. + */ + if (is_block_group_to_copy(dev_replace->srcdev->fs_info, + logical)) + return; + /* * duplicate the write operations while the dev replace * procedure is running. 
Since the copying of the old disk to @@ -6383,8 +6412,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL && need_full_stripe(op)) { - handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes, - &max_errors); + handle_ops_on_dev_replace(op, &bbio, dev_replace, logical, + &num_stripes, &max_errors); } *bbio_ret = bbio; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 7cf7d74247c7..462a6337d460 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -11,6 +11,7 @@ #include "disk-io.h" #include "block-group.h" #include "transaction.h" +#include "dev-replace.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1039,6 +1040,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) for (i = 0; i < map->num_stripes; i++) { bool is_sequential; struct blk_zone zone; + struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; + int dev_replace_is_ongoing = 0; device = map->stripes[i].dev; physical = map->stripes[i].physical; @@ -1065,6 +1068,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) */ btrfs_dev_clear_zone_empty(device, physical); + down_read(&dev_replace->rwsem); + dev_replace_is_ongoing = + btrfs_dev_replace_is_ongoing(dev_replace); + if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL) + btrfs_dev_clear_zone_empty(dev_replace->tgtdev, + physical); + up_read(&dev_replace->rwsem); + /* * The group is mapped to a sequential zone. Get the zone write * pointer to determine the allocation offset within the zone. 
From patchwork Tue Jan 26 02:25:13 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045419
FQVkw7OftoGPFSe13cW/7FgZGgR/U2On/GKGua6Uyo0mz/3bYDTEzPmeEUgaRLmbwJrcybifh7 i3f28nP3MUnzekX5/KMtpl/8pyzK7KFR3qo45MT9VjtJZaUzGu+jIvFsr2n7l1mkk0vONP2Wpn BkypSP1G2+JYmN/U47QR7F0HvvDTLu7jjcPXmz/kP5VRvs6tbTXoXq1OenocWYlyjDpSFSTfoN 3c8= X-IronPort-AV: E=Sophos;i="5.79,375,1602518400"; d="scan'208";a="159483576" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Jan 2021 10:26:58 +0800 IronPort-SDR: ifMP5iUvCT+RKUtBBmVMvepVNdWNGmN+VxbjkzM+W6V3WUeArJYKfBFgHOowqQViuKTGQiza4v a+rF2sdN2DnC6fNXVSjbLxf3M3v5/0i2VgOFeAF5we3rpgGoozWy0JN2ma3zU+bpQc2/8y1Srs ihu4P9IH+I94KDFgJHxmNXYQdC4yRV952sSlXbhFOUkeWxUQWkQmBIanksH2CWivOl1R2C7cxC YgNOrDZnqAxpnDzEL/w4YrCQl34dTrekMZmJ1sex8t1AEdpZOiA6hjnZXNynoNOkBp1JpyUsF+ flDB/+VKzsXP8NXY8cnOsZVJ Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 18:11:25 -0800 IronPort-SDR: kQsRC8cwdpex/13xB54Wq1+UsLX2P2pk9IEutuzOwb8WUKEAvBGYnRmu3luTuLp+bho3aIsHVq jv6cnDShhvSlsqEvxampC5cSdid8Clmy7Q9/S6JBUyD6oqphApkqjdnHJuyfbSJ1MuuXVr1hQD hO7iUFxtstrpOgEFp7pHxSe+BZUDJf6UJO45RFNPCjuGxPZgZllbN/Lm65Jhf6pfsDQQ8uYOcO WaSlYUNMjoWGMqEX/P/kCL9MtkKBDM1MGilrPeZGLEeZVVbDRr/92SvvDsaLiKzLiGFMf8OBE/ GtY= WDCIronportException: Internal Received: from naota.dhcp.fujisawa.hgst.com ([10.149.52.155]) by uls-op-cesaip02.wdc.com with ESMTP; 25 Jan 2021 18:26:57 -0800 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. 
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 35/42] btrfs: implement copying for ZONED device-replace Date: Tue, 26 Jan 2021 11:25:13 +0900 Message-Id: <3e67e5199d8472decccb96a10844d5265e4daaa9.1611627788.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the third of four patches to implement device-replace on ZONED mode. This commit implements the copying itself, tracking the write pointer during the device-replace process. Since device-replace copies only the used extents on the source device, we have to zero-fill the gaps between them to honor the sequential write rule on the target device. A device-replace process in ZONED mode must copy or clone all the extents in the source device exactly once, so we need to ensure that allocations started just before the dev-replace process have their corresponding extent information in the B-trees. finish_extent_writes_for_zoned() implements that functionality; it is essentially the code removed in commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after error during device replace").
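The zero-filling rule described above can be sketched in userspace. This is a minimal model of the write-pointer constraint, assuming a flat byte-addressed target; model_fill_gap() plays the role the patch gives to fill_writer_pointer_gap()/btrfs_zoned_issue_zeroout(), and none of these names are the real btrfs functions:

```c
#include <stdint.h>
#include <string.h>

#define MODEL_DEV_SIZE 4096

/* Illustrative model of a sequential-write-only target device. */
struct model_target {
    uint8_t data[MODEL_DEV_SIZE];
    uint64_t write_pointer;   /* next writable byte */
};

/* Zero-fill from the write pointer up to @physical (no backward seeks). */
static int model_fill_gap(struct model_target *t, uint64_t physical)
{
    if (physical > MODEL_DEV_SIZE)
        return -1;
    if (t->write_pointer < physical) {
        memset(t->data + t->write_pointer, 0, physical - t->write_pointer);
        t->write_pointer = physical;
    }
    return 0;
}

/* Copy one used extent at @physical, filling any preceding gap first. */
static int model_copy_extent(struct model_target *t, uint64_t physical,
                             const uint8_t *buf, uint64_t len)
{
    if (physical < t->write_pointer || physical + len > MODEL_DEV_SIZE)
        return -1;            /* would violate the sequential rule */
    if (model_fill_gap(t, physical))
        return -1;
    memcpy(t->data + physical, buf, len);
    t->write_pointer = physical + len;
    return 0;
}
```

The point of the model: because only used extents are copied, the target would otherwise be asked to write at positions ahead of its write pointer, which a sequential zone forbids, so the gap is made valid by writing zeros.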
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.c | 12 +++++++ fs/btrfs/zoned.h | 8 +++++ 3 files changed, 106 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b57c1184f330..b03c3629fb12 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -166,6 +166,7 @@ struct scrub_ctx { int pages_per_rd_bio; int is_dev_replace; + u64 write_pointer; struct scrub_bio *wr_curr_bio; struct mutex wr_lock; @@ -1619,6 +1620,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock, return scrub_add_page_to_wr_bio(sblock->sctx, spage); } +static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) +{ + int ret = 0; + u64 length; + + if (!btrfs_is_zoned(sctx->fs_info)) + return 0; + + if (sctx->write_pointer < physical) { + length = physical - sctx->write_pointer; + + ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev, + sctx->write_pointer, length); + if (!ret) + sctx->write_pointer = physical; + } + return ret; +} + static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, struct scrub_page *spage) { @@ -1641,6 +1661,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, if (sbio->page_count == 0) { struct bio *bio; + ret = fill_writer_pointer_gap(sctx, + spage->physical_for_dev_replace); + if (ret) { + mutex_unlock(&sctx->wr_lock); + return ret; + } + sbio->physical = spage->physical_for_dev_replace; sbio->logical = spage->logical; sbio->dev = sctx->wr_tgtdev; @@ -1705,6 +1732,10 @@ static void scrub_wr_submit(struct scrub_ctx *sctx) * doubled the write performance on spinning disks when measured * with Linux 3.5 */ btrfsic_submit_bio(sbio->bio); + + if (btrfs_is_zoned(sctx->fs_info)) + sctx->write_pointer = sbio->physical + + sbio->page_count * PAGE_SIZE; } static void scrub_wr_bio_end_io(struct bio *bio) @@ -3028,6 +3059,21 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return ret < 0 ? 
ret : 0; } +static void sync_replace_for_zoned(struct scrub_ctx *sctx) +{ + if (!btrfs_is_zoned(sctx->fs_info)) + return; + + sctx->flush_all_writes = true; + scrub_submit(sctx); + mutex_lock(&sctx->wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_lock); + + wait_event(sctx->list_wait, + atomic_read(&sctx->bios_in_flight) == 0); +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3168,6 +3214,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, */ blk_start_plug(&plug); + if (sctx->is_dev_replace && + btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) { + mutex_lock(&sctx->wr_lock); + sctx->write_pointer = physical; + mutex_unlock(&sctx->wr_lock); + sctx->flush_all_writes = true; + } + /* * now find all extents for each stripe and scrub them */ @@ -3356,6 +3410,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) goto out; + if (sctx->is_dev_replace) + sync_replace_for_zoned(sctx); + if (extent_logical + extent_len < key.objectid + bytes) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -3478,6 +3535,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx, return ret; } +static int finish_extent_writes_for_zoned(struct btrfs_root *root, + struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_trans_handle *trans; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + btrfs_wait_block_group_reservations(cache); + btrfs_wait_nocow_writers(cache); + btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + return btrfs_commit_transaction(trans); +} + static noinline_for_stack int scrub_enumerate_chunks(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev, u64 start, u64 end) @@ -3633,6 +3709,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * group is not RO. 
*/ ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace); + if (!ret && sctx->is_dev_replace) { + ret = finish_extent_writes_for_zoned(root, cache); + if (ret) { + btrfs_dec_block_group_ro(cache); + scrub_pause_off(fs_info); + btrfs_put_block_group(cache); + break; + } + } + if (ret == 0) { ro_set = 1; } else if (ret == -ENOSPC && !sctx->is_dev_replace) { diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 462a6337d460..ecda55474c20 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1386,3 +1386,15 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, ASSERT(cache->meta_write_pointer == eb->start + eb->len); cache->meta_write_pointer = eb->start; } + +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length) +{ + if (!btrfs_dev_is_sequential(device, physical)) + return -EOPNOTSUPP; + + return blkdev_issue_zeroout(device->bdev, + physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, + GFP_NOFS, 0); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a42e120158ab..a9698470c08e 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -55,6 +55,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct btrfs_block_group **cache_ret); void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, + u64 length); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -169,6 +171,12 @@ static inline void btrfs_revert_meta_write_pointer( { } +static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, + u64 physical, u64 length) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Tue Jan 26 02:25:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048121
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 36/42] btrfs: support dev-replace in ZONED mode Date: Tue, 26 Jan 2021 11:25:14 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the fourth of four patches to implement device-replace on ZONED mode. Even after the copying is done, the write pointers of the source device and the destination device may not be synchronized.
For example, when the last allocated extent is freed before the device-replace process, the extent is not copied, leaving a hole there. This patch synchronizes the write pointers by writing zeros to the destination device. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 39 +++++++++++++++++++++++++ fs/btrfs/zoned.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 ++++++ 3 files changed, 122 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index b03c3629fb12..2f577f3b1c31 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1628,6 +1628,9 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) if (!btrfs_is_zoned(sctx->fs_info)) return 0; + if (!btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) + return 0; + if (sctx->write_pointer < physical) { length = physical - sctx->write_pointer; @@ -3074,6 +3077,31 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, "failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3480,6 +3508,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + 
int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index ecda55474c20..730a56f1e09f 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1398,3 +1399,76 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +/* + * Synchronize write pointer in a zone at @physical_start on @tgt_dev, by + * filling zeros between @physical_pos to a write pointer of dev-replace + * source device. 
+ */ +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index a9698470c08e..8c203c0425e0 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -57,6 +57,8 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -177,6 +179,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From patchwork Tue Jan 26 02:25:15 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048117
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 37/42] btrfs: enable relocation in ZONED mode Date: Tue, 26 Jan 2021 11:25:15 +0900 Message-Id: <54d1e6732ba8f867e6dbbc8c26b38eb1006e405f.1611627788.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org To serialize allocation and submit_bio, we introduced a mutex around them. As a result, preallocation must be completely disabled to avoid a deadlock. Since the current relocation process relies on preallocation to move file data extents, it must be handled in another way. In ZONED mode, we just truncate the inode to the size that we wanted to preallocate. Then, we flush the dirty pages on the file before finishing the relocation process.
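The truncate-instead-of-preallocate idea above can be modeled in a few lines. This is a hedged sketch, assuming a toy inode with only a logical size and a preallocated-bytes counter (illustrative fields, not the btrfs_inode layout):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of an inode being prepared for relocation. */
struct model_inode {
    uint64_t i_size;     /* logical file size */
    uint64_t allocated;  /* preallocated bytes */
};

/*
 * On regular devices, relocation preallocates the cluster up front.
 * On zoned devices preallocation would deadlock against the
 * allocation/submit mutex, so only i_size is extended; delalloc
 * writeback performs the real allocation later.
 */
static void model_prepare_cluster(struct model_inode *inode, uint64_t end,
                                  bool zoned)
{
    if (zoned) {
        inode->i_size = end;   /* truncate up, no allocation yet */
        return;
    }
    inode->allocated = end;    /* preallocate the whole cluster */
    inode->i_size = end;
}
```

The design choice this illustrates: both paths end with the same i_size, but the zoned path defers all block allocation to writeback, which is the only place the allocation/submit ordering can be kept sequential.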
run_delalloc_zoned() will handle all the allocation and submit IOs to the underlying layers. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/relocation.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9f2289bcdde6..702986b83f6c 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2555,6 +2555,31 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * In ZONED mode, we cannot preallocate the file region. Instead, we + * dirty and fiemap_write the region. + */ + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2751,6 +2776,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3429,8 +3456,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3445,8 +3476,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, 
btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); btrfs_set_inode_mode(leaf, item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path);
From patchwork Tue Jan 26 02:25:16 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12048119
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik Subject: [PATCH v14 38/42] btrfs: relocate block group to repair IO failure in ZONED Date: Tue, 26 Jan 2021 11:25:16 +0900 Message-Id: <09a953f3fc7f068588250ecee58a529f04279df7.1611627788.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org When btrfs finds a checksum error and the file system has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it to the damaged blocks. This repairing, however, violates the sequential write rule. We can consider three methods to repair an IO failure in ZONED mode: (1) Reset and rewrite the damaged zone (2) Allocate a new device extent and replace the damaged device extent with the new one (3) Relocate the corresponding block group Method (1) is most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, and so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing, but done within the same device. It is safe because it keeps the device extent until the replacing finishes. However, extending device replacing this way is non-trivial. It assumes "src_dev->physical == dst_dev->physical". Also, the extent mapping replacing function would have to be extended to support replacing a device extent position within one device. Method (3) invokes relocation of the damaged block group, so it is straightforward to implement. It relocates all the mirrored device extents, so it is, potentially, a more costly operation than method (1) or (2). But it relocates only the used extents, which reduces the total IO size. Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2). To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to show that it is now being relocated to repair IO failures.
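The "relocating_repair" bit is a test-and-set guard: the first IO error claims the block group and starts the repair, later errors see the bit already set and return. A minimal sketch of that pattern (illustrative types; in btrfs the check runs under cache->lock):

```c
#include <stdbool.h>

/* Illustrative model of a block group carrying the repair guard bit. */
struct model_block_group {
    bool relocating_repair;
};

/*
 * Returns true if the caller claimed the repair and should kick off
 * the relocation worker; false if a repair is already in flight.
 */
static bool model_claim_repair(struct model_block_group *bg)
{
    if (bg->relocating_repair)
        return false;
    bg->relocating_repair = true;
    return true;
}
```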
It also uses a new kthread, "btrfs-relocating-repair", so as not to block the IO path with the relocating process. This commit also supports repairing in the scrub process. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 1 + fs/btrfs/extent_io.c | 3 ++ fs/btrfs/scrub.c | 3 ++ fs/btrfs/volumes.c | 71 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + 5 files changed, 79 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 3dec66ed36cb..36654bcd2a83 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -96,6 +96,7 @@ struct btrfs_block_group { unsigned int has_caching_ctl:1; unsigned int removed:1; unsigned int to_copy:1; + unsigned int relocating_repair:1; int disk_cache_state; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8de609d1897a..49c7bf78c82e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2259,6 +2259,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); + if (btrfs_is_zoned(fs_info)) + return btrfs_repair_one_zone(fs_info, logical); + bio = btrfs_io_bio_alloc(1); bio->bi_iter.bi_size = 0; map_length = length; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 2f577f3b1c31..d0c47ef72d46 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) have_csum = sblock_to_check->pagev[0]->have_csum; dev = sblock_to_check->pagev[0]->dev; + if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace) + return btrfs_repair_one_zone(fs_info, logical); + /* * We must use GFP_NOFS because the scrub task might be waiting for a * worker task executing this function and in turn a transaction commit diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a99735dda515..0f6a79e67666 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7990,3 +7990,74 @@ bool 
btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) spin_unlock(&fs_info->swapfile_pins_lock); return node != NULL; } + +static int relocating_repair_kthread(void *data) +{ + struct btrfs_block_group *cache = (struct btrfs_block_group *) data; + struct btrfs_fs_info *fs_info = cache->fs_info; + u64 target; + int ret = 0; + + target = cache->start; + btrfs_put_block_group(cache); + + if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) { + btrfs_info(fs_info, + "zoned: skip relocating block group %llu to repair: EBUSY", + target); + return -EBUSY; + } + + mutex_lock(&fs_info->delete_unused_bgs_mutex); + + /* Ensure Block Group still exists */ + cache = btrfs_lookup_block_group(fs_info, target); + if (!cache) + goto out; + + if (!cache->relocating_repair) + goto out; + + ret = btrfs_may_alloc_data_chunk(fs_info, target); + if (ret < 0) + goto out; + + btrfs_info(fs_info, "zoned: relocating block group %llu to repair IO failure", + target); + ret = btrfs_relocate_chunk(fs_info, target); + +out: + if (cache) + btrfs_put_block_group(cache); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_exclop_finish(fs_info); + + return ret; +} + +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + + /* Do not attempt to repair in degraded state */ + if (btrfs_test_opt(fs_info, DEGRADED)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache) + return 0; + + spin_lock(&cache->lock); + if (cache->relocating_repair) { + spin_unlock(&cache->lock); + btrfs_put_block_group(cache); + return 0; + } + cache->relocating_repair = 1; + spin_unlock(&cache->lock); + + kthread_run(relocating_repair_kthread, cache, + "btrfs-relocating-repair"); + + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0bcf87a9e594..54f475e0c702 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -597,5 +597,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, int 
btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); #endif From patchwork Tue Jan 26 02:25:17 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048113 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v14 39/42] btrfs: split alloc_log_tree() Date: Tue, 26 Jan 2021 11:25:17 +0900 Message-Id: <183f68ab886fde72dcaacacf4619d59f44dfab8d.1611627788.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is a preparation for the next patch. This commit splits alloc_log_tree() into the part that allocates the tree structure (which remains in alloc_log_tree()) and the part that allocates the tree node (moved into btrfs_alloc_log_tree_node()). The latter part is also exported so it can be used by the next patch. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 5d14100ecf72..2e2f09a46f45 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1197,7 +1197,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *root; - struct extent_buffer *leaf; root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS); if (!root) @@ -1207,6 +1206,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, root->root_key.type = BTRFS_ROOT_ITEM_KEY; root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; + return root; +} + +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct extent_buffer *leaf; + /* * DON'T set SHAREABLE bit for log trees.
* @@ -1219,26 +1226,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); - if (IS_ERR(leaf)) { - btrfs_put_root(root); - return ERR_CAST(leaf); - } + if (IS_ERR(leaf)) + return PTR_ERR(leaf); root->node = leaf; btrfs_mark_buffer_dirty(root->node); btrfs_tree_unlock(root->node); - return root; + + return 0; } int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; @@ -1250,11 +1264,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log_root; struct btrfs_inode_item *inode_item; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + log_root->last_trans = trans->transid; log_root->root_key.offset = root->root_key.objectid; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9f4a2a1e3d36..0e7e9526b6a8 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -120,6 +120,8 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, From patchwork 
Tue Jan 26 02:25:18 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12048115 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J.
Wong" , Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v14 40/42] btrfs: extend zoned allocator to use dedicated tree-log block group Date: Tue, 26 Jan 2021 11:25:18 +0900 Message-Id: <932487e490aed6797414f34f225b190668020db7.1611627788.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 1/3 patch to enable tree log on ZONED mode. The tree-log feature does not work on ZONED mode as is. Blocks for a tree-log tree are allocated mixed with other metadata blocks, and btrfs writes and syncs the tree-log blocks to devices at the time of fsync(), which differs from the timing of a global transaction commit. As a result, both writing tree-log blocks and writing other metadata blocks become non-sequential writes, which ZONED mode must avoid. We can introduce a dedicated block group for tree-log blocks, so that tree-log blocks and other metadata blocks can be separate write streams. As a result, each write stream can now be written to devices separately. "fs_info->treelog_bg" tracks the dedicated block group, and btrfs assigns "treelog_bg" on demand at tree-log block allocation time. This commit extends the zoned block allocator to use the block group.
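[Editor's illustration] The routing rule the patch adds to the zoned allocator — once a dedicated tree-log block group is claimed, tree-log allocations may only land in it and every other allocation must stay out — can be sketched as a small userspace model. Everything below (model_fs_info, skip_block_group, maybe_claim_treelog_bg) is a hypothetical stand-in, not the kernel API, and the treelog_bg_lock spinlock is omitted for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the fs_info->treelog_bg routing rule. */
struct model_fs_info {
	/* Start of the dedicated tree-log block group; 0 when unset. */
	uint64_t treelog_bg;
};

/*
 * Mirrors the "skip" test in do_allocation_zoned(): once a dedicated
 * group is claimed, tree-log allocations must land in it and all other
 * allocations must stay out of it.
 */
bool skip_block_group(const struct model_fs_info *fs, uint64_t bg_start,
		      bool for_treelog)
{
	uint64_t log_bytenr = fs->treelog_bg;

	if (!log_bytenr)
		return false;	/* no dedicated group claimed yet */
	return (for_treelog && bg_start != log_bytenr) ||
	       (!for_treelog && bg_start == log_bytenr);
}

/* The first tree-log allocation claims the group on demand. */
void maybe_claim_treelog_bg(struct model_fs_info *fs, uint64_t bg_start,
			    bool for_treelog)
{
	if (for_treelog && !fs->treelog_bg)
		fs->treelog_bg = bg_start;
}
```

With this rule, the tree-log write stream and the ordinary metadata stream never interleave within one block group, which is what keeps each stream sequential.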
Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 2 ++ fs/btrfs/ctree.h | 2 ++ fs/btrfs/disk-io.c | 1 + fs/btrfs/extent-tree.c | 75 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 14 ++++++++ 5 files changed, 90 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 7facc4439116..7f90e2fd39a5 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -901,6 +901,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, btrfs_return_cluster_to_free_space(block_group, cluster); spin_unlock(&cluster->refill_lock); + btrfs_clear_treelog_bg(block_group); + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 37afe3f49045..7c04f33b3e44 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -976,6 +976,8 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; struct mutex zoned_meta_io_lock; + spinlock_t treelog_bg_lock; + u64 treelog_bg; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2e2f09a46f45..c3b5cfe4d928 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2722,6 +2722,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) spin_lock_init(&fs_info->super_lock); spin_lock_init(&fs_info->buffer_lock); spin_lock_init(&fs_info->unused_bgs_lock); + spin_lock_init(&fs_info->treelog_bg_lock); rwlock_init(&fs_info->tree_mod_log_lock); mutex_init(&fs_info->unused_bg_unpin_mutex); mutex_init(&fs_info->delete_unused_bgs_mutex); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f73f39bd68c0..7ba91b175450 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3522,6 +3522,9 @@ struct find_free_extent_ctl { bool have_caching_bg; bool orig_have_caching_bg; + /* Allocation is called for tree-log */ + bool for_treelog; + /* 
RAID index, converted from flags */ int index; @@ -3750,6 +3753,22 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Tree-log Block Group Locking + * ============================ + * + * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which + * indicates the starting address of a block group, which is reserved only + * for tree-log metadata. + * + * Lock nesting + * ============ + * + * space_info::lock + * block_group::lock + * fs_info::treelog_bg_lock + */ + /* * Simple allocator for sequential only block group. It only allows * sequential allocation. No need to play with trees. This function @@ -3759,23 +3778,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) { + struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_space_info *space_info = block_group->space_info; struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; u64 start = block_group->start; u64 num_bytes = ffe_ctl->num_bytes; u64 avail; + u64 bytenr = block_group->start; + u64 log_bytenr; int ret = 0; + bool skip; ASSERT(btrfs_is_zoned(block_group->fs_info)); + /* + * Do not allow non-tree-log blocks in the dedicated tree-log block + * group, and vice versa. + */ + spin_lock(&fs_info->treelog_bg_lock); + log_bytenr = fs_info->treelog_bg; + skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) || + (!ffe_ctl->for_treelog && bytenr == log_bytenr)); + spin_unlock(&fs_info->treelog_bg_lock); + if (skip) + return 1; + spin_lock(&space_info->lock); spin_lock(&block_group->lock); + spin_lock(&fs_info->treelog_bg_lock); + + ASSERT(!ffe_ctl->for_treelog || + block_group->start == fs_info->treelog_bg || + fs_info->treelog_bg == 0); if (block_group->ro) { ret = 1; goto out; } + /* + * Do not allow currently using block group to be tree-log dedicated + * block group. 
+ */ + if (ffe_ctl->for_treelog && !fs_info->treelog_bg && + (block_group->used || block_group->reserved)) { + ret = 1; + goto out; + } + avail = block_group->length - block_group->alloc_offset; if (avail < num_bytes) { if (ffe_ctl->max_extent_size < avail) { @@ -3790,6 +3840,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, goto out; } + if (ffe_ctl->for_treelog && !fs_info->treelog_bg) + fs_info->treelog_bg = block_group->start; + ffe_ctl->found_offset = start + block_group->alloc_offset; block_group->alloc_offset += num_bytes; spin_lock(&ctl->tree_lock); @@ -3804,6 +3857,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, ffe_ctl->search_start = ffe_ctl->found_offset; out: + if (ret && ffe_ctl->for_treelog) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); return ret; @@ -4053,7 +4109,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); case BTRFS_EXTENT_ALLOC_ZONED: - /* nothing to do */ + if (ffe_ctl->for_treelog) { + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg) + ffe_ctl->hint_byte = fs_info->treelog_bg; + spin_unlock(&fs_info->treelog_bg_lock); + } return 0; default: BUG(); @@ -4097,6 +4158,7 @@ static noinline int find_free_extent(struct btrfs_root *root, struct find_free_extent_ctl ffe_ctl = {0}; struct btrfs_space_info *space_info; bool full_search = false; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; WARN_ON(num_bytes < fs_info->sectorsize); @@ -4110,6 +4172,7 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.orig_have_caching_bg = false; ffe_ctl.found_offset = 0; ffe_ctl.hint_byte = hint_byte_orig; + ffe_ctl.for_treelog = for_treelog; ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED; /* For clustered allocation */ @@ -4184,8 +4247,11 @@ static noinline int 
find_free_extent(struct btrfs_root *root, struct btrfs_block_group *bg_ret; /* If the block group is read-only, we can skip it entirely. */ - if (unlikely(block_group->ro)) + if (unlikely(block_group->ro)) { + if (for_treelog) + btrfs_clear_treelog_bg(block_group); continue; + } btrfs_grab_block_group(block_group, delalloc); ffe_ctl.search_start = block_group->start; @@ -4373,6 +4439,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, bool final_tried = num_bytes == min_alloc_size; u64 flags; int ret; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; flags = get_alloc_profile_by_root(root, is_data); again: @@ -4396,8 +4463,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, sinfo = btrfs_find_space_info(fs_info, flags); btrfs_err(fs_info, - "allocation failed flags %llu, wanted %llu", - flags, num_bytes); + "allocation failed flags %llu, wanted %llu treelog %d", + flags, num_bytes, for_treelog); if (sinfo) btrfs_dump_space_info(fs_info, sinfo, num_bytes, 1); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 8c203c0425e0..52789da61fa3 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -7,6 +7,7 @@ #include #include "volumes.h" #include "disk-io.h" +#include "block-group.h" struct btrfs_zoned_device_info { /* @@ -292,4 +293,17 @@ static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) mutex_unlock(&fs_info->zoned_meta_io_lock); } +static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + + if (!btrfs_is_zoned(fs_info)) + return; + + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg == bg->start) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); +} + #endif From patchwork Tue Jan 26 02:25:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045421 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe , Christoph Hellwig , "Darrick J. Wong" , Naohiro Aota Subject: [PATCH v14 41/42] btrfs: serialize log transaction on ZONED mode Date: Tue, 26 Jan 2021 11:25:19 +0900 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 2/3 patch to enable tree-log on ZONED mode. Since we can start more than one log transaction per subvolume simultaneously, nodes from multiple transactions can be allocated interleaved. Such mixed allocation results in non-sequential writes at the time of log transaction commit.
The nodes of the global log root tree (fs_info->log_root_tree) also have the same mixed allocation problem. This patch serializes log transactions by waiting for a committing transaction when someone tries to start a new transaction, to avoid the mixed allocation problem. We must also wait for running log transactions from other subvolumes, but there is no easy way to detect which subvolume root is running a log transaction. So, this patch forbids starting a new log transaction when another subvolume has already allocated the global log root tree. Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 930e752686b4..71a1c0b5bc26 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans, struct btrfs_root *log, struct btrfs_path *path, u64 dirid, int del_all); +static void wait_log_commit(struct btrfs_root *root, int transid); /* * tree logging is a special write ahead log used to make sure that @@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans, { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *tree_root = fs_info->tree_root; + const bool zoned = btrfs_is_zoned(fs_info); int ret = 0; /* @@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans, mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + if (btrfs_need_log_full_commit(trans)) { ret = -EAGAIN; goto out; } + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } + if (!root->log_start_pid) { clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); root->log_start_pid = current->pid; @@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); } } else { + if (zoned) { + mutex_lock(&fs_info->tree_log_mutex); + if (fs_info->log_root_tree) + ret = -EAGAIN; + else + ret = btrfs_init_log_root_tree(trans, fs_info); + mutex_unlock(&fs_info->tree_log_mutex); + } + if (ret) + goto out; + ret = btrfs_add_log_tree(trans, root); if (ret) goto out; @@ -201,14 +222,22 @@ */ static int join_running_log_trans(struct btrfs_root *root) { + const bool zoned = btrfs_is_zoned(root->fs_info); int ret = -ENOENT; if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state)) return ret; mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + ret = 0; + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } atomic_inc(&root->log_writers); } mutex_unlock(&root->log_mutex); From patchwork Tue Jan 26 02:25:20 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12045423
Received: from esa6.hgst.iphmx.com ([216.71.154.45]:33036 "EHLO esa6.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732410AbhAZCqV (ORCPT ); Mon, 25 Jan 2021 21:46:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1611629181; x=1643165181; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4E4eQmbryp9+ccIW3vYSkkKzer1kPPbZykOhJlEGgJQ=; b=XiS4btePFbkXlz4bPkTjUQOVN5p4eLjMOIfSXKaXaGgLKXvuvUjL9cI9 D7ZLYVtdu5tuymApu8eBuQF14zO8/MYjojJUluD00s8P1Zbmd6i9bNq7D WfP/36BkJuOZf9wGREG40uDCiTSMi5YhZKHEX1cbm20ygjjOYFXJ+EtEl M0RHO680dR1aNJTb5J0gdL0ASxRbtGGU3UsM/1x7IYdCdEqczi73SZ3Lo V6yR6xfpmOpqkzp8PKO1cX5VE4RrtWaJ7b53RCztU1IPcCRlcf+c36ji7 Vosa/fYXNujlKiTAUSydbAjSNsxU9yp2cZQdZKqFHq33XuOGIOwQhf/a6 A==; IronPort-SDR: bIVPHBK6xcxb1zX7FyaYp+vYXNf25xHXAOinjU5qVcjHxVf/xULOnWjXW1sEt2Jp2nBDYG/K9y j59ORVsLod8J+qlI6IxPuw+HI0RzdFY0oevo9aYjhdB9fJHeQSSd8+dpGdTXqSetfVDfao1p0B MBhIE6dAWrRQmxkWjv5PPUjWfTGgYqXVFPZPmUswIq+YJVZMu6KH9pjKsQuVUVAvtlt4ZrqDJX IoGm6gm4qW5gsm9VNbgM0tr9kZ/tYvVm03+GykFUDzPtjfBbjxIheXjkXs4IM/b5G+6b4zQktw N6I= X-IronPort-AV: E=Sophos;i="5.79,375,1602518400"; d="scan'208";a="159483594" Received: from uls-op-cesaip01.wdc.com (HELO uls-op-cesaep01.wdc.com) ([199.255.45.14]) by ob1.hgst.iphmx.com with ESMTP; 26 Jan 2021 10:27:11 +0800 IronPort-SDR: Fy8xlq3uH8gAokt3yxlWxESoVaIeCkn+yIUKIE6trNZszINrW2o0j5ZW0AiAuYzM4l6Su2PJgr QeK2ew1FW1Bb21d5PQNoCAP5PwJET2/fvzyA4UKlwBeb4pyMSjxnBqs4sD/FjX5vWuGJQY/br/ bVqKS4JuZu61TT8LXCYqSzBHP00OlAeyD6wr7xau2PjeGniL7cEuddeOy4Ayz+bsKDuiiUmrpb BAzsXKmSFk6Z0rkUhugbjg5iLFbhlLdHch5TCkoK+LZEvQCvVoFWooijHdC8ebU4yj5pJarrXC 3aVCKgOjPhPighbL315cV4hA Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep01.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jan 2021 18:11:38 -0800 IronPort-SDR: XnLxU7QF1GzciBQCA7PYmJv9mHN4Q/+eb6UcnXP9bKr7FZaIljn9N7KEmlqscaNlMkocJKJGWU 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Jens Axboe,
    Christoph Hellwig, "Darrick J. Wong", Naohiro Aota, Josef Bacik,
    Johannes Thumshirn
Subject: [PATCH v14 42/42] btrfs: reorder log node allocation
Date: Tue, 26 Jan 2021 11:25:20 +0900
Message-Id: <246db67fcf56240127a252f09742684cd30f4cfe.1611627788.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This is the third of three patches needed to enable tree-log on ZONED
mode.

The nodes of "fs_info->log_root_tree" and the nodes of "root->log_root"
are allocated in a different order than they are written out, so writing
them causes unaligned write errors on zoned devices. Reorder the
allocations by delaying the allocation of the root node of
"fs_info->log_root_tree" until just before it is needed in
btrfs_sync_log(), so that the node buffers can go out to the devices
sequentially.
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c  |  7 -------
 fs/btrfs/tree-log.c | 24 ++++++++++++++++++------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c3b5cfe4d928..d2b30716de84 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1241,18 +1241,11 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
-	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
-	ret = btrfs_alloc_log_tree_node(trans, log_root);
-	if (ret) {
-		btrfs_put_root(log_root);
-		return ret;
-	}
-
 	WARN_ON(fs_info->log_root_tree);
 	fs_info->log_root_tree = log_root;
 	return 0;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 71a1c0b5bc26..d8315363dc1e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3159,6 +3159,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]);
 	root_log_ctx.log_transid = log_root_tree->log_transid;
 
+	mutex_lock(&fs_info->tree_log_mutex);
+	if (!log_root_tree->node) {
+		ret = btrfs_alloc_log_tree_node(trans, log_root_tree);
+		if (ret) {
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto out;
+		}
+	}
+	mutex_unlock(&fs_info->tree_log_mutex);
+
 	/*
 	 * Now we are safe to update the log_root_tree because we're under the
 	 * log_mutex, and we're a current writer so we're holding the commit
@@ -3317,12 +3327,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		.process_func = process_one_buffer
 	};
 
-	ret = walk_log_tree(trans, log, &wc);
-	if (ret) {
-		if (trans)
-			btrfs_abort_transaction(trans, ret);
-		else
-			btrfs_handle_fs_error(log->fs_info, ret, NULL);
+	if (log->node) {
+		ret = walk_log_tree(trans, log, &wc);
+		if (ret) {
+			if (trans)
+				btrfs_abort_transaction(trans, ret);
+			else
+				btrfs_handle_fs_error(log->fs_info, ret, NULL);
+		}
 	}
 
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,