From patchwork Thu Feb 4 10:21:40 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066767
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn,
    Christoph Hellwig, Josef Bacik, Chaitanya Kulkarni, Jens Axboe
Subject: [PATCH v15 01/42] block: add bio_add_zone_append_page
Date: Thu, 4 Feb 2021 19:21:40 +0900

From: Johannes Thumshirn

Add bio_add_zone_append_page(), a wrapper around
bio_add_hw_page() which is intended to be used by file systems that
directly add pages to a bio instead of using bio_iov_iter_get_pages().

Reviewed-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Reviewed-by: Chaitanya Kulkarni
Acked-by: Jens Axboe
Signed-off-by: Johannes Thumshirn
---
 block/bio.c         | 33 +++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..2f21d2958b60 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -851,6 +851,39 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as the
+ * bio being full or the target block device is not a zoned block device or
+ * other limitations of the target block device. The target block device must
+ * allow bio's up to PAGE_SIZE, so it is always possible to add a single page
+ * to an empty bio.
+ *
+ * Returns: number of bytes added to the bio, or 0 in case of a failure.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
 * @bio: destination bio

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..de62911473bb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -455,6 +455,8 @@ void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
			   unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,
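To illustrate the contract described above — the helper returns the number of bytes added, or 0 when the bio's operation or the target queue does not qualify — here is a minimal userspace model of the guard logic. All types, names, and constants below are simplified stand-ins for illustration, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's request op and queue/bio state. */
enum req_op { REQ_OP_READ, REQ_OP_WRITE, REQ_OP_ZONE_APPEND };

struct queue { bool zoned; unsigned int max_zone_append_bytes; };
struct bio   { enum req_op op; unsigned int size; };

/*
 * Mimics the guard conditions of bio_add_zone_append_page(): a wrong op or
 * a non-zoned queue adds nothing; otherwise the page is accepted as long as
 * the hardware zone-append limit is not exceeded.
 */
static unsigned int add_zone_append_page(struct queue *q, struct bio *bio,
					 unsigned int len)
{
	if (bio->op != REQ_OP_ZONE_APPEND)
		return 0;
	if (!q->zoned)
		return 0;
	if (bio->size + len > q->max_zone_append_bytes)
		return 0;	/* would exceed the append limit */
	bio->size += len;
	return len;
}
```

The same "bytes added or 0" convention is what lets a filesystem loop over pages and stop cleanly when the bio is full.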
From patchwork Thu Feb 4 10:21:41 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066765
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota,
    "Darrick J. Wong", Christoph Hellwig, Chaitanya Kulkarni
Subject: [PATCH v15 02/42] iomap: support REQ_OP_ZONE_APPEND
Date: Thu, 4 Feb 2021 19:21:41 +0900

A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so that it is not split. bio_iov_iter_get_pages()
builds such a restricted bio using __bio_iov_append_get_pages() if
bio_op(bio) == REQ_OP_ZONE_APPEND, so we need to set the bio_op before
calling bio_iov_iter_get_pages().

This commit introduces IOMAP_F_ZONE_APPEND, so that an iomap user can set
the flag to indicate they want REQ_OP_ZONE_APPEND and a restricted bio.

Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Naohiro Aota
---
 fs/iomap/direct-io.c  | 43 +++++++++++++++++++++++++++++++++++++------
 include/linux/iomap.h |  1 +
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..2273120d8ed7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -201,6 +201,34 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
	iomap_dio_submit_bio(dio, iomap, bio, pos);
 }
 
+/*
+ * Figure out the bio's operation flags from the dio request, the
+ * mapping, and whether or not we want FUA. Note that we can end up
+ * clearing the WRITE_FUA flag in the dio request.
+ */
+static inline unsigned int
+iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+{
+	unsigned int opflags = REQ_SYNC | REQ_IDLE;
+
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+		return REQ_OP_READ;
+	}
+
+	if (iomap->flags & IOMAP_F_ZONE_APPEND)
+		opflags |= REQ_OP_ZONE_APPEND;
+	else
+		opflags |= REQ_OP_WRITE;
+
+	if (use_fua)
+		opflags |= REQ_FUA;
+	else
+		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+
+	return opflags;
+}
+
 static loff_t
 iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
		struct iomap_dio *dio, struct iomap *iomap)
@@ -208,6 +236,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
	unsigned int fs_block_size = i_blocksize(inode), pad;
	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	unsigned int bio_opf;
	struct bio *bio;
	bool need_zeroout = false;
	bool use_fua = false;
@@ -263,6 +292,13 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
		iomap_dio_zero(dio, iomap, pos - pad, pad);
	}
 
+	/*
+	 * Set the operation flags early so that bio_iov_iter_get_pages
+	 * can set up the page vector appropriately for a ZONE_APPEND
+	 * operation.
+	 */
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua);
+
	do {
		size_t n;
		if (dio->error) {
@@ -278,6 +314,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
		bio->bi_ioprio = dio->iocb->ki_ioprio;
		bio->bi_private = dio;
		bio->bi_end_io = iomap_dio_bio_end_io;
+		bio->bi_opf = bio_opf;
 
		ret = bio_iov_iter_get_pages(bio, dio->submit.iter);
		if (unlikely(ret)) {
@@ -293,14 +330,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
		n = bio->bi_iter.bi_size;
		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
-			if (use_fua)
-				bio->bi_opf |= REQ_FUA;
-			else
-				dio->flags &= ~IOMAP_DIO_WRITE_FUA;
			task_io_account_write(n);
		} else {
-			bio->bi_opf = REQ_OP_READ;
			if (dio->flags & IOMAP_DIO_DIRTY)
				bio_set_pages_dirty(bio);
		}

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5bd3cac4df9c..8ebb1fa6f3b7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -55,6 +55,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		0x04
 #define IOMAP_F_MERGED		0x08
 #define IOMAP_F_BUFFER_HEAD	0x10
+#define IOMAP_F_ZONE_APPEND	0x20
 
 /*
  * Flags set by the core iomap code during operations:
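The flag-selection logic factored out above can be sketched in userspace: reads get a bare read op, writes get SYNC and IDLE hints plus either a zone-append or a plain write op, and FUA is added only when requested. The constants below are stand-in values for illustration, not the kernel's REQ_* encoding:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in op codes and flag bits (values are arbitrary for the model). */
enum { OP_READ = 0, OP_WRITE = 1, OP_ZONE_APPEND = 2 };
#define F_SYNC 0x10
#define F_IDLE 0x20
#define F_FUA  0x40

/*
 * Models iomap_dio_bio_opflags(): the op and its hint flags are decided
 * once, up front, so they can be set on the bio before the page vector
 * is built.
 */
static unsigned int dio_bio_opflags(bool is_write, bool zone_append,
				    bool use_fua)
{
	unsigned int opflags = F_SYNC | F_IDLE;

	if (!is_write)
		return OP_READ;		/* reads carry no write hints */

	opflags |= zone_append ? OP_ZONE_APPEND : OP_WRITE;
	if (use_fua)
		opflags |= F_FUA;
	return opflags;
}
```

Computing the flags before the page-gathering loop is the whole point of the patch: bio_iov_iter_get_pages() inspects bio->bi_opf to pick the restricted zone-append path.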
From patchwork Thu Feb 4 10:21:42 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066769
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota,
    Anand Jain, Josef Bacik
Subject: [PATCH v15 03/42] btrfs: zoned: defer loading zone info after opening trees
Date: Thu, 4 Feb 2021 19:21:42 +0900
Message-Id: <214dd3a87be0f9bdb19a5be6fc8880cb832846d4.1612434091.git.naohiro.aota@wdc.com>

This is a preparation patch to implement zone emulation on a regular
device.

To emulate a zoned filesystem on a regular (non-zoned) device, we need to
decide an emulated zone size. Instead of making it a compile-time static
value, we'll make it configurable at mkfs time. Since we have the one zone
== one device extent restriction, we can determine the emulated zone size
from the size of a device extent. Once the zone size is decided, we can
extend btrfs_get_dev_zone_info() to present a regular device as filled
with conventional zones.

The current call site of btrfs_get_dev_zone_info() during the mount
process is earlier than loading the filesystem trees, so at that point we
don't yet know the size of a device extent. Thus we can't slice a regular
device into conventional zones.
This patch introduces btrfs_get_dev_zone_info_all_devices() to load the
zone info for all the devices, and places the call in open_ctree() after
loading the trees.

Reviewed-by: Anand Jain
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 fs/btrfs/volumes.c |  4 ----
 fs/btrfs/zoned.c   | 25 +++++++++++++++++++++++++
 fs/btrfs/zoned.h   |  6 ++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 71fab77873a5..2b6a3df765cd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3333,6 +3333,19 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
	if (ret)
		goto fail_tree_roots;
 
+	/*
+	 * Get zone type information of zoned block devices. This will also
+	 * handle emulation of a zoned filesystem if a regular device has the
+	 * zoned incompat feature flag set.
+	 */
+	ret = btrfs_get_dev_zone_info_all_devices(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "zoned: failed to read device zone info: %d",
+			  ret);
+		goto fail_block_groups;
+	}
+
	/*
	 * If we have a uuid root and we're not being told to rescan we need to
	 * check the generation here so we can set the

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3948f5b50d11..07cd4742c123 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -669,10 +669,6 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
	device->mode = flags;
 
-	ret = btrfs_get_dev_zone_info(device);
-	if (ret != 0)
-		goto error_free_page;
-
	fs_devices->open_devices++;
	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
	    device->devid != BTRFS_DEV_REPLACE_DEVID) {

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 41d27fefd306..0b1b1f38a196 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -143,6 +143,31 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos,
	return 0;
 }
 
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	int ret = 0;
+
+	/* fs_info->zone_size might not be set yet. Use the incompat flag here. */
+	if (!btrfs_fs_incompat(fs_info, ZONED))
+		return 0;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		/* We can skip reading of zone info for missing devices */
+		if (!device->bdev)
+			continue;
+
+		ret = btrfs_get_dev_zone_info(device);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	return ret;
+}
+
 int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
	struct btrfs_zoned_device_info *zone_info = NULL;

diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 8abe2f83272b..eb47b7ad9ab1 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -25,6 +25,7 @@ struct btrfs_zoned_device_info {
 #ifdef CONFIG_BLK_DEV_ZONED
 int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
		       struct blk_zone *zone);
+int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
@@ -42,6 +43,11 @@ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
	return 0;
 }
 
+static inline int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
+
 static inline int btrfs_get_dev_zone_info(struct btrfs_device *device)
 {
	return 0;
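The all-devices walk added above — gated on the ZONED incompat flag, skipping missing devices, stopping at the first failure — can be modeled in userspace. Everything here (the struct, the array in place of the kernel's locked device list) is a stand-in for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a btrfs_device: present == has a bdev, zone_ret simulates
 * the result of loading its zone info. */
struct dev { bool present; int zone_ret; bool loaded; };

/*
 * Models btrfs_get_dev_zone_info_all_devices(): do nothing unless the
 * ZONED incompat flag is set, skip missing devices, and propagate the
 * first per-device error (list locking omitted in this sketch).
 */
static int load_all_zone_info(struct dev *devs, size_t n, bool zoned_incompat)
{
	int ret = 0;

	if (!zoned_incompat)	/* zone_size not known yet; flag gates the walk */
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].present)
			continue;	/* skip missing devices */
		ret = devs[i].zone_ret;
		if (ret)
			break;		/* stop on the first failure */
		devs[i].loaded = true;
	}
	return ret;
}
```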
From patchwork Thu Feb 4 10:21:43 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066771
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota,
    Anand Jain, Josef Bacik
Subject: [PATCH v15 04/42] btrfs: zoned: use regular super block location on zone emulation
Date: Thu, 4 Feb 2021 19:21:43 +0900
Message-Id: <906d62e60392f97e9ce10346ca7c79ff5f20e6da.1612434091.git.naohiro.aota@wdc.com>

A zoned btrfs filesystem currently has a superblock at the beginning of
the superblock logging zones if the zones are conventional. This
difference in superblock position causes a chicken-and-egg problem for
filesystems with emulated zones. Since the device is a regular (non-zoned)
device, we cannot know whether the filesystem is regular or zoned while
reading the superblock.
But to load the superblock, we first need to know whether it is emulated
zoned or not. To solve this, place the superblocks at the same locations
as on regular btrfs on regular devices. This is possible because all the
superblock locations are guaranteed to fall in an (emulated) conventional
zone on regular devices.

Reviewed-by: Anand Jain
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/zoned.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 0b1b1f38a196..8b3868088c5e 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -552,7 +552,13 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
	struct btrfs_zoned_device_info *zinfo = device->zone_info;
	u32 zone_num;
 
-	if (!zinfo) {
+	/*
+	 * For a zoned filesystem on a non-zoned block device, use the same
+	 * super block locations as regular filesystem. Doing so, the super
+	 * block can always be retrieved and the zoned flag of the volume
+	 * detected from the super block information.
+	 */
+	if (!bdev_is_zoned(device->bdev)) {
		*bytenr_ret = btrfs_sb_offset(mirror);
		return 0;
	}
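The fallback path above returns the regular superblock offsets. Btrfs keeps its primary superblock at 64 KiB and the mirror copies at 64 MiB and 256 GiB; the sketch below reproduces that offset math and the zoned/non-zoned dispatch in userspace. The constant names and the `sb_log_location()` simplification are stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  65536ULL	/* 64 KiB: primary super block */
#define SUPER_MIRROR_SHIFT 12

/* Same math as btrfs_sb_offset(): copies at 64 KiB, 64 MiB, 256 GiB. */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16384;		/* 16 KiB base for mirror copies */

	if (mirror)
		return start << (SUPER_MIRROR_SHIFT * mirror);
	return SUPER_INFO_OFFSET;
}

/*
 * On an emulated-zone (regular) device the super block keeps its regular
 * location; only a genuinely zoned device redirects it into a superblock
 * logging zone (simplified here to a caller-supplied zone start).
 */
static uint64_t sb_log_location(bool bdev_zoned, int mirror, uint64_t zone_start)
{
	if (!bdev_zoned)
		return sb_offset(mirror);
	return zone_start;
}
```

Because those regular offsets always land in an emulated conventional zone, the superblock can be read before the filesystem knows whether it is zoned.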
From patchwork Thu Feb 4 10:21:44 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066773
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn,
    Anand Jain, Josef Bacik
Subject: [PATCH v15 05/42] btrfs: release path before calling to btrfs_load_block_group_zone_info
Date: Thu, 4 Feb 2021 19:21:44 +0900

From:
Johannes Thumshirn

Since we have no write pointer in conventional zones, we cannot determine
the allocation offset from it. Instead, we set the allocation offset after
the highest addressed extent. This is done by reading the extent tree in
btrfs_load_block_group_zone_info().

However, this function is called from btrfs_read_block_groups(), so the
read lock for the tree node could be taken recursively. To avoid this
unsafe locking scenario, release the path before reading the extent tree
to get the allocation offset.

Reviewed-by: Anand Jain
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Reviewed-by: David Sterba
---
 fs/btrfs/block-group.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5fa6b3d540f4..b8fbee70a897 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1810,24 +1810,8 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
	return ret;
 }
 
-static void read_block_group_item(struct btrfs_block_group *cache,
-				  struct btrfs_path *path,
-				  const struct btrfs_key *key)
-{
-	struct extent_buffer *leaf = path->nodes[0];
-	struct btrfs_block_group_item bgi;
-	int slot = path->slots[0];
-
-	cache->length = key->offset;
-
-	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
-			   sizeof(bgi));
-	cache->used = btrfs_stack_block_group_used(&bgi);
-	cache->flags = btrfs_stack_block_group_flags(&bgi);
-}
-
 static int read_one_block_group(struct btrfs_fs_info *info,
-				struct btrfs_path *path,
+				struct btrfs_block_group_item *bgi,
				const struct btrfs_key *key,
				int need_clear)
 {
@@ -1842,7 +1826,9 @@ static int read_one_block_group(struct btrfs_fs_info *info,
	if (!cache)
		return -ENOMEM;
 
-	read_block_group_item(cache, path, key);
+	cache->length = key->offset;
+	cache->used = btrfs_stack_block_group_used(bgi);
+	cache->flags = btrfs_stack_block_group_flags(bgi);
 
	set_free_space_tree_thresholds(cache);
 
-2001,19 +1987,29 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) need_clear = 1; while (1) { + struct btrfs_block_group_item bgi; + struct extent_buffer *leaf; + int slot; + ret = find_first_block_group(info, path, &key); if (ret > 0) break; if (ret != 0) goto error; - btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); - ret = read_one_block_group(info, path, &key, need_clear); + leaf = path->nodes[0]; + slot = path->slots[0]; + + read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), + sizeof(bgi)); + + btrfs_item_key_to_cpu(leaf, &key, slot); + btrfs_release_path(path); + ret = read_one_block_group(info, &bgi, &key, need_clear); if (ret < 0) goto error; key.objectid += key.offset; key.offset = 0; - btrfs_release_path(path); } btrfs_release_path(path);

From patchwork Thu Feb 4 10:21:45 2021
X-Patchwork-Id: 12066775
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Anand Jain, Josef Bacik
Subject: [PATCH v15 06/42] btrfs: zoned: do not load fs_info::zoned from incompat flag
Date: Thu, 4 Feb 2021 19:21:45 +0900
Message-Id: <7f1f1e2a02db66b3bd65ac1d8cd046de75997b04.1612434091.git.naohiro.aota@wdc.com>
From: Johannes Thumshirn

Don't set the zoned flag in fs_info as soon as we encounter the incompat filesystem flag for a zoned filesystem on mount. The zoned flag in fs_info is in a union together with zone_size, so setting it too early will result in an incorrect zone_size as well. Once the correct zone_size has been read from the device, we can rely on the zoned flag in fs_info to determine whether the filesystem is zoned.
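The union hazard described in the message above can be illustrated with a small userspace model. The struct and function names here are hypothetical, simplified stand-ins for `struct btrfs_fs_info` and the mount path; only the aliasing behavior is the point:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the union described above (illustrative names only,
 * not the real btrfs definitions). */
struct fs_state {
	union {
		uint64_t zone_size;	/* bytes; valid once probed from the device */
		uint64_t zoned;		/* nonzero means zoned mode */
	};
};

/* Old ordering: setting the flag from the incompat bit before the zone
 * size is known clobbers zone_size with the flag value. */
static uint64_t flag_first(void)
{
	struct fs_state fs = { .zone_size = 0 };

	fs.zoned = 1;		/* premature, taken from the incompat flag */
	return fs.zone_size;	/* now 1, not a real zone size */
}

/* Fixed ordering: probe zone_size first; a nonzero value then doubles
 * as the zoned flag, so no early write is needed. */
static uint64_t size_first(uint64_t probed_zone_size)
{
	struct fs_state fs = { .zone_size = 0 };

	fs.zone_size = probed_zone_size;
	return fs.zoned;	/* nonzero iff the fs is zoned */
}
```

Since both union members have the same type, reading one after writing the other is well defined in C, which is exactly why the premature write silently corrupts the zone size.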
Reviewed-by: Anand Jain Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba --- fs/btrfs/disk-io.c | 2 -- fs/btrfs/zoned.c | 8 ++++++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2b6a3df765cd..8551b0fc1b22 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3201,8 +3201,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) btrfs_info(fs_info, "has skinny extents"); - fs_info->zoned = (features & BTRFS_FEATURE_INCOMPAT_ZONED); - /* * flag our filesystem as having big metadata blocks if * they are bigger than the page size diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 8b3868088c5e..c0840412ccb6 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -432,6 +432,14 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + /* + * Check mount options here, because we might change fs_info->zoned + * from fs_info->zone_size. 
+ */ + ret = btrfs_check_mountopts_zoned(fs_info); + if (ret) + goto out; + btrfs_info(fs_info, "zoned mode enabled with zone size %llu", zone_size); out: return ret;

From patchwork Thu Feb 4 10:21:46 2021
X-Patchwork-Id: 12066777
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Anand Jain, Josef Bacik
Subject: [PATCH v15 07/42] btrfs: zoned: disallow fitrim on zoned filesystems
Date: Thu, 4 Feb 2021 19:21:46 +0900
Message-Id: <0fe38cc20347fd835887cdd8b979dc266dffa6bf.1612434091.git.naohiro.aota@wdc.com>
The implementation of fitrim depends on the space cache, which is not used and is disabled for the zoned extent allocator, so the current code does not work on zoned filesystems. In the future, we can implement fitrim for zoned filesystems by enabling the space cache (only for fitrim) or by scanning the extent tree at fitrim time. For now, disallow fitrim on zoned filesystems. Reviewed-by: Anand Jain Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota Reviewed-by: David Sterba --- fs/btrfs/ioctl.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index e6a63f652235..a8c60d46d19c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -527,6 +527,14 @@ static noinline int btrfs_ioctl_fitrim(struct btrfs_fs_info *fs_info, if (!capable(CAP_SYS_ADMIN)) return -EPERM; + /* + * btrfs_trim_block_group() depends on the space cache, which is not + * available on a zoned filesystem. So, disallow fitrim on a zoned + * filesystem for now.
+ */ + if (btrfs_is_zoned(fs_info)) + return -EOPNOTSUPP; + /* * If the fs is mounted with nologreplay, which requires it to be * mounted in RO mode as well, we can not allow discard on free space

From patchwork Thu Feb 4 10:21:47 2021
X-Patchwork-Id: 12066779
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Anand Jain, Josef Bacik, Naohiro Aota
Subject: [PATCH v15 08/42] btrfs: zoned: allow zoned filesystems on non-zoned block devices
Date: Thu, 4 Feb 2021 19:21:47 +0900
Message-Id:
<98cbd6adf3ad2c27f3b422c750cada92a2ebce74.1612434091.git.naohiro.aota@wdc.com>
From: Johannes Thumshirn

Run a zoned filesystem on non-zoned devices. This is done by "slicing up" the block device into static-sized chunks and faking a conventional zone on each of them. The emulated zone size is determined from the size of a device extent. This is mainly aimed at testing parts of zoned filesystems, i.e. the zoned chunk allocator, on regular block devices. Reviewed-by: Anand Jain Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota Reviewed-by: David Sterba --- fs/btrfs/zoned.c | 150 +++++++++++++++++++++++++++++++++++++++++++---- fs/btrfs/zoned.h | 14 +++-- 2 files changed, 148 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index c0840412ccb6..6699f626a86e 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -119,6 +119,36 @@ static inline u32 sb_zone_number(int shift, int mirror) return 0; } +/* + * Emulate blkdev_report_zones() for a non-zoned device. It slices up the block + * device into static-sized chunks and fakes a conventional zone on each of + * them.
+ */ +static int emulate_report_zones(struct btrfs_device *device, u64 pos, + struct blk_zone *zones, unsigned int nr_zones) +{ + const sector_t zone_sectors = device->fs_info->zone_size >> SECTOR_SHIFT; + sector_t bdev_size = bdev_nr_sectors(device->bdev); + unsigned int i; + + pos >>= SECTOR_SHIFT; + for (i = 0; i < nr_zones; i++) { + zones[i].start = i * zone_sectors + pos; + zones[i].len = zone_sectors; + zones[i].capacity = zone_sectors; + zones[i].wp = zones[i].start + zone_sectors; + zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL; + zones[i].cond = BLK_ZONE_COND_NOT_WP; + + if (zones[i].wp >= bdev_size) { + i++; + break; + } + } + + return i; +} + static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, struct blk_zone *zones, unsigned int *nr_zones) { @@ -127,6 +157,12 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, if (!*nr_zones) return 0; + if (!bdev_is_zoned(device->bdev)) { + ret = emulate_report_zones(device, pos, zones, *nr_zones); + *nr_zones = ret; + return 0; + } + ret = blkdev_report_zones(device->bdev, pos >> SECTOR_SHIFT, *nr_zones, copy_zone_info_cb, zones); if (ret < 0) { @@ -143,6 +179,50 @@ static int btrfs_get_dev_zones(struct btrfs_device *device, u64 pos, return 0; } +/* The emulated zone size is determined from the size of device extent */ +static int calculate_emulated_zone_size(struct btrfs_fs_info *fs_info) +{ + struct btrfs_path *path; + struct btrfs_root *root = fs_info->dev_root; + struct btrfs_key key; + struct extent_buffer *leaf; + struct btrfs_dev_extent *dext; + int ret = 0; + + key.objectid = 1; + key.type = BTRFS_DEV_EXTENT_KEY; + key.offset = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + goto out; + + if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) { + ret = btrfs_next_item(root, path); + if (ret < 0) + goto out; + /* No dev extents at all? 
Not good */ + if (ret > 0) { + ret = -EUCLEAN; + goto out; + } + } + + leaf = path->nodes[0]; + dext = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_dev_extent); + fs_info->zone_size = btrfs_dev_extent_length(leaf, dext); + ret = 0; + +out: + btrfs_free_path(path); + + return ret; +} + int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; @@ -170,6 +250,7 @@ int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info) int btrfs_get_dev_zone_info(struct btrfs_device *device) { + struct btrfs_fs_info *fs_info = device->fs_info; struct btrfs_zoned_device_info *zone_info = NULL; struct block_device *bdev = device->bdev; struct request_queue *queue = bdev_get_queue(bdev); @@ -178,9 +259,14 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) struct blk_zone *zones = NULL; unsigned int i, nreported = 0, nr_zones; unsigned int zone_sectors; + char *model, *emulated; int ret; - if (!bdev_is_zoned(bdev)) + /* + * Cannot use btrfs_is_zoned here, since fs_info::zone_size might not + * yet be set. 
+ */ + if (!btrfs_fs_incompat(fs_info, ZONED)) return 0; if (device->zone_info) @@ -190,8 +276,20 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) if (!zone_info) return -ENOMEM; + if (!bdev_is_zoned(bdev)) { + if (!fs_info->zone_size) { + ret = calculate_emulated_zone_size(fs_info); + if (ret) + goto out; + } + + ASSERT(fs_info->zone_size); + zone_sectors = fs_info->zone_size >> SECTOR_SHIFT; + } else { + zone_sectors = bdev_zone_sectors(bdev); + } + nr_sectors = bdev_nr_sectors(bdev); - zone_sectors = bdev_zone_sectors(bdev); /* Check if it's power of 2 (see is_power_of_2) */ ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0); zone_info->zone_size = zone_sectors << SECTOR_SHIFT; @@ -297,20 +395,42 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) device->zone_info = zone_info; - /* device->fs_info is not safe to use for printing messages */ - btrfs_info_in_rcu(NULL, - "host-%s zoned block device %s, %u zones of %llu bytes", - bdev_zoned_model(bdev) == BLK_ZONED_HM ? 
"managed" : "aware", - rcu_str_deref(device->name), zone_info->nr_zones, - zone_info->zone_size); + switch (bdev_zoned_model(bdev)) { + case BLK_ZONED_HM: + model = "host-managed zoned"; + emulated = ""; + break; + case BLK_ZONED_HA: + model = "host-aware zoned"; + emulated = ""; + break; + case BLK_ZONED_NONE: + model = "regular"; + emulated = "emulated "; + break; + default: + /* Just in case */ + btrfs_err_in_rcu(fs_info, "zoned: unsupported model %d on %s", + bdev_zoned_model(bdev), + rcu_str_deref(device->name)); + ret = -EOPNOTSUPP; + goto out_free_zone_info; + } + + btrfs_info_in_rcu(fs_info, + "%s block device %s, %u %szones of %llu bytes", + model, rcu_str_deref(device->name), zone_info->nr_zones, + emulated, zone_info->zone_size); return 0; out: kfree(zones); +out_free_zone_info: bitmap_free(zone_info->empty_zones); bitmap_free(zone_info->seq_zones); kfree(zone_info); + device->zone_info = NULL; return ret; } @@ -349,7 +469,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) u64 nr_devices = 0; u64 zone_size = 0; u64 max_zone_append_size = 0; - const bool incompat_zoned = btrfs_is_zoned(fs_info); + const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; /* Count zoned devices */ @@ -360,9 +480,17 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) continue; model = bdev_zoned_model(device->bdev); + /* + * A Host-Managed zoned device must be used as a zoned device. + * A Host-Aware zoned device and a non-zoned device can be + * treated as a zoned device, if the ZONED flag is enabled in the + * superblock.
+ */ if (model == BLK_ZONED_HM || - (model == BLK_ZONED_HA && incompat_zoned)) { - struct btrfs_zoned_device_info *zone_info; + (model == BLK_ZONED_HA && incompat_zoned) || + (model == BLK_ZONED_NONE && incompat_zoned)) { + struct btrfs_zoned_device_info *zone_info = + device->zone_info; zone_info = device->zone_info; zoned_devices++; diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index eb47b7ad9ab1..5e78786bb723 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -142,12 +142,16 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device, u64 p static inline bool btrfs_check_device_zone_type(const struct btrfs_fs_info *fs_info, struct block_device *bdev) { - u64 zone_size; - if (btrfs_is_zoned(fs_info)) { - zone_size = bdev_zone_sectors(bdev) << SECTOR_SHIFT; - /* Do not allow non-zoned device */ - return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size; + /* + * We can allow a regular device on a zoned filesystem, because + * we will emulate the zoned capabilities. 
+ */ + if (!bdev_is_zoned(bdev)) + return true; + + return fs_info->zone_size == + (bdev_zone_sectors(bdev) << SECTOR_SHIFT); } /* Do not allow Host Managed zoned device */

From patchwork Thu Feb 4 10:21:48 2021
X-Patchwork-Id: 12066781
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Anand Jain, Josef Bacik
Subject: [PATCH v15 09/42] btrfs: zoned: implement zoned chunk allocator
Date: Thu, 4 Feb 2021 19:21:48 +0900
Message-Id:
<8edb48e9b518e17e1a47f29ba3b10e013be45da1.1612434091.git.naohiro.aota@wdc.com>

Implement a zoned chunk and device extent allocator. One device zone becomes one device extent, so that a zone reset affects only that device extent and does not change the state of blocks in neighboring device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem:

- init_alloc_chunk_ctl
- dev_extent_search_start
- dev_extent_hole_check
- decide_stripe_size

init_alloc_chunk_ctl_zoned() is mostly the same as the regular one. It always sets the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular btrfs, because the first two zones are reserved for superblock logging anyway. dev_extent_hole_check_zoned() checks whether the zones in a given hole are either conventional or empty sequential zones; it also skips zones reserved for superblock logging. After the hole has been adjusted, the new hole may contain pending extents, so in that case we loop again to check it. Finally, decide_stripe_size_zoned() shrinks the number of devices instead of the stripe size, because we need to honor stripe_size == zone_size.
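The ndevs reduction described above can be sketched as a self-contained userspace model. Field names follow the patch, but the struct here is a simplified, hypothetical stand-in for `struct alloc_chunk_ctl`, and the RAID parity/copies handling is reduced to the bare arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the allocation-control state used in the patch. */
struct chunk_ctl {
	uint64_t zone_size;	/* fixed stripe size on a zoned fs */
	uint64_t max_chunk_size;
	int ndevs, dev_stripes, ncopies, nparity;
	uint64_t stripe_size, chunk_size;
	int num_stripes;
};

/* Sketch of decide_stripe_size_zoned(): stripe_size is pinned to the
 * zone size, so when the chunk would exceed max_chunk_size the device
 * count is reduced instead of the stripe size. */
static void decide_stripe_size_zoned_sketch(struct chunk_ctl *ctl)
{
	int data_stripes;	/* stripes that count for block group size */

	ctl->stripe_size = ctl->zone_size;
	ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
	data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies;

	if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) {
		/* Shrink ndevs so the resulting chunk fits. */
		ctl->ndevs = (int)(((ctl->max_chunk_size * ctl->ncopies /
				     ctl->stripe_size) + ctl->nparity) /
				   ctl->dev_stripes);
		ctl->num_stripes = ctl->ndevs * ctl->dev_stripes;
		data_stripes = (ctl->num_stripes - ctl->nparity) /
			       ctl->ncopies;
	}
	ctl->chunk_size = ctl->stripe_size * data_stripes;
}
```

For example, with 8 single-profile devices, a 256MiB zone size, and a 1GiB chunk cap, the sketch reduces ndevs from 8 to 4 so the chunk lands exactly on the cap while every stripe stays one zone long.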
Reviewed-by: Anand Jain Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/volumes.c | 171 ++++++++++++++++++++++++++++++++++++++++----- fs/btrfs/volumes.h | 1 + fs/btrfs/zoned.c | 141 +++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 25 +++++++ 4 files changed, 321 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 07cd4742c123..ae2aeadad5a0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1414,11 +1414,62 @@ static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) * make sure to start at an offset of at least 1MB. */ return max_t(u64, start, SZ_1M); + case BTRFS_CHUNK_ALLOC_ZONED: + /* + * We don't care about the starting region like regular + * allocator, because we anyway use/reserve the first two zones + * for superblock logging. + */ + return ALIGN(start, device->zone_info->zone_size); default: BUG(); } } +static bool dev_extent_hole_check_zoned(struct btrfs_device *device, + u64 *hole_start, u64 *hole_size, + u64 num_bytes) +{ + u64 zone_size = device->zone_info->zone_size; + u64 pos; + int ret; + bool changed = false; + + ASSERT(IS_ALIGNED(*hole_start, zone_size)); + + while (*hole_size > 0) { + pos = btrfs_find_allocatable_zones(device, *hole_start, + *hole_start + *hole_size, + num_bytes); + if (pos != *hole_start) { + *hole_size = *hole_start + *hole_size - pos; + *hole_start = pos; + changed = true; + if (*hole_size < num_bytes) + break; + } + + ret = btrfs_ensure_empty_zones(device, pos, num_bytes); + + /* Range is ensured to be empty */ + if (!ret) + return changed; + + /* Given hole range was invalid (outside of device) */ + if (ret == -ERANGE) { + *hole_start += *hole_size; + *hole_size = 0; + return 1; + } + + *hole_start += zone_size; + *hole_size -= zone_size; + changed = true; + } + + return changed; +} + /** * dev_extent_hole_check - check if specified hole is suitable for allocation * @device: the device which we have the hole @@ -1426,7 +1477,7 @@ 
static u64 dev_extent_search_start(struct btrfs_device *device, u64 start) * @hole_size: the size of the hole * @num_bytes: the size of the free space that we need * - * This function may modify @hole_start and @hole_end to reflect the suitable + * This function may modify @hole_start and @hole_size to reflect the suitable * position for allocation. Returns 1 if hole position is updated, 0 otherwise. */ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start, @@ -1435,24 +1486,39 @@ static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start, bool changed = false; u64 hole_end = *hole_start + *hole_size; - /* - * Check before we set max_hole_start, otherwise we could end up - * sending back this offset anyway. - */ - if (contains_pending_extent(device, hole_start, *hole_size)) { - if (hole_end >= *hole_start) - *hole_size = hole_end - *hole_start; - else - *hole_size = 0; - changed = true; - } + for (;;) { + /* + * Check before we set max_hole_start, otherwise we could end up + * sending back this offset anyway. + */ + if (contains_pending_extent(device, hole_start, *hole_size)) { + if (hole_end >= *hole_start) + *hole_size = hole_end - *hole_start; + else + *hole_size = 0; + changed = true; + } + + switch (device->fs_devices->chunk_alloc_policy) { + case BTRFS_CHUNK_ALLOC_REGULAR: + /* No extra check */ + break; + case BTRFS_CHUNK_ALLOC_ZONED: + if (dev_extent_hole_check_zoned(device, hole_start, + hole_size, num_bytes)) { + changed = true; + /* + * The changed hole can contain pending extent. + * Loop again to check that. 
+ */ + continue; + } + break; + default: + BUG(); + } - switch (device->fs_devices->chunk_alloc_policy) { - case BTRFS_CHUNK_ALLOC_REGULAR: - /* No extra check */ break; - default: - BUG(); } return changed; @@ -1505,6 +1571,9 @@ static int find_free_dev_extent_start(struct btrfs_device *device, search_start = dev_extent_search_start(device, search_start); + WARN_ON(device->zone_info && + !IS_ALIGNED(num_bytes, device->zone_info->zone_size)); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -4899,6 +4968,37 @@ static void init_alloc_chunk_ctl_policy_regular( ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes; } +static void init_alloc_chunk_ctl_policy_zoned( + struct btrfs_fs_devices *fs_devices, + struct alloc_chunk_ctl *ctl) +{ + u64 zone_size = fs_devices->fs_info->zone_size; + u64 limit; + int min_num_stripes = ctl->devs_min * ctl->dev_stripes; + int min_data_stripes = (min_num_stripes - ctl->nparity) / ctl->ncopies; + u64 min_chunk_size = min_data_stripes * zone_size; + u64 type = ctl->type; + + ctl->max_stripe_size = zone_size; + if (type & BTRFS_BLOCK_GROUP_DATA) { + ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE, + zone_size); + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { + ctl->max_chunk_size = ctl->max_stripe_size; + } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) { + ctl->max_chunk_size = 2 * ctl->max_stripe_size; + ctl->devs_max = min_t(int, ctl->devs_max, + BTRFS_MAX_DEVS_SYS_CHUNK); + } + + /* We don't want a chunk larger than 10% of writable space */ + limit = max(round_down(div_factor(fs_devices->total_rw_bytes, 1), + zone_size), + min_chunk_size); + ctl->max_chunk_size = min(limit, ctl->max_chunk_size); + ctl->dev_extent_min = zone_size * ctl->dev_stripes; +} + static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl) { @@ -4919,6 +5019,9 @@ static void init_alloc_chunk_ctl(struct btrfs_fs_devices *fs_devices, case BTRFS_CHUNK_ALLOC_REGULAR: 
init_alloc_chunk_ctl_policy_regular(fs_devices, ctl); break; + case BTRFS_CHUNK_ALLOC_ZONED: + init_alloc_chunk_ctl_policy_zoned(fs_devices, ctl); + break; default: BUG(); } @@ -5045,6 +5148,38 @@ static int decide_stripe_size_regular(struct alloc_chunk_ctl *ctl, return 0; } +static int decide_stripe_size_zoned(struct alloc_chunk_ctl *ctl, + struct btrfs_device_info *devices_info) +{ + u64 zone_size = devices_info[0].dev->zone_info->zone_size; + /* Number of stripes that count for block group size */ + int data_stripes; + + /* + * It should hold because: + * dev_extent_min == dev_extent_want == zone_size * dev_stripes + */ + ASSERT(devices_info[ctl->ndevs - 1].max_avail == ctl->dev_extent_min); + + ctl->stripe_size = zone_size; + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + + /* stripe_size is fixed in zoned filesystem. Reduce ndevs instead. */ + if (ctl->stripe_size * data_stripes > ctl->max_chunk_size) { + ctl->ndevs = div_u64(div_u64(ctl->max_chunk_size * ctl->ncopies, + ctl->stripe_size) + ctl->nparity, + ctl->dev_stripes); + ctl->num_stripes = ctl->ndevs * ctl->dev_stripes; + data_stripes = (ctl->num_stripes - ctl->nparity) / ctl->ncopies; + ASSERT(ctl->stripe_size * data_stripes <= ctl->max_chunk_size); + } + + ctl->chunk_size = ctl->stripe_size * data_stripes; + + return 0; +} + static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, struct alloc_chunk_ctl *ctl, struct btrfs_device_info *devices_info) @@ -5072,6 +5207,8 @@ static int decide_stripe_size(struct btrfs_fs_devices *fs_devices, switch (fs_devices->chunk_alloc_policy) { case BTRFS_CHUNK_ALLOC_REGULAR: return decide_stripe_size_regular(ctl, devices_info); + case BTRFS_CHUNK_ALLOC_ZONED: + return decide_stripe_size_zoned(ctl, devices_info); default: BUG(); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 04e2b26823c2..598ac225176d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -214,6 +214,7
@@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used); enum btrfs_chunk_allocation_policy { BTRFS_CHUNK_ALLOC_REGULAR, + BTRFS_CHUNK_ALLOC_ZONED, }; /* diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 6699f626a86e..69fd0d078b9b 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1,11 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" +#include "disk-io.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -559,6 +561,7 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) fs_info->zone_size = zone_size; fs_info->max_zone_append_size = max_zone_append_size; + fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; /* * Check mount options here, because we might change fs_info->zoned @@ -779,3 +782,141 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror) sb_zone << zone_sectors_shift, zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS); } + +/** + * btrfs_find_allocatable_zones - find allocatable zones within a given region + * + * @device: the device to allocate a region on + * @hole_start: the position of the hole to allocate the region + * @hole_end: the end of the hole + * @num_bytes: size of wanted region + * @return: position of allocatable zones + * + * Allocatable region should not contain any superblock locations.
+ */ +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes) +{ + struct btrfs_zoned_device_info *zinfo = device->zone_info; + const u8 shift = zinfo->zone_size_shift; + u64 nzones = num_bytes >> shift; + u64 pos = hole_start; + u64 begin, end; + bool have_sb; + int i; + + ASSERT(IS_ALIGNED(hole_start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(num_bytes, zinfo->zone_size)); + + while (pos < hole_end) { + begin = pos >> shift; + end = begin + nzones; + + if (end > zinfo->nr_zones) + return hole_end; + + /* Check if zones in the region are all empty */ + if (btrfs_dev_is_sequential(device, pos) && + find_next_zero_bit(zinfo->empty_zones, end, begin) != end) { + pos += zinfo->zone_size; + continue; + } + + have_sb = false; + for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { + u32 sb_zone; + u64 sb_pos; + + sb_zone = sb_zone_number(shift, i); + if (!(end <= sb_zone || + sb_zone + BTRFS_NR_SB_LOG_ZONES <= begin)) { + have_sb = true; + pos = ((u64)sb_zone + BTRFS_NR_SB_LOG_ZONES) << shift; + break; + } + + /* We also need to exclude regular superblock positions */ + sb_pos = btrfs_sb_offset(i); + if (!(pos + num_bytes <= sb_pos || + sb_pos + BTRFS_SUPER_INFO_SIZE <= pos)) { + have_sb = true; + pos = ALIGN(sb_pos + BTRFS_SUPER_INFO_SIZE, + zinfo->zone_size); + break; + } + } + if (!have_sb) + break; + } + + return pos; +} + +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes) +{ + int ret; + + *bytes = 0; + ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET, + physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, + GFP_NOFS); + if (ret) + return ret; + + *bytes = length; + while (length) { + btrfs_dev_set_zone_empty(device, physical); + physical += device->zone_info->zone_size; + length -= device->zone_info->zone_size; + } + + return 0; +} + +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) +{ + struct btrfs_zoned_device_info *zinfo = 
device->zone_info; + const u8 shift = zinfo->zone_size_shift; + unsigned long begin = start >> shift; + unsigned long end = (start + size) >> shift; + u64 pos; + int ret; + + ASSERT(IS_ALIGNED(start, zinfo->zone_size)); + ASSERT(IS_ALIGNED(size, zinfo->zone_size)); + + if (end > zinfo->nr_zones) + return -ERANGE; + + /* All the zones are conventional */ + if (find_next_bit(zinfo->seq_zones, end, begin) == end) + return 0; + + /* All the zones are sequential and empty */ + if (find_next_zero_bit(zinfo->seq_zones, end, begin) == end && + find_next_zero_bit(zinfo->empty_zones, end, begin) == end) + return 0; + + for (pos = start; pos < start + size; pos += zinfo->zone_size) { + u64 reset_bytes; + + if (!btrfs_dev_is_sequential(device, pos) || + btrfs_dev_is_empty_zone(device, pos)) + continue; + + /* Free regions should be empty */ + btrfs_warn_in_rcu( + device->fs_info, + "zoned: resetting device %s (devid %llu) zone %llu for allocation", + rcu_str_deref(device->name), device->devid, pos >> shift); + WARN_ON_ONCE(1); + + ret = btrfs_reset_device_zone(device, pos, zinfo->zone_size, + &reset_bytes); + if (ret) + return ret; + } + + return 0; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 5e78786bb723..6c8f83c48c2e 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -36,6 +36,11 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw, u64 *bytenr_ret); void btrfs_advance_sb_log(struct btrfs_device *device, int mirror); int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror); +u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, + u64 hole_end, u64 num_bytes); +int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, + u64 length, u64 *bytes); +int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -91,6 +96,26 @@ static inline
int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror return 0; } +static inline u64 btrfs_find_allocatable_zones(struct btrfs_device *device, + u64 hole_start, u64 hole_end, + u64 num_bytes) +{ + return hole_start; +} + +static inline int btrfs_reset_device_zone(struct btrfs_device *device, + u64 physical, u64 length, u64 *bytes) +{ + *bytes = 0; + return 0; +} + +static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, + u64 start, u64 size) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Thu Feb 4 10:21:49 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066783
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Anand Jain , Josef Bacik Subject: [PATCH v15 10/42] btrfs: zoned: verify device extent is aligned to zone Date: Thu, 4 Feb 2021 19:21:49 +0900 Message-Id: <09a33e303c77de18547ebc2319cfc1a070da49c8.1612434091.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.30.0 Add a check in verify_one_dev_extent() to ensure that a device extent on a zoned block device is aligned to the respective zone boundary. If it isn't, mark the filesystem as unclean. Reviewed-by: Anand Jain Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota Reviewed-by: David Sterba --- fs/btrfs/volumes.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ae2aeadad5a0..10401def16ef 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7769,6 +7769,20 @@ static int verify_one_dev_extent(struct btrfs_fs_info *fs_info, ret = -EUCLEAN; goto out; } + + if (dev->zone_info) { + u64 zone_size = dev->zone_info->zone_size; + + if (!IS_ALIGNED(physical_offset, zone_size) || + !IS_ALIGNED(physical_len, zone_size)) { + btrfs_err(fs_info, +"zoned: dev extent devid %llu physical offset %llu len %llu is not aligned to device zone", + devid, physical_offset, physical_len); + ret = -EUCLEAN; + goto out; + } + } + out: free_extent_map(em); return ret; From patchwork Thu Feb 4 10:21:50 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066785
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik , Anand Jain Subject: [PATCH v15 11/42] btrfs: zoned: load zone's allocation offset Date: Thu, 4 Feb 2021 19:21:50 +0900 Message-Id: <9577a622c61d443199b6ec7ad4bc57730391805c.1612434091.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.30.0 A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block-group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems only support the single profile.
Supporting a non-single profile with zone append writing is not trivial. For example, in the DUP profile, we send a zone append write IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zones differ, the writes end up at different logical addresses. We would need a fine-grained logical-to-physical mapping to handle such diverging physical addresses. Since that would require an additional metadata type, disable non-single profiles for now. This commit supports the case where all the zones in a block group are sequential. The next patch will handle the case of a block group containing a conventional zone. Reviewed-by: Josef Bacik Reviewed-by: Anand Jain Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 15 ++++ fs/btrfs/block-group.h | 6 ++ fs/btrfs/zoned.c | 151 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 7 ++ 4 files changed, 179 insertions(+) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index b8fbee70a897..e6bf728496eb 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -15,6 +15,7 @@ #include "delalloc-space.h" #include "discard.h" #include "raid56.h" +#include "zoned.h" /* * Return target flags in extended format or 0 if restripe for this chunk_type @@ -1855,6 +1856,13 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { + btrfs_err(info, "zoned: failed to load zone info of bg %llu", + cache->start); + goto error; + } + /* * We need to exclude the super stripes now so that the space info has * super bytes accounted for, otherwise we'll think we have more space @@ -2141,6 +2149,13 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, cache->cached = BTRFS_CACHE_FINISHED; if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; + + ret = btrfs_load_block_group_zone_info(cache); + if (ret) { +
btrfs_put_block_group(cache); + return ret; + } + ret = exclude_super_stripes(cache); if (ret) { /* We may have excluded something, so call this just in case */ diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 8f74a96074f7..224946fa9bed 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -183,6 +183,12 @@ struct btrfs_block_group { /* Record locked full stripes for RAID5/6 block group */ struct btrfs_full_stripe_locks_tree full_stripe_locks_root; + + /* + * Allocation offset for the block group to implement sequential + * allocation. This is used only on a zoned filesystem. + */ + u64 alloc_offset; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 69fd0d078b9b..0a7cd00f405f 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -3,14 +3,20 @@ #include #include #include +#include #include "ctree.h" #include "volumes.h" #include "zoned.h" #include "rcu-string.h" #include "disk-io.h" +#include "block-group.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 +/* Invalid allocation pointer value for missing devices */ +#define WP_MISSING_DEV ((u64)-1) +/* Pseudo write pointer value for conventional zone */ +#define WP_CONVENTIONAL ((u64)-2) /* Number of superblock log zones */ #define BTRFS_NR_SB_LOG_ZONES 2 @@ -920,3 +926,148 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct extent_map_tree *em_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct map_lookup *map; + struct btrfs_device *device; + u64 logical = cache->start; + u64 length = cache->length; + u64 physical = 0; + int ret; + int i; + unsigned int nofs_flag; + u64 *alloc_offsets = NULL; + u32 num_sequential = 0, num_conventional = 0; + + if 
(!btrfs_is_zoned(fs_info)) + return 0; + + /* Sanity check */ + if (!IS_ALIGNED(length, fs_info->zone_size)) { + btrfs_err(fs_info, + "zoned: block group %llu len %llu unaligned to zone size %llu", + logical, length, fs_info->zone_size); + return -EIO; + } + + /* Get the chunk mapping */ + read_lock(&em_tree->lock); + em = lookup_extent_mapping(em_tree, logical, length); + read_unlock(&em_tree->lock); + + if (!em) + return -EINVAL; + + map = em->map_lookup; + + alloc_offsets = kcalloc(map->num_stripes, sizeof(*alloc_offsets), GFP_NOFS); + if (!alloc_offsets) { + free_extent_map(em); + return -ENOMEM; + } + + for (i = 0; i < map->num_stripes; i++) { + bool is_sequential; + struct blk_zone zone; + + device = map->stripes[i].dev; + physical = map->stripes[i].physical; + + if (device->bdev == NULL) { + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } + + is_sequential = btrfs_dev_is_sequential(device, physical); + if (is_sequential) + num_sequential++; + else + num_conventional++; + + if (!is_sequential) { + alloc_offsets[i] = WP_CONVENTIONAL; + continue; + } + + /* + * This zone will be used for allocation, so mark this zone + * non-empty. + */ + btrfs_dev_clear_zone_empty(device, physical); + + /* + * The group is mapped to a sequential zone. Get the zone write + * pointer to determine the allocation offset within the zone. 
+ */ + WARN_ON(!IS_ALIGNED(physical, fs_info->zone_size)); + nofs_flag = memalloc_nofs_save(); + ret = btrfs_get_dev_zone(device, physical, &zone); + memalloc_nofs_restore(nofs_flag); + if (ret == -EIO || ret == -EOPNOTSUPP) { + ret = 0; + alloc_offsets[i] = WP_MISSING_DEV; + continue; + } else if (ret) { + goto out; + } + + switch (zone.cond) { + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + btrfs_err(fs_info, + "zoned: offline/readonly zone %llu on device %s (devid %llu)", + physical >> device->zone_info->zone_size_shift, + rcu_str_deref(device->name), device->devid); + alloc_offsets[i] = WP_MISSING_DEV; + break; + case BLK_ZONE_COND_EMPTY: + alloc_offsets[i] = 0; + break; + case BLK_ZONE_COND_FULL: + alloc_offsets[i] = fs_info->zone_size; + break; + default: + /* Partially used zone */ + alloc_offsets[i] = + ((zone.wp - zone.start) << SECTOR_SHIFT); + break; + } + } + + if (num_conventional > 0) { + /* + * Since conventional zones do not have a write pointer, we + * cannot determine alloc_offset from the pointer + */ + ret = -EINVAL; + goto out; + } + + switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + cache->alloc_offset = alloc_offsets[0]; + break; + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID0: + case BTRFS_BLOCK_GROUP_RAID10: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID6: + /* non-single profiles are not supported yet */ + default: + btrfs_err(fs_info, "zoned: profile %s not yet supported", + btrfs_bg_type_to_raid_name(map->type)); + ret = -EINVAL; + goto out; + } + +out: + kfree(alloc_offsets); + free_extent_map(em); + + return ret; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 6c8f83c48c2e..4f3152d7b98f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,6 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 
*bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -116,6 +117,12 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, return 0; } +static inline int btrfs_load_block_group_zone_info( + struct btrfs_block_group *cache) +{ + return 0; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Thu Feb 4 10:21:51 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066787
From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik , Anand Jain Subject: [PATCH v15 12/42] btrfs: zoned: calculate allocation offset for conventional zones Date: Thu, 4 Feb 2021 19:21:51 +0900 Message-Id: <7761072d41cc3be28575721df50647fb265a871f.1612434091.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.30.0 Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset for sequential allocation if a block group contains a conventional zone. Instead, we can use the end of the highest-addressed extent in the block group as the allocation offset. For a new block group, we cannot calculate the allocation offset by consulting the extent tree, because doing so could deadlock by taking an extent buffer lock after the chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyway, we can simply set the allocation offset to 0.
Reviewed-by: Josef Bacik Reviewed-by: Anand Jain Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 +- fs/btrfs/zoned.c | 99 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 4 +- 3 files changed, 98 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index e6bf728496eb..6d10874189df 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1856,7 +1856,7 @@ static int read_one_block_group(struct btrfs_fs_info *info, goto error; } - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, false); if (ret) { btrfs_err(info, "zoned: failed to load zone info of bg %llu", cache->start); @@ -2150,7 +2150,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) cache->needs_free_space = 1; - ret = btrfs_load_block_group_zone_info(cache); + ret = btrfs_load_block_group_zone_info(cache, true); if (ret) { btrfs_put_block_group(cache); return ret; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 0a7cd00f405f..b892566a1c93 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -927,7 +927,68 @@ int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size) return 0; } -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) +/* + * Calculate an allocation pointer from the extent allocation information + * for a block group consisting of conventional zones. It points to the + * end of the highest-addressed extent in the block group, used as the + * allocation offset.
+ */ +static int calculate_alloc_pointer(struct btrfs_block_group *cache, + u64 *offset_ret) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_key found_key; + int ret; + u64 length; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = cache->start + cache->length; + key.type = 0; + key.offset = 0; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + /* We should not find the exact match */ + if (!ret) + ret = -EUCLEAN; + if (ret < 0) + goto out; + + ret = btrfs_previous_extent_item(root, path, cache->start); + if (ret) { + if (ret == 1) { + ret = 0; + *offset_ret = 0; + } + goto out; + } + + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]); + + if (found_key.type == BTRFS_EXTENT_ITEM_KEY) + length = found_key.offset; + else + length = fs_info->nodesize; + + if (!(found_key.objectid >= cache->start && + found_key.objectid + length <= cache->start + cache->length)) { + ret = -EUCLEAN; + goto out; + } + *offset_ret = found_key.objectid + length - cache->start; + ret = 0; + +out: + btrfs_free_path(path); + return ret; +} + +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) { struct btrfs_fs_info *fs_info = cache->fs_info; struct extent_map_tree *em_tree = &fs_info->mapping_tree; @@ -941,6 +1002,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) int i; unsigned int nofs_flag; u64 *alloc_offsets = NULL; + u64 last_alloc = 0; u32 num_sequential = 0, num_conventional = 0; if (!btrfs_is_zoned(fs_info)) @@ -1040,11 +1102,30 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) if (num_conventional > 0) { /* - * Since conventional zones do not have a write pointer, we - * cannot determine alloc_offset from the pointer + * Avoid calling calculate_alloc_pointer() for a new BG. It + * is of no use for a new BG; it must always be 0. 
+ * + * Also, we have a lock chain of extent buffer lock -> + * chunk mutex. For new BG, this function is called from + * btrfs_make_block_group() which is already taking the + * chunk mutex. Thus, we cannot call + * calculate_alloc_pointer() which takes extent buffer + * locks to avoid deadlock. */ - ret = -EINVAL; - goto out; + if (new) { + cache->alloc_offset = 0; + goto out; + } + ret = calculate_alloc_pointer(cache, &last_alloc); + if (ret || map->num_stripes == num_conventional) { + if (!ret) + cache->alloc_offset = last_alloc; + else + btrfs_err(fs_info, + "zoned: failed to determine allocation offset of bg %llu", + cache->start); + goto out; + } } switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { @@ -1066,6 +1147,14 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache) } out: + /* An extent is allocated after the write pointer */ + if (!ret && num_conventional && last_alloc > cache->alloc_offset) { + btrfs_err(fs_info, + "zoned: got wrong write pointer in BG %llu: %llu > %llu", + logical, last_alloc, cache->alloc_offset); + ret = -EIO; + } + kfree(alloc_offsets); free_extent_map(em); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 4f3152d7b98f..d27db3993e51 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -41,7 +41,7 @@ u64 btrfs_find_allocatable_zones(struct btrfs_device *device, u64 hole_start, int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); -int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache); +int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -118,7 +118,7 @@ static inline int btrfs_ensure_empty_zones(struct btrfs_device *device, } static inline int btrfs_load_block_group_zone_info( - struct btrfs_block_group 
*cache) + struct btrfs_block_group *cache, bool new) { return 0; } From patchwork Thu Feb 4 10:21:52 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066789 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik Subject: [PATCH v15 13/42] btrfs: zoned: track unusable bytes for zones Date: Thu, 4 Feb 2021 19:21:52 +0900 Message-Id: <6cec864c4cad33c0064e6d3623a96390899103ef.1612434091.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.30.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
linux-fsdevel@vger.kernel.org In a zoned filesystem, a once written and then freed region is not usable until the underlying zone has been reset, so we need to distinguish such unusable space from usable free space. Therefore we introduce the "zone_unusable" field in the block group structure and "bytes_zone_unusable" in the space_info structure to track the unusable space. Pinned bytes are always reclaimed to the unusable space. But when an allocated region is returned before use, e.g., because the block group became read-only between allocation time and reservation time, we can safely return the region to the block group. For this situation, this commit introduces btrfs_add_free_space_unused(). It behaves the same as btrfs_add_free_space() on a regular filesystem; on zoned filesystems, it rewinds the allocation offset. Because the read-only bytes track free but unusable bytes while the block group is read-only, we need to migrate the zone_unusable bytes to read-only bytes when a block group is marked read-only. 
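The free/unusable split that __btrfs_add_free_space_zoned() performs in this patch follows directly from the allocation pointer: a freed region at or beyond the pointer was never written and stays free, while anything below the pointer has been written and becomes zone_unusable until the zone is reset. A minimal user-space sketch of that arithmetic (names illustrative; locking, the alloc_offset rewind, and btrfs_mark_bg_unused() are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of splitting a freed region into reusable and unusable bytes. */
struct split {
    uint64_t to_free;     /* bytes returned to free space */
    uint64_t to_unusable; /* bytes accounted as zone_unusable */
};

/*
 * offset is the freed region's start relative to the block group start,
 * alloc_offset is the current allocation pointer. "used" mirrors the flag
 * in the patch: false means the region was allocated but never written,
 * so all of it can be reused (and the pointer rewound in the real code).
 */
struct split split_freed_region(uint64_t offset, uint64_t size,
                                uint64_t alloc_offset, bool used)
{
    struct split s;

    if (!used)
        s.to_free = size;                 /* never written: fully reusable */
    else if (offset >= alloc_offset)
        s.to_free = size;                 /* entirely above the pointer */
    else if (offset + size <= alloc_offset)
        s.to_free = 0;                    /* entirely below: all unusable */
    else
        s.to_free = offset + size - alloc_offset; /* straddles the pointer */
    s.to_unusable = size - s.to_free;
    return s;
}
```

For example, with the pointer at 8192, freeing a written 8192-byte region starting at offset 4096 yields 4096 bytes free and 4096 bytes zone_unusable.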
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 51 +++++++++++++++++++++------- fs/btrfs/block-group.h | 1 + fs/btrfs/extent-tree.c | 5 +++ fs/btrfs/free-space-cache.c | 67 +++++++++++++++++++++++++++++++++++++ fs/btrfs/free-space-cache.h | 2 ++ fs/btrfs/space-info.c | 13 ++++--- fs/btrfs/space-info.h | 4 ++- fs/btrfs/sysfs.c | 2 ++ fs/btrfs/zoned.c | 21 ++++++++++++ fs/btrfs/zoned.h | 3 ++ 10 files changed, 151 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 6d10874189df..e4444d4dd4b5 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1009,12 +1009,17 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, WARN_ON(block_group->space_info->total_bytes < block_group->length); WARN_ON(block_group->space_info->bytes_readonly - < block_group->length); + < block_group->length - block_group->zone_unusable); + WARN_ON(block_group->space_info->bytes_zone_unusable + < block_group->zone_unusable); WARN_ON(block_group->space_info->disk_total < block_group->length * factor); } block_group->space_info->total_bytes -= block_group->length; - block_group->space_info->bytes_readonly -= block_group->length; + block_group->space_info->bytes_readonly -= + (block_group->length - block_group->zone_unusable); + block_group->space_info->bytes_zone_unusable -= + block_group->zone_unusable; block_group->space_info->disk_total -= block_group->length * factor; spin_unlock(&block_group->space_info->lock); @@ -1158,7 +1163,7 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) } num_bytes = cache->length - cache->reserved - cache->pinned - - cache->bytes_super - cache->used; + cache->bytes_super - cache->zone_unusable - cache->used; /* * Data never overcommits, even in mixed mode, so do just the straight @@ -1189,6 +1194,12 @@ static int inc_block_group_ro(struct btrfs_block_group *cache, int force) if (!ret) { sinfo->bytes_readonly += num_bytes; + if 
(btrfs_is_zoned(cache->fs_info)) { + /* Migrate zone_unusable bytes to readonly */ + sinfo->bytes_readonly += cache->zone_unusable; + sinfo->bytes_zone_unusable -= cache->zone_unusable; + cache->zone_unusable = 0; + } cache->ro++; list_add_tail(&cache->ro_list, &sinfo->ro_bgs); } @@ -1876,12 +1887,20 @@ static int read_one_block_group(struct btrfs_fs_info *info, } /* - * Check for two cases, either we are full, and therefore don't need - * to bother with the caching work since we won't find any space, or we - * are empty, and we can just add all the space in and be done with it. - * This saves us _a_lot_ of time, particularly in the full case. + * For zoned filesystem, space after the allocation offset is the only + * free space for a block group. So, we don't need any caching work. + * btrfs_calc_zone_unusable() will set the amount of free space and + * zone_unusable space. + * + * For regular filesystem, check for two cases, either we are full, and + * therefore don't need to bother with the caching work since we won't + * find any space, or we are empty, and we can just add all the space + * in and be done with it. This saves us _a_lot_ of time, particularly + * in the full case. 
*/ - if (cache->length == cache->used) { + if (btrfs_is_zoned(info)) { + btrfs_calc_zone_unusable(cache); + } else if (cache->length == cache->used) { cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; btrfs_free_excluded_extents(cache); @@ -1900,7 +1919,8 @@ static int read_one_block_group(struct btrfs_fs_info *info, } trace_btrfs_add_block_group(info, cache, 0); btrfs_update_space_info(info, cache->flags, cache->length, - cache->used, cache->bytes_super, &space_info); + cache->used, cache->bytes_super, + cache->zone_unusable, &space_info); cache->space_info = space_info; @@ -1956,7 +1976,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) break; } btrfs_update_space_info(fs_info, bg->flags, em->len, em->len, - 0, &space_info); + 0, 0, &space_info); bg->space_info = space_info; link_block_group(bg); @@ -2197,7 +2217,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used, */ trace_btrfs_add_block_group(fs_info, cache, 1); btrfs_update_space_info(fs_info, cache->flags, size, bytes_used, - cache->bytes_super, &cache->space_info); + cache->bytes_super, 0, &cache->space_info); btrfs_update_global_block_rsv(fs_info); link_block_group(cache); @@ -2305,8 +2325,15 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group *cache) spin_lock(&cache->lock); if (!--cache->ro) { num_bytes = cache->length - cache->reserved - - cache->pinned - cache->bytes_super - cache->used; + cache->pinned - cache->bytes_super - + cache->zone_unusable - cache->used; sinfo->bytes_readonly -= num_bytes; + if (btrfs_is_zoned(cache->fs_info)) { + /* Migrate zone_unusable bytes back */ + cache->zone_unusable = cache->alloc_offset - cache->used; + sinfo->bytes_zone_unusable += cache->zone_unusable; + sinfo->bytes_readonly -= cache->zone_unusable; + } list_del_init(&cache->ro_list); } spin_unlock(&cache->lock); diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 224946fa9bed..0fd66febe115 100644 --- a/fs/btrfs/block-group.h +++ 
b/fs/btrfs/block-group.h @@ -189,6 +189,7 @@ struct btrfs_block_group { * allocation. This is used only on a zoned filesystem. */ u64 alloc_offset; + u64 zone_unusable; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 5476ab84e544..5c61c3f136f7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -34,6 +34,7 @@ #include "block-group.h" #include "discard.h" #include "rcu-string.h" +#include "zoned.h" #undef SCRAMBLE_DELAYED_REFS @@ -2740,6 +2741,10 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info, if (cache->ro) { space_info->bytes_readonly += len; readonly = true; + } else if (btrfs_is_zoned(fs_info)) { + /* Need reset before reusing in a zoned block group */ + space_info->bytes_zone_unusable += len; + readonly = true; } spin_unlock(&cache->lock); if (!readonly && return_free_space && diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 6134e10a6e7f..b93ac31eca69 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2477,6 +2477,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, int ret = 0; u64 filter_bytes = bytes; + ASSERT(!btrfs_is_zoned(fs_info)); + info = kmem_cache_zalloc(btrfs_free_space_cachep, GFP_NOFS); if (!info) return -ENOMEM; @@ -2534,11 +2536,49 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, return ret; } +static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group, + u64 bytenr, u64 size, bool used) +{ + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 offset = bytenr - block_group->start; + u64 to_free, to_unusable; + + spin_lock(&ctl->tree_lock); + if (!used) + to_free = size; + else if (offset >= block_group->alloc_offset) + to_free = size; + else if (offset + size <= block_group->alloc_offset) + to_free = 0; + else + to_free = offset + size - block_group->alloc_offset; + to_unusable = size - to_free; + + 
ctl->free_space += to_free; + block_group->zone_unusable += to_unusable; + spin_unlock(&ctl->tree_lock); + if (!used) { + spin_lock(&block_group->lock); + block_group->alloc_offset -= size; + spin_unlock(&block_group->lock); + } + + /* All the region is now unusable. Mark it as unused and reclaim */ + if (block_group->zone_unusable == block_group->length) + btrfs_mark_bg_unused(block_group); + + return 0; +} + int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size) { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2547,6 +2587,16 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group, bytenr, size, trim_state); } +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size) +{ + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + false); + + return btrfs_add_free_space(block_group, bytenr, size); +} + /* * This is a subtle distinction because when adding free space back in general, * we want it to be added as untrimmed for async. 
But in the case where we add @@ -2557,6 +2607,10 @@ int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, { enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED; + if (btrfs_is_zoned(block_group->fs_info)) + return __btrfs_add_free_space_zoned(block_group, bytenr, size, + true); + if (btrfs_test_opt(block_group->fs_info, DISCARD_SYNC) || btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) trim_state = BTRFS_TRIM_STATE_TRIMMED; @@ -2574,6 +2628,9 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group, int ret; bool re_search = false; + if (btrfs_is_zoned(block_group->fs_info)) + return 0; + spin_lock(&ctl->tree_lock); again: @@ -2668,6 +2725,16 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group, struct rb_node *n; int count = 0; + /* + * Zoned btrfs does not use free space tree and cluster. Just print + * out the free space after the allocation offset. + */ + if (btrfs_is_zoned(fs_info)) { + btrfs_info(fs_info, "free space %llu", + block_group->length - block_group->alloc_offset); + return; + } + spin_lock(&ctl->tree_lock); for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) { info = rb_entry(n, struct btrfs_free_space, offset_index); diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h index ecb09a02d544..1f23088d43f9 100644 --- a/fs/btrfs/free-space-cache.h +++ b/fs/btrfs/free-space-cache.h @@ -107,6 +107,8 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, enum btrfs_trim_state trim_state); int btrfs_add_free_space(struct btrfs_block_group *block_group, u64 bytenr, u64 size); +int btrfs_add_free_space_unused(struct btrfs_block_group *block_group, + u64 bytenr, u64 size); int btrfs_add_free_space_async_trimmed(struct btrfs_block_group *block_group, u64 bytenr, u64 size); int btrfs_remove_free_space(struct btrfs_block_group *block_group, diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c index bccd98141a6e..2da6177f4b0b 100644 --- a/fs/btrfs/space-info.c 
+++ b/fs/btrfs/space-info.c @@ -169,6 +169,7 @@ u64 __pure btrfs_space_info_used(struct btrfs_space_info *s_info, ASSERT(s_info); return s_info->bytes_used + s_info->bytes_reserved + s_info->bytes_pinned + s_info->bytes_readonly + + s_info->bytes_zone_unusable + (may_use_included ? s_info->bytes_may_use : 0); } @@ -264,7 +265,7 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info) void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info) { struct btrfs_space_info *found; @@ -280,6 +281,7 @@ void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, found->bytes_used += bytes_used; found->disk_used += bytes_used * factor; found->bytes_readonly += bytes_readonly; + found->bytes_zone_unusable += bytes_zone_unusable; if (total_bytes > 0) found->full = 0; btrfs_try_granting_tickets(info, found); @@ -429,10 +431,10 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info, info->total_bytes - btrfs_space_info_used(info, true), info->full ? 
"" : "not "); btrfs_info(fs_info, - "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu", + "space_info total=%llu, used=%llu, pinned=%llu, reserved=%llu, may_use=%llu, readonly=%llu zone_unusable=%llu", info->total_bytes, info->bytes_used, info->bytes_pinned, info->bytes_reserved, info->bytes_may_use, - info->bytes_readonly); + info->bytes_readonly, info->bytes_zone_unusable); DUMP_BLOCK_RSV(fs_info, global_block_rsv); DUMP_BLOCK_RSV(fs_info, trans_block_rsv); @@ -461,9 +463,10 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info, list_for_each_entry(cache, &info->block_groups[index], list) { spin_lock(&cache->lock); btrfs_info(fs_info, - "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s", + "block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %llu zone_unusable %s", cache->start, cache->length, cache->used, cache->pinned, - cache->reserved, cache->ro ? "[readonly]" : ""); + cache->reserved, cache->zone_unusable, + cache->ro ? 
"[readonly]" : ""); spin_unlock(&cache->lock); btrfs_dump_free_space(cache, bytes); } diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h index e237156ce888..b1a8ffb03b3e 100644 --- a/fs/btrfs/space-info.h +++ b/fs/btrfs/space-info.h @@ -17,6 +17,8 @@ struct btrfs_space_info { u64 bytes_may_use; /* number of bytes that may be used for delalloc/allocations */ u64 bytes_readonly; /* total bytes that are read only */ + u64 bytes_zone_unusable; /* total bytes that are unusable until + resetting the device zone */ u64 max_extent_size; /* This will hold the maximum extent size of the space info if we had an ENOSPC in the @@ -123,7 +125,7 @@ DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned"); int btrfs_init_space_info(struct btrfs_fs_info *fs_info); void btrfs_update_space_info(struct btrfs_fs_info *info, u64 flags, u64 total_bytes, u64 bytes_used, - u64 bytes_readonly, + u64 bytes_readonly, u64 bytes_zone_unusable, struct btrfs_space_info **space_info); struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info, u64 flags); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 19b9fffa2c9c..6eb1c50fa98c 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -666,6 +666,7 @@ SPACE_INFO_ATTR(bytes_pinned); SPACE_INFO_ATTR(bytes_reserved); SPACE_INFO_ATTR(bytes_may_use); SPACE_INFO_ATTR(bytes_readonly); +SPACE_INFO_ATTR(bytes_zone_unusable); SPACE_INFO_ATTR(disk_used); SPACE_INFO_ATTR(disk_total); BTRFS_ATTR(space_info, total_bytes_pinned, @@ -679,6 +680,7 @@ static struct attribute *space_info_attrs[] = { BTRFS_ATTR_PTR(space_info, bytes_reserved), BTRFS_ATTR_PTR(space_info, bytes_may_use), BTRFS_ATTR_PTR(space_info, bytes_readonly), + BTRFS_ATTR_PTR(space_info, bytes_zone_unusable), BTRFS_ATTR_PTR(space_info, disk_used), BTRFS_ATTR_PTR(space_info, disk_total), BTRFS_ATTR_PTR(space_info, total_bytes_pinned), diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index b892566a1c93..c5f9f4c6f20b 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ 
-1160,3 +1160,24 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) return ret; } + +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) +{ + u64 unusable, free; + + if (!btrfs_is_zoned(cache->fs_info)) + return; + + WARN_ON(cache->bytes_super != 0); + unusable = cache->alloc_offset - cache->used; + free = cache->length - cache->alloc_offset; + + /* We only need ->free_space in ALLOC_SEQ block groups */ + cache->last_byte_to_unpin = (u64)-1; + cache->cached = BTRFS_CACHE_FINISHED; + cache->free_space_ctl->free_space = free; + cache->zone_unusable = unusable; + + /* Should not have any excluded extents. Just in case, though */ + btrfs_free_excluded_extents(cache); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index d27db3993e51..37304d1675e6 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -42,6 +42,7 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical, u64 length, u64 *bytes); int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size); int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new); +void btrfs_calc_zone_unusable(struct btrfs_block_group *cache); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -123,6 +124,8 @@ static inline int btrfs_load_block_group_zone_info( return 0; } +static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Thu Feb 4 10:21:53 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066791 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik Subject: [PATCH v15 14/42] btrfs: zoned: implement sequential extent allocation Date: Thu, 4 Feb 2021 19:21:53 +0900 Message-Id: <2a2f979a38943162a54ddf017aa44371d17695ec.1612434091.git.naohiro.aota@wdc.com> X-Mailer: git-send-email 2.30.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Therefore the allocator never manages bitmaps or clusters. Also, add assertions to the corresponding functions. As zone append writing is used, it would be unnecessary to track the allocation offset, as the allocator only needs to check available space. 
But by tracking and returning the offset as an allocated region, we can skip modification of ordered extents and checksum information when there is no IO reordering. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 4 ++ fs/btrfs/extent-tree.c | 90 ++++++++++++++++++++++++++++++++++--- fs/btrfs/free-space-cache.c | 6 +++ 3 files changed, 94 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index e4444d4dd4b5..63093cfb807e 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -725,6 +725,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only struct btrfs_caching_control *caching_ctl = NULL; int ret = 0; + /* Allocator for zoned filesystems does not use the cache at all */ + if (btrfs_is_zoned(fs_info)) + return 0; + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS); if (!caching_ctl) return -ENOMEM; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 5c61c3f136f7..85d99307673d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3429,6 +3429,7 @@ btrfs_release_block_group(struct btrfs_block_group *cache, enum btrfs_extent_allocation_policy { BTRFS_EXTENT_ALLOC_CLUSTERED, + BTRFS_EXTENT_ALLOC_ZONED, }; /* @@ -3681,6 +3682,65 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Simple allocator for sequential only block group. It only allows sequential + * allocation. No need to play with trees. This function also reserves the + * bytes as in btrfs_add_reserved_bytes. 
+ */ +static int do_allocation_zoned(struct btrfs_block_group *block_group, + struct find_free_extent_ctl *ffe_ctl, + struct btrfs_block_group **bg_ret) +{ + struct btrfs_space_info *space_info = block_group->space_info; + struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; + u64 start = block_group->start; + u64 num_bytes = ffe_ctl->num_bytes; + u64 avail; + int ret = 0; + + ASSERT(btrfs_is_zoned(block_group->fs_info)); + + spin_lock(&space_info->lock); + spin_lock(&block_group->lock); + + if (block_group->ro) { + ret = 1; + goto out; + } + + avail = block_group->length - block_group->alloc_offset; + if (avail < num_bytes) { + if (ffe_ctl->max_extent_size < avail) { + /* + * With sequential allocator, free space is always + * contiguous + */ + ffe_ctl->max_extent_size = avail; + ffe_ctl->total_free_space = avail; + } + ret = 1; + goto out; + } + + ffe_ctl->found_offset = start + block_group->alloc_offset; + block_group->alloc_offset += num_bytes; + spin_lock(&ctl->tree_lock); + ctl->free_space -= num_bytes; + spin_unlock(&ctl->tree_lock); + + /* + * We do not check if found_offset is aligned to stripesize. The + * address is anyway rewritten when using zone append writing. 
+	 */
+
+	ffe_ctl->search_start = ffe_ctl->found_offset;
+
+out:
+	spin_unlock(&block_group->lock);
+	spin_unlock(&space_info->lock);
+	return ret;
+}
+
 static int do_allocation(struct btrfs_block_group *block_group,
 			 struct find_free_extent_ctl *ffe_ctl,
 			 struct btrfs_block_group **bg_ret)
@@ -3688,6 +3748,8 @@ static int do_allocation(struct btrfs_block_group *block_group,
 	switch (ffe_ctl->policy) {
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return do_allocation_clustered(block_group, ffe_ctl, bg_ret);
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		return do_allocation_zoned(block_group, ffe_ctl, bg_ret);
 	default:
 		BUG();
 	}
@@ -3702,6 +3764,9 @@ static void release_block_group(struct btrfs_block_group *block_group,
 		ffe_ctl->retry_clustered = false;
 		ffe_ctl->retry_unclustered = false;
 		break;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Nothing to do */
+		break;
 	default:
 		BUG();
 	}
@@ -3730,6 +3795,9 @@ static void found_extent(struct find_free_extent_ctl *ffe_ctl,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		found_extent_clustered(ffe_ctl, ins);
 		break;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Nothing to do */
+		break;
 	default:
 		BUG();
 	}
@@ -3745,6 +3813,9 @@ static int chunk_allocation_failed(struct find_free_extent_ctl *ffe_ctl)
 		 */
 		ffe_ctl->loop = LOOP_NO_EMPTY_SIZE;
 		return 0;
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Give up here */
+		return -ENOSPC;
 	default:
 		BUG();
 	}
@@ -3913,6 +3984,9 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
 	case BTRFS_EXTENT_ALLOC_CLUSTERED:
 		return prepare_allocation_clustered(fs_info, ffe_ctl,
 						    space_info, ins);
+	case BTRFS_EXTENT_ALLOC_ZONED:
+		/* Nothing to do */
+		return 0;
 	default:
 		BUG();
 	}
@@ -3976,6 +4050,9 @@ static noinline int find_free_extent(struct btrfs_root *root,
 	ffe_ctl.last_ptr = NULL;
 	ffe_ctl.use_cluster = true;
 
+	if (btrfs_is_zoned(fs_info))
+		ffe_ctl.policy = BTRFS_EXTENT_ALLOC_ZONED;
+
 	ins->type = BTRFS_EXTENT_ITEM_KEY;
 	ins->objectid = 0;
 	ins->offset = 0;
@@ -4118,20 +4195,21 @@ static noinline int find_free_extent(struct btrfs_root *root,
 		/* move
on to the next group */
 		if (ffe_ctl.search_start + num_bytes >
 		    block_group->start + block_group->length) {
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-					     num_bytes);
+			btrfs_add_free_space_unused(block_group,
+					ffe_ctl.found_offset, num_bytes);
 			goto loop;
 		}
 
 		if (ffe_ctl.found_offset < ffe_ctl.search_start)
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-				ffe_ctl.search_start - ffe_ctl.found_offset);
+			btrfs_add_free_space_unused(block_group,
+					ffe_ctl.found_offset,
+					ffe_ctl.search_start - ffe_ctl.found_offset);
 
 		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
 					       num_bytes, delalloc);
 		if (ret == -EAGAIN) {
-			btrfs_add_free_space(block_group, ffe_ctl.found_offset,
-					     num_bytes);
+			btrfs_add_free_space_unused(block_group,
+					ffe_ctl.found_offset, num_bytes);
 			goto loop;
 		}
 		btrfs_inc_block_group_reservations(block_group);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index b93ac31eca69..d2a43186cc7f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2928,6 +2928,8 @@ u64 btrfs_find_space_for_alloc(struct btrfs_block_group *block_group,
 	u64 align_gap_len = 0;
 	enum btrfs_trim_state align_gap_trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	spin_lock(&ctl->tree_lock);
 	entry = find_free_space(ctl, &offset, &bytes_search,
 				block_group->full_stripe_len, max_extent_size);
@@ -3059,6 +3061,8 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group *block_group,
 	struct rb_node *node;
 	u64 ret = 0;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	spin_lock(&cluster->lock);
 	if (bytes > cluster->max_size)
 		goto out;
@@ -3835,6 +3839,8 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
 	int ret;
 	u64 rem = 0;
 
+	ASSERT(!btrfs_is_zoned(block_group->fs_info));
+
 	*trimmed = 0;
 
 	spin_lock(&block_group->lock);
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 15/42] btrfs: zoned: redirty released extent buffers
Date: Thu, 4 Feb 2021 19:21:54 +0900
Message-Id: <60882b8bb6f8723b8568515212cac64a55ce405f.1612434091.git.naohiro.aota@wdc.com>

Tree manipulating operations like merging nodes often release once-allocated tree nodes.
Such nodes are cleaned so that pages in the node are not uselessly written
out. On zoned volumes, however, this optimization blocks the following IOs:
cancelling the write-out of the freed blocks breaks the sequential write
order expected by the device.

Introduce a list of clean and unwritten extent buffers that have been
released in a transaction, and redirty the buffers so that
btree_write_cache_pages() can send proper bios to the devices. Also clear
the entire content of such extent buffers so that raw block scanners, e.g.
'btrfs check', are not confused. Since the cleared content makes
csum_dirty_buffer() complain about a bytenr mismatch, skip the check and
the checksum for these buffers using the newly introduced
EXTENT_BUFFER_NO_CHECK flag.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c     |  8 ++++++++
 fs/btrfs/extent-tree.c | 12 +++++++++++-
 fs/btrfs/extent_io.c   |  4 ++++
 fs/btrfs/extent_io.h   |  2 ++
 fs/btrfs/transaction.c | 10 ++++++++++
 fs/btrfs/transaction.h |  3 +++
 fs/btrfs/tree-log.c    |  6 ++++++
 fs/btrfs/zoned.c       | 37 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  7 +++++++
 9 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8551b0fc1b22..eb1afd7d89f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -459,6 +459,12 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 		return 0;
 
 	found_start = btrfs_header_bytenr(eb);
+
+	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
+		WARN_ON(found_start != 0);
+		return 0;
+	}
+
 	/*
 	 * Please do not consolidate these warnings into a single if.
 	 * It is useful to know what went wrong.
@@ -4774,6 +4780,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 			     EXTENT_DIRTY);
 	btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents);
 
+	btrfs_free_redirty_list(cur_trans);
+
 	cur_trans->state = TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
 }
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 85d99307673d..4d48a773bf9c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3292,8 +3292,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ret = check_ref_cleanup(trans, buf->start);
-		if (!ret)
+		if (!ret) {
+			btrfs_redirty_list_add(trans->transaction, buf);
 			goto out;
+		}
 	}
 
 	cache = btrfs_lookup_block_group(fs_info, buf->start);
@@ -3304,6 +3306,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
+	if (btrfs_is_zoned(fs_info)) {
+		btrfs_redirty_list_add(trans->transaction, buf);
+		pin_down_extent(trans, cache, buf->start, buf->len, 1);
+		btrfs_put_block_group(cache);
+		goto out;
+	}
+
 	WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
 	btrfs_add_free_space(cache, buf->start, buf->len);
@@ -4635,6 +4644,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	__btrfs_tree_lock(buf, nest);
 	btrfs_clean_tree_block(buf);
 	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
+	clear_bit(EXTENT_BUFFER_NO_CHECK, &buf->bflags);
 
 	set_extent_buffer_uptodate(buf);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2fa4ca12e2dd..fa9b37178d42 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -25,6 +25,7 @@
 #include "backref.h"
 #include "disk-io.h"
 #include "subpage.h"
+#include "zoned.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -5183,6 +5184,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	btrfs_leak_debug_add(&fs_info->eb_leak_lock, &eb->leak_list,
 			     &fs_info->allocated_ebs);
+
	INIT_LIST_HEAD(&eb->release_list);
 
 	spin_lock_init(&eb->refs_lock);
 	atomic_set(&eb->refs, 1);
@@ -6105,6 +6107,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	char *src = (char *)srcv;
 	unsigned long i = get_eb_page_index(start);
 
+	WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags));
+
 	if (check_eb_range(eb, start, len))
 		return;
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 047b3e66897f..824640cb0ace 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -31,6 +31,7 @@ enum {
 	EXTENT_BUFFER_IN_TREE,
 	/* write IO error */
 	EXTENT_BUFFER_WRITE_ERR,
+	EXTENT_BUFFER_NO_CHECK,
 };
 
 /* these are flags for __process_pages_contig */
@@ -93,6 +94,7 @@ struct extent_buffer {
 
 	struct rw_semaphore lock;
 	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
+	struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 #endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 00c0680dac3a..acff6bb49a97 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -21,6 +21,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 #define BTRFS_ROOT_TRANS_TAG 0
 
@@ -380,6 +381,8 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->dirty_bgs_lock);
 	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
 	spin_lock_init(&cur_trans->dropped_roots_lock);
+	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
+	spin_lock_init(&cur_trans->releasing_ebs_lock);
 	list_add_tail(&cur_trans->list, &fs_info->trans_list);
 	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
 			IO_TREE_TRANS_DIRTY_PAGES, fs_info->btree_inode);
@@ -2350,6 +2353,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
+	/*
+	 * At this point, we should have written all the tree blocks allocated
+	 * in this transaction. So it's now safe to free the redirtied extent
+	 * buffers.
+	 */
+	btrfs_free_redirty_list(cur_trans);
+
 	ret = write_all_supers(fs_info, 0);
 	/*
 	 * the super is written, we can safely allow the tree-loggers
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 935bd6958a8a..6335716e513f 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -93,6 +93,9 @@ struct btrfs_transaction {
 	 */
 	atomic_t pending_ordered;
 	wait_queue_head_t pending_wait;
+
+	spinlock_t releasing_ebs_lock;
+	struct list_head releasing_ebs;
 };
 
 #define __TRANS_FREEZABLE	(1U << 0)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4c7b283ed2b2..c02eeeac439c 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -19,6 +19,7 @@
 #include "qgroup.h"
 #include "block-group.h"
 #include "space-info.h"
+#include "zoned.h"
 
 /* magic values for the inode_only field in btrfs_log_inode:
  *
@@ -2752,6 +2753,8 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 					free_extent_buffer(next);
 					return ret;
 				}
+				btrfs_redirty_list_add(
+						trans->transaction, next);
 			} else {
 				if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &next->bflags))
 					clear_extent_buffer_dirty(next);
@@ -3296,6 +3299,9 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,
 			  EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT);
 	extent_io_tree_release(&log->log_csum_range);
+
+	if (trans && log->node)
+		btrfs_redirty_list_add(trans->transaction, log->node);
 	btrfs_put_root(log);
 }
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c5f9f4c6f20b..1de67d789b83 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -10,6 +10,7 @@
 #include "rcu-string.h"
 #include "disk-io.h"
 #include "block-group.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1181,3 +1182,39 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 	/* Should not have any excluded extents.
Just in case, though */
 	btrfs_free_excluded_extents(cache);
 }
+
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (!btrfs_is_zoned(fs_info) ||
+	    btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) ||
+	    !list_empty(&eb->release_list))
+		return;
+
+	set_extent_buffer_dirty(eb);
+	set_extent_bits_nowait(&trans->dirty_pages, eb->start,
+			       eb->start + eb->len - 1, EXTENT_DIRTY);
+	memzero_extent_buffer(eb, 0, eb->len);
+	set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
+
+	spin_lock(&trans->releasing_ebs_lock);
+	list_add_tail(&eb->release_list, &trans->releasing_ebs);
+	spin_unlock(&trans->releasing_ebs_lock);
+	atomic_inc(&eb->refs);
+}
+
+void btrfs_free_redirty_list(struct btrfs_transaction *trans)
+{
+	spin_lock(&trans->releasing_ebs_lock);
+	while (!list_empty(&trans->releasing_ebs)) {
+		struct extent_buffer *eb;
+
+		eb = list_first_entry(&trans->releasing_ebs,
+				      struct extent_buffer, release_list);
+		list_del_init(&eb->release_list);
+		free_extent_buffer(eb);
+	}
+	spin_unlock(&trans->releasing_ebs_lock);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 37304d1675e6..b250a578e38c 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -43,6 +43,9 @@ int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
 int btrfs_ensure_empty_zones(struct btrfs_device *device, u64 start, u64 size);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
+void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+			    struct extent_buffer *eb);
+void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -126,6 +129,10 @@ static inline int btrfs_load_block_group_zone_info(
 static inline void btrfs_calc_zone_unusable(struct btrfs_block_group
*cache) { }
 
+static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
+					  struct extent_buffer *eb) { }
+static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
+
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 16/42] btrfs: zoned: advance allocation pointer after tree log node
Date: Thu, 4 Feb 2021 19:21:55 +0900
Message-Id:
<834b102881cd55d37760c3d6f49319df0ed6efe4.1612434091.git.naohiro.aota@wdc.com>

Since the allocation information of a tree log node is not recorded in the
extent tree, calculate_alloc_pointer() cannot detect such a node, so the
allocation pointer can end up pointing into a tree log node. Replaying the
log calls btrfs_remove_free_space() for each node in the log tree, so
advance the pointer past the node there to avoid allocating over it.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/free-space-cache.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d2a43186cc7f..5400294bd271 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2628,8 +2628,22 @@ int btrfs_remove_free_space(struct btrfs_block_group *block_group,
 	int ret;
 	bool re_search = false;
 
-	if (btrfs_is_zoned(block_group->fs_info))
+	if (btrfs_is_zoned(block_group->fs_info)) {
+		/*
+		 * This can happen with conventional zones when replaying log.
+		 * Since the allocation info of tree-log nodes is not recorded
+		 * in the extent tree, calculate_alloc_pointer() fails to
+		 * advance the allocation pointer past the last allocated tree
+		 * log node blocks.
+		 *
+		 * This function is called from
+		 * btrfs_pin_extent_for_log_replay() when replaying the log.
+		 * Advance the pointer not to overwrite the tree-log nodes.
+		 */
+		if (block_group->alloc_offset < offset + bytes)
+			block_group->alloc_offset = offset + bytes;
 		return 0;
+	}
 
 	spin_lock(&ctl->tree_lock);
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik, Anand Jain
Subject: [PATCH v15 17/42] btrfs: zoned: reset zones of unused block groups
Date: Thu, 4 Feb 2021 19:21:56 +0900
We must reset the zones of a deleted unused block group to rewind the
zones' write pointers to the zones' start. To do this, reuse the
DISCARD_SYNC code to perform the reset when the filesystem is running on
zoned devices.

Reviewed-by: Josef Bacik
Reviewed-by: Anand Jain
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c |  8 ++++++--
 fs/btrfs/extent-tree.c | 17 ++++++++++++-----
 fs/btrfs/zoned.h       | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 63093cfb807e..70a0c0f8f99f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1408,8 +1408,12 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		if (!async_trim_enabled && btrfs_test_opt(fs_info, DISCARD_ASYNC))
 			goto flip_async;
 
-		/* DISCARD can flip during remount */
-		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC);
+		/*
+		 * DISCARD can flip during remount. On zoned filesystems, we
+		 * need to reset sequential-required zones.
+		 */
+		trimming = btrfs_test_opt(fs_info, DISCARD_SYNC) ||
+				btrfs_is_zoned(fs_info);
 
 		/* Implicit trim during transaction commit.
*/
 		if (trimming)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4d48a773bf9c..a717366c9823 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1298,6 +1298,9 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 
 		stripe = bbio->stripes;
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
+			struct btrfs_device *dev = stripe->dev;
+			u64 physical = stripe->physical;
+			u64 length = stripe->length;
 			u64 bytes;
 			struct request_queue *req_q;
 
@@ -1305,14 +1308,18 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 				continue;
 			}
+
 			req_q = bdev_get_queue(stripe->dev->bdev);
-			if (!blk_queue_discard(req_q))
+			/* Zone reset on zoned filesystems */
+			if (btrfs_can_zone_reset(dev, physical, length))
+				ret = btrfs_reset_device_zone(dev, physical,
+							      length, &bytes);
+			else if (blk_queue_discard(req_q))
+				ret = btrfs_issue_discard(dev->bdev, physical,
+							  length, &bytes);
+			else
 				continue;
 
-			ret = btrfs_issue_discard(stripe->dev->bdev,
-						  stripe->physical,
-						  stripe->length,
-						  &bytes);
 			if (!ret) {
 				discarded_bytes += bytes;
 			} else if (ret != -EOPNOTSUPP) {
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index b250a578e38c..c105641a6ad3 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -209,4 +209,19 @@ static inline bool btrfs_check_super_location(struct btrfs_device *device, u64 p
 	return device->zone_info == NULL || !btrfs_dev_is_sequential(device, pos);
 }
 
+static inline bool btrfs_can_zone_reset(struct btrfs_device *device,
+					u64 physical, u64 length)
+{
+	u64 zone_size;
+
+	if (!btrfs_dev_is_sequential(device, physical))
+		return false;
+
+	zone_size = device->zone_info->zone_size;
+	if (!IS_ALIGNED(physical, zone_size) || !IS_ALIGNED(length, zone_size))
+		return false;
+
+	return true;
+}
+
 #endif
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 18/42] btrfs: factor out helper adding a page to bio
Date: Thu, 4 Feb 2021 19:21:57 +0900

Extract adding a page to a bio from submit_extent_page(). The page is
added only when the bio_flags are the same, the page is contiguous, and
it fits in the same stripe as the pages already in the bio.
The condition checks are reordered to allow an early return, avoiding a
possibly heavy btrfs_bio_fits_in_stripe() call.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 60 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fa9b37178d42..5db7e6c69391 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3084,6 +3084,48 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
+/**
+ * btrfs_bio_add_page - attempt to add a page to bio
+ *
+ * @bio:	destination bio
+ * @page:	page to add to the bio
+ * @disk_bytenr: offset of the new bio or to check whether we are adding
+ *		 a contiguous page to the previous one
+ * @pg_offset:	starting offset in the page
+ * @size:	portion of page that we want to write
+ * @prev_bio_flags: flags of previous bio to see if we can merge the current one
+ * @bio_flags:	flags of the current bio to see if we can merge them
+ *
+ * Attempt to add a page to bio considering stripe alignment etc.
+ *
+ * Return true if the page was successfully added; otherwise return false.
+ */
+static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
+			       u64 disk_bytenr, unsigned int size,
+			       unsigned int pg_offset,
+			       unsigned long prev_bio_flags,
+			       unsigned long bio_flags)
+{
+	const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
+	bool contig;
+
+	if (prev_bio_flags != bio_flags)
+		return false;
+
+	if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
+		contig = bio->bi_iter.bi_sector == sector;
+	else
+		contig = bio_end_sector(bio) == sector;
+	if (!contig)
+		return false;
+
+	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
+		return false;
+
+	return bio_add_page(bio, page, size, pg_offset) == size;
+}
+
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
@@ -3112,27 +3154,15 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = disk_bytenr >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
-		bool contig;
-		bool can_merge = true;
-
 		bio = *bio_ret;
-		if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
-			contig = bio->bi_iter.bi_sector == sector;
-		else
-			contig = bio_end_sector(bio) == sector;
-
-		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
-			can_merge = false;
-
-		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
-		    force_bio_submit ||
-		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+		if (force_bio_submit ||
+		    !btrfs_bio_add_page(bio, page, disk_bytenr, io_size,
+					pg_offset, prev_bio_flags, bio_flags)) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;

From patchwork Thu Feb 4 10:21:58 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066849
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 19/42] btrfs: zoned: use bio_add_zone_append_page
Date: Thu, 4 Feb 2021 19:21:58 +0900
Message-Id: <7fa79d7bd946a7f2f054a9d9562e6bda647cabb7.1612434091.git.naohiro.aota@wdc.com>

A zoned device has its own hardware restrictions, e.g.
max_zone_append_size when using REQ_OP_ZONE_APPEND. To follow these
restrictions, use bio_add_zone_append_page() instead of bio_add_page().
We need the target device to use bio_add_zone_append_page(), so this
commit reads the chunk information to cache the target device to
btrfs_io_bio(bio)->device.
Caching only the target device is sufficient here, as zoned filesystems
only support the single profile at the moment. Once more profiles are
supported, btrfs_io_bio can hold an extent_map to be able to check the
restrictions of all devices the btrfs_bio will be mapped to.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent_io.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5db7e6c69391..15503a435e98 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3109,6 +3109,7 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 {
 	const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
 	bool contig;
+	int ret;
 
 	if (prev_bio_flags != bio_flags)
 		return false;
@@ -3123,7 +3124,12 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	return bio_add_page(bio, page, size, pg_offset) == size;
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
+	else
+		ret = bio_add_page(bio, page, size, pg_offset);
+
+	return ret == size;
 }
 
 /*
@@ -3154,7 +3160,9 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct extent_io_tree *tree = &inode->io_tree;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 
 	ASSERT(bio_ret);
 
@@ -3185,11 +3193,26 @@ static int submit_extent_page(unsigned int opf,
 	if (wbc) {
 		struct block_device *bdev;
 
-		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
+		bdev = fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
 		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
+	if (btrfs_is_zoned(fs_info) && bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct extent_map *em;
+		struct map_lookup *map;
+
+		em = btrfs_get_chunk_map(fs_info, disk_bytenr, io_size);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+
+		map = em->map_lookup;
+		/* We only support single profile for now */
+		ASSERT(map->num_stripes == 1);
+		btrfs_io_bio(bio)->device = map->stripes[0].dev;
+
+		free_extent_map(em);
+	}
 
 	*bio_ret = bio;

From patchwork Thu Feb 4 10:21:59 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066853
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 20/42] btrfs: zoned: handle REQ_OP_ZONE_APPEND as writing
Date: Thu, 4 Feb 2021 19:21:59 +0900
Message-Id: <1e13940c0ba576f1e1f18d2f98c1a60ce35540b4.1612434091.git.naohiro.aota@wdc.com>

Zoned filesystems use REQ_OP_ZONE_APPEND bios for writing to actual
devices. Let btrfs_end_bio() and btrfs_op() be aware of it, by mapping
REQ_OP_ZONE_APPEND to BTRFS_MAP_WRITE and using btrfs_op() instead of
bio_op().

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c |  4 ++--
 fs/btrfs/inode.c   | 10 +++++-----
 fs/btrfs/volumes.c |  8 ++++----
 fs/btrfs/volumes.h |  1 +
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb1afd7d89f7..70621184a731 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -709,7 +709,7 @@ static void end_workqueue_bio(struct bio *bio)
 	fs_info = end_io_wq->info;
 	end_io_wq->status = bio->bi_status;
 
-	if (bio_op(bio) == REQ_OP_WRITE) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
 			wq = fs_info->endio_meta_write_workers;
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
@@ -885,7 +885,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	int async = check_async_write(fs_info, BTRFS_I(inode));
 	blk_status_t ret;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
 		 * can happen in the async kernel threads
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5522e9d09c8a..d7a9c770dc3b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2250,7 +2250,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
-	if (bio_op(bio) != REQ_OP_WRITE) {
+	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
 			goto out;
@@ -7681,7 +7681,7 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	if (!refcount_dec_and_test(&dip->refs))
 		return;
 
-	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
+	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->logical_offset,
 					     dip->bytes,
@@ -7847,7 +7847,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
-	bool write = bio_op(bio) == REQ_OP_WRITE;
+	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
 	/* Check btrfs_submit_bio_hook() for rules about async submit. */
@@ -7897,7 +7897,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
 							  loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	size_t dip_size;
 	struct btrfs_dio_private *dip;
@@ -7927,7 +7927,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		struct bio *dio_bio, loff_t file_offset)
 {
-	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 10401def16ef..400375aaa197 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6448,7 +6448,7 @@ static void btrfs_end_bio(struct bio *bio)
 			struct btrfs_device *dev = btrfs_io_bio(bio)->device;
 
 			ASSERT(dev->bdev);
-			if (bio_op(bio) == REQ_OP_WRITE)
+			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 				btrfs_dev_stat_inc_and_print(dev,
 						BTRFS_DEV_STAT_WRITE_ERRS);
 			else if (!(bio->bi_opf & REQ_RAHEAD))
@@ -6561,10 +6561,10 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	atomic_set(&bbio->stripes_pending, bbio->num_stripes);
 
 	if ((bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((bio_op(bio) == REQ_OP_WRITE) || (mirror_num > 1))) {
+	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
 		/* In this case, map_length has been set to the length of
 		   a single stripe; not the whole write */
-		if (bio_op(bio) == REQ_OP_WRITE) {
+		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 			ret = raid56_parity_write(fs_info, bio, bbio,
 						  map_length);
 		} else {
@@ -6587,7 +6587,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		dev = bbio->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
-		    (bio_op(first_bio) == REQ_OP_WRITE &&
+		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
 		     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			bbio_error(bbio, first_bio, logical);
 			continue;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 598ac225176d..d3bbdb4175df 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -424,6 +424,7 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 	case REQ_OP_DISCARD:
 		return BTRFS_MAP_DISCARD;
 	case REQ_OP_WRITE:
+	case REQ_OP_ZONE_APPEND:
 		return BTRFS_MAP_WRITE;
 	default:
 		WARN_ON_ONCE(1);

From patchwork Thu Feb 4 10:22:00 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066851
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 21/42] btrfs: zoned: split ordered extent when bio is sent
Date: Thu, 4 Feb 2021 19:22:00 +0900
Message-Id: <333ff02fff5aa3133b1bfe0eccc3806d3032b0c9.1612434091.git.naohiro.aota@wdc.com>

For a zone append write, the device decides the location the data is
being written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent
maps to a contiguous region on disk, we need to maintain a
"one bio == one ordered extent" rule.

Implement splitting of an ordered extent and extent map on bio
submission to adhere to the rule.
extract_ordered_extent() hooks into btrfs_submit_data_bio() and splits
the corresponding ordered extent so that the ordered extent's region
fits into one bio and the corresponding device limits.

Several sanity checks need to be done in extract_ordered_extent(), e.g.:

- We cannot split an ordered extent that is already end_bio'd, because
  we cannot divide ordered->bytes_left between the split parts
- We do not expect a compressed ordered extent
- The checksum list should be empty, because we omit splitting the list.
  Since the function is called before btrfs_wq_submit_bio() or
  btrfs_csum_one_bio(), this should always be ensured.

We also need to split the extent map by creating a new one. If not,
unpin_extent_cache() complains about the difference between the start of
the extent map and the file's logical offset.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/inode.c        | 95 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.c | 78 +++++++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.h |  2 +
 3 files changed, 175 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d7a9c770dc3b..750482a06d67 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2215,6 +2215,92 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
+static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
+					   struct bio *bio, loff_t file_offset)
+{
+	struct btrfs_ordered_extent *ordered;
+	struct extent_map *em = NULL, *em_new = NULL;
+	struct extent_map_tree *em_tree = &inode->extent_tree;
+	u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 len = bio->bi_iter.bi_size;
+	u64 end = start + len;
+	u64 ordered_end;
+	u64 pre, post;
+	int ret = 0;
+
+	ordered = btrfs_lookup_ordered_extent(inode, file_offset);
+	if (WARN_ON_ONCE(!ordered))
+		return BLK_STS_IOERR;
+
+	/* No need to split */
+	if (ordered->disk_num_bytes == len)
+		goto out;
+
+	/* We cannot split once end_bio'd ordered extent */
+	if (WARN_ON_ONCE(ordered->bytes_left != ordered->disk_num_bytes)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* We cannot split a compressed ordered extent */
+	if (WARN_ON_ONCE(ordered->disk_num_bytes != ordered->num_bytes)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
+	/* bio must be in one ordered extent */
+	if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Checksum list should be empty */
+	if (WARN_ON_ONCE(!list_empty(&ordered->list))) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	pre = start - ordered->disk_bytenr;
+	post = ordered_end - end;
+
+	ret = btrfs_split_ordered_extent(ordered, pre, post);
+	if (ret)
+		goto out;
+
+	read_lock(&em_tree->lock);
+	em = lookup_extent_mapping(em_tree, ordered->file_offset, len);
+	if (!em) {
+		read_unlock(&em_tree->lock);
+		ret = -EIO;
+		goto out;
+	}
+	read_unlock(&em_tree->lock);
+
+	ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
+	/*
+	 * We cannot reuse em_new here but have to create a new one, as
+	 * unpin_extent_cache() expects the start of the extent map to be the
+	 * logical offset of the file, which does not hold true anymore after
+	 * splitting.
+	 */
+	em_new = create_io_em(inode, em->start + pre, len,
+			      em->start + pre, em->block_start + pre, len,
+			      len, len, BTRFS_COMPRESS_NONE,
+			      BTRFS_ORDERED_REGULAR);
+	if (IS_ERR(em_new)) {
+		ret = PTR_ERR(em_new);
+		goto out;
+	}
+	free_extent_map(em_new);
+
+out:
+	free_extent_map(em);
+	btrfs_put_ordered_extent(ordered);
+
+	return errno_to_blk_status(ret);
+}
+
 /*
  * extent_io.c submission hook. This does the right thing for csum calculation
  * on write, or reading the csums from the tree before a read.
@@ -2250,6 +2336,15 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
 
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct page *page = bio_first_bvec_all(bio)->bv_page;
+		loff_t file_offset = page_offset(page);
+
+		ret = extract_ordered_extent(BTRFS_I(inode), bio, file_offset);
+		if (ret)
+			goto out;
+	}
+
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index e8dee1578d4a..2dc707f02f00 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -920,6 +920,84 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 	}
 }
 
+static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
+				u64 len)
+{
+	struct inode *inode = ordered->inode;
+	u64 file_offset = ordered->file_offset + pos;
+	u64 disk_bytenr = ordered->disk_bytenr + pos;
+	u64 num_bytes = len;
+	u64 disk_num_bytes = len;
+	int type;
+	unsigned long flags_masked = ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT);
+	int compress_type = ordered->compress_type;
+	unsigned long weight;
+	int ret;
+
+	weight = hweight_long(flags_masked);
+	WARN_ON_ONCE(weight > 1);
+	if (!weight)
+		type = 0;
+	else
+		type = __ffs(flags_masked);
+
+	if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) {
+		WARN_ON_ONCE(1);
+		ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode),
+				file_offset, disk_bytenr, num_bytes,
+				disk_num_bytes, compress_type);
+	} else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) {
+		ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset,
+				disk_bytenr, num_bytes, disk_num_bytes, type);
+	} else {
+		ret = btrfs_add_ordered_extent(BTRFS_I(inode), file_offset,
+				disk_bytenr, num_bytes, disk_num_bytes, type);
+	}
+
+	return ret;
+}
+
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+			       u64 post)
+{
+	struct inode *inode = ordered->inode;
+	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
+	struct rb_node *node;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	int ret = 0;
+
+	spin_lock_irq(&tree->lock);
+	/* Remove from tree once */
+	node = &ordered->rb_node;
+	rb_erase(node, &tree->tree);
+	RB_CLEAR_NODE(node);
+	if (tree->last == node)
+		tree->last = NULL;
+
+	ordered->file_offset += pre;
+	ordered->disk_bytenr += pre;
+	ordered->num_bytes -= (pre + post);
+	ordered->disk_num_bytes -= (pre + post);
+	ordered->bytes_left -= (pre + post);
+
+	/* Re-insert the node */
+	node = tree_insert(&tree->tree, ordered->file_offset,
+			   &ordered->rb_node);
+	if (node)
+		btrfs_panic(fs_info, -EEXIST,
+			"zoned: inconsistency in ordered tree at offset %llu",
+			ordered->file_offset);
+
+	spin_unlock_irq(&tree->lock);
+
+	if (pre)
+		ret = clone_ordered_extent(ordered, 0, pre);
+	if (post)
+		ret = clone_ordered_extent(ordered,
+					   pre + ordered->disk_num_bytes,
+					   post);
+
+	return ret;
+}
+
 int __init ordered_data_init(void)
 {
 	btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent",
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index cca3307807e8..c400be75a3f1 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -201,6 +201,8 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
+			       u64 post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);

From patchwork Thu Feb 4 10:22:01 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066855
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Josef Bacik, Naohiro Aota
Subject: [PATCH v15 22/42] btrfs: zoned: check if bio spans across an ordered extent
Date: Thu, 4 Feb 2021 19:22:01 +0900
Message-Id: <2118f2d9559cbd71356a55ad4f378b5705a43e22.1612434091.git.naohiro.aota@wdc.com>

From: Johannes Thumshirn

To ensure that an ordered extent maps to a contiguous region on disk, we
need to maintain the "one bio == one ordered extent" rule. Ensure that a
bio under construction does not span more than one ordered extent.
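The rule above reduces to a simple interval-containment test: the bio's current disk range, extended by the page about to be added, must not cross the end of the ordered extent's disk range. A minimal userspace sketch (the function name and parameters are ours, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy version of the check done by btrfs_bio_fits_in_ordered_extent():
 * the bio's disk range [bio_start, bio_start + bio_size), extended by
 * "add" bytes for the page being added, must end within the ordered
 * extent's disk range [oe_disk_bytenr, oe_disk_bytenr + oe_disk_num_bytes).
 */
static int toy_bio_fits(uint64_t bio_start, uint64_t bio_size, uint64_t add,
			uint64_t oe_disk_bytenr, uint64_t oe_disk_num_bytes)
{
	return bio_start + bio_size + add <=
	       oe_disk_bytenr + oe_disk_num_bytes;
}
```

If the check fails, the page is not added and a new bio (and thus a new zone append command, matching a new ordered extent) is started instead.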
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/ctree.h     |  2 ++
 fs/btrfs/extent_io.c |  9 +++++++--
 fs/btrfs/inode.c     | 27 +++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a9b0521d9e89..10da47ab093a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3120,6 +3120,8 @@ void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split);
 int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
 			     unsigned long bio_flags);
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size);
 void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end);
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 15503a435e98..72b1a23d17f9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3124,10 +3124,15 @@ static bool btrfs_bio_add_page(struct bio *bio, struct page *page,
 	if (btrfs_bio_fits_in_stripe(page, size, bio, bio_flags))
 		return false;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct page *first_page = bio_first_bvec_all(bio)->bv_page;
+
+		if (!btrfs_bio_fits_in_ordered_extent(first_page, bio, size))
+			return false;
 		ret = bio_add_zone_append_page(bio, page, size, pg_offset);
-	else
+	} else {
 		ret = bio_add_page(bio, page, size, pg_offset);
+	}
 
 	return ret == size;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 750482a06d67..31545e503b9e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2215,6 +2215,33 @@ static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
+bool btrfs_bio_fits_in_ordered_extent(struct page *page, struct bio *bio,
+				      unsigned int size)
+{
+	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_ordered_extent *ordered;
+	u64 len = bio->bi_iter.bi_size + size;
+	bool ret = true;
+
+	ASSERT(btrfs_is_zoned(fs_info));
+	ASSERT(fs_info->max_zone_append_size > 0);
+	ASSERT(bio_op(bio) == REQ_OP_ZONE_APPEND);
+
+	/* Ordered extent not yet created, so we're good */
+	ordered = btrfs_lookup_ordered_extent(inode, page_offset(page));
+	if (!ordered)
+		return ret;
+
+	if ((bio->bi_iter.bi_sector << SECTOR_SHIFT) + len >
+	    ordered->disk_bytenr + ordered->disk_num_bytes)
+		ret = false;
+
+	btrfs_put_ordered_extent(ordered);
+
+	return ret;
+}
+
 static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
 					   struct bio *bio, loff_t file_offset)
 {

From patchwork Thu Feb 4 10:22:02 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066857
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 23/42] btrfs: extend btrfs_rmap_block for specifying a device
Date: Thu, 4 Feb 2021 19:22:02 +0900
Message-Id: <6b900f18c418206ed597abdcb0d7e9c8f47fdac0.1612434091.git.naohiro.aota@wdc.com>

btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses. Extend the function so the
search can be restricted to a specified device. The old behavior of
querying all devices is kept by passing NULL as the target device. A
block_device rather than a btrfs_device is passed into btrfs_rmap_block,
as this function is intended to reverse-map the result of a bio, which
only has a block_device. Also export the function for later use.
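The key arithmetic change in this patch (visible in the diff below) is switching from div64_u64() to div64_u64_rem(), so the intra-stripe offset is no longer discarded when mapping a physical address back to a logical one. A userspace sketch of the two variants, with `rel` standing for `physical - map->stripes[i].physical` and a single-stripe profile assumed so `io_stripe_size == stripe_len` (function names are ours):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old behavior: the division discards the remainder, so the result is
 * rounded down to the start of the containing I/O stripe.
 */
static uint64_t toy_rmap_rounded(uint64_t chunk_start, uint64_t rel,
				 uint64_t stripe_len)
{
	return chunk_start + (rel / stripe_len) * stripe_len;
}

/*
 * New behavior: div64_u64_rem() also yields the offset within the
 * stripe, which is added back so the exact logical address is returned.
 */
static uint64_t toy_rmap_exact(uint64_t chunk_start, uint64_t rel,
			       uint64_t stripe_len)
{
	uint64_t stripe_nr = rel / stripe_len;
	uint64_t offset = rel % stripe_len;	/* kept, not discarded */

	return chunk_start + stripe_nr * stripe_len + offset;
}
```

Keeping the offset matters for the later zone-append patches, which need the exact logical address of a completed bio, not just the stripe it landed in.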
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.c            | 16 +++++++++++-----
 fs/btrfs/block-group.h            |  8 +++-----
 fs/btrfs/tests/extent-map-tests.c |  2 +-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 70a0c0f8f99f..f5e9f560ce6d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1588,6 +1588,7 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  *
  * @fs_info:     the filesystem
  * @chunk_start: logical address of block group
+ * @bdev:        physical device to resolve, can be NULL to indicate any device
  * @physical:    physical address to map to logical addresses
  * @logical:     return array of logical addresses which map to @physical
  * @naddrs:      length of @logical
@@ -1597,9 +1598,9 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  * Used primarily to exclude those portions of a block group that contain super
  * block copies.
  */
-EXPORT_FOR_TESTS
 int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len)
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -1617,6 +1618,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	map = em->map_lookup;
 	data_stripe_length = em->orig_block_len;
 	io_stripe_size = map->stripe_len;
+	chunk_start = em->start;
 
 	/* For RAID5/6 adjust to a full IO stripe length */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -1631,14 +1633,18 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 	for (i = 0; i < map->num_stripes; i++) {
 		bool already_inserted = false;
 		u64 stripe_nr;
+		u64 offset;
 		int j;
 
 		if (!in_range(physical, map->stripes[i].physical,
 			      data_stripe_length))
 			continue;
 
+		if (bdev && map->stripes[i].dev->bdev != bdev)
+			continue;
+
 		stripe_nr = physical - map->stripes[i].physical;
-		stripe_nr = div64_u64(stripe_nr, map->stripe_len);
+		stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset);
 
 		if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 			stripe_nr = stripe_nr * map->num_stripes + i;
@@ -1652,7 +1658,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
 		 * instead of map->stripe_len
 		 */
 
-		bytenr = chunk_start + stripe_nr * io_stripe_size;
+		bytenr = chunk_start + stripe_nr * io_stripe_size + offset;
 
 		/* Ensure we don't add duplicate addresses */
 		for (j = 0; j < nr; j++) {
@@ -1694,7 +1700,7 @@ static int exclude_super_stripes(struct btrfs_block_group *cache)
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = btrfs_rmap_block(fs_info, cache->start,
+		ret = btrfs_rmap_block(fs_info, cache->start, NULL,
 				       bytenr, &logical, &nr, &stripe_len);
 		if (ret)
 			return ret;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0fd66febe115..d14ac03bb93d 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -277,6 +277,9 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 void btrfs_wait_space_cache_v1_finished(struct btrfs_block_group *cache,
 				struct btrfs_caching_control *caching_ctl);
+int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
+		     struct block_device *bdev, u64 physical, u64 **logical,
+		     int *naddrs, int *stripe_len);
 
 static inline u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info)
 {
@@ -303,9 +306,4 @@ static inline int btrfs_block_group_done(struct btrfs_block_group *cache)
 void btrfs_freeze_block_group(struct btrfs_block_group *cache);
 void btrfs_unfreeze_block_group(struct btrfs_block_group *cache);
 
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
-		     u64 physical, u64 **logical, int *naddrs, int *stripe_len);
-#endif
-
 #endif /* BTRFS_BLOCK_GROUP_H */
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 57379e96ccc9..c0aefe6dee0b 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -507,7 +507,7 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info,
 		goto out_free;
 	}
 
-	ret = btrfs_rmap_block(fs_info, em->start, btrfs_sb_offset(1),
+	ret = btrfs_rmap_block(fs_info, em->start, NULL, btrfs_sb_offset(1),
 			       &logical, &out_ndaddrs, &out_stripe_len);
 	if (ret || (out_ndaddrs == 0 && test->expected_mapped_addr)) {
 		test_err("didn't rmap anything but expected %d",

From patchwork Thu Feb 4 10:22:03 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066859
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Josef Bacik
Subject: [PATCH v15 24/42] btrfs: zoned: cache if block-group is on a sequential zone
Date: Thu, 4 Feb 2021 19:22:03 +0900

From: Johannes Thumshirn

On a zoned filesystem, cache whether a block group sits on a sequential
write only zone. On sequential write only zones we can use
REQ_OP_ZONE_APPEND for writing data, so provide btrfs_use_zone_append() to
figure out whether an I/O targets a sequential write only zone, in which
case REQ_OP_ZONE_APPEND can be used for the data write.

Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
---
 fs/btrfs/block-group.h |  3 +++
 fs/btrfs/zoned.c       | 29 +++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  6 ++++++
 3 files changed, 38 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index d14ac03bb93d..31c7c5872b92 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -181,6 +181,9 @@ struct btrfs_block_group {
 	 */
 	int needs_free_space;
 
+	/* Flag indicating this block group is placed on a sequential zone */
+	bool seq_zone;
+
 	/* Record locked full stripes for RAID5/6 block group */
 	struct btrfs_full_stripe_locks_tree full_stripe_locks_root;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 1de67d789b83..f6c68704c840 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1101,6 +1101,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		}
 	}
 
+	if (num_sequential > 0)
+		cache->seq_zone = true;
+
 	if (num_conventional > 0) {
 		/*
 		 * Avoid calling calculate_alloc_pointer() for new BG. It
@@ -1218,3 +1221,29 @@ void btrfs_free_redirty_list(struct btrfs_transaction *trans)
 	}
 	spin_unlock(&trans->releasing_ebs_lock);
 }
+
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em)
+{
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct btrfs_block_group *cache;
+	bool ret = false;
+
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	if (!fs_info->max_zone_append_size)
+		return false;
+
+	if (!is_data_inode(&inode->vfs_inode))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, em->block_start);
+	ASSERT(cache);
+	if (!cache)
+		return false;
+
+	ret = cache->seq_zone;
+	btrfs_put_block_group(cache);
+
+	return ret;
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index c105641a6ad3..14d578328cbe 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -46,6 +46,7 @@ void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
+bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -133,6 +134,11 @@ static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 					  struct extent_buffer *eb) { }
 static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
+static inline bool btrfs_use_zone_append(struct btrfs_inode *inode,
+					 struct extent_map *em)
+{
+	return false;
+}
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

From patchwork Thu Feb 4 10:22:04 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066861
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Josef Bacik
Subject: [PATCH v15 25/42] btrfs: save irq flags when looking up an ordered extent
Date: Thu, 4 Feb 2021 19:22:04 +0900

From: Johannes Thumshirn

A following patch will add another caller of btrfs_lookup_ordered_extent(),
but from a bio's endio context. btrfs_lookup_ordered_extent() uses
spin_lock_irq(), which unconditionally disables interrupts. Change this to
spin_lock_irqsave() so interrupts aren't unconditionally disabled and
re-enabled.
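The difference matters because the new caller runs in endio context, where interrupts may already be disabled: spin_unlock_irq() would forcibly re-enable them, while spin_unlock_irqrestore() puts back whatever state was saved. A toy userspace model of that distinction (these are simplified stand-ins, not the kernel spinlock APIs, and the lock itself is not modeled):

```c
#include <assert.h>

/* Toy model of interrupt state only; NOT the kernel primitives. */
static int toy_irq_enabled = 1;

/* spin_lock_irqsave(): remember the prior state, then disable. */
static unsigned long toy_lock_irqsave(void)
{
	unsigned long flags = (unsigned long)toy_irq_enabled;

	toy_irq_enabled = 0;
	return flags;
}

/* spin_unlock_irqrestore(): restore whatever state was saved. */
static void toy_unlock_irqrestore(unsigned long flags)
{
	toy_irq_enabled = (int)flags;
}

/* spin_unlock_irq(): unconditionally re-enable interrupts. */
static void toy_unlock_irq(void)
{
	toy_irq_enabled = 1;
}
```

Called with interrupts already off (as in endio context), the irqsave/irqrestore pair leaves them off, whereas the plain irq variant would wrongly turn them back on.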
Signed-off-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 fs/btrfs/ordered-data.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 2dc707f02f00..fe235ab935d3 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -767,9 +767,10 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
 	struct btrfs_ordered_extent *entry = NULL;
+	unsigned long flags;
 
 	tree = &inode->ordered_tree;
-	spin_lock_irq(&tree->lock);
+	spin_lock_irqsave(&tree->lock, flags);
 	node = tree_search(tree, file_offset);
 	if (!node)
 		goto out;
@@ -780,7 +781,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *ino
 	if (entry)
 		refcount_inc(&entry->refs);
 out:
-	spin_unlock_irq(&tree->lock);
+	spin_unlock_irqrestore(&tree->lock, flags);
 	return entry;
 }

From patchwork Thu Feb 4 10:22:05 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066863
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Johannes Thumshirn, Josef Bacik
Subject: [PATCH v15 26/42] btrfs: zoned: use ZONE_APPEND write for zoned btrfs
Date: Thu, 4 Feb 2021 19:22:05 +0900
Message-Id: <722508271fd7a4873f59f4470237ad41a78a8bf6.1612434091.git.naohiro.aota@wdc.com>

This commit enables zone append writing for zoned btrfs. When using zone
append, a bio is issued to the start of a target zone and the device
decides where inside the zone to place it. Upon completion, the device
reports the actual written position back to the host.

Three parts are necessary to enable zone append in btrfs. First, modify
the bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust
the bi_sector to point to the beginning of the zone. Second, record the
returned physical address (and disk/partno) in the ordered extent in
end_bio_extent_writepage() after the bio has completed. We cannot resolve
the physical address to the logical address there, because we can neither
take locks nor allocate a buffer in this end_bio context. So we record
the physical address and resolve it later, in btrfs_finish_ordered_io().
Finally, rewrite the logical addresses of the extent mapping and checksum
data according to the physical address, using btrfs_rmap_block.
If the returned address matches the originally allocated address, we can skip this rewriting process. Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 15 +++++++-- fs/btrfs/file.c | 6 +++- fs/btrfs/inode.c | 4 +++ fs/btrfs/ordered-data.c | 3 ++ fs/btrfs/ordered-data.h | 8 +++++ fs/btrfs/volumes.c | 14 ++++++++ fs/btrfs/zoned.c | 73 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 12 +++++++ 8 files changed, 132 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 72b1a23d17f9..4c186a5f9efa 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2735,6 +2735,7 @@ static void end_bio_extent_writepage(struct bio *bio) u64 start; u64 end; struct bvec_iter_all iter_all; + bool first_bvec = true; ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { @@ -2761,6 +2762,11 @@ static void end_bio_extent_writepage(struct bio *bio) start = page_offset(page); end = start + bvec->bv_offset + bvec->bv_len - 1; + if (first_bvec) { + btrfs_record_physical_zoned(inode, start, bio); + first_bvec = false; + } + end_extent_writepage(page, error, start, end); end_page_writeback(page); } @@ -3665,6 +3671,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, struct extent_map *em; int ret = 0; int nr = 0; + int opf = REQ_OP_WRITE; const unsigned int write_flags = wbc_to_write_flags(wbc); bool compressed; @@ -3711,6 +3718,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, /* Note that em_end from extent_map_end() is exclusive */ iosize = min(em_end, end + 1) - cur; + + if (btrfs_use_zone_append(inode, em)) + opf = REQ_OP_ZONE_APPEND; + free_extent_map(em); em = NULL; @@ -3736,8 +3747,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); } - ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, - page, disk_bytenr, 
iosize, + ret = submit_extent_page(opf | write_flags, wbc, page, + disk_bytenr, iosize, cur - page_offset(page), &epd->bio, end_bio_extent_writepage, 0, 0, 0, false); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 5a54f78faed5..0152524599e6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2168,8 +2168,12 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) * commit waits for their completion, to avoid data loss if we fsync, * the current transaction commits before the ordered extents complete * and a power failure happens right after that. + * + * For zoned filesystem, if a write IO uses a ZONE_APPEND command, the + * logical address recorded in the ordered extent may change. We need + * to wait for the IO to stabilize the logical address. */ - if (full_sync) { + if (full_sync || btrfs_is_zoned(fs_info)) { ret = btrfs_wait_ordered_range(inode, start, len); } else { /* diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 31545e503b9e..6dbab9293425 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -50,6 +50,7 @@ #include "delalloc-space.h" #include "block-group.h" #include "space-info.h" +#include "zoned.h" struct btrfs_iget_args { u64 ino; @@ -2874,6 +2875,9 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) goto out; } + if (ordered_extent->disk) + btrfs_rewrite_logical_zoned(ordered_extent); + btrfs_free_io_failure_record(inode, start, end); if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) { diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index fe235ab935d3..985a21558437 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -199,6 +199,9 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset entry->compress_type = compress_type; entry->truncated_len = (u64)-1; entry->qgroup_rsv = ret; + entry->physical = (u64)-1; + entry->disk = NULL; + entry->partno = (u8)-1; ASSERT(type == BTRFS_ORDERED_REGULAR || type == 
BTRFS_ORDERED_NOCOW || diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index c400be75a3f1..99e0853e4d3b 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -139,6 +139,14 @@ struct btrfs_ordered_extent { struct completion completion; struct btrfs_work flush_work; struct list_head work_list; + + /* + * Used to reverse-map physical address returned from ZONE_APPEND write + * command in a workqueue context + */ + u64 physical; + struct gendisk *disk; + u8 partno; }; /* diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 400375aaa197..a4d47c6050f7 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6500,6 +6500,20 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio, btrfs_io_bio(bio)->device = dev; bio->bi_end_io = btrfs_end_bio; bio->bi_iter.bi_sector = physical >> 9; + /* + * For zone append writing, bi_sector must point the beginning of the + * zone + */ + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (btrfs_dev_is_sequential(dev, physical)) { + u64 zone_start = round_down(physical, fs_info->zone_size); + + bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; + } else { + bio->bi_opf &= ~REQ_OP_ZONE_APPEND; + bio->bi_opf |= REQ_OP_WRITE; + } + } btrfs_debug_in_rcu(fs_info, "btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u", bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector, diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index f6c68704c840..050aea447332 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1247,3 +1247,76 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em) return ret; } + +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio) +{ + struct btrfs_ordered_extent *ordered; + const u64 physical = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; + + if (bio_op(bio) != REQ_OP_ZONE_APPEND) + return; + + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + if (WARN_ON(!ordered)) + return; + 
+ ordered->physical = physical; + ordered->disk = bio->bi_disk; + ordered->partno = bio->bi_partno; + + btrfs_put_ordered_extent(ordered); +} + +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered) +{ + struct extent_map_tree *em_tree; + struct extent_map *em; + struct inode *inode = ordered->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_ordered_sum *sum; + struct block_device *bdev; + u64 orig_logical = ordered->disk_bytenr; + u64 *logical = NULL; + int nr, stripe_len; + + /* + * Zoned devices should not have partitions. So, we can assume it + * is 0. + */ + ASSERT(ordered->partno == 0); + bdev = bdgrab(ordered->disk->part0); + if (WARN_ON(!bdev)) + return; + + if (WARN_ON(btrfs_rmap_block(fs_info, orig_logical, bdev, + ordered->physical, &logical, &nr, + &stripe_len))) + goto out; + + WARN_ON(nr != 1); + + if (orig_logical == *logical) + goto out; + + ordered->disk_bytenr = *logical; + + em_tree = &BTRFS_I(inode)->extent_tree; + write_lock(&em_tree->lock); + em = search_extent_mapping(em_tree, ordered->file_offset, + ordered->num_bytes); + em->block_start = *logical; + free_extent_map(em); + write_unlock(&em_tree->lock); + + list_for_each_entry(sum, &ordered->list, list) { + if (*logical < orig_logical) + sum->bytenr -= orig_logical - *logical; + else + sum->bytenr += *logical - orig_logical; + } + +out: + kfree(logical); + bdput(bdev); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 14d578328cbe..04f7b21652b6 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -47,6 +47,9 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); +void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, + struct bio *bio); +void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); #else /* CONFIG_BLK_DEV_ZONED */ 
static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -139,6 +142,15 @@ static inline bool btrfs_use_zone_append(struct btrfs_inode *inode, { return false; } + +static inline void btrfs_record_physical_zoned(struct inode *inode, + u64 file_offset, struct bio *bio) +{ +} + +static inline void btrfs_rewrite_logical_zoned( + struct btrfs_ordered_extent *ordered) { } + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) From patchwork Thu Feb 4 10:22:06 2021
From: Naohiro Aota To: linux-btrfs@vger.kernel.org,
dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 27/42] btrfs: zoned: enable zone append writing for direct IO
Date: Thu, 4 Feb 2021 19:22:06 +0900
Message-Id: <8cb66ba0b58724b0235e207a6d5971a7b8a900a9.1612434091.git.naohiro.aota@wdc.com>

As with buffered IO, enable zone append writing for direct IO when it is used on a zoned block device.

Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6dbab9293425..dd6fe8afd0e0 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7738,6 +7738,9 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->length = len; + if (write && btrfs_use_zone_append(BTRFS_I(inode), em)) + iomap->flags |= IOMAP_F_ZONE_APPEND; + free_extent_map(em); return 0; @@ -7964,6 +7967,8 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) dip->dio_bio->bi_status = err; + btrfs_record_physical_zoned(dip->inode, dip->logical_offset, bio); + bio_put(bio); btrfs_dio_private_put(dip); } @@ -8124,6 +8129,19 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, bio->bi_end_io = btrfs_end_dio_bio; btrfs_io_bio(bio)->logical = file_offset; + WARN_ON_ONCE(write && btrfs_is_zoned(fs_info) && + fs_info->max_zone_append_size && + bio_op(bio) != REQ_OP_ZONE_APPEND); + + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + status = extract_ordered_extent(BTRFS_I(inode), bio, + file_offset); + if (status) { + bio_put(bio); + goto out_err; + } + } + ASSERT(submit_len >= clone_len); submit_len -= clone_len; From patchwork Thu Feb 4 10:22:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding:
7bit
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 28/42] btrfs: zoned: introduce dedicated data write path for zoned filesystems
Date: Thu, 4 Feb 2021 19:22:07 +0900
Message-Id: <8dc2fc3e477bbbfdb9ba23ead804497b2752d05f.1612434091.git.naohiro.aota@wdc.com>

If more than one IO is issued for one file extent, these IOs can be written to separate regions on a device.
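Why one file extent cannot span multiple zone-append IOs can be seen with a toy append model (hypothetical user-space C, not kernel code): if the two halves of an extent go out as separate appends, any append from another writer in between makes them land non-contiguously:

```c
#include <assert.h>
#include <stdint.h>

/* Toy sequential zone: every append lands at the write pointer. */
struct zone {
	uint64_t wp;	/* device-managed write pointer */
};

static uint64_t zone_append(struct zone *z, uint64_t sectors)
{
	uint64_t written_at = z->wp;

	z->wp += sectors;
	return written_at;
}
```

Submitting an extent as a single IO sidesteps this, which is what the dedicated write path below does.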
Since we cannot map one file extent to such separate areas on a zoned filesystem, we need to follow the "one IO == one ordered extent" rule. The normal buffered, uncompressed, not pre-allocated write path (used by cow_file_range()) sometimes does not follow this rule: it can write only a part of an ordered extent when a region to write is specified, e.g. when it is called from fdatasync(). Introduce a dedicated (uncompressed buffered) data write path for zoned filesystems that CoWs the region and writes it out at once. Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index dd6fe8afd0e0..c4779cde83c6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1394,6 +1394,29 @@ static int cow_file_range_async(struct btrfs_inode *inode, return 0; } +static noinline int run_delalloc_zoned(struct btrfs_inode *inode, + struct page *locked_page, u64 start, + u64 end, int *page_started, + unsigned long *nr_written) +{ + int ret; + + ret = cow_file_range(inode, locked_page, start, end, page_started, + nr_written, 0); + if (ret) + return ret; + + if (*page_started) + return 0; + + __set_page_dirty_nobuffers(locked_page); + account_page_redirty(locked_page); + extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL); + *page_started = 1; + + return 0; +} + static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes) { @@ -1871,17 +1894,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page { int ret; int force_cow = need_force_cow(inode, start, end); + const bool zoned = btrfs_is_zoned(inode->root->fs_info); if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1, nr_written); } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { +
ASSERT(!zoned); ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); } else if (!inode_can_compress(inode) || !inode_need_compress(inode, start, end)) { - ret = cow_file_range(inode, locked_page, start, end, - page_started, nr_written, 1); + if (zoned) + ret = run_delalloc_zoned(inode, locked_page, start, end, + page_started, nr_written); + else + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 1); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); ret = cow_file_range_async(inode, wbc, locked_page, start, end, From patchwork Thu Feb 4 10:22:08 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 29/42] btrfs: zoned: serialize metadata IO
Date: Thu, 4 Feb 2021 19:22:08 +0900
Message-Id: <50946e4e638b9efbc1be374bc60cb878d6ab227c.1612434091.git.naohiro.aota@wdc.com>

We cannot use zone append for writing metadata, because B-tree nodes reference each other by logical address; without knowing an address in advance, we cannot construct the tree in the first place. So we need to serialize metadata write IOs.

We cannot simply add a mutex around allocation and submission, because metadata blocks are allocated at an earlier stage while building up the B-trees. Instead, add a zoned_meta_io_lock and hold it during metadata IO submission in btree_write_cache_pages() to serialize the IOs. Furthermore, add a per-block-group metadata IO submission pointer, meta_write_pointer, to ensure sequential writing, which can otherwise break when attempting to write back blocks in an unfinished transaction. If writing out fails because of such a hole and the write is for data integrity (WB_SYNC_ALL), return -EAGAIN; a caller such as fsync() should handle this properly, e.g. by falling back to a full transaction commit.
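The meta_write_pointer discipline — accept an extent buffer only if it starts exactly at the per-block-group pointer, advance past it on success, and revert if the buffer cannot be submitted after all — can be sketched as follows (hypothetical user-space C, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct block_group {
	uint64_t meta_write_pointer;
};

/* Accept the buffer only when it continues the sequential stream;
 * advance the pointer past it on success. */
static bool check_meta_write_pointer(struct block_group *bg,
				     uint64_t eb_start, uint64_t eb_len)
{
	if (bg->meta_write_pointer != eb_start)
		return false;	/* hole: caller bails out or returns -EAGAIN */
	bg->meta_write_pointer = eb_start + eb_len;
	return true;
}

/* Undo the advance when the buffer could not be locked for IO. */
static void revert_meta_write_pointer(struct block_group *bg,
				      uint64_t eb_start, uint64_t eb_len)
{
	assert(bg->meta_write_pointer == eb_start + eb_len);
	bg->meta_write_pointer = eb_start;
}
```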
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.h | 1 + fs/btrfs/ctree.h | 1 + fs/btrfs/disk-io.c | 1 + fs/btrfs/extent_io.c | 25 ++++++++++++++++++++- fs/btrfs/zoned.c | 50 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 32 +++++++++++++++++++++++++++ 6 files changed, 109 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index 31c7c5872b92..a07108d65c44 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -193,6 +193,7 @@ struct btrfs_block_group { */ u64 alloc_offset; u64 zone_unusable; + u64 meta_write_pointer; }; static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 10da47ab093a..1bb4f767966a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -975,6 +975,7 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 70621184a731..458bb27e0327 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2769,6 +2769,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) mutex_init(&fs_info->delete_unused_bgs_mutex); mutex_init(&fs_info->reloc_mutex); mutex_init(&fs_info->delalloc_root_mutex); + mutex_init(&fs_info->zoned_meta_io_lock); seqlock_init(&fs_info->profiles_lock); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4c186a5f9efa..ac210cf0956b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -26,6 +26,7 @@ #include "disk-io.h" #include "subpage.h" #include "zoned.h" +#include "block-group.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4162,6 +4163,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, struct extent_buffer 
**eb_context) { struct address_space *mapping = page->mapping; + struct btrfs_block_group *cache = NULL; struct extent_buffer *eb; int ret; @@ -4194,13 +4196,31 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc, if (!ret) return 0; + if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, &cache)) { + /* + * If for_sync, this hole will be filled with + * transaction commit. + */ + if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) + ret = -EAGAIN; + else + ret = 0; + free_extent_buffer(eb); + return ret; + } + *eb_context = eb; ret = lock_extent_buffer_for_io(eb, epd); if (ret <= 0) { + btrfs_revert_meta_write_pointer(cache, eb); + if (cache) + btrfs_put_block_group(cache); free_extent_buffer(eb); return ret; } + if (cache) + btrfs_put_block_group(cache); ret = write_one_eb(eb, wbc, epd); free_extent_buffer(eb); if (ret < 0) @@ -4246,6 +4266,7 @@ int btree_write_cache_pages(struct address_space *mapping, tag = PAGECACHE_TAG_TOWRITE; else tag = PAGECACHE_TAG_DIRTY; + btrfs_zoned_meta_io_lock(fs_info); retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); @@ -4286,7 +4307,7 @@ int btree_write_cache_pages(struct address_space *mapping, } if (ret < 0) { end_write_bio(&epd, ret); - return ret; + goto out; } /* * If something went wrong, don't allow any metadata write bio to be @@ -4321,6 +4342,8 @@ int btree_write_cache_pages(struct address_space *mapping, ret = -EROFS; end_write_bio(&epd, ret); } +out: + btrfs_zoned_meta_io_unlock(fs_info); return ret; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 050aea447332..2803a3e5d022 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1159,6 +1159,9 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) ret = -EIO; } + if (!ret) + cache->meta_write_pointer = cache->alloc_offset + cache->start; + kfree(alloc_offsets); free_extent_map(em); @@ -1320,3 +1323,50 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent
*ordered) kfree(logical); bdput(bdev); } + +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + struct btrfs_block_group *cache; + bool ret = true; + + if (!btrfs_is_zoned(fs_info)) + return true; + + cache = *cache_ret; + + if (cache && (eb->start < cache->start || + cache->start + cache->length <= eb->start)) { + btrfs_put_block_group(cache); + cache = NULL; + *cache_ret = NULL; + } + + if (!cache) + cache = btrfs_lookup_block_group(fs_info, eb->start); + + if (cache) { + if (cache->meta_write_pointer != eb->start) { + btrfs_put_block_group(cache); + cache = NULL; + ret = false; + } else { + cache->meta_write_pointer = eb->start + eb->len; + } + + *cache_ret = cache; + } + + return ret; +} + +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ + if (!btrfs_is_zoned(eb->fs_info) || !cache) + return; + + ASSERT(cache->meta_write_pointer == eb->start + eb->len); + cache->meta_write_pointer = eb->start; +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 04f7b21652b6..0755a25d0f4c 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -50,6 +50,11 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, struct extent_map *em); void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, struct bio *bio); void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); +bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret); +void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, + struct extent_buffer *eb); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -151,6 +156,19 @@ static inline void btrfs_record_physical_zoned(struct inode *inode, static inline void btrfs_rewrite_logical_zoned( struct btrfs_ordered_extent *ordered) { } 
+static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, + struct btrfs_block_group **cache_ret) +{ + return true; +} + +static inline void btrfs_revert_meta_write_pointer( + struct btrfs_block_group *cache, + struct extent_buffer *eb) +{ +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos) @@ -242,4 +260,18 @@ static inline bool btrfs_can_zone_reset(struct btrfs_device *device, return true; } +static inline void btrfs_zoned_meta_io_lock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_lock(&fs_info->zoned_meta_io_lock); +} + +static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) +{ + if (!btrfs_is_zoned(fs_info)) + return; + mutex_unlock(&fs_info->zoned_meta_io_lock); +} + #endif From patchwork Thu Feb 4 10:22:09 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 30/42] btrfs: zoned: wait for existing extents before truncating
Date: Thu, 4 Feb 2021 19:22:09 +0900

When truncating a file, file buffers which have already been allocated
but not yet written may be truncated. Truncating these buffers could
cause breakage of a sequential write pattern in a block group if the
truncated blocks are for example followed by blocks allocated to
another file. To avoid this problem, always wait for write out of all
unwritten buffers before proceeding with the truncate execution.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/inode.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c4779cde83c6..535abf898225 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5169,6 +5169,15 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		if (btrfs_is_zoned(fs_info)) {
+			ret = btrfs_wait_ordered_range(inode,
+					ALIGN(newsize, fs_info->sectorsize),
+					(u64)-1);
+			if (ret)
+				return ret;
+		}
 
 		/*
 		 * We're truncating a file that used to have good data down to

From patchwork Thu Feb 4 10:22:10 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 31/42] btrfs: zoned: do not use async metadata checksum on zoned filesystems
Date: Thu, 4 Feb 2021 19:22:10 +0900
Message-Id: <09123e44380218e0a642320848b924377e74ba9a.1612434091.git.naohiro.aota@wdc.com>

On zoned filesystems, btrfs uses the per-FS zoned_meta_io_lock to
serialize the metadata write IOs. Even with this serialization, write
bios sent from btree_write_cache_pages can be reordered by the async
checksum workers, as these workers are per CPU and not per zone. To
preserve write bio ordering, disable async metadata checksum on zoned
filesystems.

This does not result in lower performance with HDDs, as a single CPU
core is fast enough to checksum a single zone write stream at the
maximum possible bandwidth of the device. If multiple zones are being
written simultaneously, HDD seek overhead lowers the achievable maximum
bandwidth, so again the per-zone checksum serialization does not affect
performance.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 458bb27e0327..6e16f556ed75 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -871,6 +871,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

From patchwork Thu Feb 4 10:22:11 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 32/42] btrfs: zoned: mark block groups to copy for device-replace
Date: Thu, 4 Feb 2021 19:22:11 +0900

This is patch 1/4 to support device-replace on zoned filesystems.

We have two types of I/Os during the device-replace process. One is an
I/O to "copy" (by the scrub functions) all the device extents from the
source device to the destination device. The other one is an I/O to
"clone" (by handle_ops_on_dev_replace()) new incoming write I/Os from
users to the source device into the target device.

Cloning incoming I/Os can break the sequential write rule on the target
device. When a write is mapped in the middle of a block group, the I/O
is directed to the middle of a target device zone, which breaks the
sequential write requirement.

However, the cloning function cannot be disabled since incoming I/Os
targeting already copied device extents must be cloned so that the I/O
is executed on the target device. We cannot use
dev_replace->cursor_{left,right} to determine whether a bio is going to
a not yet copied region. Since we have a time gap between finishing
btrfs_scrub_dev() and rewriting the mapping tree in
btrfs_dev_replace_finishing(), we can have a newly allocated device
extent which is never cloned nor copied. So the point is to copy only
already existing device extents.

This patch introduces mark_block_group_to_copy() to mark existing block
groups as a target of copying. Then, handle_ops_on_dev_replace() and
dev-replace can check the flag to do their job. Also,
btrfs_finish_block_group_to_copy() will check if the copied stripe is
the last stripe in the block group.
With the last stripe copied, the to_copy flag is finally disabled.
Afterwards we can safely clone incoming IOs on this block group.

Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/dev-replace.c | 184 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dev-replace.h |   3 +
 fs/btrfs/scrub.c       |  16 ++++
 4 files changed, 204 insertions(+)

diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index a07108d65c44..d37ee576ac6e 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -95,6 +95,7 @@ struct btrfs_block_group {
 	unsigned int iref:1;
 	unsigned int has_caching_ctl:1;
 	unsigned int removed:1;
+	unsigned int to_copy:1;
 
 	int disk_cache_state;
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index bc73f798ce3a..3a9c1e046ebe 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,7 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 #include "zoned.h"
+#include "block-group.h"
 
 /*
  * Device replace overview
@@ -459,6 +460,185 @@ static char* btrfs_dev_name(struct btrfs_device *device)
 	return rcu_str_deref(device->name);
 }
 
+static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
+				    struct btrfs_device *src_dev)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_dev_extent *dev_extent = NULL;
+	struct btrfs_block_group *cache;
+	struct btrfs_trans_handle *trans;
+	int ret = 0;
+	u64 chunk_offset;
+
+	/* Do not use "to_copy" on non zoned filesystem for now */
+	if (!btrfs_is_zoned(fs_info))
+		return 0;
+
+	mutex_lock(&fs_info->chunk_mutex);
+
+	/* Ensure we don't have pending new block group */
+	spin_lock(&fs_info->trans_lock);
+	while (fs_info->running_transaction &&
+	       !list_empty(&fs_info->running_transaction->dev_update_list)) {
+		spin_unlock(&fs_info->trans_lock);
+		mutex_unlock(&fs_info->chunk_mutex);
+		trans = btrfs_attach_transaction(root);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			mutex_lock(&fs_info->chunk_mutex);
+			if (ret == -ENOENT) {
+				spin_lock(&fs_info->trans_lock);
+				continue;
+			} else {
+				goto unlock;
+			}
+		}
+
+		ret = btrfs_commit_transaction(trans);
+		mutex_lock(&fs_info->chunk_mutex);
+		if (ret)
+			goto unlock;
+
+		spin_lock(&fs_info->trans_lock);
+	}
+	spin_unlock(&fs_info->trans_lock);
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	path->reada = READA_FORWARD;
+	path->search_commit_root = 1;
+	path->skip_locking = 1;
+
+	key.objectid = src_dev->devid;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto free_path;
+	if (ret > 0) {
+		if (path->slots[0] >=
+		    btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto free_path;
+			if (ret > 0) {
+				ret = 0;
+				goto free_path;
+			}
+		} else {
+			ret = 0;
+		}
+	}
+
+	while (1) {
+		struct extent_buffer *leaf = path->nodes[0];
+		int slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		if (found_key.objectid != src_dev->devid)
+			break;
+
+		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+
+		if (found_key.offset < key.offset)
+			break;
+
+		dev_extent = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
+
+		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dev_extent);
+
+		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+		if (!cache)
+			goto skip;
+
+		spin_lock(&cache->lock);
+		cache->to_copy = 1;
+		spin_unlock(&cache->lock);
+
+		btrfs_put_block_group(cache);
+
+skip:
+		ret = btrfs_next_item(root, path);
+		if (ret != 0) {
+			if (ret > 0)
+				ret = 0;
+			break;
+		}
+	}
+
+free_path:
+	btrfs_free_path(path);
+unlock:
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return ret;
+}
+
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical)
+{
+	struct btrfs_fs_info *fs_info = cache->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 chunk_offset = cache->start;
+	int num_extents, cur_extent;
+	int i;
+
+	/* Do not use "to_copy" on non zoned filesystem for now */
+	if (!btrfs_is_zoned(fs_info))
+		return true;
+
+	spin_lock(&cache->lock);
+	if (cache->removed) {
+		spin_unlock(&cache->lock);
+		return true;
+	}
+	spin_unlock(&cache->lock);
+
+	em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+	ASSERT(!IS_ERR(em));
+	map = em->map_lookup;
+
+	num_extents = cur_extent = 0;
+	for (i = 0; i < map->num_stripes; i++) {
+		/* We have more device extent to copy */
+		if (srcdev != map->stripes[i].dev)
+			continue;
+
+		num_extents++;
+		if (physical == map->stripes[i].physical)
+			cur_extent = i;
+	}
+
+	free_extent_map(em);
+
+	if (num_extents > 1 && cur_extent < num_extents - 1) {
+		/*
+		 * Has more stripes on this device. Keep this block group
+		 * readonly until we finish all the stripes.
+		 */
+		return false;
+	}
+
+	/* Last stripe on this device */
+	spin_lock(&cache->lock);
+	cache->to_copy = 0;
+	spin_unlock(&cache->lock);
+
+	return true;
+}
+
 static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 		const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
 		int read_src)
@@ -500,6 +680,10 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	if (ret)
 		return ret;
 
+	ret = mark_block_group_to_copy(fs_info, src_device);
+	if (ret)
+		return ret;
+
 	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 60b70dacc299..3911049a5f23 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -18,5 +18,8 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
+bool btrfs_finish_block_group_to_copy(struct btrfs_device *srcdev,
+				      struct btrfs_block_group *cache,
+				      u64 physical);
 
 #endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 5f4f88a4d2c8..da4f9c24e42d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3561,6 +3561,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (!cache)
 			goto skip;
 
+		if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
+			spin_lock(&cache->lock);
+			if (!cache->to_copy) {
+				spin_unlock(&cache->lock);
+				ro_set = 0;
+				goto done;
+			}
+			spin_unlock(&cache->lock);
+		}
+
 		/*
 		 * Make sure that while we are scrubbing the corresponding block
 		 * group doesn't get its logical address and its device extents
@@ -3692,6 +3702,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
+		if (sctx->is_dev_replace &&
+		    !btrfs_finish_block_group_to_copy(dev_replace->srcdev,
+						      cache, found_key.offset))
+			ro_set = 0;
+
+done:
 		down_write(&dev_replace->rwsem);
 		dev_replace->cursor_left = dev_replace->cursor_right;
 		dev_replace->item_needs_writeback = 1;

From patchwork Thu Feb 4 10:22:12 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Josef Bacik
Subject: [PATCH v15 33/42] btrfs: zoned: implement cloning for zoned device-replace
Date: Thu, 4 Feb 2021 19:22:12 +0900
Message-Id: <867cd9b7da207fa0be039e5e80502843cf3388dc.1612434091.git.naohiro.aota@wdc.com>

This is patch 2/4 to implement device-replace for zoned filesystems.

In zoned mode, a block group must be either copied (from the source
device to the destination device) or cloned (to both devices). This
commit implements the cloning part.

If a block group targeted by an IO is marked to copy, we should not
clone the IO to the destination device, because the block group is
eventually copied by the replace process.

This commit also handles cloning of device reset.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.c     | 31 ++++++++++++++++++++--
 fs/btrfs/zoned.c       |  9 +++++++
 3 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a717366c9823..e2b2abc42295 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -35,6 +35,7 @@
 #include "discard.h"
 #include "rcu-string.h"
 #include "zoned.h"
+#include "dev-replace.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -1265,6 +1266,46 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 	return ret;
 }
 
+static int do_discard_extent(struct btrfs_bio_stripe *stripe, u64 *bytes)
+{
+	struct btrfs_device *dev = stripe->dev;
+	struct btrfs_fs_info *fs_info = dev->fs_info;
+	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+	u64 phys = stripe->physical;
+	u64 len = stripe->length;
+	u64 discarded = 0;
+	int ret = 0;
+
+	/* Zone reset on a zoned filesystem */
+	if (btrfs_can_zone_reset(dev, phys, len)) {
+		u64 src_disc;
+
+		ret = btrfs_reset_device_zone(dev, phys, len, &discarded);
+		if (ret)
+			goto out;
+
+		if (!btrfs_dev_replace_is_ongoing(dev_replace) ||
+		    dev != dev_replace->srcdev)
+			goto out;
+
+		src_disc = discarded;
+
+		/* Send to replace target as well */
+		ret = btrfs_reset_device_zone(dev_replace->tgtdev, phys, len,
+					      &discarded);
+		discarded += src_disc;
+	} else if (blk_queue_discard(bdev_get_queue(stripe->dev->bdev))) {
+		ret = btrfs_issue_discard(dev->bdev, phys, len, &discarded);
+	} else {
+		ret = 0;
+		*bytes = 0;
+	}
+
+out:
+	*bytes = discarded;
+	return ret;
+}
+
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 			 u64 num_bytes, u64 *actual_bytes)
 {
@@ -1298,28 +1339,14 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 
 	stripe = bbio->stripes;
 	for (i = 0; i < bbio->num_stripes; i++, stripe++) {
-		struct btrfs_device *dev = stripe->dev;
-		u64 physical = stripe->physical;
-		u64 length = stripe->length;
 		u64 bytes;
-		struct request_queue *req_q;
 
 		if (!stripe->dev->bdev) {
			ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 			continue;
 		}
 
-		req_q = bdev_get_queue(stripe->dev->bdev);
-		/* Zone reset on zoned filesystems */
-		if (btrfs_can_zone_reset(dev, physical, length))
-			ret = btrfs_reset_device_zone(dev, physical,
-						      length, &bytes);
-		else if (blk_queue_discard(req_q))
-			ret = btrfs_issue_discard(dev->bdev, physical,
-						  length, &bytes);
-		else
-			continue;
-
+		ret = do_discard_extent(stripe, &bytes);
 		if (!ret) {
 			discarded_bytes += bytes;
 		} else if (ret != -EOPNOTSUPP) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a4d47c6050f7..52ec6721ada2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5973,9 +5973,29 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical)
+{
+	struct btrfs_block_group *cache;
+	bool ret;
+
+	/* Non-ZONED mode does not use "to_copy" flag */
+	if (!btrfs_is_zoned(fs_info))
+		return false;
+
+	cache = btrfs_lookup_block_group(fs_info, logical);
+
+	spin_lock(&cache->lock);
+	ret = cache->to_copy;
+	spin_unlock(&cache->lock);
+
+	btrfs_put_block_group(cache);
+	return ret;
+}
+
 static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 				      struct btrfs_bio **bbio_ret,
 				      struct btrfs_dev_replace *dev_replace,
+				      u64 logical,
 				      int *num_stripes_ret, int *max_errors_ret)
 {
 	struct btrfs_bio *bbio = *bbio_ret;
@@ -5988,6 +6008,13 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op,
 	if (op == BTRFS_MAP_WRITE) {
 		int index_where_to_add;
 
+		/*
+		 * A block group which has "to_copy" set will eventually be
+		 * copied by the dev-replace process. We can avoid cloning
+		 * IO here.
+		 */
+		if (is_block_group_to_copy(dev_replace->srcdev->fs_info, logical))
+			return;
+
 		/*
 		 * duplicate the write operations while the dev replace
 		 * procedure is running. Since the copying of the old disk to
@@ -6376,8 +6403,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    need_full_stripe(op)) {
-		handle_ops_on_dev_replace(op, &bbio, dev_replace, &num_stripes,
-					  &max_errors);
+		handle_ops_on_dev_replace(op, &bbio, dev_replace, logical,
+					  &num_stripes, &max_errors);
 	}
 
 	*bbio_ret = bbio;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 2803a3e5d022..72d9c8ba98a3 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -11,6 +11,7 @@
 #include "disk-io.h"
 #include "block-group.h"
 #include "transaction.h"
+#include "dev-replace.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1036,6 +1037,8 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	for (i = 0; i < map->num_stripes; i++) {
 		bool is_sequential;
 		struct blk_zone zone;
+		struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
+		int dev_replace_is_ongoing = 0;
 
 		device = map->stripes[i].dev;
 		physical = map->stripes[i].physical;
@@ -1062,6 +1065,12 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		 */
 		btrfs_dev_clear_zone_empty(device, physical);
 
+		down_read(&dev_replace->rwsem);
+		dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
+		if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL)
+			btrfs_dev_clear_zone_empty(dev_replace->tgtdev, physical);
+		up_read(&dev_replace->rwsem);
+
 		/*
 		 * The group is mapped to a sequential zone. Get the zone write
 		 * pointer to determine the allocation offset within the zone.
From patchwork Thu Feb 4 10:22:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C005DC433DB for ; Thu, 4 Feb 2021 10:31:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7870764E43 for ; Thu, 4 Feb 2021 10:31:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235579AbhBDKb0 (ORCPT ); Thu, 4 Feb 2021 05:31:26 -0500 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:54283 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235538AbhBDKaY (ORCPT ); Thu, 4 Feb 2021 05:30:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1612434623; x=1643970623; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eULFqCHZufhwsqXDO/ov5uQGBbrw/vQdJbF508DeY7k=; b=atCSq54EdV2DDSwGZBCt0hvKIlxhV7wIVHjsmFzqq2vlVaT4pZNPKLSW B8whyzcSHtNur5HWUu0W4WDMsEVXkMeoAIxjq9b2nC9RmSDG6iqtqvQeu BImu14nXcPvvR3Dqrw3PI27BVZV/PcPav7ZL0wTFad3dNXChh/avMrkYd ElSTbyvR7M1rg5lUCXGe4V9nzI2P3dCfhjO0Yk1z+m6wbYhcUThEP3Gcy e5Y45n+F1o3Ket1auyih9stqtN4vWCEmzvxvpSWnp/lu0E98lXi04hMxg fbD4m1/3WkeleS9EOdxv+iDxWJevojkSOjoa/1LmT9tAZmPbwHhJtgWb2 w==; IronPort-SDR: KHB0unAwi0aGGV7KXLIYrpLYufMBm2q3pZ21tLGHccyubC0hi1iniku2Nti8kpfArOyNx3InKq 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik
Subject: [PATCH v15 34/42] btrfs: zoned: implement copying for zoned device-replace
Date: Thu, 4 Feb 2021 19:22:13 +0900

This is the 3/4 patch to implement device-replace on zoned filesystems. This commit implements copying.
To do this, it tracks the write pointer during the device-replace process. As the device-replace copy process is smart enough to copy only the used extents on the source device, we have to fill the gaps to honor the sequential write requirement of the target device. The device-replace process on zoned filesystems must copy or clone all the extents in the source device exactly once. So, we need to ensure that allocations started just before the dev-replace process have their corresponding extent information in the B-trees. finish_extent_writes_for_zoned() implements that functionality, which basically is the code removed in commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after error during device replace"). Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.c | 2 +- fs/btrfs/zoned.c | 9 +++++ fs/btrfs/zoned.h | 7 ++++ 4 files changed, 101 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index da4f9c24e42d..92904902d160 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -166,6 +166,7 @@ struct scrub_ctx { int pages_per_rd_bio; int is_dev_replace; + u64 write_pointer; struct scrub_bio *wr_curr_bio; struct mutex wr_lock; @@ -1619,6 +1620,25 @@ static int scrub_write_page_to_dev_replace(struct scrub_block *sblock, return scrub_add_page_to_wr_bio(sblock->sctx, spage); } +static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) +{ + int ret = 0; + u64 length; + + if (!btrfs_is_zoned(sctx->fs_info)) + return 0; + + if (sctx->write_pointer < physical) { + length = physical - sctx->write_pointer; + + ret = btrfs_zoned_issue_zeroout(sctx->wr_tgtdev, + sctx->write_pointer, length); + if (!ret) + sctx->write_pointer = physical; + } + return ret; +} + static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, struct scrub_page *spage) { @@ -1641,6 +1661,13 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, if
(sbio->page_count == 0) { struct bio *bio; + ret = fill_writer_pointer_gap(sctx, + spage->physical_for_dev_replace); + if (ret) { + mutex_unlock(&sctx->wr_lock); + return ret; + } + sbio->physical = spage->physical_for_dev_replace; sbio->logical = spage->logical; sbio->dev = sctx->wr_tgtdev; @@ -1702,6 +1729,9 @@ static void scrub_wr_submit(struct scrub_ctx *sctx) * doubled the write performance on spinning disks when measured * with Linux 3.5 */ btrfsic_submit_bio(sbio->bio); + + if (btrfs_is_zoned(sctx->fs_info)) + sctx->write_pointer = sbio->physical + sbio->page_count * PAGE_SIZE; } static void scrub_wr_bio_end_io(struct bio *bio) @@ -3025,6 +3055,20 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return ret < 0 ? ret : 0; } +static void sync_replace_for_zoned(struct scrub_ctx *sctx) +{ + if (!btrfs_is_zoned(sctx->fs_info)) + return; + + sctx->flush_all_writes = true; + scrub_submit(sctx); + mutex_lock(&sctx->wr_lock); + scrub_wr_submit(sctx); + mutex_unlock(&sctx->wr_lock); + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3165,6 +3209,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, */ blk_start_plug(&plug); + if (sctx->is_dev_replace && + btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) { + mutex_lock(&sctx->wr_lock); + sctx->write_pointer = physical; + mutex_unlock(&sctx->wr_lock); + sctx->flush_all_writes = true; + } + /* * now find all extents for each stripe and scrub them */ @@ -3353,6 +3405,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (ret) goto out; + if (sctx->is_dev_replace) + sync_replace_for_zoned(sctx); + if (extent_logical + extent_len < key.objectid + bytes) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -3475,6 +3530,25 @@ static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx, return ret; 
} +static int finish_extent_writes_for_zoned(struct btrfs_root *root, + struct btrfs_block_group *cache) +{ + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_trans_handle *trans; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + btrfs_wait_block_group_reservations(cache); + btrfs_wait_nocow_writers(cache); + btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length); + + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + return btrfs_commit_transaction(trans); +} + static noinline_for_stack int scrub_enumerate_chunks(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev, u64 start, u64 end) @@ -3629,6 +3703,16 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, * group is not RO. */ ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace); + if (!ret && sctx->is_dev_replace) { + ret = finish_extent_writes_for_zoned(root, cache); + if (ret) { + btrfs_dec_block_group_ro(cache); + scrub_pause_off(fs_info); + btrfs_put_block_group(cache); + break; + } + } + if (ret == 0) { ro_set = 1; } else if (ret == -ENOSPC && !sctx->is_dev_replace) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 52ec6721ada2..1312b17a6b49 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5978,7 +5978,7 @@ static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical) struct btrfs_block_group *cache; bool ret; - /* Non-ZONED mode does not use "to_copy" flag */ + /* Non zoned filesystem does not use "to_copy" flag */ if (!btrfs_is_zoned(fs_info)) return false; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 72d9c8ba98a3..396723947934 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1379,3 +1379,12 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, ASSERT(cache->meta_write_pointer == eb->start + eb->len); cache->meta_write_pointer = eb->start; } + +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length) +{ + if 
(!btrfs_dev_is_sequential(device, physical)) + return -EOPNOTSUPP; + + return blkdev_issue_zeroout(device->bdev, physical >> SECTOR_SHIFT, + length >> SECTOR_SHIFT, GFP_NOFS, 0); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 0755a25d0f4c..5ed1ea2009ea 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -55,6 +55,7 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct btrfs_block_group **cache_ret); void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); +int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -169,6 +170,12 @@ static inline void btrfs_revert_meta_write_pointer( { } +static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, + u64 physical, u64 length) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From patchwork Thu Feb 4 10:22:14 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik
Subject: [PATCH v15 35/42] btrfs: zoned: support dev-replace in zoned filesystems
Date: Thu, 4 Feb 2021 19:22:14 +0900

This is the 4/4 patch to implement device-replace on zoned filesystems. Even after the copying is done, the write pointers of the source device and the destination device may not be synchronized. For example, when the last allocated extent is freed before the device-replace process, the extent is not copied, leaving a hole there. Synchronize the write pointers by writing zeroes to the destination device.
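The arithmetic behind this synchronization is simple and worth seeing in isolation. Below is a minimal user-space sketch (not btrfs code; the function names are illustrative) of the gap computation used both when filling holes during copying and when catching the target's write pointer up afterwards:

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space sketch of the write-pointer bookkeeping on a zoned target:
 * if the next block to write sits beyond the zone's current write
 * pointer, the bytes in between must be zero-filled first so that the
 * device only ever sees sequential writes.
 */

/* Bytes that must be zero-filled before writing at 'physical', given the
 * tracked write pointer 'wp'. Zero means the write is already sequential
 * (or behind the pointer, which the caller must treat as an error). */
static inline uint64_t zero_fill_gap(uint64_t wp, uint64_t physical)
{
    return physical > wp ? physical - wp : 0;
}

/* Advance the tracked write pointer after a successful write. */
static inline uint64_t advance_wp(uint64_t wp, uint64_t nbytes)
{
    return wp + nbytes;
}
```

In the patch itself, the equivalent of `zero_fill_gap()` feeds `btrfs_zoned_issue_zeroout()`, and the pointer is advanced after each submitted write bio.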
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/scrub.c | 40 ++++++++++++++++++++++++++ fs/btrfs/zoned.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/zoned.h | 9 ++++++ 3 files changed, 123 insertions(+) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 92904902d160..e0c3ec01e324 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1628,6 +1628,9 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical) if (!btrfs_is_zoned(sctx->fs_info)) return 0; + if (!btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) + return 0; + if (sctx->write_pointer < physical) { length = physical - sctx->write_pointer; @@ -3069,6 +3072,32 @@ static void sync_replace_for_zoned(struct scrub_ctx *sctx) wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); } +static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical, + u64 physical, u64 physical_end) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + int ret = 0; + + if (!btrfs_is_zoned(fs_info)) + return 0; + + wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); + + mutex_lock(&sctx->wr_lock); + if (sctx->write_pointer < physical_end) { + ret = btrfs_sync_zone_write_pointer(sctx->wr_tgtdev, logical, + physical, + sctx->write_pointer); + if (ret) + btrfs_err(fs_info, + "zoned: failed to recover write pointer"); + } + mutex_unlock(&sctx->wr_lock); + btrfs_dev_clear_zone_empty(sctx->wr_tgtdev, physical); + + return ret; +} + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, struct map_lookup *map, struct btrfs_device *scrub_dev, @@ -3475,6 +3504,17 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, blk_finish_plug(&plug); btrfs_free_path(path); btrfs_free_path(ppath); + + if (sctx->is_dev_replace && ret >= 0) { + int ret2; + + ret2 = sync_write_pointer_for_zoned(sctx, base + offset, + map->stripes[num].physical, + physical_end); + if (ret2) + ret = ret2; + } + return ret < 0 ? 
ret : 0; } diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 396723947934..148cbfc7f988 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -12,6 +12,7 @@ #include "block-group.h" #include "transaction.h" #include "dev-replace.h" +#include "space-info.h" /* Maximum number of zones to report per blkdev_report_zones() call */ #define BTRFS_REPORT_NR_ZONES 4096 @@ -1388,3 +1389,76 @@ int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 len return blkdev_issue_zeroout(device->bdev, physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT, GFP_NOFS, 0); } + +static int read_zone_info(struct btrfs_fs_info *fs_info, u64 logical, + struct blk_zone *zone) +{ + struct btrfs_bio *bbio = NULL; + u64 mapped_length = PAGE_SIZE; + unsigned int nofs_flag; + int nmirrors; + int i, ret; + + ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, + &mapped_length, &bbio); + if (ret || !bbio || mapped_length < PAGE_SIZE) { + btrfs_put_bbio(bbio); + return -EIO; + } + + if (bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + return -EINVAL; + + nofs_flag = memalloc_nofs_save(); + nmirrors = (int)bbio->num_stripes; + for (i = 0; i < nmirrors; i++) { + u64 physical = bbio->stripes[i].physical; + struct btrfs_device *dev = bbio->stripes[i].dev; + + /* Missing device */ + if (!dev->bdev) + continue; + + ret = btrfs_get_dev_zone(dev, physical, zone); + /* Failing device */ + if (ret == -EIO || ret == -EOPNOTSUPP) + continue; + break; + } + memalloc_nofs_restore(nofs_flag); + + return ret; +} + +/* + * Synchronize write pointer in a zone at @physical_start on @tgt_dev, by + * filling zeros between @physical_pos to a write pointer of dev-replace + * source device. 
+ */ +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos) +{ + struct btrfs_fs_info *fs_info = tgt_dev->fs_info; + struct blk_zone zone; + u64 length; + u64 wp; + int ret; + + if (!btrfs_dev_is_sequential(tgt_dev, physical_pos)) + return 0; + + ret = read_zone_info(fs_info, logical, &zone); + if (ret) + return ret; + + wp = physical_start + ((zone.wp - zone.start) << SECTOR_SHIFT); + + if (physical_pos == wp) + return 0; + + if (physical_pos > wp) + return -EUCLEAN; + + length = wp - physical_pos; + return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); +} diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 5ed1ea2009ea..932ad9bc0de6 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -56,6 +56,8 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, struct extent_buffer *eb); int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); +int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, + u64 physical_start, u64 physical_pos); #else /* CONFIG_BLK_DEV_ZONED */ static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, struct blk_zone *zone) @@ -176,6 +178,13 @@ static inline int btrfs_zoned_issue_zeroout(struct btrfs_device *device, return -EOPNOTSUPP; } +static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, + u64 logical, u64 physical_start, + u64 physical_pos) +{ + return -EOPNOTSUPP; +} + #endif static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
From patchwork Thu Feb 4 10:22:15 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik
Subject: [PATCH v15 36/42] btrfs: zoned: enable relocation on a zoned filesystem
Date: Thu, 4 Feb 2021 19:22:15 +0900

Currently fallocate() is disabled on a zoned filesystem. Since the current relocation process relies on preallocation to move file data extents, it must be handled differently. On a zoned filesystem, we just truncate the inode to the size that we wanted to preallocate. Then, we flush dirty pages on the file before finishing the relocation process. run_delalloc_zoned() will handle all the allocations and submit IOs to the underlying layers.
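One visible consequence of dropping preallocation is in the orphan-inode flags that the relocation inode is created with: on a zoned filesystem the PREALLOC flag is masked out. A minimal user-space sketch of that flag selection (the flag values here are illustrative, not the real on-disk btrfs values):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for BTRFS_INODE_NOCOMPRESS / BTRFS_INODE_PREALLOC. */
#define INODE_NOCOMPRESS 0x1u
#define INODE_PREALLOC   0x2u

/*
 * The relocation orphan inode normally gets NOCOMPRESS | PREALLOC. On a
 * zoned filesystem preallocation cannot be used, so PREALLOC is masked
 * out and only NOCOMPRESS remains.
 */
static inline uint32_t orphan_inode_flags(bool zoned)
{
    uint32_t flags = INODE_NOCOMPRESS | INODE_PREALLOC;

    if (zoned)
        flags &= ~INODE_PREALLOC;
    return flags;
}
```

This mirrors the `__insert_orphan_inode()` hunk in the diff below, where `flags` replaces the previously hard-coded flag pair.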
Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota --- fs/btrfs/relocation.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 473b78874844..232d5da7b7be 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2553,6 +2553,31 @@ static noinline_for_stack int prealloc_file_extent_cluster( if (ret) return ret; + /* + * On a zoned filesystem, we cannot preallocate the file region. + * Instead, we dirty and fiemap_write the region. + */ + if (btrfs_is_zoned(inode->root->fs_info)) { + struct btrfs_root *root = inode->root; + struct btrfs_trans_handle *trans; + + end = cluster->end - offset + 1; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode); + i_size_write(&inode->vfs_inode, end); + ret = btrfs_update_inode(trans, root, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + } + + return btrfs_end_transaction(trans); + } + inode_lock(&inode->vfs_inode); for (nr = 0; nr < cluster->nr; nr++) { start = cluster->boundary[nr] - offset; @@ -2756,6 +2781,8 @@ static int relocate_file_extent_cluster(struct inode *inode, } } WARN_ON(nr != cluster->nr); + if (btrfs_is_zoned(fs_info) && !ret) + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1); out: kfree(ra); return ret; @@ -3434,8 +3461,12 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, struct btrfs_path *path; struct btrfs_inode_item *item; struct extent_buffer *leaf; + u64 flags = BTRFS_INODE_NOCOMPRESS | BTRFS_INODE_PREALLOC; int ret; + if (btrfs_is_zoned(trans->fs_info)) + flags &= ~BTRFS_INODE_PREALLOC; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -3450,8 +3481,7 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans, btrfs_set_inode_generation(leaf, item, 1); btrfs_set_inode_size(leaf, item, 0); 
btrfs_set_inode_mode(leaf, item, S_IFREG | 0600); - btrfs_set_inode_flags(leaf, item, BTRFS_INODE_NOCOMPRESS | - BTRFS_INODE_PREALLOC); + btrfs_set_inode_flags(leaf, item, flags); btrfs_mark_buffer_dirty(leaf); out: btrfs_free_path(path);
From patchwork Thu Feb 4 10:22:16 2021
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik
Subject: [PATCH v15 37/42] btrfs: zoned: relocate block group to repair IO failure in zoned filesystems
Date: Thu, 4 Feb 2021 19:22:16 +0900
When btrfs finds a checksum error and the filesystem has a mirror of the damaged data, btrfs reads the correct data from the mirror and writes it to the damaged blocks. This, however, violates the sequential write constraint of a zoned block device. We can consider three methods to repair an IO failure in zoned filesystems:

(1) Reset and rewrite the damaged zone
(2) Allocate a new device extent and replace the damaged device extent with the new one
(3) Relocate the corresponding block group

Method (1) is most similar to the behavior on regular devices. However, it also wipes non-damaged data in the same device extent, and so it unnecessarily degrades non-damaged data. Method (2) is much like device replacing, but done within the same device. It is safe because it keeps the device extent until the replacing finishes. However, extending device replacing is non-trivial. It assumes "src_dev->physical == dst_dev->physical". Also, the extent mapping replacing function should be extended to support replacing a device extent position within one device. Method (3) invokes relocation of the damaged block group and is straightforward to implement. It relocates all the mirrored device extents, so it is potentially a more costly operation than method (1) or (2). But it relocates only the used extents, which reduces the total IO size. Let's apply method (3) for now. In the future, we can extend device-replace and apply method (2). To protect a block group from being relocated multiple times by multiple IO errors, this commit introduces a "relocating_repair" bit to show that it is now being relocated to repair IO failures. Also, it uses a new kthread, "btrfs-relocating-repair", so as not to block the IO path with the relocation process. This commit also supports repairing in the scrub process.
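The "relocating_repair" bit described above is a classic test-and-set guard: the first caller that finds the bit clear claims the repair; later callers for the same block group see it set and back off. A minimal single-threaded user-space sketch (not btrfs code; in the patch the test-and-set is done under the block group's spinlock):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the relevant part of struct btrfs_block_group. */
struct block_group {
    bool relocating_repair;
};

/*
 * Returns true if the caller claimed the repair and should start the
 * relocation kthread; false if a repair is already in flight for this
 * block group and the caller should do nothing.
 */
static bool claim_relocating_repair(struct block_group *bg)
{
    if (bg->relocating_repair)
        return false;
    bg->relocating_repair = true;
    return true;
}
```

In `btrfs_repair_one_zone()` below, the spinlock-protected equivalent of this check decides whether `relocating_repair_kthread` is spawned.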
Signed-off-by: Naohiro Aota Reviewed-by: Josef Bacik --- fs/btrfs/block-group.h | 1 + fs/btrfs/extent_io.c | 3 ++ fs/btrfs/scrub.c | 3 ++ fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 1 + 5 files changed, 80 insertions(+) diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h index d37ee576ac6e..29678426247d 100644 --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -96,6 +96,7 @@ struct btrfs_block_group { unsigned int has_caching_ctl:1; unsigned int removed:1; unsigned int to_copy:1; + unsigned int relocating_repair:1; int disk_cache_state; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ac210cf0956b..32fb5021f353 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2260,6 +2260,9 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); + if (btrfs_is_zoned(fs_info)) + return btrfs_repair_one_zone(fs_info, logical); + bio = btrfs_io_bio_alloc(1); bio->bi_iter.bi_size = 0; map_length = length; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index e0c3ec01e324..310fce00fcda 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -857,6 +857,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) have_csum = sblock_to_check->pagev[0]->have_csum; dev = sblock_to_check->pagev[0]->dev; + if (btrfs_is_zoned(fs_info) && !sctx->is_dev_replace) + return btrfs_repair_one_zone(fs_info, logical); + /* * We must use GFP_NOFS because the scrub task might be waiting for a * worker task executing this function and in turn a transaction commit diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1312b17a6b49..b8fab44394f5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7980,3 +7980,75 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) spin_unlock(&fs_info->swapfile_pins_lock); return node != NULL; } + +static int relocating_repair_kthread(void 
*data) +{ + struct btrfs_block_group *cache = (struct btrfs_block_group *)data; + struct btrfs_fs_info *fs_info = cache->fs_info; + u64 target; + int ret = 0; + + target = cache->start; + btrfs_put_block_group(cache); + + if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) { + btrfs_info(fs_info, + "zoned: skip relocating block group %llu to repair: EBUSY", + target); + return -EBUSY; + } + + mutex_lock(&fs_info->delete_unused_bgs_mutex); + + /* Ensure block group still exists */ + cache = btrfs_lookup_block_group(fs_info, target); + if (!cache) + goto out; + + if (!cache->relocating_repair) + goto out; + + ret = btrfs_may_alloc_data_chunk(fs_info, target); + if (ret < 0) + goto out; + + btrfs_info(fs_info, + "zoned: relocating block group %llu to repair IO failure", + target); + ret = btrfs_relocate_chunk(fs_info, target); + +out: + if (cache) + btrfs_put_block_group(cache); + mutex_unlock(&fs_info->delete_unused_bgs_mutex); + btrfs_exclop_finish(fs_info); + + return ret; +} + +int btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical) +{ + struct btrfs_block_group *cache; + + /* Do not attempt to repair in degraded state */ + if (btrfs_test_opt(fs_info, DEGRADED)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, logical); + if (!cache) + return 0; + + spin_lock(&cache->lock); + if (cache->relocating_repair) { + spin_unlock(&cache->lock); + btrfs_put_block_group(cache); + return 0; + } + cache->relocating_repair = 1; + spin_unlock(&cache->lock); + + kthread_run(relocating_repair_kthread, cache, + "btrfs-relocating-repair"); + + return 0; +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index d3bbdb4175df..d4c3e0dd32b8 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -599,5 +599,6 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags); int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info); +int 
btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical); #endif

From patchwork Thu Feb 4 10:22:17 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066917 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Filipe Manana , Josef Bacik , Johannes Thumshirn Subject: [PATCH v15 38/42] btrfs: split alloc_log_tree() Date: Thu, 4 Feb 2021 19:22:17 +0900 Message-Id: <2bb9a7626f2f19b722f07a9a44c7d077cde5fd27.1612434091.git.naohiro.aota@wdc.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org This is a preparation patch for the next patch. Split alloc_log_tree() into two parts: the first part, which allocates the tree structure, remains in alloc_log_tree(), and the second part, which allocates the tree node, is moved into btrfs_alloc_log_tree_node(). Also, export the latter part, as it is to be used in the next patch. Cc: Filipe Manana Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/disk-io.c | 33 +++++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6e16f556ed75..d2fa92526b3b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1254,7 +1254,6 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *root; - struct extent_buffer *leaf; root = btrfs_alloc_root(fs_info, BTRFS_TREE_LOG_OBJECTID, GFP_NOFS); if (!root) @@ -1264,6 +1263,14 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, root->root_key.type = BTRFS_ROOT_ITEM_KEY; root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; + return root; +} + +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct extent_buffer *leaf; + /* * DON'T set SHAREABLE bit for log trees.
* @@ -1276,26 +1283,33 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); - if (IS_ERR(leaf)) { - btrfs_put_root(root); - return ERR_CAST(leaf); - } + if (IS_ERR(leaf)) + return PTR_ERR(leaf); root->node = leaf; btrfs_mark_buffer_dirty(root->node); btrfs_tree_unlock(root->node); - return root; + + return 0; } int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { struct btrfs_root *log_root; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + WARN_ON(fs_info->log_root_tree); fs_info->log_root_tree = log_root; return 0; @@ -1307,11 +1321,18 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *log_root; struct btrfs_inode_item *inode_item; + int ret; log_root = alloc_log_tree(trans, fs_info); if (IS_ERR(log_root)) return PTR_ERR(log_root); + ret = btrfs_alloc_log_tree_node(trans, log_root); + if (ret) { + btrfs_put_root(log_root); + return ret; + } + log_root->last_trans = trans->transid; log_root->root_key.offset = root->root_key.objectid; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9f4a2a1e3d36..0e7e9526b6a8 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -120,6 +120,8 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, extent_submit_bio_start_t *submit_bio_start); blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, int mirror_num); +int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, From patchwork 
Thu Feb 4 10:22:18 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066923 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Josef Bacik , Johannes Thumshirn Subject: [PATCH v15 39/42] btrfs: zoned: extend zoned allocator to use dedicated tree-log block group Date: Thu, 4 Feb 2021 19:22:18 +0900 Message-Id: <4a02c3ff283a1c2d71bfa3b0a7135b062af7385e.1612434091.git.naohiro.aota@wdc.com> X-Mailing-List: linux-fsdevel@vger.kernel.org This is
the 1/3 patch to enable tree-log on zoned filesystems. The tree-log feature does not work on a zoned filesystem as is. Blocks for a tree-log tree are allocated mixed with other metadata blocks, and btrfs writes and syncs the tree-log blocks to devices at the time of fsync(), which has a different timing than a global transaction commit. As a result, both writing tree-log blocks and writing other metadata blocks become non-sequential writes that zoned filesystems must avoid. Introduce a dedicated block group for tree-log blocks, so that tree-log blocks and other metadata blocks can be separate write streams. As a result, each write stream can now be written to devices separately. "fs_info->treelog_bg" tracks the dedicated block group, and btrfs assigns "treelog_bg" on demand at tree-log block allocation time. This commit extends the zoned block allocator to use the block group. Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Signed-off-by: Naohiro Aota --- fs/btrfs/block-group.c | 2 ++ fs/btrfs/ctree.h | 2 ++ fs/btrfs/disk-io.c | 1 + fs/btrfs/extent-tree.c | 75 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/zoned.h | 14 ++++++++ 5 files changed, 90 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index f5e9f560ce6d..5064be59dac5 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -901,6 +901,8 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, btrfs_return_cluster_to_free_space(block_group, cluster); spin_unlock(&cluster->refill_lock); + btrfs_clear_treelog_bg(block_group); + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1bb4f767966a..6f4b493625ef 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -976,6 +976,8 @@ struct btrfs_fs_info { /* Max size to emit ZONE_APPEND write command */ u64 max_zone_append_size; struct mutex zoned_meta_io_lock; + spinlock_t treelog_bg_lock; + u64 treelog_bg; #ifdef
CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index d2fa92526b3b..84c6650d5ef7 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2787,6 +2787,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info) spin_lock_init(&fs_info->super_lock); spin_lock_init(&fs_info->buffer_lock); spin_lock_init(&fs_info->unused_bgs_lock); + spin_lock_init(&fs_info->treelog_bg_lock); rwlock_init(&fs_info->tree_mod_log_lock); mutex_init(&fs_info->unused_bg_unpin_mutex); mutex_init(&fs_info->delete_unused_bgs_mutex); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e2b2abc42295..f8e8c17e5624 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3497,6 +3497,9 @@ struct find_free_extent_ctl { bool have_caching_bg; bool orig_have_caching_bg; + /* Allocation is called for tree-log */ + bool for_treelog; + /* RAID index, converted from flags */ int index; @@ -3725,6 +3728,22 @@ static int do_allocation_clustered(struct btrfs_block_group *block_group, return find_free_extent_unclustered(block_group, ffe_ctl); } +/* + * Tree-log block group locking + * ============================ + * + * fs_info::treelog_bg_lock protects the fs_info::treelog_bg which + * indicates the starting address of a block group, which is reserved only + * for tree-log metadata. + * + * Lock nesting + * ============ + * + * space_info::lock + * block_group::lock + * fs_info::treelog_bg_lock + */ + /* * Simple allocator for sequential only block group. It only allows sequential * allocation. No need to play with trees. 
This function also reserves the @@ -3734,23 +3753,54 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, struct find_free_extent_ctl *ffe_ctl, struct btrfs_block_group **bg_ret) { + struct btrfs_fs_info *fs_info = block_group->fs_info; struct btrfs_space_info *space_info = block_group->space_info; struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; u64 start = block_group->start; u64 num_bytes = ffe_ctl->num_bytes; u64 avail; + u64 bytenr = block_group->start; + u64 log_bytenr; int ret = 0; + bool skip; ASSERT(btrfs_is_zoned(block_group->fs_info)); + /* + * Do not allow non-tree-log blocks in the dedicated tree-log block + * group, and vice versa. + */ + spin_lock(&fs_info->treelog_bg_lock); + log_bytenr = fs_info->treelog_bg; + skip = log_bytenr && ((ffe_ctl->for_treelog && bytenr != log_bytenr) || + (!ffe_ctl->for_treelog && bytenr == log_bytenr)); + spin_unlock(&fs_info->treelog_bg_lock); + if (skip) + return 1; + spin_lock(&space_info->lock); spin_lock(&block_group->lock); + spin_lock(&fs_info->treelog_bg_lock); + + ASSERT(!ffe_ctl->for_treelog || + block_group->start == fs_info->treelog_bg || + fs_info->treelog_bg == 0); if (block_group->ro) { ret = 1; goto out; } + /* + * Do not allow currently using block group to be tree-log dedicated + * block group. 
+ */ + if (ffe_ctl->for_treelog && !fs_info->treelog_bg && + (block_group->used || block_group->reserved)) { + ret = 1; + goto out; + } + avail = block_group->length - block_group->alloc_offset; if (avail < num_bytes) { if (ffe_ctl->max_extent_size < avail) { @@ -3765,6 +3815,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, goto out; } + if (ffe_ctl->for_treelog && !fs_info->treelog_bg) + fs_info->treelog_bg = block_group->start; + ffe_ctl->found_offset = start + block_group->alloc_offset; block_group->alloc_offset += num_bytes; spin_lock(&ctl->tree_lock); @@ -3779,6 +3832,9 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group, ffe_ctl->search_start = ffe_ctl->found_offset; out: + if (ret && ffe_ctl->for_treelog) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); spin_unlock(&block_group->lock); spin_unlock(&space_info->lock); return ret; @@ -4028,7 +4084,12 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); case BTRFS_EXTENT_ALLOC_ZONED: - /* Nothing to do */ + if (ffe_ctl->for_treelog) { + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg) + ffe_ctl->hint_byte = fs_info->treelog_bg; + spin_unlock(&fs_info->treelog_bg_lock); + } return 0; default: BUG(); @@ -4072,6 +4133,7 @@ static noinline int find_free_extent(struct btrfs_root *root, struct find_free_extent_ctl ffe_ctl = {0}; struct btrfs_space_info *space_info; bool full_search = false; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; WARN_ON(num_bytes < fs_info->sectorsize); @@ -4085,6 +4147,7 @@ static noinline int find_free_extent(struct btrfs_root *root, ffe_ctl.orig_have_caching_bg = false; ffe_ctl.found_offset = 0; ffe_ctl.hint_byte = hint_byte_orig; + ffe_ctl.for_treelog = for_treelog; ffe_ctl.policy = BTRFS_EXTENT_ALLOC_CLUSTERED; /* For clustered allocation */ @@ -4159,8 +4222,11 @@ static noinline int 
find_free_extent(struct btrfs_root *root, struct btrfs_block_group *bg_ret; /* If the block group is read-only, we can skip it entirely. */ - if (unlikely(block_group->ro)) + if (unlikely(block_group->ro)) { + if (for_treelog) + btrfs_clear_treelog_bg(block_group); continue; + } btrfs_grab_block_group(block_group, delalloc); ffe_ctl.search_start = block_group->start; @@ -4346,6 +4412,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, bool final_tried = num_bytes == min_alloc_size; u64 flags; int ret; + bool for_treelog = root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID; flags = get_alloc_profile_by_root(root, is_data); again: @@ -4369,8 +4436,8 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, sinfo = btrfs_find_space_info(fs_info, flags); btrfs_err(fs_info, - "allocation failed flags %llu, wanted %llu", - flags, num_bytes); + "allocation failed flags %llu, wanted %llu tree-log %d", + flags, num_bytes, for_treelog); if (sinfo) btrfs_dump_space_info(fs_info, sinfo, num_bytes, 1); diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 932ad9bc0de6..61e969652fe1 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -7,6 +7,7 @@ #include #include "volumes.h" #include "disk-io.h" +#include "block-group.h" struct btrfs_zoned_device_info { /* @@ -290,4 +291,17 @@ static inline void btrfs_zoned_meta_io_unlock(struct btrfs_fs_info *fs_info) mutex_unlock(&fs_info->zoned_meta_io_lock); } +static inline void btrfs_clear_treelog_bg(struct btrfs_block_group *bg) +{ + struct btrfs_fs_info *fs_info = bg->fs_info; + + if (!btrfs_is_zoned(fs_info)) + return; + + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg == bg->start) + fs_info->treelog_bg = 0; + spin_unlock(&fs_info->treelog_bg_lock); +} + #endif

From patchwork Thu Feb 4 10:22:19 2021 X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066921 From: Naohiro Aota To: linux-btrfs@vger.kernel.org, dsterba@suse.com Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota , Filipe Manana , Josef Bacik Subject: [PATCH v15 40/42] btrfs: zoned: serialize log transaction on zoned filesystems Date: Thu, 4 Feb 2021 19:22:19 +0900 Message-Id: <5eabc4600691c618f34f8f39c156d9c094f2687b.1612434091.git.naohiro.aota@wdc.com> X-Mailing-List: linux-fsdevel@vger.kernel.org This is the 2/3 patch to enable tree-log on zoned filesystems. Since we can start more than one log transaction per subvolume simultaneously, nodes from multiple transactions can be allocated interleaved.
Such mixed allocation results in non-sequential writes at the time of a log transaction commit. The nodes of the global log root tree (fs_info->log_root_tree) also have the same problem with mixed allocation. Serialize log transactions by waiting for a committing transaction when someone tries to start a new transaction, to avoid the mixed allocation problem. We must also wait for running log transactions from another subvolume, but there is no easy way to detect which subvolume root is running a log transaction. So, this patch forbids starting a new log transaction when other subvolumes have already allocated the global log root tree. Cc: Filipe Manana Reviewed-by: Josef Bacik Signed-off-by: Naohiro Aota Reviewed-by: Filipe Manana --- fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index c02eeeac439c..8be3164d4c5d 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans, struct btrfs_root *log, struct btrfs_path *path, u64 dirid, int del_all); +static void wait_log_commit(struct btrfs_root *root, int transid); /* * tree logging is a special write ahead log used to make sure that @@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans, { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *tree_root = fs_info->tree_root; + const bool zoned = btrfs_is_zoned(fs_info); int ret = 0; /* @@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans, mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + if (btrfs_need_log_full_commit(trans)) { ret = -EAGAIN; goto out; } + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } + if (!root->log_start_pid) { clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
root->log_start_pid = current->pid; @@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans, set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state); } } else { + if (zoned) { + mutex_lock(&fs_info->tree_log_mutex); + if (fs_info->log_root_tree) + ret = -EAGAIN; + else + ret = btrfs_init_log_root_tree(trans, fs_info); + mutex_unlock(&fs_info->tree_log_mutex); + } + if (ret) + goto out; + ret = btrfs_add_log_tree(trans, root); if (ret) goto out; @@ -201,14 +222,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans, */ static int join_running_log_trans(struct btrfs_root *root) { + const bool zoned = btrfs_is_zoned(root->fs_info); int ret = -ENOENT; if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state)) return ret; mutex_lock(&root->log_mutex); +again: if (root->log_root) { + int index = (root->log_transid + 1) % 2; + ret = 0; + if (zoned && atomic_read(&root->log_commit[index])) { + wait_log_commit(root, root->log_transid - 1); + goto again; + } atomic_inc(&root->log_writers); } mutex_unlock(&root->log_mutex); From patchwork Thu Feb 4 10:22:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naohiro Aota X-Patchwork-Id: 12066925 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3DA95C433DB for ; Thu, 4 Feb 2021 10:33:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F3DD164F5E for ; Thu, 4 Feb 2021 10:33:44 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235608AbhBDKd1 (ORCPT ); Thu, 4 Feb 2021 05:33:27 -0500 Received: from esa4.hgst.iphmx.com ([216.71.154.42]:54215 "EHLO esa4.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235470AbhBDKcc (ORCPT ); Thu, 4 Feb 2021 05:32:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1612434751; x=1643970751; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x8SsAgvEZZFtSaxoJ+hvfgVR7BtrE4xDJTuYeqHp3AQ=; b=GLpFzu4VS7GRnHeRwfc3mrfcl2FzzHJ51ANVMGYNVXJhgcAmnj0uLZiq cLuqCM1WxXCyP7iTuVPnbXFnkDjTcyNcfyinVIR+6A0QUv7ObI8oONLwX S0I4CP1IRCvHG5dxYPnAzD9oT16eYWYbPSHUvP4anA9PZqEMgc6TcS8Zs zWMNLB46+4SUEOhQUJ5HOBcIbmqrJuSKOfsuWrSZKlsrOPbpl4EyNomz6 nl5QqXlMi83flmeR7iKSXs+szutC+0VYye4PNqpy0t7sL1deo67zzT8pu c9b/NyzinWugnsxII8l+d6HsvCHJvn86ygs3pVfGQhoquKHgOOo3cY00P Q==; IronPort-SDR: sHfPkhX67DRxZSPWXGNi6lAirudL4YRgNdemo1b1bLQIni+DXAmiO14Yqgzdr5W09u3jhQybsu v1KtGrDZC/SVjN9jDq1OQN8NE74wi41LjxAXqqtfbJywlJ3hOXKIPBdaSH/duwOVpCsjRWEqTo SFxEQ47hgFmyMRYRChpxfofv5ldF34XS14gEVQeTdedKq6eppuwg/HJ4z9bkzmE6Zex4oNbK9z Ef23Tm625WS5F759yuQVNPW99Ls0JC8fSfhFhBgxMeLOx8DP//yIFHVBc2JW7zMbGepRSbbqpV FA4= X-IronPort-AV: E=Sophos;i="5.79,400,1602518400"; d="scan'208";a="159108080" Received: from uls-op-cesaip02.wdc.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 04 Feb 2021 18:23:53 +0800 IronPort-SDR: r3ODROCgNwjXd1PH+51+zytJfZ6e1xbOSn/briII7Xqir91jiZ2dg94Jr993Teg5OiKiF7/z5x zZSI82jSIg0Jbw4MKL2KFF58BJKV3MpqQwaUci4MMSKoLDXqdD9KU8QrZ6wJGsbLKenKCKeAsI GUSmXRN0dYmFC/+XMwALwML5nqaElAi+8gqR0EC+F/d2EGZsonp3yKAAM9ZiOiiSrscXZCnMYA 3MqlsqgQ7ZUALXdQH+u2SlVPYiwNrXcMfihhLC4haldN/5Ff9elTCzUvfV/PVel55Noe0OoHQ3 LhMxRaQ9D4uvJ2aAvGrZKIp/ Received: from uls-op-cesaip02.wdc.com ([10.248.3.37]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Feb 2021 02:05:56 -0800 
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Filipe Manana, Josef Bacik, Johannes Thumshirn
Subject: [PATCH v15 41/42] btrfs: zoned: reorder log node allocation on zoned filesystem
Date: Thu, 4 Feb 2021 19:22:20 +0900
Message-Id: <492da9326ecb5f888e76117983603bb502b7b589.1612434091.git.naohiro.aota@wdc.com>

This is patch 3/3 to enable tree-log on zoned filesystems.

The tree nodes of "fs_info->log_root_tree" and of "root->log_root" are not allocated in the same order in which they are written out, so the writeout causes unaligned write errors. Reorder the allocations by delaying allocation of the root node of "fs_info->log_root_tree", so that the node buffers can go out to the devices sequentially.
Cc: Filipe Manana
Reviewed-by: Josef Bacik
Signed-off-by: Johannes Thumshirn
Signed-off-by: Naohiro Aota
---
 fs/btrfs/disk-io.c  | 12 +++++++-----
 fs/btrfs/tree-log.c | 27 +++++++++++++++++++++------
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 84c6650d5ef7..c2576c5fe62e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1298,16 +1298,18 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 			     struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *log_root;
-	int ret;
 
 	log_root = alloc_log_tree(trans, fs_info);
 	if (IS_ERR(log_root))
 		return PTR_ERR(log_root);
 
-	ret = btrfs_alloc_log_tree_node(trans, log_root);
-	if (ret) {
-		btrfs_put_root(log_root);
-		return ret;
+	if (!btrfs_is_zoned(fs_info)) {
+		int ret = btrfs_alloc_log_tree_node(trans, log_root);
+
+		if (ret) {
+			btrfs_put_root(log_root);
+			return ret;
+		}
 	}
 
 	WARN_ON(fs_info->log_root_tree);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8be3164d4c5d..7ba044bfa9b1 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3159,6 +3159,19 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]);
 	root_log_ctx.log_transid = log_root_tree->log_transid;
 
+	if (btrfs_is_zoned(fs_info)) {
+		mutex_lock(&fs_info->tree_log_mutex);
+		if (!log_root_tree->node) {
+			ret = btrfs_alloc_log_tree_node(trans, log_root_tree);
+			if (ret) {
+				mutex_unlock(&fs_info->tree_log_mutex);
+				mutex_unlock(&log_root_tree->log_mutex);
+				goto out;
+			}
+		}
+		mutex_unlock(&fs_info->tree_log_mutex);
+	}
+
 	/*
 	 * Now we are safe to update the log_root_tree because we're under the
 	 * log_mutex, and we're a current writer so we're holding the commit
@@ -3317,12 +3330,14 @@ static void free_log_tree(struct btrfs_trans_handle *trans,
 		.process_func = process_one_buffer
 	};
 
-	ret = walk_log_tree(trans, log, &wc);
-	if (ret) {
-		if (trans)
-			btrfs_abort_transaction(trans, ret);
-		else
-			btrfs_handle_fs_error(log->fs_info, ret, NULL);
+	if (log->node) {
+		ret = walk_log_tree(trans, log, &wc);
+		if (ret) {
+			if (trans)
+				btrfs_abort_transaction(trans, ret);
+			else
+				btrfs_handle_fs_error(log->fs_info, ret, NULL);
+		}
 	}
 
 	clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1,

From patchwork Thu Feb 4 10:22:21 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12066927
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Naohiro Aota, Anand Jain, Josef Bacik
Subject: [PATCH v15 42/42] btrfs: zoned: enable to mount ZONED incompat flag
Date: Thu, 4 Feb 2021 19:22:21 +0900
Message-Id: <7c375b7f63706927869c142b2bb408828472445f.1612434091.git.naohiro.aota@wdc.com>

This final patch adds the ZONED incompat flag to BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged filesystem.

Reviewed-by: Anand Jain
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
 fs/btrfs/ctree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6f4b493625ef..3bc00aed13b2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -298,7 +298,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
 	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
-	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34	|	\
+	 BTRFS_FEATURE_INCOMPAT_ZONED)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)

From patchwork Fri Feb 5 09:26:35 2021
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 12069695
Date: Fri, 5 Feb 2021 18:26:35 +0900
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org, dsterba@suse.com
Cc: hare@suse.com, linux-fsdevel@vger.kernel.org, Johannes Thumshirn, Filipe Manana
Subject: [PATCH v15 43/43] btrfs: zoned: deal with holes writing out tree-log pages
Message-ID: <20210205092635.i6w3c7brawlv6pgs@naota-xeon>

Since a zoned filesystem requires sequential writeout of metadata, we cannot proceed when there is a hole in the tree-log pages. When such a hole exists, btree_write_cache_pages() returns -EAGAIN. We cannot wait for the range to be written, because that would cause a deadlock, so bail out to a full commit in this case.

Cc: Filipe Manana
Signed-off-by: Naohiro Aota
---
 fs/btrfs/tree-log.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

This patch fixes a regression introduced by the fix in patch 40. I'm sorry for the confusing patch numbering.
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4e72794342c0..629e605cd62d 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3120,6 +3120,14 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	 */
 	blk_start_plug(&plug);
 	ret = btrfs_write_marked_extents(fs_info, &log->dirty_log_pages, mark);
+	/*
+	 * A hole in the extents means we cannot proceed with the writeout on
+	 * a zoned filesystem, which requires sequential writing. Ignore the
+	 * error for now, since we do not wait for completion here.
+	 */
+	if (ret == -EAGAIN)
+		ret = 0;
 	if (ret) {
 		blk_finish_plug(&plug);
 		btrfs_abort_transaction(trans, ret);
@@ -3229,7 +3237,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 					 &log_root_tree->dirty_log_pages,
 					 EXTENT_DIRTY | EXTENT_NEW);
 	blk_finish_plug(&plug);
-	if (ret) {
+	/*
+	 * There is a hole in the extents, and the sequential write failed on
+	 * the zoned filesystem. We cannot wait for these writeouts, since
+	 * that would cause a deadlock. Bail out to a full commit instead.
+	 */
+	if (ret == -EAGAIN) {
+		btrfs_wait_tree_log_extents(log, mark);
+		mutex_unlock(&log_root_tree->log_mutex);
+		goto out_wake_log_root;
+	} else if (ret) {
 		btrfs_set_log_full_commit(trans);
 		btrfs_abort_transaction(trans, ret);
 		mutex_unlock(&log_root_tree->log_mutex);