From patchwork Wed Jul 5 23:20:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303028 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3886EC001B0 for ; Wed, 5 Jul 2023 23:23:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231364AbjGEXXR (ORCPT ); Wed, 5 Jul 2023 19:23:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232046AbjGEXXJ (ORCPT ); Wed, 5 Jul 2023 19:23:09 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7523199F for ; Wed, 5 Jul 2023 16:23:05 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 8DC785C0283; Wed, 5 Jul 2023 19:23:04 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Wed, 05 Jul 2023 19:23:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599384; x= 1688685784; bh=65B4RDPqj4d2xwt0izRLno+OHGLMGrpNZjT9HMvRX+8=; b=S HyRoNn7qxUZKt1AkqwYSTy7YvoYFt0fTGKQcaXjL1DIfCI5NJ8ia6mxOraqdhlyG gsx6TOeBWkyG695PNGVG6mQrbdo0xV/bm9W2CupbUz28JLSli8+ntC+uISfcTqb8 cCJNidl39Za1nY0cwH7wHEVkJ/IplylgBM46cejXsA4gVZ49sTJZpRBUXBgeQFpc IpEfykkQEnsyBxOJLGJOSb6YhGwnjCtNdl7kXFBiKVf1k/3D38YnNbJZGhf0TqN9 ejpY8gG498hboNo7QWZ7GamlfERNu8kCe1TumdLmjCO1xmY+5Jo6Qjz9JFlyaOsn wb/8hIXRho+nDIvWh2dMA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599384; x=1688685784; bh=6 5B4RDPqj4d2xwt0izRLno+OHGLMGrpNZjT9HMvRX+8=; b=UN1hYZrW2/MkvbvDp G/izV48FGLKFYfIU7/d9Yn9ZaamtAJm4g2a0degxPaPJ04p69YNNjkkG1guGQXuG aKJfjBMO/S99JBJqvLWF6BfKV8JraBgNNE7wlUg9j2CFkM/OE7OyVtzY/T497G+m HKW4GHjLB13/zwQwSpNB3AhUh4CHgOPNpzim6jw2ly7kAxopFZGDVaiWxbyVUTa7 XS/0G325C1zZrytbkVh48pGunBfC+OfiACmPNKWcYsH472nOnfrW6D3NEexVJSLk mWV6bt24+jIjelyPg3SchCZiCYmxG+FO/taYxZs65jdjCVNDkBCbICFewtEnNCqX CTygw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:03 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 01/18] btrfs: free qgroup rsv on io failure Date: Wed, 5 Jul 2023 16:20:38 -0700 Message-ID: <860ef499b2c45e2798267dd323b1b991c2bac79a.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org If we do a write whose bio suffers an error, we will never reclaim the qgroup reserved space for it. We allocate the space in the write_iter codepath, then release the reservation as we allocate the ordered extent, but we only create a delayed ref if the ordered extent finishes. If it has an error, we simply leak the rsv. This is apparent in running any error injecting (dmerror) fstests like btrfs/146 or btrfs/160. Such tests fail due to dmesg on umount complaining about the leaked qgroup data space. When we clean up other aspects of space on failed ordered_extents, also free the qgroup rsv. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index dbbb67293e34..1984e140ca46 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3359,6 +3359,13 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent) btrfs_free_reserved_extent(fs_info, ordered_extent->disk_bytenr, ordered_extent->disk_num_bytes, 1); + /* + * Actually free the qgroup rsv which was released when + * the ordered extent was created. + */ + btrfs_qgroup_free_refroot(fs_info, inode->root->root_key.objectid, + ordered_extent->qgroup_rsv, + BTRFS_QGROUP_RSV_DATA); } } From patchwork Wed Jul 5 23:20:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303027 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8999EB64DA for ; Wed, 5 Jul 2023 23:23:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231574AbjGEXXQ (ORCPT ); Wed, 5 Jul 2023 19:23:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232085AbjGEXXJ (ORCPT ); Wed, 5 Jul 2023 19:23:09 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E160198E for ; Wed, 5 Jul 2023 16:23:07 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 498FB5C028E; Wed, 5 Jul 2023 19:23:06 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Wed, 05 Jul 2023 19:23:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599386; x= 1688685786; bh=G512oza/F2C2P0WQmOip5mZlVluqNDhNAN1ErLQz+Vc=; b=P gXnrSQqio6/CfDOLz2VikQGc7P0m2o+QizMN6ufVXR0WFxmCzMxuFUHI1VAKp/Ft JrvBHFQla75mWJS7DWgPZAxdIZ9tp2KXGzF/cwqlPvcaUdQws0iBAncax0wI3kFx vTCuy7gfb/Q80VFUVxc5j719St6+/SVmmyKSsPm8OWlRXXyaqFOcb4Kz78wvsHov 17Nmq8rBOBP6vFLnBvGZUko+cIdrUiRsPDHTCigpUd3Ynq4n5YW5re85C2nDyNmM VfSHn7D90g0it7kxuGTp3XPCTYgKNtDobX+RTRZ3UShMj3iBVjpcLqA1CxBzgF4z 1sGy5l08sDtGrc6G2pcgg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599386; x=1688685786; bh=G 512oza/F2C2P0WQmOip5mZlVluqNDhNAN1ErLQz+Vc=; b=GIub0+KCNEMsx6MWy rTr+cV+3k7bAD5nvUSCJ/rpavk6mC9+LqMwyg0NgRptj5uNdIs/zAYUoJDxf1ZtO mVzQEkbagM72bzb6dPghc6m2KTf+L5cL7gx7mjdSNwBqW/KTMySt/4+C+lDP9GfK joVxKLEkFnWutM3U4FRSuZnQzSVOgLGTaTxoVwkceuPONyPk+52KEF4qYdQN5XuH lPdk2+Uc0lTYxCmgGXszoaCo3MZBKSQZ9kx7eS4vmF8JyA1OXzRwaZdtaWOrlU5a 5PB13G0I74Ywnc5y/qNDS38Qqd5naXqSuxyJM4JO6VvjYTI7rfsaCTDuH2Xa1iZY FaY9w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvtdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:05 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 02/18] btrfs: fix start transaction qgroup rsv double free Date: Wed, 5 Jul 2023 16:20:39 -0700 Message-ID: <90d1a33e3722d5533a8bb595b658aae81d1e6c21.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org btrfs_start_transaction reserves metadata space of the PERTRANS type before it identifies a transaction to start/join. This allows flushing when reserving that space without a deadlock. However, it results in a race which temporarily breaks qgroup rsv accounting. T1 T2 start_transaction do_stuff start_transaction qgroup_reserve_meta_pertrans commit_transaction qgroup_free_meta_all_pertrans hit an error starting txn goto reserve_fail qgroup_free_meta_pertrans (already freed!) The basic issue is that there is nothing preventing another commit from committing before start_transaction finishes (in fact sometimes we intentionally wait for it..) so any error path that frees the reserve is at risk of this race. While this exact space was getting freed anyway, and it's not a huge deal to double free it (just a warning, the free code catches this), it can result in incorrectly freeing some other pertrans reservation in this same reservation, which could then lead to spuriously granting reservations we might not have the space for. Therefore, I do believe it is worth fixing. To fix it, use the existing prealloc->pertrans conversion mechanism. When we first reserve the space, we reserve prealloc space and only when we are sure we have a transaction do we convert it to pertrans. This way any racing commits do not blow away our reservation, but we still get a pertrans reservation that is freed when _this_ transaction gets committed. This issue can be reproduced by running generic/269 with either qgroups or squotas enabled via mkfs on the scratch device. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/transaction.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index cf306351b148..d40b0a39a552 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -591,8 +591,13 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, u64 delayed_refs_bytes = 0; qgroup_reserved = num_items * fs_info->nodesize; - ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved, - enforce_qgroups); + /* + * Use prealloc for now, as there might be a currently running + * transaction that could free this reserved space prematurely + * by committing. + */ + ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserved, + enforce_qgroups, false); if (ret) return ERR_PTR(ret); @@ -705,6 +710,14 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, h->reloc_reserved = reloc_reserved; } + /* + * Now that we have found a transaction to be a part of, convert the + * qgroup reservation from prealloc to pertrans. A different transaction + * can't race in and free our pertrans out from under us. + */ + if (qgroup_reserved) + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved); + got_it: if (!current->journal_info) current->journal_info = h; @@ -752,7 +765,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items, btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv, num_bytes, NULL); reserve_fail: - btrfs_qgroup_free_meta_pertrans(root, qgroup_reserved); + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved); return ERR_PTR(ret); } From patchwork Wed Jul 5 23:20:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303032 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31A49C001DB for ; Wed, 5 Jul 2023 23:23:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232294AbjGEXXT (ORCPT ); Wed, 5 Jul 2023 19:23:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41844 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232367AbjGEXXL (ORCPT ); Wed, 5 Jul 2023 19:23:11 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 856261997 for ; Wed, 5 Jul 2023 16:23:08 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id F29D15C0290; Wed, 5 Jul 2023 19:23:07 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 05 Jul 2023 19:23:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599387; x= 1688685787; bh=b3jua4w23nT6O4K4s/88PvMhnBjoQifAb7uk9Iq0ANQ=; b=I D2K/PeXSvofYA+LlCvspDlv6yElmtOgwUenJ4vTuwVvi023z7NrEfHcQohz+hH3h QzbYsHYbdCA4UcudFlNl1hvn4BEL6hceQ90nQM6Zpni62UYno5ANRbdMyPg9VBa+ /g//kAD9VgxSh6SDAvG0+MurACzAVTJ+TFtpHXp6NqE/9QBZYTxToXrBEGi0XRhp v+Q13ESUzU/YkXSZDUuYvp2wcfrHxIZWAfopk1SmtByIlbjKOBKokjQsMkTVJy4q H6W0WOUcKIIdkMKOl9htV0yim34P6LYp7q1l4Y2wZpi9cPPQ0ccROnymiWkm8/mx Kzk6hSk6qSVcGkf/JUuvQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599387; x=1688685787; bh=b 3jua4w23nT6O4K4s/88PvMhnBjoQifAb7uk9Iq0ANQ=; b=C3xYEybsqjHCAPbwU OP5v2VWGoycjH3xtC8FasZe/AC+rmzAejd3WGlWLYba3QsGWYAzA1+qZRsX5o9fC /wrCpAlc2E2pXjHPgdkJvC3P4Y+DVtcSoJz/Y1yAG4tnco+E2L8yVeHIlJjbSluJ bEv4kXw1kZ+KN5eiYTcs4u+jqcbm5kNNe8mau7GtyoMCKgwaRcc5ymMj8vy0D8+p NYWMtCa0ua1RpLflAgYLTMLMsp6hTM0j0GE7PRiqrORGA6GJXVbhOLnQ1LiyG4jy ycJ3IrttEIYFIIu+6gGldlXXySw5zWwJ3Br9YhqAJopDrOByA8cLN3F7wwSN4Fvu NM21w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:07 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 03/18] btrfs: introduce quota mode Date: Wed, 5 Jul 2023 16:20:40 -0700 Message-ID: <2a0d955252c28b3652bac098e6229223f0847be6.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In preparation for introducing simple quotas, change from a binary setting for quotas to an enum based mode. Initially, the possible modes are disabled/full. Full quotas is normal btrfs qgroups. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/qgroup.c | 7 +++++++ fs/btrfs/qgroup.h | 6 ++++++ 2 files changed, 13 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index da1f84a0eb29..2b4473cb77d6 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -30,6 +30,13 @@ #include "root-tree.h" #include "tree-checker.h" +enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info) +{ + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return BTRFS_QGROUP_MODE_DISABLED; + return BTRFS_QGROUP_MODE_FULL; +} + /* * Helpers to access qgroup reservation * diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 7bffa10589d6..bb15e55f00b8 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -250,6 +250,12 @@ enum { }; int btrfs_quota_enable(struct btrfs_fs_info *fs_info); +enum btrfs_qgroup_mode { + BTRFS_QGROUP_MODE_DISABLED, + BTRFS_QGROUP_MODE_FULL, +}; + +enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info); int btrfs_quota_disable(struct btrfs_fs_info *fs_info); int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); From patchwork Wed Jul 5 23:20:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303031 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94C40EB64DA for ; Wed, 5 Jul 2023 23:23:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232165AbjGEXXU (ORCPT ); Wed, 5 Jul 2023 19:23:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231918AbjGEXXQ (ORCPT ); Wed, 5 Jul 2023 19:23:16 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E39719A1 for ; Wed, 5 Jul 2023 16:23:10 -0700 (PDT) Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.nyi.internal (Postfix) with ESMTP id B104D5C0281; Wed, 5 Jul 2023 19:23:09 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Wed, 05 Jul 2023 19:23:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599389; x= 1688685789; bh=tHGKoeA0eI8itKxLqKk9h/1fPzbYum01BFF++h912YE=; b=j mPOnbmmAD3Yatml8a4O7kCSdUAf+63SXQsgWJUQg3O7X+UY3A1oVMho6rcEk0k+C 4alnz62pFuoN52bZayWrrTywPEHil7ZHUe0wooIf4fRIK8/mwFdl3uANcF+mmpWR YVpDLqcysoigaHLDHIZFSN1WVgZTWQ6xL3Fxv7Nzt/KIHhlwS2S++zdz6LcEjH3+ zR987DWI4/Oy4/IUlQd7Dz0Qx+qX5c9PW5ov7kBkVm3s1VANkkGYY7PnVqCes4pS lggkTdB84MBlOPLaN8TzpJ9S/v/jH5CqeMrShXNzh+1gshGlFNfvuHAnTR0ZBQ3r 430n71I+iyXD3aC2USUVg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599389; x=1688685789; bh=t HGKoeA0eI8itKxLqKk9h/1fPzbYum01BFF++h912YE=; b=AeRyHJGKx/lgKGXyR Z5Y0w6sa1FuPOjIR7vF67Km8a5bxpubXxF0YCmiZp9vaagayfFzHyKCQkWX9ObZ9 KqFJCl82woJoNLoTEP9yh4pe17gn3ay+8TWiJcAZGAKDjfX0RBmRh4oALNBvG3bJ D0sXbvmJ3tENzeCps9UMb33S/QJGljjQe6aCkUq5OdFDlsfa1cdq2+smMrV6usbK /R/fWpG8ooh7yHHX66u7ZiL06obiEGcoKKl3+LmxyDkROqu1t+0EC33FELlv6wsJ QbJ3RXOqyr5upQ0XhXf7PXyRD+ckc2UVDDUo8P+VOUIWKvTs3I/4INNcdNzf2SMt cOsbA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:09 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 04/18] btrfs: add new quota mode for simple quotas Date: Wed, 5 Jul 2023 16:20:41 -0700 Message-ID: <668b2472739b712195d9520fdaed205ec7e4ee15.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Add a new quota mode called "simple quotas". It can be enabled by the existing quota enable ioctl via an unused field, and sets an incompat bit, as the implementation of simple quotas will make backwards incompatible changes to the disk format of the extent tree. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 4 +- fs/btrfs/fs.h | 5 +-- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 88 ++++++++++++++++++++++++++------------ fs/btrfs/qgroup.h | 4 +- fs/btrfs/root-tree.c | 2 +- fs/btrfs/transaction.c | 4 +- include/uapi/linux/btrfs.h | 2 + 8 files changed, 74 insertions(+), 37 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 6a13cf00218b..a9b938d3a531 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -898,7 +898,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, return -ENOMEM; } - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) && + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_DISABLED && !generic_ref->skip_qgroup) { record = kzalloc(sizeof(*record), GFP_NOFS); if (!record) { @@ -1002,7 +1002,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, return -ENOMEM; } - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) && + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_DISABLED && !generic_ref->skip_qgroup) { record = kzalloc(sizeof(*record), GFP_NOFS); if (!record) { diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index 203d2a267828..f76f450c2abf 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -218,7 +218,8 @@ enum { BTRFS_FEATURE_INCOMPAT_NO_HOLES | \ BTRFS_FEATURE_INCOMPAT_METADATA_UUID | \ BTRFS_FEATURE_INCOMPAT_RAID1C34 | \ - BTRFS_FEATURE_INCOMPAT_ZONED) + BTRFS_FEATURE_INCOMPAT_ZONED | \ + BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA) #ifdef CONFIG_BTRFS_DEBUG /* @@ -233,7 +234,6 @@ enum { #define BTRFS_FEATURE_INCOMPAT_SUPP \ (BTRFS_FEATURE_INCOMPAT_SUPP_STABLE) - #endif #define BTRFS_FEATURE_INCOMPAT_SAFE_SET \ @@ -790,7 +790,6 @@ struct btrfs_fs_info { struct lockdep_map btrfs_state_change_map[4]; struct lockdep_map btrfs_trans_pending_ordered_map; struct lockdep_map btrfs_ordered_extent_map; - #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index edbbd5cf23fc..63c1a9258a1b 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3691,7 +3691,7 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg) switch (sa->cmd) { case BTRFS_QUOTA_CTL_ENABLE: - ret = btrfs_quota_enable(fs_info); + ret = btrfs_quota_enable(fs_info, sa); break; case BTRFS_QUOTA_CTL_DISABLE: ret = btrfs_quota_disable(fs_info); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 2b4473cb77d6..80c1f500b6cc 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3,6 +3,8 @@ * Copyright (C) 2011 STRATO. All rights reserved. */ +#include "linux/btrfs.h" +#include #include #include #include @@ -10,7 +12,6 @@ #include #include #include -#include #include #include "ctree.h" @@ -34,6 +35,8 @@ enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info) { if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) return BTRFS_QGROUP_MODE_DISABLED; + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return BTRFS_QGROUP_MODE_SIMPLE; return BTRFS_QGROUP_MODE_FULL; } @@ -347,6 +350,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, static void qgroup_mark_inconsistent(struct btrfs_fs_info *fs_info) { + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + return; fs_info->qgroup_flags |= (BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT | BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN | BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING); @@ -368,7 +373,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) u64 flags = 0; u64 rescan_progress = 0; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) return 0; fs_info->qgroup_ulist = ulist_alloc(GFP_KERNEL); @@ -419,7 +424,8 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) goto out; } if (btrfs_qgroup_status_generation(l, ptr) != - fs_info->generation) { + fs_info->generation && + btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE) { qgroup_mark_inconsistent(fs_info); btrfs_err(fs_info, "qgroup generation mismatch, marked as inconsistent"); @@ -557,7 +563,7 @@ bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info) struct rb_node *node; bool ret = false; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) return ret; /* * Since we're unmounting, there is no race and no need to grab qgroup @@ -956,7 +962,8 @@ static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans, return ret; } -int btrfs_quota_enable(struct btrfs_fs_info *fs_info) +int btrfs_quota_enable(struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_quota_ctl_args *quota_ctl_args) { struct btrfs_root *quota_root; struct btrfs_root *tree_root = fs_info->tree_root; @@ -968,6 +975,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) struct btrfs_qgroup *qgroup = NULL; struct btrfs_trans_handle *trans = NULL; struct ulist *ulist = NULL; + bool simple = quota_ctl_args->status == BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA; int ret = 0; int slot; @@ -1070,8 +1078,9 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) struct btrfs_qgroup_status_item); btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid); btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION); - fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON | - BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; + fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON; + if (!simple) + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAGS_MASK); btrfs_set_qgroup_status_rescan(leaf, ptr, 0); @@ -1187,8 +1196,14 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) spin_lock(&fs_info->qgroup_lock); fs_info->quota_root = quota_root; set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags); + if (simple) + btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA); spin_unlock(&fs_info->qgroup_lock); + /* Skip rescan for simple qgroups */ + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + goto out_free_path; + ret = qgroup_rescan_init(fs_info, 0, 1); if (!ret) { qgroup_rescan_zero_tracking(fs_info); @@ -1787,6 +1802,9 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info, struct btrfs_qgroup_extent_record *entry; u64 bytenr = record->bytenr; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; + lockdep_assert_held(&delayed_refs->lock); trace_btrfs_qgroup_trace_extent(fs_info, record); @@ -1819,6 +1837,8 @@ int btrfs_qgroup_trace_extent_post(struct btrfs_trans_handle *trans, struct btrfs_backref_walk_ctx ctx = { 0 }; int ret; + if (btrfs_qgroup_mode(trans->fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; /* * We are always called in a context where we are already holding a * transaction handle. Often we are called when adding a data delayed @@ -1874,7 +1894,7 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr, struct btrfs_delayed_ref_root *delayed_refs; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL || bytenr == 0 || num_bytes == 0) return 0; record = kzalloc(sizeof(*record), GFP_NOFS); @@ -1907,7 +1927,7 @@ int btrfs_qgroup_trace_leaf_items(struct btrfs_trans_handle *trans, u64 bytenr, num_bytes; /* We can be called directly from walk_up_proc() */ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; for (i = 0; i < nr; i++) { @@ -2283,7 +2303,7 @@ static int qgroup_trace_subtree_swap(struct btrfs_trans_handle *trans, int level; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; /* Wrong parameter order */ @@ -2340,7 +2360,7 @@ int btrfs_qgroup_trace_subtree(struct btrfs_trans_handle *trans, BUG_ON(root_level < 0 || root_level >= BTRFS_MAX_LEVEL); BUG_ON(root_eb == NULL); - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; spin_lock(&fs_info->qgroup_lock); @@ -2680,7 +2700,7 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, * If quotas get disabled meanwhile, the resources need to be freed and * we can't just exit here. */ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL || fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING) goto out_free; @@ -2768,6 +2788,9 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans) u64 qgroup_to_skip; int ret = 0; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 0; + delayed_refs = &trans->transaction->delayed_refs; qgroup_to_skip = delayed_refs->qgroup_to_skip; while ((node = rb_first(&delayed_refs->dirty_extent_root))) { @@ -2883,7 +2906,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) qgroup_mark_inconsistent(fs_info); spin_lock(&fs_info->qgroup_lock); } - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_DISABLED) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_ON; else fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON; @@ -2936,7 +2959,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, if (!committing) mutex_lock(&fs_info->qgroup_ioctl_lock); - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) goto out; quota_root = fs_info->quota_root; @@ -3010,11 +3033,10 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, qgroup_dirty(fs_info, dstgroup); } - if (srcid) { + if (srcid && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) { srcgroup = find_qgroup_rb(fs_info, srcid); if (!srcgroup) goto unlock; - /* * We call inherit after we clone the root in order to make sure * our counts don't go crazy, so at this point the only @@ -3302,6 +3324,9 @@ static int qgroup_rescan_leaf(struct btrfs_trans_handle *trans, int slot; int ret; + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) + return 1; + mutex_lock(&fs_info->qgroup_rescan_lock); extent_root = btrfs_extent_root(fs_info, fs_info->qgroup_rescan_progress.objectid); @@ -3384,8 +3409,8 @@ static bool rescan_should_stop(struct btrfs_fs_info *fs_info) { return btrfs_fs_closing(fs_info) || test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state) || - !test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || - fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN; + btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED || + fs_info->qgroup_flags & BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN; } static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) @@ -3399,6 +3424,9 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) bool stopped = false; bool did_leaf_rescans = false; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + return; + path = btrfs_alloc_path(); if (!path) goto out; @@ -3502,6 +3530,12 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, { int ret = 0; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) { + btrfs_warn(fs_info, "qgroup rescan init failed, running in simple mode. mode: %d\n", + btrfs_qgroup_mode(fs_info)); + return -EINVAL; + } + if (!init_flags) { /* we're resuming qgroup rescan at mount time */ if (!(fs_info->qgroup_flags & @@ -3532,7 +3566,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, btrfs_warn(fs_info, "qgroup rescan init failed, qgroup is not enabled"); ret = -EINVAL; - } else if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { + } else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) { /* Quota disable is in progress */ ret = -EBUSY; } @@ -3788,7 +3822,7 @@ static int qgroup_reserve_data(struct btrfs_inode *inode, u64 to_reserve; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) || + if (btrfs_qgroup_mode(root->fs_info) == BTRFS_QGROUP_MODE_DISABLED || !is_fstree(root->root_key.objectid) || len == 0) return 0; @@ -3920,7 +3954,7 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, int trace_op = QGROUP_RELEASE; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &inode->root->fs_info->flags)) + if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) return 0; /* In release case, we shouldn't have @reserved */ @@ -4031,7 +4065,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, struct btrfs_fs_info *fs_info = root->fs_info; int ret; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED || !is_fstree(root->root_key.objectid) || num_bytes == 0) return 0; @@ -4072,7 +4106,7 @@ void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED || !is_fstree(root->root_key.objectid)) return; @@ -4088,7 +4122,7 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes, { struct btrfs_fs_info *fs_info = root->fs_info; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED || !is_fstree(root->root_key.objectid)) return; @@ -4153,7 +4187,7 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes) { struct btrfs_fs_info *fs_info = root->fs_info; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) || + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED || !is_fstree(root->root_key.objectid)) return; /* Same as btrfs_qgroup_free_meta_prealloc() */ @@ -4261,7 +4295,7 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, int level = btrfs_header_level(subvol_parent) - 1; int ret = 0; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; if (btrfs_node_ptr_generation(subvol_parent, subvol_slot) > @@ -4371,7 +4405,7 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, int ret = 0; int i; - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; if (!is_fstree(root->root_key.objectid) || !root->reloc_root) return 0; diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index bb15e55f00b8..d4c4d039585f 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -249,13 +249,15 @@ enum { ENUM_BIT(QGROUP_FREE), }; -int btrfs_quota_enable(struct btrfs_fs_info *fs_info); enum btrfs_qgroup_mode { BTRFS_QGROUP_MODE_DISABLED, BTRFS_QGROUP_MODE_FULL, + BTRFS_QGROUP_MODE_SIMPLE }; enum btrfs_qgroup_mode btrfs_qgroup_mode(struct btrfs_fs_info *fs_info); +int btrfs_quota_enable(struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_quota_ctl_args *quota_ctl_args); int btrfs_quota_disable(struct btrfs_fs_info *fs_info); int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 859874579456..044a8c2710f8 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -508,7 +508,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv; - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_DISABLED) { /* One for parent inode, two for dir entries */ qgroup_num_bytes = 3 * fs_info->nodesize; ret = btrfs_qgroup_reserve_meta_prealloc(root, diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index d40b0a39a552..f644c7c04d53 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1523,11 +1523,11 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, int ret; /* - * Save some performance in the case that qgroups are not + * Save some performance in the case that full qgroups are not * enabled. If this check races with the ioctl, rescan will * kick in anyway. */ - if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; /* diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index dbb8b96da50d..1cdabdd92338 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -333,6 +333,7 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) #define BTRFS_FEATURE_INCOMPAT_ZONED (1ULL << 12) #define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13) +#define BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA (1ULL << 14) struct btrfs_ioctl_feature_flags { __u64 compat_flags; @@ -753,6 +754,7 @@ struct btrfs_ioctl_get_dev_stats { #define BTRFS_QUOTA_CTL_ENABLE 1 #define BTRFS_QUOTA_CTL_DISABLE 2 #define BTRFS_QUOTA_CTL_RESCAN__NOTUSED 3 +#define BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA (1UL) struct btrfs_ioctl_quota_ctl_args { __u64 cmd; __u64 status; From patchwork Wed Jul 5 23:20:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303030 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 379D0C001DF for ; Wed, 5 Jul 2023 23:23:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232323AbjGEXXV (ORCPT ); Wed, 5 Jul 2023 19:23:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232291AbjGEXXS (ORCPT ); Wed, 5 Jul 2023 19:23:18 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D4561997 for ; Wed, 5 Jul 2023 16:23:12 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 0C72C5C028B; Wed, 5 Jul 2023 19:23:12 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 05 Jul 2023 19:23:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599392; x= 1688685792; bh=1jI89jJpbcX+QlAxKYbnErOrqzmmXD3xHc3JW0HJ1/E=; b=a poGWzWjZK7WHZ6DeJoWC8tyl+W8r2jXU7RBENN9cUblDdpPD/RNAwRTWDuMxDRyj IW2+mxu0AuxOAwt7opBSNke1FzxE3pwWZFnNI4Xyk1CjBCP+5XyVn5fyFkf1s3GN Lt6NDWmqE7d8luQ7kc+2wk7dPW+LjUsSt5WotPbugVu7q3/x9/7AHeq4sbsBv8ij 7y77eqG9RXe6Ffr6Sifd/SSGh746ljY1hcry/N/4LaqI0UkmQMz/u82WZykSQuWj lJjC2IgIDj64uauMg/d5BQ5y8CnsNkXVU8084ZYGfeoWVXUUc1OYS/dJAFXeZjds bDf0CbeDfSSkR6dX5Yk7A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599392; x=1688685792; bh=1 jI89jJpbcX+QlAxKYbnErOrqzmmXD3xHc3JW0HJ1/E=; b=YgQjNeHrPfwyoFkAD OP/6yCl5Pqw9Y7pAKha3DmZuHGDQ2DwjomfCFY9bLYnuueOU5UPcCL0gOGn4krFO 4CqqA8zRDHi4+t8uV8mV6Xg/cvpmbYeQ6zHguicI8a39/Ahdm1YEacPVOjr5zImO MoCezOKn2aOpe1coqFs8PgsBetKcwo8WDe8wvc91ZUq88VWBMEmT/eC8RTt/37zy 9Pe/Va+Rg6jMCcNjHqdGmXzeIB7KtKibah0mrXhRFcJVMNZIphCoALBqQuZ0qjmA y7spOsQKkkhD2FYwfttOw9HXj/eFzDqq6/tVmP434e7UCJrfbqmDuRdnZht/8Cvw rBgtA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvtdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:11 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 05/18] btrfs: expose quota mode via sysfs Date: Wed, 5 Jul 2023 16:20:42 -0700 Message-ID: <0628cfee8bca3506127b46ac49ee4e2603f081f0.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Add a new sysfs file /sys/fs/btrfs//qgroups/mode which prints out the mode qgroups is running in. The possible modes are disabled, qgroup, and squota Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/sysfs.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index b1d1ac25237b..e53614753391 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -2086,6 +2086,31 @@ static ssize_t qgroup_enabled_show(struct kobject *qgroups_kobj, } BTRFS_ATTR(qgroups, enabled, qgroup_enabled_show); +static ssize_t qgroup_mode_show(struct kobject *qgroups_kobj, + struct kobj_attribute *a, + char *buf) +{ + struct btrfs_fs_info *fs_info = to_fs_info(qgroups_kobj->parent); + char *mode = ""; + + spin_lock(&fs_info->qgroup_lock); + switch (btrfs_qgroup_mode(fs_info)) { + case BTRFS_QGROUP_MODE_DISABLED: + mode = "disabled"; + break; + case BTRFS_QGROUP_MODE_FULL: + mode = "qgroup"; + break; + case BTRFS_QGROUP_MODE_SIMPLE: + mode = "squota"; + break; + } + spin_unlock(&fs_info->qgroup_lock); + + return sysfs_emit(buf, "%s\n", mode); +} +BTRFS_ATTR(qgroups, mode, qgroup_mode_show); + static ssize_t qgroup_inconsistent_show(struct kobject *qgroups_kobj, struct kobj_attribute *a, char *buf) @@ -2148,6 +2173,7 @@ static struct attribute *qgroups_attrs[] = { BTRFS_ATTR_PTR(qgroups, enabled), BTRFS_ATTR_PTR(qgroups, inconsistent), BTRFS_ATTR_PTR(qgroups, drop_subtree_threshold), + BTRFS_ATTR_PTR(qgroups, mode), NULL }; ATTRIBUTE_GROUPS(qgroups); From patchwork Wed Jul 5 23:20:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303033 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 842D2EB64DA for ; Wed, 5 Jul 2023 23:23:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232383AbjGEXXW (ORCPT ); Wed, 5 Jul 2023 19:23:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232376AbjGEXXV (ORCPT ); Wed, 5 Jul 2023 19:23:21 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 534BC19A2 for ; Wed, 5 Jul 2023 16:23:14 -0700 (PDT) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id BBAEA5C028D; Wed, 5 Jul 2023 19:23:13 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 Jul 2023 19:23:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599393; x= 1688685793; bh=uNlVzaj79FB13LsG46M6m1r7LNzmwMdgDvsVer4X68c=; b=O IPEgzaTghm7wyFJe2idJhnsthP213ib27cNGyovTAWSzh4RkpEFyioLfv0o9AO+7 6FXsmexUhGVzIOR2L0gA5Beh6T8HYR7B59FrVj5f6FFAyB9iQhxsOLkoGpVvG/Jt dSnO2Jn8kTBoXKK66ju51U3fJL6CGFrd5YNq4VOMg4yiDinYiUN9uWHc5WpA/DF7 z/7iRCpGxm2FkpAUVMoDaxQqH/1sMhBlRoXi7cRPPByYpF7pmPwzDP5OhIZeiG46 p9bh7G6pkKmt5pXrIvKpnj3W8lvT2W2cai1TBZwoAotydsvtHE2/+efKlpE1tVdp R2tl6h1amrPtny2SDh+SQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599393; x=1688685793; bh=u NlVzaj79FB13LsG46M6m1r7LNzmwMdgDvsVer4X68c=; b=HnNl44r45EbFAvw32 ZAmP8GmoARZhO1j9dfWN5vPo3c8GKbPa0xK9y+Imy2m+zEJ2ZcXnZK38qKa9NOZx 1pRmf8hajR7V01HPyE5BeboJK2CIdUxW280dVD+QRTGfJDnDY6SM/abBkn08Ay0v S+VaVk8r1MLmasaX8KVGMChHMvmdYUoDhaEkQIcdEbYi12eaClE5BIzul06jglin fsxDktsYr9Zhiwl52mmxCXTBTIKliLCxnf+ONP/T0O/3igYFFRxj9qvCBGfcB4p3 eeGF5d1XJ+qcHfeL4NK8sHdzDNTXK9rTsPtrOJdvdnOsNup8G1gsEqsk3n89uNEF RoozA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:13 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 06/18] btrfs: flush reservations during quota disable Date: Wed, 5 Jul 2023 16:20:43 -0700 Message-ID: <0886b85983e4c7ab74574665fb25c9f2f81542bf.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The following sequence: enable simple quotas do some writes reserve space create ordered_extent release rsv (store rsv_bytes in OE, mark QGROUP_RESERVED bits) disable quotas enable simple quotas set qgroup rsv to 0 on all subvols ordered_extent finishes create delayed ref with rsv_bytes from before run delayed ref record_simple_quota_delta free rsv_bytes (0 -> -rsv_delta) results in us reliably underflowing the subvolume's qgroup rsv counter, because disabling/re-enabling quotas toggles reservation counters down to 0, but does not remove other file system state which represents successful acquisition of qgroup rsv space. Specifically metadata rsv counters on the root object and rsv_bytes on ordered_extent objects that have released their reservation as well as the corresponding QGROUP_RESERVED extent bits. Normal qgroups gets away with this, I believe because it forces more work to happen on transaction commit, but I am not certain it is totally safe from the ordered_extent/leaked extent bit variant. Simple quotas hits this reliably. The intent of the fix is to make disable take the time to clear that external to qgroups state as well: after flipping off the quota bit on fs_info, flush delalloc and ordered extents, clearing the extent bits along the way. This makes it so there are no ordered extents or meta prealloc hanging around from the first enablement period during the second. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/qgroup.c | 47 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 80c1f500b6cc..75afd8212bc0 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1247,6 +1247,40 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, return ret; } +/* + * It is possible to have outstanding ordered extents + * which reserved bytes before we disabled. We need to fully flush + * delalloc, ordered extents, and a commit to ensure that + * we don't leak such reservations, only to have them come back + * if we re-enable. + * + * i.e.: + * enable simple quotas + * reserve space + * release it, store rsv_bytes in OE + * disable quotas + * enable simple quotas (qgroup rsv are all 0) + * OE finishes + * run delayed refs + * free rsv_bytes, resulting in miscounting or even underflow + */ +static int flush_reservations(struct btrfs_fs_info *fs_info) +{ + struct btrfs_trans_handle *trans; + int ret; + + ret = btrfs_start_delalloc_roots(fs_info, LONG_MAX, false); + if (ret) + return ret; + btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1); + trans = btrfs_join_transaction(fs_info->tree_root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + btrfs_commit_transaction(trans); + + return ret; +} + int btrfs_quota_disable(struct btrfs_fs_info *fs_info) { struct btrfs_root *quota_root; @@ -1291,6 +1325,10 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags); btrfs_qgroup_wait_for_completion(fs_info, false); + ret = flush_reservations(fs_info); + if (ret) + goto out; + /* * 1 For the root item * @@ -1351,7 +1389,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) if (ret && trans) btrfs_end_transaction(trans); else if (trans) - ret = btrfs_end_transaction(trans); + ret = btrfs_commit_transaction(trans); mutex_unlock(&fs_info->cleaner_mutex); return ret; @@ -3954,8 +3992,11 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, int trace_op = QGROUP_RELEASE; int ret; - if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) - return 0; + if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) { + extent_changeset_init(&changeset); + return clear_record_extent_bits(&inode->io_tree, start, start + len - 1, + EXTENT_QGROUP_RESERVED, &changeset); + } /* In release case, we shouldn't have @reserved */ WARN_ON(!free && reserved); From patchwork Wed Jul 5 23:20:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303034 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F15F2C001B0 for ; Wed, 5 Jul 2023 23:23:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232291AbjGEXXt (ORCPT ); Wed, 5 Jul 2023 19:23:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232108AbjGEXXV (ORCPT ); Wed, 5 Jul 2023 19:23:21 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE2011995 for ; Wed, 5 Jul 2023 16:23:15 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 6ADC55C029A; Wed, 5 Jul 2023 19:23:15 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 05 Jul 2023 19:23:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599395; x= 1688685795; bh=jTyOEqNYdagK2IClPxvnIhRUEyMLn8dHeQHShqmQheE=; b=S abB11yPau/A9Fzq9PUAMJ9wmo/ket+Ekqx5YcBs3VQiGXOW6+ZCv/GAvxMg2JYm4 sD+6isTSQqtPIKZghwAwjqXNINWDBX9AySMMOguEVyQfVoz1pbwQFqeciWTW/7yY 5vrboIYuFZZLF/2/2EOasaol24Z8FXzbb1WIVYzBzEf1/En3uu1K+mQzxuOFexuq rUiMDxCG1cFqgxWa2fYfsyYbzr5Ry+5vrKUPtj5kIr2Ny6ARgkageD6obWRCCc4o 7pShtlf4Ps3OuWULBNdNBik7yKnn0/gB7I1rlBfxWNSx3AJxywqp8yFzkl7W0ZqP BLyJOBX2j3CwmoHS+t06w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599395; x=1688685795; bh=j TyOEqNYdagK2IClPxvnIhRUEyMLn8dHeQHShqmQheE=; b=ITHp/ZF+AO4djRFE6 zELjSgDXJfKVrgg9HYDNdfDDSRtm8utLBUcUpNVUitJlEfQjJbNFnDcqfZxYay2z Q5NB/huwJJIIl9DYEVL07L9GGWjP8Yk/5UaQpVdsZzZzA9vk7M5Yq2fFgufOQHpy zjNyEjCdBQeiOZTSqkpRLdpKZUh1tY133739wyBwdeCCDEtTFPZ2p+afuK516lXZ cS+Uu2BMVoNVWS/4Va9eMvX7yglGpaBjB9HetAPltvPFMqWEhfYS52yGHXyfPV0L OEsCYNFrGkwjQ6ipXzJH4i40Dp7fRH4A6/2LkVaZIEMvGty8PkOyyJBgV1lLoM7a R2XJA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvtdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:14 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 07/18] btrfs: create qgroup earlier in snapshot creation Date: Wed, 5 Jul 2023 16:20:44 -0700 Message-ID: <5aff5ceb6555f8026f414c4de9341c698837820b.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Pull creating the qgroup earlier in the snapshot. This allows simple quotas qgroups to see all the metadata writes related to the snapshot being created and to be born with the root node accounted. Signed-off-by: Boris Burkov --- fs/btrfs/qgroup.c | 3 +++ fs/btrfs/transaction.c | 5 +++++ 2 files changed, 8 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 75afd8212bc0..8419270f7417 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1670,6 +1670,9 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid) struct btrfs_qgroup *qgroup; int ret = 0; + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_DISABLED) + return 0; + mutex_lock(&fs_info->qgroup_ioctl_lock); if (!fs_info->quota_root) { ret = -ENOTCONN; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index f644c7c04d53..2bb5a64f6d84 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1716,6 +1716,11 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, } btrfs_release_path(path); + ret = btrfs_create_qgroup(trans, objectid); + if (ret) { + btrfs_abort_transaction(trans, ret); + goto fail; + } /* * pull in the delayed directory update * and the delayed inode item From patchwork Wed Jul 5 23:20:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303035 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67F0BEB64DA for ; Wed, 5 Jul 2023 23:23:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232292AbjGEXXt (ORCPT ); Wed, 5 Jul 2023 19:23:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232392AbjGEXXX (ORCPT ); Wed, 5 Jul 2023 19:23:23 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB1311BC2 for ; Wed, 5 Jul 2023 16:23:17 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 1E6615C0281; Wed, 5 Jul 2023 19:23:17 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Wed, 05 Jul 2023 19:23:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599397; x= 1688685797; bh=eP5hdF+YLf/Yxlocn9TVXAugyHMatNQlD2dMek3bOjA=; b=k k/+murjv35Q/UkCjOIHzdZU9KkzxlMlv3wztMlEKlQ5VwMo0rkrPRPjXETLtK5M2 UhmQmoErRCnkgl2fEvcq56zjhOkM9lUNQtE0rlYDW5pG2DwXjGr6mFxG1avw2Y7R vJgD4UcEU3hFc/q6SD6aw5ORpmzSiI+mOKOU0Al7f9w27QONcVvOPQOxR/LfPvpS YEBr/dkGZA3PN5OsStLqIt3PoAjnGIuAj6RTIOTvTeLgoyUrDj7qoC483/AQpAWI q9h5qye6GUTc+xngubbNNRrIsFz6Qx5mcSIR/CpLjKtST0kJyc86+A8HWufNXkoy XnQ9uwB34h9YPZSFwgUWA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599397; x=1688685797; bh=e P5hdF+YLf/Yxlocn9TVXAugyHMatNQlD2dMek3bOjA=; b=evxQ2CEFR2xr7to7L 2B3J4H/BOnvmMMvHcpLGSFFxHLBlzptJq6Au9UFS0lrDYbDZMS8RYZOcyk7CgQi3 GsxXb9Ef4y6vF0IcB9XpFnmWH7S55oyAfyhomJ3Oh6sqfc78dW4KPflClnBdAO3W HR1kwMhpQUum4Tj5HPeZma32+tBuAN455a45nEFWtdmNexyLSv5Hlb49hA/Dmjdc XIJ9lLWbOxr1Q472ImXFL+UTy9L8hBYwXHtlrSoyDfw0UbNpJAqe/lkoVSSaEeTA jyUqj6FOp/q1Juk0OEkqVDBr24udRB/cSBIXIO7mdvD5SrNW2vlusnxkgnzu632d 4bkWQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:16 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 08/18] btrfs: function for recording simple quota deltas Date: Wed, 5 Jul 2023 16:20:45 -0700 Message-ID: <17e93036e598545781cb13376abb868dc22d51b2.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Rather than re-computing shared/exclusive ownership based on backrefs and walking roots for implicit backrefs, simple quotas does an increment when creating an extent and a decrement when deleting it. Add the API for the extent item code to use to track those events. Also add a helper function to make collecting parent qgroups in a ulist easier for functions like this. Signed-off-by: Boris Burkov --- fs/btrfs/qgroup.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/qgroup.h | 11 ++++++- 2 files changed, 92 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 8419270f7417..97c00697b475 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -333,6 +333,44 @@ static int del_relation_rb(struct btrfs_fs_info *fs_info, return -ENOENT; } +static int qgroup_collect_parents(struct btrfs_qgroup *qgroup, + struct ulist *ul) +{ + struct ulist_iterator uiter; + struct ulist_node *unode; + struct btrfs_qgroup_list *glist; + struct btrfs_qgroup *qg; + bool err_free = false; + int ret = 0; + + if (!ul) { + ul = ulist_alloc(GFP_KERNEL); + err_free = true; + } else { + ulist_reinit(ul); + } + + ret = ulist_add(ul, qgroup->qgroupid, + qgroup_to_aux(qgroup), GFP_ATOMIC); + if (ret < 0) + goto out; + ULIST_ITER_INIT(&uiter); + while ((unode = ulist_next(ul, &uiter))) { + qg = unode_aux_to_qgroup(unode); + list_for_each_entry(glist, &qg->groups, next_group) { + ret = ulist_add(ul, glist->group->qgroupid, + qgroup_to_aux(glist->group), GFP_ATOMIC); + if (ret < 0) + goto out; + } + } + ret = 0; +out: + if (ret && err_free) + ulist_free(ul); + return ret; +} + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, u64 rfer, u64 excl) @@ -4531,3 +4569,47 @@ void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans) kfree(entry); } } + +int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, + struct btrfs_simple_quota_delta *delta) +{ + int ret; + struct ulist *ul = fs_info->qgroup_ulist; + struct btrfs_qgroup *qgroup; + struct ulist_iterator uiter; + struct ulist_node *unode; + struct btrfs_qgroup *qg; + u64 root = delta->root; + u64 num_bytes = delta->num_bytes; + int sign = delta->is_inc ? 1 : -1; + + if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_SIMPLE) + return 0; + + if (!is_fstree(root)) + return 0; + + spin_lock(&fs_info->qgroup_lock); + qgroup = find_qgroup_rb(fs_info, root); + if (!qgroup) { + ret = -ENOENT; + goto out; + } + ret = qgroup_collect_parents(qgroup, ul); + if (ret) + goto out; + + ULIST_ITER_INIT(&uiter); + while ((unode = ulist_next(ul, &uiter))) { + qg = unode_aux_to_qgroup(unode); + qg->excl += num_bytes * sign; + qg->rfer += num_bytes * sign; + qgroup_dirty(fs_info, qg); + } + +out: + spin_unlock(&fs_info->qgroup_lock); + if (!ret && delta->rsv_bytes) + btrfs_qgroup_free_refroot(fs_info, root, delta->rsv_bytes, BTRFS_QGROUP_RSV_DATA); + return ret; +} diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index d4c4d039585f..94d85b4fbebd 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -235,6 +235,14 @@ struct btrfs_qgroup { struct kobject kobj; }; +struct btrfs_simple_quota_delta { + u64 root; /* The fstree root this delta counts against */ + u64 num_bytes; /* The number of bytes in the extent being counted */ + u64 rsv_bytes; /* The number of bytes reserved for this extent */ + bool is_inc; /* Whether we are using or freeing the extent */ + bool is_data; /* Whether the extent is data or metadata */ +}; + static inline u64 btrfs_qgroup_subvolid(u64 qgroupid) { return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); @@ -447,5 +455,6 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *eb); void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); bool btrfs_check_quota_leak(struct btrfs_fs_info *fs_info); - +int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, + struct btrfs_simple_quota_delta *delta); #endif From patchwork Wed Jul 5 23:20:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303036 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A42CC001DB for ; Wed, 5 Jul 2023 23:23:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232338AbjGEXXv (ORCPT ); Wed, 5 Jul 2023 19:23:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232396AbjGEXXX (ORCPT ); Wed, 5 Jul 2023 19:23:23 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EA9C19A0 for ; Wed, 5 Jul 2023 16:23:19 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id BC3DF5C028B; Wed, 5 Jul 2023 19:23:18 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Wed, 05 Jul 2023 19:23:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599398; x= 1688685798; bh=ty+S1vJ3uKmEldN8gRCc0jhW6fjHpkOWf4xhg2xLMH0=; b=j HhTyw+Oe3PJ+BARD1r6gWrvzwTEVLWTJ0P/wjylHsIEBI3XW89DBNsuFop/KGFSo SGvTQ5suXidaE+Gm2qqODI60kVUeWAmtc6sN0BSM29BrSdD3FuBY7trx0By+2A2V 2gpFgaXarwgQC/wKS4dHIRRo6w6w2Nv5peSdwQuIKaIxPGudr4mcbLbwwMwPbzc2 bFPJLDjwHsxnlgfYQ9esJQU8Zos7OYR5q9V6VITs7riR+PA0IiNoGY89STdgyFEE qOsQ3ijzs8zSgqCp2lk4s9wOqYQpZcvvKyfHmEF6Du54eiorZd4Uj/mHQ/7Bod90 feee8dES0Rf/4ibCXO8QQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599398; x=1688685798; bh=t y+S1vJ3uKmEldN8gRCc0jhW6fjHpkOWf4xhg2xLMH0=; b=N3ZNynQ0rRdD5fAe2 cGGpsoNJ7H283t47qZw7ugA9cRrgHV+qddz38PrmEiknzxmXJdcBdhE5d0Gd8SRW KnpLRSbvDI4Qs4fdqLSW83eD7N2uUu8Aza/SHAYtosJK1QIUytdFWuWmGaHUDOiU AV5Unqk34C6FNNWdGhG1JXseORvQgQJzwprrt0NSp8Ag+T0nnVPTuuYWC4ur/eZC /vhLd53eEU4tuHvaqDWxfXjiUAqAP3EfMGhMbdo6XFhSRig85O2wt1hWejPLDU4o xWRHHnpmdThPAdHG/Ecqvky55Yk5rb06MgCRVaG+lhnr5Yy6cwRvio3siFJabnUa RPmww== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvtdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:18 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 09/18] btrfs: rename tree_ref and data_ref owning_root Date: Wed, 5 Jul 2023 16:20:46 -0700 Message-ID: <76805ec21163d519e27b073110f70f9eba214021.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org commit 113479d5b8eb ("btrfs: rename root fields in delayed refs structs") changed these from ref_root to owning_root. However, there are many circumstances where that name is not really accurate and the root on the ref struct _is_ the referring root. In general, these are not the owning root, though it does happen in some ref merging cases involving overwrites during snapshots and similar. Simple quotas cares quite a bit about tracking the original owner of an extent through delayed refs, so rename these back to free up the name for the real owning root (which will live on the generic btrfs_ref and the head ref) Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/delayed-ref.c | 10 +++++----- fs/btrfs/delayed-ref.h | 12 ++++++------ fs/btrfs/extent-tree.c | 10 +++++----- fs/btrfs/ref-verify.c | 4 ++-- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index a9b938d3a531..f0bae1e1c455 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -885,7 +885,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, u64 parent = generic_ref->parent; u8 ref_type; - is_system = (generic_ref->tree_ref.owning_root == BTRFS_CHUNK_TREE_OBJECTID); + is_system = (generic_ref->tree_ref.ref_root == BTRFS_CHUNK_TREE_OBJECTID); ASSERT(generic_ref->type == BTRFS_REF_METADATA && generic_ref->action); ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS); @@ -914,14 +914,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, ref_type = BTRFS_TREE_BLOCK_REF_KEY; init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, - generic_ref->tree_ref.owning_root, action, + generic_ref->tree_ref.ref_root, action, ref_type); - ref->root = generic_ref->tree_ref.owning_root; + ref->root = generic_ref->tree_ref.ref_root; ref->parent = parent; ref->level = level; init_delayed_ref_head(head_ref, record, bytenr, num_bytes, - generic_ref->tree_ref.owning_root, 0, action, + generic_ref->tree_ref.ref_root, 0, action, false, is_system); head_ref->extent_op = extent_op; @@ -974,7 +974,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, u64 bytenr = generic_ref->bytenr; u64 num_bytes = generic_ref->len; u64 parent = generic_ref->parent; - u64 ref_root = generic_ref->data_ref.owning_root; + u64 ref_root = generic_ref->data_ref.ref_root; u64 owner = generic_ref->data_ref.ino; u64 offset = generic_ref->data_ref.offset; u8 ref_type; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index b8e14b0ba5f1..a71eff78469c 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -188,8 +188,8 @@ enum btrfs_ref_type { struct btrfs_data_ref { /* For EXTENT_DATA_REF */ - /* Original root this data extent belongs to */ - u64 owning_root; + /* Root which owns this data reference */ + u64 ref_root; /* Inode which refers to this data extent */ u64 ino; @@ -212,11 +212,11 @@ struct btrfs_tree_ref { int level; /* - * Root which owns this tree block. + * Root which owns this tree block reference. * * For TREE_BLOCK_REF (skinny metadata, either inline or keyed) */ - u64 owning_root; + u64 ref_root; /* For non-skinny metadata, no special member needed */ }; @@ -294,7 +294,7 @@ static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, generic_ref->real_root = mod_root ?: root; #endif generic_ref->tree_ref.level = level; - generic_ref->tree_ref.owning_root = root; + generic_ref->tree_ref.ref_root = root; generic_ref->type = BTRFS_REF_METADATA; if (skip_qgroup || !(is_fstree(root) && (!mod_root || is_fstree(mod_root)))) @@ -312,7 +312,7 @@ static inline void btrfs_init_data_ref(struct btrfs_ref *generic_ref, /* If @real_root not set, use @root as fallback */ generic_ref->real_root = mod_root ?: ref_root; #endif - generic_ref->data_ref.owning_root = ref_root; + generic_ref->data_ref.ref_root = ref_root; generic_ref->data_ref.ino = ino; generic_ref->data_ref.offset = offset; generic_ref->type = BTRFS_REF_DATA; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 911908ea5f6f..041f2eb153d7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1386,7 +1386,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, ASSERT(generic_ref->type != BTRFS_REF_NOT_SET && generic_ref->action); BUG_ON(generic_ref->type == BTRFS_REF_METADATA && - generic_ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID); + generic_ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID); if (generic_ref->type == BTRFS_REF_METADATA) ret = btrfs_add_delayed_tree_ref(trans, generic_ref, NULL); @@ -3329,9 +3329,9 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref) * tree, just update pinning info and exit early. */ if ((ref->type == BTRFS_REF_METADATA && - ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID) || + ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID) || (ref->type == BTRFS_REF_DATA && - ref->data_ref.owning_root == BTRFS_TREE_LOG_OBJECTID)) { + ref->data_ref.ref_root == BTRFS_TREE_LOG_OBJECTID)) { /* unlocks the pinned mutex */ btrfs_pin_extent(trans, ref->bytenr, ref->len, 1); ret = 0; @@ -3342,9 +3342,9 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref) } if (!((ref->type == BTRFS_REF_METADATA && - ref->tree_ref.owning_root == BTRFS_TREE_LOG_OBJECTID) || + ref->tree_ref.ref_root == BTRFS_TREE_LOG_OBJECTID) || (ref->type == BTRFS_REF_DATA && - ref->data_ref.owning_root == BTRFS_TREE_LOG_OBJECTID))) + ref->data_ref.ref_root == BTRFS_TREE_LOG_OBJECTID))) btrfs_ref_tree_mod(fs_info, ref); return ret; diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index 95d28497de7c..b7b3bd86f5e2 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -681,10 +681,10 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info, if (generic_ref->type == BTRFS_REF_METADATA) { if (!parent) - ref_root = generic_ref->tree_ref.owning_root; + ref_root = generic_ref->tree_ref.ref_root; owner = generic_ref->tree_ref.level; } else if (!parent) { - ref_root = generic_ref->data_ref.owning_root; + ref_root = generic_ref->data_ref.ref_root; owner = generic_ref->data_ref.ino; offset = generic_ref->data_ref.offset; } From patchwork Wed Jul 5 23:20:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303038 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E979CEB64DD for ; Wed, 5 Jul 2023 23:23:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232378AbjGEXXv (ORCPT ); Wed, 5 Jul 2023 19:23:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232403AbjGEXXZ (ORCPT ); Wed, 5 Jul 2023 19:23:25 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21CEC1990 for ; Wed, 5 Jul 2023 16:23:21 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 746FA5C02A5; Wed, 5 Jul 2023 19:23:20 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Wed, 05 Jul 2023 19:23:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599400; x= 1688685800; bh=30K/vuylWF3Qii7MVgvFFZttrkoBcu9PEE51vGauAqQ=; b=H 86/p14sHomj/4YZxiPs1OtprgZ5EW0POPAV/tuPV2hopC5EEm5bXNdOT9JC9VjAc MUN122s5I2pmAQBGk+6ollmkUd1EGNYkO3NCYGT3k0QYLZlbvjfjaRbH9RwY9nT1 50AEBo+PthZ4rlwDF2YmwuzlVe/ZrTzoX/xHL8ghZlEOfr6RDhGcHDbsAw5xA8Yq hEhYFX/8kfCT/AawadlXlrdAJlzuhJOvh2p1928zp++QoONp/YrnFzeRoKXOxc89 Mvh2WhbYU2wYKqcSPFXabuhwhT8+0991AvmQEdTlGFIRyzUOcjEXpqOBKJuZ6vca bY3R/yUFi8aIGD3Uh5wHw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599400; x=1688685800; bh=3 0K/vuylWF3Qii7MVgvFFZttrkoBcu9PEE51vGauAqQ=; b=Yn8+dHg1Ai1sHpzg+ IZftW699jhZdlqUKPwA0QczWawaMfcsz8qbx36mZ1vxhutVC7+HYH57SuhcOdNWk vxOdlpImNEzZYWTlVFlY3LZ37dyFstUcqfC6nY2MI6lvdQou4HulXBfXOR2VQm6r igVCuhpN/sXC8YmYz8ib/QNUilruG6eQ6hwQA2nTPKMkwqizWJiPui4ya2fQNCXo twQOqd1Hpo1jXE4/YWai1cp5leqYxPi2FtKKJVlDbijlLsjxolGaxgx6i0syvj8A iDiTwo89y9JyLpUePzVIvVPrAqqciP25we/bme5Pqnv+m7QUcwwxDtBiuc4oDUy7 YK6RQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvtdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepgeenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:19 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 10/18] btrfs: track owning root in btrfs_ref Date: Wed, 5 Jul 2023 16:20:47 -0700 Message-ID: <2a1725b60a1978c03c67a93c55c8c52b76d7f046.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org While data extents require us to store additional inline refs to track the original owner on free, this information is available implicitly for metadata. It is found in the owner field of the header of the tree block. Even if other trees refer to this block and the original ref goes away, we will not rewrite that header field, so it will reliably give the original owner. In addition, there is a relocation case where a new data extent needs to have an owning root separate from the referring root wired through delayed refs. To use it for recording simple quota deltas, we need to wire this root id through from when we create the delayed ref until we fully process it. Store it in the generic btrfs_ref struct of the delayed ref. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 7 ++++--- fs/btrfs/delayed-ref.h | 13 +++++++++++-- fs/btrfs/extent-tree.c | 19 ++++++++++++------- fs/btrfs/file.c | 10 +++++----- fs/btrfs/inode-item.c | 2 +- fs/btrfs/relocation.c | 16 +++++++++------- fs/btrfs/tree-log.c | 3 ++- 7 files changed, 44 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index f0bae1e1c455..49c320f2334b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -840,7 +840,7 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, static void init_delayed_ref_common(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *ref, u64 bytenr, u64 num_bytes, u64 ref_root, - int action, u8 ref_type) + int action, u8 ref_type, u64 owning_root) { u64 seq = 0; @@ -857,6 +857,7 @@ static void init_delayed_ref_common(struct btrfs_fs_info *fs_info, ref->action = action; ref->seq = seq; ref->type = ref_type; + ref->owning_root = owning_root; RB_CLEAR_NODE(&ref->ref_node); INIT_LIST_HEAD(&ref->add_list); } @@ -915,7 +916,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, generic_ref->tree_ref.ref_root, action, - ref_type); + ref_type, generic_ref->owning_root); ref->root = generic_ref->tree_ref.ref_root; ref->parent = parent; ref->level = level; @@ -989,7 +990,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, else ref_type = BTRFS_EXTENT_DATA_REF_KEY; init_delayed_ref_common(fs_info, &ref->node, bytenr, num_bytes, - ref_root, action, ref_type); + ref_root, action, ref_type, ref_root); ref->root = ref_root; ref->parent = parent; ref->objectid = owner; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index a71eff78469c..336c33c28191 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -32,6 +32,12 @@ struct btrfs_delayed_ref_node { /* seq number to keep track of insertion order */ u64 seq; + /* + * root which originally allocated this extent and owns it for + * simple quota accounting purposes. + */ + u64 owning_root; + /* ref count on this data structure */ refcount_t refs; @@ -239,6 +245,7 @@ struct btrfs_ref { #endif u64 bytenr; u64 len; + u64 owning_root; /* Bytenr of the parent tree block */ u64 parent; @@ -278,16 +285,18 @@ static inline u64 btrfs_calc_delayed_ref_bytes(const struct btrfs_fs_info *fs_in } static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref, - int action, u64 bytenr, u64 len, u64 parent) + int action, u64 bytenr, u64 len, u64 parent, u64 owning_root) { generic_ref->action = action; generic_ref->bytenr = bytenr; generic_ref->len = len; generic_ref->parent = parent; + generic_ref->owning_root = owning_root; } static inline void btrfs_init_tree_ref(struct btrfs_ref *generic_ref, - int level, u64 root, u64 mod_root, bool skip_qgroup) + int level, u64 root, u64 mod_root, + bool skip_qgroup) { #ifdef CONFIG_BTRFS_FS_REF_VERIFY /* If @real_root not set, use @root as fallback */ diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 041f2eb153d7..fa53f7cbd84a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2410,7 +2410,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans, num_bytes = btrfs_file_extent_disk_num_bytes(buf, fi); key.offset -= btrfs_file_extent_offset(buf, fi); btrfs_init_generic_ref(&generic_ref, action, bytenr, - num_bytes, parent); + num_bytes, parent, ref_root); btrfs_init_data_ref(&generic_ref, ref_root, key.objectid, key.offset, root->root_key.objectid, for_reloc); @@ -2424,7 +2424,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans, bytenr = btrfs_node_blockptr(buf, i); num_bytes = fs_info->nodesize; btrfs_init_generic_ref(&generic_ref, action, bytenr, - num_bytes, parent); + num_bytes, parent, ref_root); btrfs_init_tree_ref(&generic_ref, level - 1, ref_root, root->root_key.objectid, for_reloc); if (inc) @@ -3242,7 +3242,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, int ret; btrfs_init_generic_ref(&generic_ref, BTRFS_DROP_DELAYED_REF, - buf->start, buf->len, parent); + buf->start, buf->len, parent, btrfs_header_owner(buf)); btrfs_init_tree_ref(&generic_ref, btrfs_header_level(buf), root_id, 0, false); @@ -4699,12 +4699,16 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_key *ins) { struct btrfs_ref generic_ref = { 0 }; + u64 root_objectid = root->root_key.objectid; + u64 owning_root = root_objectid; + + BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID); BUG_ON(root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID); btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins->objectid, ins->offset, 0); - btrfs_init_data_ref(&generic_ref, root->root_key.objectid, owner, + ins->objectid, ins->offset, 0, owning_root); + btrfs_init_data_ref(&generic_ref, root_objectid, owner, offset, 0, false); btrfs_ref_tree_mod(root->fs_info, &generic_ref); @@ -4916,7 +4920,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, extent_op->level = level; btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins.objectid, ins.offset, parent); + ins.objectid, ins.offset, parent, btrfs_header_owner(buf)); btrfs_init_tree_ref(&generic_ref, level, root_objectid, root->root_key.objectid, false); btrfs_ref_tree_mod(fs_info, &generic_ref); @@ -5336,8 +5340,9 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans, wc->drop_level = level; find_next_key(path, level, &wc->drop_progress); + // TODO owning root on generic ref!? btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - fs_info->nodesize, parent); + fs_info->nodesize, parent, btrfs_header_owner(next)); btrfs_init_tree_ref(&ref, level - 1, root->root_key.objectid, 0, false); ret = btrfs_free_extent(trans, &ref); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 392bc7d512a0..f858d9652969 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -373,7 +373,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, if (update_refs && disk_bytenr > 0) { btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, - disk_bytenr, num_bytes, 0); + disk_bytenr, num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, new_key.objectid, @@ -463,7 +463,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, } else if (update_refs && disk_bytenr > 0) { btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, - disk_bytenr, num_bytes, 0); + disk_bytenr, num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, key.objectid, @@ -745,7 +745,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(leaf); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, bytenr, - num_bytes, 0); + num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, ino, orig_offset, 0, false); ret = btrfs_inc_extent_ref(trans, &ref); @@ -771,7 +771,7 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, other_start = end; other_end = 0; btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - num_bytes, 0); + num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, ino, orig_offset, 0, false); if (extent_mergeable(leaf, path->slots[0] + 1, @@ -2294,7 +2294,7 @@ static int btrfs_insert_replace_extent(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, extent_info->disk_offset, - extent_info->disk_len, 0); + extent_info->disk_len, 0, root->root_key.objectid); ref_offset = extent_info->file_offset - extent_info->data_offset; btrfs_init_data_ref(&ref, root->root_key.objectid, btrfs_ino(inode), ref_offset, 0, false); diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c index 4c322b720a80..4a56bf679de6 100644 --- a/fs/btrfs/inode-item.c +++ b/fs/btrfs/inode-item.c @@ -676,7 +676,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, bytes_deleted += extent_num_bytes; btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, - extent_start, extent_num_bytes, 0); + extent_start, extent_num_bytes, 0, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), control->ino, extent_offset, root->root_key.objectid, false); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 25a3361caedc..119f670538f7 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1158,7 +1158,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans, key.offset -= btrfs_file_extent_offset(leaf, fi); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr, - num_bytes, parent); + num_bytes, parent, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), key.objectid, key.offset, root->root_key.objectid, false); @@ -1169,7 +1169,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans, } btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, bytenr, - num_bytes, parent); + num_bytes, parent, root->root_key.objectid); btrfs_init_data_ref(&ref, btrfs_header_owner(leaf), key.objectid, key.offset, root->root_key.objectid, false); @@ -1382,7 +1382,8 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, btrfs_mark_buffer_dirty(path->nodes[level]); btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, old_bytenr, - blocksize, path->nodes[level]->start); + blocksize, path->nodes[level]->start, + src->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, 0, true); ret = btrfs_inc_extent_ref(trans, &ref); @@ -1391,7 +1392,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, break; } btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr, - blocksize, 0); + blocksize, 0, dest->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, 0, true); ret = btrfs_inc_extent_ref(trans, &ref); @@ -1401,7 +1402,8 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, } btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, new_bytenr, - blocksize, path->nodes[level]->start); + blocksize, path->nodes[level]->start, + src->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid, 0, true); ret = btrfs_free_extent(trans, &ref); @@ -1411,7 +1413,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc, } btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, old_bytenr, - blocksize, 0); + blocksize, 0, src->root_key.objectid); btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid, 0, true); ret = btrfs_free_extent(trans, &ref); @@ -2491,7 +2493,7 @@ static int do_relocation(struct btrfs_trans_handle *trans, btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, node->eb->start, blocksize, - upper->eb->start); + upper->eb->start, btrfs_header_owner(upper->eb)); btrfs_init_tree_ref(&ref, node->level, btrfs_header_owner(upper->eb), root->root_key.objectid, false); diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 8ad7e7e38d18..51aaaefaf39d 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -767,7 +767,8 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, } else if (ret == 0) { btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, - ins.objectid, ins.offset, 0); + ins.objectid, ins.offset, 0, + root->root_key.objectid); btrfs_init_data_ref(&ref, root->root_key.objectid, key->objectid, offset, 0, false); From patchwork Wed Jul 5 23:20:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303037 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9509CC001DF for ; Wed, 5 Jul 2023 23:23:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232348AbjGEXXw (ORCPT ); Wed, 5 Jul 2023 19:23:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232420AbjGEXXZ (ORCPT ); Wed, 5 Jul 2023 19:23:25 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE24A1995 for ; Wed, 5 Jul 2023 16:23:22 -0700 (PDT) Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.nyi.internal (Postfix) with ESMTP id 275D85C0283; Wed, 5 Jul 2023 19:23:22 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Wed, 05 Jul 2023 19:23:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599402; x= 1688685802; bh=+4u02dRquDZRbDAXWJOGiTdYcK26uXD+Fleby38+UAo=; b=C n3BdY/0bXsH1MGABYD2GVAie6N6wM0CHg20MKoEK7rQWk2uGSIhKZXe3YQhptScr qbKIwaxUchVFF9UhJ9R80VCvrWM2USs2uXpkV23WXOfj7SfanYDXGLCNYe5zEyWx CijSSH3iHWOrwpRK8sc+owL06QqVUq9zw0ohA1AIDl9U5lQWlzL0OQmLEq5+I5Et ZWyqCL969PNFu5x3jO0qGMbwdcC4KXrol2+8PAz9ANZyw+e6SzLjan+U6VFzYWyt wtrH1Mi1hgDQ1RXuZtNxcuj51pf+62j726ooLM8cYgjBVwFoE72yyBY2h9uRPYL5 JK7aZsgt2oizCvb6U745w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599402; x=1688685802; bh=+ 4u02dRquDZRbDAXWJOGiTdYcK26uXD+Fleby38+UAo=; b=h3+GhIQkbilDrDf/w 8BVwy82Qo3nYWthiRJ/lyrUjtHQVm6+/EXhRD5qqh/n8xPrmiZwvRJaKXtd03+O3 NUa4N1eksQndmpVGZ6trX186vSJ3kJV9eVthzG/gPCeAOe0albd98bfguhgF0B9p 2uT8kMjVE2uV77W76IVCqhydp9hdajm5Ch1awUwf9GePv0Ag5o2lCsFThNlUHbh9 9G3GJZhKCjiLXNmjVW5RTRo457BOGyPO5PNJVHhIShRmQNL54bnCPT15rlOOD6Mh ZTpcs9Lh4hlMtlJA3M7LOnISSiL0eWqC0KWEAZhsg3KMJ5SZifeIlhwgB/Aezcl7 XYCNA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:21 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 11/18] btrfs: track original extent owner in head_ref Date: Wed, 5 Jul 2023 16:20:48 -0700 Message-ID: <3ab4ae0d6690a8d69cee48e107ee87102ce5f070.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Simple quotas requires tracking the original creating root of any given extent. This gets complicated when multiple subvolumes create overlapping/contradictory refs in the same transaction. For example, due to modifying or deleting an extent while also snapshotting it. To resolve this in a general way, take advantage of the fact that we are essentially already tracking this for handling releasing reservations. The head ref coalesces the various refs and uses must_insert_reserved to check if it needs to create an extent/free reservation. Store the ref that set must_insert_reserved as the owning ref on the head ref. Note that this can result in writing an extent for the very first time with an owner different from its only ref, but it will look the same as if you first created it with the original owning ref, then added the other ref, then removed the owning ref. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 10 ++++++---- fs/btrfs/delayed-ref.h | 7 +++++++ 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 49c320f2334b..89641bcd6841 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -632,6 +632,7 @@ static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans, * Set it again here */ existing->must_insert_reserved = update->must_insert_reserved; + existing->owning_root = update->owning_root; /* * update the num_bytes so we make sure the accounting @@ -694,7 +695,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, struct btrfs_qgroup_extent_record *qrecord, u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved, int action, bool is_data, - bool is_system) + bool is_system, u64 owning_root) { int count_mod = 1; bool must_insert_reserved = false; @@ -735,6 +736,7 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, head_ref->num_bytes = num_bytes; head_ref->ref_mod = count_mod; head_ref->must_insert_reserved = must_insert_reserved; + head_ref->owning_root = owning_root; head_ref->is_data = is_data; head_ref->is_system = is_system; head_ref->ref_tree = RB_ROOT_CACHED; @@ -923,7 +925,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans, init_delayed_ref_head(head_ref, record, bytenr, num_bytes, generic_ref->tree_ref.ref_root, 0, action, - false, is_system); + false, is_system, generic_ref->owning_root); head_ref->extent_op = extent_op; delayed_refs = &trans->transaction->delayed_refs; @@ -1015,7 +1017,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans, } init_delayed_ref_head(head_ref, record, bytenr, num_bytes, ref_root, - reserved, action, true, false); + reserved, action, true, false, generic_ref->owning_root); head_ref->extent_op = NULL; delayed_refs = &trans->transaction->delayed_refs; @@ -1061,7 +1063,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans, return -ENOMEM; init_delayed_ref_head(head_ref, NULL, bytenr, num_bytes, 0, 0, - BTRFS_UPDATE_DELAYED_HEAD, false, false); + BTRFS_UPDATE_DELAYED_HEAD, false, false, 0); head_ref->extent_op = extent_op; delayed_refs = &trans->transaction->delayed_refs; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 336c33c28191..0af3b7395aba 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -123,6 +123,13 @@ struct btrfs_delayed_ref_head { * the free has happened. */ bool must_insert_reserved; + + /* + * The root which triggered the allocation when + * must_insert_reserved is true + */ + u64 owning_root; + bool is_data; bool is_system; bool processing; From patchwork Wed Jul 5 23:20:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303040 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27E4BEB64DA for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232392AbjGEXXx (ORCPT ); Wed, 5 Jul 2023 19:23:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232426AbjGEXX0 (ORCPT ); Wed, 5 Jul 2023 19:23:26 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 698E0198E for ; Wed, 5 Jul 2023 16:23:24 -0700 (PDT) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id D63975C0281; Wed, 5 Jul 2023 19:23:23 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 Jul 2023 19:23:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599403; x= 1688685803; bh=QKh0tgqhtGxgsxuW9+z1Fvc2Qk9Qm3eBGgcUo8DHsi8=; b=D wlAaI0o5XR7L/SvtoJjjStLKn8rhOFgiADvbG01fTq7griqxIFGuASzKD8j3TdB1 TF4vh3YAO/AnSn47gJeKP0AgAAnAAsfZb4g6TTj9D14UBwEebqXhTa/vaINw3XzK oK2hDZ6GnI8qkfOW9BRL4EIzJNXx2UpmwgoYcaJ/R8N15nvkZEUHg60WhIHr3ySy m0Y5GEmcRjnIismFHLSesiKzrcnLjVTOUUSZVEsDuBlw6+QQhJj/zovAHEKzo5Nb Ooq9WtT+VZqpDog9v0LV83DdbeLY0pGlNNgdrCFZodcVhcveUi0uSiIBkbAGeuoJ Fbsd3c+VxzObSFMNeiHUA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599403; x=1688685803; bh=Q Kh0tgqhtGxgsxuW9+z1Fvc2Qk9Qm3eBGgcUo8DHsi8=; b=QgfH9Kz2HnZ1jJbQE Rc963oKG6EHh3UavtTB6+fRVUKCPxBwFXPHOjI5ASO3npi9TNoTMU8vqMKM7iKcI Cqhdyrks8bZPkDivcYADJJmGIwhi4YkRsPRCk+7hf3GmaFuNhorOwJOev/W0czzt /M3YFeyVDBG2/bcVMpZFaoS4nlPujGVC239kOBUnXvBX19LsuJjKEb36L43/PhBK 5QiDEktoqJVvu4gt3hU/Z8HYssRcs2fu+EPcdpp920tHMtU2GJEeFKrhzs+hzNhA Gqsic/2M2WwMc1gMTQR7dafGowTFwn10ldR35SWlJmZ8ARIU8LZ6phzZaLJU5qQj UGlJg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:23 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 12/18] btrfs: new inline ref storing owning subvol of data extents Date: Wed, 5 Jul 2023 16:20:49 -0700 Message-ID: <8d5a36641f7d79b66a1b306cee87f52bf1fb3b23.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In order to implement simple quota groups, we need to be able to associate a data extent with the subvolume that created it. Once you account for reflink, this information cannot be recovered without explicitly storing it. Options for storing it are: - a new key/item - a new extent inline ref item The former is backwards compatible, but wastes space, the latter is incompat, but is efficient in space and reuses the existing inline ref machinery, while only abusing it a tiny amount -- specifically, the new item is not a ref, per-se. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/accessors.h | 4 +++ fs/btrfs/backref.c | 3 ++ fs/btrfs/extent-tree.c | 51 ++++++++++++++++++++++++++------- fs/btrfs/print-tree.c | 12 ++++++++ fs/btrfs/ref-verify.c | 3 ++ fs/btrfs/tree-checker.c | 3 ++ include/uapi/linux/btrfs_tree.h | 6 ++++ 7 files changed, 71 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h index ceadfc5d6c66..aab61312e4e8 100644 --- a/fs/btrfs/accessors.h +++ b/fs/btrfs/accessors.h @@ -350,6 +350,8 @@ BTRFS_SETGET_FUNCS(extent_data_ref_count, struct btrfs_extent_data_ref, count, 3 BTRFS_SETGET_FUNCS(shared_data_ref_count, struct btrfs_shared_data_ref, count, 32); +BTRFS_SETGET_FUNCS(extent_owner_ref_root_id, struct btrfs_extent_owner_ref, root_id, 64); + BTRFS_SETGET_FUNCS(extent_inline_ref_type, struct btrfs_extent_inline_ref, type, 8); BTRFS_SETGET_FUNCS(extent_inline_ref_offset, struct btrfs_extent_inline_ref, @@ -366,6 +368,8 @@ static inline u32 btrfs_extent_inline_ref_size(int type) if (type == BTRFS_EXTENT_DATA_REF_KEY) return sizeof(struct btrfs_extent_data_ref) + offsetof(struct btrfs_extent_inline_ref, offset); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) + return sizeof(struct btrfs_extent_inline_ref); return 0; } diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 79336fa853db..d5bb6a880713 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1129,6 +1129,9 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx, count, sc, GFP_NOFS); break; } + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(ctx->fs_info, SIMPLE_QUOTA)); + break; default: WARN_ON(1); } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fa53f7cbd84a..3313f3d7d4ae 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -363,9 +363,13 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, struct btrfs_extent_inline_ref *iref, enum btrfs_inline_ref_type is_data) { + struct btrfs_fs_info *fs_info = eb->fs_info; int type = btrfs_extent_inline_ref_type(eb, iref); u64 offset = btrfs_extent_inline_ref_offset(eb, iref); + if (type == BTRFS_EXTENT_OWNER_REF_KEY && btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return type; + if (type == BTRFS_TREE_BLOCK_REF_KEY || type == BTRFS_SHARED_BLOCK_REF_KEY || type == BTRFS_SHARED_DATA_REF_KEY || @@ -374,26 +378,25 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, if (type == BTRFS_TREE_BLOCK_REF_KEY) return type; if (type == BTRFS_SHARED_BLOCK_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. */ - if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + if (offset && IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else if (is_data == BTRFS_REF_TYPE_DATA) { if (type == BTRFS_EXTENT_DATA_REF_KEY) return type; if (type == BTRFS_SHARED_DATA_REF_KEY) { - ASSERT(eb->fs_info); + ASSERT(fs_info); /* * Every shared one has parent tree block, * which must be aligned to sector size. */ if (offset && - IS_ALIGNED(offset, eb->fs_info->sectorsize)) + IS_ALIGNED(offset, fs_info->sectorsize)) return type; } } else { @@ -403,7 +406,7 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb, } btrfs_print_leaf(eb); - btrfs_err(eb->fs_info, + btrfs_err(fs_info, "eb %llu iref 0x%lx invalid extent inline ref type %d", eb->start, (unsigned long)iref, type); WARN_ON(1); @@ -912,6 +915,11 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, } iref = (struct btrfs_extent_inline_ref *)ptr; type = btrfs_get_extent_inline_ref_type(leaf, iref, needed); + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + ptr += btrfs_extent_inline_ref_size(type); + continue; + } if (type == BTRFS_REF_TYPE_INVALID) { err = -EUCLEAN; goto out; @@ -1705,6 +1713,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, node->type == BTRFS_SHARED_DATA_REF_KEY) ret = run_delayed_data_ref(trans, node, extent_op, insert_reserved); + else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) + ret = 0; else BUG(); if (ret && insert_reserved) @@ -2271,6 +2281,7 @@ static noinline int check_committed_ref(struct btrfs_root *root, struct btrfs_extent_item *ei; struct btrfs_key key; u32 item_size; + u32 expected_size; int type; int ret; @@ -2297,10 +2308,17 @@ static noinline int check_committed_ref(struct btrfs_root *root, ret = 1; item_size = btrfs_item_size(leaf, path->slots[0]); ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); + expected_size = sizeof(*ei) + btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY); + + iref = (struct btrfs_extent_inline_ref *)(ei + 1); + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA) && type == BTRFS_EXTENT_OWNER_REF_KEY) { + expected_size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + iref = (struct btrfs_extent_inline_ref *)(iref + 1); + } /* If extent item has more than 1 inline ref then it's shared */ - if (item_size != sizeof(*ei) + - btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) + if (item_size != expected_size) goto out; /* @@ -2312,8 +2330,6 @@ static noinline int check_committed_ref(struct btrfs_root *root, btrfs_root_last_snapshot(&root->root_item))) goto out; - iref = (struct btrfs_extent_inline_ref *)(ei + 1); - /* If this extent has SHARED_DATA_REF then it's shared */ type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_DATA); if (type != BTRFS_EXTENT_DATA_REF_KEY) @@ -4564,18 +4580,23 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root; int ret; struct btrfs_extent_item *extent_item; + struct btrfs_extent_owner_ref *oref; struct btrfs_extent_inline_ref *iref; struct btrfs_path *path; struct extent_buffer *leaf; int type; u32 size; + bool simple_quota = btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE; if (parent > 0) type = BTRFS_SHARED_DATA_REF_KEY; else type = BTRFS_EXTENT_DATA_REF_KEY; - size = sizeof(*extent_item) + btrfs_extent_inline_ref_size(type); + size = sizeof(*extent_item); + if (simple_quota) + size += btrfs_extent_inline_ref_size(BTRFS_EXTENT_OWNER_REF_KEY); + size += btrfs_extent_inline_ref_size(type); path = btrfs_alloc_path(); if (!path) @@ -4596,8 +4617,16 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, btrfs_set_extent_flags(leaf, extent_item, flags | BTRFS_EXTENT_FLAG_DATA); + iref = (struct btrfs_extent_inline_ref *)(extent_item + 1); + if (simple_quota) { + btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY); + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid); + iref = (struct btrfs_extent_inline_ref *)(oref + 1); + } btrfs_set_extent_inline_ref_type(leaf, iref, type); + if (parent > 0) { struct btrfs_shared_data_ref *ref; ref = (struct btrfs_shared_data_ref *)(iref + 1); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index aa06d9ca911d..3fac15ce0db0 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -80,12 +80,20 @@ static void print_extent_data_ref(const struct extent_buffer *eb, btrfs_extent_data_ref_count(eb, ref)); } +static void print_extent_owner_ref(const struct extent_buffer *eb, + struct btrfs_extent_owner_ref *ref) +{ + WARN_ON(!btrfs_fs_incompat(eb->fs_info, SIMPLE_QUOTA)); + pr_cont("extent data owner root %llu\n", btrfs_extent_owner_ref_root_id(eb, ref)); +} + static void print_extent_item(const struct extent_buffer *eb, int slot, int type) { struct btrfs_extent_item *ei; struct btrfs_extent_inline_ref *iref; struct btrfs_extent_data_ref *dref; struct btrfs_shared_data_ref *sref; + struct btrfs_extent_owner_ref *oref; struct btrfs_disk_key key; unsigned long end; unsigned long ptr; @@ -159,6 +167,10 @@ static void print_extent_item(const struct extent_buffer *eb, int slot, int type "\t\t\t(parent %llu not aligned to sectorsize %u)\n", offset, eb->fs_info->sectorsize); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + print_extent_owner_ref(eb, oref); + break; default: pr_cont("(extent %llu has INVALID ref type %d)\n", eb->start, type); diff --git a/fs/btrfs/ref-verify.c b/fs/btrfs/ref-verify.c index b7b3bd86f5e2..c0660233feb4 100644 --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -485,6 +485,9 @@ static int process_extent_item(struct btrfs_fs_info *fs_info, ret = add_shared_data_ref(fs_info, offset, count, key->objectid, key->offset); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: btrfs_err(fs_info, "invalid key type in iref"); ret = -EINVAL; diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 038dfa8f1788..72d29ab74a01 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1451,6 +1451,9 @@ static int check_extent_item(struct extent_buffer *leaf, } inline_refs += btrfs_shared_data_ref_count(leaf, sref); break; + case BTRFS_EXTENT_OWNER_REF_KEY: + WARN_ON(!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)); + break; default: extent_err(leaf, slot, "unknown inline ref type: %u", inline_type); diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index ab38d0f411fa..424c7f342712 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -226,6 +226,8 @@ #define BTRFS_SHARED_DATA_REF_KEY 184 +#define BTRFS_EXTENT_OWNER_REF_KEY 190 + /* * block groups give us hints into the extent allocation trees. Which * blocks are free etc etc @@ -783,6 +785,10 @@ struct btrfs_shared_data_ref { __le32 count; } __attribute__ ((__packed__)); +struct btrfs_extent_owner_ref { + u64 root_id; +} __attribute__ ((__packed__)); + struct btrfs_extent_inline_ref { __u8 type; __le64 offset; From patchwork Wed Jul 5 23:20:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303039 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F210EB64DD for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232401AbjGEXXy (ORCPT ); Wed, 5 Jul 2023 19:23:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232429AbjGEXX1 (ORCPT ); Wed, 5 Jul 2023 19:23:27 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BAD3171A for ; Wed, 5 Jul 2023 16:23:26 -0700 (PDT) Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.nyi.internal (Postfix) with ESMTP id 89CA15C028D; Wed, 5 Jul 2023 19:23:25 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Wed, 05 Jul 2023 19:23:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599405; x= 1688685805; bh=Z4Gs6nbYRxaGnbhD9LdDwK/uWxdm23BqDfWAYpQ+j2A=; b=X Z3ATAoFXxT/BgLTPESruQw95zAXklwC1hQSi9p0KafdN/MIVhOFRk+nCYO3qWOHa dEE+nnEADiGmli/IYdIeAZmESMfTwimi2UGq3d/5qyXlE8UENLLan5GkSpa49sVP AHzAM+gA6dmHohgwtR50GMdwE+OZPSB0+oIzVd2p/9RU4bC/1nkBI07dHaK1Qsru hj2E4zh9W/2VhCKeNy1w8osuGxDNkRUzO+mTSy2faur4/B5+a/uAYzQW2RAjWTor fzBMFxvctsPkrgPS2jZr9/QZDKhjJV0i+bTP/eyh32gVsknsp1CmiwDa8bpV3fcn 2HAguMrrNqCem7pE58JDg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599405; x=1688685805; bh=Z 4Gs6nbYRxaGnbhD9LdDwK/uWxdm23BqDfWAYpQ+j2A=; b=hm/X27BTVjL7ViSet g0MZJ4blWYsbMYOKpJjfayA/coObUyrO3rtNzNzKub2w+hHrJsQk3J917QRGtqEG iN+j70m4hFneWnwavj8FnU0RIvMenX+Ac1bziMn/uuRL9fVvg3UbcxCotYDSSs+w cOUs4ZOH1n83fZySEDJU8Z6yc3a7W5twYmmBnoCyF3PsDrcK8z1JbC2g51Jjprwj bsMhNX7X3qV5ZrYW70W3Py9YU2RsOhqiE2VhzrySJklLlwLTK7bCar9O/sCB9z88 GsUdp7alsS/X3MYLer45RU+5ISRzn8xRW10DwSeTU/c6t5Y/2Lk0F9L7J4XVMuDe prdLw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:25 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 13/18] btrfs: inline owner ref lookup helper Date: Wed, 5 Jul 2023 16:20:50 -0700 Message-ID: X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Inline ref parsing is a bit tricky and relies on a decent amount of implicit information, so I think it is beneficial to have a helper function for reading the owner ref, if only to "document" the format, along with the write path. The main subtlety of note which I was missing by open-coding this was that it is important to check whether or not inline refs are present *at all*. i.e., if we are writing out a new extent under squotas, we will always use a big enough item for the inline ref and have it. However, it is possible that some random item predating squotas will not have any inline refs. In that case, trying to read the "type" field of the first inline ref will just be reading garbage in the form of whatever is in the next item. This will be used by the extent free-ing path, which looks up data extent owners as well as a relocation path which needs to grab the owner before relocating an extent. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/extent-tree.c | 51 ++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/extent-tree.h | 3 +++ 2 files changed, 54 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3313f3d7d4ae..c8f767d7e261 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2821,6 +2821,57 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans) return 0; } +/* + * Helper to parse an extent item's inline extents looking for a simple + * quotas owner ref. + * + * @fs_info - the btrfs_fs_info for this mount + * @leaf - a leaf in the extent tree containing the extent item + * @slot - the slot in the leaf where the extent item is found + * + * Returns the objectid of the root that originally allocated the extent item + * if the inline owner ref is expected and present, otherwise 0. + * + * If an extent item has an owner ref item, it will be the first + * inline ref item. Therefore the logic is to check whether there are + * any inline ref items, then check the type of the first one. + * + */ +u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, + int slot) +{ + struct btrfs_extent_item *ei; + struct btrfs_extent_inline_ref *iref; + struct btrfs_extent_owner_ref *oref; + unsigned long ptr; + unsigned long end; + int type; + + if (!btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + return 0; + + ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); + ptr = (unsigned long)(ei + 1); + end = (unsigned long)ei + btrfs_item_size(leaf, slot); + + /* No inline ref items of any kind, can't check type */ + if (ptr == end) + return 0; + + iref = (struct btrfs_extent_inline_ref *)ptr; + type = btrfs_get_extent_inline_ref_type(leaf, iref, BTRFS_REF_TYPE_ANY); + + /* We found an owner ref, get the root out of it */ + if (type == BTRFS_EXTENT_OWNER_REF_KEY) { + oref = (struct btrfs_extent_owner_ref *)(&iref->offset); + return btrfs_extent_owner_ref_root_id(leaf, oref); + } + + /* We have inline refs, but not an owner ref */ + return 0; +} + static int do_free_extent_accounting(struct btrfs_trans_handle *trans, u64 bytenr, u64 num_bytes, bool is_data) { diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index 429d5c570061..9c8fd96cbee7 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -144,6 +144,9 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, struct extent_buffer *eb, u64 flags); int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_ref *ref); +u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, + int slot); int btrfs_free_reserved_extent(struct btrfs_fs_info *fs_info, u64 start, u64 len, int delalloc); int btrfs_pin_reserved_extent(struct btrfs_trans_handle *trans, u64 start, u64 len); From patchwork Wed Jul 5 23:20:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303041 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F54AC001DB for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232403AbjGEXXz (ORCPT ); Wed, 5 Jul 2023 19:23:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232434AbjGEXX3 (ORCPT ); Wed, 5 Jul 2023 19:23:29 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9259E57 for ; Wed, 5 Jul 2023 16:23:27 -0700 (PDT) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 45B405C02A1; Wed, 5 Jul 2023 19:23:27 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 Jul 2023 19:23:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599407; x= 1688685807; bh=W80V2QyUb7WEXNyME8hnBHOtlpN/8S5iQHLAwGvSMXs=; b=F bXu4k75XsbMaM2YVrTBS2aCmP/hm44v8jcPWwofOGuF/+jK2epN0/sETyfnnp1Ml qiN1b/KYIMn/iPTezAOoR1YHrhWQjs/Xhs+xMsJZFA2sRqceLnIP0LhtKlvXVQba 8Eqbs4+VcArnU6o6S0br0+92jlsgHLKFWnueIaJwQmI9pCr9X0UZfp4IKInABhEH caJLhm0fTBKSwPzEkS2qIGs9BBLQJPuumosZ2k4hnAtieY4uO5HKtOexdgauf5Gh Lb1yZwO2MZeUrJ7VBnxQLys5V4KwpkG1z6ZRQ/3ItNLBgyLwmkBSbMAsp1y8GdIo Uh6REG3E9v5kL4NNiQo2w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599407; x=1688685807; bh=W 80V2QyUb7WEXNyME8hnBHOtlpN/8S5iQHLAwGvSMXs=; b=d4RX6btU96cuAgooe +URgIR4g6YnkDUxmwe5bAgRfjd7xAt7yIQ9zUFwA5OxXbXkPMHstOXXgL9RaRXG9 Z7AiKMLKV3ORF4Om1o/v7/Q98uz/4wZgaBXe1lm909p/5lc3E63lfkHy+YCZPsow 6ZNdAWQHTIO5w9+MzmuIa365/Li970wF6pLeyIKYrmmnqzvlC9k36uQar4Sof0nC BLGa1dmVWWR4sVoXFZL9b3NW5aEeG361CF5Z4MR/neRfAi1Ahlf97FKXZLA2pKic PVUcHpEYYG4IpblwFJ6Ofy/i0cbuQXxMmdQ62VlsUS9dvsudgfcpxKZRf12NTQm5 0/bVA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:26 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 14/18] btrfs: record simple quota deltas Date: Wed, 5 Jul 2023 16:20:51 -0700 Message-ID: X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org At the moment that we run delayed refs, we make the final ref-count based decision on creating/removing extent (and metadata) items. Therefore, it is exactly the spot to hook up simple quotas. There are a few important subtleties to the fields we must collect to accurately track simple quotas, particularly when removing an extent. When removing a data extent, the ref could be in any tree (due to reflink, for example) and so we need to recover the owning root id from the owner ref item. When removing a metadata extent, we know the owning root from the owner field in the header when we create the delayed ref, so we can recover it from there. We must also be careful to handle reservations properly to not leaked reserved space. The happy path is freeing the reservation when the simple quota delta runs on a data extent. If that doesn't happen, due to refs canceling out or some error, the ref head already has the must_insert_reserved machinery to handle this, so we piggy back on that and use it to clean up the reserved data. Signed-off-by: Boris Burkov --- fs/btrfs/delayed-ref.c | 3 ++ fs/btrfs/delayed-ref.h | 6 ++++ fs/btrfs/extent-tree.c | 79 +++++++++++++++++++++++++++++++++++++----- 3 files changed, 80 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 89641bcd6841..04e124a93049 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -735,6 +735,9 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, head_ref->bytenr = bytenr; head_ref->num_bytes = num_bytes; head_ref->ref_mod = count_mod; + head_ref->reserved_bytes = 0; + if (reserved) + head_ref->reserved_bytes = reserved; head_ref->must_insert_reserved = must_insert_reserved; head_ref->owning_root = owning_root; head_ref->is_data = is_data; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 0af3b7395aba..aff05b3fb4ba 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -110,6 +110,12 @@ struct btrfs_delayed_ref_head { */ int ref_mod; + /* + * Track reserved bytes when setting must_insert_reserved. + * On success or cleanup, we will need to free the reservation. + */ + u64 reserved_bytes; + /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c8f767d7e261..c8914c66dc83 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1503,6 +1503,7 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, } static int run_delayed_data_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) @@ -1526,12 +1527,22 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, ref_root = ref->root; if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_simple_quota_delta delta = { + .root = href->owning_root, + .num_bytes = node->num_bytes, + .rsv_bytes = href->reserved_bytes, + .is_data = true, + .is_inc = true, + }; + if (extent_op) flags |= extent_op->flags_to_set; ret = alloc_reserved_file_extent(trans, parent, ref_root, flags, ref->objectid, ref->offset, &ins, node->ref_mod); + if (!ret) + ret = btrfs_record_simple_quota_delta(trans->fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->objectid, ref->offset, @@ -1653,11 +1664,13 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans, } static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) { int ret = 0; + struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_delayed_tree_ref *ref; u64 parent = 0; u64 ref_root = 0; @@ -1677,8 +1690,18 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, return -EIO; } if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) { + struct btrfs_simple_quota_delta delta = { + .root = href->owning_root, + .num_bytes = fs_info->nodesize, + .rsv_bytes = 0, + .is_data = false, + .is_inc = true, + }; + BUG_ON(!extent_op || !extent_op->update_flags); ret = alloc_reserved_tree_block(trans, node, extent_op); + if (!ret) + btrfs_record_simple_quota_delta(fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root, ref->level, 0, 1, extent_op); @@ -1693,6 +1716,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, /* helper function to actually process a single delayed ref entry */ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, + struct btrfs_delayed_ref_head *href, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op, bool insert_reserved) @@ -1707,12 +1731,12 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, if (node->type == BTRFS_TREE_BLOCK_REF_KEY || node->type == BTRFS_SHARED_BLOCK_REF_KEY) - ret = run_delayed_tree_ref(trans, node, extent_op, + ret = run_delayed_tree_ref(trans, href, node, extent_op, insert_reserved); else if (node->type == BTRFS_EXTENT_DATA_REF_KEY || node->type == BTRFS_SHARED_DATA_REF_KEY) - ret = run_delayed_data_ref(trans, node, extent_op, - insert_reserved); + ret = run_delayed_data_ref(trans, href, node, + extent_op, insert_reserved); else if (node->type == BTRFS_EXTENT_OWNER_REF_KEY) ret = 0; else @@ -1809,6 +1833,11 @@ void btrfs_cleanup_ref_head_accounting(struct btrfs_fs_info *fs_info, spin_unlock(&delayed_refs->lock); nr_items += btrfs_csum_bytes_to_leaves(fs_info, head->num_bytes); } + if (head->must_insert_reserved && head->is_data && + btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + btrfs_qgroup_free_refroot(fs_info, head->owning_root, + head->reserved_bytes, + BTRFS_QGROUP_RSV_DATA); btrfs_delayed_refs_rsv_release(fs_info, nr_items); } @@ -1955,8 +1984,8 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans, locked_ref->extent_op = NULL; spin_unlock(&locked_ref->lock); - ret = run_one_delayed_ref(trans, ref, extent_op, - must_insert_reserved); + ret = run_one_delayed_ref(trans, locked_ref, ref, + extent_op, must_insert_reserved); btrfs_free_delayed_extent_op(extent_op); if (ret) { @@ -2873,11 +2902,12 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info, } static int do_free_extent_accounting(struct btrfs_trans_handle *trans, - u64 bytenr, u64 num_bytes, bool is_data) + u64 bytenr, struct btrfs_simple_quota_delta *delta) { int ret; + u64 num_bytes = delta->num_bytes; - if (is_data) { + if (delta->is_data) { struct btrfs_root *csum_root; csum_root = btrfs_csum_root(trans->fs_info, bytenr); @@ -2888,6 +2918,12 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans, } } + ret = btrfs_record_simple_quota_delta(trans->fs_info, delta); + if (ret) { + btrfs_abort_transaction(trans, ret); + return ret; + } + ret = add_to_free_space_tree(trans, bytenr, num_bytes); if (ret) { btrfs_abort_transaction(trans, ret); @@ -2990,6 +3026,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, u64 bytenr = node->bytenr; u64 num_bytes = node->num_bytes; bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA); + u64 delayed_ref_root = node->owning_root; extent_root = btrfs_extent_root(info, bytenr); ASSERT(extent_root); @@ -3188,6 +3225,14 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } } } else { + struct btrfs_simple_quota_delta delta = { + .root = delayed_ref_root, + .num_bytes = num_bytes, + .rsv_bytes = 0, + .is_data = is_data, + .is_inc = false, + }; + /* In this branch refs == 1 */ if (found_extent) { if (is_data && refs_to_drop != @@ -3226,6 +3271,16 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, num_to_del = 2; } } + /* + * We can't infer the data owner from the delayed ref, so we + * need to try to get it from the owning ref item. + * + * If it is not present, then that extent was not written under + * simple quotas mode, so we don't need to account for its + * deletion. + */ + if (is_data) + delta.root = btrfs_get_extent_owner_root(trans->fs_info, leaf, extent_slot); ret = btrfs_del_items(trans, extent_root, path, path->slots[0], num_to_del); @@ -3235,7 +3290,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } btrfs_release_path(path); - ret = do_free_extent_accounting(trans, bytenr, num_bytes, is_data); + ret = do_free_extent_accounting(trans, bytenr, &delta); } btrfs_release_path(path); @@ -4808,6 +4863,13 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, int ret; struct btrfs_block_group *block_group; struct btrfs_space_info *space_info; + struct btrfs_simple_quota_delta delta = { + .root = root_objectid, + .num_bytes = ins->offset, + .rsv_bytes = 0, + .is_data = true, + .is_inc = true, + }; /* * Mixed block groups will exclude before processing the log so we only @@ -4836,6 +4898,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, offset, ins, 1); if (ret) btrfs_pin_extent(trans, ins->objectid, ins->offset, 1); + ret = btrfs_record_simple_quota_delta(fs_info, &delta); btrfs_put_block_group(block_group); return ret; } From patchwork Wed Jul 5 23:20:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70D81C001DF for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232420AbjGEXX4 (ORCPT ); Wed, 5 Jul 2023 19:23:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230487AbjGEXXb (ORCPT ); Wed, 5 Jul 2023 19:23:31 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9660BE57 for ; Wed, 5 Jul 2023 16:23:29 -0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id F3E545C028D; Wed, 5 Jul 2023 19:23:28 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Wed, 05 Jul 2023 19:23:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599408; x= 1688685808; bh=55x3ITHEATdtcMi6WnwInGFV3Dj1WTNZ3vsKAkOkZJk=; b=A mpQR52TDzGEjsyellBmaT8nokE8A50THJtrfnfCFK1iH2VsEuQz5RNJNgiMBlJm5 s/Rqr2sCiG+m6mAw01m9R6TD5pgOhgbdpXAL2xPaoOnVsZcIBNFM8574xrbh29M7 uZVEsQ2TUQ6RduoB9GmtpdJly7Nzp2DC4Y18u0Trh8zOpx1VBYP1zm4OfAVz1gVv 8fyyxv5AFOhy0FDq3LG76+QMj7oOWGRHZh2uSMrF2hrpztdBiOM7LYJjqQXI2zCS CXLBhhCInNeX/3bNtupKm087OZMdnXcJubCSk61cWRdr7kyjDtwVvUaSVIIQgSel RMHPwhXmYQbfj9rB/5u1A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599408; x=1688685808; bh=5 5x3ITHEATdtcMi6WnwInGFV3Dj1WTNZ3vsKAkOkZJk=; b=mHHeILE7f/PY5Whkd Wu1bKArOpEFyuG7fcJvHRdTlRIAojnWM+c7zZ/HtcAjLp67QbMLgdmBrh2fOO2PF HJKZyqjEnqApGqniMC0KRxK0qJ8oGtHWeJpWHrtM5IvMuLMCddGXQwIipVKgXNvz PTI9zuIPUg83QFBhyINY8edeuVt/6pRTnn5kDeEsSH6sp9HgSCjks3UwARkR6wgn mSBet8tlcMUu9k8q9vgXRKM3I5YYkljZ8YOIZOQv/eZx2CGvDrvaJpKWRksmNIxm aWJUOdzh/5ebV4AvphIQMUWN8E4Rk+8rH6iyz+mw4QfkvUvwlaoC9ILBO4AhjMNH h6RAA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:28 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 15/18] btrfs: simple quota auto hierarchy for nested subvols Date: Wed, 5 Jul 2023 16:20:52 -0700 Message-ID: <140f5eca28de807b8334471d91f31863ce9e1aca.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Consider the following sequence: - enable quotas - create subvol S id 256 at dir outer/ - create a qgroup 1/100 - add 0/256 (S's auto qgroup) to 1/100 - create subvol T id 257 at dir outer/inner/ With full qgroups, there is no relationship between 0/257 and either of 0/256 or 1/100. There is an inherit feature that the creator of inner/ can use to specify it ought to be in 1/100. Simple quotas are targeted at container isolation, where such automatic inheritance for not necessarily trusted/controlled nested subvol creation would be quite helpful. Therefore, add a new default behavior for simple quotas: when you create a nested subvol, automatically inherit as parents any parents of the qgroup of the subvol the new inode is going in. In our example, 257/0 would also be under 1/100, allowing easy control of a total quota over an arbitrary hierarchy of subvolumes. I think this _might_ be a generally useful behavior, so it could be interesting to put it behind a new inheritance flag that simple quotas always use while traditional quotas let the user specify, but this is a minimally intrusive change to start. Signed-off-by: Boris Burkov --- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 46 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/qgroup.h | 6 +++--- fs/btrfs/transaction.c | 13 ++++++++---- 4 files changed, 56 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 63c1a9258a1b..23f5142ef18b 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, /* Tree log can't currently deal with an inode which is a new root. */ btrfs_set_log_full_commit(trans); - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit); + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit); if (ret) goto out; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 97c00697b475..6714c5aeb4e1 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1557,8 +1557,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info, return ret; } -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst) +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_qgroup *parent; @@ -2998,6 +2997,42 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) return ret; } +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info, + u64 inode_rootid, + struct btrfs_qgroup_inherit **inherit) +{ + int i = 0; + u64 num_qgroups = 0; + struct btrfs_qgroup *inode_qg; + struct btrfs_qgroup_list *qg_list; + + if (*inherit) + return -EEXIST; + + inode_qg = find_qgroup_rb(fs_info, inode_rootid); + if (!inode_qg) + return -ENOENT; + + list_for_each_entry(qg_list, &inode_qg->groups, next_group) { + ++num_qgroups; + } + + if (!num_qgroups) + return 0; + + *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS); + if (!*inherit) + return -ENOMEM; + (*inherit)->num_qgroups = num_qgroups; + + list_for_each_entry(qg_list, &inode_qg->groups, next_group) { + u64 qg_id = qg_list->group->qgroupid; + *((u64 *)((*inherit)+1) + i) = qg_id; + } + + return 0; +} + /* * Copy the accounting information between qgroups. This is necessary * when a snapshot or a subvolume is created. Throwing an error will @@ -3005,7 +3040,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) * when a readonly fs is a reasonable outcome. */ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, - u64 objectid, struct btrfs_qgroup_inherit *inherit) + u64 objectid, u64 inode_rootid, + struct btrfs_qgroup_inherit *inherit) { int ret = 0; int i; @@ -3047,6 +3083,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, goto out; } + if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + qgroup_auto_inherit(fs_info, inode_rootid, &inherit); + if (inherit) { i_qgroups = (u64 *)(inherit + 1); nums = inherit->num_qgroups + 2 * inherit->num_ref_copies + @@ -3073,6 +3112,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, if (ret) goto out; + /* * add qgroup to all inherited groups */ diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 94d85b4fbebd..ce6fa8694ca7 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -271,8 +271,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info, bool interruptible); -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst); +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); @@ -366,7 +365,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans); int btrfs_run_qgroups(struct btrfs_trans_handle *trans); int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, - u64 objectid, struct btrfs_qgroup_inherit *inherit); + u64 objectid, u64 inode_rootid, + struct btrfs_qgroup_inherit *inherit); void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, u64 ref_root, u64 num_bytes, enum btrfs_qgroup_rsv_type type); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 2bb5a64f6d84..581cb7f5ea85 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1523,13 +1523,14 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, int ret; /* - * Save some performance in the case that full qgroups are not + * Save some performance in the case that qgroups are not * enabled. If this check races with the ioctl, rescan will * kick in anyway. */ if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; + /* * Ensure dirty @src will be committed. Or, after coming * commit_fs_roots() and switch_commit_roots(), any dirty but not @@ -1566,7 +1567,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, /* Now qgroup are all updated, we can inherit it to new qgroups */ ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid, - inherit); + parent->root_key.objectid, inherit); if (ret < 0) goto out; @@ -1832,8 +1833,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, * To co-operate with that hack, we do hack again. * Or snapshot will be greatly slowed down by a subtree qgroup rescan */ - ret = qgroup_account_snapshot(trans, root, parent_root, - pending->inherit, objectid); + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) + ret = qgroup_account_snapshot(trans, root, parent_root, + pending->inherit, objectid); + else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + ret = btrfs_qgroup_inherit(trans, root->root_key.objectid, objectid, + parent_root->root_key.objectid, pending->inherit); if (ret < 0) goto fail; From patchwork Wed Jul 5 23:20:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95CEAC00528 for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232125AbjGEXX5 (ORCPT ); Wed, 5 Jul 2023 19:23:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232469AbjGEXXc (ORCPT ); Wed, 5 Jul 2023 19:23:32 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 497E6171A for ; Wed, 5 Jul 2023 16:23:31 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id B6AA05C0283; Wed, 5 Jul 2023 19:23:30 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 05 Jul 2023 19:23:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599410; x= 1688685810; bh=8lAK8TLx1DpHNqHscJ9rV3z2fS9NQS09/y7fRQguPlo=; b=Z d4o9UXUDUt/Gkl1cxrQlq699YftWVccUEWtG47LiLIcONX6IT4cdgNhLudjYv5ET 3rIrNJS6Fi+U4QdJegqLx8piAIVWrTxvdbznb2yGe+/iHG4pav57TAWG1R8rVH7T nEqTZjUPvkQAkiEIEqZnOjO6jJBn2Tc4AJP3qBenuJiOC24rWYJiiOz6Y52jn0ER Clxtpq88JGqQO0dJu4yoqc102X4GRxqdAQQh6kRCcu1D5U+R/yZ9R29FKoLd9fHm labPzUS2OjWUSMCc/4//Wv4yLyMlN/UQtPygDGupBY1mAv3Lwic9bU5RBJd56pxU ZvLxAHFQHRi3jbxv9NZ0g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599410; x=1688685810; bh=8 lAK8TLx1DpHNqHscJ9rV3z2fS9NQS09/y7fRQguPlo=; b=JD7BIgFctvMPsXE2V hsJsf+Hzb4OPkkeaQT1JIyB2bmOvYcMS5JmvRe5bhAXfueTxmKdKgmVAdcT5NZIM KzoyA6Gg6Rt1oG2ARlYB0tdBrwG6BxOkPSEbQB8Iho5hpb+FaSHpvJ1flmkVAI2i s84kDGkQo4e7vA9XNakgeO3rhELi59H+7KkcD8uH/wRJmm85+5XiSh1dfSrl7SEZ FXfDOATp5sS+Sp5vGQQNJDvOLiOlmusJErO7lMiEU0W8DnicgDWzHaERNEzRq7ae bi4A41HR6bf2VMZMUM4WKSHs88Y3iGh8koeR09N/4soG6EjtVo2ek7PFT2wlQizJ JgMMg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:30 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 16/18] btrfs: check generation when recording simple quota delta Date: Wed, 5 Jul 2023 16:20:53 -0700 Message-ID: <53936613c2fc11671997383e1f5a5b5878687784.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Simple quotas count extents only from the moment the feature is enabled. Therefore, if we do something like: 1. create subvol S 2. write F in S 3. enable quotas 4. remove F 5. write G in S then after 3. and 4. we would expect the simple quota usage of S to be 0 (putting aside some metadata extents that might be written) and after 5., it should be the size of G plus metadata. Therefore, we need to be able to determine whether a particular quota delta we are processing predates simple quota enablement. To do this, store the transaction id when quotas were enabled. In fs_info for immediate use and in the quota status item to make it recoverable on mount. When we see a delta, check if the generation of the extent item is less than that of quota enablement. If so, we should ignore the delta from this extent. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/accessors.h | 2 ++ fs/btrfs/extent-tree.c | 4 ++++ fs/btrfs/fs.h | 2 ++ fs/btrfs/qgroup.c | 17 +++++++++++++---- fs/btrfs/qgroup.h | 1 + include/uapi/linux/btrfs_tree.h | 7 +++++++ 6 files changed, 29 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h index aab61312e4e8..8d122cc42cb7 100644 --- a/fs/btrfs/accessors.h +++ b/fs/btrfs/accessors.h @@ -971,6 +971,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, rescan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item, + enable_gen, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c8914c66dc83..3cd1d24f1146 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1533,6 +1533,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, .rsv_bytes = href->reserved_bytes, .is_data = true, .is_inc = true, + .generation = trans->transid, }; if (extent_op) @@ -1696,6 +1697,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, .rsv_bytes = 0, .is_data = false, .is_inc = true, + .generation = trans->transid, }; BUG_ON(!extent_op || !extent_op->update_flags); @@ -3231,6 +3233,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, .rsv_bytes = 0, .is_data = is_data, .is_inc = false, + .generation = btrfs_extent_generation(leaf, ei), }; /* In this branch refs == 1 */ @@ -4866,6 +4869,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, struct btrfs_simple_quota_delta delta = { .root = root_objectid, .num_bytes = ins->offset, + .generation = trans->transid, .rsv_bytes = 0, .is_data = true, .is_inc = true, diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h index f76f450c2abf..da7b623ff15f 100644 --- a/fs/btrfs/fs.h +++ b/fs/btrfs/fs.h @@ -802,6 +802,8 @@ struct btrfs_fs_info { spinlock_t eb_leak_lock; struct list_head allocated_ebs; #endif + + u64 quota_enable_gen; }; static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 6714c5aeb4e1..c9bedd2e6451 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -468,8 +468,9 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) btrfs_err(fs_info, "qgroup generation mismatch, marked as inconsistent"); } - fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, - ptr); + if (btrfs_fs_incompat(fs_info, SIMPLE_QUOTA)) + fs_info->quota_enable_gen = btrfs_qgroup_status_enable_gen(l, ptr); + fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr); rescan_progress = btrfs_qgroup_status_rescan(l, ptr); goto next1; } @@ -1115,6 +1116,10 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, ptr = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_qgroup_status_item); btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid); + if (simple) { + btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA); + btrfs_set_qgroup_status_enable_gen(leaf, ptr, trans->transid); + } btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION); fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON; if (!simple) @@ -1210,6 +1215,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, goto out_free_path; } + fs_info->quota_enable_gen = trans->transid; + mutex_unlock(&fs_info->qgroup_ioctl_lock); /* * Commit the transaction while not holding qgroup_ioctl_lock, to avoid @@ -1234,8 +1241,6 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info, spin_lock(&fs_info->qgroup_lock); fs_info->quota_root = quota_root; set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags); - if (simple) - btrfs_set_fs_incompat(fs_info, SIMPLE_QUOTA); spin_unlock(&fs_info->qgroup_lock); /* Skip rescan for simple qgroups */ @@ -4629,6 +4634,10 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info, if (!is_fstree(root)) return 0; + /* If the extent predates enabling quotas, don't count it. */ + if (delta->generation < fs_info->quota_enable_gen) + return 0; + spin_lock(&fs_info->qgroup_lock); qgroup = find_qgroup_rb(fs_info, root); if (!qgroup) { diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index ce6fa8694ca7..ae1ce14b365c 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -241,6 +241,7 @@ struct btrfs_simple_quota_delta { u64 rsv_bytes; /* The number of bytes reserved for this extent */ bool is_inc; /* Whether we are using or freeing the extent */ bool is_data; /* Whether the extent is data or metadata */ + u64 generation; /* The generation the extent was created in */ }; static inline u64 btrfs_qgroup_subvolid(u64 qgroupid) diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 424c7f342712..7797560f0215 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -1230,6 +1230,13 @@ struct btrfs_qgroup_status_item { * of the scan. It contains a logical address */ __le64 rescan; + + /* + * the generation when quotas are enabled. Used by simple quotas to + * avoid decrementing when freeing an extent that was written before + * enable. + */ + __le64 enable_gen; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { From patchwork Wed Jul 5 23:20:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303042 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8051EC001B0 for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232429AbjGEXX6 (ORCPT ); Wed, 5 Jul 2023 19:23:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232473AbjGEXXe (ORCPT ); Wed, 5 Jul 2023 19:23:34 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 066D8E57 for ; Wed, 5 Jul 2023 16:23:33 -0700 (PDT) Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.nyi.internal (Postfix) with ESMTP id 761AB5C0293; Wed, 5 Jul 2023 19:23:32 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Wed, 05 Jul 2023 19:23:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599412; x= 1688685812; bh=mIWHowgCkeL8+II9h8XH1QEsr3JJexoJBAWBlbzwGco=; b=l sCkYKYl7SrW/K+hqbyRL8cuiohL2HYNNP8DY90RKoaS5pprBSVN+3JY8Mmns4J3L ITtv/iKKw7pJ8a32PnhTMR14kptKAiLalVOuolDZaqVgn14891k1lkHIYxV80SrO p0zyCee9IynLhpewF7ufyAXRBdjIRF5mn1T1NAS5SvIqUUmg9aOLw3VP3e+YOq40 YtDYGbhuz/pT2J0USDfIZ+2hKyuquwv+D+vRhR5I40XMM15c++/twEO8JtWRj/FJ 15J6JZaEuJOcSSMaGa1t+rPIvQE5SYWqrMmLavbU5rXwgcen6kcQ6PWYLyafd2cI gokYCNThsfv4b6b0qwQQQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599412; x=1688685812; bh=m IWHowgCkeL8+II9h8XH1QEsr3JJexoJBAWBlbzwGco=; b=ZM3kXVGu3iEnVt3E/ BOXesV9jmhruG8VjjHrsxvnI8tH7i1pC7KudMqDE6E8vI8ue2pnQRQIFyX+5iJCb 6A/3FAEWcttmG6O3RkAk16h2Q0IKWvCBKboo3Qs5LveUTQdqNJ3Es7Bbm0mVwq77 TDcOelm5D/YErAElD5PrZWXV+5JDK7qW1kPnjDml/hdUvM7hQfw0qe+VdgA5Oe4I aA931daJeA3xdGvPPfwVbzMa30cB4nBeCGIW+DoRnzCnPIHqjEg+hmAlqgi1jhLr kKTnXWQezSzcliUPljPYz4LkR3NsERdHWOgWcZ26stm2yaHN07GDpzKn1QLP8zDK Qusvw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepfeenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:31 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 17/18] btrfs: track metadata relocation cow with simple quota Date: Wed, 5 Jul 2023 16:20:54 -0700 Message-ID: <100b6185fc66f98c72cb8ad93092aba193b681e6.1688597211.git.boris@bur.io> X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Relocation cows metadata blocks in two cases for the reloc root: - copying the subvol root item when creating the reloc root - copying a btree node when there is a cow during relocation In both cases, the resulting btree node hits an abnormal code path with respect to the owner field in its btrfs_header. It first creates the root item for the new objectid, which populates the reloc root id, and it at this point that delayed refs are created. Later, it fully copies the old node into the new node (including the original owner field) which overwrites it. This results in a simple quotas mismatch where we run the delayed ref for the reloc root which has no simple quota effect (reloc root is not an fstree) but when we ultimately delete the node, the owner is the real original fstree and we do free the space. To work around this without tampering with the behavior of relocation, add a parameter to btrfs_add_tree_block that lets the relocation code path specify a different owning root than the "operating" root (in this case, owning root is the real root and the operating root is the reloc root). These can naturally be plumbed into delayed refs that have the same concept. Note that this is a double count in some sense, but a relatively natural one, as there are really two extents, and the old one will be deleted soon. This is consistent with how data relocation extents are accounted by simple quotas. Signed-off-by: Boris Burkov Reviewed-by: Josef Bacik --- fs/btrfs/ctree.c | 22 ++++++++++++++-------- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/extent-tree.c | 8 ++++++-- fs/btrfs/extent-tree.h | 3 ++- fs/btrfs/ioctl.c | 2 +- 5 files changed, 25 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index a4cb4b642987..cb0d4535de37 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -316,6 +316,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, int ret = 0; int level; struct btrfs_disk_key disk_key; + u64 reloc_src_root = 0; WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && trans->transid != fs_info->running_transaction->transid); @@ -328,9 +329,11 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, else btrfs_node_key(buf, &disk_key, 0); + if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) + reloc_src_root = btrfs_header_owner(buf); cow = btrfs_alloc_tree_block(trans, root, 0, new_root_objectid, &disk_key, level, buf->start, 0, - BTRFS_NESTING_NEW_ROOT); + BTRFS_NESTING_NEW_ROOT, reloc_src_root); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -522,6 +525,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, int last_ref = 0; int unlock_orig = 0; u64 parent_start = 0; + u64 reloc_src_root = 0; if (*cow_ret == buf) unlock_orig = 1; @@ -540,12 +544,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, else btrfs_node_key(buf, &disk_key, 0); - if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent) - parent_start = parent->start; - + if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) { + if (parent) + parent_start = parent->start; + reloc_src_root = btrfs_header_owner(buf); + } cow = btrfs_alloc_tree_block(trans, root, parent_start, root->root_key.objectid, &disk_key, level, - search_start, empty_size, nest); + search_start, empty_size, nest, reloc_src_root); if (IS_ERR(cow)) return PTR_ERR(cow); @@ -2956,7 +2962,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, c = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, &lower_key, level, root->node->start, 0, - BTRFS_NESTING_NEW_ROOT); + BTRFS_NESTING_NEW_ROOT, 0); if (IS_ERR(c)) return PTR_ERR(c); @@ -3100,7 +3106,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans, split = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, &disk_key, level, c->start, 0, - BTRFS_NESTING_SPLIT); + BTRFS_NESTING_SPLIT, 0); if (IS_ERR(split)) return PTR_ERR(split); @@ -3853,7 +3859,7 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans, right = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid, &disk_key, 0, l->start, 0, num_doubles ? BTRFS_NESTING_NEW_ROOT : - BTRFS_NESTING_SPLIT); + BTRFS_NESTING_SPLIT, 0); if (IS_ERR(right)) return PTR_ERR(right); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 7513388b0567..29f11d91bfb6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -862,7 +862,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, root->root_key.offset = 0; leaf = btrfs_alloc_tree_block(trans, root, 0, objectid, NULL, 0, 0, 0, - BTRFS_NESTING_NORMAL); + BTRFS_NESTING_NORMAL, 0); if (IS_ERR(leaf)) { ret = PTR_ERR(leaf); leaf = NULL; @@ -939,7 +939,7 @@ int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, */ leaf = btrfs_alloc_tree_block(trans, root, 0, BTRFS_TREE_LOG_OBJECTID, - NULL, 0, 0, 0, BTRFS_NESTING_NORMAL); + NULL, 0, 0, 0, BTRFS_NESTING_NORMAL, 0); if (IS_ERR(leaf)) return PTR_ERR(leaf); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3cd1d24f1146..99845a54e168 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5005,7 +5005,8 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, const struct btrfs_disk_key *key, int level, u64 hint, u64 empty_size, - enum btrfs_lock_nesting nest) + enum btrfs_lock_nesting nest, + u64 reloc_src_root) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key ins; @@ -5017,6 +5018,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, int ret; u32 blocksize = fs_info->nodesize; bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA); + u64 owning_root; #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS if (btrfs_is_testing(fs_info)) { @@ -5043,11 +5045,13 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, ret = PTR_ERR(buf); goto out_free_reserved; } + owning_root = btrfs_header_owner(buf); if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) { if (parent == 0) parent = ins.objectid; flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; + owning_root = reloc_src_root; } else BUG_ON(parent > 0); @@ -5067,7 +5071,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, extent_op->level = level; btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, - ins.objectid, ins.offset, parent, btrfs_header_owner(buf)); + ins.objectid, ins.offset, parent, owning_root); btrfs_init_tree_ref(&generic_ref, level, root_objectid, root->root_key.objectid, false); btrfs_ref_tree_mod(fs_info, &generic_ref); diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index 9c8fd96cbee7..8c6beb524d26 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -121,7 +121,8 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, const struct btrfs_disk_key *key, int level, u64 hint, u64 empty_size, - enum btrfs_lock_nesting nest); + enum btrfs_lock_nesting nest, + u64 reloc_src_root); void btrfs_free_tree_block(struct btrfs_trans_handle *trans, u64 root_id, struct extent_buffer *buf, diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 23f5142ef18b..c8caab65a46b 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -657,7 +657,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, goto out; leaf = btrfs_alloc_tree_block(trans, root, 0, objectid, NULL, 0, 0, 0, - BTRFS_NESTING_NORMAL); + BTRFS_NESTING_NORMAL, 0); if (IS_ERR(leaf)) { ret = PTR_ERR(leaf); goto out; From patchwork Wed Jul 5 23:20:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 13303044 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A586DC001E0 for ; Wed, 5 Jul 2023 23:24:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232434AbjGEXX6 (ORCPT ); Wed, 5 Jul 2023 19:23:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232481AbjGEXXf (ORCPT ); Wed, 5 Jul 2023 19:23:35 -0400 Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F54CE57 for ; Wed, 5 Jul 2023 16:23:34 -0700 (PDT) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 197495C0290; Wed, 5 Jul 2023 19:23:34 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 05 Jul 2023 19:23:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=cc :content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1688599414; x= 1688685814; bh=a31zqO//fC/H2T4qCHwyOHDOKM65sKV0Mw8O7zaO0dM=; b=A 4Un0ME8zJgrVZ0ITHZ0MH/cNk8Aj3iFc8EKW05XKrKPXrdS8reGoCKKXLDG/iq2s PmRlwqTDTnNHK1CAQfm53N5JwIprRrx+O33QyYo3OcjBtdcGSamQt9LaMA1D+NwK 1dB2otxLOA5aPMY7XEsHyUaQAKjTX1Pw21IER3Hdzh/Texm44X45v1oJuXha83Q0 RcSDzwhiOwhOYyJ72Wm3nMhPYXKM/z+tPVU2IXzGenl/QWwqqqu/C67cfYCdaj1M sQMIVrLUgkRzw0VUkfGMxl8GgQItHvRfQdjxu59SKgRdcM7xyDuhQ558rnz4Tiji +ZqWUcRfBqD0PPtoMSouA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:date:feedback-id:feedback-id:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm2; t=1688599414; x=1688685814; bh=a 31zqO//fC/H2T4qCHwyOHDOKM65sKV0Mw8O7zaO0dM=; b=HkUJAZjz4xx/kpdna BgVz8PiNNFXjUr5j0A/LH8JpQX0bTLVkLoSH9xmO5Po2G66C1j56iIq/lTPaCKMK f7o9eZnQ/0mC4JsTF9M0q158lKMvLoFm0usQiK2U/l2cXxjjgS11A3FOgdOOyZgV 0iQWRaDj9rkhukT9Qo2eYQD85MFALk/llR9b1nB4Patk8AMdqvW9b+c9YJdIgneu kOa+3a5Xd1UFCHFGo3E/HaNglCEAGZ0Ks8TRySwV4/tA/8ECTPM73PNMK/GwEwEJ ovKw7lfGxZIX64Um+bMcIDW9o4ZLRA1/l1kUEGDEK6txbT+tVn8xHyaydw1AO4C3 dYz2A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudekgddvudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtkeertd ertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhrrdhi oheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejffevvd ehtddufeeihfekgeeuheelnecuvehluhhsthgvrhfuihiivgepfeenucfrrghrrghmpehm rghilhhfrhhomhepsghorhhishessghurhdrihho X-ME-Proxy: Feedback-ID: i083147f8:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 Jul 2023 19:23:33 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH 18/18] btrfs: track data relocation with simple quota Date: Wed, 5 Jul 2023 16:20:55 -0700 Message-ID: X-Mailer: git-send-email 2.41.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Relocation data allocations are quite tricky for simple quotas. The basic data relocation sequence is (ignoring details that aren't relevant to this fix): - create a fake relocation data fs root - create a fake relocation inode in that root - foreach data extent: - preallocate a data extent on behalf of the fake inode - copy over the data - foreach extent - swap the refs so that the original file extent now refers to the new extent item - drop the fake root, dropping its refs on the old extents, which lets us delete them. Done naively, this results in storing an extent item in the extent tree whose owner_ref points at the relocation data root and a no-op squota recording, since the reloc root is not a legit fstree. So far, that's OK. The problem comes when you do the swap, and leave an extent item owned by this bogus root as the real permanent extents of the file. If the file then drops that ref, we free it and no-op account that against the fake relocation root. Essentially, this means that relocation is simple quota "extent laundering", since we re-own the extents into a fake root. Simple quotas very intentionally doesn't have a mechanism for transferring ownership of extents, as that is exactly the complicated thing we are trying to avoid with the new design. Further, it cannot be correctly done in this case, since at the time you create the new "real" refs, there is no way to know which was the original owner before relocation unless we track it. Therefore, it makes more sense to trick the preallocation to handle relocation as a special case and note the proper owner ref from the beginning. That way, we never write out an extent item without the correct owner ref that it will eventually have. This could be done by wiring a special root parameter all the way through the allocation code path, but to avoid that special case touching all the code, take advantage of the serial nature of relocation to store the src root on the relocation root object. Then when we finish the prealloc, if it happens to be this case, prepare the delayed ref appropriately. This is obviously a smelly bit of code, but I think it is the best solution to the problem, given the relocation implementation. Signed-off-by: Boris Burkov --- fs/btrfs/ctree.h | 1 + fs/btrfs/extent-tree.c | 13 +++++++------ fs/btrfs/relocation.c | 15 +++++++++++++++ 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index f2d2b313bde5..577186994188 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -333,6 +333,7 @@ struct btrfs_root { #ifdef CONFIG_BTRFS_DEBUG struct list_head leak_list; #endif + u64 relocation_src_root; }; static inline bool btrfs_root_readonly(const struct btrfs_root *root) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 99845a54e168..10e026d5b684 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -57,7 +57,7 @@ static void __run_delayed_extent_op(struct btrfs_delayed_extent_op *extent_op, static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, u64 parent, u64 root_objectid, u64 flags, u64 owner, u64 offset, - struct btrfs_key *ins, int ref_mod); + struct btrfs_key *ins, int ref_mod, u64 oref_root); static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_node *node, struct btrfs_delayed_extent_op *extent_op); @@ -1541,7 +1541,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, ret = alloc_reserved_file_extent(trans, parent, ref_root, flags, ref->objectid, ref->offset, &ins, - node->ref_mod); + node->ref_mod, href->owning_root); if (!ret) ret = btrfs_record_simple_quota_delta(trans->fs_info, &delta); } else if (node->action == BTRFS_ADD_DELAYED_REF) { @@ -4683,7 +4683,7 @@ static int alloc_reserved_extent(struct btrfs_trans_handle *trans, u64 bytenr, static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, u64 parent, u64 root_objectid, u64 flags, u64 owner, u64 offset, - struct btrfs_key *ins, int ref_mod) + struct btrfs_key *ins, int ref_mod, u64 oref_root) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_root *extent_root; @@ -4731,7 +4731,7 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans, if (simple_quota) { btrfs_set_extent_inline_ref_type(leaf, iref, BTRFS_EXTENT_OWNER_REF_KEY); oref = (struct btrfs_extent_owner_ref *)(&iref->offset); - btrfs_set_extent_owner_ref_root_id(leaf, oref, root_objectid); + btrfs_set_extent_owner_ref_root_id(leaf, oref, oref_root); iref = (struct btrfs_extent_inline_ref *)(oref + 1); } btrfs_set_extent_inline_ref_type(leaf, iref, type); @@ -4842,7 +4842,8 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans, BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID); - BUG_ON(root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID); + if (btrfs_is_data_reloc_root(root) && is_fstree(root->relocation_src_root)) + owning_root = root->relocation_src_root; btrfs_init_generic_ref(&generic_ref, BTRFS_ADD_DELAYED_EXTENT, ins->objectid, ins->offset, 0, owning_root); @@ -4899,7 +4900,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, spin_unlock(&space_info->lock); ret = alloc_reserved_file_extent(trans, 0, root_objectid, 0, owner, - offset, ins, 1); + offset, ins, 1, root_objectid); if (ret) btrfs_pin_extent(trans, ins->objectid, ins->offset, 1); ret = btrfs_record_simple_quota_delta(fs_info, &delta); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 119f670538f7..e12377c818c0 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3665,6 +3665,21 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) struct btrfs_extent_item); flags = btrfs_extent_flags(path->nodes[0], ei); + /* + * If we are relocating a simple quota owned extent item, we need + * to note the owner on the reloc data root so that when we + * allocate the replacement item, we can attribute it to the + * correct eventual owner (rather than the reloc data root) + */ + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) { + struct btrfs_root *root = BTRFS_I(rc->data_inode)->root; + u64 owning_root_id = btrfs_get_extent_owner_root(fs_info, + path->nodes[0], + path->slots[0]); + + root->relocation_src_root = owning_root_id; + } + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { ret = add_tree_block(rc, &key, path, &blocks); } else if (rc->stage == UPDATE_DATA_PTRS &&