From patchwork Thu Dec 12 23:03:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 13906087 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 51C011DE3DB for ; Thu, 12 Dec 2024 23:03:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044639; cv=none; b=MFJz79Q57LXfMadRp2I7KXsbbLtsCwgnqex23W/PsCAe5OrCQ7gMZfa2tHb3zDpnfD2LwjIJQNED9Au6wkSSyNGaW7bBEW/dRtOTArxmy6ZvnUseBZWASx4NAThOCEQe7R3jLCTOX1s/dJBJ8I+13+vLVHEM6oDxRwGdAL0KjUk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044639; c=relaxed/simple; bh=fU88zLcq6ecPFqxhMBQ+2G9JGuR1hEfsB9CdeCQqDmY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=bdx/LvNkVoG+7R3bXlVv9U4SG+24R1n52tsityJ96YFA7WkZ8h/b1gARamnmDENwxyJAJsXIxCX4asyugEV91XYQH2yy0WH8msYtlWmSLq0zfb3tWQCjb3FJg3YFRnJmoeDpAC90YicP8wxzM7gUHr+CG54xCdgoTQRtI3BvI3A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NXq+k5PR; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NXq+k5PR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5EB2AC4CED0; Thu, 12 Dec 2024 23:03:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734044638; bh=fU88zLcq6ecPFqxhMBQ+2G9JGuR1hEfsB9CdeCQqDmY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=NXq+k5PRTDf55jisKjLCSM6R+CtzOKUHkdXFm+S5vkwcJcUD3UJFIIrxFm1e/hSZ7 
/bmsNX4c2xg3jeHwM9JWITQ9/E2QBjoLscHqu12Fex2ERk5uPPd7zepgvaOfUy5X4o ZdUL7sVfcMVVz+Z8FEYAZ5doaRj79faXGM2d20Vnq2fixdAK5c1d9qv/eve5cJJv2/ okrsCqes/kjUpsAQIsrOHLOXYt9GoJGC6f9tV0LpW0tXqis2O3mIWcRVGxZU8D8xT8 z0oSde7nbmMSbQTp5RsOg634kVtE2UAZ1+JqI0Wg7afUXFArjHvO1jBSfbpZ+OBPPl COqChwElUWAlg== From: Christian Brauner Date: Fri, 13 Dec 2024 00:03:40 +0100 Subject: [PATCH v3 01/10] mount: remove linux/nospec.h include Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241213-work-mount-rbtree-lockless-v3-1-6e3cdaf9b280@kernel.org> References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> To: Josef Bacik , Jeff Layton Cc: "Paul E. McKenney" , Peter Zijlstra , linux-fsdevel@vger.kernel.org, Christian Brauner X-Mailer: b4 0.15-dev-355e8 X-Developer-Signature: v=1; a=openpgp-sha256; l=545; i=brauner@kernel.org; h=from:subject:message-id; bh=fU88zLcq6ecPFqxhMBQ+2G9JGuR1hEfsB9CdeCQqDmY=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRHZ9+yM4+/skvLTNX28aJYw9qjGnvMOvJnr72verOKw 8k9lEuho5SFQYyLQVZMkcWh3SRcbjlPxWajTA2YOaxMIEMYuDgFYCJ/9zH895/4qaKcZ1OafIQ3 x8ft8R7f7ddI+3w+9HlnvKHpPwftmQz/oxvtfji+ZfkewDLF3FJd+1ayxNXn7LMiE5aVty5ib2N lAQA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 It's not needed, so remove it. 
Signed-off-by: Christian Brauner --- fs/namespace.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/namespace.c b/fs/namespace.c index 23e81c2a1e3fee7d97df2a84a69438a677933654..c3dbe6a7ab6b1c77c2693cc75941da89fa921048 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -32,7 +32,6 @@ #include #include #include -#include #include "pnode.h" #include "internal.h" From patchwork Thu Dec 12 23:03:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 13906088 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 539881B07AE for ; Thu, 12 Dec 2024 23:04:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044641; cv=none; b=VpCSYJiB0wYMJlQqTyHfCMzMkqxKfLnuoON6RskQktjWpgKgIjB8AS3cP++EagfiUk0HbZZLYs15LlUjkgKoY5+FIZiMC9zwFLLEX6Vfht8fzZ23cmOysJoqqBxUahv+oIoaqu5AIQX7wI5z/jkSo5tAChAPVXgbc9dKLz6DqSM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044641; c=relaxed/simple; bh=jkMKbLRTLoaucwliP/7pN/v0DVUIV8X0yAaVu+aF01E=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=reOJ8FdyXkeKa5xxRS6SGdzrTRC22PQzx5Ol3AUfJNVx3b01dJYXo+Rd4CLJ0yGfY+qYWFcCnO00fFi9W+4ECsZMWJzBqteD8NGJrUpUNhn2JGg3JYxdbVgWr8+KlDKarkgzWA+rYKpy+gX3mijtdCSEBvZgFHMvSDKE3lCG4oQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=tZj4/fwJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tZj4/fwJ" Received: by smtp.kernel.org 
(Postfix) with ESMTPSA id 5F20FC4CED3; Thu, 12 Dec 2024 23:03:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734044640; bh=jkMKbLRTLoaucwliP/7pN/v0DVUIV8X0yAaVu+aF01E=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=tZj4/fwJB4iitJHdgFsr98ZbQGPViKur9uHXzLCAxA/yzTfryaoHPhQ2RWJo9Q2zQ SxeM4QD4QgGUZYkwJIP4jao5ocB04QrfM1VicZSYlEbSqaq1YgCwa54Wjhv+35cYjk p/mkKf18Gy4wvbuOEKOtskBc2S/J8hEW0QEqSdA4UDUZ7Mg3sLRRfmWq7MzMc54a1E S/rMqPf91ZACzbl6of2l3Jw7x7YqqMuKZ/ne4chOTZoWpp0zVz6kNyJgY4oJnpkl7Z 0pohXEtgN3YvwGBCzrhIP/FfewaCllVsxRKeH5LqfGLlZtzjlGuyA1CUL7zbjKV54s 0lBHWBJGesnXQ== From: Christian Brauner Date: Fri, 13 Dec 2024 00:03:41 +0100 Subject: [PATCH v3 02/10] fs: add mount namespace to rbtree late Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241213-work-mount-rbtree-lockless-v3-2-6e3cdaf9b280@kernel.org> References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> To: Josef Bacik , Jeff Layton Cc: "Paul E. McKenney" , Peter Zijlstra , linux-fsdevel@vger.kernel.org, Christian Brauner X-Mailer: b4 0.15-dev-355e8 X-Developer-Signature: v=1; a=openpgp-sha256; l=964; i=brauner@kernel.org; h=from:subject:message-id; bh=jkMKbLRTLoaucwliP/7pN/v0DVUIV8X0yAaVu+aF01E=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRHZ9+yMug5WmoyJTWjPMsiwTuwtfDzFXM/rS3FXscbV +22dY3uKGVhEONikBVTZHFoNwmXW85TsdkoUwNmDisTyBAGLk4BmAjLVIZ/NnU9XhoWP7fW5Qqe XuncO/Phxw5R/W97c0y1e9pmmO06wsjQWZMkov5Zhztuk6ar7s+8+psnFn5+q2H75Zx3YErVIxU mAA== X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 There's no point doing that under the namespace semaphore. It just gives the false impression that the semaphore protects the mount namespace rbtree, which it simply doesn't. 
Signed-off-by: Christian Brauner --- fs/namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/namespace.c b/fs/namespace.c index c3dbe6a7ab6b1c77c2693cc75941da89fa921048..10fa18dd66018fadfdc9d18c59a851eed7bd55ad 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3983,7 +3983,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, while (p->mnt.mnt_root != q->mnt.mnt_root) p = next_mnt(skip_mnt_tree(p), old); } - mnt_ns_tree_add(new_ns); namespace_unlock(); if (rootmnt) @@ -3991,6 +3990,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, if (pwdmnt) mntput(pwdmnt); + mnt_ns_tree_add(new_ns); return new_ns; } From patchwork Thu Dec 12 23:03:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 13906089 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E7141DE3DB for ; Thu, 12 Dec 2024 23:04:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044643; cv=none; b=TpimSM3TEl8SKKD05vMrrH1Ay9zsmXDrqgMxN1wsa0nC3yrY+ccX3v9qEsmP/+syD+dHD/KkY+gKM6u302048r8/5nX9EdvehqrPCSi/27G0EHryzGk4xG9WZnc3XlnLgKmIFi/z3DmqL0QiqGTBsBbJ+gXCd9Yt6N8E+8bFCy0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044643; c=relaxed/simple; bh=820WYTCAXjCPqP2AExAcrX3rT4vmFJWlBlgB1cdafIc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=cJFAwZ4naKqDrWcfxICtftSLlb1kdhzcFuL7pMl08RdVKYuofyoB0eUUX0ufGpNbdrXvLjJ2P/rRcVcfmKMIBloomidsWn3tDfZ00SvXHR7ccj8m0/HNWCBdxnS4d+tXbhO4bmw/q58nbaHfLMZGpBQJSLVeYekwz+lUag8R05A= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=md4v1rzB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="md4v1rzB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 69704C4CECE; Thu, 12 Dec 2024 23:04:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734044642; bh=820WYTCAXjCPqP2AExAcrX3rT4vmFJWlBlgB1cdafIc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=md4v1rzBL0QycBwoKWnNUjvPzz1Qng9RpmW8x2Yn/FGQPft3XBzCCr4R+ZPEbr+Rh W8kCpCwbIZR/99j3+xMeyks/azy65AKZdNbg4Swxm9goXHX+sXMDnu7LrVqj88wEZF pm9s5L34bfT0eo/mqridWp4oRvGTC17pK5B4oduyY0gQkC1B5J/0WdlcS9KwSlSw2B V0/W5DQIAuKgRff5DcWixzr9SjJXhy+Q2lCvyvyMERgyRsAb3GSlGt4jUHtd3xClF0 bBTRge6Y7yn9nHNZQryfPe/eYBaNIsCgZcQNw5ywu4XcOgtC7lFFDfB//Knco4bpMI tp3PH5MkPsNdw== From: Christian Brauner Date: Fri, 13 Dec 2024 00:03:42 +0100 Subject: [PATCH v3 03/10] fs: lockless mntns rbtree lookup Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241213-work-mount-rbtree-lockless-v3-3-6e3cdaf9b280@kernel.org> References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> To: Josef Bacik , Jeff Layton Cc: "Paul E. 
McKenney" , Peter Zijlstra , linux-fsdevel@vger.kernel.org, Christian Brauner X-Mailer: b4 0.15-dev-355e8 X-Developer-Signature: v=1; a=openpgp-sha256; l=6711; i=brauner@kernel.org; h=from:subject:message-id; bh=820WYTCAXjCPqP2AExAcrX3rT4vmFJWlBlgB1cdafIc=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRHZ9+6FntC+6aQJvurQgX2oDz5SzK/9u61Msj87K1dt 8d33mP7jlIWBjEuBlkxRRaHdpNwueU8FZuNMjVg5rAygQxh4OIUgIncnMLwVyxvqc+BZs3u61Z5 i0t2Mlplainud+quOanLK16px+C0geEPT/3eYx4v51W47RDhmRur8LnF/9ar8Aer6/wb1madmHy VFQA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Currently we use a read-write lock, but for the simple search case we can make this lockless. Creating a new mount namespace is a rather rare event compared with querying mounts in a foreign mount namespace. Once this is picked up by, e.g., systemd to list mounts in another mount namespace in its isolated services or in containers, this will be used a lot, so this seems worthwhile doing. Signed-off-by: Christian Brauner --- fs/mount.h | 5 ++- fs/namespace.c | 119 +++++++++++++++++++++++++++++++++++---------------------- 2 files changed, 77 insertions(+), 47 deletions(-) diff --git a/fs/mount.h b/fs/mount.h index 185fc56afc13338f8185fe818051444d540cbd5b..36ead0e45e8aa7614c00001102563a711d9dae6e 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -12,7 +12,10 @@ struct mnt_namespace { struct user_namespace *user_ns; struct ucounts *ucounts; u64 seq; /* Sequence number to prevent loops */ - wait_queue_head_t poll; + union { + wait_queue_head_t poll; + struct rcu_head mnt_ns_rcu; + }; u64 event; unsigned int nr_mounts; /* # of mounts in the namespace */ unsigned int pending_mounts; diff --git a/fs/namespace.c b/fs/namespace.c index 10fa18dd66018fadfdc9d18c59a851eed7bd55ad..52adee787eb1b6ee8831705b2b121854c3370fb3 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -79,6 +79,8 @@ static DECLARE_RWSEM(namespace_sem); static HLIST_HEAD(unmounted); /* protected by namespace_sem */ static LIST_HEAD(ex_mountpoints); /* 
protected by namespace_sem */ static DEFINE_RWLOCK(mnt_ns_tree_lock); +static seqcount_rwlock_t mnt_ns_tree_seqcount = SEQCNT_RWLOCK_ZERO(mnt_ns_tree_seqcount, &mnt_ns_tree_lock); + static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */ struct mount_kattr { @@ -105,17 +107,6 @@ EXPORT_SYMBOL_GPL(fs_kobj); */ __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock); -static int mnt_ns_cmp(u64 seq, const struct mnt_namespace *ns) -{ - u64 seq_b = ns->seq; - - if (seq < seq_b) - return -1; - if (seq > seq_b) - return 1; - return 0; -} - static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node) { if (!node) @@ -123,19 +114,41 @@ static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node) return rb_entry(node, struct mnt_namespace, mnt_ns_tree_node); } -static bool mnt_ns_less(struct rb_node *a, const struct rb_node *b) +static int mnt_ns_cmp(struct rb_node *a, const struct rb_node *b) { struct mnt_namespace *ns_a = node_to_mnt_ns(a); struct mnt_namespace *ns_b = node_to_mnt_ns(b); u64 seq_a = ns_a->seq; + u64 seq_b = ns_b->seq; + + if (seq_a < seq_b) + return -1; + if (seq_a > seq_b) + return 1; + return 0; +} - return mnt_ns_cmp(seq_a, ns_b) < 0; +static inline void mnt_ns_tree_write_lock(void) +{ + write_lock(&mnt_ns_tree_lock); + write_seqcount_begin(&mnt_ns_tree_seqcount); +} + +static inline void mnt_ns_tree_write_unlock(void) +{ + write_seqcount_end(&mnt_ns_tree_seqcount); + write_unlock(&mnt_ns_tree_lock); } static void mnt_ns_tree_add(struct mnt_namespace *ns) { - guard(write_lock)(&mnt_ns_tree_lock); - rb_add(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_less); + struct rb_node *node; + + mnt_ns_tree_write_lock(); + node = rb_find_add_rcu(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_cmp); + mnt_ns_tree_write_unlock(); + + WARN_ON_ONCE(node); } static void mnt_ns_release(struct mnt_namespace *ns) @@ -150,41 +163,36 @@ static void mnt_ns_release(struct mnt_namespace *ns) } DEFINE_FREE(mnt_ns_release, 
struct mnt_namespace *, if (_T) mnt_ns_release(_T)) +static void mnt_ns_release_rcu(struct rcu_head *rcu) +{ + struct mnt_namespace *mnt_ns; + + mnt_ns = container_of(rcu, struct mnt_namespace, mnt_ns_rcu); + mnt_ns_release(mnt_ns); +} + static void mnt_ns_tree_remove(struct mnt_namespace *ns) { /* remove from global mount namespace list */ if (!is_anon_ns(ns)) { - guard(write_lock)(&mnt_ns_tree_lock); + mnt_ns_tree_write_lock(); rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree); + mnt_ns_tree_write_unlock(); } - mnt_ns_release(ns); + call_rcu(&ns->mnt_ns_rcu, mnt_ns_release_rcu); } -/* - * Returns the mount namespace which either has the specified id, or has the - * next smallest id afer the specified one. - */ -static struct mnt_namespace *mnt_ns_find_id_at(u64 mnt_ns_id) +static int mnt_ns_find(const void *key, const struct rb_node *node) { - struct rb_node *node = mnt_ns_tree.rb_node; - struct mnt_namespace *ret = NULL; - - lockdep_assert_held(&mnt_ns_tree_lock); - - while (node) { - struct mnt_namespace *n = node_to_mnt_ns(node); + const u64 mnt_ns_id = *(u64 *)key; + const struct mnt_namespace *ns = node_to_mnt_ns(node); - if (mnt_ns_id <= n->seq) { - ret = node_to_mnt_ns(node); - if (mnt_ns_id == n->seq) - break; - node = node->rb_left; - } else { - node = node->rb_right; - } - } - return ret; + if (mnt_ns_id < ns->seq) + return -1; + if (mnt_ns_id > ns->seq) + return 1; + return 0; } /* @@ -194,18 +202,37 @@ static struct mnt_namespace *mnt_ns_find_id_at(u64 mnt_ns_id) * namespace the @namespace_sem must first be acquired. If the namespace has * already shut down before acquiring @namespace_sem, {list,stat}mount() will * see that the mount rbtree of the namespace is empty. + * + * Note the lookup is lockless protected by a sequence counter. We only + * need to guard against false negatives as false positives aren't + * possible. So if we didn't find a mount namespace and the sequence + * counter has changed we need to retry. 
If the sequence counter is + * still the same we know the search actually failed. */ static struct mnt_namespace *lookup_mnt_ns(u64 mnt_ns_id) { - struct mnt_namespace *ns; + struct mnt_namespace *ns; + struct rb_node *node; + unsigned int seq; + + guard(rcu)(); + do { + seq = read_seqcount_begin(&mnt_ns_tree_seqcount); + node = rb_find_rcu(&mnt_ns_id, &mnt_ns_tree, mnt_ns_find); + if (node) + break; + } while (read_seqcount_retry(&mnt_ns_tree_seqcount, seq)); - guard(read_lock)(&mnt_ns_tree_lock); - ns = mnt_ns_find_id_at(mnt_ns_id); - if (!ns || ns->seq != mnt_ns_id) - return NULL; + if (!node) + return NULL; - refcount_inc(&ns->passive); - return ns; + /* + * The last reference count is put with RCU delay so we can + * unconditionally acquire a reference here. + */ + ns = node_to_mnt_ns(node); + refcount_inc(&ns->passive); + return ns; } static inline void lock_mount_hash(void) From patchwork Thu Dec 12 23:03:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 13906090 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2DB491B07AE for ; Thu, 12 Dec 2024 23:04:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044645; cv=none; b=m6/K4/fCYru+1Ttyk5l/h71yySDJcELN5wtSMAOJ4Ze+AQGsR3kdo/5khH1O+5vQLm10rBL5/mO9RCqiNx/ufUVZ160Q021L/IT+VhABsCTDT6Yuinn/YKuWLQMWL5QHypq+4GlzzwnX+l/rii4M/b8nvlKNmdUQZ9gqzYDwTTs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044645; c=relaxed/simple; bh=4VJSAxr2Q4F8FPOlzf8qOeMrTUOemf6hhrS7fHfl8Cs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: 
In-Reply-To:To:Cc; b=Xp2RE8y8KA6sVTfkbLyAhLOb2suDpGCOflvRCduvBoQPcLiAJ1SPgi2IIECnRMko/5AcBLWIaq19OnfZEFPI3LF0hcX87shZ8a4T77dr16rky73zqp04ZKTtK7nN5N+SSxBVEoO12TmnEnE6rUJNBt4ESVzoV9sYkWrLSeKcW2E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Hp8+eV8X; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Hp8+eV8X" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 46BDDC4CEDD; Thu, 12 Dec 2024 23:04:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734044644; bh=4VJSAxr2Q4F8FPOlzf8qOeMrTUOemf6hhrS7fHfl8Cs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Hp8+eV8X76rZrlneVWn8hxyhqVo5frzFQIREqoXyaAqG86qPCtwyUdBy0MiL7EOWo moZsq2tBaxZXNcxpwr3oDaYhQ1WUIg7DIDx+9K7vfDB7K8wjuDG+GL103bbNPdCogX MmsxUUAwzBsILN9sSzymKuthcvrf94Z28fHNGOZPDNAR8AEqMT0h5VWpS7XApQo1LT izxbtmfTiEROiDIFri4kwqbiRBCTEJybIQ0aLLzhX6QcDEckuXJWpqmy7wbwFGiTaI +hvc3QfGGjD2rxkl+NI2zns0MAlK6Em0q6Tt/DfK8zEJ557ZexbU9e3tPZ18pfzui4 RPnMMcC8OZW9Q== From: Christian Brauner Date: Fri, 13 Dec 2024 00:03:43 +0100 Subject: [PATCH v3 04/10] rculist: add list_bidir_{del,prev}_rcu() Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241213-work-mount-rbtree-lockless-v3-4-6e3cdaf9b280@kernel.org> References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> To: Josef Bacik , Jeff Layton Cc: "Paul E. 
McKenney" , Peter Zijlstra , linux-fsdevel@vger.kernel.org, Christian Brauner X-Mailer: b4 0.15-dev-355e8 X-Developer-Signature: v=1; a=openpgp-sha256; l=3310; i=brauner@kernel.org; h=from:subject:message-id; bh=4VJSAxr2Q4F8FPOlzf8qOeMrTUOemf6hhrS7fHfl8Cs=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRHZ9+S0YwK0g6PXiRx6KYQj9/6vHs7jf0+1eglLAv1W Tb/rXdnRykLgxgXg6yYIotDu0m43HKeis1GmRowc1iZQIYwcHEKwESWpDD8r3hib5Wg7sQhfJUt +Ntd3i9fK3e4PpSZluWdLs5WvrR7LSPD0zNrRdfPb17/dP1NnX+BmeqBOTpN9crn9RhmHjl4b+Y WJgA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Currently there is no primitive for retrieving the previous list member. To do this we need a new deletion primitive that doesn't poison the prev pointer and a corresponding retrieval helper. Note that it is not valid to use both list_del_rcu() and list_bidir_del_rcu() on the same list. Suggested-by: "Paul E. McKenney" Signed-off-by: Christian Brauner Reviewed-by: Paul E. McKenney --- include/linux/rculist.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 14dfa6008467e803d57f98cfa0275569f1c6a181..270a9ee2f7976b1736545667973265a3bfb7ec41 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -30,6 +30,17 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list) * way, we must not access it directly */ #define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next))) +/* + * Return the ->prev pointer of a list_head in an rcu safe way. Don't + * access it directly. + * + * Any list traversed with list_bidir_prev_rcu() must never use + * list_del_rcu(). Doing so will poison the ->prev pointer that + * list_bidir_prev_rcu() relies on, which will result in segfaults. + * To prevent these segfaults, use list_bidir_del_rcu() instead + * of list_del_rcu(). 
+ */ +#define list_bidir_prev_rcu(list) (*((struct list_head __rcu **)(&(list)->prev))) /** * list_tail_rcu - returns the prev pointer of the head of the list @@ -158,6 +169,42 @@ static inline void list_del_rcu(struct list_head *entry) entry->prev = LIST_POISON2; } +/** + * list_bidir_del_rcu - deletes entry from list without re-initialization + * @entry: the element to delete from the list. + * + * In contrast to list_del_rcu(), this doesn't poison the prev pointer, thus + * allowing backwards traversal via list_bidir_prev_rcu(). + * + * Note: list_empty() on entry does not return true after this because + * the entry is in a special undefined state that permits RCU-based + * lockfree reverse traversal. In particular this means that we can not + * poison the forward and backwards pointers that may still be used for + * walking the list. + * + * The caller must take whatever precautions are necessary (such as + * holding appropriate locks) to avoid racing with another list-mutation + * primitive, such as list_bidir_del_rcu() or list_add_rcu(), running on + * this same list. However, it is perfectly legal to run concurrently + * with the _rcu list-traversal primitives, such as + * list_for_each_entry_rcu(). + * + * Note that it is not allowed to use list_del_rcu() and + * list_bidir_del_rcu() on the same list. + * + * Note that list_del_rcu() and list_bidir_del_rcu() must not be used on + * the same list. + * + * Note that the caller is not permitted to immediately free + * the newly deleted entry. Instead, either synchronize_rcu() + * or call_rcu() must be used to defer freeing until an RCU + * grace period has elapsed. + */ +static inline void list_bidir_del_rcu(struct list_head *entry) +{ + __list_del_entry(entry); +} + /** * hlist_del_init_rcu - deletes entry from hash list with re-initialization * @n: the element to delete from the hash list. 
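The difference between the two deletion primitives in the patch above can be illustrated with a small userspace model. This is only a sketch under loud assumptions: it is single-threaded, has no RCU and no memory barriers, and struct node, POISON_PREV, model_del_rcu(), model_bidir_del() and prev_id_after_delete() are hypothetical stand-ins invented for illustration, not kernel API. It only shows why a reader still standing on a deleted entry can step backwards after a list_bidir_del_rcu()-style unlink but not after a list_del_rcu()-style unlink.

```c
/*
 * Userspace model only: single-threaded, no RCU, no barriers.
 * All names below are hypothetical stand-ins for the kernel helpers.
 */
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
	int id;
};

/* Stand-in for LIST_POISON2: a recognizable "do not follow" value. */
static struct node poison_prev_obj;
#define POISON_PREV (&poison_prev_obj)

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

/* Insert @entry directly after @pos in a circular doubly linked list. */
static void list_add_after(struct node *entry, struct node *pos)
{
	entry->next = pos->next;
	entry->prev = pos;
	pos->next->prev = entry;
	pos->next = entry;
}

/* Neighbours skip @entry; @entry's own pointers are left untouched. */
static void unlink_entry(struct node *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Models list_del_rcu(): unlink, then poison ->prev. */
static void model_del_rcu(struct node *entry)
{
	unlink_entry(entry);
	entry->prev = POISON_PREV;
}

/* Models list_bidir_del_rcu(): unlink only, ->prev stays usable. */
static void model_bidir_del(struct node *entry)
{
	unlink_entry(entry);
}

/*
 * Build head <-> a <-> b <-> c, delete b, then act as a reader that
 * still holds a pointer to b and wants to walk backwards.
 * Returns the id of the previous entry, or -1 if ->prev was poisoned.
 */
static int prev_id_after_delete(int use_bidir)
{
	struct node head = { .id = 0 }, a = { .id = 1 };
	struct node b = { .id = 2 }, c = { .id = 3 };

	list_init(&head);
	list_add_after(&a, &head);
	list_add_after(&b, &a);
	list_add_after(&c, &b);

	if (use_bidir)
		model_bidir_del(&b);
	else
		model_del_rcu(&b);

	if (b.prev == POISON_PREV)
		return -1;
	return b.prev->id;
}
```

Calling prev_id_after_delete(1) walks from the deleted entry back to its old predecessor, while prev_id_after_delete(0) reports the poisoned pointer. In the kernel the poisoned pointer would be dereferenced rather than checked, which is why the two deletion primitives must not be mixed on one list.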
From patchwork Thu Dec 12 23:03:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 13906091 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 232551DE3DB for ; Thu, 12 Dec 2024 23:04:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044647; cv=none; b=K6o4+bpcSQ0u0fG1TlnYBhyWHo0kBnRqbvX6QPHQ7XZYtT3xQjU4irE1vsZgTHpF3lzMiZLjR6XjlK9/KzaqiQBM/Z8mB55DjO6hSN+HUV92h1MY1E33U+iDg2QR5ELx6C3hA/WhxpnePzBYG3G6MlqhHwb8n5j8OI0P6GGMD34= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734044647; c=relaxed/simple; bh=+HrbHl3rDHja00XxAbQVDUlTGLxQ5nPTq9yadkF71tc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=aRCP9Ga5O1wPjOm4lizSriiNNM7dhQIFCi4W+WOie8kinOGe7+H9PsAOTX3Ah9WQezcy/AHXrVnAt/kKq4cwgRqJebhPi3x28wMl9Zummc8XA1LewnIhlUDLklmCJ7Yu47LmTxt/0Qbxy9aXvNmI0n9LXRrOvI63jKIWE/A1Z8w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vRNV1x7L; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vRNV1x7L" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3AA05C4CECE; Thu, 12 Dec 2024 23:04:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1734044646; bh=+HrbHl3rDHja00XxAbQVDUlTGLxQ5nPTq9yadkF71tc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=vRNV1x7LOpjUO+by3QHcN73B/7scyuKE3z7Asre47e9jPmoKibpfp5BvsBX0adG4c 
JYKvkuyI/QYLxF3Dtv5I61INyOQozeJGzGNF7uF4H5MyV1rlX+G121/yOUpqX62lEY DS7Aqa6PkMZrCqvrzQRC2/zENeZc522yDENVtRKv7B06MtJm7kEqyExAiAEueP/5fN FKU5epRYYStagewjUC9xGIX+B5lTMMiceeHZYQpef923rtwe0mGQI9/NLwrrfc0TCE KdiY33SfmFVr0wCufSvfcuELLiSP75Rr4cWC1cWOQuwDa5W96SQq763MkDuRocKrFj rJGAunNJS2rwg== From: Christian Brauner Date: Fri, 13 Dec 2024 00:03:44 +0100 Subject: [PATCH v3 05/10] fs: lockless mntns lookup for nsfs Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20241213-work-mount-rbtree-lockless-v3-5-6e3cdaf9b280@kernel.org> References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org> To: Josef Bacik , Jeff Layton Cc: "Paul E. McKenney" , Peter Zijlstra , linux-fsdevel@vger.kernel.org, Christian Brauner X-Mailer: b4 0.15-dev-355e8 X-Developer-Signature: v=1; a=openpgp-sha256; l=6145; i=brauner@kernel.org; h=from:subject:message-id; bh=+HrbHl3rDHja00XxAbQVDUlTGLxQ5nPTq9yadkF71tc=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaRHZ9/yYtV381722lRU4Gv68RNXp+tM55K5l/gsdL7n1 979dRtiOkpZGMS4GGTFFFkc2k3C5ZbzVGw2ytSAmcPKBDKEgYtTACbS+46RYbOZ+DeHt7mPU+8X icg36y6wup0ZGCyf1NZzc8ku7x3JqQx/uOWuMs7wv58leHZPvnd/d6Wnl9zH+/Y1qqwSJhvLOIo ZAQ== X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 We already made the rbtree lookup lockless for the simple lookup case. However, walking the list of mount namespaces via nsfs still takes the read lock, pointlessly blocking concurrent additions of new mount namespaces. Plus, such additions are rare anyway, so allow lockless lookup of the previous and next mount namespace by keeping a separate list. This also makes some of the code simpler. 
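To illustrate the idea of keeping a separate, ordered list next to the tree, here is a minimal userspace sketch. It rests on several assumptions: it is single-threaded with no locking or RCU, struct ns, find_pred(), ns_add() and get_sequential() are hypothetical names invented for this example, and find_pred() is a linear scan that merely stands in for the rb_prev() lookup on the real rbtree. The point is only that inserting each new entry right after its in-order predecessor keeps the list sorted, so previous/next lookups become a single pointer step.

```c
/*
 * Userspace model only: single-threaded, no locking, no RCU.
 * All names are hypothetical; find_pred() stands in for rb_prev().
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ns {
	uint64_t seq;           /* ordering key, modelled on mnt_namespace::seq */
	struct ns *next, *prev; /* linkage in the sequential list */
};

/* Circular list head, standing in for the global mnt_ns_list. */
static struct ns ns_list = { 0, &ns_list, &ns_list };

/* Stand-in for rb_prev(): the entry with the largest seq below @seq. */
static struct ns *find_pred(uint64_t seq)
{
	struct ns *pos, *pred = NULL;

	for (pos = ns_list.next; pos != &ns_list; pos = pos->next)
		if (pos->seq < seq)
			pred = pos;
	return pred;
}

static void insert_after(struct ns *entry, struct ns *pos)
{
	entry->next = pos->next;
	entry->prev = pos;
	pos->next->prev = entry;
	pos->next = entry;
}

/*
 * No predecessor: add right after the head. Otherwise add right after
 * the predecessor. Either way the list stays sorted by seq.
 */
static void ns_add(struct ns *ns)
{
	struct ns *pred = find_pred(ns->seq);

	insert_after(ns, pred ? pred : &ns_list);
}

/* One pointer step forwards or backwards; NULL once the head is reached. */
static struct ns *get_sequential(struct ns *ns, int previous)
{
	struct ns *n = previous ? ns->prev : ns->next;

	return n == &ns_list ? NULL : n;
}
```

Adding entries with seq 10, 30 and then 20 yields the order 10, 20, 30 in this model, regardless of insertion order. The real code additionally has to skip namespaces the caller may not access and take a reference, which this sketch leaves out.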
Suggested-by: Peter Zijlstra Signed-off-by: Christian Brauner --- fs/mount.h | 13 ++++--------- fs/namespace.c | 42 +++++++++++++++++++++++++++++------------- fs/nsfs.c | 5 +---- 3 files changed, 34 insertions(+), 26 deletions(-) diff --git a/fs/mount.h b/fs/mount.h index 36ead0e45e8aa7614c00001102563a711d9dae6e..8cda387f47c5efd9af5e2e422569446c3d51986f 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -20,6 +20,7 @@ struct mnt_namespace { unsigned int nr_mounts; /* # of mounts in the namespace */ unsigned int pending_mounts; struct rb_node mnt_ns_tree_node; /* node in the mnt_ns_tree */ + struct list_head mnt_ns_list; /* entry in the sequential list of mounts namespace */ refcount_t passive; /* number references not pinning @mounts */ } __randomize_layout; @@ -157,15 +158,9 @@ static inline void move_from_ns(struct mount *mnt, struct list_head *dt_list) } bool has_locked_children(struct mount *mnt, struct dentry *dentry); -struct mnt_namespace *__lookup_next_mnt_ns(struct mnt_namespace *mnt_ns, bool previous); -static inline struct mnt_namespace *lookup_next_mnt_ns(struct mnt_namespace *mntns) -{ - return __lookup_next_mnt_ns(mntns, false); -} -static inline struct mnt_namespace *lookup_prev_mnt_ns(struct mnt_namespace *mntns) -{ - return __lookup_next_mnt_ns(mntns, true); -} +struct mnt_namespace *get_sequential_mnt_ns(struct mnt_namespace *mnt_ns, + bool previous); + static inline struct mnt_namespace *to_mnt_ns(struct ns_common *ns) { return container_of(ns, struct mnt_namespace, ns); diff --git a/fs/namespace.c b/fs/namespace.c index 52adee787eb1b6ee8831705b2b121854c3370fb3..71509309652315e5aa9c6b16d13de678bf1c98b3 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -82,6 +82,7 @@ static DEFINE_RWLOCK(mnt_ns_tree_lock); static seqcount_rwlock_t mnt_ns_tree_seqcount = SEQCNT_RWLOCK_ZERO(mnt_ns_tree_seqcount, &mnt_ns_tree_lock); static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */ +static LIST_HEAD(mnt_ns_list); /* protected by 
mnt_ns_tree_lock */ struct mount_kattr { unsigned int attr_set; @@ -142,10 +143,19 @@ static inline void mnt_ns_tree_write_unlock(void) static void mnt_ns_tree_add(struct mnt_namespace *ns) { - struct rb_node *node; + struct rb_node *node, *prev; mnt_ns_tree_write_lock(); node = rb_find_add_rcu(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_cmp); + /* + * If there's no previous entry simply add it after the + * head and if there is add it after the previous entry. + */ + prev = rb_prev(&ns->mnt_ns_tree_node); + if (!prev) + list_add_rcu(&ns->mnt_ns_list, &mnt_ns_list); + else + list_add_rcu(&ns->mnt_ns_list, &node_to_mnt_ns(prev)->mnt_ns_list); mnt_ns_tree_write_unlock(); WARN_ON_ONCE(node); @@ -177,6 +187,7 @@ static void mnt_ns_tree_remove(struct mnt_namespace *ns) if (!is_anon_ns(ns)) { mnt_ns_tree_write_lock(); rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree); + list_bidir_del_rcu(&ns->mnt_ns_list); mnt_ns_tree_write_unlock(); } @@ -2091,30 +2102,34 @@ struct ns_common *from_mnt_ns(struct mnt_namespace *mnt) return &mnt->ns; } -struct mnt_namespace *__lookup_next_mnt_ns(struct mnt_namespace *mntns, bool previous) +struct mnt_namespace *get_sequential_mnt_ns(struct mnt_namespace *mntns, bool previous) { - guard(read_lock)(&mnt_ns_tree_lock); + guard(rcu)(); + for (;;) { - struct rb_node *node; + struct list_head *list; if (previous) - node = rb_prev(&mntns->mnt_ns_tree_node); + list = rcu_dereference(list_bidir_prev_rcu(&mntns->mnt_ns_list)); else - node = rb_next(&mntns->mnt_ns_tree_node); - if (!node) + list = rcu_dereference(list_next_rcu(&mntns->mnt_ns_list)); + if (list_is_head(list, &mnt_ns_list)) return ERR_PTR(-ENOENT); - mntns = node_to_mnt_ns(node); - node = &mntns->mnt_ns_tree_node; + mntns = list_entry_rcu(list, struct mnt_namespace, mnt_ns_list); + /* + * The last passive reference count is put with RCU + * delay so accessing the mount namespace is not just + * safe but all relevant members are still valid. 
+		 */
 		if (!ns_capable_noaudit(mntns->user_ns, CAP_SYS_ADMIN))
 			continue;
 
 		/*
-		 * Holding mnt_ns_tree_lock prevents the mount namespace from
-		 * being freed but it may well be on it's deathbed. We want an
-		 * active reference, not just a passive one here as we're
-		 * persisting the mount namespace.
+		 * We need an active reference count as we're persisting
+		 * the mount namespace and it might already be on its
+		 * deathbed.
 		 */
 		if (!refcount_inc_not_zero(&mntns->ns.count))
 			continue;
@@ -3931,6 +3946,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
 	refcount_set(&new_ns->ns.count, 1);
 	refcount_set(&new_ns->passive, 1);
 	new_ns->mounts = RB_ROOT;
+	INIT_LIST_HEAD(&new_ns->mnt_ns_list);
 	RB_CLEAR_NODE(&new_ns->mnt_ns_tree_node);
 	init_waitqueue_head(&new_ns->poll);
 	new_ns->user_ns = get_user_ns(user_ns);
diff --git a/fs/nsfs.c b/fs/nsfs.c
index c675fc40ce2dc674f0dafce5c4924b910a73a23f..663f8656158d52d391ba80ef1d320197d3d654e0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -274,10 +274,7 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
 		if (usize < MNT_NS_INFO_SIZE_VER0)
 			return -EINVAL;
 
-		if (previous)
-			mnt_ns = lookup_prev_mnt_ns(to_mnt_ns(ns));
-		else
-			mnt_ns = lookup_next_mnt_ns(to_mnt_ns(ns));
+		mnt_ns = get_sequential_mnt_ns(to_mnt_ns(ns), previous);
 		if (IS_ERR(mnt_ns))
 			return PTR_ERR(mnt_ns);
 

From patchwork Thu Dec 12 23:03:45 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13906092
From: Christian Brauner
Date: Fri, 13 Dec 2024 00:03:45 +0100
Subject: [PATCH v3 06/10] fs: simplify rwlock to spinlock
Message-Id: <20241213-work-mount-rbtree-lockless-v3-6-6e3cdaf9b280@kernel.org>
References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
To: Josef Bacik, Jeff Layton
Cc: "Paul E. McKenney", Peter Ziljstra, linux-fsdevel@vger.kernel.org, Christian Brauner

We're not taking the read_lock() anymore now that all lookup is
lockless. Just use a simple spinlock: the rwlock and the separate
seqcount are folded into a single seqlock_t, so writers still bump the
sequence count that lockless readers retry on.

Signed-off-by: Christian Brauner
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 71509309652315e5aa9c6b16d13de678bf1c98b3..966dcd27c81cc877837eca747babe0bc31aaf922 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -78,8 +78,7 @@ static struct kmem_cache *mnt_cache __ro_after_init;
 static DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
-static DEFINE_RWLOCK(mnt_ns_tree_lock);
-static seqcount_rwlock_t mnt_ns_tree_seqcount = SEQCNT_RWLOCK_ZERO(mnt_ns_tree_seqcount, &mnt_ns_tree_lock);
+static DEFINE_SEQLOCK(mnt_ns_tree_lock);
 static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */
 static LIST_HEAD(mnt_ns_list); /* protected by mnt_ns_tree_lock */
 
@@ -131,14 +130,12 @@ static int mnt_ns_cmp(struct rb_node *a, const struct rb_node *b)
 
 static inline void mnt_ns_tree_write_lock(void)
 {
-	write_lock(&mnt_ns_tree_lock);
-	write_seqcount_begin(&mnt_ns_tree_seqcount);
+	write_seqlock(&mnt_ns_tree_lock);
 }
 
 static inline void mnt_ns_tree_write_unlock(void)
 {
-	write_seqcount_end(&mnt_ns_tree_seqcount);
-	write_unlock(&mnt_ns_tree_lock);
+	write_sequnlock(&mnt_ns_tree_lock);
 }
 
 static void mnt_ns_tree_add(struct mnt_namespace *ns)
@@ -163,7 +160,7 @@ static void mnt_ns_tree_add(struct mnt_namespace *ns)
 
 static void mnt_ns_release(struct mnt_namespace *ns)
 {
-	lockdep_assert_not_held(&mnt_ns_tree_lock);
+	lockdep_assert_not_held(&mnt_ns_tree_lock.lock);
 
 	/* keep alive for {list,stat}mount() */
 	if (refcount_dec_and_test(&ns->passive)) {
@@ -228,11 +225,11 @@ static struct mnt_namespace *lookup_mnt_ns(u64 mnt_ns_id)
 
 	guard(rcu)();
 	do {
-		seq = read_seqcount_begin(&mnt_ns_tree_seqcount);
+		seq = read_seqbegin(&mnt_ns_tree_lock);
 		node = rb_find_rcu(&mnt_ns_id, &mnt_ns_tree, mnt_ns_find);
 		if (node)
 			break;
-	} while (read_seqcount_retry(&mnt_ns_tree_seqcount, seq));
+	} while (read_seqretry(&mnt_ns_tree_lock, seq));
 
 	if (!node)
 		return NULL;

From patchwork Thu Dec 12 23:03:46 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13906093
From: Christian Brauner
Date: Fri, 13 Dec 2024 00:03:46 +0100
Subject: [PATCH v3 07/10] selftests: move nsfs into filesystems subfolder
Message-Id: <20241213-work-mount-rbtree-lockless-v3-7-6e3cdaf9b280@kernel.org>
References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
To: Josef Bacik, Jeff Layton
Cc: "Paul E. McKenney", Peter Ziljstra, linux-fsdevel@vger.kernel.org, Christian Brauner

I'm going to be adding new tests for it and it belongs under filesystem
selftests.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/{ => filesystems}/nsfs/.gitignore | 0
 tools/testing/selftests/{ => filesystems}/nsfs/Makefile   | 2 +-
 tools/testing/selftests/{ => filesystems}/nsfs/config     | 0
 tools/testing/selftests/{ => filesystems}/nsfs/owner.c    | 0
 tools/testing/selftests/{ => filesystems}/nsfs/pidns.c    | 0
 5 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/nsfs/.gitignore b/tools/testing/selftests/filesystems/nsfs/.gitignore
similarity index 100%
rename from tools/testing/selftests/nsfs/.gitignore
rename to tools/testing/selftests/filesystems/nsfs/.gitignore
diff --git a/tools/testing/selftests/nsfs/Makefile b/tools/testing/selftests/filesystems/nsfs/Makefile
similarity index 82%
rename from tools/testing/selftests/nsfs/Makefile
rename to tools/testing/selftests/filesystems/nsfs/Makefile
index dd9bd50b7b936e3ff16274260e149bc8d9cd23f3..c2f3ca6e488e9ddb49514e1b8e93909d5594259b 100644
--- a/tools/testing/selftests/nsfs/Makefile
+++ b/tools/testing/selftests/filesystems/nsfs/Makefile
@@ -3,4 +3,4 @@ TEST_GEN_PROGS := owner pidns
 
 CFLAGS := -Wall -Werror
 
-include ../lib.mk
+include ../../lib.mk
diff --git a/tools/testing/selftests/nsfs/config b/tools/testing/selftests/filesystems/nsfs/config
similarity index 100%
rename from tools/testing/selftests/nsfs/config
rename to tools/testing/selftests/filesystems/nsfs/config
diff --git a/tools/testing/selftests/nsfs/owner.c b/tools/testing/selftests/filesystems/nsfs/owner.c
similarity index 100%
rename from tools/testing/selftests/nsfs/owner.c
rename to tools/testing/selftests/filesystems/nsfs/owner.c
diff --git a/tools/testing/selftests/nsfs/pidns.c b/tools/testing/selftests/filesystems/nsfs/pidns.c
similarity index 100%
rename from tools/testing/selftests/nsfs/pidns.c
rename to tools/testing/selftests/filesystems/nsfs/pidns.c

From patchwork Thu Dec 12 23:03:47 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13906094
From: Christian Brauner
Date: Fri, 13 Dec 2024 00:03:47 +0100
Subject: [PATCH v3 08/10] selftests: add tests for mntns iteration
Message-Id: <20241213-work-mount-rbtree-lockless-v3-8-6e3cdaf9b280@kernel.org>
References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
To: Josef Bacik, Jeff Layton
Cc: "Paul E. McKenney", Peter Ziljstra, linux-fsdevel@vger.kernel.org, Christian Brauner

Test that forward and backward iteration works correctly.

Signed-off-by: Christian Brauner
---
 .../testing/selftests/filesystems/nsfs/.gitignore  |   1 +
 tools/testing/selftests/filesystems/nsfs/Makefile  |   2 +-
 .../selftests/filesystems/nsfs/iterate_mntns.c     | 149 +++++++++++++++++++++
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/filesystems/nsfs/.gitignore b/tools/testing/selftests/filesystems/nsfs/.gitignore
index ed79ebdf286e4d945cfbbf80fb072ba3e05c9112..92a8249006d1e0817800df0057183a94ef0f939d 100644
--- a/tools/testing/selftests/filesystems/nsfs/.gitignore
+++ b/tools/testing/selftests/filesystems/nsfs/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 owner
 pidns
+iterate_mntns
diff --git a/tools/testing/selftests/filesystems/nsfs/Makefile b/tools/testing/selftests/filesystems/nsfs/Makefile
index c2f3ca6e488e9ddb49514e1b8e93909d5594259b..231aaa7dfd95c638c23e0a8e5a1d4f7f16f00f7b 100644
--- a/tools/testing/selftests/filesystems/nsfs/Makefile
+++ b/tools/testing/selftests/filesystems/nsfs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-TEST_GEN_PROGS := owner pidns
+TEST_GEN_PROGS := owner pidns iterate_mntns
 
 CFLAGS := -Wall -Werror
 
diff --git a/tools/testing/selftests/filesystems/nsfs/iterate_mntns.c b/tools/testing/selftests/filesystems/nsfs/iterate_mntns.c
new file mode 100644
index 0000000000000000000000000000000000000000..457cf76f3c5f368872292714b44c037968de4ad3
--- /dev/null
+++ b/tools/testing/selftests/filesystems/nsfs/iterate_mntns.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2024 Christian Brauner
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../../kselftest_harness.h"
+
+#define MNT_NS_COUNT 11
+#define MNT_NS_LAST_INDEX 10
+
+struct mnt_ns_info {
+	__u32 size;
+	__u32 nr_mounts;
+	__u64 mnt_ns_id;
+};
+
+#define MNT_NS_INFO_SIZE_VER0 16 /* size of first published struct */
+
+/* Get information about namespace. */
+#define NS_MNT_GET_INFO _IOR(0xb7, 10, struct mnt_ns_info)
+/* Get next namespace. */
+#define NS_MNT_GET_NEXT _IOR(0xb7, 11, struct mnt_ns_info)
+/* Get previous namespace. */
+#define NS_MNT_GET_PREV _IOR(0xb7, 12, struct mnt_ns_info)
+
+FIXTURE(iterate_mount_namespaces) {
+	int fd_mnt_ns[MNT_NS_COUNT];
+	__u64 mnt_ns_id[MNT_NS_COUNT];
+};
+
+FIXTURE_SETUP(iterate_mount_namespaces)
+{
+	for (int i = 0; i < MNT_NS_COUNT; i++)
+		self->fd_mnt_ns[i] = -EBADF;
+
+	/*
+	 * Creating a new user namespace lets us guarantee that we only see
+	 * mount namespaces that we did actually create.
+	 */
+	ASSERT_EQ(unshare(CLONE_NEWUSER), 0);
+
+	for (int i = 0; i < MNT_NS_COUNT; i++) {
+		struct mnt_ns_info info = {};
+
+		ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+		self->fd_mnt_ns[i] = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
+		ASSERT_GE(self->fd_mnt_ns[i], 0);
+		ASSERT_EQ(ioctl(self->fd_mnt_ns[i], NS_MNT_GET_INFO, &info), 0);
+		self->mnt_ns_id[i] = info.mnt_ns_id;
+	}
+}
+
+FIXTURE_TEARDOWN(iterate_mount_namespaces)
+{
+	for (int i = 0; i < MNT_NS_COUNT; i++) {
+		if (self->fd_mnt_ns[i] < 0)
+			continue;
+		ASSERT_EQ(close(self->fd_mnt_ns[i]), 0);
+	}
+}
+
+TEST_F(iterate_mount_namespaces, iterate_all_forward)
+{
+	int fd_mnt_ns_cur, count = 0;
+
+	fd_mnt_ns_cur = fcntl(self->fd_mnt_ns[0], F_DUPFD_CLOEXEC);
+	ASSERT_GE(fd_mnt_ns_cur, 0);
+
+	for (;; count++) {
+		struct mnt_ns_info info = {};
+		int fd_mnt_ns_next;
+
+		fd_mnt_ns_next = ioctl(fd_mnt_ns_cur, NS_MNT_GET_NEXT, &info);
+		if (fd_mnt_ns_next < 0 && errno == ENOENT)
+			break;
+		ASSERT_GE(fd_mnt_ns_next, 0);
+		ASSERT_EQ(close(fd_mnt_ns_cur), 0);
+		fd_mnt_ns_cur = fd_mnt_ns_next;
+	}
+	ASSERT_EQ(count, MNT_NS_LAST_INDEX);
+}
+
+TEST_F(iterate_mount_namespaces, iterate_all_backwards)
+{
+	int fd_mnt_ns_cur, count = 0;
+
+	fd_mnt_ns_cur = fcntl(self->fd_mnt_ns[MNT_NS_LAST_INDEX], F_DUPFD_CLOEXEC);
+	ASSERT_GE(fd_mnt_ns_cur, 0);
+
+	for (;; count++) {
+		struct mnt_ns_info info = {};
+		int fd_mnt_ns_prev;
+
+		fd_mnt_ns_prev = ioctl(fd_mnt_ns_cur, NS_MNT_GET_PREV, &info);
+		if (fd_mnt_ns_prev < 0 && errno == ENOENT)
+			break;
+		ASSERT_GE(fd_mnt_ns_prev, 0);
+		ASSERT_EQ(close(fd_mnt_ns_cur), 0);
+		fd_mnt_ns_cur = fd_mnt_ns_prev;
+	}
+	ASSERT_EQ(count, MNT_NS_LAST_INDEX);
+}
+
+TEST_F(iterate_mount_namespaces, iterate_forward)
+{
+	int fd_mnt_ns_cur;
+
+	ASSERT_EQ(setns(self->fd_mnt_ns[0], CLONE_NEWNS), 0);
+
+	fd_mnt_ns_cur = self->fd_mnt_ns[0];
+	for (int i = 1; i < MNT_NS_COUNT; i++) {
+		struct mnt_ns_info info = {};
+		int fd_mnt_ns_next;
+
+		fd_mnt_ns_next = ioctl(fd_mnt_ns_cur, NS_MNT_GET_NEXT, &info);
+		ASSERT_GE(fd_mnt_ns_next, 0);
+		ASSERT_EQ(close(fd_mnt_ns_cur), 0);
+		fd_mnt_ns_cur = fd_mnt_ns_next;
+		ASSERT_EQ(info.mnt_ns_id, self->mnt_ns_id[i]);
+	}
+}
+
+TEST_F(iterate_mount_namespaces, iterate_backward)
+{
+	int fd_mnt_ns_cur;
+
+	ASSERT_EQ(setns(self->fd_mnt_ns[MNT_NS_LAST_INDEX], CLONE_NEWNS), 0);
+
+	fd_mnt_ns_cur = self->fd_mnt_ns[MNT_NS_LAST_INDEX];
+	for (int i = MNT_NS_LAST_INDEX - 1; i >= 0; i--) {
+		struct mnt_ns_info info = {};
+		int fd_mnt_ns_prev;
+
+		fd_mnt_ns_prev = ioctl(fd_mnt_ns_cur, NS_MNT_GET_PREV, &info);
+		ASSERT_GE(fd_mnt_ns_prev, 0);
+		ASSERT_EQ(close(fd_mnt_ns_cur), 0);
+		fd_mnt_ns_cur = fd_mnt_ns_prev;
+		ASSERT_EQ(info.mnt_ns_id, self->mnt_ns_id[i]);
+	}
+}
+
+TEST_HARNESS_MAIN

From patchwork Thu Dec 12 23:03:48 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13906095
From: Christian Brauner
Date: Fri, 13 Dec 2024 00:03:48 +0100
Subject: [PATCH v3 09/10] selftests: remove unneeded include
Message-Id: <20241213-work-mount-rbtree-lockless-v3-9-6e3cdaf9b280@kernel.org>
References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
To: Josef Bacik, Jeff Layton
Cc: "Paul E. McKenney", Peter Ziljstra, linux-fsdevel@vger.kernel.org, Christian Brauner

The pidfd header will be included in a sample program, and this include
pulls in all the mount definitions, which would cause problems there.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index 88d6830ee004df3c7a9d3ebcdab89d5775e9ab9b..3a96053e52e7bbf5f7f85908c2093e9023b1d3d6 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -12,7 +12,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 

From patchwork Thu Dec 12 23:03:49 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13906096
From: Christian Brauner
Date: Fri, 13 Dec 2024 00:03:49 +0100
Subject: [PATCH v3 10/10] samples: add test-list-all-mounts
Message-Id: <20241213-work-mount-rbtree-lockless-v3-10-6e3cdaf9b280@kernel.org>
References: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
In-Reply-To: <20241213-work-mount-rbtree-lockless-v3-0-6e3cdaf9b280@kernel.org>
To: Josef Bacik, Jeff Layton
Cc: "Paul E. McKenney", Peter Ziljstra, linux-fsdevel@vger.kernel.org, Christian Brauner

Add a sample program illustrating how to list all mounts in all mount
namespaces.

Signed-off-by: Christian Brauner
---
 samples/vfs/.gitignore             |   1 +
 samples/vfs/Makefile               |   2 +-
 samples/vfs/test-list-all-mounts.c | 235 +++++++++++++++++++++++++++++++++++++
 3 files changed, 237 insertions(+), 1 deletion(-)

diff --git a/samples/vfs/.gitignore b/samples/vfs/.gitignore
index 79212d91285bca72b0ff85f28aaccd2e803ac092..8694dd17b318768b975ece5c7cd450c2cca67318 100644
--- a/samples/vfs/.gitignore
+++ b/samples/vfs/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /test-fsmount
+/test-list-all-mounts
 /test-statx
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 6377a678134acf0d682151d751d2f5042dbf5e0a..301be72a52a0e376c7ebe235cc2058992919cc78 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += test-fsmount test-statx
+userprogs-always-y += test-fsmount test-statx test-list-all-mounts
 
 userccflags += -I usr/include
diff --git a/samples/vfs/test-list-all-mounts.c b/samples/vfs/test-list-all-mounts.c
new file mode 100644
index 0000000000000000000000000000000000000000..f372d5aea4717fd1ab3d4b3f9af79316cd5dd3d3
--- /dev/null
+++ b/samples/vfs/test-list-all-mounts.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2024 Christian Brauner
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../../tools/testing/selftests/pidfd/pidfd.h"
+
+#define die_errno(format, ...) \
+	do { \
+		fprintf(stderr, "%m | %s: %d: %s: " format "\n", __FILE__, \
+			__LINE__, __func__, ##__VA_ARGS__); \
+		exit(EXIT_FAILURE); \
+	} while (0)
+
+/* Get the id for a mount namespace */
+#define NS_GET_MNTNS_ID _IO(0xb7, 0x5)
+/* Get next mount namespace. */
+
+struct mnt_ns_info {
+	__u32 size;
+	__u32 nr_mounts;
+	__u64 mnt_ns_id;
+};
+
+#define MNT_NS_INFO_SIZE_VER0 16 /* size of first published struct */
+
+/* Get information about namespace. */
+#define NS_MNT_GET_INFO _IOR(0xb7, 10, struct mnt_ns_info)
+/* Get next namespace. */
+#define NS_MNT_GET_NEXT _IOR(0xb7, 11, struct mnt_ns_info)
+/* Get previous namespace. */
+#define NS_MNT_GET_PREV _IOR(0xb7, 12, struct mnt_ns_info)
+
+#define PIDFD_GET_MNT_NAMESPACE _IO(0xFF, 3)
+
+#ifndef __NR_listmount
+#define __NR_listmount 458
+#endif
+
+#ifndef __NR_statmount
+#define __NR_statmount 457
+#endif
+
+/* @mask bits for statmount(2) */
+#define STATMOUNT_SB_BASIC		0x00000001U /* Want/got sb_... */
+#define STATMOUNT_MNT_BASIC		0x00000002U /* Want/got mnt_... */
+#define STATMOUNT_PROPAGATE_FROM	0x00000004U /* Want/got propagate_from */
+#define STATMOUNT_MNT_ROOT		0x00000008U /* Want/got mnt_root */
+#define STATMOUNT_MNT_POINT		0x00000010U /* Want/got mnt_point */
+#define STATMOUNT_FS_TYPE		0x00000020U /* Want/got fs_type */
+#define STATMOUNT_MNT_NS_ID		0x00000040U /* Want/got mnt_ns_id */
+#define STATMOUNT_MNT_OPTS		0x00000080U /* Want/got mnt_opts */
+
+#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
+
+struct statmount {
+	__u32 size;
+	__u32 mnt_opts;
+	__u64 mask;
+	__u32 sb_dev_major;
+	__u32 sb_dev_minor;
+	__u64 sb_magic;
+	__u32 sb_flags;
+	__u32 fs_type;
+	__u64 mnt_id;
+	__u64 mnt_parent_id;
+	__u32 mnt_id_old;
+	__u32 mnt_parent_id_old;
+	__u64 mnt_attr;
+	__u64 mnt_propagation;
+	__u64 mnt_peer_group;
+	__u64 mnt_master;
+	__u64 propagate_from;
+	__u32 mnt_root;
+	__u32 mnt_point;
+	__u64 mnt_ns_id;
+	__u64 __spare2[49];
+	char str[];
+};
+
+struct mnt_id_req {
+	__u32 size;
+	__u32 spare;
+	__u64 mnt_id;
+	__u64 param;
+	__u64 mnt_ns_id;
+};
+
+#define MNT_ID_REQ_SIZE_VER1 32 /* sizeof second published struct */
+
+#define LSMT_ROOT 0xffffffffffffffff /* root mount */
+
+static int __statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask,
+		       struct statmount *stmnt, size_t bufsize,
+		       unsigned int flags)
+{
+	struct mnt_id_req req = {
+		.size = MNT_ID_REQ_SIZE_VER1,
+		.mnt_id = mnt_id,
+		.param = mask,
+		.mnt_ns_id = mnt_ns_id,
+	};
+
+	return syscall(__NR_statmount, &req, stmnt, bufsize, flags);
+}
+
+static struct statmount *sys_statmount(__u64 mnt_id, __u64 mnt_ns_id,
+				       __u64 mask, unsigned int flags)
+{
+	size_t bufsize = 1 << 15;
+	struct statmount *stmnt = NULL, *tmp = NULL;
+	int ret;
+
+	for (;;) {
+		tmp = realloc(stmnt, bufsize);
+		if (!tmp)
+			goto out;
+
+		stmnt = tmp;
+		ret = __statmount(mnt_id, mnt_ns_id, mask, stmnt, bufsize, flags);
+		if (!ret)
+			return stmnt;
+
+		if (errno != EOVERFLOW)
+			goto out;
+
+		bufsize <<= 1;
+		if (bufsize >= UINT_MAX / 2)
+			goto out;
+	}
+
+out:
+	free(stmnt);
+	return NULL;
+}
+
+static ssize_t sys_listmount(__u64 mnt_id, __u64 last_mnt_id, __u64 mnt_ns_id,
+			     __u64 list[], size_t num, unsigned int flags)
+{
+	struct mnt_id_req req = {
+		.size = MNT_ID_REQ_SIZE_VER1,
+		.mnt_id = mnt_id,
+		.param = last_mnt_id,
+		.mnt_ns_id = mnt_ns_id,
+	};
+
+	return syscall(__NR_listmount, &req, list, num, flags);
+}
+
+int main(int argc, char *argv[])
+{
+#define LISTMNT_BUFFER 10
+	__u64 list[LISTMNT_BUFFER], last_mnt_id = 0;
+	int ret, pidfd, fd_mntns;
+	struct mnt_ns_info info = {};
+
+	pidfd = sys_pidfd_open(getpid(), 0);
+	if (pidfd < 0)
+		die_errno("pidfd_open failed");
+
+	fd_mntns = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0);
+	if (fd_mntns < 0)
+		die_errno("ioctl(PIDFD_GET_MNT_NAMESPACE) failed");
+
+	ret = ioctl(fd_mntns, NS_MNT_GET_INFO, &info);
+	if (ret < 0)
+		die_errno("ioctl(NS_MNT_GET_INFO) failed");
+
+	printf("Listing %u mounts for mount namespace %llu\n",
+	       info.nr_mounts, info.mnt_ns_id);
+	for (;;) {
+		ssize_t nr_mounts;
+next:
+		nr_mounts = sys_listmount(LSMT_ROOT, last_mnt_id,
+					  info.mnt_ns_id, list, LISTMNT_BUFFER,
+					  0);
+		if (nr_mounts <= 0) {
+			int fd_mntns_next;
+
+			printf("Finished listing %u mounts for mount namespace %llu\n\n",
+			       info.nr_mounts, info.mnt_ns_id);
+			fd_mntns_next = ioctl(fd_mntns, NS_MNT_GET_NEXT, &info);
+			if (fd_mntns_next < 0) {
+				if (errno == ENOENT) {
+					printf("Finished listing all mount namespaces\n");
+					exit(0);
+				}
+				die_errno("ioctl(NS_MNT_GET_NEXT) failed");
+			}
+			close(fd_mntns);
+			fd_mntns = fd_mntns_next;
+			last_mnt_id = 0;
+			printf("Listing %u mounts for mount namespace %llu\n",
+			       info.nr_mounts, info.mnt_ns_id);
+			goto next;
+		}
+
+		for (size_t cur = 0; cur < nr_mounts; cur++) {
+			struct statmount *stmnt;
+
+			last_mnt_id = list[cur];
+
+			stmnt = sys_statmount(last_mnt_id, info.mnt_ns_id,
+					      STATMOUNT_SB_BASIC |
+					      STATMOUNT_MNT_BASIC |
+					      STATMOUNT_MNT_ROOT |
+					      STATMOUNT_MNT_POINT |
+					      STATMOUNT_MNT_NS_ID |
+					      STATMOUNT_MNT_OPTS |
+					      STATMOUNT_FS_TYPE, 0);
+			if (!stmnt) {
+				printf("Failed to statmount(%llu) in mount namespace(%llu)\n",
+				       last_mnt_id, info.mnt_ns_id);
+				continue;
+			}
+
+			printf("mnt_id:\t\t%llu\nmnt_parent_id:\t%llu\nfs_type:\t%s\nmnt_root:\t%s\nmnt_point:\t%s\nmnt_opts:\t%s\n\n",
+			       stmnt->mnt_id,
+			       stmnt->mnt_parent_id,
+			       stmnt->str + stmnt->fs_type,
+			       stmnt->str + stmnt->mnt_root,
+			       stmnt->str + stmnt->mnt_point,
+			       stmnt->str + stmnt->mnt_opts);
+			free(stmnt);
+		}
+	}
+
+	exit(0);
+}