From patchwork Tue Dec 10 20:57:57 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13902046
From: Christian Brauner
To: Josef Bacik, Jeff Layton
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org
Subject: [PATCH 1/5] mount: remove include/nospec.h include
Date: Tue, 10 Dec 2024 21:57:57 +0100
Message-ID: <20241210-work-mount-rbtree-lockless-v1-1-338366b9bbe4@kernel.org>
In-Reply-To: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
References: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

It's not needed, so remove it.

Signed-off-by: Christian Brauner
---
 fs/namespace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 23e81c2a1e3fee7d97df2a84a69438a677933654..c3dbe6a7ab6b1c77c2693cc75941da89fa921048 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -32,7 +32,6 @@
 #include
 #include
 #include
-#include
 
 #include "pnode.h"
 #include "internal.h"

From patchwork Tue Dec 10 20:57:58 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13902047
From: Christian Brauner
To: Josef Bacik, Jeff Layton
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org
Subject: [PATCH 2/5] fs: add mount namespace to rbtree late
Date: Tue, 10 Dec 2024 21:57:58 +0100
Message-ID: <20241210-work-mount-rbtree-lockless-v1-2-338366b9bbe4@kernel.org>
In-Reply-To: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
References: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

There's no point doing that under the namespace semaphore. It just gives
the false impression that it protects the mount namespace rbtree, and it
simply doesn't.

Signed-off-by: Christian Brauner
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c3dbe6a7ab6b1c77c2693cc75941da89fa921048..10fa18dd66018fadfdc9d18c59a851eed7bd55ad 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3983,7 +3983,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		while (p->mnt.mnt_root != q->mnt.mnt_root)
 			p = next_mnt(skip_mnt_tree(p), old);
 	}
-	mnt_ns_tree_add(new_ns);
 	namespace_unlock();
 
 	if (rootmnt)
@@ -3991,6 +3990,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	if (pwdmnt)
 		mntput(pwdmnt);
 
+	mnt_ns_tree_add(new_ns);
 	return new_ns;
 }
 

From patchwork Tue Dec 10 20:57:59 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13902048
From: Christian Brauner
To: Josef Bacik, Jeff Layton
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org
Subject: [PATCH 3/5] fs: lockless mntns rbtree lookup
Date: Tue, 10 Dec 2024 21:57:59 +0100
Message-ID: <20241210-work-mount-rbtree-lockless-v1-3-338366b9bbe4@kernel.org>
In-Reply-To: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
References: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Currently we use a read-write lock, but for the simple search case we can
make this lockless. Creating a new mount namespace is a rather rare event
compared with querying mounts in a foreign mount namespace. Once this is
picked up by, e.g., systemd to list mounts in another mount namespace in
its isolated services or in containers, this will be used a lot, so this
seems worthwhile doing.

Signed-off-by: Christian Brauner
---
 fs/mount.h     |  5 ++-
 fs/namespace.c | 99 ++++++++++++++++++++++++++++++++++++++++------------------
 2 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 185fc56afc13338f8185fe818051444d540cbd5b..3c3763d8ae821d6a117c528808dbc94d0251f964 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -16,7 +16,10 @@ struct mnt_namespace {
 	u64 event;
 	unsigned int		nr_mounts; /* # of mounts in the namespace */
 	unsigned int		pending_mounts;
-	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
+	union {
+		struct rb_node	mnt_ns_tree_node; /* node in the mnt_ns_tree */
+		struct rcu_head mnt_ns_rcu;
+	};
 	refcount_t		passive; /* number references not pinning @mounts */
 } __randomize_layout;
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 10fa18dd66018fadfdc9d18c59a851eed7bd55ad..21e990482c5b2e1844d17413b55b58803fa7b008 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -79,6 +79,8 @@ static DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 static DEFINE_RWLOCK(mnt_ns_tree_lock);
+static seqcount_rwlock_t mnt_ns_tree_seqcount = SEQCNT_RWLOCK_ZERO(mnt_ns_tree_seqcount, &mnt_ns_tree_lock);
+
 static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */
 
 struct mount_kattr {
@@ -105,17 +107,6 @@ EXPORT_SYMBOL_GPL(fs_kobj);
  */
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
 
-static int mnt_ns_cmp(u64 seq, const struct mnt_namespace *ns)
-{
-	u64 seq_b = ns->seq;
-
-	if (seq < seq_b)
-		return -1;
-	if (seq > seq_b)
-		return 1;
-	return 0;
-}
-
 static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node)
 {
 	if (!node)
@@ -123,19 +114,41 @@ static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node)
 	return rb_entry(node, struct mnt_namespace, mnt_ns_tree_node);
 }
 
-static bool mnt_ns_less(struct rb_node *a, const struct rb_node *b)
+static int mnt_ns_cmp(struct rb_node *a, const struct rb_node *b)
 {
 	struct mnt_namespace *ns_a = node_to_mnt_ns(a);
 	struct mnt_namespace *ns_b = node_to_mnt_ns(b);
 	u64 seq_a = ns_a->seq;
+	u64 seq_b = ns_b->seq;
 
-	return mnt_ns_cmp(seq_a, ns_b) < 0;
+	if (seq_a < seq_b)
+		return -1;
+	if (seq_a > seq_b)
+		return 1;
+	return 0;
+}
+
+static inline void mnt_ns_tree_write_lock(void)
+{
+	write_lock(&mnt_ns_tree_lock);
+	write_seqcount_begin(&mnt_ns_tree_seqcount);
+}
+
+static inline void mnt_ns_tree_write_unlock(void)
+{
+	write_seqcount_end(&mnt_ns_tree_seqcount);
+	write_unlock(&mnt_ns_tree_lock);
 }
 
 static void mnt_ns_tree_add(struct mnt_namespace *ns)
 {
-	guard(write_lock)(&mnt_ns_tree_lock);
-	rb_add(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_less);
+	struct rb_node *node;
+
+	mnt_ns_tree_write_lock();
+	node = rb_find_add_rcu(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_cmp);
+	mnt_ns_tree_write_unlock();
+
+	WARN_ON_ONCE(node);
 }
 
 static void mnt_ns_release(struct mnt_namespace *ns)
@@ -150,15 +163,24 @@ static void mnt_ns_release(struct mnt_namespace *ns)
 }
 DEFINE_FREE(mnt_ns_release, struct mnt_namespace *, if (_T) mnt_ns_release(_T))
 
+static void mnt_ns_release_rcu(struct rcu_head *rcu)
+{
+	struct mnt_namespace *mnt_ns;
+
+	mnt_ns = container_of(rcu, struct mnt_namespace, mnt_ns_rcu);
+	mnt_ns_release(mnt_ns);
+}
+
 static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 {
 	/* remove from global mount namespace list */
 	if (!is_anon_ns(ns)) {
-		guard(write_lock)(&mnt_ns_tree_lock);
+		mnt_ns_tree_write_lock();
 		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
+		mnt_ns_tree_write_unlock();
 	}
 
-	mnt_ns_release(ns);
+	call_rcu(&ns->mnt_ns_rcu, mnt_ns_release_rcu);
 }
 
 /*
@@ -168,23 +190,23 @@ static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 static struct mnt_namespace *mnt_ns_find_id_at(u64 mnt_ns_id)
 {
 	struct rb_node *node = mnt_ns_tree.rb_node;
-	struct mnt_namespace *ret = NULL;
+	struct mnt_namespace *mnt_ns = NULL;
 
-	lockdep_assert_held(&mnt_ns_tree_lock);
+	lockdep_assert(rcu_read_lock_held());
 
 	while (node) {
 		struct mnt_namespace *n = node_to_mnt_ns(node);
 
 		if (mnt_ns_id <= n->seq) {
-			ret = node_to_mnt_ns(node);
+			mnt_ns = node_to_mnt_ns(node);
 			if (mnt_ns_id == n->seq)
 				break;
-			node = node->rb_left;
+			node = rcu_dereference_raw(node->rb_left);
 		} else {
-			node = node->rb_right;
+			node = rcu_dereference_raw(node->rb_right);
 		}
 	}
-	return ret;
+	return mnt_ns;
 }
 
 /*
@@ -194,18 +216,35 @@ static struct mnt_namespace *mnt_ns_find_id_at(u64 mnt_ns_id)
  * namespace the @namespace_sem must first be acquired. If the namespace has
  * already shut down before acquiring @namespace_sem, {list,stat}mount() will
  * see that the mount rbtree of the namespace is empty.
+ *
+ * Note the lookup is lockless and protected by a sequence counter. We only
+ * need to guard against false negatives as false positives aren't
+ * possible. So if we didn't find a mount namespace and the sequence
+ * counter has changed we need to retry. If the sequence counter is
+ * still the same we know the search actually failed.
  */
 static struct mnt_namespace *lookup_mnt_ns(u64 mnt_ns_id)
 {
-	struct mnt_namespace *ns;
+	struct mnt_namespace *ns;
+	unsigned int seq;
 
-	guard(read_lock)(&mnt_ns_tree_lock);
-	ns = mnt_ns_find_id_at(mnt_ns_id);
-	if (!ns || ns->seq != mnt_ns_id)
-		return NULL;
+	guard(rcu)();
+	do {
+		seq = read_seqcount_begin(&mnt_ns_tree_seqcount);
+		ns = mnt_ns_find_id_at(mnt_ns_id);
+		if (ns)
+			break;
+	} while (read_seqcount_retry(&mnt_ns_tree_seqcount, seq));
 
-	refcount_inc(&ns->passive);
-	return ns;
+	if (!ns || ns->seq != mnt_ns_id)
+		return NULL;
+
+	/*
+	 * The last reference is put after an RCU delay, so we
+	 * don't need to use refcount_inc_not_zero().
+	 */
+	refcount_inc(&ns->passive);
+	return ns;
 }
 
 static inline void lock_mount_hash(void)

From patchwork Tue Dec 10 20:58:00 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13902049
From: Christian Brauner
To: Josef Bacik, Jeff Layton
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org
Subject: [PATCH 4/5] selftests: remove unneeded include
Date: Tue, 10 Dec 2024 21:58:00 +0100
Message-ID: <20241210-work-mount-rbtree-lockless-v1-4-338366b9bbe4@kernel.org>
In-Reply-To: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
References: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The pidfd header will be included in a sample program, and this include
pulls in all the mount definitions, which would cause problems.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index 88d6830ee004df3c7a9d3ebcdab89d5775e9ab9b..3a96053e52e7bbf5f7f85908c2093e9023b1d3d6 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -12,7 +12,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 

From patchwork Tue Dec 10 20:58:01 2024
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 13902050
From: Christian Brauner
To: Josef Bacik, Jeff Layton
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org
Subject: [PATCH 5/5] samples: add test-list-all-mounts
Date: Tue, 10 Dec 2024 21:58:01 +0100
Message-ID: <20241210-work-mount-rbtree-lockless-v1-5-338366b9bbe4@kernel.org>
In-Reply-To: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
References: <20241210-work-mount-rbtree-lockless-v1-0-338366b9bbe4@kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Add a sample program illustrating how to list all mounts in all mount
namespaces.

Signed-off-by: Christian Brauner
---
 samples/vfs/.gitignore             |   1 +
 samples/vfs/Makefile               |   2 +-
 samples/vfs/test-list-all-mounts.c | 235 +++++++++++++++++++++++++++++++++++++
 3 files changed, 237 insertions(+), 1 deletion(-)

diff --git a/samples/vfs/.gitignore b/samples/vfs/.gitignore
index 79212d91285bca72b0ff85f28aaccd2e803ac092..8694dd17b318768b975ece5c7cd450c2cca67318 100644
--- a/samples/vfs/.gitignore
+++ b/samples/vfs/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /test-fsmount
+/test-list-all-mounts
 /test-statx
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 6377a678134acf0d682151d751d2f5042dbf5e0a..301be72a52a0e376c7ebe235cc2058992919cc78 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += test-fsmount test-statx
+userprogs-always-y += test-fsmount test-statx test-list-all-mounts
 
 userccflags += -I usr/include
diff --git a/samples/vfs/test-list-all-mounts.c b/samples/vfs/test-list-all-mounts.c
new file mode 100644
index 0000000000000000000000000000000000000000..f372d5aea4717fd1ab3d4b3f9af79316cd5dd3d3
--- /dev/null
+++ b/samples/vfs/test-list-all-mounts.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2024 Christian Brauner
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../../tools/testing/selftests/pidfd/pidfd.h"
+
+#define die_errno(format, ...) \
+	do { \
+		fprintf(stderr, "%m | %s: %d: %s: " format "\n", __FILE__, \
+			__LINE__, __func__, ##__VA_ARGS__); \
+		exit(EXIT_FAILURE); \
+	} while (0)
+
+/* Get the id for a mount namespace */
+#define NS_GET_MNTNS_ID		_IO(0xb7, 0x5)
+/* Get next mount namespace. */
+
+struct mnt_ns_info {
+	__u32 size;
+	__u32 nr_mounts;
+	__u64 mnt_ns_id;
+};
+
+#define MNT_NS_INFO_SIZE_VER0 16 /* size of first published struct */
+
+/* Get information about namespace. */
+#define NS_MNT_GET_INFO		_IOR(0xb7, 10, struct mnt_ns_info)
+/* Get next namespace. */
+#define NS_MNT_GET_NEXT		_IOR(0xb7, 11, struct mnt_ns_info)
+/* Get previous namespace. */
+#define NS_MNT_GET_PREV		_IOR(0xb7, 12, struct mnt_ns_info)
+
+#define PIDFD_GET_MNT_NAMESPACE	_IO(0xFF, 3)
+
+#ifndef __NR_listmount
+#define __NR_listmount 458
+#endif
+
+#ifndef __NR_statmount
+#define __NR_statmount 457
+#endif
+
+/* @mask bits for statmount(2) */
+#define STATMOUNT_SB_BASIC		0x00000001U /* Want/got sb_... */
+#define STATMOUNT_MNT_BASIC		0x00000002U /* Want/got mnt_... */
+#define STATMOUNT_PROPAGATE_FROM	0x00000004U /* Want/got propagate_from */
+#define STATMOUNT_MNT_ROOT		0x00000008U /* Want/got mnt_root */
+#define STATMOUNT_MNT_POINT		0x00000010U /* Want/got mnt_point */
+#define STATMOUNT_FS_TYPE		0x00000020U /* Want/got fs_type */
+#define STATMOUNT_MNT_NS_ID		0x00000040U /* Want/got mnt_ns_id */
+#define STATMOUNT_MNT_OPTS		0x00000080U /* Want/got mnt_opts */
+
+#define STATX_MNT_ID_UNIQUE		0x00004000U /* Want/got extended stx_mount_id */
+
+struct statmount {
+	__u32 size;
+	__u32 mnt_opts;
+	__u64 mask;
+	__u32 sb_dev_major;
+	__u32 sb_dev_minor;
+	__u64 sb_magic;
+	__u32 sb_flags;
+	__u32 fs_type;
+	__u64 mnt_id;
+	__u64 mnt_parent_id;
+	__u32 mnt_id_old;
+	__u32 mnt_parent_id_old;
+	__u64 mnt_attr;
+	__u64 mnt_propagation;
+	__u64 mnt_peer_group;
+	__u64 mnt_master;
+	__u64 propagate_from;
+	__u32 mnt_root;
+	__u32 mnt_point;
+	__u64 mnt_ns_id;
+	__u64 __spare2[49];
+	char str[];
+};
+
+struct mnt_id_req {
+	__u32 size;
+	__u32 spare;
+	__u64 mnt_id;
+	__u64 param;
+	__u64 mnt_ns_id;
+};
+
+#define MNT_ID_REQ_SIZE_VER1 32 /* sizeof second published struct */
+
+#define LSMT_ROOT 0xffffffffffffffff /* root mount */
+
+static int __statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask,
+		       struct statmount *stmnt, size_t bufsize,
+		       unsigned int flags)
+{
+	struct mnt_id_req req = {
+		.size = MNT_ID_REQ_SIZE_VER1,
+		.mnt_id = mnt_id,
+		.param = mask,
+		.mnt_ns_id = mnt_ns_id,
+	};
+
+	return syscall(__NR_statmount, &req, stmnt, bufsize, flags);
+}
+
+static struct statmount *sys_statmount(__u64 mnt_id, __u64 mnt_ns_id,
+				       __u64 mask, unsigned int flags)
+{
+	size_t bufsize = 1 << 15;
+	struct statmount *stmnt = NULL, *tmp = NULL;
+	int ret;
+
+	for (;;) {
+		tmp = realloc(stmnt, bufsize);
+		if (!tmp)
+			goto out;
+
+		stmnt = tmp;
+		ret = __statmount(mnt_id, mnt_ns_id, mask, stmnt, bufsize, flags);
+		if (!ret)
+			return stmnt;
+
+		if (errno != EOVERFLOW)
+			goto out;
+
+		bufsize <<= 1;
+		if (bufsize >= UINT_MAX / 2)
+			goto out;
+	}
+
+out:
+	free(stmnt);
+	return NULL;
+}
+
+static ssize_t sys_listmount(__u64 mnt_id, __u64 last_mnt_id, __u64 mnt_ns_id,
+			     __u64 list[], size_t num, unsigned int flags)
+{
+	struct mnt_id_req req = {
+		.size = MNT_ID_REQ_SIZE_VER1,
+		.mnt_id = mnt_id,
+		.param = last_mnt_id,
+		.mnt_ns_id = mnt_ns_id,
+	};
+
+	return syscall(__NR_listmount, &req, list, num, flags);
+}
+
+int main(int argc, char *argv[])
+{
+#define LISTMNT_BUFFER 10
+	__u64 list[LISTMNT_BUFFER], last_mnt_id = 0;
+	int ret, pidfd, fd_mntns;
+	struct mnt_ns_info info = {};
+
+	pidfd = sys_pidfd_open(getpid(), 0);
+	if (pidfd < 0)
+		die_errno("pidfd_open failed");
+
+	fd_mntns = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0);
+	if (fd_mntns < 0)
+		die_errno("ioctl(PIDFD_GET_MNT_NAMESPACE) failed");
+
+	ret = ioctl(fd_mntns, NS_MNT_GET_INFO, &info);
+	if (ret < 0)
+		die_errno("ioctl(NS_MNT_GET_INFO) failed");
+
+	printf("Listing %u mounts for mount namespace %llu\n",
+	       info.nr_mounts, info.mnt_ns_id);
+	for (;;) {
+		ssize_t nr_mounts;
+next:
+		nr_mounts = sys_listmount(LSMT_ROOT, last_mnt_id,
+					  info.mnt_ns_id, list, LISTMNT_BUFFER,
+					  0);
+		if (nr_mounts <= 0) {
+			int fd_mntns_next;
+
+			printf("Finished listing %u mounts for mount namespace %llu\n\n",
+			       info.nr_mounts, info.mnt_ns_id);
+			fd_mntns_next = ioctl(fd_mntns, NS_MNT_GET_NEXT, &info);
+			if (fd_mntns_next < 0) {
+				if (errno == ENOENT) {
+					printf("Finished listing all mount namespaces\n");
+					exit(0);
+				}
+				die_errno("ioctl(NS_MNT_GET_NEXT) failed");
+			}
+			close(fd_mntns);
+			fd_mntns = fd_mntns_next;
+			last_mnt_id = 0;
+			printf("Listing %u mounts for mount namespace %llu\n",
+			       info.nr_mounts, info.mnt_ns_id);
+			goto next;
+		}
+
+		for (size_t cur = 0; cur < nr_mounts; cur++) {
+			struct statmount *stmnt;
+
+			last_mnt_id = list[cur];
+
+			stmnt = sys_statmount(last_mnt_id, info.mnt_ns_id,
+					      STATMOUNT_SB_BASIC |
+					      STATMOUNT_MNT_BASIC |
+					      STATMOUNT_MNT_ROOT |
+					      STATMOUNT_MNT_POINT |
+					      STATMOUNT_MNT_NS_ID |
+					      STATMOUNT_MNT_OPTS |
+					      STATMOUNT_FS_TYPE, 0);
+			if (!stmnt) {
+				printf("Failed to statmount(%llu) in mount namespace(%llu)\n",
+				       last_mnt_id, info.mnt_ns_id);
+				continue;
+			}
+
+			printf("mnt_id:\t\t%llu\nmnt_parent_id:\t%llu\nfs_type:\t%s\nmnt_root:\t%s\nmnt_point:\t%s\nmnt_opts:\t%s\n\n",
+			       stmnt->mnt_id,
+			       stmnt->mnt_parent_id,
+			       stmnt->str + stmnt->fs_type,
+			       stmnt->str + stmnt->mnt_root,
+			       stmnt->str + stmnt->mnt_point,
+			       stmnt->str + stmnt->mnt_opts);
+			free(stmnt);
+		}
+	}
+
+	exit(0);
+}
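
Note on the retry rule in patch 3/5: the lookup only rechecks the sequence counter when the search misses, because a hit is always a real entry while a miss may be caused by a concurrent writer. The following is a minimal userspace sketch of that rule, not the kernel code. It assumes C11 atomics in place of seqcount_rwlock_t and a small sorted array in place of the rbtree; the names table_seq, table_add, table_find_once and table_lookup are hypothetical and exist only for this illustration. Writers are assumed to be serialized by an external lock, as mnt_ns_tree_lock does in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the namespace rbtree: a small sorted id table. */
#define TABLE_CAP 8

static _Atomic unsigned int table_seq;	/* even: stable, odd: write in progress */
static _Atomic uint64_t table_ids[TABLE_CAP];
static _Atomic size_t table_len;

/* Writer side. Assumption: callers hold an external writer lock. */
static bool table_add(uint64_t id)
{
	size_t len = atomic_load(&table_len);
	size_t pos = len;

	if (len == TABLE_CAP)
		return false;

	atomic_fetch_add(&table_seq, 1);	/* begin: counter becomes odd */
	/* Shift larger ids one slot to the right to keep the table sorted. */
	while (pos > 0 && atomic_load(&table_ids[pos - 1]) > id) {
		atomic_store(&table_ids[pos], atomic_load(&table_ids[pos - 1]));
		pos--;
	}
	atomic_store(&table_ids[pos], id);
	atomic_store(&table_len, len + 1);
	atomic_fetch_add(&table_seq, 1);	/* end: counter becomes even */
	return true;
}

/* One unlocked pass over the table; may miss an id while a writer shifts. */
static bool table_find_once(uint64_t id)
{
	size_t len = atomic_load(&table_len);

	for (size_t i = 0; i < len; i++)
		if (atomic_load(&table_ids[i]) == id)
			return true;
	return false;
}

/* Reader side: trust hits, retry misses if the counter moved meanwhile. */
static bool table_lookup(uint64_t id)
{
	unsigned int seq;
	bool found;

	do {
		/* Wait for an even value, i.e. no write in progress. */
		while ((seq = atomic_load(&table_seq)) & 1)
			;
		found = table_find_once(id);
		if (found)
			break;
	} while (atomic_load(&table_seq) != seq);

	return found;
}
```

A caller would insert ids with table_add() and query them with table_lookup(). The sketch never removes entries, so it does not model the call_rcu() based freeing that the patch adds for mnt_ns_tree_remove().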