From patchwork Wed Oct 25 14:01:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436217 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98B6028E04 for ; Wed, 25 Oct 2023 14:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gKjFUt2Z" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AB1F1A7 for ; Wed, 25 Oct 2023 07:02:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242542; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Rktb275++autG2dBh3wJ1wvqmgDO3Aq+V8t/EjWFz4M=; b=gKjFUt2Zljbb4IQmB/FtWfq/OzJpJlgJReDlwn+FpTlZeGH1B/7nrv77AHFbwyq87nmRBH bY1fmMOhDzG4Zm8YRv6ZVm+0RjjTp4/XcSEe2zAYoCTi0vXerdj8nfW3rVI5fOLms+xJ9K /oI1zPvlCnwJc2VO54mZR9MsgW6Y0sA= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-657-JRmrPJPMNwmHcfggvo6oPQ-1; Wed, 25 Oct 2023 10:02:10 -0400 X-MC-Unique: JRmrPJPMNwmHcfggvo6oPQ-1 Received: by mail-ej1-f71.google.com with SMTP id a640c23a62f3a-9b274cc9636so356055966b.0 for ; Wed, 25 Oct 2023 07:02:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242529; x=1698847329; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Rktb275++autG2dBh3wJ1wvqmgDO3Aq+V8t/EjWFz4M=; b=naE+c02sdmCoY/FcKxHVgijUQTJQUAl8bJMO+5AcXTG/CkOyMJqF2ZnX/Fb5Iv60/k ZtPYNWEre/iTo7wpJFFxazP8zycws+8hm0l1nMsRqAxsy8pQVPAFxnj56jNWKzIiL9bg ikNcy+bWw4M/SheZrr0ZLOKIVUd5nuWgz6viEfjkwnL9fQLxUWBXnVgKTWdcY2en/mwx AYvrvbO/mn/e97b/UsyGeid26Gkw40l0ACOV+w5L+98Dp0OH64TLK1eT4Ldcn2pCe7BF TxvbnIlMWMpFq8reXuDRLP/vQmcLNABJPIVGvGJzfKCMQx6uDpRei23wNvgReaZ0oRhM Lrfg== X-Gm-Message-State: AOJu0YySNzIC18dW7L74eOxVAI8NgEK0swCOd2KPsLzaHRyAJY56gVRh hHIF5EE4qqPcjw/+1CnWX1xo5vyvcYpCZ532l8uLU43mpgjjPxAMdrmDsKFW8AeUosigzN0Pw6e 0vS9xjvf5/56W3JVGzxRXXcV+e5X9pBzaveADItSrbfTdY8bobhBg0ZnFsxkmW9ic2SFt0LNrkC hQKyG2aW4+sw== X-Received: by 2002:a17:907:da3:b0:9c6:7ec2:e14e with SMTP id go35-20020a1709070da300b009c67ec2e14emr14335811ejc.50.1698242529164; Wed, 25 Oct 2023 07:02:09 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH+8SDOqqdIbWYcTzs6rDMO30AFnhMq2M9Jv3S4dbLlbYvENSTjg76T9Wfx8VJm7Ip4DabhCA== X-Received: by 2002:a17:907:da3:b0:9c6:7ec2:e14e with SMTP id go35-20020a1709070da300b009c67ec2e14emr14335764ejc.50.1698242528629; Wed, 25 Oct 2023 07:02:08 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:08 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 1/6] add unique mount ID Date: Wed, 25 Oct 2023 16:01:59 +0200 Message-ID: <20231025140205.3586473-2-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 If a mount is released then its mnt_id can immediately be reused. This is bad news for user interfaces that want to uniquely identify a mount. Implementing a unique mount ID is trivial (use a 64bit counter). Unfortunately userspace assumes 32bit size and would overflow after the counter reaches 2^32. Introduce a new 64bit ID alongside the old one. Initialize the counter to 2^32, this guarantees that the old and new IDs are never mixed up. Signed-off-by: Miklos Szeredi --- fs/mount.h | 3 ++- fs/namespace.c | 4 ++++ fs/stat.c | 9 +++++++-- include/uapi/linux/stat.h | 1 + 4 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/mount.h b/fs/mount.h index 130c07c2f8d2..a14f762b3f29 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -72,7 +72,8 @@ struct mount { struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks; __u32 mnt_fsnotify_mask; #endif - int mnt_id; /* mount identifier */ + int mnt_id; /* mount identifier, reused */ + u64 mnt_id_unique; /* mount ID unique until reboot */ int mnt_group_id; /* peer group identifier */ int mnt_expiry_mark; /* true if marked for expiry */ struct hlist_head mnt_pins; diff --git a/fs/namespace.c b/fs/namespace.c index e157efc54023..e02bc5f41c7b 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -68,6 +68,9 @@ static u64 event; static DEFINE_IDA(mnt_id_ida); static DEFINE_IDA(mnt_group_ida); +/* Don't allow confusion with old 32bit mount ID */ +static atomic64_t mnt_id_ctr = ATOMIC64_INIT(1ULL << 32); + static struct hlist_head *mount_hashtable __read_mostly; static struct hlist_head *mountpoint_hashtable __read_mostly; static struct kmem_cache *mnt_cache __read_mostly; @@ -131,6 +134,7 @@ static int mnt_alloc_id(struct mount *mnt) if (res < 0) return res; mnt->mnt_id = res; + mnt->mnt_id_unique = atomic64_inc_return(&mnt_id_ctr); return 0; } diff --git a/fs/stat.c b/fs/stat.c index d43a5cc1bfa4..77878ae48a0f 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -243,8 +243,13 @@ static int vfs_statx(int dfd, struct filename *filename, int flags, error = vfs_getattr(&path, stat, request_mask, flags); - stat->mnt_id = real_mount(path.mnt)->mnt_id; - stat->result_mask |= STATX_MNT_ID; + if (request_mask & STATX_MNT_ID_UNIQUE) { + stat->mnt_id = real_mount(path.mnt)->mnt_id_unique; + stat->result_mask |= STATX_MNT_ID_UNIQUE; + } else { + stat->mnt_id = real_mount(path.mnt)->mnt_id; + stat->result_mask |= STATX_MNT_ID; + } if (path.mnt->mnt_root == path.dentry) stat->attributes |= STATX_ATTR_MOUNT_ROOT; diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index 7cab2c65d3d7..2f2ee82d5517 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -154,6 +154,7 @@ struct statx { #define STATX_BTIME 0x00000800U /* Want/got stx_btime */ #define STATX_MNT_ID 0x00001000U /* Got stx_mnt_id */ #define STATX_DIOALIGN 0x00002000U /* Want/got direct I/O alignment info */ +#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */ #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */ From patchwork Wed Oct 25 14:02:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436214 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FBAF28E03 for ; Wed, 25 Oct 2023 14:02:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JEONiLPw" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BD16191 for ; Wed, 25 Oct 2023 07:02:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242534; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TqU5OFpUfnWoX0zEapwC2eTdGB2uJF4AL25ESnXjaJg=; b=JEONiLPwxMlW9y/MB0xXJaAcbalXXqsNNqiK+6Mp6HjJQj0c/Tuqk1UWdXkF7o1/9Rftu3 fEbvbb90pzycWagmonhKLKZMsJGLDLQ+CSLz7fjsKyZ+O+paQXtb0esfjF4LT9b8ZWG8yJ 2VBjG0g4pZDEE412SIjJdb0/6y6OZWY= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-412-eVBhV1DcM8afVJEvXjE-mw-1; Wed, 25 Oct 2023 10:02:12 -0400 X-MC-Unique: eVBhV1DcM8afVJEvXjE-mw-1 Received: by mail-ej1-f71.google.com with SMTP id a640c23a62f3a-9c983b42c3bso96758166b.1 for ; Wed, 25 Oct 2023 07:02:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242531; x=1698847331; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TqU5OFpUfnWoX0zEapwC2eTdGB2uJF4AL25ESnXjaJg=; b=YjB9fsxdOnpsFW3LHYe63YSpioeLv+uKmU1E6VHKzhGGFLSi53SELU+Vn180DksHNG S2GBap6DeqUgeXb8tVd9TMwyvn5f17nTLpcX3ql2wF3khs6PKwJI0CNSqUm135Z14Ag7 YaCQc6qw+IbByCIxK5pJCCKEI711URPedtecQiByAzlhh0qR5gUdxiP47cYoeBBXHiix 2NfSco/mHz6SLVerCe8q2vgSXeI7uR1f0DGGc0sgvEyWumEk3CeYm89G58ZjRa5tb/Ds prCnhEJ4sx/DdCfTYiGFyysTzo0A2DhgV7nkaSMfY6vF9Xk53pATBh8FNtxfpvNy7nm5 w7gQ== X-Gm-Message-State: AOJu0YyVSUaAVE2GXMiFnGc+6KA9QDgr/2W8JJsRZQBkoQhG3//achjP dYJsnw6Wo3RieOMmd/2wYPLrRjq6Shi8Gtbnc+Sw1QBkoaPznC3YG+h3Wn0huVefyicHbX+PYIe Zn6K3joIIBKxxEP2kq+s1yMJVRtACnYvwgDd/I6InIQvLpa80zWPfefpwqniLkv/rlefnHiiQhP uSyeOKq1teBQ== X-Received: by 2002:a17:907:7e8b:b0:9a9:fa4a:5a4e with SMTP id qb11-20020a1709077e8b00b009a9fa4a5a4emr17952700ejc.13.1698242530839; Wed, 25 Oct 2023 07:02:10 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGbrxV78iGRyEXr71q2p/3MfdOAn7enEaIs3/+ho4rI94gfuZM96AuUtMY9RGXlfJnr6Bb2PA== X-Received: by 2002:a17:907:7e8b:b0:9a9:fa4a:5a4e with SMTP id qb11-20020a1709077e8b00b009a9fa4a5a4emr17952652ejc.13.1698242530313; Wed, 25 Oct 2023 07:02:10 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:09 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 2/6] mounts: keep list of mounts in an rbtree Date: Wed, 25 Oct 2023 16:02:00 +0200 Message-ID: <20231025140205.3586473-3-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When adding a mount to a namespace insert it into an rbtree rooted in the mnt_namespace instead of a linear list. The mnt.mnt_list is still used to set up the mount tree and for propagation, but not after the mount has been added to a namespace. Hence mnt_list can live in union with rb_node. Use MNT_ONRB mount flag to validate that the mount is on the correct list. This allows removing the cursor used for reading /proc/$PID/mountinfo. The mnt_id_unique of the next mount can be used as an index into the seq file. Tested by inserting 100k bind mounts, unsharing the mount namespace, and unmounting. No performance regressions have been observed. For the last mount in the 100k list the statmount() call was more than 100x faster due to the mount ID lookup not having to do a linear search. This patch makes the overhead of mount ID lookup non-observable in this range. Signed-off-by: Miklos Szeredi --- fs/mount.h | 24 +++--- fs/namespace.c | 190 ++++++++++++++++++++---------------------- fs/pnode.c | 2 +- fs/proc_namespace.c | 3 - include/linux/mount.h | 5 +- 5 files changed, 106 insertions(+), 118 deletions(-) diff --git a/fs/mount.h b/fs/mount.h index a14f762b3f29..4a42fc68f4cc 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -8,19 +8,13 @@ struct mnt_namespace { struct ns_common ns; struct mount * root; - /* - * Traversal and modification of .list is protected by either - * - taking namespace_sem for write, OR - * - taking namespace_sem for read AND taking .ns_lock. - */ - struct list_head list; - spinlock_t ns_lock; + struct rb_root mounts; /* Protected by namespace_sem */ struct user_namespace *user_ns; struct ucounts *ucounts; u64 seq; /* Sequence number to prevent loops */ wait_queue_head_t poll; u64 event; - unsigned int mounts; /* # of mounts in the namespace */ + unsigned int nr_mounts; /* # of mounts in the namespace */ unsigned int pending_mounts; } __randomize_layout; @@ -55,7 +49,10 @@ struct mount { struct list_head mnt_child; /* and going through their mnt_child */ struct list_head mnt_instance; /* mount instance on sb->s_mounts */ const char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */ - struct list_head mnt_list; + union { + struct rb_node mnt_node; /* Under ns->mounts */ + struct list_head mnt_list; + }; struct list_head mnt_expire; /* link in fs-specific expiry list */ struct list_head mnt_share; /* circular list of shared mounts */ struct list_head mnt_slave_list;/* list of slave mounts */ @@ -128,7 +125,6 @@ struct proc_mounts { struct mnt_namespace *ns; struct path root; int (*show)(struct seq_file *, struct vfsmount *); - struct mount cursor; }; extern const struct seq_operations mounts_op; @@ -147,4 +143,12 @@ static inline bool is_anon_ns(struct mnt_namespace *ns) return ns->seq == 0; } +static inline void move_from_ns(struct mount *mnt, struct list_head *dt_list) +{ + WARN_ON(!(mnt->mnt.mnt_flags & MNT_ONRB)); + mnt->mnt.mnt_flags &= ~MNT_ONRB; + rb_erase(&mnt->mnt_node, &mnt->mnt_ns->mounts); + list_add_tail(&mnt->mnt_list, dt_list); +} + extern void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor); diff --git a/fs/namespace.c b/fs/namespace.c index e02bc5f41c7b..0eab47ffc76c 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -732,21 +732,6 @@ struct vfsmount *lookup_mnt(const struct path *path) return m; } -static inline void lock_ns_list(struct mnt_namespace *ns) -{ - spin_lock(&ns->ns_lock); -} - -static inline void unlock_ns_list(struct mnt_namespace *ns) -{ - spin_unlock(&ns->ns_lock); -} - -static inline bool mnt_is_cursor(struct mount *mnt) -{ - return mnt->mnt.mnt_flags & MNT_CURSOR; -} - /* * __is_local_mountpoint - Test to see if dentry is a mountpoint in the * current mount namespace. @@ -765,19 +750,15 @@ static inline bool mnt_is_cursor(struct mount *mnt) bool __is_local_mountpoint(struct dentry *dentry) { struct mnt_namespace *ns = current->nsproxy->mnt_ns; - struct mount *mnt; + struct mount *mnt, *n; bool is_covered = false; down_read(&namespace_sem); - lock_ns_list(ns); - list_for_each_entry(mnt, &ns->list, mnt_list) { - if (mnt_is_cursor(mnt)) - continue; + rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) { is_covered = (mnt->mnt_mountpoint == dentry); if (is_covered) break; } - unlock_ns_list(ns); up_read(&namespace_sem); return is_covered; @@ -1024,6 +1005,30 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct m mnt_add_count(old_parent, -1); } +static inline struct mount *node_to_mount(struct rb_node *node) +{ + return rb_entry(node, struct mount, mnt_node); +} + +static void mnt_add_to_ns(struct mnt_namespace *ns, struct mount *mnt) +{ + struct rb_node **link = &ns->mounts.rb_node; + struct rb_node *parent = NULL; + + WARN_ON(mnt->mnt.mnt_flags & MNT_ONRB); + mnt->mnt_ns = ns; + while (*link) { + parent = *link; + if (mnt->mnt_id_unique < node_to_mount(parent)->mnt_id_unique) + link = &parent->rb_left; + else + link = &parent->rb_right; + } + rb_link_node(&mnt->mnt_node, parent, link); + rb_insert_color(&mnt->mnt_node, &ns->mounts); + mnt->mnt.mnt_flags |= MNT_ONRB; +} + /* * vfsmount lock must be held for write */ @@ -1037,12 +1042,13 @@ static void commit_tree(struct mount *mnt) BUG_ON(parent == mnt); list_add_tail(&head, &mnt->mnt_list); - list_for_each_entry(m, &head, mnt_list) - m->mnt_ns = n; + while (!list_empty(&head)) { + m = list_first_entry(&head, typeof(*m), mnt_list); + list_del(&m->mnt_list); - list_splice(&head, n->list.prev); - - n->mounts += n->pending_mounts; + mnt_add_to_ns(n, m); + } + n->nr_mounts += n->pending_mounts; n->pending_mounts = 0; __attach_mnt(mnt, parent); @@ -1190,7 +1196,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root, } mnt->mnt.mnt_flags = old->mnt.mnt_flags; - mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL); + mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL|MNT_ONRB); atomic_inc(&sb->s_active); mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt)); @@ -1415,65 +1421,57 @@ struct vfsmount *mnt_clone_internal(const struct path *path) return &p->mnt; } -#ifdef CONFIG_PROC_FS -static struct mount *mnt_list_next(struct mnt_namespace *ns, - struct list_head *p) +/* + * Returns the mount which either has the specified mnt_id, or has the next + * smallest id afer the specified one. + */ +static struct mount *mnt_find_id_at(struct mnt_namespace *ns, u64 mnt_id) { - struct mount *mnt, *ret = NULL; + struct rb_node *node = ns->mounts.rb_node; + struct mount *ret = NULL; - lock_ns_list(ns); - list_for_each_continue(p, &ns->list) { - mnt = list_entry(p, typeof(*mnt), mnt_list); - if (!mnt_is_cursor(mnt)) { - ret = mnt; - break; + while (node) { + struct mount *m = node_to_mount(node); + + if (mnt_id <= m->mnt_id_unique) { + ret = node_to_mount(node); + if (mnt_id == m->mnt_id_unique) + break; + node = node->rb_left; + } else { + node = node->rb_right; } } - unlock_ns_list(ns); - return ret; } +#ifdef CONFIG_PROC_FS + /* iterator; we want it to have access to namespace_sem, thus here... */ static void *m_start(struct seq_file *m, loff_t *pos) { struct proc_mounts *p = m->private; - struct list_head *prev; down_read(&namespace_sem); - if (!*pos) { - prev = &p->ns->list; - } else { - prev = &p->cursor.mnt_list; - /* Read after we'd reached the end? */ - if (list_empty(prev)) - return NULL; - } - - return mnt_list_next(p->ns, prev); + return mnt_find_id_at(p->ns, *pos); } static void *m_next(struct seq_file *m, void *v, loff_t *pos) { - struct proc_mounts *p = m->private; - struct mount *mnt = v; + struct mount *next = NULL, *mnt = v; + struct rb_node *node = rb_next(&mnt->mnt_node); ++*pos; - return mnt_list_next(p->ns, &mnt->mnt_list); + if (node) { + next = node_to_mount(node); + *pos = next->mnt_id_unique; + } + return next; } static void m_stop(struct seq_file *m, void *v) { - struct proc_mounts *p = m->private; - struct mount *mnt = v; - - lock_ns_list(p->ns); - if (mnt) - list_move_tail(&p->cursor.mnt_list, &mnt->mnt_list); - else - list_del_init(&p->cursor.mnt_list); - unlock_ns_list(p->ns); up_read(&namespace_sem); } @@ -1491,14 +1489,6 @@ const struct seq_operations mounts_op = { .show = m_show, }; -void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor) -{ - down_read(&namespace_sem); - lock_ns_list(ns); - list_del(&cursor->mnt_list); - unlock_ns_list(ns); - up_read(&namespace_sem); -} #endif /* CONFIG_PROC_FS */ /** @@ -1640,7 +1630,10 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how) /* Gather the mounts to umount */ for (p = mnt; p; p = next_mnt(p, mnt)) { p->mnt.mnt_flags |= MNT_UMOUNT; - list_move(&p->mnt_list, &tmp_list); + if (p->mnt.mnt_flags & MNT_ONRB) + move_from_ns(p, &tmp_list); + else + list_move(&p->mnt_list, &tmp_list); } /* Hide the mounts from mnt_mounts */ @@ -1660,7 +1653,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how) list_del_init(&p->mnt_list); ns = p->mnt_ns; if (ns) { - ns->mounts--; + ns->nr_mounts--; __touch_mnt_namespace(ns); } p->mnt_ns = NULL; @@ -1786,14 +1779,16 @@ static int do_umount(struct mount *mnt, int flags) event++; if (flags & MNT_DETACH) { - if (!list_empty(&mnt->mnt_list)) + if (mnt->mnt.mnt_flags & MNT_ONRB || + !list_empty(&mnt->mnt_list)) umount_tree(mnt, UMOUNT_PROPAGATE); retval = 0; } else { shrink_submounts(mnt); retval = -EBUSY; if (!propagate_mount_busy(mnt, 2)) { - if (!list_empty(&mnt->mnt_list)) + if (mnt->mnt.mnt_flags & MNT_ONRB || + !list_empty(&mnt->mnt_list)) umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC); retval = 0; } @@ -2211,9 +2206,9 @@ int count_mounts(struct mnt_namespace *ns, struct mount *mnt) unsigned int mounts = 0; struct mount *p; - if (ns->mounts >= max) + if (ns->nr_mounts >= max) return -ENOSPC; - max -= ns->mounts; + max -= ns->nr_mounts; if (ns->pending_mounts >= max) return -ENOSPC; max -= ns->pending_mounts; @@ -2357,8 +2352,12 @@ static int attach_recursive_mnt(struct mount *source_mnt, touch_mnt_namespace(source_mnt->mnt_ns); } else { if (source_mnt->mnt_ns) { + LIST_HEAD(head); + /* move from anon - the caller will destroy */ - list_del_init(&source_mnt->mnt_ns->list); + for (p = source_mnt; p; p = next_mnt(p, source_mnt)) + move_from_ns(p, &head); + list_del_init(&head); } if (beneath) mnt_set_mountpoint_beneath(source_mnt, top_mnt, smp); @@ -2669,11 +2668,10 @@ static struct file *open_detached_copy(struct path *path, bool recursive) lock_mount_hash(); for (p = mnt; p; p = next_mnt(p, mnt)) { - p->mnt_ns = ns; - ns->mounts++; + mnt_add_to_ns(ns, p); + ns->nr_mounts++; } ns->root = mnt; - list_add_tail(&ns->list, &mnt->mnt_list); mntget(&mnt->mnt); unlock_mount_hash(); namespace_unlock(); @@ -3736,9 +3734,8 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a if (!anon) new_ns->seq = atomic64_add_return(1, &mnt_ns_seq); refcount_set(&new_ns->ns.count, 1); - INIT_LIST_HEAD(&new_ns->list); + new_ns->mounts = RB_ROOT; init_waitqueue_head(&new_ns->poll); - spin_lock_init(&new_ns->ns_lock); new_ns->user_ns = get_user_ns(user_ns); new_ns->ucounts = ucounts; return new_ns; @@ -3785,7 +3782,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, unlock_mount_hash(); } new_ns->root = new; - list_add_tail(&new_ns->list, &new->mnt_list); /* * Second pass: switch the tsk->fs->* elements and mark new vfsmounts @@ -3795,8 +3791,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, p = old; q = new; while (p) { - q->mnt_ns = new_ns; - new_ns->mounts++; + mnt_add_to_ns(new_ns, q); + new_ns->nr_mounts++; if (new_fs) { if (&p->mnt == new_fs->root.mnt) { new_fs->root.mnt = mntget(&q->mnt); @@ -3838,10 +3834,9 @@ struct dentry *mount_subtree(struct vfsmount *m, const char *name) mntput(m); return ERR_CAST(ns); } - mnt->mnt_ns = ns; ns->root = mnt; - ns->mounts++; - list_add(&mnt->mnt_list, &ns->list); + ns->nr_mounts++; + mnt_add_to_ns(ns, mnt); err = vfs_path_lookup(m->mnt_root, m, name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &path); @@ -4019,10 +4014,9 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, goto err_path; } mnt = real_mount(newmount.mnt); - mnt->mnt_ns = ns; ns->root = mnt; - ns->mounts = 1; - list_add(&mnt->mnt_list, &ns->list); + ns->nr_mounts = 1; + mnt_add_to_ns(ns, mnt); mntget(newmount.mnt); /* Attach to an apparent O_PATH fd with a note that we need to unmount @@ -4693,10 +4687,9 @@ static void __init init_mount_tree(void) if (IS_ERR(ns)) panic("Can't allocate initial namespace"); m = real_mount(mnt); - m->mnt_ns = ns; ns->root = m; - ns->mounts = 1; - list_add(&m->mnt_list, &ns->list); + ns->nr_mounts = 1; + mnt_add_to_ns(ns, m); init_task.nsproxy->mnt_ns = ns; get_mnt_ns(ns); @@ -4823,18 +4816,14 @@ static bool mnt_already_visible(struct mnt_namespace *ns, int *new_mnt_flags) { int new_flags = *new_mnt_flags; - struct mount *mnt; + struct mount *mnt, *n; bool visible = false; down_read(&namespace_sem); - lock_ns_list(ns); - list_for_each_entry(mnt, &ns->list, mnt_list) { + rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) { struct mount *child; int mnt_flags; - if (mnt_is_cursor(mnt)) - continue; - if (mnt->mnt.mnt_sb->s_type != sb->s_type) continue; @@ -4882,7 +4871,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns, next: ; } found: - unlock_ns_list(ns); up_read(&namespace_sem); return visible; } diff --git a/fs/pnode.c b/fs/pnode.c index e4d0340393d5..a799e0315cc9 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -468,7 +468,7 @@ static void umount_one(struct mount *mnt, struct list_head *to_umount) mnt->mnt.mnt_flags |= MNT_UMOUNT; list_del_init(&mnt->mnt_child); list_del_init(&mnt->mnt_umounting); - list_move_tail(&mnt->mnt_list, to_umount); + move_from_ns(mnt, to_umount); } /* diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 250eb5bf7b52..73d2274d5f59 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -283,8 +283,6 @@ static int mounts_open_common(struct inode *inode, struct file *file, p->ns = ns; p->root = root; p->show = show; - INIT_LIST_HEAD(&p->cursor.mnt_list); - p->cursor.mnt.mnt_flags = MNT_CURSOR; return 0; @@ -301,7 +299,6 @@ static int mounts_release(struct inode *inode, struct file *file) struct seq_file *m = file->private_data; struct proc_mounts *p = m->private; path_put(&p->root); - mnt_cursor_del(p->ns, &p->cursor); put_mnt_ns(p->ns); return seq_release_private(inode, file); } diff --git a/include/linux/mount.h b/include/linux/mount.h index 4f40b40306d0..7952eddc835c 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -50,8 +50,7 @@ struct path; #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME ) #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \ - MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED | \ - MNT_CURSOR) + MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED | MNT_ONRB) #define MNT_INTERNAL 0x4000 @@ -65,7 +64,7 @@ struct path; #define MNT_SYNC_UMOUNT 0x2000000 #define MNT_MARKED 0x4000000 #define MNT_UMOUNT 0x8000000 -#define MNT_CURSOR 0x10000000 +#define MNT_ONRB 0x10000000 struct vfsmount { struct dentry *mnt_root; /* root of the mounted tree */ From patchwork Wed Oct 25 14:02:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436213 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E13AF28E02 for ; Wed, 25 Oct 2023 14:02:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gT2LfIBM" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 63A1C193 for ; Wed, 25 Oct 2023 07:02:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242534; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oOOGhallOPxnWGST34MdPR6LpOdkhm6q6XmykKHjQ0c=; b=gT2LfIBMmQ8sCCKPzUxLcI7wq6s2aIyE98U06NzNmSIBPseLsePeYNdg0v+gQLgIfTPJrB /1l4qyt0AHF8g42FLc/sqvnLyp8/2gdWLk02P+24+3DCkyTCA+TMMuuWnwhrc6PHOpY4Iw YPsKfm28BeQ4F8Gu4THGbG0qUtFTkh8= Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-529-6CrO1vKKPAa9FzFD3GOaPA-1; Wed, 25 Oct 2023 10:02:13 -0400 X-MC-Unique: 6CrO1vKKPAa9FzFD3GOaPA-1 Received: by mail-ej1-f69.google.com with SMTP id a640c23a62f3a-9c7f0a33afbso288770066b.3 for ; Wed, 25 Oct 2023 07:02:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242531; x=1698847331; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oOOGhallOPxnWGST34MdPR6LpOdkhm6q6XmykKHjQ0c=; b=cZ1Ae00DVnTBhERBkTsl0Ih+Eu0Y3Z+0WdhaXLibgKdh4ZZmnXre8JqHzAFB/e+xgS UkT+pVBSKb6MWkTIC7aOx7RAH5aiqqryzheLcDPUaiR8tgohcbUKcXDuyVm8tNYups/C sT338QM1aZlHYZZ/kaApxmKHDlJWdnpqbUlBoLB3tWjT2mhfzoc55yvA20Qi7q3sG0zk xoI1LgScADzF4lDm5DysqPuPTNkn4Ceo55yySZVUTmcb1kIX9WJyQgBjRyMHPMjR92Hp EzH9pnKRWYlnMAN6SzgcQfNT6g3k0L+1NUkNP+QbwjpvRjTx+lIYUZ8FY9enJXck2XlS 7SXw== X-Gm-Message-State: AOJu0Yzz5YyNY7NJtgNGHI2uYxfYyRsFpOHzQ8f0dYFus9ag8l7I52IK KUaSNYCahUQIdZhBj6mF8EUi+zYe9eyv254g2AT9A7flIoBIVF6S8MxDFRO8/W+CL7n+qGXO30d m8a0O1INh9RpnmaeaHNO0/vtffO3f5RtEBOHp3CbxrKd6ABa+tA0IPSxp0dGephjcUcRL0D5PmO sjykLjGIJUNw== X-Received: by 2002:a17:907:7214:b0:9c5:64f2:eaba with SMTP id dr20-20020a170907721400b009c564f2eabamr12173178ejc.53.1698242531705; Wed, 25 Oct 2023 07:02:11 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGjFyLKoZkQVe5WJ8FMZvmHP9TlGwCthXsPC2ClyGzG4V59SQ3WuYohHWjLbekPtDQTXUhMwQ== X-Received: by 2002:a17:907:7214:b0:9c5:64f2:eaba with SMTP id dr20-20020a170907721400b009c564f2eabamr12173147ejc.53.1698242531456; Wed, 25 Oct 2023 07:02:11 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:10 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 3/6] namespace: extract show_path() helper Date: Wed, 25 Oct 2023 16:02:01 +0200 Message-ID: <20231025140205.3586473-4-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To be used by the statmount(2) syscall as well. Signed-off-by: Miklos Szeredi --- fs/internal.h | 2 ++ fs/namespace.c | 9 +++++++++ fs/proc_namespace.c | 10 +++------- 3 files changed, 14 insertions(+), 7 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index d64ae03998cc..0c4f4cf2ff5a 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -83,6 +83,8 @@ int path_mount(const char *dev_name, struct path *path, const char *type_page, unsigned long flags, void *data_page); int path_umount(struct path *path, int flags); +int show_path(struct seq_file *m, struct dentry *root); + /* * fs_struct.c */ diff --git a/fs/namespace.c b/fs/namespace.c index 0eab47ffc76c..7a33ea391a02 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -4672,6 +4672,15 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path, return err; } +int show_path(struct seq_file *m, struct dentry *root) +{ + if (root->d_sb->s_op->show_path) + return root->d_sb->s_op->show_path(m, root); + + seq_dentry(m, root, " \t\n\\"); + return 0; +} + static void __init init_mount_tree(void) { struct vfsmount *mnt; diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 73d2274d5f59..0a808951b7d3 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -142,13 +142,9 @@ static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt) seq_printf(m, "%i %i %u:%u ", r->mnt_id, r->mnt_parent->mnt_id, MAJOR(sb->s_dev), MINOR(sb->s_dev)); - if (sb->s_op->show_path) { - err = sb->s_op->show_path(m, mnt->mnt_root); - if (err) - goto out; - } else { - seq_dentry(m, mnt->mnt_root, " \t\n\\"); - } + err = show_path(m, mnt->mnt_root); + if (err) + goto out; seq_putc(m, ' '); /* mountpoints outside of chroot jail will give SEQ_SKIP on this */ From patchwork Wed Oct 25 14:02:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436218 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B05628E01 for ; Wed, 25 Oct 2023 14:02:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GQRkPIW0" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 113D5D5C for ; Wed, 25 Oct 2023 07:02:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242556; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XEx2f+3WyjfEN8nno/L3GWupAAdRJt4O0xkLy6vl9ko=; b=GQRkPIW0vpmcAzdgGeNrqFcR3d5SBjsJA2ne+zNlQavgDcjaunfUfcw4EefY6Xur9KOUoV mA39vhq+r6oAGAg8gBQkhoApdgUS5Poqi/mZr4oGGLJScLdl2IZeazXYIpZbYUGQLJ87pL p40QS/PRTvaIGvIwx08QeWIlGAVPf6Q= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-99-w-WDITc0PDiYBlzVJB0TzA-1; Wed, 25 Oct 2023 10:02:15 -0400 X-MC-Unique: w-WDITc0PDiYBlzVJB0TzA-1 Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-9c983b42c3bso96765766b.1 for ; Wed, 25 Oct 2023 07:02:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242534; x=1698847334; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XEx2f+3WyjfEN8nno/L3GWupAAdRJt4O0xkLy6vl9ko=; b=wJh+io55VhmTad7uMsovsrwu6VWW8h+sACSN0Or0QBGTmKyRNXTMGNMbzZvSr+qF1x h8m+jH7FKRqX5QVKUpBQsCK78D4rqk8m3qMEBB7tY5YMu2dUhinOA4wN9xo5FdjD96G/ TUipogJ5gGSMobiaRjglaMnfjbxxHypbpeB148pgIW5iGwGDVOvajKePN8tMm0SsGFt9 +mYV16hbuN6CT3PLWUFFg4t+uD+jlNBuVm6w6K2EQrcBTA3AHUkivwXYmPZqaiW2DhXp EP+zP3SdQpVjP1MfZUm4UgX5D17lFR4mA/zOTIaQuC+HUOyVwCqkzBBCnDOY1rTv7ewm OHYQ== X-Gm-Message-State: AOJu0YxNimGxxo1acHk3r02AdyvXAtG3H9yWm1Ka8yDOYc1ULafBnlRd odk8qMkr9OE0EwdOm5Godz3UI4txVLzfSZ4u9cVqsMbWM6WJ3DSk+wt159Nnd7Op7y+PUPVT7iF eYZINi5RPs7DZfgpY3I6mPzYB7KKRAtoM5OkNJzhPsmqwKwWYssp3TL2InHejjZ5elXIqucLtoS lrURDmmRTjPA== X-Received: by 2002:a17:907:97cc:b0:9a5:9f3c:961e with SMTP id js12-20020a17090797cc00b009a59f3c961emr14833605ejc.18.1698242533518; Wed, 25 Oct 2023 07:02:13 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFmce8EbgYDdhdsywUowDHi/7yLCcYQCCY7TtCnZTLmu1xGtDmupYnM1RFd/HemJA2KX7UkXg== X-Received: by 2002:a17:907:97cc:b0:9a5:9f3c:961e with SMTP id js12-20020a17090797cc00b009a59f3c961emr14833549ejc.18.1698242532906; Wed, 25 Oct 2023 07:02:12 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:12 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 4/6] add statmount(2) syscall Date: Wed, 25 Oct 2023 16:02:02 +0200 Message-ID: <20231025140205.3586473-5-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add a way to query attributes of a single mount instead of having to parse the complete /proc/$PID/mountinfo, which might be huge. Lookup the mount the new 64bit mount ID. If a mount needs to be queried based on path, then statx(2) can be used to first query the mount ID belonging to the path. Design is based on a suggestion by Linus: "So I'd suggest something that is very much like "statfsat()", which gets a buffer and a length, and returns an extended "struct statfs" *AND* just a string description at the end." The interface closely mimics that of statx. Handle ASCII attributes by appending after the end of the structure (as per above suggestion). Pointers to strings are stored in u64 members to make the structure the same regardless of pointer size. Strings are nul terminated. Link: https://lore.kernel.org/all/CAHk-=wh5YifP7hzKSbwJj94+DZ2czjrZsczy6GBimiogZws=rg@mail.gmail.com/ Signed-off-by: Miklos Szeredi --- fs/namespace.c | 277 +++++++++++++++++++++++++++++++++++++ include/linux/syscalls.h | 5 + include/uapi/linux/mount.h | 56 ++++++++ 3 files changed, 338 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index 7a33ea391a02..a980c250a3a6 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -4681,6 +4681,283 @@ int show_path(struct seq_file *m, struct dentry *root) return 0; } +static struct vfsmount *lookup_mnt_in_ns(u64 id, struct mnt_namespace *ns) +{ + struct mount *mnt = mnt_find_id_at(ns, id); + + if (!mnt || mnt->mnt_id_unique != id) + return NULL; + + return &mnt->mnt; +} + +struct stmt_state { + struct statmnt __user *const buf; + size_t const bufsize; + struct vfsmount *const mnt; + u64 const mask; + struct seq_file seq; + struct path root; + struct statmnt sm; + size_t pos; + int err; +}; + +typedef int (*stmt_func_t)(struct stmt_state *); + +static int stmt_string_seq(struct stmt_state *s, stmt_func_t func) +{ + size_t rem = s->bufsize - s->pos - sizeof(s->sm); + struct seq_file *seq = &s->seq; + int ret; + + seq->count = 0; + seq->size = min(seq->size, rem); + seq->buf = kvmalloc(seq->size, GFP_KERNEL_ACCOUNT); + if (!seq->buf) + return -ENOMEM; + + ret = func(s); + if (ret) + return ret; + + if (seq_has_overflowed(seq)) { + if (seq->size == rem) + return -EOVERFLOW; + seq->size *= 2; + if (seq->size > MAX_RW_COUNT) + return -ENOMEM; + kvfree(seq->buf); + return 0; + } + + /* Done */ + return 1; +} + +static void stmt_string(struct stmt_state *s, u64 mask, stmt_func_t func, + u32 *str) +{ + int ret = s->pos + sizeof(s->sm) >= s->bufsize ? -EOVERFLOW : 0; + struct statmnt *sm = &s->sm; + struct seq_file *seq = &s->seq; + + if (s->err || !(s->mask & mask)) + return; + + seq->size = PAGE_SIZE; + while (!ret) + ret = stmt_string_seq(s, func); + + if (ret < 0) { + s->err = ret; + } else { + seq->buf[seq->count++] = '\0'; + if (copy_to_user(s->buf->str + s->pos, seq->buf, seq->count)) { + s->err = -EFAULT; + } else { + *str = s->pos; + s->pos += seq->count; + } + } + kvfree(seq->buf); + sm->mask |= mask; +} + +static void stmt_numeric(struct stmt_state *s, u64 mask, stmt_func_t func) +{ + if (s->err || !(s->mask & mask)) + return; + + s->err = func(s); + s->sm.mask |= mask; +} + +static u64 mnt_to_attr_flags(struct vfsmount *mnt) +{ + unsigned int mnt_flags = READ_ONCE(mnt->mnt_flags); + u64 attr_flags = 0; + + if (mnt_flags & MNT_READONLY) + attr_flags |= MOUNT_ATTR_RDONLY; + if (mnt_flags & MNT_NOSUID) + attr_flags |= MOUNT_ATTR_NOSUID; + if (mnt_flags & MNT_NODEV) + attr_flags |= MOUNT_ATTR_NODEV; + if (mnt_flags & MNT_NOEXEC) + attr_flags |= MOUNT_ATTR_NOEXEC; + if (mnt_flags & MNT_NODIRATIME) + attr_flags |= MOUNT_ATTR_NODIRATIME; + if (mnt_flags & MNT_NOSYMFOLLOW) + attr_flags |= MOUNT_ATTR_NOSYMFOLLOW; + + if (mnt_flags & MNT_NOATIME) + attr_flags |= MOUNT_ATTR_NOATIME; + else if (mnt_flags & MNT_RELATIME) + attr_flags |= MOUNT_ATTR_RELATIME; + else + attr_flags |= MOUNT_ATTR_STRICTATIME; + + if (is_idmapped_mnt(mnt)) + attr_flags |= MOUNT_ATTR_IDMAP; + + return attr_flags; +} + +static u64 mnt_to_propagation_flags(struct mount *m) +{ + u64 propagation = 0; + + if (IS_MNT_SHARED(m)) + propagation |= MS_SHARED; + if (IS_MNT_SLAVE(m)) + propagation |= MS_SLAVE; + if (IS_MNT_UNBINDABLE(m)) + propagation |= MS_UNBINDABLE; + if (!propagation) + propagation |= MS_PRIVATE; + + return propagation; +} + +static int stmt_sb_basic(struct stmt_state *s) +{ + struct super_block *sb = s->mnt->mnt_sb; + + s->sm.sb_dev_major = MAJOR(sb->s_dev); + s->sm.sb_dev_minor = MINOR(sb->s_dev); + s->sm.sb_magic = sb->s_magic; + s->sm.sb_flags = sb->s_flags & (SB_RDONLY|SB_SYNCHRONOUS|SB_DIRSYNC|SB_LAZYTIME); + + return 0; +} + +static int stmt_mnt_basic(struct stmt_state *s) +{ + struct mount *m = real_mount(s->mnt); + + s->sm.mnt_id = m->mnt_id_unique; + s->sm.mnt_parent_id = m->mnt_parent->mnt_id_unique; + s->sm.mnt_id_old = m->mnt_id; + s->sm.mnt_parent_id_old = m->mnt_parent->mnt_id; + s->sm.mnt_attr = mnt_to_attr_flags(&m->mnt); + s->sm.mnt_propagation = mnt_to_propagation_flags(m); + s->sm.mnt_peer_group = IS_MNT_SHARED(m) ? m->mnt_group_id : 0; + s->sm.mnt_master = IS_MNT_SLAVE(m) ? m->mnt_master->mnt_group_id : 0; + + return 0; +} + +static int stmt_propagate_from(struct stmt_state *s) +{ + struct mount *m = real_mount(s->mnt); + + if (!IS_MNT_SLAVE(m)) + return 0; + + s->sm.propagate_from = get_dominating_id(m, ¤t->fs->root); + + return 0; +} + +static int stmt_mnt_root(struct stmt_state *s) +{ + struct seq_file *seq = &s->seq; + int err = show_path(seq, s->mnt->mnt_root); + + if (!err && !seq_has_overflowed(seq)) { + seq->buf[seq->count] = '\0'; + seq->count = string_unescape_inplace(seq->buf, UNESCAPE_OCTAL); + } + return err; +} + +static int stmt_mnt_point(struct stmt_state *s) +{ + struct vfsmount *mnt = s->mnt; + struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt }; + int err = seq_path_root(&s->seq, &mnt_path, &s->root, ""); + + return err == SEQ_SKIP ? 0 : err; +} + +static int stmt_fs_type(struct stmt_state *s) +{ + struct seq_file *seq = &s->seq; + struct super_block *sb = s->mnt->mnt_sb; + + seq_puts(seq, sb->s_type->name); + return 0; +} + +static int do_statmount(struct stmt_state *s) +{ + struct statmnt *sm = &s->sm; + struct mount *m = real_mount(s->mnt); + size_t copysize = min_t(size_t, s->bufsize, sizeof(*sm)); + int err; + + err = security_sb_statfs(s->mnt->mnt_root); + if (err) + return err; + + if (!capable(CAP_SYS_ADMIN) && + !is_path_reachable(m, m->mnt.mnt_root, &s->root)) + return -EPERM; + + stmt_numeric(s, STMT_SB_BASIC, stmt_sb_basic); + stmt_numeric(s, STMT_MNT_BASIC, stmt_mnt_basic); + stmt_numeric(s, STMT_PROPAGATE_FROM, stmt_propagate_from); + stmt_string(s, STMT_FS_TYPE, stmt_fs_type, &sm->fs_type); + stmt_string(s, STMT_MNT_ROOT, stmt_mnt_root, &sm->mnt_root); + stmt_string(s, STMT_MNT_POINT, stmt_mnt_point, &sm->mnt_point); + + if (s->err) + return s->err; + + /* Return the number of bytes copied to the buffer */ + sm->size = copysize + s->pos; + + if (copy_to_user(s->buf, sm, copysize)) + return -EFAULT; + + return 0; +} + +SYSCALL_DEFINE4(statmount, const struct __mount_arg __user *, req, + struct statmnt __user *, buf, size_t, bufsize, + unsigned int, flags) +{ + struct vfsmount *mnt; + struct __mount_arg kreq; + int ret; + + if (flags) + return -EINVAL; + + if (copy_from_user(&kreq, req, sizeof(kreq))) + return -EFAULT; + + down_read(&namespace_sem); + mnt = lookup_mnt_in_ns(kreq.mnt_id, current->nsproxy->mnt_ns); + ret = -ENOENT; + if (mnt) { + struct stmt_state s = { + .mask = kreq.request_mask, + .buf = buf, + .bufsize = bufsize, + .mnt = mnt, + }; + + get_fs_root(current->fs, &s.root); + ret = do_statmount(&s); + path_put(&s.root); + } + up_read(&namespace_sem); + + return ret; +} + static void __init init_mount_tree(void) { struct vfsmount *mnt; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 22bc6bc147f8..ba371024d902 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -74,6 +74,8 @@ struct landlock_ruleset_attr; enum landlock_rule_type; struct cachestat_range; struct cachestat; +struct statmnt; +struct __mount_arg; #include #include @@ -408,6 +410,9 @@ asmlinkage long sys_statfs64(const char __user *path, size_t sz, asmlinkage long sys_fstatfs(unsigned int fd, struct statfs __user *buf); asmlinkage long sys_fstatfs64(unsigned int fd, size_t sz, struct statfs64 __user *buf); +asmlinkage long sys_statmount(const struct __mount_arg __user *req, + struct statmnt __user *buf, size_t bufsize, + unsigned int flags); asmlinkage long sys_truncate(const char __user *path, long length); asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length); #if BITS_PER_LONG == 32 diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index bb242fdcfe6b..d2c988ab526b 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -138,4 +138,60 @@ struct mount_attr { /* List of all mount_attr versions. */ #define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */ + +/* + * Structure for getting mount/superblock/filesystem info with statmount(2). + * + * The interface is similar to statx(2): individual fields or groups can be + * selected with the @mask argument of statmount(). Kernel will set the @mask + * field according to the supported fields. + * + * If string fields are selected, then the caller needs to pass a buffer that + * has space after the fixed part of the structure. Nul terminated strings are + * copied there and offsets relative to @str are stored in the relevant fields. + * If the buffer is too small, then EOVERFLOW is returned. The actually used + * size is returned in @size. + */ +struct statmnt { + __u32 size; /* Total size, including strings */ + __u32 __spare1; + __u64 mask; /* What results were written */ + __u32 sb_dev_major; /* Device ID */ + __u32 sb_dev_minor; + __u64 sb_magic; /* ..._SUPER_MAGIC */ + __u32 sb_flags; /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */ + __u32 fs_type; /* [str] Filesystem type */ + __u64 mnt_id; /* Unique ID of mount */ + __u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */ + __u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */ + __u32 mnt_parent_id_old; + __u64 mnt_attr; /* MOUNT_ATTR_... */ + __u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */ + __u64 mnt_peer_group; /* ID of shared peer group */ + __u64 mnt_master; /* Mount receives propagation from this ID */ + __u64 propagate_from; /* Propagation from in current namespace */ + __u32 mnt_root; /* [str] Root of mount relative to root of fs */ + __u32 mnt_point; /* [str] Mountpoint relative to current root */ + __u64 __spare2[50]; + char str[]; /* Variable size part containing strings */ +}; + +/* + * To be used on the kernel ABI only for passing 64bit arguments to statmount(2) + */ +struct __mount_arg { + __u64 mnt_id; + __u64 request_mask; +}; + +/* + * @mask bits for statmount(2) + */ +#define STMT_SB_BASIC 0x00000001U /* Want/got sb_... */ +#define STMT_MNT_BASIC 0x00000002U /* Want/got mnt_... */ +#define STMT_PROPAGATE_FROM 0x00000004U /* Want/got propagate_from */ +#define STMT_MNT_ROOT 0x00000008U /* Want/got mnt_root */ +#define STMT_MNT_POINT 0x00000010U /* Want/got mnt_point */ +#define STMT_FS_TYPE 0x00000020U /* Want/got fs_type */ + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Wed Oct 25 14:02:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436215 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E00D28E01 for ; Wed, 25 Oct 2023 14:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="LbosK9Rw" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BDC4F1B2 for ; Wed, 25 Oct 2023 07:02:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242543; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4bwGq/C+5APOsCqFrwpf0XoGoPuUJR4+QLSXHx9atCw=; b=LbosK9Rw7pwGZE6yeUZloVzMpDUjE31R+wRGDFnM8OEKtZa0vb98Z9KtaBGt46NjobP5Ht aX+6WFmaK3LgnMy9j1lXy2WCS0/r6pZi5cY0bfk8o8eadsWnZySSf4Z/XUJ1P6K3/BMFeX QEBS1APW5iJ5TQvxvLhPuZIcWjtdYXs= Received: from mail-ej1-f70.google.com (mail-ej1-f70.google.com [209.85.218.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-631-OONb5JzCNwG2661wPR2G4g-1; Wed, 25 Oct 2023 10:02:17 -0400 X-MC-Unique: OONb5JzCNwG2661wPR2G4g-1 Received: by mail-ej1-f70.google.com with SMTP id a640c23a62f3a-9c7f0a33afbso288779466b.3 for ; Wed, 25 Oct 2023 07:02:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242535; x=1698847335; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4bwGq/C+5APOsCqFrwpf0XoGoPuUJR4+QLSXHx9atCw=; b=BVqSpEhe0aFIPh7AmEbVylxglr3CeLwhFoXgbVy/fSUk+/SBgoM6s/b8/20fyFTWn3 tPehlekuqBBUu8TtiK0MPt4vI23k3A7+qXACHGm9QHIiO9uvSnnGoTMQ6JcZW+022GIN qk27rf79CJnt6HCTW5nTpzEuLk/uDr2A+LPHGebhL9vi5rugU9LTNTxsj8fWCugVgkw2 CKfqwHgeXw8i2ar/FZtcEn4PzA8Hsbe5T7xr5A/HbpA4WGabxP4sdReMIlW8ki+SKV36 AUhJQ3qLIaZpmyqGveoWJHm3bf5Gx3JOS2k3vPnEau2UuL77ZNdQIQMwiYvbtvrzDNUH e21Q== X-Gm-Message-State: AOJu0Yz+OWks9nAnxVf7cl42llE/ar0wTyU3aAvXUGcXKtlW+9l9UNmh zyaqcPTlgNCg13bZm6wpu0vCiVcIC9H4jgKqEzLR/3/kBfkBd0WMQWByZydtidZqmF3tKMjBMkU S8+CttpfaV/A6Z0N/n1KjgSJbNY1JCnzAd00YJES6MvO2DZCkrdzm01Jq1H85r3hynQgua/0IWd v3yS4rl5qHhQ== X-Received: by 2002:a17:906:3e54:b0:9c3:bb0e:d4c7 with SMTP id t20-20020a1709063e5400b009c3bb0ed4c7mr9498302eji.28.1698242535443; Wed, 25 Oct 2023 07:02:15 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFP25qBhJxT4qXiqjdZQb/HVco5wkgj/lhRNtDMYF4zMn2TOmKwTlFkjQyOsrNX64rUB3w+YQ== X-Received: by 2002:a17:906:3e54:b0:9c3:bb0e:d4c7 with SMTP id t20-20020a1709063e5400b009c3bb0ed4c7mr9498262eji.28.1698242534928; Wed, 25 Oct 2023 07:02:14 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:13 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 5/6] add listmount(2) syscall Date: Wed, 25 Oct 2023 16:02:03 +0200 Message-ID: <20231025140205.3586473-6-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add way to query the children of a particular mount. This is a more flexible way to iterate the mount tree than having to parse the complete /proc/self/mountinfo. Allow listing either - immediate child mounts only, or - recursively all descendant mounts (depth first). Lookup the mount by the new 64bit mount ID. If a mount needs to be queried based on path, then statx(2) can be used to first query the mount ID belonging to the path. Return an array of new (64bit) mount ID's. Without privileges only mounts are listed which are reachable from the task's root. Signed-off-by: Miklos Szeredi --- fs/namespace.c | 93 ++++++++++++++++++++++++++++++++++++++ include/linux/syscalls.h | 3 ++ include/uapi/linux/mount.h | 9 ++++ 3 files changed, 105 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index a980c250a3a6..0afe2344bba6 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -4958,6 +4958,99 @@ SYSCALL_DEFINE4(statmount, const struct __mount_arg __user *, req, return ret; } +static struct mount *listmnt_first(struct mount *root) +{ + return list_first_entry_or_null(&root->mnt_mounts, struct mount, mnt_child); +} + +static struct mount *listmnt_next(struct mount *curr, struct mount *root, bool recurse) +{ + if (recurse) + return next_mnt(curr, root); + if (!list_is_head(curr->mnt_child.next, &root->mnt_mounts)) + return list_next_entry(curr, mnt_child); + return NULL; +} + +static long do_listmount(struct vfsmount *mnt, u64 __user *buf, size_t bufsize, + const struct path *root, unsigned int flags) +{ + struct mount *r, *m = real_mount(mnt); + struct path rootmnt = { + .mnt = root->mnt, + .dentry = root->mnt->mnt_root + }; + long ctr = 0; + bool reachable_only = true; + bool recurse = flags & LISTMOUNT_RECURSIVE; + int err; + + err = security_sb_statfs(mnt->mnt_root); + if (err) + return err; + + if (flags & LISTMOUNT_UNREACHABLE) { + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + reachable_only = false; + } + + if (reachable_only && !is_path_reachable(m, mnt->mnt_root, &rootmnt)) + return capable(CAP_SYS_ADMIN) ? 0 : -EPERM; + + for (r = listmnt_first(m); r; r = listmnt_next(r, m, recurse)) { + if (reachable_only && + !is_path_reachable(r, r->mnt.mnt_root, root)) + continue; + + if (ctr >= bufsize) + return -EOVERFLOW; + if (put_user(r->mnt_id_unique, buf + ctr)) + return -EFAULT; + ctr++; + if (ctr < 0) + return -ERANGE; + } + return ctr; +} + +SYSCALL_DEFINE4(listmount, const struct __mount_arg __user *, req, + u64 __user *, buf, size_t, bufsize, unsigned int, flags) +{ + struct __mount_arg kreq; + struct vfsmount *mnt; + struct path root; + u64 mnt_id; + long err; + + if (flags & ~(LISTMOUNT_UNREACHABLE | LISTMOUNT_RECURSIVE)) + return -EINVAL; + + if (copy_from_user(&kreq, req, sizeof(kreq))) + return -EFAULT; + mnt_id = kreq.mnt_id; + + down_read(&namespace_sem); + if (mnt_id == LSMT_ROOT) + mnt = ¤t->nsproxy->mnt_ns->root->mnt; + else + mnt = lookup_mnt_in_ns(mnt_id, current->nsproxy->mnt_ns); + + err = -ENOENT; + if (mnt) { + get_fs_root(current->fs, &root); + /* Skip unreachable for LSMT_ROOT */ + if (mnt_id == LSMT_ROOT && !(flags & LISTMOUNT_UNREACHABLE)) + mnt = root.mnt; + err = do_listmount(mnt, buf, bufsize, &root, flags); + path_put(&root); + } + up_read(&namespace_sem); + + return err; +} + + static void __init init_mount_tree(void) { struct vfsmount *mnt; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index ba371024d902..38f3da7e04d1 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -413,6 +413,9 @@ asmlinkage long sys_fstatfs64(unsigned int fd, size_t sz, asmlinkage long sys_statmount(const struct __mount_arg __user *req, struct statmnt __user *buf, size_t bufsize, unsigned int flags); +asmlinkage long sys_listmount(const struct __mount_arg __user *req, + u64 __user *buf, size_t bufsize, + unsigned int flags); asmlinkage long sys_truncate(const char __user *path, long length); asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length); #if BITS_PER_LONG == 32 diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index d2c988ab526b..704c408cc662 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -194,4 +194,13 @@ struct __mount_arg { #define STMT_MNT_POINT 0x00000010U /* Want/got mnt_point */ #define STMT_FS_TYPE 0x00000020U /* Want/got fs_type */ +/* listmount(2) flags */ +#define LISTMOUNT_UNREACHABLE 0x01 /* List unreachable mounts too */ +#define LISTMOUNT_RECURSIVE 0x02 /* List a mount tree */ + +/* + * Special @mnt_id values that can be passed to listmount + */ +#define LSMT_ROOT 0xffffffffffffffff /* root mount */ + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Wed Oct 25 14:02:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 13436216 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 845D428E03 for ; Wed, 25 Oct 2023 14:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HDECItOP" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CD0819F for ; Wed, 25 Oct 2023 07:02:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1698242540; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EC7RFMR3rA+1DRsmRuXrj1V+WaMGVpdP+bUvoE8xfhg=; b=HDECItOP9L2+uAH1MssFAO4S6q2z19xgzFF9BHpp9eSqTAO8hGMwTA7WU94CvfvF8SL7Cb lHjKer3/ZNKr7F+pOqpSlue2mNdXDVuPAhIKhVU2qOLvo5s+DUKgdEz/blT7IOzTR9VpMq lNt4GBx9vhwGWQctkU1x/m4GFSCUkVA= Received: from mail-ej1-f70.google.com (mail-ej1-f70.google.com [209.85.218.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-689-dbKmaiDBOfaz98eqyGaeug-1; Wed, 25 Oct 2023 10:02:18 -0400 X-MC-Unique: dbKmaiDBOfaz98eqyGaeug-1 Received: by mail-ej1-f70.google.com with SMTP id a640c23a62f3a-99c8bbc902eso359177766b.1 for ; Wed, 25 Oct 2023 07:02:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698242537; x=1698847337; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EC7RFMR3rA+1DRsmRuXrj1V+WaMGVpdP+bUvoE8xfhg=; b=avlPMUtiNDlAPwK8hrzg2n5hXDUC5C8MfKkCKlxE2Xczmf5CUVcma0K0FYAAPy0aHY refJxbnBXm1s1aL8v2Zn/GH/LFjCWPlYs1ZGGlf6YBpTI6FLxbIHMB8c2uDuEn72W98J /Vqck0gpKdgftr84WtkMrttJ+wT1DCy3mF4VKc+5DYflb19rklke18vPabI8ZQ0XIYXc Hfh6r/nMsMqpn6/LvBsvZfj8l2fOKUG6PD/gp5Q/95LIMCSo1z7wnVaOoS9kbPndI6K+ kpYeUHlWtoXZfh9CrCMxPMlRMFnVVAV8yTrOh8lZnXD59SHdc+W3C2lHmX/15p9NpUhI dv1A== X-Gm-Message-State: AOJu0Yw4YcW/MWfecgtokYrc5oZzNurLA4/QYy9VnoJk/1eLkOt3lCq+ e8ReK2Hq+RRO5DG7qwdAosa70EJMaXJynq+Bz4qgEx+MU8FbydJNKBPUlxNrsqjmmhoQRx2fVJr VWM1YWaebtk42lHUp42Gcw7/nqsXPtAT40ppdkyEw31FMfMQKtUU7TmO/lJzdCkzpnN2yzmtSGe xEKmfos3paJA== X-Received: by 2002:a17:907:6093:b0:9ba:2f20:3d7b with SMTP id ht19-20020a170907609300b009ba2f203d7bmr13334970ejc.71.1698242537127; Wed, 25 Oct 2023 07:02:17 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEjMjmoMNoYOpgsb76Zi6H4tYS6A+eLzcgNpATF2jXSDinHL2btMGhhvP35HJjX0s8ljGUJzA== X-Received: by 2002:a17:907:6093:b0:9ba:2f20:3d7b with SMTP id ht19-20020a170907609300b009ba2f203d7bmr13334928ejc.71.1698242536775; Wed, 25 Oct 2023 07:02:16 -0700 (PDT) Received: from maszat.piliscsaba.szeredi.hu (92-249-235-200.pool.digikabel.hu. [92.249.235.200]) by smtp.gmail.com with ESMTPSA id vl9-20020a170907b60900b00989828a42e8sm9857073ejc.154.2023.10.25.07.02.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Oct 2023 07:02:15 -0700 (PDT) From: Miklos Szeredi To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-man@vger.kernel.org, linux-security-module@vger.kernel.org, Karel Zak , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , Matthew House , Florian Weimer , Arnd Bergmann Subject: [PATCH v4 6/6] wire up syscalls for statmount/listmount Date: Wed, 25 Oct 2023 16:02:04 +0200 Message-ID: <20231025140205.3586473-7-mszeredi@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231025140205.3586473-1-mszeredi@redhat.com> References: <20231025140205.3586473-1-mszeredi@redhat.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Wire up all archs. Signed-off-by: Miklos Szeredi --- arch/alpha/kernel/syscalls/syscall.tbl | 3 +++ arch/arm/tools/syscall.tbl | 3 +++ arch/arm64/include/asm/unistd32.h | 4 ++++ arch/ia64/kernel/syscalls/syscall.tbl | 3 +++ arch/m68k/kernel/syscalls/syscall.tbl | 3 +++ arch/microblaze/kernel/syscalls/syscall.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +++ arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +++ arch/parisc/kernel/syscalls/syscall.tbl | 3 +++ arch/powerpc/kernel/syscalls/syscall.tbl | 3 +++ arch/s390/kernel/syscalls/syscall.tbl | 3 +++ arch/sh/kernel/syscalls/syscall.tbl | 3 +++ arch/sparc/kernel/syscalls/syscall.tbl | 3 +++ arch/x86/entry/syscalls/syscall_32.tbl | 3 +++ arch/x86/entry/syscalls/syscall_64.tbl | 2 ++ arch/xtensa/kernel/syscalls/syscall.tbl | 3 +++ include/uapi/asm-generic/unistd.h | 8 +++++++- 18 files changed, 58 insertions(+), 1 deletion(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index ad37569d0507..6c23bf68eff0 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -492,3 +492,6 @@ 560 common set_mempolicy_home_node sys_ni_syscall 561 common cachestat sys_cachestat 562 common fchmodat2 sys_fchmodat2 +# 563 reserved for map_shadow_stack +564 common statmount sys_statmount +565 common listmount sys_listmount diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index c572d6c3dee0..d110a832899b 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -466,3 +466,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 78b68311ec81..b0a994c9ff3c 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -911,6 +911,10 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) __SYSCALL(__NR_cachestat, sys_cachestat) #define __NR_fchmodat2 452 __SYSCALL(__NR_fchmodat2, sys_fchmodat2) +#define __NR_statmount 454 +__SYSCALL(__NR_statmount, sys_statmount) +#define __NR_listmount 455 +__SYSCALL(__NR_listmount, sys_listmount) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 83d8609aec03..c5f45a8fc834 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -373,3 +373,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 259ceb125367..b9cabb746487 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -452,3 +452,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index a3798c2637fd..89c4ed548ce8 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -458,3 +458,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 152034b8e0a0..a9d561698fe2 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -391,3 +391,6 @@ 450 n32 set_mempolicy_home_node sys_set_mempolicy_home_node 451 n32 cachestat sys_cachestat 452 n32 fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 n32 statmount sys_statmount +455 n32 listmount sys_listmount diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index cb5e757f6621..80a056866da7 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -367,3 +367,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 n64 cachestat sys_cachestat 452 n64 fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 n64 statmount sys_statmount +455 n64 listmount sys_listmount diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 1a646813afdc..139ddc691176 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -440,3 +440,6 @@ 450 o32 set_mempolicy_home_node sys_set_mempolicy_home_node 451 o32 cachestat sys_cachestat 452 o32 fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 o32 statmount sys_statmount +455 o32 listmount sys_listmount diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index e97c175b56f9..46fa753f0d64 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -451,3 +451,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 20e50586e8a2..106937744525 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -539,3 +539,6 @@ 450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 0122cc156952..57dd9202f8d7 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -455,3 +455,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount sys_statmount +455 common listmount sys_listmount sys_listmount diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index e90d585c4d3e..0c7407a0e32c 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -455,3 +455,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 4ed06c71c43f..a0fd36469478 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -498,3 +498,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 2d0b1bd866ea..4f41977bd4c1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -457,3 +457,6 @@ 450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node 451 i386 cachestat sys_cachestat 452 i386 fchmodat2 sys_fchmodat2 +# 563 reserved for map_shadow_stack +454 i386 statmount sys_statmount +455 i386 listmount sys_listmount diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 1d6eee30eceb..a1b3ce7d22cc 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -375,6 +375,8 @@ 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 453 64 map_shadow_stack sys_map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index fc1a4f3c81d9..73378984702b 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -423,3 +423,6 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common fchmodat2 sys_fchmodat2 +# 453 reserved for map_shadow_stack +454 common statmount sys_statmount +455 common listmount sys_listmount diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index abe087c53b4b..8df6a747e21a 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -823,8 +823,14 @@ __SYSCALL(__NR_cachestat, sys_cachestat) #define __NR_fchmodat2 452 __SYSCALL(__NR_fchmodat2, sys_fchmodat2) +#define __NR_statmount 454 +__SYSCALL(__NR_statmount, sys_statmount) + +#define __NR_listmount 455 +__SYSCALL(__NR_listmount, sys_listmount) + #undef __NR_syscalls -#define __NR_syscalls 453 +#define __NR_syscalls 456 /* * 32 bit systems traditionally used different