diff mbox series

[v3,1/4] add unique mount ID

Message ID 20230928130147.564503-2-mszeredi@redhat.com (mailing list archive)
State Handled Elsewhere
Delegated to: Paul Moore
Headers show
Series querying mount attributes | expand

Commit Message

Miklos Szeredi Sept. 28, 2023, 1:01 p.m. UTC
If a mount is released then its mnt_id can immediately be reused.  This is
bad news for user interfaces that want to uniquely identify a mount.

Implementing a unique mount ID is trivial (use a 64bit counter).
Unfortunately userspace assumes 32bit size and would overflow after the
counter reaches 2^32.

Introduce a new 64bit ID alongside the old one.  Initialize the counter to
2^32, this guarantees that the old and new IDs are never mixed up.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/mount.h                | 3 ++-
 fs/namespace.c            | 4 ++++
 fs/stat.c                 | 9 +++++++--
 include/uapi/linux/stat.h | 1 +
 4 files changed, 14 insertions(+), 3 deletions(-)

Comments

Miklos Szeredi Oct. 5, 2023, 3:52 p.m. UTC | #1
On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote:
>
> If a mount is released then its mnt_id can immediately be reused.  This is
> bad news for user interfaces that want to uniquely identify a mount.
>
> Implementing a unique mount ID is trivial (use a 64bit counter).
> Unfortunately userspace assumes 32bit size and would overflow after the
> counter reaches 2^32.
>
> Introduce a new 64bit ID alongside the old one.  Initialize the counter to
> 2^32, this guarantees that the old and new IDs are never mixed up.

It occurred to me that it might make sense to make this counter
per-namespace.  That would allow more separation between namespaces,
like preventing the observation of mount creations in other
namespaces.

Does a global number make any sense?

Thanks,
Miklos
Amir Goldstein Oct. 6, 2023, 11:44 a.m. UTC | #2
On Thu, Oct 5, 2023 at 6:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote:
> >
> > If a mount is released then its mnt_id can immediately be reused.  This is
> > bad news for user interfaces that want to uniquely identify a mount.
> >
> > Implementing a unique mount ID is trivial (use a 64bit counter).
> > Unfortunately userspace assumes 32bit size and would overflow after the
> > counter reaches 2^32.
> >
> > Introduce a new 64bit ID alongside the old one.  Initialize the counter to
> > 2^32, this guarantees that the old and new IDs are never mixed up.
>
> It occurred to me that it might make sense to make this counter
> per-namespace.  That would allow more separation between namespaces,
> like preventing the observation of mount creations in other
> namespaces.
>

Preventing the observation of mount creations in other mount namespaces
is independent of whether a global mntid namespace is used.

> Does a global number make any sense?
>

I think global mntid namepsace makes notifications API a lot easier.
A process (e.g. systemd) may set marks to watch new mounts on
different mount namespaces.

If mntid could collide in different mount namepsaces, we will either
need to describe the mount namespace in the event or worse,
map child mount namespace mntid to parent mount namespace
like with uids.

If we use a global mntid namespace, multi mount namespace
watching becomes much much easier.

Regarding the possible scopes for watching added/removed mounts
we could support:
- watch parent mount for children mounts (akin to inotify directory watch)
- watch all mounts of a filesystem
- watch all mounts in mount namespace
- watch on entire system

Not sure which of the above we will end up implementing, but the
first two can use existing fanotify mount/sb marks.

Thanks,
Amir.
Miklos Szeredi Oct. 6, 2023, 12:48 p.m. UTC | #3
On Fri, 6 Oct 2023 at 13:44, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Oct 5, 2023 at 6:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote:
> > >
> > > If a mount is released then its mnt_id can immediately be reused.  This is
> > > bad news for user interfaces that want to uniquely identify a mount.
> > >
> > > Implementing a unique mount ID is trivial (use a 64bit counter).
> > > Unfortunately userspace assumes 32bit size and would overflow after the
> > > counter reaches 2^32.
> > >
> > > Introduce a new 64bit ID alongside the old one.  Initialize the counter to
> > > 2^32, this guarantees that the old and new IDs are never mixed up.
> >
> > It occurred to me that it might make sense to make this counter
> > per-namespace.  That would allow more separation between namespaces,
> > like preventing the observation of mount creations in other
> > namespaces.
> >
>
> Preventing the observation of mount creations in other mount namespaces
> is independent of whether a global mntid namespace is used.

A local ID space makes observation of mount creations in other
namespaces impossible.  While a global ID space does not.  ID
allocation could be designed in a way that makes observation
impossible, but that basically boils down to allocating separate
ranges to each namespace, which is equivalent to identifying mounts by
an {NSID, MNTID} pair.

>
> > Does a global number make any sense?
> >
>
> I think global mntid namepsace makes notifications API a lot easier.
> A process (e.g. systemd) may set marks to watch new mounts on
> different mount namespaces.
>
> If mntid could collide in different mount namepsaces, we will either
> need to describe the mount namespace in the event or worse,
> map child mount namespace mntid to parent mount namespace
> like with uids.

Mntids are quite different from uids in that a certain ID is only
valid in a certain namespace.   There's no inheritance or sharing of
mounts.  Also mounts cannot be moved from one namespace to another
(with the exception of mounts created in an anonymous namespace).

> If we use a global mntid namespace, multi mount namespace
> watching becomes much much easier.

If the watcher wants to interpret the received event it will have to
know the namespace anyway.  Otherwise it's just a meaningless number.

> Regarding the possible scopes for watching added/removed mounts
> we could support:
> - watch parent mount for children mounts (akin to inotify directory watch)

This probably makes sense.

> - watch all mounts of a filesystem

I don't see a use case.

> - watch all mounts in mount namespace

Yes.

> - watch on entire system

Again, I don't see how this could make sense.  Each time
unshare(CLONE_NEWNS) is invoked a storm of mount creation events will
happen.

I'd imagine that watching mounts is primarily to complement the
directory tree notifications, so that all changes to the path lookup
namespace of a process could be monitored.

Thanks,
Miklos
diff mbox series

Patch

diff --git a/fs/mount.h b/fs/mount.h
index 130c07c2f8d2..a14f762b3f29 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -72,7 +72,8 @@  struct mount {
 	struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
 #endif
-	int mnt_id;			/* mount identifier */
+	int mnt_id;			/* mount identifier, reused */
+	u64 mnt_id_unique;		/* mount ID unique until reboot */
 	int mnt_group_id;		/* peer group identifier */
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	struct hlist_head mnt_pins;
diff --git a/fs/namespace.c b/fs/namespace.c
index e157efc54023..e02bc5f41c7b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -68,6 +68,9 @@  static u64 event;
 static DEFINE_IDA(mnt_id_ida);
 static DEFINE_IDA(mnt_group_ida);
 
+/* Don't allow confusion with old 32bit mount ID */
+static atomic64_t mnt_id_ctr = ATOMIC64_INIT(1ULL << 32);
+
 static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
@@ -131,6 +134,7 @@  static int mnt_alloc_id(struct mount *mnt)
 	if (res < 0)
 		return res;
 	mnt->mnt_id = res;
+	mnt->mnt_id_unique = atomic64_inc_return(&mnt_id_ctr);
 	return 0;
 }
 
diff --git a/fs/stat.c b/fs/stat.c
index 6e60389d6a15..e61e0172e191 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -280,8 +280,13 @@  static int vfs_statx(int dfd, struct filename *filename, int flags,
 
 	error = vfs_getattr(&path, stat, request_mask, flags);
 
-	stat->mnt_id = real_mount(path.mnt)->mnt_id;
-	stat->result_mask |= STATX_MNT_ID;
+	if (request_mask & STATX_MNT_ID_UNIQUE) {
+		stat->mnt_id = real_mount(path.mnt)->mnt_id_unique;
+		stat->result_mask |= STATX_MNT_ID_UNIQUE;
+	} else {
+		stat->mnt_id = real_mount(path.mnt)->mnt_id;
+		stat->result_mask |= STATX_MNT_ID;
+	}
 
 	if (path.mnt->mnt_root == path.dentry)
 		stat->attributes |= STATX_ATTR_MOUNT_ROOT;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 7cab2c65d3d7..2f2ee82d5517 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -154,6 +154,7 @@  struct statx {
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
 #define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
+#define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */