Message ID | 20230928130147.564503-2-mszeredi@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | querying mount attributes | expand |
On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote: > > If a mount is released then its mnt_id can immediately be reused. This is > bad news for user interfaces that want to uniquely identify a mount. > > Implementing a unique mount ID is trivial (use a 64bit counter). > Unfortunately userspace assumes 32bit size and would overflow after the > counter reaches 2^32. > > Introduce a new 64bit ID alongside the old one. Initialize the counter to > 2^32, this guarantees that the old and new IDs are never mixed up. It occurred to me that it might make sense to make this counter per-namespace. That would allow more separation between namespaces, like preventing the observation of mount creations in other namespaces. Does a global number make any sense? Thanks, Miklos
On Thu, Oct 5, 2023 at 6:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote: > > > > If a mount is released then its mnt_id can immediately be reused. This is > > bad news for user interfaces that want to uniquely identify a mount. > > > > Implementing a unique mount ID is trivial (use a 64bit counter). > > Unfortunately userspace assumes 32bit size and would overflow after the > > counter reaches 2^32. > > > > Introduce a new 64bit ID alongside the old one. Initialize the counter to > > 2^32, this guarantees that the old and new IDs are never mixed up. > > It occurred to me that it might make sense to make this counter > per-namespace. That would allow more separation between namespaces, > like preventing the observation of mount creations in other > namespaces. > Preventing the observation of mount creations in other mount namespaces is independent of whether a global mntid namespace is used. > Does a global number make any sense? > I think global mntid namepsace makes notifications API a lot easier. A process (e.g. systemd) may set marks to watch new mounts on different mount namespaces. If mntid could collide in different mount namepsaces, we will either need to describe the mount namespace in the event or worse, map child mount namespace mntid to parent mount namespace like with uids. If we use a global mntid namespace, multi mount namespace watching becomes much much easier. Regarding the possible scopes for watching added/removed mounts we could support: - watch parent mount for children mounts (akin to inotify directory watch) - watch all mounts of a filesystem - watch all mounts in mount namespace - watch on entire system Not sure which of the above we will end up implementing, but the first two can use existing fanotify mount/sb marks. Thanks, Amir.
On Fri, 6 Oct 2023 at 13:44, Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Oct 5, 2023 at 6:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > > On Thu, 28 Sept 2023 at 15:03, Miklos Szeredi <mszeredi@redhat.com> wrote: > > > > > > If a mount is released then its mnt_id can immediately be reused. This is > > > bad news for user interfaces that want to uniquely identify a mount. > > > > > > Implementing a unique mount ID is trivial (use a 64bit counter). > > > Unfortunately userspace assumes 32bit size and would overflow after the > > > counter reaches 2^32. > > > > > > Introduce a new 64bit ID alongside the old one. Initialize the counter to > > > 2^32, this guarantees that the old and new IDs are never mixed up. > > > > It occurred to me that it might make sense to make this counter > > per-namespace. That would allow more separation between namespaces, > > like preventing the observation of mount creations in other > > namespaces. > > > > Preventing the observation of mount creations in other mount namespaces > is independent of whether a global mntid namespace is used. A local ID space makes observation of mount creations in other namespaces impossible. While a global ID space does not. ID allocation could be designed in a way that makes observation impossible, but that basically boils down to allocating separate ranges to each namespace, which is equivalent to identifying mounts by an {NSID, MNTID} pair. > > > Does a global number make any sense? > > > > I think global mntid namepsace makes notifications API a lot easier. > A process (e.g. systemd) may set marks to watch new mounts on > different mount namespaces. > > If mntid could collide in different mount namepsaces, we will either > need to describe the mount namespace in the event or worse, > map child mount namespace mntid to parent mount namespace > like with uids. Mntids are quite different from uids in that a certain ID is only valid in a certain namespace. There's no inheritance or sharing of mounts. Also mounts cannot be moved from one namespace to another (with the exception of mounts created in an anonymous namespace). > If we use a global mntid namespace, multi mount namespace > watching becomes much much easier. If the watcher wants to interpret the received event it will have to know the namespace anyway. Otherwise it's just a meaningless number. > Regarding the possible scopes for watching added/removed mounts > we could support: > - watch parent mount for children mounts (akin to inotify directory watch) This probably makes sense. > - watch all mounts of a filesystem I don't see a use case. > - watch all mounts in mount namespace Yes. > - watch on entire system Again, I don't see how this could make sense. Each time unshare(CLONE_NEWNS) is invoked a storm of mount creation events will happen. I'd imagine that watching mounts is primarily to complement the directory tree notifications, so that all changes to the path lookup namespace of a process could be monitored. Thanks, Miklos
diff --git a/fs/mount.h b/fs/mount.h index 130c07c2f8d2..a14f762b3f29 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -72,7 +72,8 @@ struct mount { struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks; __u32 mnt_fsnotify_mask; #endif - int mnt_id; /* mount identifier */ + int mnt_id; /* mount identifier, reused */ + u64 mnt_id_unique; /* mount ID unique until reboot */ int mnt_group_id; /* peer group identifier */ int mnt_expiry_mark; /* true if marked for expiry */ struct hlist_head mnt_pins; diff --git a/fs/namespace.c b/fs/namespace.c index e157efc54023..e02bc5f41c7b 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -68,6 +68,9 @@ static u64 event; static DEFINE_IDA(mnt_id_ida); static DEFINE_IDA(mnt_group_ida); +/* Don't allow confusion with old 32bit mount ID */ +static atomic64_t mnt_id_ctr = ATOMIC64_INIT(1ULL << 32); + static struct hlist_head *mount_hashtable __read_mostly; static struct hlist_head *mountpoint_hashtable __read_mostly; static struct kmem_cache *mnt_cache __read_mostly; @@ -131,6 +134,7 @@ static int mnt_alloc_id(struct mount *mnt) if (res < 0) return res; mnt->mnt_id = res; + mnt->mnt_id_unique = atomic64_inc_return(&mnt_id_ctr); return 0; } diff --git a/fs/stat.c b/fs/stat.c index 6e60389d6a15..e61e0172e191 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -280,8 +280,13 @@ static int vfs_statx(int dfd, struct filename *filename, int flags, error = vfs_getattr(&path, stat, request_mask, flags); - stat->mnt_id = real_mount(path.mnt)->mnt_id; - stat->result_mask |= STATX_MNT_ID; + if (request_mask & STATX_MNT_ID_UNIQUE) { + stat->mnt_id = real_mount(path.mnt)->mnt_id_unique; + stat->result_mask |= STATX_MNT_ID_UNIQUE; + } else { + stat->mnt_id = real_mount(path.mnt)->mnt_id; + stat->result_mask |= STATX_MNT_ID; + } if (path.mnt->mnt_root == path.dentry) stat->attributes |= STATX_ATTR_MOUNT_ROOT; diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h index 7cab2c65d3d7..2f2ee82d5517 100644 --- a/include/uapi/linux/stat.h +++ b/include/uapi/linux/stat.h @@ -154,6 +154,7 @@ struct statx { #define STATX_BTIME 0x00000800U /* Want/got stx_btime */ #define STATX_MNT_ID 0x00001000U /* Got stx_mnt_id */ #define STATX_DIOALIGN 0x00002000U /* Want/got direct I/O alignment info */ +#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */ #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
If a mount is released then its mnt_id can immediately be reused. This is bad news for user interfaces that want to uniquely identify a mount. Implementing a unique mount ID is trivial (use a 64bit counter). Unfortunately userspace assumes 32bit size and would overflow after the counter reaches 2^32. Introduce a new 64bit ID alongside the old one. Initialize the counter to 2^32, this guarantees that the old and new IDs are never mixed up. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> --- fs/mount.h | 3 ++- fs/namespace.c | 4 ++++ fs/stat.c | 9 +++++++-- include/uapi/linux/stat.h | 1 + 4 files changed, 14 insertions(+), 3 deletions(-)