Message ID | 159646187082.1784947.4293611877413578847.stgit@warthog.procyon.org.uk (mailing list archive)
---|---
State | New, archived
Series | VFS: Filesystem information [ver #21]
On Mon, Aug 03, 2020 at 02:37:50PM +0100, David Howells wrote:
> Provide support for the handling of an overrun in a watch queue.  In the
> event that an overrun occurs, the watcher needs to be able to find out what
> it was that they missed.  To this end, previous patches added event
> counters to struct mount.

So this is optimizing the buffer overrun case?

Shouldn't we just make sure that the likelihood of overruns is low and, if
it happens, just reinitialize everything from scratch (shouldn't be *that*
expensive)?  Trying to find out what was missed seems like just adding
complexity for no good reason.
On Tue, 2020-08-04 at 15:56 +0200, Miklos Szeredi wrote:
> On Mon, Aug 03, 2020 at 02:37:50PM +0100, David Howells wrote:
> > Provide support for the handling of an overrun in a watch queue.  In
> > the event that an overrun occurs, the watcher needs to be able to find
> > out what it was that they missed.  To this end, previous patches added
> > event counters to struct mount.
>
> So this is optimizing the buffer overrun case?
>
> Shouldn't we just make sure that the likelihood of overruns is low and,
> if it happens, just reinitialize everything from scratch (shouldn't be
> *that* expensive)?

But maybe that's not possible.  If you are using notifications to track
state in user space, you need to know when the thing you have needs to be
synced because you missed something, and it's during the notification
processing that you actually have the object that may need to be refreshed.

> Trying to find out what was missed seems like just adding complexity for
> no good reason.
On Wed, 2020-08-05 at 10:05 +0800, Ian Kent wrote:
> On Tue, 2020-08-04 at 15:56 +0200, Miklos Szeredi wrote:
> > Shouldn't we just make sure that the likelihood of overruns is low
> > and, if it happens, just reinitialize everything from scratch
> > (shouldn't be *that* expensive)?
>
> But maybe that's not possible.  If you are using notifications to track
> state in user space, you need to know when the thing you have needs to
> be synced because you missed something, and it's during the notification
> processing that you actually have the object that may need to be
> refreshed.
>
> > Trying to find out what was missed seems like just adding complexity
> > for no good reason.

Coming back to an actual use case.

What I said above is one aspect but, since I'm looking at this right now
with systemd, and I do have the legacy code to fall back to, the "just
reset everything" suggestion does make sense.

But I'm struggling to see how I can identify notification buffer overrun
in libmount, and overrun is just one possibility for lost notifications,
so I like the idea that, as a library user, I can work out that I need to
take action based on what I have in the notifications themselves.

Ian
On Wed, Aug 5, 2020 at 4:46 AM Ian Kent <raven@themaw.net> wrote:
>
> Coming back to an actual use case.
>
> What I said above is one aspect but, since I'm looking at this right now
> with systemd, and I do have the legacy code to fall back to, the "just
> reset everything" suggestion does make sense.
>
> But I'm struggling to see how I can identify notification buffer overrun
> in libmount, and overrun is just one possibility for lost notifications,
> so I like the idea that, as a library user, I can work out that I need
> to take action based on what I have in the notifications themselves.

Hmm, what's the other possibility for lost notifications?

Thanks,
Miklos
On Wed, 2020-08-05 at 09:45 +0200, Miklos Szeredi wrote:
> On Wed, Aug 5, 2020 at 4:46 AM Ian Kent <raven@themaw.net> wrote:
> > But I'm struggling to see how I can identify notification buffer
> > overrun in libmount, and overrun is just one possibility for lost
> > notifications, so I like the idea that, as a library user, I can work
> > out that I need to take action based on what I have in the
> > notifications themselves.
>
> Hmm, what's the other possibility for lost notifications?

In user space, that is: multi-threaded application races, single-threaded
applications with signal processing races, other bugs ...

For example, systemd has its own event handling sub-system and handles half
a dozen or so event types, of which mount changes are one.  It's fairly
complex, so I find myself wondering whether I can trust it and whether
there are undiscovered bugs in it.  The answer to the former is probably
yes, but the answer to the latter is also probably yes.

Maybe I'm just paranoid!

Ian
On Wed, Aug 5, 2020 at 1:23 PM Ian Kent <raven@themaw.net> wrote:
>
> On Wed, 2020-08-05 at 09:45 +0200, Miklos Szeredi wrote:
> > Hmm, what's the other possibility for lost notifications?
>
> In user space, that is: multi-threaded application races,
> single-threaded applications with signal processing races, other bugs ...

Okay, let's fix the bugs then.

Thanks,
Miklos
Miklos Szeredi <miklos@szeredi.hu> wrote:

> Shouldn't we just make sure that the likelihood of overruns is low

That's not necessarily easy.  To avoid overruns you need a bigger buffer.
The buffer is preallocated from unswappable kernel space.  Yes, you can
increase the size of the buffer, but it eats out of your pipe bufferage
limit.

Further, it's a *general* notifications queue, not just for a specific
purpose, which means it might get connected to multiple sources; doing
something like tearing down a container might generate enough
notifications to overrun the queue.

> and if it happens, just reinitialize everything from scratch (shouldn't
> be *that* expensive).

If you then spend time reinitialising everything, you're increasing the
likelihood of racing with further events.  Further, there are multiple
expenses: firstly, you have to tear down and discard all the data that
you've spent time setting up; secondly, it takes time doing all this;
thirdly, it takes cpu cycles away from applications.

The reason I put the event counters in there, and made it so that fsinfo()
can read all the mounts in a subtree and their event counters in one go,
is to make it faster for the user to find out what changed in the event
that a notification is lost.  I have a patch (not included here as it
occasionally induces oopses) that attempts to make this doable under the
RCU read lock so that it doesn't prevent mounts from taking place during
the scan.

David
On Wed, Aug 5, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
>
> If you then spend time reinitialising everything, you're increasing the
> likelihood of racing with further events.  Further, there are multiple
> expenses: firstly, you have to tear down and discard all the data that
> you've spent time setting up; secondly, it takes time doing all this;
> thirdly, it takes cpu cycles away from applications.
>
> The reason I put the event counters in there, and made it so that
> fsinfo() can read all the mounts in a subtree and their event counters
> in one go, is to make it faster for the user to find out what changed in
> the event that a notification is lost.

That's just overdesigning it, IMO.  If the protocol is extensible (as you
state) then the counters can be added as needed.  And unless the above CPU
cycle wastage is actually observed in practice, the whole thing is
unnecessary.

Thanks,
Miklos
On Wed, 2020-08-05 at 13:27 +0200, Miklos Szeredi wrote:
> On Wed, Aug 5, 2020 at 1:23 PM Ian Kent <raven@themaw.net> wrote:
> > Multi-threaded application races, single-threaded applications with
> > signal processing races, other bugs ...
>
> Okay, let's fix the bugs then.

It's the bugs you don't know about that get you; in this case the world
really "is" out to get you. ;)

Ian
diff --git a/fs/mount_notify.c b/fs/mount_notify.c
index 57eebae51cb1..57995c27ca88 100644
--- a/fs/mount_notify.c
+++ b/fs/mount_notify.c
@@ -93,6 +93,8 @@ void notify_mount(struct mount *trigger,
 	n.watch.info = info_flags | watch_sizeof(n);
 	n.triggered_on = trigger->mnt_unique_id;
 
+	smp_wmb(); /* See fsinfo_generic_mount_info(). */
+
 	switch (subtype) {
 	case NOTIFY_MOUNT_EXPIRY:
 	case NOTIFY_MOUNT_READONLY:
diff --git a/fs/namespace.c b/fs/namespace.c
index b5c2a3b4f96d..122c12f9512b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4282,6 +4282,17 @@ int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
 	p->mnt_unique_id = m->mnt_unique_id;
 	p->mnt_id = m->mnt_id;
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	p->mnt_subtree_notifications = atomic_long_read(&m->mnt_subtree_notifications);
+	p->mnt_topology_changes = atomic_long_read(&m->mnt_topology_changes);
+	p->mnt_attr_changes = atomic_long_read(&m->mnt_attr_changes);
+#endif
+
+	/* Record the counters before reading the attributes as we're not
+	 * holding a lock.  Paired with a write barrier in notify_mount().
+	 */
+	smp_rmb();
+
 	flags = READ_ONCE(m->mnt.mnt_flags);
 	if (flags & MNT_READONLY)
 		p->attr |= MOUNT_ATTR_RDONLY;
@@ -4319,6 +4330,9 @@ int fsinfo_generic_mount_topology(struct path *path, struct fsinfo_context *ctx)
 
 	m = real_mount(path->mnt);
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	p->mnt_topology_changes = atomic_long_read(&m->mnt_topology_changes);
+#endif
 	p->parent_id = m->mnt_parent->mnt_id;
 
 	if (path->mnt == root.mnt) {
@@ -4445,6 +4459,13 @@ static void fsinfo_store_mount(struct fsinfo_context *ctx, const struct mount *p
 	record.mnt_unique_id = p->mnt_unique_id;
 	record.mnt_id = p->mnt_id;
 	record.parent_id = is_root ? p->mnt_id : p->mnt_parent->mnt_id;
+
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	record.mnt_notify_sum = (atomic_long_read(&p->mnt_attr_changes) +
+				 atomic_long_read(&p->mnt_topology_changes) +
+				 atomic_long_read(&p->mnt_subtree_notifications));
+#endif
+
 	memcpy(ctx->buffer + usage, &record, sizeof(record));
 }
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index f0a352b7028e..b021466dee0f 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -100,6 +100,9 @@ struct fsinfo_mount_info {
 	__u64	mnt_unique_id;	/* Kernel-lifetime unique mount ID */
 	__u32	mnt_id;		/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
 	__u32	attr;		/* MOUNT_ATTR_* flags */
+	__u64	mnt_attr_changes;	/* Number of attribute changes to this mount. */
+	__u64	mnt_topology_changes;	/* Number of topology changes to this mount. */
+	__u64	mnt_subtree_notifications; /* Number of notifications in mount subtree */
 };
 
 #define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
@@ -113,6 +116,7 @@ struct fsinfo_mount_topology {
 	__u32	dependent_source_id;	/* Dependent: source mount group ID */
 	__u32	dependent_clone_of_id;	/* Dependent: ID of mount this was cloned from */
 	__u32	propagation_type;	/* MOUNT_PROPAGATION_* type */
+	__u64	mnt_topology_changes;	/* Number of topology changes to this mount. */
 };
 
 #define FSINFO_ATTR_MOUNT_TOPOLOGY__STRUCT struct fsinfo_mount_topology
@@ -125,6 +129,9 @@ struct fsinfo_mount_child {
 	__u64	mnt_unique_id;	/* Kernel-lifetime unique mount ID */
 	__u32	mnt_id;		/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
 	__u32	parent_id;	/* Parent mount identifier */
+	__u64	mnt_notify_sum;	/* Sum of mnt_attr_changes, mnt_topology_changes and
+				 * mnt_subtree_notifications.
+				 */
 };
 
 #define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index b7290ea8eb55..620a02477aa8 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -304,6 +304,10 @@ static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
 	printf("\tmnt_uniq: %llx\n", (unsigned long long)r->mnt_unique_id);
 	printf("\tmnt_id  : %x\n", r->mnt_id);
 	printf("\tattr    : %x\n", r->attr);
+	printf("\tevents  : attr=%llu topology=%llu subtree=%llu\n",
+	       (unsigned long long)r->mnt_attr_changes,
+	       (unsigned long long)r->mnt_topology_changes,
+	       (unsigned long long)r->mnt_subtree_notifications);
 }
 
 static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
@@ -332,6 +336,7 @@ static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
 		break;
 	}
 
+	printf("\tevents  : topology=%llu\n", (unsigned long long)r->mnt_topology_changes);
 }
 
 static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
@@ -354,8 +359,9 @@ static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
 		mp = "<this>";
 	}
 
-	printf("%8x %16llx %s\n",
-	       r->mnt_id, (unsigned long long)r->mnt_unique_id, mp);
+	printf("%8x %16llx %10llu %s\n",
+	       r->mnt_id, (unsigned long long)r->mnt_unique_id,
+	       (unsigned long long)r->mnt_notify_sum, mp);
 }
 
 static void dump_string(void *reply, unsigned int size)
Provide support for the handling of an overrun in a watch queue.  In the
event that an overrun occurs, the watcher needs to be able to find out what
it was that they missed.  To this end, previous patches added event
counters to struct mount.

To make them accessible, they can be retrieved using fsinfo() and the
FSINFO_ATTR_MOUNT_INFO attribute.

	struct fsinfo_mount_info {
		__u64 mnt_unique_id;
		__u64 mnt_attr_changes;
		__u64 mnt_topology_changes;
		__u64 mnt_subtree_notifications;
		...
	};

There's a uniquifier and some event counters:

 (1) mnt_unique_id - This is an effectively non-repeating ID given to each
     mount object on creation.  This allows the caller to check that the
     mount ID didn't get reused (the 32-bit mount ID is more efficient to
     look up).

 (2) mnt_attr_changes - Count of attribute changes on a mount object.

 (3) mnt_topology_changes - Count of alterations to the mount tree that
     affected this node.

 (4) mnt_subtree_notifications - Count of mount object event notifications
     that were generated in the subtree rooted at this node.  This excludes
     events generated on this node itself and does not include superblock
     events.

The counters are also accessible through the FSINFO_ATTR_MOUNT_CHILDREN
attribute, where a list of all the children of a mount can be scanned.  The
record returned for each child includes the sum of the counters for that
child.  An additional record is added at the end for the queried object,
and that also includes the sum of its counters.

The mnt_topology_changes counter is also included in
FSINFO_ATTR_MOUNT_TOPOLOGY.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 fs/mount_notify.c           |  2 ++
 fs/namespace.c              | 21 +++++++++++++++++++++
 include/uapi/linux/fsinfo.h |  7 +++++++
 samples/vfs/test-fsinfo.c   | 10 ++++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)