Message ID | 20240116113247.758848-1-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] fsnotify: optimize the case of no parent watcher | expand |
On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > If parent inode is not watching, check for the event in masks of > sb/mount/inode masks early to optimize out most of the code in > __fsnotify_parent() and avoid calling fsnotify(). > > Jens has reported that this optimization improves BW and IOPS in an > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > before: > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > after: > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Jan, > > Considering that this change looks like a clear win and it actually > the change that you suggested, I cleaned it up a bit and posting for > your consideration. Agreed, I like this. What did you generate this patch against? It does not apply on top of current Linus' tree (maybe it needs the change sitting in VFS tree - which is fine I can wait until that gets merged)? > I've kept the wrappers fsnotify_path() and fsnotify_sb_has_watchers() > although they are not directly related to the optimization, because they > makes the code a bit nicer IMO. Yeah, these cleanups are fine. I would have prefered them as a separate patch (some people might want the performance improvement to be backported and this makes it unnecessarily more complex) but don't resend just because of that. Honza
On Tue, Jan 16, 2024 at 2:04 PM Jan Kara <jack@suse.cz> wrote: > > On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > > If parent inode is not watching, check for the event in masks of > > sb/mount/inode masks early to optimize out most of the code in > > __fsnotify_parent() and avoid calling fsnotify(). > > > > Jens has reported that this optimization improves BW and IOPS in an > > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > > > before: > > > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > after: > > > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > > > Jan, > > > > Considering that this change looks like a clear win and it actually > > the change that you suggested, I cleaned it up a bit and posting for > > your consideration. > > Agreed, I like this. What did you generate this patch against? It does not > apply on top of current Linus' tree (maybe it needs the change sitting in > VFS tree - which is fine I can wait until that gets merged)? > Yes, it is on top of Christian's vfs-fixes branch. > > I've kept the wrappers fsnotify_path() and fsnotify_sb_has_watchers() > > although they are not directly related to the optimization, because they > > makes the code a bit nicer IMO. > > Yeah, these cleanups are fine. I would have prefered them as a separate > patch (some people might want the performance improvement to be backported > and this makes it unnecessarily more complex) but don't resend just because > of that. > Makes sense. Thanks, Amir.
On Tue 16-01-24 14:53:00, Amir Goldstein wrote: > On Tue, Jan 16, 2024 at 2:04 PM Jan Kara <jack@suse.cz> wrote: > > > > On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > > > If parent inode is not watching, check for the event in masks of > > > sb/mount/inode masks early to optimize out most of the code in > > > __fsnotify_parent() and avoid calling fsnotify(). > > > > > > Jens has reported that this optimization improves BW and IOPS in an > > > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > > > > > before: > > > > > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > > > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > after: > > > > > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > > > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > > > > Jan, > > > > > > Considering that this change looks like a clear win and it actually > > > the change that you suggested, I cleaned it up a bit and posting for > > > your consideration. > > > > Agreed, I like this. What did you generate this patch against? It does not > > apply on top of current Linus' tree (maybe it needs the change sitting in > > VFS tree - which is fine I can wait until that gets merged)? > > > > Yes, it is on top of Christian's vfs-fixes branch. Merged your improvement now (and I've split off the cleanup into a separate change and dropped the creation of fsnotify_path() which seemed a bit pointless with a single caller). All pushed out. Honza
On Wed, Jan 24, 2024 at 6:08 PM Jan Kara <jack@suse.cz> wrote: > > On Tue 16-01-24 14:53:00, Amir Goldstein wrote: > > On Tue, Jan 16, 2024 at 2:04 PM Jan Kara <jack@suse.cz> wrote: > > > > > > On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > > > > If parent inode is not watching, check for the event in masks of > > > > sb/mount/inode masks early to optimize out most of the code in > > > > __fsnotify_parent() and avoid calling fsnotify(). > > > > > > > > Jens has reported that this optimization improves BW and IOPS in an > > > > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > > > > > > > before: > > > > > > > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > > > > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > after: > > > > > > > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > > > > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > > > > > Jan, > > > > > > > > Considering that this change looks like a clear win and it actually > > > > the change that you suggested, I cleaned it up a bit and posting for > > > > your consideration. > > > > > > Agreed, I like this. What did you generate this patch against? It does not > > > apply on top of current Linus' tree (maybe it needs the change sitting in > > > VFS tree - which is fine I can wait until that gets merged)? > > > > > > > Yes, it is on top of Christian's vfs-fixes branch. > > Merged your improvement now (and I've split off the cleanup into a separate > change and dropped the creation of fsnotify_path() which seemed a bit > pointless with a single caller). All pushed out. > Thanks! Amir.
On Wed, Jan 24, 2024 at 6:20 PM Amir Goldstein <amir73il@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 6:08 PM Jan Kara <jack@suse.cz> wrote: > > > > On Tue 16-01-24 14:53:00, Amir Goldstein wrote: > > > On Tue, Jan 16, 2024 at 2:04 PM Jan Kara <jack@suse.cz> wrote: > > > > > > > > On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > > > > > If parent inode is not watching, check for the event in masks of > > > > > sb/mount/inode masks early to optimize out most of the code in > > > > > __fsnotify_parent() and avoid calling fsnotify(). > > > > > > > > > > Jens has reported that this optimization improves BW and IOPS in an > > > > > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > > > > > > > > > before: > > > > > > > > > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > > > > > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > > > after: > > > > > > > > > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > > > > > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > > > > > > Jan, > > > > > > > > > > Considering that this change looks like a clear win and it actually > > > > > the change that you suggested, I cleaned it up a bit and posting for > > > > > your consideration. > > > > > > > > Agreed, I like this. What did you generate this patch against? It does not > > > > apply on top of current Linus' tree (maybe it needs the change sitting in > > > > VFS tree - which is fine I can wait until that gets merged)? > > > > > > > > > > Yes, it is on top of Christian's vfs-fixes branch. > > > > Merged your improvement now (and I've split off the cleanup into a separate > > change and dropped the creation of fsnotify_path() which seemed a bit > > pointless with a single caller). All pushed out. > > > Jan & Jens, Although Jan has already queued this v3 patch with sufficient performance improvement for Jens' workloads, I got a performance regression report from kernel robot on will-it-scale microbenchmark (buffered write loop) on my fan_pre_content patches, so I tried to improve on the existing solution. I tried something similar to v1/v2 patches, where the sb keeps accounting of the number of watchers for specific sub-classes of events. I've made two major changes: 1. moved to counters into a per-sb state object fsnotify_sb_connector as Christian requested 2. The counters are by fanotify classes, not by specific events, so they can be used to answer the questions: a) Are there any fsnotify watchers on this sb? b) Are there any fanotify permission class listeners on this sb? c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? I think that those questions are very relevant in the real world, because a positive answer to (b) and (c) is quite rare in the real world, so the overhead on the permission hooks could be completely eliminated in the common case. If needed, we can further bisect the class counters per specific painful events (e.g. FAN_ACCESS*), but there is no need to do that before we see concrete benchmark results. Jens, If you feel like it, you can see if this branch further improves your workloads: https://github.com/amir73il/linux/commits/fsnotify-perf/ Jan, Whenever you have the time, feel free to see if this is a valid direction, if not for the perf optimization then we are going to need the fsnotify_sb_connector container for other features as well. Thanks! Amir.
On Tue 13-02-24 21:45:56, Amir Goldstein wrote: > On Wed, Jan 24, 2024 at 6:20 PM Amir Goldstein <amir73il@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 6:08 PM Jan Kara <jack@suse.cz> wrote: > > > On Tue 16-01-24 14:53:00, Amir Goldstein wrote: > > > > On Tue, Jan 16, 2024 at 2:04 PM Jan Kara <jack@suse.cz> wrote: > > > > > > > > > > On Tue 16-01-24 13:32:47, Amir Goldstein wrote: > > > > > > If parent inode is not watching, check for the event in masks of > > > > > > sb/mount/inode masks early to optimize out most of the code in > > > > > > __fsnotify_parent() and avoid calling fsnotify(). > > > > > > > > > > > > Jens has reported that this optimization improves BW and IOPS in an > > > > > > io_uring benchmark by more than 10% and reduces perf reported CPU usage. > > > > > > > > > > > > before: > > > > > > > > > > > > + 4.51% io_uring [kernel.vmlinux] [k] fsnotify > > > > > > + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > > > > > after: > > > > > > > > > > > > + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent > > > > > > > > > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> > > > > > > Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > > --- > > > > > > > > > > > > Jan, > > > > > > > > > > > > Considering that this change looks like a clear win and it actually > > > > > > the change that you suggested, I cleaned it up a bit and posting for > > > > > > your consideration. > > > > > > > > > > Agreed, I like this. What did you generate this patch against? It does not > > > > > apply on top of current Linus' tree (maybe it needs the change sitting in > > > > > VFS tree - which is fine I can wait until that gets merged)? > > > > > > > > > > > > > Yes, it is on top of Christian's vfs-fixes branch. > > > > > > Merged your improvement now (and I've split off the cleanup into a separate > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > pointless with a single caller). All pushed out. > > > > > > > Jan & Jens, > > Although Jan has already queued this v3 patch with sufficient performance > improvement for Jens' workloads, I got a performance regression report from > kernel robot on will-it-scale microbenchmark (buffered write loop) > on my fan_pre_content patches, so I tried to improve on the existing solution. > > I tried something similar to v1/v2 patches, where the sb keeps accounting > of the number of watchers for specific sub-classes of events. > > I've made two major changes: > 1. moved to counters into a per-sb state object fsnotify_sb_connector > as Christian requested > 2. The counters are by fanotify classes, not by specific events, so they > can be used to answer the questions: > a) Are there any fsnotify watchers on this sb? > b) Are there any fanotify permission class listeners on this sb? > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > I think that those questions are very relevant in the real world, because > a positive answer to (b) and (c) is quite rare in the real world, so the > overhead on the permission hooks could be completely eliminated in > the common case. > > If needed, we can further bisect the class counters per specific painful > events (e.g. FAN_ACCESS*), but there is no need to do that before > we see concrete benchmark results. OK, I think this idea is sound, I'd just be interested whether the 0-day bot (or somebody else) is able to see improvement with this. Otherwise why bother :) > Jan, > > Whenever you have the time, feel free to see if this is a valid direction, > if not for the perf optimization then we are going to need the > fsnotify_sb_connector container for other features as well. So firstly the name fsnotify_sb_connector really confuses me. I'd save "connector" names to fsnotify_mark_connector. Maybe fsnotify_sb_info? Then I dislike how we have to specialcase superblock in quite a few places and add these wrappers and what not. This seems to be mostly caused by the fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. What if we just put fsnotify_connp_t there? I understand that this will mean one more pointer fetch if there are actually marks attached to the superblock and the event mask matches s_fsnotify_mask. But in that case we are likely to generate the event anyway so the cost of that compared to event generation is negligible? And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() which means that we need to pass object pointer (in the form of void *) instead of fsnotify_connp_t to various mark adding functions (and transform it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly setting up fsnotify_sb_info). Passing void * around is not great but it should be fairly limited (and actually reduces the knowledge of fsnotify internals outside of the fsnotify core). Honza
> > > > Merged your improvement now (and I've split off the cleanup into a separate > > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > > pointless with a single caller). All pushed out. > > > > > > > > > > > Jan & Jens, > > > > Although Jan has already queued this v3 patch with sufficient performance > > improvement for Jens' workloads, I got a performance regression report from > > kernel robot on will-it-scale microbenchmark (buffered write loop) > > on my fan_pre_content patches, so I tried to improve on the existing solution. > > > > I tried something similar to v1/v2 patches, where the sb keeps accounting > > of the number of watchers for specific sub-classes of events. > > > > I've made two major changes: > > 1. moved to counters into a per-sb state object fsnotify_sb_connector > > as Christian requested > > 2. The counters are by fanotify classes, not by specific events, so they > > can be used to answer the questions: > > a) Are there any fsnotify watchers on this sb? > > b) Are there any fanotify permission class listeners on this sb? > > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > > > I think that those questions are very relevant in the real world, because > > a positive answer to (b) and (c) is quite rare in the real world, so the > > overhead on the permission hooks could be completely eliminated in > > the common case. > > > > If needed, we can further bisect the class counters per specific painful > > events (e.g. FAN_ACCESS*), but there is no need to do that before > > we see concrete benchmark results. > > OK, I think this idea is sound, I'd just be interested whether the 0-day > bot (or somebody else) is able to see improvement with this. Otherwise why > bother :) > Exactly. > > Jan, > > > > Whenever you have the time, feel free to see if this is a valid direction, > > if not for the perf optimization then we are going to need the > > fsnotify_sb_connector container for other features as well. > > So firstly the name fsnotify_sb_connector really confuses me. I'd save > "connector" names to fsnotify_mark_connector. Maybe fsnotify_sb_info? > Sure. > Then I dislike how we have to specialcase superblock in quite a few places > and add these wrappers and what not. This seems to be mostly caused by the > fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. > What if we just put fsnotify_connp_t there? I understand that this will > mean one more pointer fetch if there are actually marks attached to the > superblock and the event mask matches s_fsnotify_mask. But in that case we > are likely to generate the event anyway so the cost of that compared to > event generation is negligible? > I guess that can work. I can try it and see if there are any other complications. > And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() > which means that we need to pass object pointer (in the form of void *) > instead of fsnotify_connp_t to various mark adding functions (and transform > it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly > setting up fsnotify_sb_info). Passing void * around is not great but it > should be fairly limited (and actually reduces the knowledge of fsnotify > internals outside of the fsnotify core). Unless I am missing something, I think we only need to pass an extra sb arg to fsnotify_add_mark_locked()? and it does not sound like a big deal. For adding an sb mark, connp arg could be NULL, and then we get connp from sb->fsnotify_sb_info after making sure that it is allocated. I will get to look at it in ~2 weeks. Thanks for the quick feedback. Amir.
On Wed 14-02-24 15:40:31, Amir Goldstein wrote: > > > > > Merged your improvement now (and I've split off the cleanup into a separate > > > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > > > pointless with a single caller). All pushed out. > > > > > > > > > > > > > > > Jan & Jens, > > > > > > Although Jan has already queued this v3 patch with sufficient performance > > > improvement for Jens' workloads, I got a performance regression report from > > > kernel robot on will-it-scale microbenchmark (buffered write loop) > > > on my fan_pre_content patches, so I tried to improve on the existing solution. > > > > > > I tried something similar to v1/v2 patches, where the sb keeps accounting > > > of the number of watchers for specific sub-classes of events. > > > > > > I've made two major changes: > > > 1. moved to counters into a per-sb state object fsnotify_sb_connector > > > as Christian requested > > > 2. The counters are by fanotify classes, not by specific events, so they > > > can be used to answer the questions: > > > a) Are there any fsnotify watchers on this sb? > > > b) Are there any fanotify permission class listeners on this sb? > > > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > > > > > I think that those questions are very relevant in the real world, because > > > a positive answer to (b) and (c) is quite rare in the real world, so the > > > overhead on the permission hooks could be completely eliminated in > > > the common case. > > > > > > If needed, we can further bisect the class counters per specific painful > > > events (e.g. FAN_ACCESS*), but there is no need to do that before > > > we see concrete benchmark results. ... > > Then I dislike how we have to specialcase superblock in quite a few places > > and add these wrappers and what not. This seems to be mostly caused by the > > fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. > > What if we just put fsnotify_connp_t there? I understand that this will > > mean one more pointer fetch if there are actually marks attached to the > > superblock and the event mask matches s_fsnotify_mask. But in that case we > > are likely to generate the event anyway so the cost of that compared to > > event generation is negligible? > > > > I guess that can work. > I can try it and see if there are any other complications. > > > And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() > > which means that we need to pass object pointer (in the form of void *) > > instead of fsnotify_connp_t to various mark adding functions (and transform > > it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly > > setting up fsnotify_sb_info). Passing void * around is not great but it > > should be fairly limited (and actually reduces the knowledge of fsnotify > > internals outside of the fsnotify core). > > Unless I am missing something, I think we only need to pass an extra sb > arg to fsnotify_add_mark_locked()? and it does not sound like a big deal. > For adding an sb mark, connp arg could be NULL, and then we get connp > from sb->fsnotify_sb_info after making sure that it is allocated. Yes that would be another possibility but frankly I like passing the 'object' pointer instead of connp pointer a bit more. But we can see how the code looks like. Honza
On 2/13/24 12:45 PM, Amir Goldstein wrote: > Jens, > > If you feel like it, you can see if this branch further improves your > workloads: > > https://github.com/amir73il/linux/commits/fsnotify-perf/ Baseline is current -git with changes I have for 6.9, so not with the previous patch. 4 optanes, usually random reads, and being run on a dual socket Intel(R) Xeon(R) Platinum 8458P CPU @ 2.7GHz. IOPS=16.20M, BW=7.91GiB/s, IOS/call=32/31 IOPS=16.20M, BW=7.91GiB/s, IOS/call=32/32 IOPS=16.20M, BW=7.91GiB/s, IOS/call=32/31 and in perf profile, we see: + 3.16% io_uring [kernel.kallsyms] [k] fsnotify + 2.04% io_uring [kernel.kallsyms] [k] __fsnotify_parent with this branch pulled in: IOPS=17.44M, BW=8.52GiB/s, IOS/call=32/31 IOPS=17.45M, BW=8.52GiB/s, IOS/call=32/31 IOPS=17.45M, BW=8.52GiB/s, IOS/call=32/31 and perf diff shows: 2.04% -1.15% [kernel.kallsyms] [k] __fsnotify_parent 3.16% [kernel.kallsyms] [k] fsnotify with a big reduction for __fsnotify_parent() and fsnotify() being totally gone. In absolute terms, this is all we see in perf profile with the patch: + 0.89% io_uring [kernel.kallsyms] [k] __fsnotify_parent iow, we went from over 5% of added overhead to less than 1%.
On Thu, Feb 15, 2024 at 10:36 AM Jan Kara <jack@suse.cz> wrote: > > On Wed 14-02-24 15:40:31, Amir Goldstein wrote: > > > > > > Merged your improvement now (and I've split off the cleanup into a separate > > > > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > > > > pointless with a single caller). All pushed out. > > > > > > > > > > > > > > > > > > > Jan & Jens, > > > > > > > > Although Jan has already queued this v3 patch with sufficient performance > > > > improvement for Jens' workloads, I got a performance regression report from > > > > kernel robot on will-it-scale microbenchmark (buffered write loop) > > > > on my fan_pre_content patches, so I tried to improve on the existing solution. > > > > > > > > I tried something similar to v1/v2 patches, where the sb keeps accounting > > > > of the number of watchers for specific sub-classes of events. > > > > > > > > I've made two major changes: > > > > 1. moved to counters into a per-sb state object fsnotify_sb_connector > > > > as Christian requested > > > > 2. The counters are by fanotify classes, not by specific events, so they > > > > can be used to answer the questions: > > > > a) Are there any fsnotify watchers on this sb? > > > > b) Are there any fanotify permission class listeners on this sb? > > > > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > > > > > > > I think that those questions are very relevant in the real world, because > > > > a positive answer to (b) and (c) is quite rare in the real world, so the > > > > overhead on the permission hooks could be completely eliminated in > > > > the common case. > > > > > > > > If needed, we can further bisect the class counters per specific painful > > > > events (e.g. FAN_ACCESS*), but there is no need to do that before > > > > we see concrete benchmark results. > ... > > > > Then I dislike how we have to specialcase superblock in quite a few places > > > and add these wrappers and what not. This seems to be mostly caused by the > > > fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. > > > What if we just put fsnotify_connp_t there? I understand that this will > > > mean one more pointer fetch if there are actually marks attached to the > > > superblock and the event mask matches s_fsnotify_mask. But in that case we > > > are likely to generate the event anyway so the cost of that compared to > > > event generation is negligible? > > > > > > > I guess that can work. > > I can try it and see if there are any other complications. > > > > > And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() > > > which means that we need to pass object pointer (in the form of void *) > > > instead of fsnotify_connp_t to various mark adding functions (and transform > > > it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly > > > setting up fsnotify_sb_info). Passing void * around is not great but it > > > should be fairly limited (and actually reduces the knowledge of fsnotify > > > internals outside of the fsnotify core). > > > > Unless I am missing something, I think we only need to pass an extra sb > > arg to fsnotify_add_mark_locked()? and it does not sound like a big deal. > > For adding an sb mark, connp arg could be NULL, and then we get connp > > from sb->fsnotify_sb_info after making sure that it is allocated. > > Yes that would be another possibility but frankly I like passing the > 'object' pointer instead of connp pointer a bit more. But we can see how > the code looks like. Ok, here it is: https://github.com/amir73il/linux/commits/fsnotify-sbinfo/ I agree that the interface does end up looking better this way. I've requested to re-test performance on fsnotify-sbinfo. You can use this rebased branch to look at the diff from the the previous patches that were tested by 0day: https://github.com/amir73il/linux/commits/fsnotify-sbconn/ If you have the bandwidth to consider those patches as candidates for (the second half of?) 6.9 merge window, I can post them for review. The main benefit of getting them merged in 6.9 is that I could work on pre-content events for 6.10 without having to worry about perf regressions. Thanks, Amir.
On Wed 06-03-24 16:51:06, Amir Goldstein wrote: > On Thu, Feb 15, 2024 at 10:36 AM Jan Kara <jack@suse.cz> wrote: > > On Wed 14-02-24 15:40:31, Amir Goldstein wrote: > > > > > > > Merged your improvement now (and I've split off the cleanup into a separate > > > > > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > > > > > pointless with a single caller). All pushed out. > > > > > > > > > > > > > > > > > > > > > > > Jan & Jens, > > > > > > > > > > Although Jan has already queued this v3 patch with sufficient performance > > > > > improvement for Jens' workloads, I got a performance regression report from > > > > > kernel robot on will-it-scale microbenchmark (buffered write loop) > > > > > on my fan_pre_content patches, so I tried to improve on the existing solution. > > > > > > > > > > I tried something similar to v1/v2 patches, where the sb keeps accounting > > > > > of the number of watchers for specific sub-classes of events. > > > > > > > > > > I've made two major changes: > > > > > 1. moved to counters into a per-sb state object fsnotify_sb_connector > > > > > as Christian requested > > > > > 2. The counters are by fanotify classes, not by specific events, so they > > > > > can be used to answer the questions: > > > > > a) Are there any fsnotify watchers on this sb? > > > > > b) Are there any fanotify permission class listeners on this sb? > > > > > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > > > > > > > > > I think that those questions are very relevant in the real world, because > > > > > a positive answer to (b) and (c) is quite rare in the real world, so the > > > > > overhead on the permission hooks could be completely eliminated in > > > > > the common case. > > > > > > > > > > If needed, we can further bisect the class counters per specific painful > > > > > events (e.g. FAN_ACCESS*), but there is no need to do that before > > > > > we see concrete benchmark results. > > ... > > > > > > Then I dislike how we have to specialcase superblock in quite a few places > > > > and add these wrappers and what not. This seems to be mostly caused by the > > > > fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. > > > > What if we just put fsnotify_connp_t there? I understand that this will > > > > mean one more pointer fetch if there are actually marks attached to the > > > > superblock and the event mask matches s_fsnotify_mask. But in that case we > > > > are likely to generate the event anyway so the cost of that compared to > > > > event generation is negligible? > > > > > > > > > > I guess that can work. > > > I can try it and see if there are any other complications. > > > > > > > And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() > > > > which means that we need to pass object pointer (in the form of void *) > > > > instead of fsnotify_connp_t to various mark adding functions (and transform > > > > it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly > > > > setting up fsnotify_sb_info). Passing void * around is not great but it > > > > should be fairly limited (and actually reduces the knowledge of fsnotify > > > > internals outside of the fsnotify core). > > > > > > Unless I am missing something, I think we only need to pass an extra sb > > > arg to fsnotify_add_mark_locked()? and it does not sound like a big deal. > > > For adding an sb mark, connp arg could be NULL, and then we get connp > > > from sb->fsnotify_sb_info after making sure that it is allocated. > > > > Yes that would be another possibility but frankly I like passing the > > 'object' pointer instead of connp pointer a bit more. But we can see how > > the code looks like. > > Ok, here it is: > > https://github.com/amir73il/linux/commits/fsnotify-sbinfo/ > > I agree that the interface does end up looking better this way. Yep, the interface looks fine. I have left some comments on github regarding typos and some suspicious things. > I've requested to re-test performance on fsnotify-sbinfo. > > You can use this rebased branch to look at the diff from the > the previous patches that were tested by 0day: > > https://github.com/amir73il/linux/commits/fsnotify-sbconn/ > > If you have the bandwidth to consider those patches as candidates > for (the second half of?) 6.9 merge window, I can post them for review. Well, unless Linus does rc8, I don't think we should queue these for the merge window as it is too late by now. But please post them for review, I'll have a look. I can then push them to my tree early into a stable branch and you can base your patches on my branch. If the patches then need to go through VFS tree, Christian is fine with pulling my tree... Honza
On Fri, Mar 08, 2024 at 05:00:58PM +0100, Jan Kara wrote: > On Wed 06-03-24 16:51:06, Amir Goldstein wrote: > > On Thu, Feb 15, 2024 at 10:36 AM Jan Kara <jack@suse.cz> wrote: > > > On Wed 14-02-24 15:40:31, Amir Goldstein wrote: > > > > > > > > Merged your improvement now (and I've split off the cleanup into a separate > > > > > > > > change and dropped the creation of fsnotify_path() which seemed a bit > > > > > > > > pointless with a single caller). All pushed out. > > > > > > > > > > > > > > > > > > > > > > > > > > > Jan & Jens, > > > > > > > > > > > > Although Jan has already queued this v3 patch with sufficient performance > > > > > > improvement for Jens' workloads, I got a performance regression report from > > > > > > kernel robot on will-it-scale microbenchmark (buffered write loop) > > > > > > on my fan_pre_content patches, so I tried to improve on the existing solution. > > > > > > > > > > > > I tried something similar to v1/v2 patches, where the sb keeps accounting > > > > > > of the number of watchers for specific sub-classes of events. > > > > > > > > > > > > I've made two major changes: > > > > > > 1. moved to counters into a per-sb state object fsnotify_sb_connector > > > > > > as Christian requested > > > > > > 2. The counters are by fanotify classes, not by specific events, so they > > > > > > can be used to answer the questions: > > > > > > a) Are there any fsnotify watchers on this sb? > > > > > > b) Are there any fanotify permission class listeners on this sb? > > > > > > c) Are there any fanotify pre-content (a.k.a HSM) class listeners on this sb? > > > > > > > > > > > > I think that those questions are very relevant in the real world, because > > > > > > a positive answer to (b) and (c) is quite rare in the real world, so the > > > > > > overhead on the permission hooks could be completely eliminated in > > > > > > the common case. > > > > > > > > > > > > If needed, we can further bisect the class counters per specific painful > > > > > > events (e.g. FAN_ACCESS*), but there is no need to do that before > > > > > > we see concrete benchmark results. > > > ... > > > > > > > > Then I dislike how we have to specialcase superblock in quite a few places > > > > > and add these wrappers and what not. This seems to be mostly caused by the > > > > > fact that you directly embed fsnotify_mark_connector into fsnotify_sb_info. > > > > > What if we just put fsnotify_connp_t there? I understand that this will > > > > > mean one more pointer fetch if there are actually marks attached to the > > > > > superblock and the event mask matches s_fsnotify_mask. But in that case we > > > > > are likely to generate the event anyway so the cost of that compared to > > > > > event generation is negligible? > > > > > > > > > > > > > I guess that can work. > > > > I can try it and see if there are any other complications. > > > > > > > > > And I'd allocate fsnotify_sb_info on demand from fsnotify_add_mark_locked() > > > > > which means that we need to pass object pointer (in the form of void *) > > > > > instead of fsnotify_connp_t to various mark adding functions (and transform > > > > > it to fsnotify_connp_t only in fsnotify_add_mark_locked() after possibly > > > > > setting up fsnotify_sb_info). Passing void * around is not great but it > > > > > should be fairly limited (and actually reduces the knowledge of fsnotify > > > > > internals outside of the fsnotify core). > > > > > > > > Unless I am missing something, I think we only need to pass an extra sb > > > > arg to fsnotify_add_mark_locked()? and it does not sound like a big deal. > > > > For adding an sb mark, connp arg could be NULL, and then we get connp > > > > from sb->fsnotify_sb_info after making sure that it is allocated. > > > > > > Yes that would be another possibility but frankly I like passing the > > > 'object' pointer instead of connp pointer a bit more. But we can see how > > > the code looks like. > > > > Ok, here it is: > > > > https://github.com/amir73il/linux/commits/fsnotify-sbinfo/ > > > > I agree that the interface does end up looking better this way. > > Yep, the interface looks fine. I have left some comments on github > regarding typos and some suspicious things. > > > I've requested to re-test performance on fsnotify-sbinfo. > > > > You can use this rebased branch to look at the diff from the > > the previous patches that were tested by 0day: > > > > https://github.com/amir73il/linux/commits/fsnotify-sbconn/ > > > > If you have the bandwidth to consider those patches as candidates > > for (the second half of?) 6.9 merge window, I can post them for review. > > Well, unless Linus does rc8, I don't think we should queue these for the > merge window as it is too late by now. But please post them for review, > I'll have a look. I can then push them to my tree early into a stable > branch and you can base your patches on my branch. If the patches then need > to go through VFS tree, Christian is fine with pulling my tree... I'm absolutely opposed to touching anything that you do. I'm joking of course, I'm very happy to pull from you!
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 8bfd690e9f10..6582204cca70 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -141,7 +141,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) } /* Are inode/sb/mount interested in parent and name info with this event? */ -static bool fsnotify_event_needs_parent(struct inode *inode, struct mount *mnt, +static bool fsnotify_event_needs_parent(struct inode *inode, __u32 mnt_mask, __u32 mask) { __u32 marks_mask = 0; @@ -160,13 +160,22 @@ static bool fsnotify_event_needs_parent(struct inode *inode, struct mount *mnt, /* Did either inode/sb/mount subscribe for events with parent/name? */ marks_mask |= fsnotify_parent_needed_mask(inode->i_fsnotify_mask); marks_mask |= fsnotify_parent_needed_mask(inode->i_sb->s_fsnotify_mask); - if (mnt) - marks_mask |= fsnotify_parent_needed_mask(mnt->mnt_fsnotify_mask); + marks_mask |= fsnotify_parent_needed_mask(mnt_mask); /* Did they subscribe for this event with parent/name info? */ return mask & marks_mask; } +/* Are there any inode/mount/sb objects that are interested in this event? */ +static inline bool fsnotify_object_watched(struct inode *inode, __u32 mnt_mask, + __u32 mask) +{ + __u32 marks_mask = inode->i_fsnotify_mask | mnt_mask | + inode->i_sb->s_fsnotify_mask; + + return mask & marks_mask & ALL_FSNOTIFY_EVENTS; +} + /* * Notify this dentry's parent about a child's events with child name info * if parent is watching or if inode/sb/mount are interested in events with @@ -179,7 +188,7 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { const struct path *path = fsnotify_data_path(data, data_type); - struct mount *mnt = path ? real_mount(path->mnt) : NULL; + __u32 mnt_mask = path ? real_mount(path->mnt)->mnt_fsnotify_mask : 0; struct inode *inode = d_inode(dentry); struct dentry *parent; bool parent_watched = dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED; @@ -190,16 +199,13 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, struct qstr *file_name = NULL; int ret = 0; - /* - * Do inode/sb/mount care about parent and name info on non-dir? - * Do they care about any event at all? - */ - if (!inode->i_fsnotify_marks && !inode->i_sb->s_fsnotify_marks && - (!mnt || !mnt->mnt_fsnotify_marks) && !parent_watched) + /* Optimize the likely case of nobody watching this path */ + if (likely(!parent_watched) && + likely(!fsnotify_object_watched(inode, mnt_mask, mask))) return 0; parent = NULL; - parent_needed = fsnotify_event_needs_parent(inode, mnt, mask); + parent_needed = fsnotify_event_needs_parent(inode, mnt_mask, mask); if (!parent_watched && !parent_needed) goto notify; diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 8300a5286988..2ddfd7d1eca8 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -17,6 +17,12 @@ #include <linux/slab.h> #include <linux/bug.h> +/* Are there any inode/mount/sb objects that are being watched at all? */ +static inline bool fsnotify_sb_has_watchers(struct super_block *sb) +{ + return atomic_long_read(&sb->s_fsnotify_connectors); +} + /* * Notify this @dir inode about a change in a child directory entry. * The directory entry may have turned positive or negative or its inode may @@ -30,7 +36,7 @@ static inline int fsnotify_name(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *name, u32 cookie) { - if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) + if (!fsnotify_sb_has_watchers(dir->i_sb)) return 0; return fsnotify(mask, data, data_type, dir, name, NULL, cookie); @@ -44,7 +50,7 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, static inline void fsnotify_inode(struct inode *inode, __u32 mask) { - if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + if (!fsnotify_sb_has_watchers(inode->i_sb)) return; if (S_ISDIR(inode->i_mode)) @@ -59,7 +65,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, { struct inode *inode = d_inode(dentry); - if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + if (!fsnotify_sb_has_watchers(inode->i_sb)) return 0; if (S_ISDIR(inode->i_mode)) { @@ -89,15 +95,17 @@ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY); } -static inline int fsnotify_file(struct file *file, __u32 mask) +static inline int fsnotify_path(const struct path *path, __u32 mask) { - const struct path *path; + return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); +} +static inline int fsnotify_file(struct file *file, __u32 mask) +{ if (file->f_mode & FMODE_NONOTIFY) return 0; - path = &file->f_path; - return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); + return fsnotify_path(&file->f_path, mask); } #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
If parent inode is not watching, check for the event in masks of sb/mount/inode masks early to optimize out most of the code in __fsnotify_parent() and avoid calling fsnotify(). Jens has reported that this optimization improves BW and IOPS in an io_uring benchmark by more than 10% and reduces perf reported CPU usage. before: + 4.51% io_uring [kernel.vmlinux] [k] fsnotify + 3.67% io_uring [kernel.vmlinux] [k] __fsnotify_parent after: + 2.37% io_uring [kernel.vmlinux] [k] __fsnotify_parent Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/linux-fsdevel/b45bd8ff-5654-4e67-90a6-aad5e6759e0b@kernel.dk/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- Jan, Considering that this change looks like a clear win and it actually the change that you suggested, I cleaned it up a bit and posting for your consideration. I've kept the wrappers fsnotify_path() and fsnotify_sb_has_watchers() although they are not directly related to the optimization, because they makes the code a bit nicer IMO. Jens, I took the liberty to add Reported-and-tested-by to attribute for your help even though the patch that you tested was slightly different than this cleaned up version, because logically it should be identical. Thanks, Amir. Changes since v2: - Add helper fsnotify_object_watched() - Move the optimization from inlined fsnotify_path() to extern __fsnotify_parent() - Optimize for any event - not only content events - Drop the SB_I_CONTENT_WATCHED flag fs/notify/fsnotify.c | 28 +++++++++++++++++----------- include/linux/fsnotify.h | 22 +++++++++++++++------- 2 files changed, 32 insertions(+), 18 deletions(-)