diff mbox series

[RFC,v2] fsnotify: optimize the case of no content event watchers

Message ID 20240111152233.352912-1-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC,v2] fsnotify: optimize the case of no content event watchers | expand

Commit Message

Amir Goldstein Jan. 11, 2024, 3:22 p.m. UTC
Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
optimized the case where there are no fsnotify watchers on any of the
filesystem's objects.

It is quite common for a system to have a single local filesystem and
it is quite common for the system to have some inotify watches on some
config files or directories, so the optimization of no marks at all is
often not in effect.

Content event (i.e. access,modify) watchers on sb/mount more rare, so
optimizing the case of no sb/mount marks with content events can improve
performance for more systems, especially for performance sensitive io
workloads.

Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
events in their mask exist and use that flag to optimize out the call to
__fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

Based on your feedback on v1 [1] that we can generally do better,
here is another variant for optimizing out FS_ACCESS*/FS_MODIFY.

Instead of relying on the assumption that access event watchers are rare,
this is relying on the assumption that content event sb/mount watchers
are rare.

I temprarily used an s_iflags for the POC and it is not being set in a
race free manner. The intention was to use an sb mask flag or sb
connector flag, but doing that from the context of adding a mount mark
requires a bit of prep work.

Let me know what you think.

Jens,

Can you take v2 for a spin with your workloads?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20240109194818.91465-1-amir73il@gmail.com/

Changes since v1:
- Optimize out also FS_MODIFY events
- Use single taint sb flag instead of counter


 fs/notify/mark.c                 | 26 +++++++++++++++-
 include/linux/fs.h               |  1 +
 include/linux/fsnotify.h         | 52 ++++++++++++++++++++++++++------
 include/linux/fsnotify_backend.h | 12 ++++++++
 4 files changed, 81 insertions(+), 10 deletions(-)

Comments

Jens Axboe Jan. 11, 2024, 4:46 p.m. UTC | #1
On 1/11/24 8:22 AM, Amir Goldstein wrote:
> Jens,
> 
> Can you take v2 for a spin with your workloads?

Still looks good from a performance POV, same as v1. No 6-8% of fsnotify
and parent CPU wastage.
Jan Kara Jan. 12, 2024, 11:09 a.m. UTC | #2
On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> optimized the case where there are no fsnotify watchers on any of the
> filesystem's objects.
> 
> It is quite common for a system to have a single local filesystem and
> it is quite common for the system to have some inotify watches on some
> config files or directories, so the optimization of no marks at all is
> often not in effect.
> 
> Content event (i.e. access,modify) watchers on sb/mount more rare, so
> optimizing the case of no sb/mount marks with content events can improve
> performance for more systems, especially for performance sensitive io
> workloads.
> 
> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
> events in their mask exist and use that flag to optimize out the call to
> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

...

> -static inline int fsnotify_file(struct file *file, __u32 mask)
> +static inline int fsnotify_path(const struct path *path, __u32 mask)
>  {
> -	const struct path *path;
> +	struct dentry *dentry = path->dentry;
>  
> -	if (file->f_mode & FMODE_NONOTIFY)
> +	if (!fsnotify_sb_has_watchers(dentry->d_sb))
>  		return 0;
>  
> -	path = &file->f_path;
> +	/* Optimize the likely case of sb/mount/parent not watching content */
> +	if (mask & FSNOTIFY_CONTENT_EVENTS &&
> +	    likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
> +	    likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
> +		/*
> +		 * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
> +		 * events in s_fsnotify_mask is redundant, but it will be needed
> +		 * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
> +		 * existence of only mount content event watchers.
> +		 */
> +		__u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> +				   dentry->d_sb->s_fsnotify_mask;
> +
> +		if (!(mask & marks_mask))
> +			return 0;
> +	}

So I'm probably missing something but how is all this patch different from:

	if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
		__u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
			path->mnt->mnt_fsnotify_mask |
			dentry->d_sb->s_fsnotify_mask;
		if (!(mask & marks_mask))
			return 0;
	}

I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
(read & write) we care about. In Jens' case

	!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
	!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)

is true as otherwise we'd go right to fsnotify_parent() and so Jens
wouldn't see the performance benefit. But then with your patch you fetch
i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
fetch but that is equivalent to sb->s_iflags (or wherever we store that
bit) fetch?

So that would confirm that the parent handling costs in fsnotify_parent()
is what's really making the difference and just avoiding that by checking
masks early should be enough?

								Honza
Amir Goldstein Jan. 12, 2024, 1 p.m. UTC | #3
On Fri, Jan 12, 2024 at 1:09 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
> > Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > optimized the case where there are no fsnotify watchers on any of the
> > filesystem's objects.
> >
> > It is quite common for a system to have a single local filesystem and
> > it is quite common for the system to have some inotify watches on some
> > config files or directories, so the optimization of no marks at all is
> > often not in effect.
> >
> > Content event (i.e. access,modify) watchers on sb/mount more rare, so
> > optimizing the case of no sb/mount marks with content events can improve
> > performance for more systems, especially for performance sensitive io
> > workloads.
> >
> > Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
> > events in their mask exist and use that flag to optimize out the call to
> > __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> ...
>
> > -static inline int fsnotify_file(struct file *file, __u32 mask)
> > +static inline int fsnotify_path(const struct path *path, __u32 mask)
> >  {
> > -     const struct path *path;
> > +     struct dentry *dentry = path->dentry;
> >
> > -     if (file->f_mode & FMODE_NONOTIFY)
> > +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
> >               return 0;
> >
> > -     path = &file->f_path;
> > +     /* Optimize the likely case of sb/mount/parent not watching content */
> > +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
> > +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
> > +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
> > +             /*
> > +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
> > +              * events in s_fsnotify_mask is redundant, but it will be needed
> > +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
> > +              * existence of only mount content event watchers.
> > +              */
> > +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> > +                                dentry->d_sb->s_fsnotify_mask;
> > +
> > +             if (!(mask & marks_mask))
> > +                     return 0;
> > +     }
>
> So I'm probably missing something but how is all this patch different from:
>
>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>                         path->mnt->mnt_fsnotify_mask |

It's actually:

                          real_mount(path->mnt)->mnt_fsnotify_mask

and this requires including "internal/mount.h" in all the call sites.

>                         dentry->d_sb->s_fsnotify_mask;
>                 if (!(mask & marks_mask))
>                         return 0;
>         }
>
> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
> (read & write) we care about. In Jens' case
>
>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
>
> is true as otherwise we'd go right to fsnotify_parent() and so Jens
> wouldn't see the performance benefit. But then with your patch you fetch
> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
> fetch but that is equivalent to sb->s_iflags (or wherever we store that
> bit) fetch?
>
> So that would confirm that the parent handling costs in fsnotify_parent()
> is what's really making the difference and just avoiding that by checking
> masks early should be enough?

Can't the benefit be also related to saving a function call?

Only one way to find out...

Jens,

Can you please test attached v3 with a non-inlined fsnotify_path() helper?

Thanks,
Amir.
Jens Axboe Jan. 12, 2024, 1:58 p.m. UTC | #4
On 1/12/24 6:00 AM, Amir Goldstein wrote:
> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
>>
>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
>>> optimized the case where there are no fsnotify watchers on any of the
>>> filesystem's objects.
>>>
>>> It is quite common for a system to have a single local filesystem and
>>> it is quite common for the system to have some inotify watches on some
>>> config files or directories, so the optimization of no marks at all is
>>> often not in effect.
>>>
>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
>>> optimizing the case of no sb/mount marks with content events can improve
>>> performance for more systems, especially for performance sensitive io
>>> workloads.
>>>
>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
>>> events in their mask exist and use that flag to optimize out the call to
>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>
>> ...
>>
>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
>>>  {
>>> -     const struct path *path;
>>> +     struct dentry *dentry = path->dentry;
>>>
>>> -     if (file->f_mode & FMODE_NONOTIFY)
>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
>>>               return 0;
>>>
>>> -     path = &file->f_path;
>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
>>> +             /*
>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
>>> +              * existence of only mount content event watchers.
>>> +              */
>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>> +                                dentry->d_sb->s_fsnotify_mask;
>>> +
>>> +             if (!(mask & marks_mask))
>>> +                     return 0;
>>> +     }
>>
>> So I'm probably missing something but how is all this patch different from:
>>
>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>                         path->mnt->mnt_fsnotify_mask |
> 
> It's actually:
> 
>                           real_mount(path->mnt)->mnt_fsnotify_mask
> 
> and this requires including "internal/mount.h" in all the call sites.
> 
>>                         dentry->d_sb->s_fsnotify_mask;
>>                 if (!(mask & marks_mask))
>>                         return 0;
>>         }
>>
>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
>> (read & write) we care about. In Jens' case
>>
>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
>>
>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
>> wouldn't see the performance benefit. But then with your patch you fetch
>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
>> bit) fetch?
>>
>> So that would confirm that the parent handling costs in fsnotify_parent()
>> is what's really making the difference and just avoiding that by checking
>> masks early should be enough?
> 
> Can't the benefit be also related to saving a function call?
> 
> Only one way to find out...
> 
> Jens,
> 
> Can you please test attached v3 with a non-inlined fsnotify_path() helper?

I can run it since it doesn't take much to do that, but there's no way
parallel universe where saving a function call would yield those kinds
of wins (or have that much cost).
Jens Axboe Jan. 12, 2024, 2:11 p.m. UTC | #5
On 1/12/24 6:58 AM, Jens Axboe wrote:
> On 1/12/24 6:00 AM, Amir Goldstein wrote:
>> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
>>>
>>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
>>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
>>>> optimized the case where there are no fsnotify watchers on any of the
>>>> filesystem's objects.
>>>>
>>>> It is quite common for a system to have a single local filesystem and
>>>> it is quite common for the system to have some inotify watches on some
>>>> config files or directories, so the optimization of no marks at all is
>>>> often not in effect.
>>>>
>>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
>>>> optimizing the case of no sb/mount marks with content events can improve
>>>> performance for more systems, especially for performance sensitive io
>>>> workloads.
>>>>
>>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
>>>> events in their mask exist and use that flag to optimize out the call to
>>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
>>>>
>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>
>>> ...
>>>
>>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
>>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
>>>>  {
>>>> -     const struct path *path;
>>>> +     struct dentry *dentry = path->dentry;
>>>>
>>>> -     if (file->f_mode & FMODE_NONOTIFY)
>>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
>>>>               return 0;
>>>>
>>>> -     path = &file->f_path;
>>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
>>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
>>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
>>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
>>>> +             /*
>>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
>>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
>>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
>>>> +              * existence of only mount content event watchers.
>>>> +              */
>>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>>> +                                dentry->d_sb->s_fsnotify_mask;
>>>> +
>>>> +             if (!(mask & marks_mask))
>>>> +                     return 0;
>>>> +     }
>>>
>>> So I'm probably missing something but how is all this patch different from:
>>>
>>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
>>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>>                         path->mnt->mnt_fsnotify_mask |
>>
>> It's actually:
>>
>>                           real_mount(path->mnt)->mnt_fsnotify_mask
>>
>> and this requires including "internal/mount.h" in all the call sites.
>>
>>>                         dentry->d_sb->s_fsnotify_mask;
>>>                 if (!(mask & marks_mask))
>>>                         return 0;
>>>         }
>>>
>>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
>>> (read & write) we care about. In Jens' case
>>>
>>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
>>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
>>>
>>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
>>> wouldn't see the performance benefit. But then with your patch you fetch
>>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
>>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
>>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
>>> bit) fetch?
>>>
>>> So that would confirm that the parent handling costs in fsnotify_parent()
>>> is what's really making the difference and just avoiding that by checking
>>> masks early should be enough?
>>
>> Can't the benefit be also related to saving a function call?
>>
>> Only one way to find out...
>>
>> Jens,
>>
>> Can you please test attached v3 with a non-inlined fsnotify_path() helper?
> 
> I can run it since it doesn't take much to do that, but there's no way
> parallel universe where saving a function call would yield those kinds
> of wins (or have that much cost).

Ran this patch, and it's better than mainline for sure, but it does have
additional overhead that the previous version did not:

               +1.46%  [kernel.vmlinux]  [k] fsnotify_path
Amir Goldstein Jan. 12, 2024, 2:36 p.m. UTC | #6
On Fri, Jan 12, 2024 at 4:11 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 1/12/24 6:58 AM, Jens Axboe wrote:
> > On 1/12/24 6:00 AM, Amir Goldstein wrote:
> >> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
> >>>
> >>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
> >>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> >>>> optimized the case where there are no fsnotify watchers on any of the
> >>>> filesystem's objects.
> >>>>
> >>>> It is quite common for a system to have a single local filesystem and
> >>>> it is quite common for the system to have some inotify watches on some
> >>>> config files or directories, so the optimization of no marks at all is
> >>>> often not in effect.
> >>>>
> >>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
> >>>> optimizing the case of no sb/mount marks with content events can improve
> >>>> performance for more systems, especially for performance sensitive io
> >>>> workloads.
> >>>>
> >>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
> >>>> events in their mask exist and use that flag to optimize out the call to
> >>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
> >>>>
> >>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >>>
> >>> ...
> >>>
> >>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
> >>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
> >>>>  {
> >>>> -     const struct path *path;
> >>>> +     struct dentry *dentry = path->dentry;
> >>>>
> >>>> -     if (file->f_mode & FMODE_NONOTIFY)
> >>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
> >>>>               return 0;
> >>>>
> >>>> -     path = &file->f_path;
> >>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
> >>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
> >>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
> >>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
> >>>> +             /*
> >>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
> >>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
> >>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
> >>>> +              * existence of only mount content event watchers.
> >>>> +              */
> >>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> >>>> +                                dentry->d_sb->s_fsnotify_mask;
> >>>> +
> >>>> +             if (!(mask & marks_mask))
> >>>> +                     return 0;
> >>>> +     }
> >>>
> >>> So I'm probably missing something but how is all this patch different from:
> >>>
> >>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
> >>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> >>>                         path->mnt->mnt_fsnotify_mask |
> >>
> >> It's actually:
> >>
> >>                           real_mount(path->mnt)->mnt_fsnotify_mask
> >>
> >> and this requires including "internal/mount.h" in all the call sites.
> >>
> >>>                         dentry->d_sb->s_fsnotify_mask;
> >>>                 if (!(mask & marks_mask))
> >>>                         return 0;
> >>>         }
> >>>
> >>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
> >>> (read & write) we care about. In Jens' case
> >>>
> >>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
> >>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
> >>>
> >>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
> >>> wouldn't see the performance benefit. But then with your patch you fetch
> >>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
> >>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
> >>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
> >>> bit) fetch?
> >>>
> >>> So that would confirm that the parent handling costs in fsnotify_parent()
> >>> is what's really making the difference and just avoiding that by checking
> >>> masks early should be enough?
> >>
> >> Can't the benefit be also related to saving a function call?
> >>
> >> Only one way to find out...
> >>
> >> Jens,
> >>
> >> Can you please test attached v3 with a non-inlined fsnotify_path() helper?
> >
> > I can run it since it doesn't take much to do that, but there's no way
> > parallel universe where saving a function call would yield those kinds
> > of wins (or have that much cost).
>
> Ran this patch, and it's better than mainline for sure, but it does have
> additional overhead that the previous version did not:
>
>                +1.46%  [kernel.vmlinux]  [k] fsnotify_path
>

That is what I suspected would happen.

So at this time, we have patch v2 which looks like a clear win.
It uses a slightly hackish SB_I_CONTENT_WATCHED to work around
the fact that real_mount() is not accessible to the inlined call sites.

With a little more effort, we can move SB_I_CONTENT_WATCHED
into the s_fsnotify_mask, but TBH, I don't know if moving it is worth
the effort because right now, SB_I_CONTENT_WATCHED would
be really rare and with HSM events, an sb really does behave in
slightly different ways, so it might not be such a bad idea to mark
the sb and publish this state to all the subsystems with an s_iflags.

If folks agree to keeping SB_I_CONTENT_WATCHED, we would
need to protect from a rare concurrent update with
SB_I_TS_EXPIRY_WARNED flag.

Jan, what do you think?

Thanks,
Amir.
Jan Kara Jan. 15, 2024, 4:11 p.m. UTC | #7
On Fri 12-01-24 07:11:42, Jens Axboe wrote:
> On 1/12/24 6:58 AM, Jens Axboe wrote:
> > On 1/12/24 6:00 AM, Amir Goldstein wrote:
> >> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
> >>>
> >>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
> >>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> >>>> optimized the case where there are no fsnotify watchers on any of the
> >>>> filesystem's objects.
> >>>>
> >>>> It is quite common for a system to have a single local filesystem and
> >>>> it is quite common for the system to have some inotify watches on some
> >>>> config files or directories, so the optimization of no marks at all is
> >>>> often not in effect.
> >>>>
> >>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
> >>>> optimizing the case of no sb/mount marks with content events can improve
> >>>> performance for more systems, especially for performance sensitive io
> >>>> workloads.
> >>>>
> >>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
> >>>> events in their mask exist and use that flag to optimize out the call to
> >>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
> >>>>
> >>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >>>
> >>> ...
> >>>
> >>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
> >>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
> >>>>  {
> >>>> -     const struct path *path;
> >>>> +     struct dentry *dentry = path->dentry;
> >>>>
> >>>> -     if (file->f_mode & FMODE_NONOTIFY)
> >>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
> >>>>               return 0;
> >>>>
> >>>> -     path = &file->f_path;
> >>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
> >>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
> >>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
> >>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
> >>>> +             /*
> >>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
> >>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
> >>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
> >>>> +              * existence of only mount content event watchers.
> >>>> +              */
> >>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> >>>> +                                dentry->d_sb->s_fsnotify_mask;
> >>>> +
> >>>> +             if (!(mask & marks_mask))
> >>>> +                     return 0;
> >>>> +     }
> >>>
> >>> So I'm probably missing something but how is all this patch different from:
> >>>
> >>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
> >>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> >>>                         path->mnt->mnt_fsnotify_mask |
> >>
> >> It's actually:
> >>
> >>                           real_mount(path->mnt)->mnt_fsnotify_mask
> >>
> >> and this requires including "internal/mount.h" in all the call sites.
> >>
> >>>                         dentry->d_sb->s_fsnotify_mask;
> >>>                 if (!(mask & marks_mask))
> >>>                         return 0;
> >>>         }
> >>>
> >>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
> >>> (read & write) we care about. In Jens' case
> >>>
> >>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
> >>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
> >>>
> >>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
> >>> wouldn't see the performance benefit. But then with your patch you fetch
> >>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
> >>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
> >>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
> >>> bit) fetch?
> >>>
> >>> So that would confirm that the parent handling costs in fsnotify_parent()
> >>> is what's really making the difference and just avoiding that by checking
> >>> masks early should be enough?
> >>
> >> Can't the benefit be also related to saving a function call?
> >>
> >> Only one way to find out...
> >>
> >> Jens,
> >>
> >> Can you please test attached v3 with a non-inlined fsnotify_path() helper?
> > 
> > I can run it since it doesn't take much to do that, but there's no way
> > parallel universe where saving a function call would yield those kinds
> > of wins (or have that much cost).
> 
> Ran this patch, and it's better than mainline for sure, but it does have
> additional overhead that the previous version did not:
> 
>                +1.46%  [kernel.vmlinux]  [k] fsnotify_path

So did you see any effect in IOPS or just this difference in perf profile?
Because Amir's patch took a bunch of code that was previously inlined
(thus its cost was blended with the cost of the rest of the read/write
code) and moved it to this new fsnotify_path() function so its cost is now
visible...

								Honza
Jens Axboe Jan. 15, 2024, 4:14 p.m. UTC | #8
On 1/15/24 9:11 AM, Jan Kara wrote:
> On Fri 12-01-24 07:11:42, Jens Axboe wrote:
>> On 1/12/24 6:58 AM, Jens Axboe wrote:
>>> On 1/12/24 6:00 AM, Amir Goldstein wrote:
>>>> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
>>>>>
>>>>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
>>>>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
>>>>>> optimized the case where there are no fsnotify watchers on any of the
>>>>>> filesystem's objects.
>>>>>>
>>>>>> It is quite common for a system to have a single local filesystem and
>>>>>> it is quite common for the system to have some inotify watches on some
>>>>>> config files or directories, so the optimization of no marks at all is
>>>>>> often not in effect.
>>>>>>
>>>>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
>>>>>> optimizing the case of no sb/mount marks with content events can improve
>>>>>> performance for more systems, especially for performance sensitive io
>>>>>> workloads.
>>>>>>
>>>>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
>>>>>> events in their mask exist and use that flag to optimize out the call to
>>>>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
>>>>>>
>>>>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>>>>
>>>>> ...
>>>>>
>>>>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
>>>>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
>>>>>>  {
>>>>>> -     const struct path *path;
>>>>>> +     struct dentry *dentry = path->dentry;
>>>>>>
>>>>>> -     if (file->f_mode & FMODE_NONOTIFY)
>>>>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
>>>>>>               return 0;
>>>>>>
>>>>>> -     path = &file->f_path;
>>>>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
>>>>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
>>>>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
>>>>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
>>>>>> +             /*
>>>>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
>>>>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
>>>>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
>>>>>> +              * existence of only mount content event watchers.
>>>>>> +              */
>>>>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>>>>> +                                dentry->d_sb->s_fsnotify_mask;
>>>>>> +
>>>>>> +             if (!(mask & marks_mask))
>>>>>> +                     return 0;
>>>>>> +     }
>>>>>
>>>>> So I'm probably missing something but how is all this patch different from:
>>>>>
>>>>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
>>>>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
>>>>>                         path->mnt->mnt_fsnotify_mask |
>>>>
>>>> It's actually:
>>>>
>>>>                           real_mount(path->mnt)->mnt_fsnotify_mask
>>>>
>>>> and this requires including "internal/mount.h" in all the call sites.
>>>>
>>>>>                         dentry->d_sb->s_fsnotify_mask;
>>>>>                 if (!(mask & marks_mask))
>>>>>                         return 0;
>>>>>         }
>>>>>
>>>>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
>>>>> (read & write) we care about. In Jens' case
>>>>>
>>>>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
>>>>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
>>>>>
>>>>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
>>>>> wouldn't see the performance benefit. But then with your patch you fetch
>>>>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
>>>>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
>>>>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
>>>>> bit) fetch?
>>>>>
>>>>> So that would confirm that the parent handling costs in fsnotify_parent()
>>>>> is what's really making the difference and just avoiding that by checking
>>>>> masks early should be enough?
>>>>
>>>> Can't the benefit be also related to saving a function call?
>>>>
>>>> Only one way to find out...
>>>>
>>>> Jens,
>>>>
>>>> Can you please test attached v3 with a non-inlined fsnotify_path() helper?
>>>
>>> I can run it since it doesn't take much to do that, but there's no way
>>> parallel universe where saving a function call would yield those kinds
>>> of wins (or have that much cost).
>>
>> Ran this patch, and it's better than mainline for sure, but it does have
>> additional overhead that the previous version did not:
>>
>>                +1.46%  [kernel.vmlinux]  [k] fsnotify_path
> 
> So did you see any effect in IOPS or just this difference in perf profile?
> Because Amir's patch took a bunch of code that was previously inlined
> (thus its cost was blended with the cost of the rest of the read/write
> code) and moved it to this new fsnotify_path() function so its cost is now
> visible...

These tests are CPU bound, but I don't recall for this one as there's a
bit of a mixup with the previously reported regression for 6.8-git where
we now do an extra fsnotify call per mem import.
Jan Kara Jan. 15, 2024, 6:37 p.m. UTC | #9
On Fri 12-01-24 16:36:05, Amir Goldstein wrote:
> On Fri, Jan 12, 2024 at 4:11 PM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > On 1/12/24 6:58 AM, Jens Axboe wrote:
> > > On 1/12/24 6:00 AM, Amir Goldstein wrote:
> > >> On Fri, Jan 12, 2024 at 1:09?PM Jan Kara <jack@suse.cz> wrote:
> > >>>
> > >>> On Thu 11-01-24 17:22:33, Amir Goldstein wrote:
> > >>>> Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
> > >>>> optimized the case where there are no fsnotify watchers on any of the
> > >>>> filesystem's objects.
> > >>>>
> > >>>> It is quite common for a system to have a single local filesystem and
> > >>>> it is quite common for the system to have some inotify watches on some
> > >>>> config files or directories, so the optimization of no marks at all is
> > >>>> often not in effect.
> > >>>>
> > >>>> Content event (i.e. access,modify) watchers on sb/mount more rare, so
> > >>>> optimizing the case of no sb/mount marks with content events can improve
> > >>>> performance for more systems, especially for performance sensitive io
> > >>>> workloads.
> > >>>>
> > >>>> Set a per-sb flag SB_I_CONTENT_WATCHED if sb/mount marks with content
> > >>>> events in their mask exist and use that flag to optimize out the call to
> > >>>> __fsnotify_parent() and fsnotify() in fsnotify access/modify hooks.
> > >>>>
> > >>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >>>
> > >>> ...
> > >>>
> > >>>> -static inline int fsnotify_file(struct file *file, __u32 mask)
> > >>>> +static inline int fsnotify_path(const struct path *path, __u32 mask)
> > >>>>  {
> > >>>> -     const struct path *path;
> > >>>> +     struct dentry *dentry = path->dentry;
> > >>>>
> > >>>> -     if (file->f_mode & FMODE_NONOTIFY)
> > >>>> +     if (!fsnotify_sb_has_watchers(dentry->d_sb))
> > >>>>               return 0;
> > >>>>
> > >>>> -     path = &file->f_path;
> > >>>> +     /* Optimize the likely case of sb/mount/parent not watching content */
> > >>>> +     if (mask & FSNOTIFY_CONTENT_EVENTS &&
> > >>>> +         likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
> > >>>> +         likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
> > >>>> +             /*
> > >>>> +              * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
> > >>>> +              * events in s_fsnotify_mask is redundant, but it will be needed
> > >>>> +              * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
> > >>>> +              * existence of only mount content event watchers.
> > >>>> +              */
> > >>>> +             __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> > >>>> +                                dentry->d_sb->s_fsnotify_mask;
> > >>>> +
> > >>>> +             if (!(mask & marks_mask))
> > >>>> +                     return 0;
> > >>>> +     }
> > >>>
> > >>> So I'm probably missing something but how is all this patch different from:
> > >>>
> > >>>         if (likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
> > >>>                 __u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
> > >>>                         path->mnt->mnt_fsnotify_mask |
> > >>
> > >> It's actually:
> > >>
> > >>                           real_mount(path->mnt)->mnt_fsnotify_mask
> > >>
> > >> and this requires including "internal/mount.h" in all the call sites.
> > >>
> > >>>                         dentry->d_sb->s_fsnotify_mask;
> > >>>                 if (!(mask & marks_mask))
> > >>>                         return 0;
> > >>>         }
> > >>>
> > >>> I mean (mask & FSNOTIFY_CONTENT_EVENTS) is true for the frequent events
> > >>> (read & write) we care about. In Jens' case
> > >>>
> > >>>         !(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED) &&
> > >>>         !(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED)
> > >>>
> > >>> is true as otherwise we'd go right to fsnotify_parent() and so Jens
> > >>> wouldn't see the performance benefit. But then with your patch you fetch
> > >>> i_fsnotify_mask and s_fsnotify_mask anyway for the test so the only
> > >>> difference to what I suggest above is the path->mnt->mnt_fsnotify_mask
> > >>> fetch but that is equivalent to sb->s_iflags (or wherever we store that
> > >>> bit) fetch?
> > >>>
> > >>> So that would confirm that the parent handling costs in fsnotify_parent()
> > >>> is what's really making the difference and just avoiding that by checking
> > >>> masks early should be enough?
> > >>
> > >> Can't the benefit be also related to saving a function call?
> > >>
> > >> Only one way to find out...
> > >>
> > >> Jens,
> > >>
> > >> Can you please test attached v3 with a non-inlined fsnotify_path() helper?
> > >
> > > I can run it since it doesn't take much to do that, but there's no way
> > > parallel universe where saving a function call would yield those kinds
> > > of wins (or have that much cost).
> >
> > Ran this patch, and it's better than mainline for sure, but it does have
> > additional overhead that the previous version did not:
> >
> >                +1.46%  [kernel.vmlinux]  [k] fsnotify_path
> >
> 
> That is what I suspected would happen.
> 
> So at this time, we have patch v2 which looks like a clear win.
> It uses a slightly hackish SB_I_CONTENT_WATCHED to work around
> the fact that real_mount() is not accessible to the inlined call sites.

But won't it be better to move mnt_fsnotify_mask and perhaps
mnt_fsnotify_marks (to keep things in one place) into struct vfsmount in
that case? IMHO working around this visibility problem with extra flag (and
the code maintaining it) is not a great tradeoff...

As I wrote in my email to Jens I think your v3 patch just makes the real
cost of fsnotify checks visible in fsnotify_path() instead of being hidden
in the cost of read / write.

								Honza
Amir Goldstein Jan. 15, 2024, 7:42 p.m. UTC | #10
> > So at this time, we have patch v2 which looks like a clear win.
> > It uses a slightly hackish SB_I_CONTENT_WATCHED to work around
> > the fact that real_mount() is not accessible to the inlined call sites.
>
> But won't it be better to move mnt_fsnotify_mask and perhaps
> mnt_fsnotify_marks (to keep things in one place) into struct vfsmount in
> that case?

I am afraid to even bring this up to Al and Christian ;-)

> IMHO working around this visibility problem with extra flag (and
> the code maintaining it) is not a great tradeoff...

I agree.

>
> As I wrote in my email to Jens I think your v3 patch just makes the real
> cost of fsnotify checks visible in fsnotify_path() instead of being hidden
> in the cost of read / write.

There is a flaw in that argument -
Assuming that in Jens' test no parent/mnt/sb are watching,
all the reads/write on regular file would end up calling __fsnotify_parent()
in upstream code - there is nothing to optimize away this call.

But now that I am looking closer at __fsnotify_parent(), I see what you
were trying to say - if I move the checks from fsnotify_path() to the
beginning of __fsnotify_parent(), it should get most of the performance
from v3 patch, so this is all that should be needed is this simple patch
attached.

I cannot really understand why we did not do this sooner...

Jens, if you have time for another round please test.
My expectation would be to see something like

+1.46%  [kernel.vmlinux]  [k] __fsnotify_parent

Thanks,
Amir.

+
+               if (!(mask & marks_mask))
+                       return 0;
+       }
Jens Axboe Jan. 15, 2024, 8:48 p.m. UTC | #11
> Jens, if you have time for another round please test.
> My expectation would be to see something like
> 
> +1.46%  [kernel.vmlinux]  [k] __fsnotify_parent

Sure - so first I can current -git, which we know has that extra
fsnotify call as previously discussed. This is with SECURITY=y and
FANOTITY_ACCESS_PERMISSIONS=y for both.

Performance:

IOPS=65.01M, BW=31.74GiB/s, IOS/call=31/32
IOPS=65.42M, BW=31.94GiB/s, IOS/call=32/32
IOPS=65.02M, BW=31.75GiB/s, IOS/call=31/32
IOPS=65.37M, BW=31.92GiB/s, IOS/call=32/32
IOPS=65.44M, BW=31.95GiB/s, IOS/call=31/31
IOPS=64.91M, BW=31.70GiB/s, IOS/call=32/32
IOPS=64.25M, BW=31.37GiB/s, IOS/call=32/31

and profile:

+    4.51%  io_uring  [kernel.vmlinux]  [k] fsnotify
+    3.67%  io_uring  [kernel.vmlinux]  [k] __fsnotify_parent

With this patch applied, I see:

IOPS=73.09M, BW=35.69GiB/s, IOS/call=32/31
IOPS=72.94M, BW=35.61GiB/s, IOS/call=32/32
IOPS=72.95M, BW=35.62GiB/s, IOS/call=31/31
IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/32
IOPS=73.01M, BW=35.65GiB/s, IOS/call=32/31
IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32

and profile:

+    2.37%  io_uring  [kernel.vmlinux]  [k] __fsnotify_parent

and perf diff shows the following top items:

     3.67%     -1.30%  [kernel.vmlinux]  [k] __fsnotify_parent
     4.51%             [kernel.vmlinux]  [k] fsnotify

Let me know if this is what you wanted, or if the base should be with
that extra fsnotify removed.
diff mbox series

Patch

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d6944ff86ffa..d0e208913881 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -153,9 +153,30 @@  static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn,
 	return inode;
 }
 
+/*
+ * To avoid the performance penalty of rare case of sb/mount content event
+ * watchers in the hot io path, taint sb if such watchers are added.
+ */
+static void fsnotify_update_sb_watchers(struct fsnotify_mark_connector *conn,
+					u32 old_mask, u32 new_mask)
+{
+	struct super_block *sb = fsnotify_connector_sb(conn);
+	u32 new_watchers = new_mask & ~old_mask & FSNOTIFY_CONTENT_EVENTS;
+
+	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE || !sb || !new_watchers)
+		return;
+
+	/*
+	 * TODO: We need to take sb conn->lock to set FS_MNT_CONTENT_WATCHED
+	 * in sb->s_fsnotify_mask, but if this is a recalc of mount mark mask,
+	 * it is not sure that we have an sb connector attached yet.
+	 */
+	sb->s_iflags |= SB_I_CONTENT_WATCHED;
+}
+
 static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 {
-	u32 new_mask = 0;
+	u32 old_mask, new_mask = 0;
 	bool want_iref = false;
 	struct fsnotify_mark *mark;
 
@@ -163,6 +184,7 @@  static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	/* We can get detached connector here when inode is getting unlinked. */
 	if (!fsnotify_valid_obj_type(conn->type))
 		return NULL;
+	old_mask = fsnotify_conn_mask(conn);
 	hlist_for_each_entry(mark, &conn->list, obj_list) {
 		if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED))
 			continue;
@@ -173,6 +195,8 @@  static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	}
 	*fsnotify_conn_mask_p(conn) = new_mask;
 
+	fsnotify_update_sb_watchers(conn, old_mask, new_mask);
+
 	return fsnotify_update_iref(conn, want_iref);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e6ba0cc6f2ee..dac36fe139e1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1173,6 +1173,7 @@  extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */
 #define SB_I_RETIRED	0x00000800	/* superblock shouldn't be reused */
 #define SB_I_NOUMASK	0x00001000	/* VFS does not apply umask */
+#define SB_I_CONTENT_WATCHED 0x00002000 /* fsnotify file content monitor */
 
 /* Possible states of 'frozen' field */
 enum {
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 11e6434b8e71..67568ed4b64b 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -17,6 +17,12 @@ 
 #include <linux/slab.h>
 #include <linux/bug.h>
 
+/* Are there any inode/mount/sb objects that are being watched? */
+static inline bool fsnotify_sb_has_watchers(struct super_block *sb)
+{
+	return atomic_long_read(&sb->s_fsnotify_connectors);
+}
+
 /*
  * Notify this @dir inode about a change in a child directory entry.
  * The directory entry may have turned positive or negative or its inode may
@@ -30,7 +36,7 @@  static inline int fsnotify_name(__u32 mask, const void *data, int data_type,
 				struct inode *dir, const struct qstr *name,
 				u32 cookie)
 {
-	if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(dir->i_sb))
 		return 0;
 
 	return fsnotify(mask, data, data_type, dir, name, NULL, cookie);
@@ -44,7 +50,7 @@  static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 
 static inline void fsnotify_inode(struct inode *inode, __u32 mask)
 {
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(inode->i_sb))
 		return;
 
 	if (S_ISDIR(inode->i_mode))
@@ -59,9 +65,6 @@  static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
 {
 	struct inode *inode = d_inode(dentry);
 
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
-		return 0;
-
 	if (S_ISDIR(inode->i_mode)) {
 		mask |= FS_ISDIR;
 
@@ -86,20 +89,51 @@  static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
  */
 static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
+	if (!fsnotify_sb_has_watchers(dentry->d_sb))
+		return;
+
 	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
 }
 
-static inline int fsnotify_file(struct file *file, __u32 mask)
+static inline int fsnotify_path(const struct path *path, __u32 mask)
 {
-	const struct path *path;
+	struct dentry *dentry = path->dentry;
 
-	if (file->f_mode & FMODE_NONOTIFY)
+	if (!fsnotify_sb_has_watchers(dentry->d_sb))
 		return 0;
 
-	path = &file->f_path;
+	/* Optimize the likely case of sb/mount/parent not watching content */
+	if (mask & FSNOTIFY_CONTENT_EVENTS &&
+	    likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) &&
+	    likely(!(dentry->d_sb->s_iflags & SB_I_CONTENT_WATCHED))) {
+		/*
+		 * XXX: if SB_I_CONTENT_WATCHED is not set, checking for content
+		 * events in s_fsnotify_mask is redundant, but it will be needed
+		 * if we use the flag FS_MNT_CONTENT_WATCHED to indicate the
+		 * existence of only mount content event watchers.
+		 */
+		__u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
+				   dentry->d_sb->s_fsnotify_mask;
+
+		if (!(mask & marks_mask))
+			return 0;
+	}
+
+	/*
+	 * sb/mount/parent are being watched. Need to call fsnotify_parent()
+	 * to check the event masks of sb/mount/parent/inode marks.
+	 */
 	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
 }
 
+static inline int fsnotify_file(struct file *file, __u32 mask)
+{
+	if (file->f_mode & FMODE_NONOTIFY)
+		return 0;
+
+	return fsnotify_path(&file->f_path, mask);
+}
+
 /*
  * fsnotify_file_area_perm - permission hook before access to file range
  */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7f63be5ca0f1..54e297f9e03e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -66,6 +66,11 @@ 
 #define FS_RENAME		0x10000000	/* File was renamed */
 #define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
 #define FS_ISDIR		0x40000000	/* event occurred against dir */
+/*
+ * This flag is set in the object interest mask of sb to indicate that
+ * some mount mark is interested to get content events.
+ */
+#define FS_MNT_CONTENT_WATCHED	0x80000000
 
 #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
 
@@ -77,6 +82,13 @@ 
  */
 #define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
 
+/* Content events can be used to inspect file content */
+#define FSNOTIFY_CONTENT_PERM_EVENTS	(FS_ACCESS_PERM)
+
+/* Content events can be used to monitor file content */
+#define FSNOTIFY_CONTENT_EVENTS		(FS_ACCESS | FS_MODIFY | \
+					 FSNOTIFY_CONTENT_PERM_EVENTS)
+
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)