diff mbox series

[v8,02/19] fsnotify: opt-in for permission events at file open time

Message ID 5ea5f8e283d1edb55aa79c35187bfe344056af14.1731684329.git.josef@toxicpanda.com (mailing list archive)
State New
Headers show
Series fanotify: add pre-content hooks | expand

Commit Message

Josef Bacik Nov. 15, 2024, 3:30 p.m. UTC
From: Amir Goldstein <amir73il@gmail.com>

Legacy inotify/fanotify listeners can add watches for events on inode,
parent or mount and expect to get events (e.g. FS_MODIFY) on files that
were already open at the time of setting up the watches.

fanotify permission events are typically used by Anti-malware sofware,
that is watching the entire mount and it is not common to have more that
one Anti-malware engine installed on a system.

To reduce the overhead of the fsnotify_file_perm() hooks on every file
access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
events only if there were *any* permission event listeners on the
filesystem at the time that the file was opened.

The new semantic is implemented by extending the FMODE_NONOTIFY bit into
two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
events types to report.

This is going to apply to the new fanotify pre-content events in order
to reduce the cost of the new pre-content event vfs hooks.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/open.c                |  8 ++++-
 include/linux/fs.h       | 35 ++++++++++++++++---
 include/linux/fsnotify.h | 72 +++++++++++++++++++++++++++++++---------
 3 files changed, 93 insertions(+), 22 deletions(-)

Comments

Jan Kara Nov. 20, 2024, 3:53 p.m. UTC | #1
On Fri 15-11-24 10:30:15, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> Legacy inotify/fanotify listeners can add watches for events on inode,
> parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> were already open at the time of setting up the watches.
> 
> fanotify permission events are typically used by Anti-malware sofware,
> that is watching the entire mount and it is not common to have more that
> one Anti-malware engine installed on a system.
> 
> To reduce the overhead of the fsnotify_file_perm() hooks on every file
> access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> events only if there were *any* permission event listeners on the
> filesystem at the time that the file was opened.
> 
> The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> events types to report.
> 
> This is going to apply to the new fanotify pre-content events in order
> to reduce the cost of the new pre-content event vfs hooks.
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

FWIW I've ended up somewhat massaging this patch (see below).

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 23bd058576b1..8e5c783013d2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  
>  #define	FMODE_NOREUSE		((__force fmode_t)(1 << 23))
>  
> -/* FMODE_* bit 24 */
> -
>  /* File is embedded in backing_file object */
> -#define FMODE_BACKING		((__force fmode_t)(1 << 25))
> +#define FMODE_BACKING		((__force fmode_t)(1 << 24))
>  
> -/* File was opened by fanotify and shouldn't generate fanotify events */
> -#define FMODE_NONOTIFY		((__force fmode_t)(1 << 26))
> +/* File shouldn't generate fanotify pre-content events */
> +#define FMODE_NONOTIFY_HSM	((__force fmode_t)(1 << 25))
> +
> +/* File shouldn't generate fanotify permission events */
> +#define FMODE_NONOTIFY_PERM	((__force fmode_t)(1 << 26))

Firstly, I've kept FMODE_NONOTIFY to stay a single bit instead of two bit
constant. I've seen too many bugs caused by people expecting the constant
has a single bit set when it actually had more in my life. So I've ended up
with:

+/*
+ * Together with FMODE_NONOTIFY_PERM defines which fsnotify events shouldn't be
+ * generated (see below)
+ */
+#define FMODE_NONOTIFY         ((__force fmode_t)(1 << 25))
+ 
+/*
+ * Together with FMODE_NONOTIFY defines which fsnotify events shouldn't be
+ * generated (see below)
+ */
+#define FMODE_NONOTIFY_PERM    ((__force fmode_t)(1 << 26))

and

+/*
+ * The two FMODE_NONOTIFY* define which fsnotify events should not be generated
+ * for a file. These are the possible values of (f->f_mode &
+ * FMODE_FSNOTIFY_MASK) and their meaning:
+ *
+ * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
+ * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
+ * FMODE_NONOTIFY | FMODE_NONOTIFY_PERM - suppress only pre-content events.
+ */
+#define FMODE_FSNOTIFY_MASK \
+       (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)
+
+#define FMODE_FSNOTIFY_NONE(mode) \
+       ((mode & FMODE_FSNOTIFY_MASK) == FMODE_NONOTIFY)
+#define FMODE_FSNOTIFY_PERM(mode) \
+       (!(mode & FMODE_NONOTIFY_PERM))
+#define FMODE_FSNOTIFY_HSM(mode) \
+       ((mode & FMODE_FSNOTIFY_MASK) == 0)

Also I've moved file_set_fsnotify_mode() out of line into fsnotify.c. The
function gets quite big and the call is not IMO so expensive to warrant
inlining. Furthermore it saves exporting some fsnotify internals to modules
(in later patches).

								Honza
Amir Goldstein Nov. 20, 2024, 4:12 p.m. UTC | #2
On Wed, Nov 20, 2024 at 4:53 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 15-11-24 10:30:15, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > Legacy inotify/fanotify listeners can add watches for events on inode,
> > parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> > were already open at the time of setting up the watches.
> >
> > fanotify permission events are typically used by Anti-malware sofware,
> > that is watching the entire mount and it is not common to have more that
> > one Anti-malware engine installed on a system.
> >
> > To reduce the overhead of the fsnotify_file_perm() hooks on every file
> > access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> > events only if there were *any* permission event listeners on the
> > filesystem at the time that the file was opened.
> >
> > The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> > two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> > events types to report.
> >
> > This is going to apply to the new fanotify pre-content events in order
> > to reduce the cost of the new pre-content event vfs hooks.
> >
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@mail.gmail.com/
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> FWIW I've ended up somewhat massaging this patch (see below).
>
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 23bd058576b1..8e5c783013d2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> >
> >  #define      FMODE_NOREUSE           ((__force fmode_t)(1 << 23))
> >
> > -/* FMODE_* bit 24 */
> > -
> >  /* File is embedded in backing_file object */
> > -#define FMODE_BACKING                ((__force fmode_t)(1 << 25))
> > +#define FMODE_BACKING                ((__force fmode_t)(1 << 24))
> >
> > -/* File was opened by fanotify and shouldn't generate fanotify events */
> > -#define FMODE_NONOTIFY               ((__force fmode_t)(1 << 26))
> > +/* File shouldn't generate fanotify pre-content events */
> > +#define FMODE_NONOTIFY_HSM   ((__force fmode_t)(1 << 25))
> > +
> > +/* File shouldn't generate fanotify permission events */
> > +#define FMODE_NONOTIFY_PERM  ((__force fmode_t)(1 << 26))
>
> Firstly, I've kept FMODE_NONOTIFY to stay a single bit instead of two bit
> constant. I've seen too many bugs caused by people expecting the constant
> has a single bit set when it actually had more in my life. So I've ended up
> with:
>
> +/*
> + * Together with FMODE_NONOTIFY_PERM defines which fsnotify events shouldn't be
> + * generated (see below)
> + */
> +#define FMODE_NONOTIFY         ((__force fmode_t)(1 << 25))
> +
> +/*
> + * Together with FMODE_NONOTIFY defines which fsnotify events shouldn't be
> + * generated (see below)
> + */
> +#define FMODE_NONOTIFY_PERM    ((__force fmode_t)(1 << 26))
>
> and
>
> +/*
> + * The two FMODE_NONOTIFY* define which fsnotify events should not be generated
> + * for a file. These are the possible values of (f->f_mode &
> + * FMODE_FSNOTIFY_MASK) and their meaning:
> + *
> + * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
> + * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
> + * FMODE_NONOTIFY | FMODE_NONOTIFY_PERM - suppress only pre-content events.
> + */
> +#define FMODE_FSNOTIFY_MASK \
> +       (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)
> +
> +#define FMODE_FSNOTIFY_NONE(mode) \
> +       ((mode & FMODE_FSNOTIFY_MASK) == FMODE_NONOTIFY)
> +#define FMODE_FSNOTIFY_PERM(mode) \
> +       (!(mode & FMODE_NONOTIFY_PERM))

That looks incorrect -
It gives the wrong value for FMODE_NONOTIFY | FMODE_NONOTIFY_PERM

should be:
!= FMODE_NONOTIFY_PERM &&
!= FMODE_NONOTIFY

The simplicity of the single bit test is for permission events
is why I chose my model, but I understand your reasoning.

> +#define FMODE_FSNOTIFY_HSM(mode) \
> +       ((mode & FMODE_FSNOTIFY_MASK) == 0)
>
> Also I've moved file_set_fsnotify_mode() out of line into fsnotify.c. The
> function gets quite big and the call is not IMO so expensive to warrant
> inlining. Furthermore it saves exporting some fsnotify internals to modules
> (in later patches).

Sounds good.
Since you wanted to refrain from defining a two bit constant,
I wonder how you annotated for NONOTIFY_HSM case

   return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;


Thanks,
Amir.
Jan Kara Nov. 21, 2024, 9:39 a.m. UTC | #3
On Wed 20-11-24 17:12:21, Amir Goldstein wrote:
> On Wed, Nov 20, 2024 at 4:53 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 15-11-24 10:30:15, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > Legacy inotify/fanotify listeners can add watches for events on inode,
> > > parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> > > were already open at the time of setting up the watches.
> > >
> > > fanotify permission events are typically used by Anti-malware sofware,
> > > that is watching the entire mount and it is not common to have more that
> > > one Anti-malware engine installed on a system.
> > >
> > > To reduce the overhead of the fsnotify_file_perm() hooks on every file
> > > access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> > > events only if there were *any* permission event listeners on the
> > > filesystem at the time that the file was opened.
> > >
> > > The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> > > two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> > > events types to report.
> > >
> > > This is going to apply to the new fanotify pre-content events in order
> > > to reduce the cost of the new pre-content event vfs hooks.
> > >
> > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@mail.gmail.com/
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > FWIW I've ended up somewhat massaging this patch (see below).
> >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 23bd058576b1..8e5c783013d2 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> > >
> > >  #define      FMODE_NOREUSE           ((__force fmode_t)(1 << 23))
> > >
> > > -/* FMODE_* bit 24 */
> > > -
> > >  /* File is embedded in backing_file object */
> > > -#define FMODE_BACKING                ((__force fmode_t)(1 << 25))
> > > +#define FMODE_BACKING                ((__force fmode_t)(1 << 24))
> > >
> > > -/* File was opened by fanotify and shouldn't generate fanotify events */
> > > -#define FMODE_NONOTIFY               ((__force fmode_t)(1 << 26))
> > > +/* File shouldn't generate fanotify pre-content events */
> > > +#define FMODE_NONOTIFY_HSM   ((__force fmode_t)(1 << 25))
> > > +
> > > +/* File shouldn't generate fanotify permission events */
> > > +#define FMODE_NONOTIFY_PERM  ((__force fmode_t)(1 << 26))
> >
> > Firstly, I've kept FMODE_NONOTIFY to stay a single bit instead of two bit
> > constant. I've seen too many bugs caused by people expecting the constant
> > has a single bit set when it actually had more in my life. So I've ended up
> > with:
> >
> > +/*
> > + * Together with FMODE_NONOTIFY_PERM defines which fsnotify events shouldn't be
> > + * generated (see below)
> > + */
> > +#define FMODE_NONOTIFY         ((__force fmode_t)(1 << 25))
> > +
> > +/*
> > + * Together with FMODE_NONOTIFY defines which fsnotify events shouldn't be
> > + * generated (see below)
> > + */
> > +#define FMODE_NONOTIFY_PERM    ((__force fmode_t)(1 << 26))
> >
> > and
> >
> > +/*
> > + * The two FMODE_NONOTIFY* define which fsnotify events should not be generated
> > + * for a file. These are the possible values of (f->f_mode &
> > + * FMODE_FSNOTIFY_MASK) and their meaning:
> > + *
> > + * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
> > + * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
> > + * FMODE_NONOTIFY | FMODE_NONOTIFY_PERM - suppress only pre-content events.
> > + */
> > +#define FMODE_FSNOTIFY_MASK \
> > +       (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)
> > +
> > +#define FMODE_FSNOTIFY_NONE(mode) \
> > +       ((mode & FMODE_FSNOTIFY_MASK) == FMODE_NONOTIFY)
> > +#define FMODE_FSNOTIFY_PERM(mode) \
> > +       (!(mode & FMODE_NONOTIFY_PERM))
> 
> That looks incorrect -
> It gives the wrong value for FMODE_NONOTIFY | FMODE_NONOTIFY_PERM
> 
> should be:
> != FMODE_NONOTIFY_PERM &&
> != FMODE_NONOTIFY
> 
> The simplicity of the single bit test is for permission events
> is why I chose my model, but I understand your reasoning.

Ah, thanks for catching this! I've fixed this to:

+#define FMODE_FSNOTIFY_PERM(mode) \
+       ((mode & FMODE_FSNOTIFY_MASK) == 0 || \
+        (mode & FMODE_FSNOTIFY_MASK) == (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM))

It is not a single bit test so it ends up being:

   0x0000000060180345 <+101>:	mov    0x20(%r12),%edx
   0x000000006018034a <+106>:	and    $0x6000000,%edx
   0x0000000060180350 <+112>:	je     0x6018035a <rw_verify_area+122>
   0x0000000060180352 <+114>:	cmp    $0x6000000,%edx
   0x0000000060180358 <+120>:	jne    0x6018032e <rw_verify_area+78>

But I guess that's not terrible either.

> > +#define FMODE_FSNOTIFY_HSM(mode) \
> > +       ((mode & FMODE_FSNOTIFY_MASK) == 0)
> >
> > Also I've moved file_set_fsnotify_mode() out of line into fsnotify.c. The
> > function gets quite big and the call is not IMO so expensive to warrant
> > inlining. Furthermore it saves exporting some fsnotify internals to modules
> > (in later patches).
> 
> Sounds good.
> Since you wanted to refrain from defining a two bit constant,
> I wonder how you annotated for NONOTIFY_HSM case
> 
>    return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;

I'm not sure I understand. What do you mean by "annotated"?

It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
two-bit constant and a good one. But the name clearly suggests it is not a
single bit constant. When you have all FMODE_FOO and FMODE_BAR things
single bit except for FMODE_BAZ which is multi-bit, then this is IMHO a
recipe for problems and I rather prefer explicitely spelling the
combination out as FMODE_NONOTIFY | FMODE_NONOTIFY_PERM in the few places
that need this instead of hiding it behind some other name.

								Honza
Christian Brauner Nov. 21, 2024, 9:45 a.m. UTC | #4
On Wed, Nov 20, 2024 at 04:53:09PM +0100, Jan Kara wrote:
> On Fri 15-11-24 10:30:15, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> > 
> > Legacy inotify/fanotify listeners can add watches for events on inode,
> > parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> > were already open at the time of setting up the watches.
> > 
> > fanotify permission events are typically used by Anti-malware sofware,
> > that is watching the entire mount and it is not common to have more that
> > one Anti-malware engine installed on a system.
> > 
> > To reduce the overhead of the fsnotify_file_perm() hooks on every file
> > access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> > events only if there were *any* permission event listeners on the
> > filesystem at the time that the file was opened.
> > 
> > The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> > two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> > events types to report.
> > 
> > This is going to apply to the new fanotify pre-content events in order
> > to reduce the cost of the new pre-content event vfs hooks.
> > 
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@mail.gmail.com/
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> 
> FWIW I've ended up somewhat massaging this patch (see below).
> 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 23bd058576b1..8e5c783013d2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> >  
> >  #define	FMODE_NOREUSE		((__force fmode_t)(1 << 23))
> >  
> > -/* FMODE_* bit 24 */
> > -
> >  /* File is embedded in backing_file object */
> > -#define FMODE_BACKING		((__force fmode_t)(1 << 25))
> > +#define FMODE_BACKING		((__force fmode_t)(1 << 24))
> >  
> > -/* File was opened by fanotify and shouldn't generate fanotify events */
> > -#define FMODE_NONOTIFY		((__force fmode_t)(1 << 26))
> > +/* File shouldn't generate fanotify pre-content events */
> > +#define FMODE_NONOTIFY_HSM	((__force fmode_t)(1 << 25))
> > +
> > +/* File shouldn't generate fanotify permission events */
> > +#define FMODE_NONOTIFY_PERM	((__force fmode_t)(1 << 26))
> 
> Firstly, I've kept FMODE_NONOTIFY to stay a single bit instead of two bit
> constant. I've seen too many bugs caused by people expecting the constant
> has a single bit set when it actually had more in my life. So I've ended up
> with:
> 
> +/*
> + * Together with FMODE_NONOTIFY_PERM defines which fsnotify events shouldn't be
> + * generated (see below)
> + */
> +#define FMODE_NONOTIFY         ((__force fmode_t)(1 << 25))
> + 
> +/*
> + * Together with FMODE_NONOTIFY defines which fsnotify events shouldn't be
> + * generated (see below)
> + */
> +#define FMODE_NONOTIFY_PERM    ((__force fmode_t)(1 << 26))
> 
> and
> 
> +/*
> + * The two FMODE_NONOTIFY* define which fsnotify events should not be generated
> + * for a file. These are the possible values of (f->f_mode &
> + * FMODE_FSNOTIFY_MASK) and their meaning:
> + *
> + * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
> + * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
> + * FMODE_NONOTIFY | FMODE_NONOTIFY_PERM - suppress only pre-content events.
> + */
> +#define FMODE_FSNOTIFY_MASK \
> +       (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)

This is fine by me. But I want to preemptively caution to please not
spread the disease of further defines based on such multi-bit defines
like fanotify does. I'm specifically worried about stuff like:

#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
                                  FS_OPEN_EXEC_PERM)

#define FS_EVENTS_POSS_ON_CHILD   (ALL_FSNOTIFY_PERM_EVENTS | \
                                   FS_ACCESS | FS_MODIFY | FS_ATTRIB | \
                                   FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | \
                                   FS_OPEN | FS_OPEN_EXEC)
Christian Brauner Nov. 21, 2024, 10:09 a.m. UTC | #5
> It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
> two-bit constant and a good one. But the name clearly suggests it is not a
> single bit constant. When you have all FMODE_FOO and FMODE_BAR things
> single bit except for FMODE_BAZ which is multi-bit, then this is IMHO a
> recipe for problems and I rather prefer explicitely spelling the
> combination out as FMODE_NONOTIFY | FMODE_NONOTIFY_PERM in the few places
> that need this instead of hiding it behind some other name.

Very much agreed!
Amir Goldstein Nov. 21, 2024, 11:04 a.m. UTC | #6
On Thu, Nov 21, 2024 at 11:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
> > two-bit constant and a good one. But the name clearly suggests it is not a
> > single bit constant. When you have all FMODE_FOO and FMODE_BAR things
> > single bit except for FMODE_BAZ which is multi-bit, then this is IMHO a
> > recipe for problems and I rather prefer explicitely spelling the
> > combination out as FMODE_NONOTIFY | FMODE_NONOTIFY_PERM in the few places
> > that need this instead of hiding it behind some other name.
>
> Very much agreed!

Yes, I agree as well.
What I meant is that the code that does
    return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;

is going to be unclear to the future code reviewer unless there is
a comment above explaining that this is a special flag combination
to specify "suppress only pre-content events".

Thanks,
Amir.
Jan Kara Nov. 21, 2024, 11:16 a.m. UTC | #7
On Thu 21-11-24 12:04:23, Amir Goldstein wrote:
> On Thu, Nov 21, 2024 at 11:09 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
> > > two-bit constant and a good one. But the name clearly suggests it is not a
> > > single bit constant. When you have all FMODE_FOO and FMODE_BAR things
> > > single bit except for FMODE_BAZ which is multi-bit, then this is IMHO a
> > > recipe for problems and I rather prefer explicitely spelling the
> > > combination out as FMODE_NONOTIFY | FMODE_NONOTIFY_PERM in the few places
> > > that need this instead of hiding it behind some other name.
> >
> > Very much agreed!
> 
> Yes, I agree as well.
> What I meant is that the code that does
>     return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;
> 
> is going to be unclear to the future code reviewer unless there is
> a comment above explaining that this is a special flag combination
> to specify "suppress only pre-content events".

So this combination is used in file_set_fsnotify_mode() only (three
occurences) and there I have:

        /*
         * If there are permission event watchers but no pre-content event
         * watchers, set FMODE_NONOTIFY | FMODE_NONOTIFY_PERM to indicate that.
         */

at the first occurence. So hopefully that's enough of an explanation.

								Honza
Amir Goldstein Nov. 21, 2024, 11:32 a.m. UTC | #8
On Thu, Nov 21, 2024 at 12:16 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 21-11-24 12:04:23, Amir Goldstein wrote:
> > On Thu, Nov 21, 2024 at 11:09 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > > It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
> > > > two-bit constant and a good one. But the name clearly suggests it is not a
> > > > single bit constant. When you have all FMODE_FOO and FMODE_BAR things
> > > > single bit except for FMODE_BAZ which is multi-bit, then this is IMHO a
> > > > recipe for problems and I rather prefer explicitely spelling the
> > > > combination out as FMODE_NONOTIFY | FMODE_NONOTIFY_PERM in the few places
> > > > that need this instead of hiding it behind some other name.
> > >
> > > Very much agreed!
> >
> > Yes, I agree as well.
> > What I meant is that the code that does
> >     return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;
> >
> > is going to be unclear to the future code reviewer unless there is
> > a comment above explaining that this is a special flag combination
> > to specify "suppress only pre-content events".
>
> So this combination is used in file_set_fsnotify_mode() only (three
> occurences) and there I have:
>
>         /*
>          * If there are permission event watchers but no pre-content event
>          * watchers, set FMODE_NONOTIFY | FMODE_NONOTIFY_PERM to indicate that.
>          */
>
> at the first occurence. So hopefully that's enough of an explanation.
>

Yes, that's the comment that I did not see, but assumed it was there ;)
which I wrongly expressed as "I wonder how you annotated".

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/open.c b/fs/open.c
index c3490286092e..1a9483872e1f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -901,7 +901,7 @@  static int do_dentry_open(struct file *f,
 	f->f_sb_err = file_sample_sb_err(f);
 
 	if (unlikely(f->f_flags & O_PATH)) {
-		f->f_mode = FMODE_PATH | FMODE_OPENED;
+		f->f_mode = FMODE_PATH | FMODE_OPENED | FMODE_NONOTIFY;
 		f->f_op = &empty_fops;
 		return 0;
 	}
@@ -929,6 +929,12 @@  static int do_dentry_open(struct file *f,
 	if (error)
 		goto cleanup_all;
 
+	/*
+	 * Set FMODE_NONOTIFY_* bits according to existing permission watches.
+	 * If FMODE_NONOTIFY was already set for an fanotify fd, this doesn't
+	 * change anything.
+	 */
+	file_set_fsnotify_mode(f);
 	error = fsnotify_open_perm(f);
 	if (error)
 		goto cleanup_all;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 23bd058576b1..8e5c783013d2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -173,13 +173,14 @@  typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 
 #define	FMODE_NOREUSE		((__force fmode_t)(1 << 23))
 
-/* FMODE_* bit 24 */
-
 /* File is embedded in backing_file object */
-#define FMODE_BACKING		((__force fmode_t)(1 << 25))
+#define FMODE_BACKING		((__force fmode_t)(1 << 24))
 
-/* File was opened by fanotify and shouldn't generate fanotify events */
-#define FMODE_NONOTIFY		((__force fmode_t)(1 << 26))
+/* File shouldn't generate fanotify pre-content events */
+#define FMODE_NONOTIFY_HSM	((__force fmode_t)(1 << 25))
+
+/* File shouldn't generate fanotify permission events */
+#define FMODE_NONOTIFY_PERM	((__force fmode_t)(1 << 26))
 
 /* File is capable of returning -EAGAIN if I/O will block */
 #define FMODE_NOWAIT		((__force fmode_t)(1 << 27))
@@ -190,6 +191,30 @@  typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File does not contribute to nr_files count */
 #define FMODE_NOACCOUNT		((__force fmode_t)(1 << 29))
 
+/*
+ * The two FMODE_NONOTIFY_ bits used together have a special meaning of
+ * not reporting any events at all including non-permission events.
+ * These are the possible values of FMODE_FSNOTIFY(f->f_mode) and their meaning:
+ *
+ * FMODE_NONOTIFY_HSM - suppress only pre-content events.
+ * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
+ * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
+ */
+#define FMODE_FSNOTIFY_MASK \
+	(FMODE_NONOTIFY_HSM | FMODE_NONOTIFY_PERM)
+#define FMODE_NONOTIFY FMODE_FSNOTIFY_MASK
+#define FMODE_FSNOTIFY(mode) \
+	((mode) & FMODE_FSNOTIFY_MASK)
+
+#define FMODE_FSNOTIFY_NONE(mode) \
+	(FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY)
+#define FMODE_FSNOTIFY_NORMAL(mode) \
+	(FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY_PERM)
+#define FMODE_FSNOTIFY_PERM(mode) \
+	(!((mode) & FMODE_NONOTIFY_PERM))
+#define FMODE_FSNOTIFY_HSM(mode) \
+	(FMODE_FSNOTIFY(mode) == 0)
+
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 278620e063ab..54ec97366d7c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -108,38 +108,68 @@  static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
 }
 
+static inline int fsnotify_path(const struct path *path, __u32 mask)
+{
+	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
+}
+
 static inline int fsnotify_file(struct file *file, __u32 mask)
 {
-	const struct path *path;
-
 	/*
 	 * FMODE_NONOTIFY are fds generated by fanotify itself which should not
 	 * generate new events. We also don't want to generate events for
 	 * FMODE_PATH fds (involves open & close events) as they are just
 	 * handle creation / destruction events and not "real" file events.
 	 */
-	if (file->f_mode & (FMODE_NONOTIFY | FMODE_PATH))
+	if (FMODE_FSNOTIFY_NONE(file->f_mode))
 		return 0;
 
-	path = &file->f_path;
-	/* Permission events require group prio >= FSNOTIFY_PRIO_CONTENT */
-	if (mask & ALL_FSNOTIFY_PERM_EVENTS &&
-	    !fsnotify_sb_has_priority_watchers(path->dentry->d_sb,
-					       FSNOTIFY_PRIO_CONTENT))
-		return 0;
-
-	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
+	return fsnotify_path(&file->f_path, mask);
 }
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+/*
+ * At open time we check fsnotify_sb_has_priority_watchers() and set the
+ * FMODE_NONOTIFY_ mode bits accordignly.
+ * Later, fsnotify permission hooks do not check if there are permission event
+ * watches, but that there were permission event watches at open time.
+ */
+static void file_set_fsnotify_mode(struct file *file)
+{
+	struct super_block *sb = file->f_path.dentry->d_sb;
+
+	/* Is it a file opened by fanotify? */
+	if (FMODE_FSNOTIFY_NONE(file->f_mode))
+		return;
+
+	/*
+	 * Permission events is a super set of pre-content events, so if there
+	 * are no permission event watchers, there are also no pre-content event
+	 * watchers and this is implied from the single FMODE_NONOTIFY_PERM bit.
+	 */
+	if (likely(!fsnotify_sb_has_priority_watchers(sb,
+						FSNOTIFY_PRIO_CONTENT))) {
+		file->f_mode |= FMODE_NONOTIFY_PERM;
+		return;
+	}
+
+	/*
+	 * FMODE_NONOTIFY_HSM bit means there are permission event watchers, but
+	 * no pre-content event watchers.
+	 */
+	if (likely(!fsnotify_sb_has_priority_watchers(sb,
+						FSNOTIFY_PRIO_PRE_CONTENT))) {
+		file->f_mode |= FMODE_NONOTIFY_HSM;
+		return;
+	}
+}
+
 /*
  * fsnotify_file_area_perm - permission hook before access to file range
  */
 static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 					  const loff_t *ppos, size_t count)
 {
-	__u32 fsnotify_mask = FS_ACCESS_PERM;
-
 	/*
 	 * filesystem may be modified in the context of permission events
 	 * (e.g. by HSM filling a file on access), so sb freeze protection
@@ -150,7 +180,10 @@  static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	if (!(perm_mask & MAY_READ))
 		return 0;
 
-	return fsnotify_file(file, fsnotify_mask);
+	if (likely(file->f_mode & FMODE_NONOTIFY_PERM))
+		return 0;
+
+	return fsnotify_path(&file->f_path, FS_ACCESS_PERM);
 }
 
 /*
@@ -168,16 +201,23 @@  static inline int fsnotify_open_perm(struct file *file)
 {
 	int ret;
 
+	if (likely(!FMODE_FSNOTIFY_PERM(file->f_mode)))
+		return 0;
+
 	if (file->f_flags & __FMODE_EXEC) {
-		ret = fsnotify_file(file, FS_OPEN_EXEC_PERM);
+		ret = fsnotify_path(&file->f_path, FS_OPEN_EXEC_PERM);
 		if (ret)
 			return ret;
 	}
 
-	return fsnotify_file(file, FS_OPEN_PERM);
+	return fsnotify_path(&file->f_path, FS_OPEN_PERM);
 }
 
 #else
+static inline void file_set_fsnotify_mode(struct file *file)
+{
+}
+
 static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 					  const loff_t *ppos, size_t count)
 {