diff mbox series

[RFC] lsm: add the inode_free_security_rcu() LSM implementation hook

Message ID 20240710024029.669314-2-paul@paul-moore.com (mailing list archive)
State RFC
Headers show
Series [RFC] lsm: add the inode_free_security_rcu() LSM implementation hook | expand

Commit Message

Paul Moore July 10, 2024, 2:40 a.m. UTC
The LSM framework has an existing inode_free_security() hook which
is used by LSMs that manage state associated with an inode, but
due to the use of RCU to protect the inode, special care must be
taken to ensure that the LSMs do not fully release the inode state
until it is safe from a RCU perspective.

This patch implements a new inode_free_security_rcu() implementation
hook which is called when it is safe to free the LSM's internal inode
state.  Unfortunately, this new hook does not have access to the inode
itself as it may already be released, so the existing
inode_free_security() hook is retained for those LSMs which require
access to the inode.

Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 include/linux/lsm_hook_defs.h     |  1 +
 security/integrity/ima/ima.h      |  2 +-
 security/integrity/ima/ima_iint.c | 20 ++++++++------------
 security/integrity/ima/ima_main.c |  2 +-
 security/landlock/fs.c            |  9 ++++++---
 security/security.c               | 26 +++++++++++++-------------
 6 files changed, 30 insertions(+), 30 deletions(-)

Comments

Paul Moore July 10, 2024, 2:47 a.m. UTC | #1
On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote:
>
> The LSM framework has an existing inode_free_security() hook which
> is used by LSMs that manage state associated with an inode, but
> due to the use of RCU to protect the inode, special care must be
> taken to ensure that the LSMs do not fully release the inode state
> until it is safe from a RCU perspective.
>
> This patch implements a new inode_free_security_rcu() implementation
> hook which is called when it is safe to free the LSM's internal inode
> state.  Unfortunately, this new hook does not have access to the inode
> itself as it may already be released, so the existing
> inode_free_security() hook is retained for those LSMs which require
> access to the inode.
>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  include/linux/lsm_hook_defs.h     |  1 +
>  security/integrity/ima/ima.h      |  2 +-
>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>  security/integrity/ima/ima_main.c |  2 +-
>  security/landlock/fs.c            |  9 ++++++---
>  security/security.c               | 26 +++++++++++++-------------
>  6 files changed, 30 insertions(+), 30 deletions(-)

FYI, this has only received "light" testing, and even that is fairly
generous.  I booted up a system with IMA set to measure the TCB and
ran through the audit and SELinux test suites; IMA seemed to be
working just fine but I didn't poke at it too hard.  I didn't have an
explicit Landlock test handy, but I'm hoping that the Landlock
enablement on a modern Rawhide system hit it a little :)
Mickaël Salaün July 10, 2024, 10:40 a.m. UTC | #2
On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> The LSM framework has an existing inode_free_security() hook which
> is used by LSMs that manage state associated with an inode, but
> due to the use of RCU to protect the inode, special care must be
> taken to ensure that the LSMs do not fully release the inode state
> until it is safe from a RCU perspective.
> 
> This patch implements a new inode_free_security_rcu() implementation
> hook which is called when it is safe to free the LSM's internal inode
> state.  Unfortunately, this new hook does not have access to the inode
> itself as it may already be released, so the existing
> inode_free_security() hook is retained for those LSMs which require
> access to the inode.
> 
> Signed-off-by: Paul Moore <paul@paul-moore.com>

I like this new hook.  It is definitely safer than the current approach.

To make it more consistent, I think we should also rename
security_inode_free() to security_inode_put() to highlight the fact that
LSM implementations should not free potential pointers in this blob
because they could still be dereferenced in a path walk.

> ---
>  include/linux/lsm_hook_defs.h     |  1 +
>  security/integrity/ima/ima.h      |  2 +-
>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>  security/integrity/ima/ima_main.c |  2 +-
>  security/landlock/fs.c            |  9 ++++++---
>  security/security.c               | 26 +++++++++++++-------------
>  6 files changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 8fd87f823d3a..abe6b0ef892a 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask,
>  	 unsigned int obj_type)
>  LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
>  LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
> +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *)
>  LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode,
>  	 struct inode *dir, const struct qstr *qstr, struct xattr *xattrs,
>  	 int *xattr_count)
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 3e568126cd48..e2a2e4c7eab6 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode,
>  
>  struct ima_iint_cache *ima_iint_find(struct inode *inode);
>  struct ima_iint_cache *ima_inode_get(struct inode *inode);
> -void ima_inode_free(struct inode *inode);
> +void ima_inode_free_rcu(void *inode_sec);
>  void __init ima_iintcache_init(void);
>  
>  extern const int read_idmap[];
> diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c
> index e23412a2c56b..54480df90bdc 100644
> --- a/security/integrity/ima/ima_iint.c
> +++ b/security/integrity/ima/ima_iint.c
> @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode)
>  }
>  
>  /**
> - * ima_inode_free - Called on inode free
> - * @inode: Pointer to the inode
> + * ima_inode_free_rcu - Called to free an inode via a RCU callback
> + * @inode_sec: The inode::i_security pointer
>   *
> - * Free the iint associated with an inode.
> + * Free the IMA data associated with an inode.
>   */
> -void ima_inode_free(struct inode *inode)
> +void ima_inode_free_rcu(void *inode_sec)
>  {
> -	struct ima_iint_cache *iint;
> +	struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode;
>  
> -	if (!IS_IMA(inode))
> -		return;
> -
> -	iint = ima_iint_find(inode);
> -	ima_inode_set_iint(inode, NULL);
> -
> -	ima_iint_free(iint);
> +	/* *iint_p should be NULL if !IS_IMA(inode) */
> +	if (*iint_p)
> +		ima_iint_free(*iint_p);
>  }
>  
>  static void ima_iint_init_once(void *foo)
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index f04f43af651c..5b3394864b21 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = {
>  #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
>  	LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request),
>  #endif
> -	LSM_HOOK_INIT(inode_free_security, ima_inode_free),
> +	LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu),
>  };
>  
>  static const struct lsm_id ima_lsmid = {
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 22d8b7c28074..f583f8cec345 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
>  
>  /* Inode hooks */
>  
> -static void hook_inode_free_security(struct inode *const inode)
> +static void hook_inode_free_security_rcu(void *inode_sec)
>  {
> +	struct landlock_inode_security *lisec;

Please rename "lisec" to "inode_sec" for consistency with
get_inode_object()'s variables.

> +
>  	/*
>  	 * All inodes must already have been untied from their object by
>  	 * release_inode() or hook_sb_delete().
>  	 */
> -	WARN_ON_ONCE(landlock_inode(inode)->object);
> +	lisec = inode_sec + landlock_blob_sizes.lbs_inode;
> +	WARN_ON_ONCE(lisec->object);
>  }

This looks good to me.

We can add these footers:
Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com

However, I'm wondering if we could backport this patch down to v5.15 .
I guess not, so I'll need to remove this hook implementation for
Landlock, backport it to v5.15, and then you'll need to re-add this
check with this patch.  At least it has been useful to spot this inode
issue, but it could still be useful to spot potential memory leaks with
a negligible performance impact.


>  
>  /* Super-block hooks */
> @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
>  }
>  
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
> -	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
> +	LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
>  
>  	LSM_HOOK_INIT(sb_delete, hook_sb_delete),
>  	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
> diff --git a/security/security.c b/security/security.c
> index b52e81ac5526..bc6805f7332e 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
>  
>  static void inode_free_by_rcu(struct rcu_head *head)
>  {
> -	/*
> -	 * The rcu head is at the start of the inode blob
> -	 */
> +	/* The rcu head is at the start of the inode blob */
> +	call_void_hook(inode_free_security_rcu, head);

For this to work, we need to extend the inode blob size (lbs_inode) with
sizeof(struct rcu_head).  The current implementation override the
content of the blob with a new rcu_head.

>  	kmem_cache_free(lsm_inode_cache, head);
>  }
>  
> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
>   * security_inode_free() - Free an inode's LSM blob
>   * @inode: the inode
>   *
> - * Deallocate the inode security structure and set @inode->i_security to NULL.
> + * Release any LSM resources associated with @inode, although due to the
> + * inode's RCU protections it is possible that the resources will not be
> + * fully released until after the current RCU grace period has elapsed.
> + *
> + * It is important for LSMs to note that despite being present in a call to
> + * security_inode_free(), @inode may still be referenced in a VFS path walk
> + * and calls to security_inode_permission() may be made during, or after,
> + * a call to security_inode_free().  For this reason the inode->i_security
> + * field is released via a call_rcu() callback and any LSMs which need to
> + * retain inode state for use in security_inode_permission() should only
> + * release that state in the inode_free_security_rcu() LSM hook callback.
>   */
>  void security_inode_free(struct inode *inode)
>  {
>  	call_void_hook(inode_free_security, inode);
> -	/*
> -	 * The inode may still be referenced in a path walk and
> -	 * a call to security_inode_permission() can be made
> -	 * after inode_free_security() is called. Ideally, the VFS
> -	 * wouldn't do this, but fixing that is a much harder
> -	 * job. For now, simply free the i_security via RCU, and
> -	 * leave the current inode->i_security pointer intact.
> -	 * The inode will be freed after the RCU grace period too.
> -	 */
>  	if (inode->i_security)
>  		call_rcu((struct rcu_head *)inode->i_security,
>  			 inode_free_by_rcu);

We should have something like:
call_rcu(inode->i_security.rcu, inode_free_by_rcu);

> -- 
> 2.45.2
> 
>
Mickaël Salaün July 10, 2024, 12:02 p.m. UTC | #3
On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote:
> On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > The LSM framework has an existing inode_free_security() hook which
> > is used by LSMs that manage state associated with an inode, but
> > due to the use of RCU to protect the inode, special care must be
> > taken to ensure that the LSMs do not fully release the inode state
> > until it is safe from a RCU perspective.
> >
> > This patch implements a new inode_free_security_rcu() implementation
> > hook which is called when it is safe to free the LSM's internal inode
> > state.  Unfortunately, this new hook does not have access to the inode
> > itself as it may already be released, so the existing
> > inode_free_security() hook is retained for those LSMs which require
> > access to the inode.
> >
> > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > ---
> >  include/linux/lsm_hook_defs.h     |  1 +
> >  security/integrity/ima/ima.h      |  2 +-
> >  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> >  security/integrity/ima/ima_main.c |  2 +-
> >  security/landlock/fs.c            |  9 ++++++---
> >  security/security.c               | 26 +++++++++++++-------------
> >  6 files changed, 30 insertions(+), 30 deletions(-)
> 
> FYI, this has only received "light" testing, and even that is fairly
> generous.  I booted up a system with IMA set to measure the TCB and
> ran through the audit and SELinux test suites; IMA seemed to be
> working just fine but I didn't poke at it too hard.  I didn't have an
> explicit Landlock test handy, but I'm hoping that the Landlock
> enablement on a modern Rawhide system hit it a little :)

If you want to test Landlock, you can do so like this:

cd tools/testing/selftests/landlock
make -C ../../../.. headers_install
make
for f in *_test; ./$f; done

...or you can build and run everything (on UML) with
`./check-linux build kselftest' provided here:
https://github.com/landlock-lsm/landlock-test-tools

...or, even simpler, you can run all checks by running
`./docker-run.sh debian/sid` for instance.

I need to update the kernel doc with these commands.

> 
> -- 
> paul-moore.com
>
Paul Moore July 10, 2024, 4:20 p.m. UTC | #4
On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> > The LSM framework has an existing inode_free_security() hook which
> > is used by LSMs that manage state associated with an inode, but
> > due to the use of RCU to protect the inode, special care must be
> > taken to ensure that the LSMs do not fully release the inode state
> > until it is safe from a RCU perspective.
> >
> > This patch implements a new inode_free_security_rcu() implementation
> > hook which is called when it is safe to free the LSM's internal inode
> > state.  Unfortunately, this new hook does not have access to the inode
> > itself as it may already be released, so the existing
> > inode_free_security() hook is retained for those LSMs which require
> > access to the inode.
> >
> > Signed-off-by: Paul Moore <paul@paul-moore.com>
>
> I like this new hook.  It is definitely safer than the current approach.

It would be nicer to simply have a single inode free hook, but it
doesn't look like that is a possibility due to the inode
synchronization methods and differing lifetimes of inodes and their
associated LSM state.  At least the workaround isn't too ugly :)

> To make it more consistent, I think we should also rename
> security_inode_free() to security_inode_put() to highlight the fact that
> LSM implementations should not free potential pointers in this blob
> because they could still be dereferenced in a path walk.

I'd prefer to keep the naming as-is in this patch since
security_inode_free() does trigger the free'ing/release of the LSM
state associated with the inode.

> > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > index 22d8b7c28074..f583f8cec345 100644
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
> >
> >  /* Inode hooks */
> >
> > -static void hook_inode_free_security(struct inode *const inode)
> > +static void hook_inode_free_security_rcu(void *inode_sec)
> >  {
> > +     struct landlock_inode_security *lisec;
>
> Please rename "lisec" to "inode_sec" for consistency with
> get_inode_object()'s variables.

Done.  That did conflict with the parameter name so I renamed the
parameter everywhere to "inode_security".

> >       /*
> >        * All inodes must already have been untied from their object by
> >        * release_inode() or hook_sb_delete().
> >        */
> > -     WARN_ON_ONCE(landlock_inode(inode)->object);
> > +     lisec = inode_sec + landlock_blob_sizes.lbs_inode;
> > +     WARN_ON_ONCE(lisec->object);
> >  }
>
> This looks good to me.
>
> We can add these footers:
> Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com

Thanks, metadata added.  This was a quick RFC patch so I didn't want
to spend a lot of time chasing down metadata refs until I knew there
was some basic support for this approach.  I still want to make sure
it is okay with the IMA folks.

> However, I'm wondering if we could backport this patch down to v5.15 .
> I guess not, so I'll need to remove this hook implementation for
> Landlock, backport it to v5.15, and then you'll need to re-add this
> check with this patch.  At least it has been useful to spot this inode
> issue, but it could still be useful to spot potential memory leaks with
> a negligible performance impact.

Yes, it's a bit complicated with the IMA/EVM promotion happening
fairly recently.  I'm marking the patch with a stable tag, but
considering we're at -rc7 and this needs at least one more respin,
testing, ACKs, etc. it might not land in Linus' tree until sometime
post v6.11-rc1.  Considering that, I might suggest dropping the
Landlock hook in -stable and re-adding it to Linus' tree once this fix
lands, but that decision is up to you.

> > diff --git a/security/security.c b/security/security.c
> > index b52e81ac5526..bc6805f7332e 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
> >
> >  static void inode_free_by_rcu(struct rcu_head *head)
> >  {
> > -     /*
> > -      * The rcu head is at the start of the inode blob
> > -      */
> > +     /* The rcu head is at the start of the inode blob */
> > +     call_void_hook(inode_free_security_rcu, head);
>
> For this to work, we need to extend the inode blob size (lbs_inode) with
> sizeof(struct rcu_head).  The current implementation override the
> content of the blob with a new rcu_head.

Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head
struct is already accounted for in the inode->i_security blob.

> > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
> >   * security_inode_free() - Free an inode's LSM blob
> >   * @inode: the inode
> >   *
> > - * Deallocate the inode security structure and set @inode->i_security to NULL.
> > + * Release any LSM resources associated with @inode, although due to the
> > + * inode's RCU protections it is possible that the resources will not be
> > + * fully released until after the current RCU grace period has elapsed.
> > + *
> > + * It is important for LSMs to note that despite being present in a call to
> > + * security_inode_free(), @inode may still be referenced in a VFS path walk
> > + * and calls to security_inode_permission() may be made during, or after,
> > + * a call to security_inode_free().  For this reason the inode->i_security
> > + * field is released via a call_rcu() callback and any LSMs which need to
> > + * retain inode state for use in security_inode_permission() should only
> > + * release that state in the inode_free_security_rcu() LSM hook callback.
> >   */
> >  void security_inode_free(struct inode *inode)
> >  {
> >       call_void_hook(inode_free_security, inode);
> > -     /*
> > -      * The inode may still be referenced in a path walk and
> > -      * a call to security_inode_permission() can be made
> > -      * after inode_free_security() is called. Ideally, the VFS
> > -      * wouldn't do this, but fixing that is a much harder
> > -      * job. For now, simply free the i_security via RCU, and
> > -      * leave the current inode->i_security pointer intact.
> > -      * The inode will be freed after the RCU grace period too.
> > -      */
> >       if (inode->i_security)
> >               call_rcu((struct rcu_head *)inode->i_security,
> >                        inode_free_by_rcu);
>
> We should have something like:
> call_rcu(inode->i_security.rcu, inode_free_by_rcu);

The unfortunate side effect of that is that the patch grows
significantly as everyone that touches inode->i_security would need to
be updated to use inode->i_security.data or something similar.  For a
patch that we likely want to see backported to -stable that could make
things more difficult than needed.

However, I have always thought we should add some better
structure/typing to these opaque LSM blobs both to get away from the
raw pointer math and add a marginal layer of safety.  I've envisioned
doing something like this:

  struct lsm_blob_inode {
    struct selinux_blob_inode selinux;
    struct smack_blob_inode smack;
    struct aa_blob_inode apparmor;
    ...
    struct rcu_head rcu;
  }

... with lsm_blob_inode allocated and assigned to inode->i_security.
Those LSMs that are enabled and require blob space would define their
struct with the necessary fields, those that were disabled or did not
require space would define an empty struct; some minor pre-processor
checking and header file work would be needed, but it shouldn't be too
bad.  Ideally, we would use the same approach for all of the LSM
security blobs, only with different LSM supplied structs.  Perhaps
once we land Casey's latest patchset I'll see about doing something
like that, but right now we've got bigger problems to address.

I'll hold off on posting a proper patch for this RFC until I hear back
from Mini or Roberto on the IMA changes.
Paul Moore July 10, 2024, 4:24 p.m. UTC | #5
On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote:
> > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > The LSM framework has an existing inode_free_security() hook which
> > > is used by LSMs that manage state associated with an inode, but
> > > due to the use of RCU to protect the inode, special care must be
> > > taken to ensure that the LSMs do not fully release the inode state
> > > until it is safe from a RCU perspective.
> > >
> > > This patch implements a new inode_free_security_rcu() implementation
> > > hook which is called when it is safe to free the LSM's internal inode
> > > state.  Unfortunately, this new hook does not have access to the inode
> > > itself as it may already be released, so the existing
> > > inode_free_security() hook is retained for those LSMs which require
> > > access to the inode.
> > >
> > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > ---
> > >  include/linux/lsm_hook_defs.h     |  1 +
> > >  security/integrity/ima/ima.h      |  2 +-
> > >  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> > >  security/integrity/ima/ima_main.c |  2 +-
> > >  security/landlock/fs.c            |  9 ++++++---
> > >  security/security.c               | 26 +++++++++++++-------------
> > >  6 files changed, 30 insertions(+), 30 deletions(-)
> >
> > FYI, this has only received "light" testing, and even that is fairly
> > generous.  I booted up a system with IMA set to measure the TCB and
> > ran through the audit and SELinux test suites; IMA seemed to be
> > working just fine but I didn't poke at it too hard.  I didn't have an
> > explicit Landlock test handy, but I'm hoping that the Landlock
> > enablement on a modern Rawhide system hit it a little :)
>
> If you want to test Landlock, you can do so like this:
>
> cd tools/testing/selftests/landlock
> make -C ../../../.. headers_install
> make
> for f in *_test; ./$f; done

Looks okay?

% for f in *_test; do ./$f; done | grep "^# Totals"
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
#      SKIP      overlayfs is not supported (setup)
#      SKIP      overlayfs is not supported (setup)
#      SKIP      this filesystem is not supported (setup)
#      SKIP      this filesystem is not supported (setup)
#      SKIP      this filesystem is not supported (setup)
#      SKIP      this filesystem is not supported (setup)
#      SKIP      this filesystem is not supported (setup)
# Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0
# Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0
# Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
Casey Schaufler July 10, 2024, 4:41 p.m. UTC | #6
On 7/10/2024 9:20 AM, Paul Moore wrote:
> On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote:
>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>>> The LSM framework has an existing inode_free_security() hook which
>>> is used by LSMs that manage state associated with an inode, but
>>> due to the use of RCU to protect the inode, special care must be
>>> taken to ensure that the LSMs do not fully release the inode state
>>> until it is safe from a RCU perspective.
>>>
>>> This patch implements a new inode_free_security_rcu() implementation
>>> hook which is called when it is safe to free the LSM's internal inode
>>> state.  Unfortunately, this new hook does not have access to the inode
>>> itself as it may already be released, so the existing
>>> inode_free_security() hook is retained for those LSMs which require
>>> access to the inode.
>>>
>>> Signed-off-by: Paul Moore <paul@paul-moore.com>
>> I like this new hook.  It is definitely safer than the current approach.
> It would be nicer to simply have a single inode free hook, but it
> doesn't look like that is a possibility due to the inode
> synchronization methods and differing lifetimes of inodes and their
> associated LSM state.  At least the workaround isn't too ugly :)
>
>> To make it more consistent, I think we should also rename
>> security_inode_free() to security_inode_put() to highlight the fact that
>> LSM implementations should not free potential pointers in this blob
>> because they could still be dereferenced in a path walk.
> I'd prefer to keep the naming as-is in this patch since
> security_inode_free() does trigger the free'ing/release of the LSM
> state associated with the inode.
>
>>> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
>>> index 22d8b7c28074..f583f8cec345 100644
>>> --- a/security/landlock/fs.c
>>> +++ b/security/landlock/fs.c
>>> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
>>>
>>>  /* Inode hooks */
>>>
>>> -static void hook_inode_free_security(struct inode *const inode)
>>> +static void hook_inode_free_security_rcu(void *inode_sec)
>>>  {
>>> +     struct landlock_inode_security *lisec;
>> Please rename "lisec" to "inode_sec" for consistency with
>> get_inode_object()'s variables.
> Done.  That did conflict with the parameter name so I renamed the
> parameter everywhere to "inode_security".
>
>>>       /*
>>>        * All inodes must already have been untied from their object by
>>>        * release_inode() or hook_sb_delete().
>>>        */
>>> -     WARN_ON_ONCE(landlock_inode(inode)->object);
>>> +     lisec = inode_sec + landlock_blob_sizes.lbs_inode;
>>> +     WARN_ON_ONCE(lisec->object);
>>>  }
>> This looks good to me.
>>
>> We can add these footers:
>> Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com
> Thanks, metadata added.  This was a quick RFC patch so I didn't want
> to spend a lot of time chasing down metadata refs until I knew there
> was some basic support for this approach.  I still want to make sure
> it is okay with the IMA folks.
>
>> However, I'm wondering if we could backport this patch down to v5.15 .
>> I guess not, so I'll need to remove this hook implementation for
>> Landlock, backport it to v5.15, and then you'll need to re-add this
>> check with this patch.  At least it has been useful to spot this inode
>> issue, but it could still be useful to spot potential memory leaks with
>> a negligible performance impact.
> Yes, it's a bit complicated with the IMA/EVM promotion happening
> fairly recently.  I'm marking the patch with a stable tag, but
> considering we're at -rc7 and this needs at least one more respin,
> testing, ACKs, etc. it might not land in Linus' tree until sometime
> post v6.11-rc1.  Considering that, I might suggest dropping the
> Landlock hook in -stable and re-adding it to Linus' tree once this fix
> lands, but that decision is up to you.
>
>>> diff --git a/security/security.c b/security/security.c
>>> index b52e81ac5526..bc6805f7332e 100644
>>> --- a/security/security.c
>>> +++ b/security/security.c
>>> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
>>>
>>>  static void inode_free_by_rcu(struct rcu_head *head)
>>>  {
>>> -     /*
>>> -      * The rcu head is at the start of the inode blob
>>> -      */
>>> +     /* The rcu head is at the start of the inode blob */
>>> +     call_void_hook(inode_free_security_rcu, head);
>> For this to work, we need to extend the inode blob size (lbs_inode) with
>> sizeof(struct rcu_head).  The current implementation override the
>> content of the blob with a new rcu_head.
> Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head
> struct is already accounted for in the inode->i_security blob.
>
>>> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
>>>   * security_inode_free() - Free an inode's LSM blob
>>>   * @inode: the inode
>>>   *
>>> - * Deallocate the inode security structure and set @inode->i_security to NULL.
>>> + * Release any LSM resources associated with @inode, although due to the
>>> + * inode's RCU protections it is possible that the resources will not be
>>> + * fully released until after the current RCU grace period has elapsed.
>>> + *
>>> + * It is important for LSMs to note that despite being present in a call to
>>> + * security_inode_free(), @inode may still be referenced in a VFS path walk
>>> + * and calls to security_inode_permission() may be made during, or after,
>>> + * a call to security_inode_free().  For this reason the inode->i_security
>>> + * field is released via a call_rcu() callback and any LSMs which need to
>>> + * retain inode state for use in security_inode_permission() should only
>>> + * release that state in the inode_free_security_rcu() LSM hook callback.
>>>   */
>>>  void security_inode_free(struct inode *inode)
>>>  {
>>>       call_void_hook(inode_free_security, inode);
>>> -     /*
>>> -      * The inode may still be referenced in a path walk and
>>> -      * a call to security_inode_permission() can be made
>>> -      * after inode_free_security() is called. Ideally, the VFS
>>> -      * wouldn't do this, but fixing that is a much harder
>>> -      * job. For now, simply free the i_security via RCU, and
>>> -      * leave the current inode->i_security pointer intact.
>>> -      * The inode will be freed after the RCU grace period too.
>>> -      */
>>>       if (inode->i_security)
>>>               call_rcu((struct rcu_head *)inode->i_security,
>>>                        inode_free_by_rcu);
>> We should have something like:
>> call_rcu(inode->i_security.rcu, inode_free_by_rcu);
> The unfortunate side effect of that is that the patch grows
> significantly as everyone that touches inode->i_security would need to
> be updated to use inode->i_security.data or something similar.  For a
> patch that we likely want to see backported to -stable that could make
> things more difficult than needed.
>
> However, I have always thought we should add some better
> structure/typing to these opaque LSM blobs both to get away from the
> raw pointer math and add a marginal layer of safety.  I've envisioned
> doing something like this:
>
>   struct lsm_blob_inode {
>     struct selinux_blob_inode selinux;
>     struct smack_blob_inode smack;
>     struct aa_blob_inode apparmor;
>     ...
>     struct rcu_head rcu;
>   }

I have considered doing this as part of the stacking effort for quite
some time. I've already done it for the lsmblob structure that will replace
most uses of the u32 secid in the LSM APIs. I am concerned that there would
be considerable gnashing of teeth over the potential increase in the blob
sizes for kernels compiled with LSMs that aren't active. I have been frantically
avoiding anything that might slow the stacking effort further. If this would
help moving this along I'll include it in v40.

>
> .. with lsm_blob_inode allocated and assigned to inode->i_security.
> Those LSMs that are enabled and require blob space would define their
> struct with the necessary fields, those that were disabled or did not
> require space would define an empty struct; some minor pre-processor
> checking and header file work would be needed, but it shouldn't be too
> bad.  Ideally, we would use the same approach for all of the LSM
> security blobs, only with different LSM supplied structs.  Perhaps
> once we land Casey's latest patchset I'll see about doing something
> like that, but right now we've got bigger problems to address.
>
> I'll hold off on posting a proper patch for this RFC until I hear back
> from Mini or Roberto on the IMA changes.
>
Paul Moore July 10, 2024, 5:48 p.m. UTC | #7
On Wed, Jul 10, 2024 at 12:41 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/10/2024 9:20 AM, Paul Moore wrote:

...

> > However, I have always thought we should add some better
> > structure/typing to these opaque LSM blobs both to get away from the
> > raw pointer math and add a marginal layer of safety.  I've envisioned
> > doing something like this:
> >
> >   struct lsm_blob_inode {
> >     struct selinux_blob_inode selinux;
> >     struct smack_blob_inode smack;
> >     struct aa_blob_inode apparmor;
> >     ...
> >     struct rcu_head rcu;
> >   }
>
> I have considered doing this as part of the stacking effort for quite
> some time. I've already done it for the lsmblob structure that will replace
> most uses of the u32 secid in the LSM APIs. I am concerned that there would
> be considerable gnashing of teeth over the potential increase in the blob
> sizes for kernels compiled with LSMs that aren't active.

Yes, that's a fair point and something that needs to be considered.

> I have been frantically
> avoiding anything that might slow the stacking effort further. If this would
> help moving this along I'll include it in v40.

No, don't worry about this as part of improving the multi-LSM support,
this is something that can be done independently, if at all.
Roberto Sassu July 10, 2024, 9:01 p.m. UTC | #8
On 7/10/2024 4:40 AM, Paul Moore wrote:
> The LSM framework has an existing inode_free_security() hook which
> is used by LSMs that manage state associated with an inode, but
> due to the use of RCU to protect the inode, special care must be
> taken to ensure that the LSMs do not fully release the inode state
> until it is safe from a RCU perspective.
> 
> This patch implements a new inode_free_security_rcu() implementation
> hook which is called when it is safe to free the LSM's internal inode
> state.  Unfortunately, this new hook does not have access to the inode
> itself as it may already be released, so the existing
> inode_free_security() hook is retained for those LSMs which require
> access to the inode.
> 
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>   include/linux/lsm_hook_defs.h     |  1 +
>   security/integrity/ima/ima.h      |  2 +-
>   security/integrity/ima/ima_iint.c | 20 ++++++++------------
>   security/integrity/ima/ima_main.c |  2 +-
>   security/landlock/fs.c            |  9 ++++++---
>   security/security.c               | 26 +++++++++++++-------------
>   6 files changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 8fd87f823d3a..abe6b0ef892a 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask,
>   	 unsigned int obj_type)
>   LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
>   LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
> +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *)
>   LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode,
>   	 struct inode *dir, const struct qstr *qstr, struct xattr *xattrs,
>   	 int *xattr_count)
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 3e568126cd48..e2a2e4c7eab6 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode,
>   
>   struct ima_iint_cache *ima_iint_find(struct inode *inode);
>   struct ima_iint_cache *ima_inode_get(struct inode *inode);
> -void ima_inode_free(struct inode *inode);
> +void ima_inode_free_rcu(void *inode_sec);
>   void __init ima_iintcache_init(void);
>   
>   extern const int read_idmap[];
> diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c
> index e23412a2c56b..54480df90bdc 100644
> --- a/security/integrity/ima/ima_iint.c
> +++ b/security/integrity/ima/ima_iint.c
> @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode)
>   }
>   
>   /**
> - * ima_inode_free - Called on inode free
> - * @inode: Pointer to the inode
> + * ima_inode_free_rcu - Called to free an inode via a RCU callback
> + * @inode_sec: The inode::i_security pointer
>    *
> - * Free the iint associated with an inode.
> + * Free the IMA data associated with an inode.
>    */
> -void ima_inode_free(struct inode *inode)
> +void ima_inode_free_rcu(void *inode_sec)
>   {
> -	struct ima_iint_cache *iint;
> +	struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode;
>   
> -	if (!IS_IMA(inode))
> -		return;
> -
> -	iint = ima_iint_find(inode);
> -	ima_inode_set_iint(inode, NULL);
> -
> -	ima_iint_free(iint);
> +	/* *iint_p should be NULL if !IS_IMA(inode) */
> +	if (*iint_p)
> +		ima_iint_free(*iint_p);

It looks ok for me. The only thing we are not doing is to set the 
pointer to NULL, but I guess does not matter, since the inode security 
blob is being freed.

I also ran some UML kernel tests in the CI, and everything looks good:

https://github.com/robertosassu/ima-evm-utils/actions/runs/9880817007/job/27291259487

Will think a bit, if I'm missing something.

Roberto

>   }
>   
>   static void ima_iint_init_once(void *foo)
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index f04f43af651c..5b3394864b21 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = {
>   #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
>   	LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request),
>   #endif
> -	LSM_HOOK_INIT(inode_free_security, ima_inode_free),
> +	LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu),
>   };
>   
>   static const struct lsm_id ima_lsmid = {
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 22d8b7c28074..f583f8cec345 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
>   
>   /* Inode hooks */
>   
> -static void hook_inode_free_security(struct inode *const inode)
> +static void hook_inode_free_security_rcu(void *inode_sec)
>   {
> +	struct landlock_inode_security *lisec;
> +
>   	/*
>   	 * All inodes must already have been untied from their object by
>   	 * release_inode() or hook_sb_delete().
>   	 */
> -	WARN_ON_ONCE(landlock_inode(inode)->object);
> +	lisec = inode_sec + landlock_blob_sizes.lbs_inode;
> +	WARN_ON_ONCE(lisec->object);
>   }
>   
>   /* Super-block hooks */
> @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
>   }
>   
>   static struct security_hook_list landlock_hooks[] __ro_after_init = {
> -	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
> +	LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
>   
>   	LSM_HOOK_INIT(sb_delete, hook_sb_delete),
>   	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
> diff --git a/security/security.c b/security/security.c
> index b52e81ac5526..bc6805f7332e 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
>   
>   static void inode_free_by_rcu(struct rcu_head *head)
>   {
> -	/*
> -	 * The rcu head is at the start of the inode blob
> -	 */
> +	/* The rcu head is at the start of the inode blob */
> +	call_void_hook(inode_free_security_rcu, head);
>   	kmem_cache_free(lsm_inode_cache, head);
>   }
>   
> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
>    * security_inode_free() - Free an inode's LSM blob
>    * @inode: the inode
>    *
> - * Deallocate the inode security structure and set @inode->i_security to NULL.
> + * Release any LSM resources associated with @inode, although due to the
> + * inode's RCU protections it is possible that the resources will not be
> + * fully released until after the current RCU grace period has elapsed.
> + *
> + * It is important for LSMs to note that despite being present in a call to
> + * security_inode_free(), @inode may still be referenced in a VFS path walk
> + * and calls to security_inode_permission() may be made during, or after,
> + * a call to security_inode_free().  For this reason the inode->i_security
> + * field is released via a call_rcu() callback and any LSMs which need to
> + * retain inode state for use in security_inode_permission() should only
> + * release that state in the inode_free_security_rcu() LSM hook callback.
>    */
>   void security_inode_free(struct inode *inode)
>   {
>   	call_void_hook(inode_free_security, inode);
> -	/*
> -	 * The inode may still be referenced in a path walk and
> -	 * a call to security_inode_permission() can be made
> -	 * after inode_free_security() is called. Ideally, the VFS
> -	 * wouldn't do this, but fixing that is a much harder
> -	 * job. For now, simply free the i_security via RCU, and
> -	 * leave the current inode->i_security pointer intact.
> -	 * The inode will be freed after the RCU grace period too.
> -	 */
>   	if (inode->i_security)
>   		call_rcu((struct rcu_head *)inode->i_security,
>   			 inode_free_by_rcu);
Paul Moore July 10, 2024, 9:24 p.m. UTC | #9
On Wed, Jul 10, 2024 at 5:01 PM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> It looks ok for me. The only thing we are not doing is to set the
> pointer to NULL, but I guess does not matter, since the inode security
> blob is being freed.

Yes, that shouldn't be an issue, and depending on how things get
scheduled, it's entirely possible that the associated inode is
completely gone by the time ima_inode_free_rcu() is called.

> I also ran some UML kernel tests in the CI, and everything looks good:
>
> https://github.com/robertosassu/ima-evm-utils/actions/runs/9880817007/job/27291259487
>
> Will think a bit, if I'm missing something.

Great, thank you.
Mickaël Salaün July 15, 2024, 1:34 p.m. UTC | #10
On Wed, Jul 10, 2024 at 12:24:31PM -0400, Paul Moore wrote:
> On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote:
> > On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote:
> > > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > The LSM framework has an existing inode_free_security() hook which
> > > > is used by LSMs that manage state associated with an inode, but
> > > > due to the use of RCU to protect the inode, special care must be
> > > > taken to ensure that the LSMs do not fully release the inode state
> > > > until it is safe from a RCU perspective.
> > > >
> > > > This patch implements a new inode_free_security_rcu() implementation
> > > > hook which is called when it is safe to free the LSM's internal inode
> > > > state.  Unfortunately, this new hook does not have access to the inode
> > > > itself as it may already be released, so the existing
> > > > inode_free_security() hook is retained for those LSMs which require
> > > > access to the inode.
> > > >
> > > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > > ---
> > > >  include/linux/lsm_hook_defs.h     |  1 +
> > > >  security/integrity/ima/ima.h      |  2 +-
> > > >  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> > > >  security/integrity/ima/ima_main.c |  2 +-
> > > >  security/landlock/fs.c            |  9 ++++++---
> > > >  security/security.c               | 26 +++++++++++++-------------
> > > >  6 files changed, 30 insertions(+), 30 deletions(-)
> > >
> > > FYI, this has only received "light" testing, and even that is fairly
> > > generous.  I booted up a system with IMA set to measure the TCB and
> > > ran through the audit and SELinux test suites; IMA seemed to be
> > > working just fine but I didn't poke at it too hard.  I didn't have an
> > > explicit Landlock test handy, but I'm hoping that the Landlock
> > > enablement on a modern Rawhide system hit it a little :)
> >
> > If you want to test Landlock, you can do so like this:
> >
> > cd tools/testing/selftests/landlock
> > make -C ../../../.. headers_install
> > make
> > for f in *_test; ./$f; done
> 
> Looks okay?
> 
> % for f in *_test; do ./$f; done | grep "^# Totals"
> # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
> #      SKIP      overlayfs is not supported (setup)
> #      SKIP      overlayfs is not supported (setup)
> #      SKIP      this filesystem is not supported (setup)
> #      SKIP      this filesystem is not supported (setup)
> #      SKIP      this filesystem is not supported (setup)
> #      SKIP      this filesystem is not supported (setup)
> #      SKIP      this filesystem is not supported (setup)
> # Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0
> # Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0
> # Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

It should be enough, thanks.  FYI, the minimal configuration required to
run all tests (except hostfs) is listed in
tools/testing/selftests/landlock/config
Mickaël Salaün July 15, 2024, 1:35 p.m. UTC | #11
On Wed, Jul 10, 2024 at 12:20:18PM -0400, Paul Moore wrote:
> On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> > > The LSM framework has an existing inode_free_security() hook which
> > > is used by LSMs that manage state associated with an inode, but
> > > due to the use of RCU to protect the inode, special care must be
> > > taken to ensure that the LSMs do not fully release the inode state
> > > until it is safe from a RCU perspective.
> > >
> > > This patch implements a new inode_free_security_rcu() implementation
> > > hook which is called when it is safe to free the LSM's internal inode
> > > state.  Unfortunately, this new hook does not have access to the inode
> > > itself as it may already be released, so the existing
> > > inode_free_security() hook is retained for those LSMs which require
> > > access to the inode.
> > >
> > > Signed-off-by: Paul Moore <paul@paul-moore.com>

> > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > > index 22d8b7c28074..f583f8cec345 100644
> > > --- a/security/landlock/fs.c
> > > +++ b/security/landlock/fs.c
> > > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
> > >
> > >  /* Inode hooks */
> > >
> > > -static void hook_inode_free_security(struct inode *const inode)
> > > +static void hook_inode_free_security_rcu(void *inode_sec)
> > >  {
> > > +     struct landlock_inode_security *lisec;
> >
> > Please rename "lisec" to "inode_sec" for consistency with
> > get_inode_object()'s variables.
> 
> Done.  That did conflict with the parameter name so I renamed the
> parameter everywhere to "inode_security".

Oh, I didn't see this conflict.  Using inode_security as function
argument looks good.


> > However, I'm wondering if we could backport this patch down to v5.15 .
> > I guess not, so I'll need to remove this hook implementation for
> > Landlock, backport it to v5.15, and then you'll need to re-add this
> > check with this patch.  At least it has been useful to spot this inode
> > issue, but it could still be useful to spot potential memory leaks with
> > a negligible performance impact.
> 
> Yes, it's a bit complicated with the IMA/EVM promotion happening
> fairly recently.  I'm marking the patch with a stable tag, but
> considering we're at -rc7 and this needs at least one more respin,
> testing, ACKs, etc. it might not land in Linus' tree until sometime
> post v6.11-rc1.  Considering that, I might suggest dropping the
> Landlock hook in -stable and re-adding it to Linus' tree once this fix
> lands, but that decision is up to you.

I would prefer to backport the new hook.  I can help with that.  In
fact, I tried to backport the removal of the hook for Landlock, and it
requires the same amount of work, so it would be better to work
together.  That should also ease future backports impacting the same
part of the code.

BTW, while trying to backport that to linux-5.15.y I noticed that a fix
is missing in this branch: the default return value for the
inode_init_security hook, see commit 6bcdfd2cac55 ("security: Allow all
LSMs to provide xattrs for inode_init_security hook").

> 
> > > diff --git a/security/security.c b/security/security.c
> > > index b52e81ac5526..bc6805f7332e 100644
> > > --- a/security/security.c
> > > +++ b/security/security.c
> > > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
> > >
> > >  static void inode_free_by_rcu(struct rcu_head *head)
> > >  {
> > > -     /*
> > > -      * The rcu head is at the start of the inode blob
> > > -      */
> > > +     /* The rcu head is at the start of the inode blob */
> > > +     call_void_hook(inode_free_security_rcu, head);
> >
> > For this to work, we need to extend the inode blob size (lbs_inode) with
> > sizeof(struct rcu_head).  The current implementation override the
> > content of the blob with a new rcu_head.
> 
> Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head
> struct is already accounted for in the inode->i_security blob.

Indeed, I was confused with the lsm_set_blob_size() name which *adds* a
size.

> 
> > > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
> > >   * security_inode_free() - Free an inode's LSM blob
> > >   * @inode: the inode
> > >   *
> > > - * Deallocate the inode security structure and set @inode->i_security to NULL.
> > > + * Release any LSM resources associated with @inode, although due to the
> > > + * inode's RCU protections it is possible that the resources will not be
> > > + * fully released until after the current RCU grace period has elapsed.
> > > + *
> > > + * It is important for LSMs to note that despite being present in a call to
> > > + * security_inode_free(), @inode may still be referenced in a VFS path walk
> > > + * and calls to security_inode_permission() may be made during, or after,
> > > + * a call to security_inode_free().  For this reason the inode->i_security
> > > + * field is released via a call_rcu() callback and any LSMs which need to
> > > + * retain inode state for use in security_inode_permission() should only
> > > + * release that state in the inode_free_security_rcu() LSM hook callback.
> > >   */
> > >  void security_inode_free(struct inode *inode)
> > >  {
> > >       call_void_hook(inode_free_security, inode);
> > > -     /*
> > > -      * The inode may still be referenced in a path walk and
> > > -      * a call to security_inode_permission() can be made
> > > -      * after inode_free_security() is called. Ideally, the VFS
> > > -      * wouldn't do this, but fixing that is a much harder
> > > -      * job. For now, simply free the i_security via RCU, and
> > > -      * leave the current inode->i_security pointer intact.
> > > -      * The inode will be freed after the RCU grace period too.
> > > -      */
> > >       if (inode->i_security)
> > >               call_rcu((struct rcu_head *)inode->i_security,
> > >                        inode_free_by_rcu);
> >
> > We should have something like:
> > call_rcu(inode->i_security.rcu, inode_free_by_rcu);
> 
> The unfortunate side effect of that is that the patch grows
> significantly as everyone that touches inode->i_security would need to
> be updated to use inode->i_security.data or something similar.  For a
> patch that we likely want to see backported to -stable that could make
> things more difficult than needed.

Sure, this was related to my confusion with the rcu_head size addition.
Paul Moore July 15, 2024, 8:51 p.m. UTC | #12
On Mon, Jul 15, 2024 at 9:34 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Wed, Jul 10, 2024 at 12:24:31PM -0400, Paul Moore wrote:
> > On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote:
> > > > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote:
> > > > >
> > > > > The LSM framework has an existing inode_free_security() hook which
> > > > > is used by LSMs that manage state associated with an inode, but
> > > > > due to the use of RCU to protect the inode, special care must be
> > > > > taken to ensure that the LSMs do not fully release the inode state
> > > > > until it is safe from a RCU perspective.
> > > > >
> > > > > This patch implements a new inode_free_security_rcu() implementation
> > > > > hook which is called when it is safe to free the LSM's internal inode
> > > > > state.  Unfortunately, this new hook does not have access to the inode
> > > > > itself as it may already be released, so the existing
> > > > > inode_free_security() hook is retained for those LSMs which require
> > > > > access to the inode.
> > > > >
> > > > > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > > > ---
> > > > >  include/linux/lsm_hook_defs.h     |  1 +
> > > > >  security/integrity/ima/ima.h      |  2 +-
> > > > >  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> > > > >  security/integrity/ima/ima_main.c |  2 +-
> > > > >  security/landlock/fs.c            |  9 ++++++---
> > > > >  security/security.c               | 26 +++++++++++++-------------
> > > > >  6 files changed, 30 insertions(+), 30 deletions(-)
> > > >
> > > > FYI, this has only received "light" testing, and even that is fairly
> > > > generous.  I booted up a system with IMA set to measure the TCB and
> > > > ran through the audit and SELinux test suites; IMA seemed to be
> > > > working just fine but I didn't poke at it too hard.  I didn't have an
> > > > explicit Landlock test handy, but I'm hoping that the Landlock
> > > > enablement on a modern Rawhide system hit it a little :)
> > >
> > > If you want to test Landlock, you can do so like this:
> > >
> > > cd tools/testing/selftests/landlock
> > > make -C ../../../.. headers_install
> > > make
> > > for f in *_test; ./$f; done
> >
> > Looks okay?
> >
> > % for f in *_test; do ./$f; done | grep "^# Totals"
> > # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
> > #      SKIP      overlayfs is not supported (setup)
> > #      SKIP      overlayfs is not supported (setup)
> > #      SKIP      this filesystem is not supported (setup)
> > #      SKIP      this filesystem is not supported (setup)
> > #      SKIP      this filesystem is not supported (setup)
> > #      SKIP      this filesystem is not supported (setup)
> > #      SKIP      this filesystem is not supported (setup)
> > # Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0
> > # Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0
> > # Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> It should be enough, thanks.  FYI, the minimal configuration required to
> run all tests (except hostfs) is listed in
> tools/testing/selftests/landlock/config

Thanks, I'll try to remember to add that to my test builds.
Paul Moore July 15, 2024, 9:23 p.m. UTC | #13
On Mon, Jul 15, 2024 at 9:35 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Wed, Jul 10, 2024 at 12:20:18PM -0400, Paul Moore wrote:
> > On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:

...

> > > However, I'm wondering if we could backport this patch down to v5.15 .
> > > I guess not, so I'll need to remove this hook implementation for
> > > Landlock, backport it to v5.15, and then you'll need to re-add this
> > > check with this patch.  At least it has been useful to spot this inode
> > > issue, but it could still be useful to spot potential memory leaks with
> > > a negligible performance impact.
> >
> > Yes, it's a bit complicated with the IMA/EVM promotion happening
> > fairly recently.  I'm marking the patch with a stable tag, but
> > considering we're at -rc7 and this needs at least one more respin,
> > testing, ACKs, etc. it might not land in Linus' tree until sometime
> > post v6.11-rc1.  Considering that, I might suggest dropping the
> > Landlock hook in -stable and re-adding it to Linus' tree once this fix
> > lands, but that decision is up to you.
>
> I would prefer to backport the new hook.  I can help with that.  In
> fact, I tried to backport the removal of the hook for Landlock, and it
> requires the same amount of work, so it would be better to work
> together.  That should also ease future backports impacting the same
> part of the code.

Okay, let's get the initial v6.11 LSM PR merged (I just sent it to
Linus) and then I'll post the updated patchset and a proper patch for
review.

> BTW, while trying to backport that to linux-5.15.y I noticed that a fix
> is missing in this branch: the default return value for the
> inode_init_security hook, see commit 6bcdfd2cac55 ("security: Allow all
> LSMs to provide xattrs for inode_init_security hook").

Likely a casualty of a merge conflict; I haven't noticed the stable
kernel folks doing any manual merging of LSM patches that fail an
automated merge.  You can always do the merge and send it to them.
Matus Jokay July 22, 2024, 12:29 p.m. UTC | #14
On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>> The LSM framework has an existing inode_free_security() hook which
>> is used by LSMs that manage state associated with an inode, but
>> due to the use of RCU to protect the inode, special care must be
>> taken to ensure that the LSMs do not fully release the inode state
>> until it is safe from a RCU perspective.
>>
>> This patch implements a new inode_free_security_rcu() implementation
>> hook which is called when it is safe to free the LSM's internal inode
>> state.  Unfortunately, this new hook does not have access to the inode
>> itself as it may already be released, so the existing
>> inode_free_security() hook is retained for those LSMs which require
>> access to the inode.
>>
>> Signed-off-by: Paul Moore <paul@paul-moore.com>
> 
> I like this new hook.  It is definitely safer than the current approach.
> 
> To make it more consistent, I think we should also rename
> security_inode_free() to security_inode_put() to highlight the fact that
> LSM implementations should not free potential pointers in this blob
> because they could still be dereferenced in a path walk.
> 
>> ---
>>  include/linux/lsm_hook_defs.h     |  1 +
>>  security/integrity/ima/ima.h      |  2 +-
>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>>  security/integrity/ima/ima_main.c |  2 +-
>>  security/landlock/fs.c            |  9 ++++++---
>>  security/security.c               | 26 +++++++++++++-------------
>>  6 files changed, 30 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
>> index 8fd87f823d3a..abe6b0ef892a 100644
>> --- a/include/linux/lsm_hook_defs.h
>> +++ b/include/linux/lsm_hook_defs.h
>> @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask,
>>  	 unsigned int obj_type)
>>  LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
>>  LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
>> +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *)
>>  LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode,
>>  	 struct inode *dir, const struct qstr *qstr, struct xattr *xattrs,
>>  	 int *xattr_count)
>> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
>> index 3e568126cd48..e2a2e4c7eab6 100644
>> --- a/security/integrity/ima/ima.h
>> +++ b/security/integrity/ima/ima.h
>> @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode,
>>  
>>  struct ima_iint_cache *ima_iint_find(struct inode *inode);
>>  struct ima_iint_cache *ima_inode_get(struct inode *inode);
>> -void ima_inode_free(struct inode *inode);
>> +void ima_inode_free_rcu(void *inode_sec);
>>  void __init ima_iintcache_init(void);
>>  
>>  extern const int read_idmap[];
>> diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c
>> index e23412a2c56b..54480df90bdc 100644
>> --- a/security/integrity/ima/ima_iint.c
>> +++ b/security/integrity/ima/ima_iint.c
>> @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode)
>>  }
>>  
>>  /**
>> - * ima_inode_free - Called on inode free
>> - * @inode: Pointer to the inode
>> + * ima_inode_free_rcu - Called to free an inode via a RCU callback
>> + * @inode_sec: The inode::i_security pointer
>>   *
>> - * Free the iint associated with an inode.
>> + * Free the IMA data associated with an inode.
>>   */
>> -void ima_inode_free(struct inode *inode)
>> +void ima_inode_free_rcu(void *inode_sec)
>>  {
>> -	struct ima_iint_cache *iint;
>> +	struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode;
>>  
>> -	if (!IS_IMA(inode))
>> -		return;
>> -
>> -	iint = ima_iint_find(inode);
>> -	ima_inode_set_iint(inode, NULL);
>> -
>> -	ima_iint_free(iint);
>> +	/* *iint_p should be NULL if !IS_IMA(inode) */
>> +	if (*iint_p)
>> +		ima_iint_free(*iint_p);
>>  }
>>  
>>  static void ima_iint_init_once(void *foo)
>> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
>> index f04f43af651c..5b3394864b21 100644
>> --- a/security/integrity/ima/ima_main.c
>> +++ b/security/integrity/ima/ima_main.c
>> @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = {
>>  #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
>>  	LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request),
>>  #endif
>> -	LSM_HOOK_INIT(inode_free_security, ima_inode_free),
>> +	LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu),
>>  };
>>  
>>  static const struct lsm_id ima_lsmid = {
>> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
>> index 22d8b7c28074..f583f8cec345 100644
>> --- a/security/landlock/fs.c
>> +++ b/security/landlock/fs.c
>> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry,
>>  
>>  /* Inode hooks */
>>  
>> -static void hook_inode_free_security(struct inode *const inode)
>> +static void hook_inode_free_security_rcu(void *inode_sec)
>>  {
>> +	struct landlock_inode_security *lisec;
> 
> Please rename "lisec" to "inode_sec" for consistency with
> get_inode_object()'s variables.
> 
>> +
>>  	/*
>>  	 * All inodes must already have been untied from their object by
>>  	 * release_inode() or hook_sb_delete().
>>  	 */
>> -	WARN_ON_ONCE(landlock_inode(inode)->object);
>> +	lisec = inode_sec + landlock_blob_sizes.lbs_inode;
>> +	WARN_ON_ONCE(lisec->object);
>>  }
> 
> This looks good to me.
> 
> We can add these footers:
> Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com
Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:

1) How does this patch close [1]?
   As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
   Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
   i.e. referencing the inode in a VFS path walk while destroying it...
   Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.

2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
   If not, can the security context be released earlier than the inode itself? If yes, can be executed
   inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?
   Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)
   I know, this RFC doesn't address exactly that question, but I'd like to know the answer.

Many thanks for the helpful answers and your time,
mY

[1] https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com
[2] https://lore.kernel.org/linux-security-module/CAHC9VhQUqJkWxhe5KukPOVQMnOhcOH5E+BJ4_b3Qn6edsL5YJQ@mail.gmail.com/T/#m6e6b01b196eac15a7ad99cf366614bbe60b8d9a2

> 
> However, I'm wondering if we could backport this patch down to v5.15 .
> I guess not, so I'll need to remove this hook implementation for
> Landlock, backport it to v5.15, and then you'll need to re-add this
> check with this patch.  At least it has been useful to spot this inode
> issue, but it could still be useful to spot potential memory leaks with
> a negligible performance impact.
> 
> 
>>  
>>  /* Super-block hooks */
>> @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
>>  }
>>  
>>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>> -	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
>> +	LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
>>  
>>  	LSM_HOOK_INIT(sb_delete, hook_sb_delete),
>>  	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
>> diff --git a/security/security.c b/security/security.c
>> index b52e81ac5526..bc6805f7332e 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode)
>>  
>>  static void inode_free_by_rcu(struct rcu_head *head)
>>  {
>> -	/*
>> -	 * The rcu head is at the start of the inode blob
>> -	 */
>> +	/* The rcu head is at the start of the inode blob */
>> +	call_void_hook(inode_free_security_rcu, head);
> 
> For this to work, we need to extend the inode blob size (lbs_inode) with
> sizeof(struct rcu_head).  The current implementation override the
> content of the blob with a new rcu_head.
> 
>>  	kmem_cache_free(lsm_inode_cache, head);
>>  }
>>  
>> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head)
>>   * security_inode_free() - Free an inode's LSM blob
>>   * @inode: the inode
>>   *
>> - * Deallocate the inode security structure and set @inode->i_security to NULL.
>> + * Release any LSM resources associated with @inode, although due to the
>> + * inode's RCU protections it is possible that the resources will not be
>> + * fully released until after the current RCU grace period has elapsed.
>> + *
>> + * It is important for LSMs to note that despite being present in a call to
>> + * security_inode_free(), @inode may still be referenced in a VFS path walk
>> + * and calls to security_inode_permission() may be made during, or after,
>> + * a call to security_inode_free().  For this reason the inode->i_security
>> + * field is released via a call_rcu() callback and any LSMs which need to
>> + * retain inode state for use in security_inode_permission() should only
>> + * release that state in the inode_free_security_rcu() LSM hook callback.
>>   */
>>  void security_inode_free(struct inode *inode)
>>  {
>>  	call_void_hook(inode_free_security, inode);
>> -	/*
>> -	 * The inode may still be referenced in a path walk and
>> -	 * a call to security_inode_permission() can be made
>> -	 * after inode_free_security() is called. Ideally, the VFS
>> -	 * wouldn't do this, but fixing that is a much harder
>> -	 * job. For now, simply free the i_security via RCU, and
>> -	 * leave the current inode->i_security pointer intact.
>> -	 * The inode will be freed after the RCU grace period too.
>> -	 */
>>  	if (inode->i_security)
>>  		call_rcu((struct rcu_head *)inode->i_security,
>>  			 inode_free_by_rcu);
> 
> We should have something like:
> call_rcu(inode->i_security.rcu, inode_free_by_rcu);
> 
>> -- 
>> 2.45.2
>>
>>
>
Paul Moore July 22, 2024, 7:46 p.m. UTC | #15
On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> >> The LSM framework has an existing inode_free_security() hook which
> >> is used by LSMs that manage state associated with an inode, but
> >> due to the use of RCU to protect the inode, special care must be
> >> taken to ensure that the LSMs do not fully release the inode state
> >> until it is safe from a RCU perspective.
> >>
> >> This patch implements a new inode_free_security_rcu() implementation
> >> hook which is called when it is safe to free the LSM's internal inode
> >> state.  Unfortunately, this new hook does not have access to the inode
> >> itself as it may already be released, so the existing
> >> inode_free_security() hook is retained for those LSMs which require
> >> access to the inode.
> >>
> >> Signed-off-by: Paul Moore <paul@paul-moore.com>
> >
> > I like this new hook.  It is definitely safer than the current approach.
> >
> > To make it more consistent, I think we should also rename
> > security_inode_free() to security_inode_put() to highlight the fact that
> > LSM implementations should not free potential pointers in this blob
> > because they could still be dereferenced in a path walk.
> >
> >> ---
> >>  include/linux/lsm_hook_defs.h     |  1 +
> >>  security/integrity/ima/ima.h      |  2 +-
> >>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> >>  security/integrity/ima/ima_main.c |  2 +-
> >>  security/landlock/fs.c            |  9 ++++++---
> >>  security/security.c               | 26 +++++++++++++-------------
> >>  6 files changed, 30 insertions(+), 30 deletions(-)

...

> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
>
> 1) How does this patch close [1]?
>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
>    i.e. referencing the inode in a VFS path walk while destroying it...
>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.

The VFS folks can likely provide a better, or perhaps a more correct
answer, but my understanding is that during the path walk the inode is
protected by a RCU lock which allows for multiple threads to access
the inode simultaneously; this could result in some cases where one
thread is destroying the inode while another is accessing it.
Changing this would require changes to the VFS code, and I'm not sure
why you would want to change it anyway, the performance win of using
RCU here is likely significant.

> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?

I'm not an RCU expert, but I don't believe there are any guarantees
that the inode_free_by_rcu() and the inode's own free routines are
going to be called within the same RCU grace period (not really
applicable as inode_free_by_rcu() isn't called *during* a grace
period, but *after* the grace period of the associated
security_inode_free() call).  However, this patch does not rely on
synchronization between the inode and inode LSM free routine in
inode_free_by_rcu(); the inode_free_by_rcu() function and the new
inode_free_security_rcu() LSM callback does not have a pointer to the
inode, only the inode's LSM blob.  I agree that it is a bit odd, but
freeing the inode and inode's LSM blob independently of each other
should not cause a problem so long as the inode is no longer in use
(hence the RCU callbacks).

>    If not, can the security context be released earlier than the inode itself?

Possibly, but it should only happen after the inode is no longer in
use (the call_rcu () callback should ensure that we are past all of
the currently executing RCU critical sections).

> If yes, can be executed
>    inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?

I do not believe so, see the discussion above, but I welcome any corrections.

>    Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)

If fsnotify is affecting this negatively then I suspect that is a
reason for much larger concern as I believe that would indicate a
problem with fsnotify and the inode locking scheme.
Dave Chinner July 23, 2024, 3:34 a.m. UTC | #16
On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote:
> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
> > On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> > >> The LSM framework has an existing inode_free_security() hook which
> > >> is used by LSMs that manage state associated with an inode, but
> > >> due to the use of RCU to protect the inode, special care must be
> > >> taken to ensure that the LSMs do not fully release the inode state
> > >> until it is safe from a RCU perspective.
> > >>
> > >> This patch implements a new inode_free_security_rcu() implementation
> > >> hook which is called when it is safe to free the LSM's internal inode
> > >> state.  Unfortunately, this new hook does not have access to the inode
> > >> itself as it may already be released, so the existing
> > >> inode_free_security() hook is retained for those LSMs which require
> > >> access to the inode.
> > >>
> > >> Signed-off-by: Paul Moore <paul@paul-moore.com>
> > >
> > > I like this new hook.  It is definitely safer than the current approach.
> > >
> > > To make it more consistent, I think we should also rename
> > > security_inode_free() to security_inode_put() to highlight the fact that
> > > LSM implementations should not free potential pointers in this blob
> > > because they could still be dereferenced in a path walk.
> > >
> > >> ---
> > >>  include/linux/lsm_hook_defs.h     |  1 +
> > >>  security/integrity/ima/ima.h      |  2 +-
> > >>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> > >>  security/integrity/ima/ima_main.c |  2 +-
> > >>  security/landlock/fs.c            |  9 ++++++---
> > >>  security/security.c               | 26 +++++++++++++-------------
> > >>  6 files changed, 30 insertions(+), 30 deletions(-)
> 
> ...
> 
> > Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
> >
> > 1) How does this patch close [1]?
> >    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
> >    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
> >    i.e. referencing the inode in a VFS path walk while destroying it...
> >    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> 
> The VFS folks can likely provide a better, or perhaps a more correct
> answer, but my understanding is that during the path walk the inode is
> protected by a RCU lock which allows for multiple threads to access
> the inode simultaneously; this could result in some cases where one
> thread is destroying the inode while another is accessing it.

Shouldn't may_lookup() be checking the inode for (I_NEW |
I_WILLFREE | I_FREE) so that it doesn't access an inode either not
completely initialised or being evicted during the RCU path walk?
All accesses to the VFS inode that don't have explicit reference
counts have to do these checks...

IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a
fully validate reference count to the dentry or the inode at this
point, so it seems accessing random objects attached to an inode
that can be anywhere in the setup or teardown process isn't at all
safe...

-Dave.
Matus Jokay July 23, 2024, 9:27 a.m. UTC | #17
On 22. 7. 2024 21:46, Paul Moore wrote:
> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
>> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>>>> The LSM framework has an existing inode_free_security() hook which
>>>> is used by LSMs that manage state associated with an inode, but
>>>> due to the use of RCU to protect the inode, special care must be
>>>> taken to ensure that the LSMs do not fully release the inode state
>>>> until it is safe from a RCU perspective.
>>>>
>>>> This patch implements a new inode_free_security_rcu() implementation
>>>> hook which is called when it is safe to free the LSM's internal inode
>>>> state.  Unfortunately, this new hook does not have access to the inode
>>>> itself as it may already be released, so the existing
>>>> inode_free_security() hook is retained for those LSMs which require
>>>> access to the inode.
>>>>
>>>> Signed-off-by: Paul Moore <paul@paul-moore.com>
>>>
>>> I like this new hook.  It is definitely safer than the current approach.
>>>
>>> To make it more consistent, I think we should also rename
>>> security_inode_free() to security_inode_put() to highlight the fact that
>>> LSM implementations should not free potential pointers in this blob
>>> because they could still be dereferenced in a path walk.
>>>
>>>> ---
>>>>  include/linux/lsm_hook_defs.h     |  1 +
>>>>  security/integrity/ima/ima.h      |  2 +-
>>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>>>>  security/integrity/ima/ima_main.c |  2 +-
>>>>  security/landlock/fs.c            |  9 ++++++---
>>>>  security/security.c               | 26 +++++++++++++-------------
>>>>  6 files changed, 30 insertions(+), 30 deletions(-)
> 
> ...
> 
>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
>>
>> 1) How does this patch close [1]?
>>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
>>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
>>    i.e. referencing the inode in a VFS path walk while destroying it...
>>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> 
> The VFS folks can likely provide a better, or perhaps a more correct
> answer, but my understanding is that during the path walk the inode is
> protected by a RCU lock which allows for multiple threads to access
> the inode simultaneously; this could result in some cases where one
> thread is destroying the inode while another is accessing it.
> Changing this would require changes to the VFS code, and I'm not sure
> why you would want to change it anyway, the performance win of using
> RCU here is likely significant.
> 
>> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
> 
> I'm not an RCU expert, but I don't believe there are any guarantees
> that the inode_free_by_rcu() and the inode's own free routines are
> going to be called within the same RCU grace period (not really
> applicable as inode_free_by_rcu() isn't called *during* a grace
> period, but *after* the grace period of the associated
> security_inode_free() call).  However, this patch does not rely on
> synchronization between the inode and inode LSM free routine in
> inode_free_by_rcu(); the inode_free_by_rcu() function and the new
> inode_free_security_rcu() LSM callback does not have a pointer to the
> inode, only the inode's LSM blob.  I agree that it is a bit odd, but
> freeing the inode and inode's LSM blob independently of each other
> should not cause a problem so long as the inode is no longer in use
> (hence the RCU callbacks).

Paul, many thanks for your answer.

I will try to clarify the issue, because fsnotify was a bad example.
Here is the related code taken from v10.

void security_inode_free(struct inode *inode)
{
	call_void_hook(inode_free_security, inode);
	/*
	 * The inode may still be referenced in a path walk and
	 * a call to security_inode_permission() can be made
	 * after inode_free_security() is called. Ideally, the VFS
	 * wouldn't do this, but fixing that is a much harder
	 * job. For now, simply free the i_security via RCU, and
	 * leave the current inode->i_security pointer intact.
	 * The inode will be freed after the RCU grace period too.
	 */
	if (inode->i_security)
		call_rcu((struct rcu_head *)inode->i_security,
			 inode_free_by_rcu);
}

void __destroy_inode(struct inode *inode)
{
	BUG_ON(inode_has_buffers(inode));
	inode_detach_wb(inode);
	security_inode_free(inode);
	fsnotify_inode_delete(inode);
	locks_free_lock_context(inode);
	if (!inode->i_nlink) {
		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
		atomic_long_dec(&inode->i_sb->s_remove_count);
	}

#ifdef CONFIG_FS_POSIX_ACL
	if (inode->i_acl && !is_uncached_acl(inode->i_acl))
		posix_acl_release(inode->i_acl);
	if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
		posix_acl_release(inode->i_default_acl);
#endif
	this_cpu_dec(nr_inodes);
}

static void destroy_inode(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	BUG_ON(!list_empty(&inode->i_lru));
	__destroy_inode(inode);
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;
	}
	inode->free_inode = ops->free_inode;
	call_rcu(&inode->i_rcu, i_callback);
}

Yes, inode_free_by_rcu() is being called after the grace period of the associated
security_inode_free(). i_callback() is also called after the grace period, but is it
always the same grace period as in the case of inode_free_by_rcu()? If not in general,
maybe it could be a problem. Explanation below.

If there is a function call leading to the end of the grace period between
call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
or another mechanism?), there will be a small time window, when the inode security
context is released, but the inode itself not, because call_rcu(i_callback) was not called
yet. So in that case each access to inode security blob leads to UAF.

For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
*before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
access to the inode during path lookup may be performed. Note: I use destroy_inode() as
*an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
and RCU, too.

In the previous message I only mentioned fsnotify, but it was only as an example.
I think that destroy_inode() is a better example of the idea I wanted to express.

I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
unpleasant to get another UAF in a production environment.

And regarding the UAF in [1], it seems very strange to me. The object managed by
Landlock was *not* dereferenced. There was access to the inode security blob itself.

static void hook_inode_free_security(struct inode *const inode)
{
	/*
	 * All inodes must already have been untied from their object by
	 * release_inode() or hook_sb_delete().
	 */
	WARN_ON_ONCE(landlock_inode(inode)->object);
}

But security blob is released at the end of the grace period related to
security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered
inode_free_security hooks.

The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus,
I think, there was another call of __destroy_inode(). Or general protection fault was
not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how*
happened.

If Landlock had dereferenced the object it manages, this RFC could be the right one (if
it were a dereference from a fast path lookup, of course).

[1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@google.com/


> 
>>    If not, can the security context be released earlier than the inode itself?
> 
> Possibly, but it should only happen after the inode is no longer in
> use (the call_rcu () callback should ensure that we are past all of
> the currently executing RCU critical sections).
> 
>> If yes, can be executed
>>    inode_permission concurrently, leading to UAF of inode security context in security_inode_permission?
> 
> I do not believe so, see the discussion above, but I welcome any corrections.
> 
>>    Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.)
> 
> If fsnotify is affecting this negatively then I suspect that is a
> reason for much larger concern as I believe that would indicate a
> problem with fsnotify and the inode locking scheme.
>
Matus Jokay July 23, 2024, 10:01 a.m. UTC | #18
On 23. 7. 2024 5:34, Dave Chinner wrote:
> On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote:
>> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
>>> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
>>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>>>>> The LSM framework has an existing inode_free_security() hook which
>>>>> is used by LSMs that manage state associated with an inode, but
>>>>> due to the use of RCU to protect the inode, special care must be
>>>>> taken to ensure that the LSMs do not fully release the inode state
>>>>> until it is safe from a RCU perspective.
>>>>>
>>>>> This patch implements a new inode_free_security_rcu() implementation
>>>>> hook which is called when it is safe to free the LSM's internal inode
>>>>> state.  Unfortunately, this new hook does not have access to the inode
>>>>> itself as it may already be released, so the existing
>>>>> inode_free_security() hook is retained for those LSMs which require
>>>>> access to the inode.
>>>>>
>>>>> Signed-off-by: Paul Moore <paul@paul-moore.com>
>>>>
>>>> I like this new hook.  It is definitely safer than the current approach.
>>>>
>>>> To make it more consistent, I think we should also rename
>>>> security_inode_free() to security_inode_put() to highlight the fact that
>>>> LSM implementations should not free potential pointers in this blob
>>>> because they could still be dereferenced in a path walk.
>>>>
>>>>> ---
>>>>>  include/linux/lsm_hook_defs.h     |  1 +
>>>>>  security/integrity/ima/ima.h      |  2 +-
>>>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>>>>>  security/integrity/ima/ima_main.c |  2 +-
>>>>>  security/landlock/fs.c            |  9 ++++++---
>>>>>  security/security.c               | 26 +++++++++++++-------------
>>>>>  6 files changed, 30 insertions(+), 30 deletions(-)
>>
>> ...
>>
>>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
>>>
>>> 1) How does this patch close [1]?
>>>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
>>>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
>>>    i.e. referencing the inode in a VFS path walk while destroying it...
>>>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
>>
>> The VFS folks can likely provide a better, or perhaps a more correct
>> answer, but my understanding is that during the path walk the inode is
>> protected by a RCU lock which allows for multiple threads to access
>> the inode simultaneously; this could result in some cases where one
>> thread is destroying the inode while another is accessing it.
> 
> Shouldn't may_lookup() be checking the inode for (I_NEW |
> I_WILLFREE | I_FREE) so that it doesn't access an inode either not
> completely initialised or being evicted during the RCU path walk?
> All accesses to the VFS inode that don't have explicit reference
> counts have to do these checks...
> 
> IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a
> fully validate reference count to the dentry or the inode at this
> point, so it seems accessing random objects attached to an inode
> that can be anywhere in the setup or teardown process isn't at all
> safe...
> 
> -Dave.

Indeed, but maybe only VFS maintainers can give us the answer to why may_lookup()
does not need at some point check for (I_NEW | I_WILL_FREE | I_FREEING).

Thanks,
mY
Christian Brauner July 23, 2024, 3:19 p.m. UTC | #19
On Tue, Jul 23, 2024 at 01:34:44PM GMT, Dave Chinner wrote:
> On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote:
> > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
> > > On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> > > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> > > >> The LSM framework has an existing inode_free_security() hook which
> > > >> is used by LSMs that manage state associated with an inode, but
> > > >> due to the use of RCU to protect the inode, special care must be
> > > >> taken to ensure that the LSMs do not fully release the inode state
> > > >> until it is safe from a RCU perspective.
> > > >>
> > > >> This patch implements a new inode_free_security_rcu() implementation
> > > >> hook which is called when it is safe to free the LSM's internal inode
> > > >> state.  Unfortunately, this new hook does not have access to the inode
> > > >> itself as it may already be released, so the existing
> > > >> inode_free_security() hook is retained for those LSMs which require
> > > >> access to the inode.
> > > >>
> > > >> Signed-off-by: Paul Moore <paul@paul-moore.com>
> > > >
> > > > I like this new hook.  It is definitely safer than the current approach.
> > > >
> > > > To make it more consistent, I think we should also rename
> > > > security_inode_free() to security_inode_put() to highlight the fact that
> > > > LSM implementations should not free potential pointers in this blob
> > > > because they could still be dereferenced in a path walk.
> > > >
> > > >> ---
> > > >>  include/linux/lsm_hook_defs.h     |  1 +
> > > >>  security/integrity/ima/ima.h      |  2 +-
> > > >>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> > > >>  security/integrity/ima/ima_main.c |  2 +-
> > > >>  security/landlock/fs.c            |  9 ++++++---
> > > >>  security/security.c               | 26 +++++++++++++-------------
> > > >>  6 files changed, 30 insertions(+), 30 deletions(-)
> > 
> > ...
> > 
> > > Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
> > >
> > > 1) How does this patch close [1]?
> > >    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
> > >    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
> > >    i.e. referencing the inode in a VFS path walk while destroying it...
> > >    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> > 
> > The VFS folks can likely provide a better, or perhaps a more correct
> > answer, but my understanding is that during the path walk the inode is
> > protected by a RCU lock which allows for multiple threads to access
> > the inode simultaneously; this could result in some cases where one
> > thread is destroying the inode while another is accessing it.
> 
> Shouldn't may_lookup() be checking the inode for (I_NEW |
> I_WILLFREE | I_FREE) so that it doesn't access an inode either not
> completely initialised or being evicted during the RCU path walk?

Going from memory since I don't have time to go really into the weeds.

A non-completely initalised inode shouldn't appear in path lookup.
Before the inode is attached to a dentry I_NEW would have been removed
otherwise this is a bug. That can either happen via unlock_new_inode()
and d_splice_alias() or in some cases directly via d_instantiate_new().
Concurrent inode lookup calls on the same inode (e.g., iget_locked() and
friends) will sleep until I_NEW is cleared.

> All accesses to the VFS inode that don't have explicit reference
> counts have to do these checks...
> 
> IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a
> fully validate reference count to the dentry or the inode at this
> point, so it seems accessing random objects attached to an inode
> that can be anywhere in the setup or teardown process isn't at all
> safe...

may_lookup() cannot encounter inodes in random states. It will start
from a well-known struct path and sample sequence counters for rename,
mount, and dentry changes. Each component will be subject to checks
after may_lookup() via these sequence counters to ensure that no change
occurred that would invalidate the lookup just done. To be precise to
ensure that no state could be reached via rcu that couldn't have been
reached via ref walk.

So may_lookup() may be called on something that's about to be freed
(concurrent unlink on a directory that's empty that we're trying to
create or lookup something nonexistent under) while we're looking at it
but all the machinery is in place so that it will be detected and force
a drop out of rcu and into reference walking mode.

When may_lookup() calls inode_permission() it only calls into the
filesystem itself if the filesystem has a custom i_op->permission()
handler. And if it has to call into the filesystem it passes
MAY_NOT_BLOCK to inform the filesystem about this. And in those cases
the filesystem must ensure any additional data structures can safely be
accessed under rcu_read_lock() (documented in path_lookup.rst).

If the filesystem detects that it cannot safely handle this or detects
that something is invalid it can return -ECHILD causing the VFS to drop
out of rcu and into ref walking mode to redo the lookup. That may happen
directly in may_lookup() it unconditionally happens in walk_component()
when it's verified that the parent had no changes while we were looking
at it.

The same logic extends to security modules. Both selinux and smack
handle MAY_NOT_BLOCK calls from security_inode_permission() with e.g.,
selinux returning -ECHILD in case the inode security context isn't
properly initialized causing the VFS to drop into ref walking mode and
allowing selinux to redo the initialization.

Checking inode state flags isn't needed because the VFS already handles
all of this via other means as e.g., sequence counters in various core
structs. It also likely wouldn't help because we'd have to take locks to
access i_state or sample i_state before calling into inode_permission()
and then it could still change behind our back. It's also the wrong
layer as we're dealing almost exclusively with dentries as the main data
structure during lookup.

Fwiw, a bunch of this is documented in path_lookup.rst, vfs.rst, and
porting.rst.

(I'm running out of time with other stuff so I want to point out that I
can't really spend a lot more time on this thread tbh.)
Paul Moore July 23, 2024, 7:48 p.m. UTC | #20
On Tue, Jul 23, 2024 at 5:27 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
> On 22. 7. 2024 21:46, Paul Moore wrote:
> > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
> >> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
> >>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
> >>>> The LSM framework has an existing inode_free_security() hook which
> >>>> is used by LSMs that manage state associated with an inode, but
> >>>> due to the use of RCU to protect the inode, special care must be
> >>>> taken to ensure that the LSMs do not fully release the inode state
> >>>> until it is safe from a RCU perspective.
> >>>>
> >>>> This patch implements a new inode_free_security_rcu() implementation
> >>>> hook which is called when it is safe to free the LSM's internal inode
> >>>> state.  Unfortunately, this new hook does not have access to the inode
> >>>> itself as it may already be released, so the existing
> >>>> inode_free_security() hook is retained for those LSMs which require
> >>>> access to the inode.
> >>>>
> >>>> Signed-off-by: Paul Moore <paul@paul-moore.com>
> >>>
> >>> I like this new hook.  It is definitely safer than the current approach.
> >>>
> >>> To make it more consistent, I think we should also rename
> >>> security_inode_free() to security_inode_put() to highlight the fact that
> >>> LSM implementations should not free potential pointers in this blob
> >>> because they could still be dereferenced in a path walk.
> >>>
> >>>> ---
> >>>>  include/linux/lsm_hook_defs.h     |  1 +
> >>>>  security/integrity/ima/ima.h      |  2 +-
> >>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
> >>>>  security/integrity/ima/ima_main.c |  2 +-
> >>>>  security/landlock/fs.c            |  9 ++++++---
> >>>>  security/security.c               | 26 +++++++++++++-------------
> >>>>  6 files changed, 30 insertions(+), 30 deletions(-)
> >
> > ...
> >
> >> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
> >>
> >> 1) How does this patch close [1]?
> >>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
> >>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
> >>    i.e. referencing the inode in a VFS path walk while destroying it...
> >>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
> >
> > The VFS folks can likely provide a better, or perhaps a more correct
> > answer, but my understanding is that during the path walk the inode is
> > protected by a RCU lock which allows for multiple threads to access
> > the inode simultaneously; this could result in some cases where one
> > thread is destroying the inode while another is accessing it.
> > Changing this would require changes to the VFS code, and I'm not sure
> > why you would want to change it anyway, the performance win of using
> > RCU here is likely significant.
> >
> >> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
> >
> > I'm not an RCU expert, but I don't believe there are any guarantees
> > that the inode_free_by_rcu() and the inode's own free routines are
> > going to be called within the same RCU grace period (not really
> > applicable as inode_free_by_rcu() isn't called *during* a grace
> > period, but *after* the grace period of the associated
> > security_inode_free() call).  However, this patch does not rely on
> > synchronization between the inode and inode LSM free routine in
> > inode_free_by_rcu(); the inode_free_by_rcu() function and the new
> > inode_free_security_rcu() LSM callback does not have a pointer to the
> > inode, only the inode's LSM blob.  I agree that it is a bit odd, but
> > freeing the inode and inode's LSM blob independently of each other
> > should not cause a problem so long as the inode is no longer in use
> > (hence the RCU callbacks).
>
> Paul, many thanks for your answer.
>
> I will try to clarify the issue, because fsnotify was a bad example.
> Here is the related code taken from v10.
>
> void security_inode_free(struct inode *inode)
> {
>         call_void_hook(inode_free_security, inode);
>         /*
>          * The inode may still be referenced in a path walk and
>          * a call to security_inode_permission() can be made
>          * after inode_free_security() is called. Ideally, the VFS
>          * wouldn't do this, but fixing that is a much harder
>          * job. For now, simply free the i_security via RCU, and
>          * leave the current inode->i_security pointer intact.
>          * The inode will be freed after the RCU grace period too.
>          */
>         if (inode->i_security)
>                 call_rcu((struct rcu_head *)inode->i_security,
>                          inode_free_by_rcu);
> }
>
> void __destroy_inode(struct inode *inode)
> {
>         BUG_ON(inode_has_buffers(inode));
>         inode_detach_wb(inode);
>         security_inode_free(inode);
>         fsnotify_inode_delete(inode);
>         locks_free_lock_context(inode);
>         if (!inode->i_nlink) {
>                 WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
>                 atomic_long_dec(&inode->i_sb->s_remove_count);
>         }
>
> #ifdef CONFIG_FS_POSIX_ACL
>         if (inode->i_acl && !is_uncached_acl(inode->i_acl))
>                 posix_acl_release(inode->i_acl);
>         if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
>                 posix_acl_release(inode->i_default_acl);
> #endif
>         this_cpu_dec(nr_inodes);
> }
>
> static void destroy_inode(struct inode *inode)
> {
>         const struct super_operations *ops = inode->i_sb->s_op;
>
>         BUG_ON(!list_empty(&inode->i_lru));
>         __destroy_inode(inode);
>         if (ops->destroy_inode) {
>                 ops->destroy_inode(inode);
>                 if (!ops->free_inode)
>                         return;
>         }
>         inode->free_inode = ops->free_inode;
>         call_rcu(&inode->i_rcu, i_callback);
> }
>
> Yes, inode_free_by_rcu() is being called after the grace period of the associated
> security_inode_free(). i_callback() is also called after the grace period, but is it
> always the same grace period as in the case of inode_free_by_rcu()? If not in general,
> maybe it could be a problem. Explanation below.
>
> If there is a function call leading to the end of the grace period between
> call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
> or another mechanism?), there will be a small time window, when the inode security
> context is released, but the inode itself not, because call_rcu(i_callback) was not called
> yet. So in that case each access to inode security blob leads to UAF.

While it should be possible for the inode's LSM blob to be free'd
prior to the inode itself, the RCU callback mechanism provided by
call_rcu() should ensure that both the LSM's free routine and the
inode's free routine happen at a point in time after the current RCU
critical sections have lapsed and the inode is no longer being
accessed.  The LSM's inode_free_rcu callback can run independent of
the inode's callback as it doesn't access the inode and if it does
happen to run before the inode's RCU callback that should also be okay
as we are past the original RCU critical sections and the inode should
no longer be in use.  If the inode is still in use by the time the
LSM's RCU callback is triggered then there is a flaw in the inode
RCU/locking/synchronization code.

It is also worth mentioning that while this patch shuffles around some
code at the LSM layer, the basic idea of the LSM using a RCU callback
to free state associated with an inode is far from new.  While that
doesn't mean there isn't a problem, we have a few years of experience
across a large number of systems, that would indicate this isn't a
problem.

> For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
> *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
> grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
> access to the inode during path lookup may be performed. Note: I use destroy_inode() as
> *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
> and RCU, too.
>
> In the previous message I only mentioned fsnotify, but it was only as an example.
> I think that destroy_inode() is a better example of the idea I wanted to express.
>
> I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
> unpleasant to get another UAF in a production environment.

I'm open to there being another fix needed, or a change to this fix,
but I don't believe the problems you are describing are possible.  Of
course it's very possible that I'm wrong, so if you are aware of an
issue I would appreciate a concrete example explaining the code path
and timeline between tasks A and B that would trigger the flaw ... and
yes, patches are always welcome ;)
Paul Moore July 23, 2024, 8:03 p.m. UTC | #21
On Tue, Jul 23, 2024 at 11:19 AM Christian Brauner <brauner@kernel.org> wrote:
> The same logic extends to security modules. Both selinux and smack
> handle MAY_NOT_BLOCK calls from security_inode_permission() with e.g.,
> selinux returning -ECHILD in case the inode security context isn't
> properly initialized causing the VFS to drop into ref walking mode and
> allowing selinux to redo the initialization.

Since we are talking mostly about the destruction of an inode, it is
worth mentioning that the SELinux -ECHILD case that Christian is
referring to isn't a common occurrence as SELinux only invalidates
inode labels on network filesystems under certain circumstances (chase
the security_inode_invalidate_secctx() hook).  On most normal SELinux
systems inodes are labeled as part of the creation process so long as
a SELinux policy is loaded into the kernel; this does mean that there
is a window during early boot where the inodes are in an invalid
state, but they are properly initialized later (there are different
ways this could happen).

For local filesystems with inodes created after the SELinux policy is
loaded, inodes have a valid SELinux label from their very creation up
until their memory is released.
Dave Chinner July 23, 2024, 11:17 p.m. UTC | #22
On Tue, Jul 23, 2024 at 05:19:40PM +0200, Christian Brauner wrote:
> On Tue, Jul 23, 2024 at 01:34:44PM GMT, Dave Chinner wrote:
> > All accesses to the VFS inode that don't have explicit reference
> > counts have to do these checks...
> > 
> > IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a
> > fully validate reference count to the dentry or the inode at this
> > point, so it seems accessing random objects attached to an inode
> > that can be anywhere in the setup or teardown process isn't at all
> > safe...
> 
> may_lookup() cannot encounter inodes in random states. It will start
> from a well-known struct path and sample sequence counters for rename,
> mount, and dentry changes. Each component will be subject to checks
> after may_lookup() via these sequence counters to ensure that no change
> occurred that would invalidate the lookup just done. To be precise to
> ensure that no state could be reached via rcu that couldn't have been
> reached via ref walk.
> 
> So may_lookup() may be called on something that's about to be freed
> (concurrent unlink on a directory that's empty that we're trying to
> create or lookup something nonexistent under) while we're looking at it
> but all the machinery is in place so that it will be detected and force
> a drop out of rcu and into reference walking mode.

Yes, but...

> When may_lookup() calls inode_permission() it only calls into the
> filesystem itself if the filesystem has a custom i_op->permission()
> handler. And if it has to call into the filesystem it passes
> MAY_NOT_BLOCK to inform the filesystem about this. And in those cases
> the filesystem must ensure any additional data structures can safely be
> accessed under rcu_read_lock() (documented in path_lookup.rst).

The problem isn't the call into the filesystem - it's may_lookup()
passing the inode to security modules where we dereference
inode->i_security and assume that it is valid for the life of the
object access being made.

That's my point - if we have a lookup race and the inode is being
destroyed at this point (i.e. I_FREEING is set) inode->i_security
*is not valid* and should not be accessed by *anyone*.

> Checking inode state flags isn't needed because the VFS already handles
> all of this via other means as e.g., sequence counters in various core
> structs.

I don't think that is sufficient to avoid races with inode teardown.
The inode can be destroyed between the initial dentry count sequence
sampling and the "done processing, check that the seq count is
unchanged" validation to solidify the path.

We've seen this before with ->get_link fast path that stores the
link name in inode->i_link, and both inode->i_link and ->get_link
are accessed during RCU It is documented as such:

	If the filesystem stores the symlink target in ->i_link, the
        VFS may use it directly without calling ->get_link(); however,
        ->get_link() must still be provided.  ->i_link must not be
        freed until after an RCU grace period.  Writing to ->i_link
        post-iget() time requires a 'release' memory barrier.

XFS doesn't use RCU mode ->get_link or use ->i_link anymore, because
its has a corner case in it's inode life cycle since 2007 where it
can recycle inodes before the RCU grace period expires. It took 15
years for this issue to be noticed, but the fix for this specific
symptom is to not allow the VFS direct access to the underlying
filesystem memory that does not follow VFS inode RCU lifecycle
rules.

There was another symptom of this issue - ->get_link changed (i.e.
went to null) on certain types of inode reuse - and that caused
pathwalk to panic on a NULL pointer. Ian posted this fix for the
issue:

https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/

Which revalidates the dentry sequence number before calling
->get_link(). This clearly demonstrates we cannot rely on the
existing pathwalk dentry sequence number checks to catch an inode
concurrently moving into I_FREEING state and changing/freeing inode
object state concurrently in a way that affects the pathwalk
operation.

My point here is not about XFS - my point is that every object
attached to an inode that is accessed during a RCU pathwalk *must*
follow the same rules as memory attached to inode->i_link. The only
alternative to that (i.e. if we continue to free RCU pathwalk
visible objects in the evict() path) is to prevent pathwalk from
tripping over I_FREEING inodes.

If we decide that every pathwalk accessed object attached to the
inode needs to be RCU freed, then __destroy_inode() is the wrong
place to be freeing them - i_callback() (the call_rcu() inode
freeing callback) is the place to be freeing these objects. At this
point, the subsystems that own the objects don't have to care about
RCU at all - the objects have already gone through the necessary
grace period that makes them safe to be freed immediately.

That's a far better solution than forcing every LSM and FS developer
to have to fully understand how pathwalk and inode lifecycles
interact to get stuff right...

> It also likely wouldn't help because we'd have to take locks to
> access i_state or sample i_state before calling into inode_permission()
> and then it could still change behind our back. It's also the wrong
> layer as we're dealing almost exclusively with dentries as the main data
> structure during lookup.

Yup, I never said that's how we should fix the problem.  All I'm
stating is that pathwalk vs I_FREEING is currently not safe and that
I_FREEING is detectable from the pathwalk side. This observation
could help provide a solution, but it's almost certainly not the
solution to the problem...

-Dave.
Matus Jokay July 24, 2024, 10:20 a.m. UTC | #23
On 23. 7. 2024 21:48, Paul Moore wrote:
> On Tue, Jul 23, 2024 at 5:27 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
>> On 22. 7. 2024 21:46, Paul Moore wrote:
>>> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote:
>>>> On 10. 7. 2024 12:40, Mickaël Salaün wrote:
>>>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote:
>>>>>> The LSM framework has an existing inode_free_security() hook which
>>>>>> is used by LSMs that manage state associated with an inode, but
>>>>>> due to the use of RCU to protect the inode, special care must be
>>>>>> taken to ensure that the LSMs do not fully release the inode state
>>>>>> until it is safe from a RCU perspective.
>>>>>>
>>>>>> This patch implements a new inode_free_security_rcu() implementation
>>>>>> hook which is called when it is safe to free the LSM's internal inode
>>>>>> state.  Unfortunately, this new hook does not have access to the inode
>>>>>> itself as it may already be released, so the existing
>>>>>> inode_free_security() hook is retained for those LSMs which require
>>>>>> access to the inode.
>>>>>>
>>>>>> Signed-off-by: Paul Moore <paul@paul-moore.com>
>>>>>
>>>>> I like this new hook.  It is definitely safer than the current approach.
>>>>>
>>>>> To make it more consistent, I think we should also rename
>>>>> security_inode_free() to security_inode_put() to highlight the fact that
>>>>> LSM implementations should not free potential pointers in this blob
>>>>> because they could still be dereferenced in a path walk.
>>>>>
>>>>>> ---
>>>>>>  include/linux/lsm_hook_defs.h     |  1 +
>>>>>>  security/integrity/ima/ima.h      |  2 +-
>>>>>>  security/integrity/ima/ima_iint.c | 20 ++++++++------------
>>>>>>  security/integrity/ima/ima_main.c |  2 +-
>>>>>>  security/landlock/fs.c            |  9 ++++++---
>>>>>>  security/security.c               | 26 +++++++++++++-------------
>>>>>>  6 files changed, 30 insertions(+), 30 deletions(-)
>>>
>>> ...
>>>
>>>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC:
>>>>
>>>> 1) How does this patch close [1]?
>>>>    As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode."
>>>>    Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission,
>>>>    i.e. referencing the inode in a VFS path walk while destroying it...
>>>>    Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy.
>>>
>>> The VFS folks can likely provide a better, or perhaps a more correct
>>> answer, but my understanding is that during the path walk the inode is
>>> protected by a RCU lock which allows for multiple threads to access
>>> the inode simultaneously; this could result in some cases where one
>>> thread is destroying the inode while another is accessing it.
>>> Changing this would require changes to the VFS code, and I'm not sure
>>> why you would want to change it anyway, the performance win of using
>>> RCU here is likely significant.
>>>
>>>> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period?
>>>
>>> I'm not an RCU expert, but I don't believe there are any guarantees
>>> that the inode_free_by_rcu() and the inode's own free routines are
>>> going to be called within the same RCU grace period (not really
>>> applicable as inode_free_by_rcu() isn't called *during* a grace
>>> period, but *after* the grace period of the associated
>>> security_inode_free() call).  However, this patch does not rely on
>>> synchronization between the inode and inode LSM free routine in
>>> inode_free_by_rcu(); the inode_free_by_rcu() function and the new
>>> inode_free_security_rcu() LSM callback does not have a pointer to the
>>> inode, only the inode's LSM blob.  I agree that it is a bit odd, but
>>> freeing the inode and inode's LSM blob independently of each other
>>> should not cause a problem so long as the inode is no longer in use
>>> (hence the RCU callbacks).
>>
>> Paul, many thanks for your answer.
>>
>> I will try to clarify the issue, because fsnotify was a bad example.
>> Here is the related code taken from v10.
>>
>> void security_inode_free(struct inode *inode)
>> {
>>         call_void_hook(inode_free_security, inode);
>>         /*
>>          * The inode may still be referenced in a path walk and
>>          * a call to security_inode_permission() can be made
>>          * after inode_free_security() is called. Ideally, the VFS
>>          * wouldn't do this, but fixing that is a much harder
>>          * job. For now, simply free the i_security via RCU, and
>>          * leave the current inode->i_security pointer intact.
>>          * The inode will be freed after the RCU grace period too.
>>          */
>>         if (inode->i_security)
>>                 call_rcu((struct rcu_head *)inode->i_security,
>>                          inode_free_by_rcu);
>> }
>>
>> void __destroy_inode(struct inode *inode)
>> {
>>         BUG_ON(inode_has_buffers(inode));
>>         inode_detach_wb(inode);
>>         security_inode_free(inode);
>>         fsnotify_inode_delete(inode);
>>         locks_free_lock_context(inode);
>>         if (!inode->i_nlink) {
>>                 WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
>>                 atomic_long_dec(&inode->i_sb->s_remove_count);
>>         }
>>
>> #ifdef CONFIG_FS_POSIX_ACL
>>         if (inode->i_acl && !is_uncached_acl(inode->i_acl))
>>                 posix_acl_release(inode->i_acl);
>>         if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl))
>>                 posix_acl_release(inode->i_default_acl);
>> #endif
>>         this_cpu_dec(nr_inodes);
>> }
>>
>> static void destroy_inode(struct inode *inode)
>> {
>>         const struct super_operations *ops = inode->i_sb->s_op;
>>
>>         BUG_ON(!list_empty(&inode->i_lru));
>>         __destroy_inode(inode);
>>         if (ops->destroy_inode) {
>>                 ops->destroy_inode(inode);
>>                 if (!ops->free_inode)
>>                         return;
>>         }
>>         inode->free_inode = ops->free_inode;
>>         call_rcu(&inode->i_rcu, i_callback);
>> }
>>
>> Yes, inode_free_by_rcu() is being called after the grace period of the associated
>> security_inode_free(). i_callback() is also called after the grace period, but is it
>> always the same grace period as in the case of inode_free_by_rcu()? If not in general,
>> maybe it could be a problem. Explanation below.
>>
>> If there is a function call leading to the end of the grace period between
>> call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state
>> or another mechanism?), there will be a small time window, when the inode security
>> context is released, but the inode itself not, because call_rcu(i_callback) was not called
>> yet. So in that case each access to inode security blob leads to UAF.
> 
> While it should be possible for the inode's LSM blob to be free'd
> prior to the inode itself, the RCU callback mechanism provided by
> call_rcu() should ensure that both the LSM's free routine and the
> inode's free routine happen at a point in time after the current RCU
> critical sections have lapsed and the inode is no longer being
It is questionable whether the "current RCU CS" refers to both functions
together, see the diagram below.

> accessed.  The LSM's inode_free_rcu callback can run independent of
> the inode's callback as it doesn't access the inode and if it does
Agree, there are two independent calls as you described.

> happen to run before the inode's RCU callback that should also be okay
> as we are past the original RCU critical sections and the inode should
> no longer be in use.  If the inode is still in use by the time the
I think the inode can be still in use in may_lookup() after the security
RCU callback function, see below.

> LSM's RCU callback is triggered then there is a flaw in the inode
> RCU/locking/synchronization code.
I don't think it is a flaw. It may be the use of the RCU mechanism with
incorrect assumption, that both RCU callbacks belong to the common GP.
> 
> It is also worth mentioning that while this patch shuffles around some
> code at the LSM layer, the basic idea of the LSM using a RCU callback
> to free state associated with an inode is far from new.  While that
> doesn't mean there isn't a problem, we have a few years of experience
> across a large number of systems, that would indicate this isn't a
> problem.
I agree. But history only shows that it is very difficult to achieve this
race. And yes, I agree that we may address this issue when it turns out
to be relevant.

> 
>> For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but
>> *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the
>> grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous
>> access to the inode during path lookup may be performed. Note: I use destroy_inode() as
>> *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general
>> and RCU, too.
>>
>> In the previous message I only mentioned fsnotify, but it was only as an example.
>> I think that destroy_inode() is a better example of the idea I wanted to express.
>>
>> I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be
>> unpleasant to get another UAF in a production environment.
> 
> I'm open to there being another fix needed, or a change to this fix,
> but I don't believe the problems you are describing are possible.  Of
> course it's very possible that I'm wrong, so if you are aware of an
> issue I would appreciate a concrete example explaining the code path
> and timeline between tasks A and B that would trigger the flaw ... and
> yes, patches are always welcome ;)
> 
Oh patches... Even from the message from Dave it can be seen that the
cooperation of people from VFS and some very good idea of ​​a solution
are needed. Of course, provided that the scheme below was correct.
I would be very happy if someone could explain to me why this cannot be
so!

CPU related to
RCU callbacks            task A                                 task B
==================       =================================      =======================
                         ...
                         __destroy_inode()
                            ...
                            security_inode_free()
                               ...
                               call_rcu(inode_free_by_rcu)
                         ...
                         ops->destroy_inode() // *suppose* may sleep
// end of GP;
// inode *can* be used as
// i_callback does not
// belong to this GP
inode_free_by_rcu()
------------------------------------------------------------------------------------------------------
// start of another GP                                          ...
                         ...                                    rcu_read_lock()
                         call_rcu(i_callback)                   ...
                         ...                                    security_inode_permission() // <-- UAF
                                                                ...
                                                                rcu_read_unlock()
                                                                ...
// end of GP;
// right now inode is not in use anymore
i_callback()

Why is it difficult to achieve this race? The GP (grace period) between
two call_rcu() calls must come to an end. Again, I chose as an example
of this situation destroy_inode() function. But there can be others in
the code path, I really don't know.

I looked at the destroy_inode() functions (kernel v10) and from a quick
look I found overlayfs, which directly calls dput(), see [1].
If it is possible to force printk() to sleep, then it is possible to
consider afs [2] and, under certain circumstances, ext4 [3].
After a deeper analysis, maybe even more.

I think it is difficult to divide the GP exactly as this situation
requires. That's probably why this hasn't appeared yet. Or it is
impossible to achieve it. All the better. But that would require an audit
of the code between our two call_rcu()'s, whether at some point it cannot
come to the end of the GP.

And I agree with the opinion that as long as this type of error has not
yet occurred, we can just play possum. When it comes to the crunch, we
can deal with it more deeply.

[1] https://elixir.bootlin.com/linux/v6.10/source/fs/overlayfs/super.c#L181
[2] https://elixir.bootlin.com/linux/v6.10/source/fs/afs/super.c#L718
[3] https://elixir.bootlin.com/linux/v6.10/source/fs/ext4/super.c#L1448
Christian Brauner July 25, 2024, 10:52 a.m. UTC | #24
> The problem isn't the call into the filesystem - it's may_lookup()
> passing the inode to security modules where we dereference
> inode->i_security and assume that it is valid for the life of the
> object access being made.
> 
> That's my point - if we have a lookup race and the inode is being
> destroyed at this point (i.e. I_FREEING is set) inode->i_security
> *is not valid* and should not be accessed by *anyone*.

It's valid. The inode_free_security hooks unlist or deregister the
context as e.g., selinux_inode_free_security() does. It's fine to access
inode->i_security after the individual hooks have been called; before or
within the delayed freeing of i_security. And selinux and bpf are the
only cases where deregistration happens while also implementing or in
the case of bpf allowing to implement security_inode_permission().

> If we decide that every pathwalk accessed object attached to the
> inode needs to be RCU freed, then __destroy_inode() is the wrong
> place to be freeing them - i_callback() (the call_rcu() inode

The superblock might already be gone by the time free_inode() is called.
And stuff like selinux accesses the superblock from inode->i_sb during
security_inode_free_security(). So moving it in there isn't an option.
It needs to be called before. If one wanted it in there the obvious way
to do it would be to split deregistration and freeing into two hooks
security_inode_deregister() and then move the rcu part of
security_inode_free() into i_callback so it gets wasted together with
the inode. But i_callback() isn't called for xfs and so the way it's
currently done is so far the best.
diff mbox series

Patch

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 8fd87f823d3a..abe6b0ef892a 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -114,6 +114,7 @@  LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask,
 	 unsigned int obj_type)
 LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
 LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
+LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *)
 LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode,
 	 struct inode *dir, const struct qstr *qstr, struct xattr *xattrs,
 	 int *xattr_count)
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 3e568126cd48..e2a2e4c7eab6 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -223,7 +223,7 @@  static inline void ima_inode_set_iint(const struct inode *inode,
 
 struct ima_iint_cache *ima_iint_find(struct inode *inode);
 struct ima_iint_cache *ima_inode_get(struct inode *inode);
-void ima_inode_free(struct inode *inode);
+void ima_inode_free_rcu(void *inode_sec);
 void __init ima_iintcache_init(void);
 
 extern const int read_idmap[];
diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c
index e23412a2c56b..54480df90bdc 100644
--- a/security/integrity/ima/ima_iint.c
+++ b/security/integrity/ima/ima_iint.c
@@ -109,22 +109,18 @@  struct ima_iint_cache *ima_inode_get(struct inode *inode)
 }
 
 /**
- * ima_inode_free - Called on inode free
- * @inode: Pointer to the inode
+ * ima_inode_free_rcu - Called to free an inode via a RCU callback
+ * @inode_sec: The inode::i_security pointer
  *
- * Free the iint associated with an inode.
+ * Free the IMA data associated with an inode.
  */
-void ima_inode_free(struct inode *inode)
+void ima_inode_free_rcu(void *inode_sec)
 {
-	struct ima_iint_cache *iint;
+	struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode;
 
-	if (!IS_IMA(inode))
-		return;
-
-	iint = ima_iint_find(inode);
-	ima_inode_set_iint(inode, NULL);
-
-	ima_iint_free(iint);
+	/* *iint_p should be NULL if !IS_IMA(inode) */
+	if (*iint_p)
+		ima_iint_free(*iint_p);
 }
 
 static void ima_iint_init_once(void *foo)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index f04f43af651c..5b3394864b21 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1193,7 +1193,7 @@  static struct security_hook_list ima_hooks[] __ro_after_init = {
 #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
 	LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request),
 #endif
-	LSM_HOOK_INIT(inode_free_security, ima_inode_free),
+	LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu),
 };
 
 static const struct lsm_id ima_lsmid = {
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 22d8b7c28074..f583f8cec345 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1198,13 +1198,16 @@  static int current_check_refer_path(struct dentry *const old_dentry,
 
 /* Inode hooks */
 
-static void hook_inode_free_security(struct inode *const inode)
+static void hook_inode_free_security_rcu(void *inode_sec)
 {
+	struct landlock_inode_security *lisec;
+
 	/*
 	 * All inodes must already have been untied from their object by
 	 * release_inode() or hook_sb_delete().
 	 */
-	WARN_ON_ONCE(landlock_inode(inode)->object);
+	lisec = inode_sec + landlock_blob_sizes.lbs_inode;
+	WARN_ON_ONCE(lisec->object);
 }
 
 /* Super-block hooks */
@@ -1628,7 +1631,7 @@  static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
 }
 
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
-	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
+	LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
 
 	LSM_HOOK_INIT(sb_delete, hook_sb_delete),
 	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
diff --git a/security/security.c b/security/security.c
index b52e81ac5526..bc6805f7332e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1596,9 +1596,8 @@  int security_inode_alloc(struct inode *inode)
 
 static void inode_free_by_rcu(struct rcu_head *head)
 {
-	/*
-	 * The rcu head is at the start of the inode blob
-	 */
+	/* The rcu head is at the start of the inode blob */
+	call_void_hook(inode_free_security_rcu, head);
 	kmem_cache_free(lsm_inode_cache, head);
 }
 
@@ -1606,20 +1605,21 @@  static void inode_free_by_rcu(struct rcu_head *head)
  * security_inode_free() - Free an inode's LSM blob
  * @inode: the inode
  *
- * Deallocate the inode security structure and set @inode->i_security to NULL.
+ * Release any LSM resources associated with @inode, although due to the
+ * inode's RCU protections it is possible that the resources will not be
+ * fully released until after the current RCU grace period has elapsed.
+ *
+ * It is important for LSMs to note that despite being present in a call to
+ * security_inode_free(), @inode may still be referenced in a VFS path walk
+ * and calls to security_inode_permission() may be made during, or after,
+ * a call to security_inode_free().  For this reason the inode->i_security
+ * field is released via a call_rcu() callback and any LSMs which need to
+ * retain inode state for use in security_inode_permission() should only
+ * release that state in the inode_free_security_rcu() LSM hook callback.
  */
 void security_inode_free(struct inode *inode)
 {
 	call_void_hook(inode_free_security, inode);
-	/*
-	 * The inode may still be referenced in a path walk and
-	 * a call to security_inode_permission() can be made
-	 * after inode_free_security() is called. Ideally, the VFS
-	 * wouldn't do this, but fixing that is a much harder
-	 * job. For now, simply free the i_security via RCU, and
-	 * leave the current inode->i_security pointer intact.
-	 * The inode will be freed after the RCU grace period too.
-	 */
 	if (inode->i_security)
 		call_rcu((struct rcu_head *)inode->i_security,
 			 inode_free_by_rcu);