Message ID | 20240710024029.669314-2-paul@paul-moore.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | Paul Moore |
Headers | show |
Series | [RFC] lsm: add the inode_free_security_rcu() LSM implementation hook | expand |
On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote: > > The LSM framework has an existing inode_free_security() hook which > is used by LSMs that manage state associated with an inode, but > due to the use of RCU to protect the inode, special care must be > taken to ensure that the LSMs do not fully release the inode state > until it is safe from a RCU perspective. > > This patch implements a new inode_free_security_rcu() implementation > hook which is called when it is safe to free the LSM's internal inode > state. Unfortunately, this new hook does not have access to the inode > itself as it may already be released, so the existing > inode_free_security() hook is retained for those LSMs which require > access to the inode. > > Signed-off-by: Paul Moore <paul@paul-moore.com> > --- > include/linux/lsm_hook_defs.h | 1 + > security/integrity/ima/ima.h | 2 +- > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > security/integrity/ima/ima_main.c | 2 +- > security/landlock/fs.c | 9 ++++++--- > security/security.c | 26 +++++++++++++------------- > 6 files changed, 30 insertions(+), 30 deletions(-) FYI, this has only received "light" testing, and even that is fairly generous. I booted up a system with IMA set to measure the TCB and ran through the audit and SELinux test suites; IMA seemed to be working just fine but I didn't poke at it too hard. I didn't have an explicit Landlock test handy, but I'm hoping that the Landlock enablement on a modern Rawhide system hit it a little :)
On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > The LSM framework has an existing inode_free_security() hook which > is used by LSMs that manage state associated with an inode, but > due to the use of RCU to protect the inode, special care must be > taken to ensure that the LSMs do not fully release the inode state > until it is safe from a RCU perspective. > > This patch implements a new inode_free_security_rcu() implementation > hook which is called when it is safe to free the LSM's internal inode > state. Unfortunately, this new hook does not have access to the inode > itself as it may already be released, so the existing > inode_free_security() hook is retained for those LSMs which require > access to the inode. > > Signed-off-by: Paul Moore <paul@paul-moore.com> I like this new hook. It is definitely safer than the current approach. To make it more consistent, I think we should also rename security_inode_free() to security_inode_put() to highlight the fact that LSM implementations should not free potential pointers in this blob because they could still be dereferenced in a path walk. > --- > include/linux/lsm_hook_defs.h | 1 + > security/integrity/ima/ima.h | 2 +- > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > security/integrity/ima/ima_main.c | 2 +- > security/landlock/fs.c | 9 ++++++--- > security/security.c | 26 +++++++++++++------------- > 6 files changed, 30 insertions(+), 30 deletions(-) > > diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h > index 8fd87f823d3a..abe6b0ef892a 100644 > --- a/include/linux/lsm_hook_defs.h > +++ b/include/linux/lsm_hook_defs.h > @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask, > unsigned int obj_type) > LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode) > LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode) > +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *) > LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode, > struct inode *dir, const struct qstr *qstr, struct xattr *xattrs, > int *xattr_count) > diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h > index 3e568126cd48..e2a2e4c7eab6 100644 > --- a/security/integrity/ima/ima.h > +++ b/security/integrity/ima/ima.h > @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode, > > struct ima_iint_cache *ima_iint_find(struct inode *inode); > struct ima_iint_cache *ima_inode_get(struct inode *inode); > -void ima_inode_free(struct inode *inode); > +void ima_inode_free_rcu(void *inode_sec); > void __init ima_iintcache_init(void); > > extern const int read_idmap[]; > diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c > index e23412a2c56b..54480df90bdc 100644 > --- a/security/integrity/ima/ima_iint.c > +++ b/security/integrity/ima/ima_iint.c > @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode) > } > > /** > - * ima_inode_free - Called on inode free > - * @inode: Pointer to the inode > + * ima_inode_free_rcu - Called to free an inode via a RCU callback > + * @inode_sec: The inode::i_security pointer > * > - * Free the iint associated with an inode. > + * Free the IMA data associated with an inode. > */ > -void ima_inode_free(struct inode *inode) > +void ima_inode_free_rcu(void *inode_sec) > { > - struct ima_iint_cache *iint; > + struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode; > > - if (!IS_IMA(inode)) > - return; > - > - iint = ima_iint_find(inode); > - ima_inode_set_iint(inode, NULL); > - > - ima_iint_free(iint); > + /* *iint_p should be NULL if !IS_IMA(inode) */ > + if (*iint_p) > + ima_iint_free(*iint_p); > } > > static void ima_iint_init_once(void *foo) > diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c > index f04f43af651c..5b3394864b21 100644 > --- a/security/integrity/ima/ima_main.c > +++ b/security/integrity/ima/ima_main.c > @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = { > #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS > LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request), > #endif > - LSM_HOOK_INIT(inode_free_security, ima_inode_free), > + LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu), > }; > > static const struct lsm_id ima_lsmid = { > diff --git a/security/landlock/fs.c b/security/landlock/fs.c > index 22d8b7c28074..f583f8cec345 100644 > --- a/security/landlock/fs.c > +++ b/security/landlock/fs.c > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, > > /* Inode hooks */ > > -static void hook_inode_free_security(struct inode *const inode) > +static void hook_inode_free_security_rcu(void *inode_sec) > { > + struct landlock_inode_security *lisec; Please rename "lisec" to "inode_sec" for consistency with get_inode_object()'s variables. > + > /* > * All inodes must already have been untied from their object by > * release_inode() or hook_sb_delete(). > */ > - WARN_ON_ONCE(landlock_inode(inode)->object); > + lisec = inode_sec + landlock_blob_sizes.lbs_inode; > + WARN_ON_ONCE(lisec->object); > } This looks good to me. We can add these footers: Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com However, I'm wondering if we could backport this patch down to v5.15 . I guess not, so I'll need to remove this hook implementation for Landlock, backport it to v5.15, and then you'll need to re-add this check with this patch. At least it has been useful to spot this inode issue, but it could still be useful to spot potential memory leaks with a negligible performance impact. > > /* Super-block hooks */ > @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd, > } > > static struct security_hook_list landlock_hooks[] __ro_after_init = { > - LSM_HOOK_INIT(inode_free_security, hook_inode_free_security), > + LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu), > > LSM_HOOK_INIT(sb_delete, hook_sb_delete), > LSM_HOOK_INIT(sb_mount, hook_sb_mount), > diff --git a/security/security.c b/security/security.c > index b52e81ac5526..bc6805f7332e 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) > > static void inode_free_by_rcu(struct rcu_head *head) > { > - /* > - * The rcu head is at the start of the inode blob > - */ > + /* The rcu head is at the start of the inode blob */ > + call_void_hook(inode_free_security_rcu, head); For this to work, we need to extend the inode blob size (lbs_inode) with sizeof(struct rcu_head). The current implementation override the content of the blob with a new rcu_head. > kmem_cache_free(lsm_inode_cache, head); > } > > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) > * security_inode_free() - Free an inode's LSM blob > * @inode: the inode > * > - * Deallocate the inode security structure and set @inode->i_security to NULL. > + * Release any LSM resources associated with @inode, although due to the > + * inode's RCU protections it is possible that the resources will not be > + * fully released until after the current RCU grace period has elapsed. > + * > + * It is important for LSMs to note that despite being present in a call to > + * security_inode_free(), @inode may still be referenced in a VFS path walk > + * and calls to security_inode_permission() may be made during, or after, > + * a call to security_inode_free(). For this reason the inode->i_security > + * field is released via a call_rcu() callback and any LSMs which need to > + * retain inode state for use in security_inode_permission() should only > + * release that state in the inode_free_security_rcu() LSM hook callback. > */ > void security_inode_free(struct inode *inode) > { > call_void_hook(inode_free_security, inode); > - /* > - * The inode may still be referenced in a path walk and > - * a call to security_inode_permission() can be made > - * after inode_free_security() is called. Ideally, the VFS > - * wouldn't do this, but fixing that is a much harder > - * job. For now, simply free the i_security via RCU, and > - * leave the current inode->i_security pointer intact. > - * The inode will be freed after the RCU grace period too. > - */ > if (inode->i_security) > call_rcu((struct rcu_head *)inode->i_security, > inode_free_by_rcu); We should have something like: call_rcu(inode->i_security.rcu, inode_free_by_rcu); > -- > 2.45.2 > >
On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote: > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote: > > > > The LSM framework has an existing inode_free_security() hook which > > is used by LSMs that manage state associated with an inode, but > > due to the use of RCU to protect the inode, special care must be > > taken to ensure that the LSMs do not fully release the inode state > > until it is safe from a RCU perspective. > > > > This patch implements a new inode_free_security_rcu() implementation > > hook which is called when it is safe to free the LSM's internal inode > > state. Unfortunately, this new hook does not have access to the inode > > itself as it may already be released, so the existing > > inode_free_security() hook is retained for those LSMs which require > > access to the inode. > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > --- > > include/linux/lsm_hook_defs.h | 1 + > > security/integrity/ima/ima.h | 2 +- > > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > security/integrity/ima/ima_main.c | 2 +- > > security/landlock/fs.c | 9 ++++++--- > > security/security.c | 26 +++++++++++++------------- > > 6 files changed, 30 insertions(+), 30 deletions(-) > > FYI, this has only received "light" testing, and even that is fairly > generous. I booted up a system with IMA set to measure the TCB and > ran through the audit and SELinux test suites; IMA seemed to be > working just fine but I didn't poke at it too hard. I didn't have an > explicit Landlock test handy, but I'm hoping that the Landlock > enablement on a modern Rawhide system hit it a little :) If you want to test Landlock, you can do so like this: cd tools/testing/selftests/landlock make -C ../../../.. headers_install make for f in *_test; ./$f; done ...or you can build and run everything (on UML) with `./check-linux build kselftest' provided here: https://github.com/landlock-lsm/landlock-test-tools ...or, even simpler, you can run all checks by running `./docker-run.sh debian/sid` for instance. I need to update the kernel doc with these commands. > > -- > paul-moore.com >
On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote: > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > > The LSM framework has an existing inode_free_security() hook which > > is used by LSMs that manage state associated with an inode, but > > due to the use of RCU to protect the inode, special care must be > > taken to ensure that the LSMs do not fully release the inode state > > until it is safe from a RCU perspective. > > > > This patch implements a new inode_free_security_rcu() implementation > > hook which is called when it is safe to free the LSM's internal inode > > state. Unfortunately, this new hook does not have access to the inode > > itself as it may already be released, so the existing > > inode_free_security() hook is retained for those LSMs which require > > access to the inode. > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > I like this new hook. It is definitely safer than the current approach. It would be nicer to simply have a single inode free hook, but it doesn't look like that is a possibility due to the inode synchronization methods and differing lifetimes of inodes and their associated LSM state. At least the workaround isn't too ugly :) > To make it more consistent, I think we should also rename > security_inode_free() to security_inode_put() to highlight the fact that > LSM implementations should not free potential pointers in this blob > because they could still be dereferenced in a path walk. I'd prefer to keep the naming as-is in this patch since security_inode_free() does trigger the free'ing/release of the LSM state associated with the inode. > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c > > index 22d8b7c28074..f583f8cec345 100644 > > --- a/security/landlock/fs.c > > +++ b/security/landlock/fs.c > > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, > > > > /* Inode hooks */ > > > > -static void hook_inode_free_security(struct inode *const inode) > > +static void hook_inode_free_security_rcu(void *inode_sec) > > { > > + struct landlock_inode_security *lisec; > > Please rename "lisec" to "inode_sec" for consistency with > get_inode_object()'s variables. Done. That did conflict with the parameter name so I renamed the parameter everywhere to "inode_security". > > /* > > * All inodes must already have been untied from their object by > > * release_inode() or hook_sb_delete(). > > */ > > - WARN_ON_ONCE(landlock_inode(inode)->object); > > + lisec = inode_sec + landlock_blob_sizes.lbs_inode; > > + WARN_ON_ONCE(lisec->object); > > } > > This looks good to me. > > We can add these footers: > Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com Thanks, metadata added. This was a quick RFC patch so I didn't want to spend a lot of time chasing down metadata refs until I knew there was some basic support for this approach. I still want to make sure it is okay with the IMA folks. > However, I'm wondering if we could backport this patch down to v5.15 . > I guess not, so I'll need to remove this hook implementation for > Landlock, backport it to v5.15, and then you'll need to re-add this > check with this patch. At least it has been useful to spot this inode > issue, but it could still be useful to spot potential memory leaks with > a negligible performance impact. Yes, it's a bit complicated with the IMA/EVM promotion happening fairly recently. I'm marking the patch with a stable tag, but considering we're at -rc7 and this needs at least one more respin, testing, ACKs, etc. it might not land in Linus' tree until sometime post v6.11-rc1. Considering that, I might suggest dropping the Landlock hook in -stable and re-adding it to Linus' tree once this fix lands, but that decision is up to you. > > diff --git a/security/security.c b/security/security.c > > index b52e81ac5526..bc6805f7332e 100644 > > --- a/security/security.c > > +++ b/security/security.c > > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) > > > > static void inode_free_by_rcu(struct rcu_head *head) > > { > > - /* > > - * The rcu head is at the start of the inode blob > > - */ > > + /* The rcu head is at the start of the inode blob */ > > + call_void_hook(inode_free_security_rcu, head); > > For this to work, we need to extend the inode blob size (lbs_inode) with > sizeof(struct rcu_head). The current implementation override the > content of the blob with a new rcu_head. Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head struct is already accounted for in the inode->i_security blob. > > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) > > * security_inode_free() - Free an inode's LSM blob > > * @inode: the inode > > * > > - * Deallocate the inode security structure and set @inode->i_security to NULL. > > + * Release any LSM resources associated with @inode, although due to the > > + * inode's RCU protections it is possible that the resources will not be > > + * fully released until after the current RCU grace period has elapsed. > > + * > > + * It is important for LSMs to note that despite being present in a call to > > + * security_inode_free(), @inode may still be referenced in a VFS path walk > > + * and calls to security_inode_permission() may be made during, or after, > > + * a call to security_inode_free(). For this reason the inode->i_security > > + * field is released via a call_rcu() callback and any LSMs which need to > > + * retain inode state for use in security_inode_permission() should only > > + * release that state in the inode_free_security_rcu() LSM hook callback. > > */ > > void security_inode_free(struct inode *inode) > > { > > call_void_hook(inode_free_security, inode); > > - /* > > - * The inode may still be referenced in a path walk and > > - * a call to security_inode_permission() can be made > > - * after inode_free_security() is called. Ideally, the VFS > > - * wouldn't do this, but fixing that is a much harder > > - * job. For now, simply free the i_security via RCU, and > > - * leave the current inode->i_security pointer intact. > > - * The inode will be freed after the RCU grace period too. > > - */ > > if (inode->i_security) > > call_rcu((struct rcu_head *)inode->i_security, > > inode_free_by_rcu); > > We should have something like: > call_rcu(inode->i_security.rcu, inode_free_by_rcu); The unfortunate side effect of that is that the patch grows significantly as everyone that touches inode->i_security would need to be updated to use inode->i_security.data or something similar. For a patch that we likely want to see backported to -stable that could make things more difficult than needed. However, I have always thought we should add some better structure/typing to these opaque LSM blobs both to get away from the raw pointer math and add a marginal layer of safety. I've envisioned doing something like this: struct lsm_blob_inode { struct selinux_blob_inode selinux; struct smack_blob_inode smack; struct aa_blob_inode apparmor; ... struct rcu_head rcu; } ... with lsm_blob_inode allocated and assigned to inode->i_security. Those LSMs that are enabled and require blob space would define their struct with the necessary fields, those that were disabled or did not require space would define an empty struct; some minor pre-processor checking and header file work would be needed, but it shouldn't be too bad. Ideally, we would use the same approach for all of the LSM security blobs, only with different LSM supplied structs. Perhaps once we land Casey's latest patchset I'll see about doing something like that, but right now we've got bigger problems to address. I'll hold off on posting a proper patch for this RFC until I hear back from Mini or Roberto on the IMA changes.
On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote: > On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote: > > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote: > > > > > > The LSM framework has an existing inode_free_security() hook which > > > is used by LSMs that manage state associated with an inode, but > > > due to the use of RCU to protect the inode, special care must be > > > taken to ensure that the LSMs do not fully release the inode state > > > until it is safe from a RCU perspective. > > > > > > This patch implements a new inode_free_security_rcu() implementation > > > hook which is called when it is safe to free the LSM's internal inode > > > state. Unfortunately, this new hook does not have access to the inode > > > itself as it may already be released, so the existing > > > inode_free_security() hook is retained for those LSMs which require > > > access to the inode. > > > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > --- > > > include/linux/lsm_hook_defs.h | 1 + > > > security/integrity/ima/ima.h | 2 +- > > > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > > security/integrity/ima/ima_main.c | 2 +- > > > security/landlock/fs.c | 9 ++++++--- > > > security/security.c | 26 +++++++++++++------------- > > > 6 files changed, 30 insertions(+), 30 deletions(-) > > > > FYI, this has only received "light" testing, and even that is fairly > > generous. I booted up a system with IMA set to measure the TCB and > > ran through the audit and SELinux test suites; IMA seemed to be > > working just fine but I didn't poke at it too hard. I didn't have an > > explicit Landlock test handy, but I'm hoping that the Landlock > > enablement on a modern Rawhide system hit it a little :) > > If you want to test Landlock, you can do so like this: > > cd tools/testing/selftests/landlock > make -C ../../../.. headers_install > make > for f in *_test; ./$f; done Looks okay? % for f in *_test; do ./$f; done | grep "^# Totals" # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 # SKIP overlayfs is not supported (setup) # SKIP overlayfs is not supported (setup) # SKIP this filesystem is not supported (setup) # SKIP this filesystem is not supported (setup) # SKIP this filesystem is not supported (setup) # SKIP this filesystem is not supported (setup) # SKIP this filesystem is not supported (setup) # Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0 # Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0 # Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
On 7/10/2024 9:20 AM, Paul Moore wrote: > On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote: >> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >>> The LSM framework has an existing inode_free_security() hook which >>> is used by LSMs that manage state associated with an inode, but >>> due to the use of RCU to protect the inode, special care must be >>> taken to ensure that the LSMs do not fully release the inode state >>> until it is safe from a RCU perspective. >>> >>> This patch implements a new inode_free_security_rcu() implementation >>> hook which is called when it is safe to free the LSM's internal inode >>> state. Unfortunately, this new hook does not have access to the inode >>> itself as it may already be released, so the existing >>> inode_free_security() hook is retained for those LSMs which require >>> access to the inode. >>> >>> Signed-off-by: Paul Moore <paul@paul-moore.com> >> I like this new hook. It is definitely safer than the current approach. > It would be nicer to simply have a single inode free hook, but it > doesn't look like that is a possibility due to the inode > synchronization methods and differing lifetimes of inodes and their > associated LSM state. At least the workaround isn't too ugly :) > >> To make it more consistent, I think we should also rename >> security_inode_free() to security_inode_put() to highlight the fact that >> LSM implementations should not free potential pointers in this blob >> because they could still be dereferenced in a path walk. > I'd prefer to keep the naming as-is in this patch since > security_inode_free() does trigger the free'ing/release of the LSM > state associated with the inode. > >>> diff --git a/security/landlock/fs.c b/security/landlock/fs.c >>> index 22d8b7c28074..f583f8cec345 100644 >>> --- a/security/landlock/fs.c >>> +++ b/security/landlock/fs.c >>> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, >>> >>> /* Inode hooks */ >>> >>> -static void hook_inode_free_security(struct inode *const inode) >>> +static void hook_inode_free_security_rcu(void *inode_sec) >>> { >>> + struct landlock_inode_security *lisec; >> Please rename "lisec" to "inode_sec" for consistency with >> get_inode_object()'s variables. > Done. That did conflict with the parameter name so I renamed the > parameter everywhere to "inode_security". > >>> /* >>> * All inodes must already have been untied from their object by >>> * release_inode() or hook_sb_delete(). >>> */ >>> - WARN_ON_ONCE(landlock_inode(inode)->object); >>> + lisec = inode_sec + landlock_blob_sizes.lbs_inode; >>> + WARN_ON_ONCE(lisec->object); >>> } >> This looks good to me. >> >> We can add these footers: >> Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com >> Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com > Thanks, metadata added. This was a quick RFC patch so I didn't want > to spend a lot of time chasing down metadata refs until I knew there > was some basic support for this approach. I still want to make sure > it is okay with the IMA folks. > >> However, I'm wondering if we could backport this patch down to v5.15 . >> I guess not, so I'll need to remove this hook implementation for >> Landlock, backport it to v5.15, and then you'll need to re-add this >> check with this patch. At least it has been useful to spot this inode >> issue, but it could still be useful to spot potential memory leaks with >> a negligible performance impact. > Yes, it's a bit complicated with the IMA/EVM promotion happening > fairly recently. I'm marking the patch with a stable tag, but > considering we're at -rc7 and this needs at least one more respin, > testing, ACKs, etc. it might not land in Linus' tree until sometime > post v6.11-rc1. Considering that, I might suggest dropping the > Landlock hook in -stable and re-adding it to Linus' tree once this fix > lands, but that decision is up to you. > >>> diff --git a/security/security.c b/security/security.c >>> index b52e81ac5526..bc6805f7332e 100644 >>> --- a/security/security.c >>> +++ b/security/security.c >>> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) >>> >>> static void inode_free_by_rcu(struct rcu_head *head) >>> { >>> - /* >>> - * The rcu head is at the start of the inode blob >>> - */ >>> + /* The rcu head is at the start of the inode blob */ >>> + call_void_hook(inode_free_security_rcu, head); >> For this to work, we need to extend the inode blob size (lbs_inode) with >> sizeof(struct rcu_head). The current implementation override the >> content of the blob with a new rcu_head. > Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head > struct is already accounted for in the inode->i_security blob. > >>> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) >>> * security_inode_free() - Free an inode's LSM blob >>> * @inode: the inode >>> * >>> - * Deallocate the inode security structure and set @inode->i_security to NULL. >>> + * Release any LSM resources associated with @inode, although due to the >>> + * inode's RCU protections it is possible that the resources will not be >>> + * fully released until after the current RCU grace period has elapsed. >>> + * >>> + * It is important for LSMs to note that despite being present in a call to >>> + * security_inode_free(), @inode may still be referenced in a VFS path walk >>> + * and calls to security_inode_permission() may be made during, or after, >>> + * a call to security_inode_free(). For this reason the inode->i_security >>> + * field is released via a call_rcu() callback and any LSMs which need to >>> + * retain inode state for use in security_inode_permission() should only >>> + * release that state in the inode_free_security_rcu() LSM hook callback. >>> */ >>> void security_inode_free(struct inode *inode) >>> { >>> call_void_hook(inode_free_security, inode); >>> - /* >>> - * The inode may still be referenced in a path walk and >>> - * a call to security_inode_permission() can be made >>> - * after inode_free_security() is called. Ideally, the VFS >>> - * wouldn't do this, but fixing that is a much harder >>> - * job. For now, simply free the i_security via RCU, and >>> - * leave the current inode->i_security pointer intact. >>> - * The inode will be freed after the RCU grace period too. >>> - */ >>> if (inode->i_security) >>> call_rcu((struct rcu_head *)inode->i_security, >>> inode_free_by_rcu); >> We should have something like: >> call_rcu(inode->i_security.rcu, inode_free_by_rcu); > The unfortunate side effect of that is that the patch grows > significantly as everyone that touches inode->i_security would need to > be updated to use inode->i_security.data or something similar. For a > patch that we likely want to see backported to -stable that could make > things more difficult than needed. > > However, I have always thought we should add some better > structure/typing to these opaque LSM blobs both to get away from the > raw pointer math and add a marginal layer of safety. I've envisioned > doing something like this: > > struct lsm_blob_inode { > struct selinux_blob_inode selinux; > struct smack_blob_inode smack; > struct aa_blob_inode apparmor; > ... > struct rcu_head rcu; > } I have considered doing this as part of the stacking effort for quite some time. I've already done it for the lsmblob structure that will replace most uses of the u32 secid in the LSM APIs. I am concerned that there would be considerable gnashing of teeth over the potential increase in the blob sizes for kernels compiled with LSMs that aren't active. I have been frantically avoiding anything that might slow the stacking effort further. If this would help moving this along I'll include it in v40. > > .. with lsm_blob_inode allocated and assigned to inode->i_security. > Those LSMs that are enabled and require blob space would define their > struct with the necessary fields, those that were disabled or did not > require space would define an empty struct; some minor pre-processor > checking and header file work would be needed, but it shouldn't be too > bad. Ideally, we would use the same approach for all of the LSM > security blobs, only with different LSM supplied structs. Perhaps > once we land Casey's latest patchset I'll see about doing something > like that, but right now we've got bigger problems to address. > > I'll hold off on posting a proper patch for this RFC until I hear back > from Mini or Roberto on the IMA changes. >
On Wed, Jul 10, 2024 at 12:41 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > On 7/10/2024 9:20 AM, Paul Moore wrote: ... > > However, I have always thought we should add some better > > structure/typing to these opaque LSM blobs both to get away from the > > raw pointer math and add a marginal layer of safety. I've envisioned > > doing something like this: > > > > struct lsm_blob_inode { > > struct selinux_blob_inode selinux; > > struct smack_blob_inode smack; > > struct aa_blob_inode apparmor; > > ... > > struct rcu_head rcu; > > } > > I have considered doing this as part of the stacking effort for quite > some time. I've already done it for the lsmblob structure that will replace > most uses of the u32 secid in the LSM APIs. I am concerned that there would > be considerable gnashing of teeth over the potential increase in the blob > sizes for kernels compiled with LSMs that aren't active. Yes, that's a fair point and something that needs to be considered. > I have been frantically > avoiding anything that might slow the stacking effort further. If this would > help moving this along I'll include it in v40. No, don't worry about this as part of improving the multi-LSM support, this is something that can be done independently, if at all.
On 7/10/2024 4:40 AM, Paul Moore wrote: > The LSM framework has an existing inode_free_security() hook which > is used by LSMs that manage state associated with an inode, but > due to the use of RCU to protect the inode, special care must be > taken to ensure that the LSMs do not fully release the inode state > until it is safe from a RCU perspective. > > This patch implements a new inode_free_security_rcu() implementation > hook which is called when it is safe to free the LSM's internal inode > state. Unfortunately, this new hook does not have access to the inode > itself as it may already be released, so the existing > inode_free_security() hook is retained for those LSMs which require > access to the inode. > > Signed-off-by: Paul Moore <paul@paul-moore.com> > --- > include/linux/lsm_hook_defs.h | 1 + > security/integrity/ima/ima.h | 2 +- > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > security/integrity/ima/ima_main.c | 2 +- > security/landlock/fs.c | 9 ++++++--- > security/security.c | 26 +++++++++++++------------- > 6 files changed, 30 insertions(+), 30 deletions(-) > > diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h > index 8fd87f823d3a..abe6b0ef892a 100644 > --- a/include/linux/lsm_hook_defs.h > +++ b/include/linux/lsm_hook_defs.h > @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask, > unsigned int obj_type) > LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode) > LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode) > +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *) > LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode, > struct inode *dir, const struct qstr *qstr, struct xattr *xattrs, > int *xattr_count) > diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h > index 3e568126cd48..e2a2e4c7eab6 100644 > --- a/security/integrity/ima/ima.h > +++ b/security/integrity/ima/ima.h > @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode, > > struct ima_iint_cache *ima_iint_find(struct inode *inode); > struct ima_iint_cache *ima_inode_get(struct inode *inode); > -void ima_inode_free(struct inode *inode); > +void ima_inode_free_rcu(void *inode_sec); > void __init ima_iintcache_init(void); > > extern const int read_idmap[]; > diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c > index e23412a2c56b..54480df90bdc 100644 > --- a/security/integrity/ima/ima_iint.c > +++ b/security/integrity/ima/ima_iint.c > @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode) > } > > /** > - * ima_inode_free - Called on inode free > - * @inode: Pointer to the inode > + * ima_inode_free_rcu - Called to free an inode via a RCU callback > + * @inode_sec: The inode::i_security pointer > * > - * Free the iint associated with an inode. > + * Free the IMA data associated with an inode. > */ > -void ima_inode_free(struct inode *inode) > +void ima_inode_free_rcu(void *inode_sec) > { > - struct ima_iint_cache *iint; > + struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode; > > - if (!IS_IMA(inode)) > - return; > - > - iint = ima_iint_find(inode); > - ima_inode_set_iint(inode, NULL); > - > - ima_iint_free(iint); > + /* *iint_p should be NULL if !IS_IMA(inode) */ > + if (*iint_p) > + ima_iint_free(*iint_p); It looks ok for me. The only thing we are not doing is to set the pointer to NULL, but I guess does not matter, since the inode security blob is being freed. I also ran some UML kernel tests in the CI, and everything looks good: https://github.com/robertosassu/ima-evm-utils/actions/runs/9880817007/job/27291259487 Will think a bit, if I'm missing something. Roberto > } > > static void ima_iint_init_once(void *foo) > diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c > index f04f43af651c..5b3394864b21 100644 > --- a/security/integrity/ima/ima_main.c > +++ b/security/integrity/ima/ima_main.c > @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = { > #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS > LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request), > #endif > - LSM_HOOK_INIT(inode_free_security, ima_inode_free), > + LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu), > }; > > static const struct lsm_id ima_lsmid = { > diff --git a/security/landlock/fs.c b/security/landlock/fs.c > index 22d8b7c28074..f583f8cec345 100644 > --- a/security/landlock/fs.c > +++ b/security/landlock/fs.c > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, > > /* Inode hooks */ > > -static void hook_inode_free_security(struct inode *const inode) > +static void hook_inode_free_security_rcu(void *inode_sec) > { > + struct landlock_inode_security *lisec; > + > /* > * All inodes must already have been untied from their object by > * release_inode() or hook_sb_delete(). > */ > - WARN_ON_ONCE(landlock_inode(inode)->object); > + lisec = inode_sec + landlock_blob_sizes.lbs_inode; > + WARN_ON_ONCE(lisec->object); > } > > /* Super-block hooks */ > @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd, > } > > static struct security_hook_list landlock_hooks[] __ro_after_init = { > - LSM_HOOK_INIT(inode_free_security, hook_inode_free_security), > + LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu), > > LSM_HOOK_INIT(sb_delete, hook_sb_delete), > LSM_HOOK_INIT(sb_mount, hook_sb_mount), > diff --git a/security/security.c b/security/security.c > index b52e81ac5526..bc6805f7332e 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) > > static void inode_free_by_rcu(struct rcu_head *head) > { > - /* > - * The rcu head is at the start of the inode blob > - */ > + /* The rcu head is at the start of the inode blob */ > + call_void_hook(inode_free_security_rcu, head); > kmem_cache_free(lsm_inode_cache, head); > } > > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) > * security_inode_free() - Free an inode's LSM blob > * @inode: the inode > * > - * Deallocate the inode security structure and set @inode->i_security to NULL. > + * Release any LSM resources associated with @inode, although due to the > + * inode's RCU protections it is possible that the resources will not be > + * fully released until after the current RCU grace period has elapsed. > + * > + * It is important for LSMs to note that despite being present in a call to > + * security_inode_free(), @inode may still be referenced in a VFS path walk > + * and calls to security_inode_permission() may be made during, or after, > + * a call to security_inode_free(). For this reason the inode->i_security > + * field is released via a call_rcu() callback and any LSMs which need to > + * retain inode state for use in security_inode_permission() should only > + * release that state in the inode_free_security_rcu() LSM hook callback. > */ > void security_inode_free(struct inode *inode) > { > call_void_hook(inode_free_security, inode); > - /* > - * The inode may still be referenced in a path walk and > - * a call to security_inode_permission() can be made > - * after inode_free_security() is called. Ideally, the VFS > - * wouldn't do this, but fixing that is a much harder > - * job. For now, simply free the i_security via RCU, and > - * leave the current inode->i_security pointer intact. > - * The inode will be freed after the RCU grace period too. > - */ > if (inode->i_security) > call_rcu((struct rcu_head *)inode->i_security, > inode_free_by_rcu);
On Wed, Jul 10, 2024 at 5:01 PM Roberto Sassu <roberto.sassu@huaweicloud.com> wrote: > > It looks ok for me. The only thing we are not doing is to set the > pointer to NULL, but I guess does not matter, since the inode security > blob is being freed. Yes, that shouldn't be an issue, and depending on how things get scheduled, it's entirely possible that the associated inode is completely gone by the time ima_inode_free_rcu() is called. > I also ran some UML kernel tests in the CI, and everything looks good: > > https://github.com/robertosassu/ima-evm-utils/actions/runs/9880817007/job/27291259487 > > Will think a bit, if I'm missing something. Great, thank you.
On Wed, Jul 10, 2024 at 12:24:31PM -0400, Paul Moore wrote: > On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote: > > > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote: > > > > > > > > The LSM framework has an existing inode_free_security() hook which > > > > is used by LSMs that manage state associated with an inode, but > > > > due to the use of RCU to protect the inode, special care must be > > > > taken to ensure that the LSMs do not fully release the inode state > > > > until it is safe from a RCU perspective. > > > > > > > > This patch implements a new inode_free_security_rcu() implementation > > > > hook which is called when it is safe to free the LSM's internal inode > > > > state. Unfortunately, this new hook does not have access to the inode > > > > itself as it may already be released, so the existing > > > > inode_free_security() hook is retained for those LSMs which require > > > > access to the inode. > > > > > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > > --- > > > > include/linux/lsm_hook_defs.h | 1 + > > > > security/integrity/ima/ima.h | 2 +- > > > > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > > > security/integrity/ima/ima_main.c | 2 +- > > > > security/landlock/fs.c | 9 ++++++--- > > > > security/security.c | 26 +++++++++++++------------- > > > > 6 files changed, 30 insertions(+), 30 deletions(-) > > > > > > FYI, this has only received "light" testing, and even that is fairly > > > generous. I booted up a system with IMA set to measure the TCB and > > > ran through the audit and SELinux test suites; IMA seemed to be > > > working just fine but I didn't poke at it too hard. I didn't have an > > > explicit Landlock test handy, but I'm hoping that the Landlock > > > enablement on a modern Rawhide system hit it a little :) > > > > If you want to test Landlock, you can do so like this: > > > > cd tools/testing/selftests/landlock > > make -C ../../../.. headers_install > > make > > for f in *_test; ./$f; done > > Looks okay? > > % for f in *_test; do ./$f; done | grep "^# Totals" > # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 > # SKIP overlayfs is not supported (setup) > # SKIP overlayfs is not supported (setup) > # SKIP this filesystem is not supported (setup) > # SKIP this filesystem is not supported (setup) > # SKIP this filesystem is not supported (setup) > # SKIP this filesystem is not supported (setup) > # SKIP this filesystem is not supported (setup) > # Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0 > # Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0 > # Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0 It should be enough, thanks. FYI, the minimal configuration required to run all tests (except hostfs) is listed in tools/testing/selftests/landlock/config
On Wed, Jul 10, 2024 at 12:20:18PM -0400, Paul Moore wrote: > On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > > > The LSM framework has an existing inode_free_security() hook which > > > is used by LSMs that manage state associated with an inode, but > > > due to the use of RCU to protect the inode, special care must be > > > taken to ensure that the LSMs do not fully release the inode state > > > until it is safe from a RCU perspective. > > > > > > This patch implements a new inode_free_security_rcu() implementation > > > hook which is called when it is safe to free the LSM's internal inode > > > state. Unfortunately, this new hook does not have access to the inode > > > itself as it may already be released, so the existing > > > inode_free_security() hook is retained for those LSMs which require > > > access to the inode. > > > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c > > > index 22d8b7c28074..f583f8cec345 100644 > > > --- a/security/landlock/fs.c > > > +++ b/security/landlock/fs.c > > > @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, > > > > > > /* Inode hooks */ > > > > > > -static void hook_inode_free_security(struct inode *const inode) > > > +static void hook_inode_free_security_rcu(void *inode_sec) > > > { > > > + struct landlock_inode_security *lisec; > > > > Please rename "lisec" to "inode_sec" for consistency with > > get_inode_object()'s variables. > > Done. That did conflict with the parameter name so I renamed the > parameter everywhere to "inode_security". Oh, I didn't see this conflict. Using inode_security as function argument looks good. > > However, I'm wondering if we could backport this patch down to v5.15 . > > I guess not, so I'll need to remove this hook implementation for > > Landlock, backport it to v5.15, and then you'll need to re-add this > > check with this patch. At least it has been useful to spot this inode > > issue, but it could still be useful to spot potential memory leaks with > > a negligible performance impact. > > Yes, it's a bit complicated with the IMA/EVM promotion happening > fairly recently. I'm marking the patch with a stable tag, but > considering we're at -rc7 and this needs at least one more respin, > testing, ACKs, etc. it might not land in Linus' tree until sometime > post v6.11-rc1. Considering that, I might suggest dropping the > Landlock hook in -stable and re-adding it to Linus' tree once this fix > lands, but that decision is up to you. I would prefer to backport the new hook. I can help with that. In fact, I tried to backport the removal of the hook for Landlock, and it requires the same amount of work, so it would be better to work together. That should also ease future backports impacting the same part of the code. BTW, while trying to backport that to linux-5.15.y I noticed that a fix is missing in this branch: the default return value for the inode_init_security hook, see commit 6bcdfd2cac55 ("security: Allow all LSMs to provide xattrs for inode_init_security hook"). > > > > diff --git a/security/security.c b/security/security.c > > > index b52e81ac5526..bc6805f7332e 100644 > > > --- a/security/security.c > > > +++ b/security/security.c > > > @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) > > > > > > static void inode_free_by_rcu(struct rcu_head *head) > > > { > > > - /* > > > - * The rcu head is at the start of the inode blob > > > - */ > > > + /* The rcu head is at the start of the inode blob */ > > > + call_void_hook(inode_free_security_rcu, head); > > > > For this to work, we need to extend the inode blob size (lbs_inode) with > > sizeof(struct rcu_head). The current implementation override the > > content of the blob with a new rcu_head. > > Take a look at lsm_set_blob_sizes() and you'll see that the rcu_head > struct is already accounted for in the inode->i_security blob. Indeed, I was confused with the lsm_set_blob_size() name which *adds* a size. > > > > @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) > > > * security_inode_free() - Free an inode's LSM blob > > > * @inode: the inode > > > * > > > - * Deallocate the inode security structure and set @inode->i_security to NULL. > > > + * Release any LSM resources associated with @inode, although due to the > > > + * inode's RCU protections it is possible that the resources will not be > > > + * fully released until after the current RCU grace period has elapsed. > > > + * > > > + * It is important for LSMs to note that despite being present in a call to > > > + * security_inode_free(), @inode may still be referenced in a VFS path walk > > > + * and calls to security_inode_permission() may be made during, or after, > > > + * a call to security_inode_free(). For this reason the inode->i_security > > > + * field is released via a call_rcu() callback and any LSMs which need to > > > + * retain inode state for use in security_inode_permission() should only > > > + * release that state in the inode_free_security_rcu() LSM hook callback. > > > */ > > > void security_inode_free(struct inode *inode) > > > { > > > call_void_hook(inode_free_security, inode); > > > - /* > > > - * The inode may still be referenced in a path walk and > > > - * a call to security_inode_permission() can be made > > > - * after inode_free_security() is called. Ideally, the VFS > > > - * wouldn't do this, but fixing that is a much harder > > > - * job. For now, simply free the i_security via RCU, and > > > - * leave the current inode->i_security pointer intact. > > > - * The inode will be freed after the RCU grace period too. > > > - */ > > > if (inode->i_security) > > > call_rcu((struct rcu_head *)inode->i_security, > > > inode_free_by_rcu); > > > > We should have something like: > > call_rcu(inode->i_security.rcu, inode_free_by_rcu); > > The unfortunate side effect of that is that the patch grows > significantly as everyone that touches inode->i_security would need to > be updated to use inode->i_security.data or something similar. For a > patch that we likely want to see backported to -stable that could make > things more difficult than needed. Sure, this was related to my confusion with the rcu_head size addition.
On Mon, Jul 15, 2024 at 9:34 AM Mickaël Salaün <mic@digikod.net> wrote: > On Wed, Jul 10, 2024 at 12:24:31PM -0400, Paul Moore wrote: > > On Wed, Jul 10, 2024 at 8:02 AM Mickaël Salaün <mic@digikod.net> wrote: > > > On Tue, Jul 09, 2024 at 10:47:45PM -0400, Paul Moore wrote: > > > > On Tue, Jul 9, 2024 at 10:40 PM Paul Moore <paul@paul-moore.com> wrote: > > > > > > > > > > The LSM framework has an existing inode_free_security() hook which > > > > > is used by LSMs that manage state associated with an inode, but > > > > > due to the use of RCU to protect the inode, special care must be > > > > > taken to ensure that the LSMs do not fully release the inode state > > > > > until it is safe from a RCU perspective. > > > > > > > > > > This patch implements a new inode_free_security_rcu() implementation > > > > > hook which is called when it is safe to free the LSM's internal inode > > > > > state. Unfortunately, this new hook does not have access to the inode > > > > > itself as it may already be released, so the existing > > > > > inode_free_security() hook is retained for those LSMs which require > > > > > access to the inode. > > > > > > > > > > Signed-off-by: Paul Moore <paul@paul-moore.com> > > > > > --- > > > > > include/linux/lsm_hook_defs.h | 1 + > > > > > security/integrity/ima/ima.h | 2 +- > > > > > security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > > > > security/integrity/ima/ima_main.c | 2 +- > > > > > security/landlock/fs.c | 9 ++++++--- > > > > > security/security.c | 26 +++++++++++++------------- > > > > > 6 files changed, 30 insertions(+), 30 deletions(-) > > > > > > > > FYI, this has only received "light" testing, and even that is fairly > > > > generous. I booted up a system with IMA set to measure the TCB and > > > > ran through the audit and SELinux test suites; IMA seemed to be > > > > working just fine but I didn't poke at it too hard. I didn't have an > > > > explicit Landlock test handy, but I'm hoping that the Landlock > > > > enablement on a modern Rawhide system hit it a little :) > > > > > > If you want to test Landlock, you can do so like this: > > > > > > cd tools/testing/selftests/landlock > > > make -C ../../../.. headers_install > > > make > > > for f in *_test; ./$f; done > > > > Looks okay? > > > > % for f in *_test; do ./$f; done | grep "^# Totals" > > # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 > > # SKIP overlayfs is not supported (setup) > > # SKIP overlayfs is not supported (setup) > > # SKIP this filesystem is not supported (setup) > > # SKIP this filesystem is not supported (setup) > > # SKIP this filesystem is not supported (setup) > > # SKIP this filesystem is not supported (setup) > > # SKIP this filesystem is not supported (setup) > > # Totals: pass:117 fail:0 xfail:0 xpass:0 skip:7 error:0 > > # Totals: pass:84 fail:0 xfail:0 xpass:0 skip:0 error:0 > > # Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0 > > It should be enough, thanks. FYI, the minimal configuration required to > run all tests (except hostfs) is listed in > tools/testing/selftests/landlock/config Thanks, I'll try to remember to add that to my test builds.
On Mon, Jul 15, 2024 at 9:35 AM Mickaël Salaün <mic@digikod.net> wrote: > On Wed, Jul 10, 2024 at 12:20:18PM -0400, Paul Moore wrote: > > On Wed, Jul 10, 2024 at 6:40 AM Mickaël Salaün <mic@digikod.net> wrote: > > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: ... > > > However, I'm wondering if we could backport this patch down to v5.15 . > > > I guess not, so I'll need to remove this hook implementation for > > > Landlock, backport it to v5.15, and then you'll need to re-add this > > > check with this patch. At least it has been useful to spot this inode > > > issue, but it could still be useful to spot potential memory leaks with > > > a negligible performance impact. > > > > Yes, it's a bit complicated with the IMA/EVM promotion happening > > fairly recently. I'm marking the patch with a stable tag, but > > considering we're at -rc7 and this needs at least one more respin, > > testing, ACKs, etc. it might not land in Linus' tree until sometime > > post v6.11-rc1. Considering that, I might suggest dropping the > > Landlock hook in -stable and re-adding it to Linus' tree once this fix > > lands, but that decision is up to you. > > I would prefer to backport the new hook. I can help with that. In > fact, I tried to backport the removal of the hook for Landlock, and it > requires the same amount of work, so it would be better to work > together. That should also ease future backports impacting the same > part of the code. Okay, let's get the initial v6.11 LSM PR merged (I just sent it to Linus) and then I'll post the updated patchset and a proper patch for review. > BTW, while trying to backport that to linux-5.15.y I noticed that a fix > is missing in this branch: the default return value for the > inode_init_security hook, see commit 6bcdfd2cac55 ("security: Allow all > LSMs to provide xattrs for inode_init_security hook"). Likely a casualty of a merge conflict; I haven't noticed the stable kernel folks doing any manual merging of LSM patches that fail an automated merge. You can always do the merge and send it to them.
On 10. 7. 2024 12:40, Mickaël Salaün wrote: > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >> The LSM framework has an existing inode_free_security() hook which >> is used by LSMs that manage state associated with an inode, but >> due to the use of RCU to protect the inode, special care must be >> taken to ensure that the LSMs do not fully release the inode state >> until it is safe from a RCU perspective. >> >> This patch implements a new inode_free_security_rcu() implementation >> hook which is called when it is safe to free the LSM's internal inode >> state. Unfortunately, this new hook does not have access to the inode >> itself as it may already be released, so the existing >> inode_free_security() hook is retained for those LSMs which require >> access to the inode. >> >> Signed-off-by: Paul Moore <paul@paul-moore.com> > > I like this new hook. It is definitely safer than the current approach. > > To make it more consistent, I think we should also rename > security_inode_free() to security_inode_put() to highlight the fact that > LSM implementations should not free potential pointers in this blob > because they could still be dereferenced in a path walk. > >> --- >> include/linux/lsm_hook_defs.h | 1 + >> security/integrity/ima/ima.h | 2 +- >> security/integrity/ima/ima_iint.c | 20 ++++++++------------ >> security/integrity/ima/ima_main.c | 2 +- >> security/landlock/fs.c | 9 ++++++--- >> security/security.c | 26 +++++++++++++------------- >> 6 files changed, 30 insertions(+), 30 deletions(-) >> >> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h >> index 8fd87f823d3a..abe6b0ef892a 100644 >> --- a/include/linux/lsm_hook_defs.h >> +++ b/include/linux/lsm_hook_defs.h >> @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask, >> unsigned int obj_type) >> LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode) >> LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode) >> +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *) >> LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode, >> struct inode *dir, const struct qstr *qstr, struct xattr *xattrs, >> int *xattr_count) >> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h >> index 3e568126cd48..e2a2e4c7eab6 100644 >> --- a/security/integrity/ima/ima.h >> +++ b/security/integrity/ima/ima.h >> @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode, >> >> struct ima_iint_cache *ima_iint_find(struct inode *inode); >> struct ima_iint_cache *ima_inode_get(struct inode *inode); >> -void ima_inode_free(struct inode *inode); >> +void ima_inode_free_rcu(void *inode_sec); >> void __init ima_iintcache_init(void); >> >> extern const int read_idmap[]; >> diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c >> index e23412a2c56b..54480df90bdc 100644 >> --- a/security/integrity/ima/ima_iint.c >> +++ b/security/integrity/ima/ima_iint.c >> @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode) >> } >> >> /** >> - * ima_inode_free - Called on inode free >> - * @inode: Pointer to the inode >> + * ima_inode_free_rcu - Called to free an inode via a RCU callback >> + * @inode_sec: The inode::i_security pointer >> * >> - * Free the iint associated with an inode. >> + * Free the IMA data associated with an inode. >> */ >> -void ima_inode_free(struct inode *inode) >> +void ima_inode_free_rcu(void *inode_sec) >> { >> - struct ima_iint_cache *iint; >> + struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode; >> >> - if (!IS_IMA(inode)) >> - return; >> - >> - iint = ima_iint_find(inode); >> - ima_inode_set_iint(inode, NULL); >> - >> - ima_iint_free(iint); >> + /* *iint_p should be NULL if !IS_IMA(inode) */ >> + if (*iint_p) >> + ima_iint_free(*iint_p); >> } >> >> static void ima_iint_init_once(void *foo) >> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c >> index f04f43af651c..5b3394864b21 100644 >> --- a/security/integrity/ima/ima_main.c >> +++ b/security/integrity/ima/ima_main.c >> @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = { >> #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS >> LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request), >> #endif >> - LSM_HOOK_INIT(inode_free_security, ima_inode_free), >> + LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu), >> }; >> >> static const struct lsm_id ima_lsmid = { >> diff --git a/security/landlock/fs.c b/security/landlock/fs.c >> index 22d8b7c28074..f583f8cec345 100644 >> --- a/security/landlock/fs.c >> +++ b/security/landlock/fs.c >> @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, >> >> /* Inode hooks */ >> >> -static void hook_inode_free_security(struct inode *const inode) >> +static void hook_inode_free_security_rcu(void *inode_sec) >> { >> + struct landlock_inode_security *lisec; > > Please rename "lisec" to "inode_sec" for consistency with > get_inode_object()'s variables. > >> + >> /* >> * All inodes must already have been untied from their object by >> * release_inode() or hook_sb_delete(). >> */ >> - WARN_ON_ONCE(landlock_inode(inode)->object); >> + lisec = inode_sec + landlock_blob_sizes.lbs_inode; >> + WARN_ON_ONCE(lisec->object); >> } > > This looks good to me. > > We can add these footers: > Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: 1) How does this patch close [1]? As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, i.e. referencing the inode in a VFS path walk while destroying it... Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? If not, can the security context be released earlier than the inode itself? If yes, can be executed inode_permission concurrently, leading to UAF of inode security context in security_inode_permission? Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.) I know, this RFC doesn't address exactly that question, but I'd like to know the answer. Many thanks for the helpful answers and your time, mY [1] https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com [2] https://lore.kernel.org/linux-security-module/CAHC9VhQUqJkWxhe5KukPOVQMnOhcOH5E+BJ4_b3Qn6edsL5YJQ@mail.gmail.com/T/#m6e6b01b196eac15a7ad99cf366614bbe60b8d9a2 > > However, I'm wondering if we could backport this patch down to v5.15 . > I guess not, so I'll need to remove this hook implementation for > Landlock, backport it to v5.15, and then you'll need to re-add this > check with this patch. At least it has been useful to spot this inode > issue, but it could still be useful to spot potential memory leaks with > a negligible performance impact. > > >> >> /* Super-block hooks */ >> @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd, >> } >> >> static struct security_hook_list landlock_hooks[] __ro_after_init = { >> - LSM_HOOK_INIT(inode_free_security, hook_inode_free_security), >> + LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu), >> >> LSM_HOOK_INIT(sb_delete, hook_sb_delete), >> LSM_HOOK_INIT(sb_mount, hook_sb_mount), >> diff --git a/security/security.c b/security/security.c >> index b52e81ac5526..bc6805f7332e 100644 >> --- a/security/security.c >> +++ b/security/security.c >> @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) >> >> static void inode_free_by_rcu(struct rcu_head *head) >> { >> - /* >> - * The rcu head is at the start of the inode blob >> - */ >> + /* The rcu head is at the start of the inode blob */ >> + call_void_hook(inode_free_security_rcu, head); > > For this to work, we need to extend the inode blob size (lbs_inode) with > sizeof(struct rcu_head). The current implementation override the > content of the blob with a new rcu_head. > >> kmem_cache_free(lsm_inode_cache, head); >> } >> >> @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) >> * security_inode_free() - Free an inode's LSM blob >> * @inode: the inode >> * >> - * Deallocate the inode security structure and set @inode->i_security to NULL. >> + * Release any LSM resources associated with @inode, although due to the >> + * inode's RCU protections it is possible that the resources will not be >> + * fully released until after the current RCU grace period has elapsed. >> + * >> + * It is important for LSMs to note that despite being present in a call to >> + * security_inode_free(), @inode may still be referenced in a VFS path walk >> + * and calls to security_inode_permission() may be made during, or after, >> + * a call to security_inode_free(). For this reason the inode->i_security >> + * field is released via a call_rcu() callback and any LSMs which need to >> + * retain inode state for use in security_inode_permission() should only >> + * release that state in the inode_free_security_rcu() LSM hook callback. >> */ >> void security_inode_free(struct inode *inode) >> { >> call_void_hook(inode_free_security, inode); >> - /* >> - * The inode may still be referenced in a path walk and >> - * a call to security_inode_permission() can be made >> - * after inode_free_security() is called. Ideally, the VFS >> - * wouldn't do this, but fixing that is a much harder >> - * job. For now, simply free the i_security via RCU, and >> - * leave the current inode->i_security pointer intact. >> - * The inode will be freed after the RCU grace period too. >> - */ >> if (inode->i_security) >> call_rcu((struct rcu_head *)inode->i_security, >> inode_free_by_rcu); > > We should have something like: > call_rcu(inode->i_security.rcu, inode_free_by_rcu); > >> -- >> 2.45.2 >> >> >
On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > On 10. 7. 2024 12:40, Mickaël Salaün wrote: > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > >> The LSM framework has an existing inode_free_security() hook which > >> is used by LSMs that manage state associated with an inode, but > >> due to the use of RCU to protect the inode, special care must be > >> taken to ensure that the LSMs do not fully release the inode state > >> until it is safe from a RCU perspective. > >> > >> This patch implements a new inode_free_security_rcu() implementation > >> hook which is called when it is safe to free the LSM's internal inode > >> state. Unfortunately, this new hook does not have access to the inode > >> itself as it may already be released, so the existing > >> inode_free_security() hook is retained for those LSMs which require > >> access to the inode. > >> > >> Signed-off-by: Paul Moore <paul@paul-moore.com> > > > > I like this new hook. It is definitely safer than the current approach. > > > > To make it more consistent, I think we should also rename > > security_inode_free() to security_inode_put() to highlight the fact that > > LSM implementations should not free potential pointers in this blob > > because they could still be dereferenced in a path walk. > > > >> --- > >> include/linux/lsm_hook_defs.h | 1 + > >> security/integrity/ima/ima.h | 2 +- > >> security/integrity/ima/ima_iint.c | 20 ++++++++------------ > >> security/integrity/ima/ima_main.c | 2 +- > >> security/landlock/fs.c | 9 ++++++--- > >> security/security.c | 26 +++++++++++++------------- > >> 6 files changed, 30 insertions(+), 30 deletions(-) ... > Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: > > 1) How does this patch close [1]? > As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." > Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, > i.e. referencing the inode in a VFS path walk while destroying it... > Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. The VFS folks can likely provide a better, or perhaps a more correct answer, but my understanding is that during the path walk the inode is protected by a RCU lock which allows for multiple threads to access the inode simultaneously; this could result in some cases where one thread is destroying the inode while another is accessing it. Changing this would require changes to the VFS code, and I'm not sure why you would want to change it anyway, the performance win of using RCU here is likely significant. > 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? I'm not an RCU expert, but I don't believe there are any guarantees that the inode_free_by_rcu() and the inode's own free routines are going to be called within the same RCU grace period (not really applicable as inode_free_by_rcu() isn't called *during* a grace period, but *after* the grace period of the associated security_inode_free() call). However, this patch does not rely on synchronization between the inode and inode LSM free routine in inode_free_by_rcu(); the inode_free_by_rcu() function and the new inode_free_security_rcu() LSM callback does not have a pointer to the inode, only the inode's LSM blob. I agree that it is a bit odd, but freeing the inode and inode's LSM blob independently of each other should not cause a problem so long as the inode is no longer in use (hence the RCU callbacks). > If not, can the security context be released earlier than the inode itself? Possibly, but it should only happen after the inode is no longer in use (the call_rcu () callback should ensure that we are past all of the currently executing RCU critical sections). > If yes, can be executed > inode_permission concurrently, leading to UAF of inode security context in security_inode_permission? I do not believe so, see the discussion above, but I welcome any corrections. > Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.) If fsnotify is affecting this negatively then I suspect that is a reason for much larger concern as I believe that would indicate a problem with fsnotify and the inode locking scheme.
On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote: > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > > On 10. 7. 2024 12:40, Mickaël Salaün wrote: > > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > > >> The LSM framework has an existing inode_free_security() hook which > > >> is used by LSMs that manage state associated with an inode, but > > >> due to the use of RCU to protect the inode, special care must be > > >> taken to ensure that the LSMs do not fully release the inode state > > >> until it is safe from a RCU perspective. > > >> > > >> This patch implements a new inode_free_security_rcu() implementation > > >> hook which is called when it is safe to free the LSM's internal inode > > >> state. Unfortunately, this new hook does not have access to the inode > > >> itself as it may already be released, so the existing > > >> inode_free_security() hook is retained for those LSMs which require > > >> access to the inode. > > >> > > >> Signed-off-by: Paul Moore <paul@paul-moore.com> > > > > > > I like this new hook. It is definitely safer than the current approach. > > > > > > To make it more consistent, I think we should also rename > > > security_inode_free() to security_inode_put() to highlight the fact that > > > LSM implementations should not free potential pointers in this blob > > > because they could still be dereferenced in a path walk. > > > > > >> --- > > >> include/linux/lsm_hook_defs.h | 1 + > > >> security/integrity/ima/ima.h | 2 +- > > >> security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > >> security/integrity/ima/ima_main.c | 2 +- > > >> security/landlock/fs.c | 9 ++++++--- > > >> security/security.c | 26 +++++++++++++------------- > > >> 6 files changed, 30 insertions(+), 30 deletions(-) > > ... > > > Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: > > > > 1) How does this patch close [1]? > > As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." > > Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, > > i.e. referencing the inode in a VFS path walk while destroying it... > > Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. > > The VFS folks can likely provide a better, or perhaps a more correct > answer, but my understanding is that during the path walk the inode is > protected by a RCU lock which allows for multiple threads to access > the inode simultaneously; this could result in some cases where one > thread is destroying the inode while another is accessing it. Shouldn't may_lookup() be checking the inode for (I_NEW | I_WILLFREE | I_FREE) so that it doesn't access an inode either not completely initialised or being evicted during the RCU path walk? All accesses to the VFS inode that don't have explicit reference counts have to do these checks... IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a fully validate reference count to the dentry or the inode at this point, so it seems accessing random objects attached to an inode that can be anywhere in the setup or teardown process isn't at all safe... -Dave.
On 22. 7. 2024 21:46, Paul Moore wrote: > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: >> On 10. 7. 2024 12:40, Mickaël Salaün wrote: >>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >>>> The LSM framework has an existing inode_free_security() hook which >>>> is used by LSMs that manage state associated with an inode, but >>>> due to the use of RCU to protect the inode, special care must be >>>> taken to ensure that the LSMs do not fully release the inode state >>>> until it is safe from a RCU perspective. >>>> >>>> This patch implements a new inode_free_security_rcu() implementation >>>> hook which is called when it is safe to free the LSM's internal inode >>>> state. Unfortunately, this new hook does not have access to the inode >>>> itself as it may already be released, so the existing >>>> inode_free_security() hook is retained for those LSMs which require >>>> access to the inode. >>>> >>>> Signed-off-by: Paul Moore <paul@paul-moore.com> >>> >>> I like this new hook. It is definitely safer than the current approach. >>> >>> To make it more consistent, I think we should also rename >>> security_inode_free() to security_inode_put() to highlight the fact that >>> LSM implementations should not free potential pointers in this blob >>> because they could still be dereferenced in a path walk. >>> >>>> --- >>>> include/linux/lsm_hook_defs.h | 1 + >>>> security/integrity/ima/ima.h | 2 +- >>>> security/integrity/ima/ima_iint.c | 20 ++++++++------------ >>>> security/integrity/ima/ima_main.c | 2 +- >>>> security/landlock/fs.c | 9 ++++++--- >>>> security/security.c | 26 +++++++++++++------------- >>>> 6 files changed, 30 insertions(+), 30 deletions(-) > > ... > >> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: >> >> 1) How does this patch close [1]? >> As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." >> Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, >> i.e. referencing the inode in a VFS path walk while destroying it... >> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. > > The VFS folks can likely provide a better, or perhaps a more correct > answer, but my understanding is that during the path walk the inode is > protected by a RCU lock which allows for multiple threads to access > the inode simultaneously; this could result in some cases where one > thread is destroying the inode while another is accessing it. > Changing this would require changes to the VFS code, and I'm not sure > why you would want to change it anyway, the performance win of using > RCU here is likely significant. > >> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? > > I'm not an RCU expert, but I don't believe there are any guarantees > that the inode_free_by_rcu() and the inode's own free routines are > going to be called within the same RCU grace period (not really > applicable as inode_free_by_rcu() isn't called *during* a grace > period, but *after* the grace period of the associated > security_inode_free() call). However, this patch does not rely on > synchronization between the inode and inode LSM free routine in > inode_free_by_rcu(); the inode_free_by_rcu() function and the new > inode_free_security_rcu() LSM callback does not have a pointer to the > inode, only the inode's LSM blob. I agree that it is a bit odd, but > freeing the inode and inode's LSM blob independently of each other > should not cause a problem so long as the inode is no longer in use > (hence the RCU callbacks). Paul, many thanks for your answer. I will try to clarify the issue, because fsnotify was a bad example. Here is the related code taken from v10. void security_inode_free(struct inode *inode) { call_void_hook(inode_free_security, inode); /* * The inode may still be referenced in a path walk and * a call to security_inode_permission() can be made * after inode_free_security() is called. Ideally, the VFS * wouldn't do this, but fixing that is a much harder * job. For now, simply free the i_security via RCU, and * leave the current inode->i_security pointer intact. * The inode will be freed after the RCU grace period too. */ if (inode->i_security) call_rcu((struct rcu_head *)inode->i_security, inode_free_by_rcu); } void __destroy_inode(struct inode *inode) { BUG_ON(inode_has_buffers(inode)); inode_detach_wb(inode); security_inode_free(inode); fsnotify_inode_delete(inode); locks_free_lock_context(inode); if (!inode->i_nlink) { WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0); atomic_long_dec(&inode->i_sb->s_remove_count); } #ifdef CONFIG_FS_POSIX_ACL if (inode->i_acl && !is_uncached_acl(inode->i_acl)) posix_acl_release(inode->i_acl); if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) posix_acl_release(inode->i_default_acl); #endif this_cpu_dec(nr_inodes); } static void destroy_inode(struct inode *inode) { const struct super_operations *ops = inode->i_sb->s_op; BUG_ON(!list_empty(&inode->i_lru)); __destroy_inode(inode); if (ops->destroy_inode) { ops->destroy_inode(inode); if (!ops->free_inode) return; } inode->free_inode = ops->free_inode; call_rcu(&inode->i_rcu, i_callback); } Yes, inode_free_by_rcu() is being called after the grace period of the associated security_inode_free(). i_callback() is also called after the grace period, but is it always the same grace period as in the case of inode_free_by_rcu()? If not in general, maybe it could be a problem. Explanation below. If there is a function call leading to the end of the grace period between call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state or another mechanism?), there will be a small time window, when the inode security context is released, but the inode itself not, because call_rcu(i_callback) was not called yet. So in that case each access to inode security blob leads to UAF. For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous access to the inode during path lookup may be performed. Note: I use destroy_inode() as *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general and RCU, too. In the previous message I only mentioned fsnotify, but it was only as an example. I think that destroy_inode() is a better example of the idea I wanted to express. I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be unpleasant to get another UAF in a production environment. And regarding the UAF in [1], it seems very strange to me. The object managed by Landlock was *not* dereferenced. There was access to the inode security blob itself. static void hook_inode_free_security(struct inode *const inode) { /* * All inodes must already have been untied from their object by * release_inode() or hook_sb_delete(). */ WARN_ON_ONCE(landlock_inode(inode)->object); } But security blob is released at the end of the grace period related to security_inode_free(): call_rcu(inode_free_by_rcu) is *after* invoking all registered inode_free_security hooks. The only place of releasing inode security blob I see in inode_free_by_rcu(). Thus, I think, there was another call of __destroy_inode(). Or general protection fault was not caused by UAF. Any ideas? Can someone explain it? I don't understand what and *how* happened. If Landlock had dereferenced the object it manages, this RFC could be the right one (if it were a dereference from a fast path lookup, of course). [1] https://lore.kernel.org/all/00000000000076ba3b0617f65cc8@google.com/ > >> If not, can the security context be released earlier than the inode itself? > > Possibly, but it should only happen after the inode is no longer in > use (the call_rcu () callback should ensure that we are past all of > the currently executing RCU critical sections). > >> If yes, can be executed >> inode_permission concurrently, leading to UAF of inode security context in security_inode_permission? > > I do not believe so, see the discussion above, but I welcome any corrections. > >> Can fsnotify affect this (leading to different RCU grace periods)? (Again probably a question for VFS people.) > > If fsnotify is affecting this negatively then I suspect that is a > reason for much larger concern as I believe that would indicate a > problem with fsnotify and the inode locking scheme. >
On 23. 7. 2024 5:34, Dave Chinner wrote: > On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote: >> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: >>> On 10. 7. 2024 12:40, Mickaël Salaün wrote: >>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >>>>> The LSM framework has an existing inode_free_security() hook which >>>>> is used by LSMs that manage state associated with an inode, but >>>>> due to the use of RCU to protect the inode, special care must be >>>>> taken to ensure that the LSMs do not fully release the inode state >>>>> until it is safe from a RCU perspective. >>>>> >>>>> This patch implements a new inode_free_security_rcu() implementation >>>>> hook which is called when it is safe to free the LSM's internal inode >>>>> state. Unfortunately, this new hook does not have access to the inode >>>>> itself as it may already be released, so the existing >>>>> inode_free_security() hook is retained for those LSMs which require >>>>> access to the inode. >>>>> >>>>> Signed-off-by: Paul Moore <paul@paul-moore.com> >>>> >>>> I like this new hook. It is definitely safer than the current approach. >>>> >>>> To make it more consistent, I think we should also rename >>>> security_inode_free() to security_inode_put() to highlight the fact that >>>> LSM implementations should not free potential pointers in this blob >>>> because they could still be dereferenced in a path walk. >>>> >>>>> --- >>>>> include/linux/lsm_hook_defs.h | 1 + >>>>> security/integrity/ima/ima.h | 2 +- >>>>> security/integrity/ima/ima_iint.c | 20 ++++++++------------ >>>>> security/integrity/ima/ima_main.c | 2 +- >>>>> security/landlock/fs.c | 9 ++++++--- >>>>> security/security.c | 26 +++++++++++++------------- >>>>> 6 files changed, 30 insertions(+), 30 deletions(-) >> >> ... >> >>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: >>> >>> 1) How does this patch close [1]? >>> As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." >>> Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, >>> i.e. referencing the inode in a VFS path walk while destroying it... >>> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. >> >> The VFS folks can likely provide a better, or perhaps a more correct >> answer, but my understanding is that during the path walk the inode is >> protected by a RCU lock which allows for multiple threads to access >> the inode simultaneously; this could result in some cases where one >> thread is destroying the inode while another is accessing it. > > Shouldn't may_lookup() be checking the inode for (I_NEW | > I_WILLFREE | I_FREE) so that it doesn't access an inode either not > completely initialised or being evicted during the RCU path walk? > All accesses to the VFS inode that don't have explicit reference > counts have to do these checks... > > IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a > fully validate reference count to the dentry or the inode at this > point, so it seems accessing random objects attached to an inode > that can be anywhere in the setup or teardown process isn't at all > safe... > > -Dave. Indeed, but maybe only VFS maintainers can give us the answer to why may_lookup() does not need at some point check for (I_NEW | I_WILL_FREE | I_FREEING). Thanks, mY
On Tue, Jul 23, 2024 at 01:34:44PM GMT, Dave Chinner wrote: > On Mon, Jul 22, 2024 at 03:46:36PM -0400, Paul Moore wrote: > > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > > > On 10. 7. 2024 12:40, Mickaël Salaün wrote: > > > > On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > > > >> The LSM framework has an existing inode_free_security() hook which > > > >> is used by LSMs that manage state associated with an inode, but > > > >> due to the use of RCU to protect the inode, special care must be > > > >> taken to ensure that the LSMs do not fully release the inode state > > > >> until it is safe from a RCU perspective. > > > >> > > > >> This patch implements a new inode_free_security_rcu() implementation > > > >> hook which is called when it is safe to free the LSM's internal inode > > > >> state. Unfortunately, this new hook does not have access to the inode > > > >> itself as it may already be released, so the existing > > > >> inode_free_security() hook is retained for those LSMs which require > > > >> access to the inode. > > > >> > > > >> Signed-off-by: Paul Moore <paul@paul-moore.com> > > > > > > > > I like this new hook. It is definitely safer than the current approach. > > > > > > > > To make it more consistent, I think we should also rename > > > > security_inode_free() to security_inode_put() to highlight the fact that > > > > LSM implementations should not free potential pointers in this blob > > > > because they could still be dereferenced in a path walk. > > > > > > > >> --- > > > >> include/linux/lsm_hook_defs.h | 1 + > > > >> security/integrity/ima/ima.h | 2 +- > > > >> security/integrity/ima/ima_iint.c | 20 ++++++++------------ > > > >> security/integrity/ima/ima_main.c | 2 +- > > > >> security/landlock/fs.c | 9 ++++++--- > > > >> security/security.c | 26 +++++++++++++------------- > > > >> 6 files changed, 30 insertions(+), 30 deletions(-) > > > > ... > > > > > Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: > > > > > > 1) How does this patch close [1]? > > > As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." > > > Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, > > > i.e. referencing the inode in a VFS path walk while destroying it... > > > Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. > > > > The VFS folks can likely provide a better, or perhaps a more correct > > answer, but my understanding is that during the path walk the inode is > > protected by a RCU lock which allows for multiple threads to access > > the inode simultaneously; this could result in some cases where one > > thread is destroying the inode while another is accessing it. > > Shouldn't may_lookup() be checking the inode for (I_NEW | > I_WILLFREE | I_FREE) so that it doesn't access an inode either not > completely initialised or being evicted during the RCU path walk? Going from memory since I don't have time to go really into the weeds. A non-completely initalised inode shouldn't appear in path lookup. Before the inode is attached to a dentry I_NEW would have been removed otherwise this is a bug. That can either happen via unlock_new_inode() and d_splice_alias() or in some cases directly via d_instantiate_new(). Concurrent inode lookup calls on the same inode (e.g., iget_locked() and friends) will sleep until I_NEW is cleared. > All accesses to the VFS inode that don't have explicit reference > counts have to do these checks... > > IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a > fully validate reference count to the dentry or the inode at this > point, so it seems accessing random objects attached to an inode > that can be anywhere in the setup or teardown process isn't at all > safe... may_lookup() cannot encounter inodes in random states. It will start from a well-known struct path and sample sequence counters for rename, mount, and dentry changes. Each component will be subject to checks after may_lookup() via these sequence counters to ensure that no change occurred that would invalidate the lookup just done. To be precise to ensure that no state could be reached via rcu that couldn't have been reached via ref walk. So may_lookup() may be called on something that's about to be freed (concurrent unlink on a directory that's empty that we're trying to create or lookup something nonexistent under) while we're looking at it but all the machinery is in place so that it will be detected and force a drop out of rcu and into reference walking mode. When may_lookup() calls inode_permission() it only calls into the filesystem itself if the filesystem has a custom i_op->permission() handler. And if it has to call into the filesystem it passes MAY_NOT_BLOCK to inform the filesystem about this. And in those cases the filesystem must ensure any additional data structures can safely be accessed under rcu_read_lock() (documented in path_lookup.rst). If the filesystem detects that it cannot safely handle this or detects that something is invalid it can return -ECHILD causing the VFS to drop out of rcu and into ref walking mode to redo the lookup. That may happen directly in may_lookup() it unconditionally happens in walk_component() when it's verified that the parent had no changes while we were looking at it. The same logic extends to security modules. Both selinux and smack handle MAY_NOT_BLOCK calls from security_inode_permission() with e.g., selinux returning -ECHILD in case the inode security context isn't properly initialized causing the VFS to drop into ref walking mode and allowing selinux to redo the initialization. Checking inode state flags isn't needed because the VFS already handles all of this via other means as e.g., sequence counters in various core structs. It also likely wouldn't help because we'd have to take locks to access i_state or sample i_state before calling into inode_permission() and then it could still change behind our back. It's also the wrong layer as we're dealing almost exclusively with dentries as the main data structure during lookup. Fwiw, a bunch of this is documented in path_lookup.rst, vfs.rst, and porting.rst. (I'm running out of time with other stuff so I want to point out that I can't really spend a lot more time on this thread tbh.)
On Tue, Jul 23, 2024 at 5:27 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > On 22. 7. 2024 21:46, Paul Moore wrote: > > On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: > >> On 10. 7. 2024 12:40, Mickaël Salaün wrote: > >>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: > >>>> The LSM framework has an existing inode_free_security() hook which > >>>> is used by LSMs that manage state associated with an inode, but > >>>> due to the use of RCU to protect the inode, special care must be > >>>> taken to ensure that the LSMs do not fully release the inode state > >>>> until it is safe from a RCU perspective. > >>>> > >>>> This patch implements a new inode_free_security_rcu() implementation > >>>> hook which is called when it is safe to free the LSM's internal inode > >>>> state. Unfortunately, this new hook does not have access to the inode > >>>> itself as it may already be released, so the existing > >>>> inode_free_security() hook is retained for those LSMs which require > >>>> access to the inode. > >>>> > >>>> Signed-off-by: Paul Moore <paul@paul-moore.com> > >>> > >>> I like this new hook. It is definitely safer than the current approach. > >>> > >>> To make it more consistent, I think we should also rename > >>> security_inode_free() to security_inode_put() to highlight the fact that > >>> LSM implementations should not free potential pointers in this blob > >>> because they could still be dereferenced in a path walk. > >>> > >>>> --- > >>>> include/linux/lsm_hook_defs.h | 1 + > >>>> security/integrity/ima/ima.h | 2 +- > >>>> security/integrity/ima/ima_iint.c | 20 ++++++++------------ > >>>> security/integrity/ima/ima_main.c | 2 +- > >>>> security/landlock/fs.c | 9 ++++++--- > >>>> security/security.c | 26 +++++++++++++------------- > >>>> 6 files changed, 30 insertions(+), 30 deletions(-) > > > > ... > > > >> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: > >> > >> 1) How does this patch close [1]? > >> As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." > >> Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, > >> i.e. referencing the inode in a VFS path walk while destroying it... > >> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. > > > > The VFS folks can likely provide a better, or perhaps a more correct > > answer, but my understanding is that during the path walk the inode is > > protected by a RCU lock which allows for multiple threads to access > > the inode simultaneously; this could result in some cases where one > > thread is destroying the inode while another is accessing it. > > Changing this would require changes to the VFS code, and I'm not sure > > why you would want to change it anyway, the performance win of using > > RCU here is likely significant. > > > >> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? > > > > I'm not an RCU expert, but I don't believe there are any guarantees > > that the inode_free_by_rcu() and the inode's own free routines are > > going to be called within the same RCU grace period (not really > > applicable as inode_free_by_rcu() isn't called *during* a grace > > period, but *after* the grace period of the associated > > security_inode_free() call). However, this patch does not rely on > > synchronization between the inode and inode LSM free routine in > > inode_free_by_rcu(); the inode_free_by_rcu() function and the new > > inode_free_security_rcu() LSM callback does not have a pointer to the > > inode, only the inode's LSM blob. I agree that it is a bit odd, but > > freeing the inode and inode's LSM blob independently of each other > > should not cause a problem so long as the inode is no longer in use > > (hence the RCU callbacks). > > Paul, many thanks for your answer. > > I will try to clarify the issue, because fsnotify was a bad example. > Here is the related code taken from v10. > > void security_inode_free(struct inode *inode) > { > call_void_hook(inode_free_security, inode); > /* > * The inode may still be referenced in a path walk and > * a call to security_inode_permission() can be made > * after inode_free_security() is called. Ideally, the VFS > * wouldn't do this, but fixing that is a much harder > * job. For now, simply free the i_security via RCU, and > * leave the current inode->i_security pointer intact. > * The inode will be freed after the RCU grace period too. > */ > if (inode->i_security) > call_rcu((struct rcu_head *)inode->i_security, > inode_free_by_rcu); > } > > void __destroy_inode(struct inode *inode) > { > BUG_ON(inode_has_buffers(inode)); > inode_detach_wb(inode); > security_inode_free(inode); > fsnotify_inode_delete(inode); > locks_free_lock_context(inode); > if (!inode->i_nlink) { > WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0); > atomic_long_dec(&inode->i_sb->s_remove_count); > } > > #ifdef CONFIG_FS_POSIX_ACL > if (inode->i_acl && !is_uncached_acl(inode->i_acl)) > posix_acl_release(inode->i_acl); > if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) > posix_acl_release(inode->i_default_acl); > #endif > this_cpu_dec(nr_inodes); > } > > static void destroy_inode(struct inode *inode) > { > const struct super_operations *ops = inode->i_sb->s_op; > > BUG_ON(!list_empty(&inode->i_lru)); > __destroy_inode(inode); > if (ops->destroy_inode) { > ops->destroy_inode(inode); > if (!ops->free_inode) > return; > } > inode->free_inode = ops->free_inode; > call_rcu(&inode->i_rcu, i_callback); > } > > Yes, inode_free_by_rcu() is being called after the grace period of the associated > security_inode_free(). i_callback() is also called after the grace period, but is it > always the same grace period as in the case of inode_free_by_rcu()? If not in general, > maybe it could be a problem. Explanation below. > > If there is a function call leading to the end of the grace period between > call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state > or another mechanism?), there will be a small time window, when the inode security > context is released, but the inode itself not, because call_rcu(i_callback) was not called > yet. So in that case each access to inode security blob leads to UAF. While it should be possible for the inode's LSM blob to be free'd prior to the inode itself, the RCU callback mechanism provided by call_rcu() should ensure that both the LSM's free routine and the inode's free routine happen at a point in time after the current RCU critical sections have lapsed and the inode is no longer being accessed. The LSM's inode_free_rcu callback can run independent of the inode's callback as it doesn't access the inode and if it does happen to run before the inode's RCU callback that should also be okay as we are past the original RCU critical sections and the inode should no longer be in use. If the inode is still in use by the time the LSM's RCU callback is triggered then there is a flaw in the inode RCU/locking/synchronization code. It is also worth mentioning that while this patch shuffles around some code at the LSM layer, the basic idea of the LSM using a RCU callback to free state associated with an inode is far from new. While that doesn't mean there isn't a problem, we have a few years of experience across a large number of systems, that would indicate this isn't a problem. > For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but > *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the > grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous > access to the inode during path lookup may be performed. Note: I use destroy_inode() as > *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general > and RCU, too. > > In the previous message I only mentioned fsnotify, but it was only as an example. > I think that destroy_inode() is a better example of the idea I wanted to express. > > I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be > unpleasant to get another UAF in a production environment. I'm open to there being another fix needed, or a change to this fix, but I don't believe the problems you are describing are possible. Of course it's very possible that I'm wrong, so if you are aware of an issue I would appreciate a concrete example explaining the code path and timeline between tasks A and B that would trigger the flaw ... and yes, patches are always welcome ;)
On Tue, Jul 23, 2024 at 11:19 AM Christian Brauner <brauner@kernel.org> wrote: > The same logic extends to security modules. Both selinux and smack > handle MAY_NOT_BLOCK calls from security_inode_permission() with e.g., > selinux returning -ECHILD in case the inode security context isn't > properly initialized causing the VFS to drop into ref walking mode and > allowing selinux to redo the initialization. Since we are talking mostly about the destruction of an inode, it is worth mentioning that the SELinux -ECHILD case that Christian is referring to isn't a common occurrence as SELinux only invalidates inode labels on network filesystems under certain circumstances (chase the security_inode_invalidate_secctx() hook). On most normal SELinux systems inodes are labeled as part of the creation process so long as a SELinux policy is loaded into the kernel; this does mean that there is a window during early boot where the inodes are in an invalid state, but they are properly initialized later (there are different ways this could happen). For local filesystems with inodes created after the SELinux policy is loaded, inodes have a valid SELinux label from their very creation up until their memory is released.
On Tue, Jul 23, 2024 at 05:19:40PM +0200, Christian Brauner wrote: > On Tue, Jul 23, 2024 at 01:34:44PM GMT, Dave Chinner wrote: > > All accesses to the VFS inode that don't have explicit reference > > counts have to do these checks... > > > > IIUC, at the may_lookup() point, the RCU pathwalk doesn't have a > > fully validate reference count to the dentry or the inode at this > > point, so it seems accessing random objects attached to an inode > > that can be anywhere in the setup or teardown process isn't at all > > safe... > > may_lookup() cannot encounter inodes in random states. It will start > from a well-known struct path and sample sequence counters for rename, > mount, and dentry changes. Each component will be subject to checks > after may_lookup() via these sequence counters to ensure that no change > occurred that would invalidate the lookup just done. To be precise to > ensure that no state could be reached via rcu that couldn't have been > reached via ref walk. > > So may_lookup() may be called on something that's about to be freed > (concurrent unlink on a directory that's empty that we're trying to > create or lookup something nonexistent under) while we're looking at it > but all the machinery is in place so that it will be detected and force > a drop out of rcu and into reference walking mode. Yes, but... > When may_lookup() calls inode_permission() it only calls into the > filesystem itself if the filesystem has a custom i_op->permission() > handler. And if it has to call into the filesystem it passes > MAY_NOT_BLOCK to inform the filesystem about this. And in those cases > the filesystem must ensure any additional data structures can safely be > accessed under rcu_read_lock() (documented in path_lookup.rst). The problem isn't the call into the filesystem - it's may_lookup() passing the inode to security modules where we dereference inode->i_security and assume that it is valid for the life of the object access being made. That's my point - if we have a lookup race and the inode is being destroyed at this point (i.e. I_FREEING is set) inode->i_security *is not valid* and should not be accessed by *anyone*. > Checking inode state flags isn't needed because the VFS already handles > all of this via other means as e.g., sequence counters in various core > structs. I don't think that is sufficient to avoid races with inode teardown. The inode can be destroyed between the initial dentry count sequence sampling and the "done processing, check that the seq count is unchanged" validation to solidify the path. We've seen this before with ->get_link fast path that stores the link name in inode->i_link, and both inode->i_link and ->get_link are accessed during RCU It is documented as such: If the filesystem stores the symlink target in ->i_link, the VFS may use it directly without calling ->get_link(); however, ->get_link() must still be provided. ->i_link must not be freed until after an RCU grace period. Writing to ->i_link post-iget() time requires a 'release' memory barrier. XFS doesn't use RCU mode ->get_link or use ->i_link anymore, because its has a corner case in it's inode life cycle since 2007 where it can recycle inodes before the RCU grace period expires. It took 15 years for this issue to be noticed, but the fix for this specific symptom is to not allow the VFS direct access to the underlying filesystem memory that does not follow VFS inode RCU lifecycle rules. There was another symptom of this issue - ->get_link changed (i.e. went to null) on certain types of inode reuse - and that caused pathwalk to panic on a NULL pointer. Ian posted this fix for the issue: https://lore.kernel.org/linux-xfs/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/ Which revalidates the dentry sequence number before calling ->get_link(). This clearly demonstrates we cannot rely on the existing pathwalk dentry sequence number checks to catch an inode concurrently moving into I_FREEING state and changing/freeing inode object state concurrently in a way that affects the pathwalk operation. My point here is not about XFS - my point is that every object attached to an inode that is accessed during a RCU pathwalk *must* follow the same rules as memory attached to inode->i_link. The only alternative to that (i.e. if we continue to free RCU pathwalk visible objects in the evict() path) is to prevent pathwalk from tripping over I_FREEING inodes. If we decide that every pathwalk accessed object attached to the inode needs to be RCU freed, then __destroy_inode() is the wrong place to be freeing them - i_callback() (the call_rcu() inode freeing callback) is the place to be freeing these objects. At this point, the subsystems that own the objects don't have to care about RCU at all - the objects have already gone through the necessary grace period that makes them safe to be freed immediately. That's a far better solution than forcing every LSM and FS developer to have to fully understand how pathwalk and inode lifecycles interact to get stuff right... > It also likely wouldn't help because we'd have to take locks to > access i_state or sample i_state before calling into inode_permission() > and then it could still change behind our back. It's also the wrong > layer as we're dealing almost exclusively with dentries as the main data > structure during lookup. Yup, I never said that's how we should fix the problem. All I'm stating is that pathwalk vs I_FREEING is currently not safe and that I_FREEING is detectable from the pathwalk side. This observation could help provide a solution, but it's almost certainly not the solution to the problem... -Dave.
On 23. 7. 2024 21:48, Paul Moore wrote: > On Tue, Jul 23, 2024 at 5:27 AM Matus Jokay <matus.jokay@stuba.sk> wrote: >> On 22. 7. 2024 21:46, Paul Moore wrote: >>> On Mon, Jul 22, 2024 at 8:30 AM Matus Jokay <matus.jokay@stuba.sk> wrote: >>>> On 10. 7. 2024 12:40, Mickaël Salaün wrote: >>>>> On Tue, Jul 09, 2024 at 10:40:30PM -0400, Paul Moore wrote: >>>>>> The LSM framework has an existing inode_free_security() hook which >>>>>> is used by LSMs that manage state associated with an inode, but >>>>>> due to the use of RCU to protect the inode, special care must be >>>>>> taken to ensure that the LSMs do not fully release the inode state >>>>>> until it is safe from a RCU perspective. >>>>>> >>>>>> This patch implements a new inode_free_security_rcu() implementation >>>>>> hook which is called when it is safe to free the LSM's internal inode >>>>>> state. Unfortunately, this new hook does not have access to the inode >>>>>> itself as it may already be released, so the existing >>>>>> inode_free_security() hook is retained for those LSMs which require >>>>>> access to the inode. >>>>>> >>>>>> Signed-off-by: Paul Moore <paul@paul-moore.com> >>>>> >>>>> I like this new hook. It is definitely safer than the current approach. >>>>> >>>>> To make it more consistent, I think we should also rename >>>>> security_inode_free() to security_inode_put() to highlight the fact that >>>>> LSM implementations should not free potential pointers in this blob >>>>> because they could still be dereferenced in a path walk. >>>>> >>>>>> --- >>>>>> include/linux/lsm_hook_defs.h | 1 + >>>>>> security/integrity/ima/ima.h | 2 +- >>>>>> security/integrity/ima/ima_iint.c | 20 ++++++++------------ >>>>>> security/integrity/ima/ima_main.c | 2 +- >>>>>> security/landlock/fs.c | 9 ++++++--- >>>>>> security/security.c | 26 +++++++++++++------------- >>>>>> 6 files changed, 30 insertions(+), 30 deletions(-) >>> >>> ... >>> >>>> Sorry for the questions, but for several weeks I can't find answers to two things related to this RFC: >>>> >>>> 1) How does this patch close [1]? >>>> As Mickaël pointed in [2], "It looks like security_inode_free() is called two times on the same inode." >>>> Indeed, it does not seem from the backtrace that it is a case of race between destroy_inode and inode_permission, >>>> i.e. referencing the inode in a VFS path walk while destroying it... >>>> Please, can anyone tell me how this situation could have happened? Maybe folks from VFS... I added them to the copy. >>> >>> The VFS folks can likely provide a better, or perhaps a more correct >>> answer, but my understanding is that during the path walk the inode is >>> protected by a RCU lock which allows for multiple threads to access >>> the inode simultaneously; this could result in some cases where one >>> thread is destroying the inode while another is accessing it. >>> Changing this would require changes to the VFS code, and I'm not sure >>> why you would want to change it anyway, the performance win of using >>> RCU here is likely significant. >>> >>>> 2) Is there a guarantee that inode_free_by_rcu and i_callback will be called within the same RCU grace period? >>> >>> I'm not an RCU expert, but I don't believe there are any guarantees >>> that the inode_free_by_rcu() and the inode's own free routines are >>> going to be called within the same RCU grace period (not really >>> applicable as inode_free_by_rcu() isn't called *during* a grace >>> period, but *after* the grace period of the associated >>> security_inode_free() call). However, this patch does not rely on >>> synchronization between the inode and inode LSM free routine in >>> inode_free_by_rcu(); the inode_free_by_rcu() function and the new >>> inode_free_security_rcu() LSM callback does not have a pointer to the >>> inode, only the inode's LSM blob. I agree that it is a bit odd, but >>> freeing the inode and inode's LSM blob independently of each other >>> should not cause a problem so long as the inode is no longer in use >>> (hence the RCU callbacks). >> >> Paul, many thanks for your answer. >> >> I will try to clarify the issue, because fsnotify was a bad example. >> Here is the related code taken from v10. >> >> void security_inode_free(struct inode *inode) >> { >> call_void_hook(inode_free_security, inode); >> /* >> * The inode may still be referenced in a path walk and >> * a call to security_inode_permission() can be made >> * after inode_free_security() is called. Ideally, the VFS >> * wouldn't do this, but fixing that is a much harder >> * job. For now, simply free the i_security via RCU, and >> * leave the current inode->i_security pointer intact. >> * The inode will be freed after the RCU grace period too. >> */ >> if (inode->i_security) >> call_rcu((struct rcu_head *)inode->i_security, >> inode_free_by_rcu); >> } >> >> void __destroy_inode(struct inode *inode) >> { >> BUG_ON(inode_has_buffers(inode)); >> inode_detach_wb(inode); >> security_inode_free(inode); >> fsnotify_inode_delete(inode); >> locks_free_lock_context(inode); >> if (!inode->i_nlink) { >> WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0); >> atomic_long_dec(&inode->i_sb->s_remove_count); >> } >> >> #ifdef CONFIG_FS_POSIX_ACL >> if (inode->i_acl && !is_uncached_acl(inode->i_acl)) >> posix_acl_release(inode->i_acl); >> if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) >> posix_acl_release(inode->i_default_acl); >> #endif >> this_cpu_dec(nr_inodes); >> } >> >> static void destroy_inode(struct inode *inode) >> { >> const struct super_operations *ops = inode->i_sb->s_op; >> >> BUG_ON(!list_empty(&inode->i_lru)); >> __destroy_inode(inode); >> if (ops->destroy_inode) { >> ops->destroy_inode(inode); >> if (!ops->free_inode) >> return; >> } >> inode->free_inode = ops->free_inode; >> call_rcu(&inode->i_rcu, i_callback); >> } >> >> Yes, inode_free_by_rcu() is being called after the grace period of the associated >> security_inode_free(). i_callback() is also called after the grace period, but is it >> always the same grace period as in the case of inode_free_by_rcu()? If not in general, >> maybe it could be a problem. Explanation below. >> >> If there is a function call leading to the end of the grace period between >> call_rcu(inode_free_by_rcu) and call_rcu(i_callback) (by reaching a CPU quiescent state >> or another mechanism?), there will be a small time window, when the inode security >> context is released, but the inode itself not, because call_rcu(i_callback) was not called >> yet. So in that case each access to inode security blob leads to UAF. > > While it should be possible for the inode's LSM blob to be free'd > prior to the inode itself, the RCU callback mechanism provided by > call_rcu() should ensure that both the LSM's free routine and the > inode's free routine happen at a point in time after the current RCU > critical sections have lapsed and the inode is no longer being It is questionable whether the "current RCU CS" refers to both functions together, see the diagram below. > accessed. The LSM's inode_free_rcu callback can run independent of > the inode's callback as it doesn't access the inode and if it does Agree, there are two independent calls as you described. > happen to run before the inode's RCU callback that should also be okay > as we are past the original RCU critical sections and the inode should > no longer be in use. If the inode is still in use by the time the I think the inode can be still in use in may_lookup() after the security RCU callback function, see below. > LSM's RCU callback is triggered then there is a flaw in the inode > RCU/locking/synchronization code. I don't think it is a flaw. It may be the use of the RCU mechanism with incorrect assumption, that both RCU callbacks belong to the common GP. > > It is also worth mentioning that while this patch shuffles around some > code at the LSM layer, the basic idea of the LSM using a RCU callback > to free state associated with an inode is far from new. While that > doesn't mean there isn't a problem, we have a few years of experience > across a large number of systems, that would indicate this isn't a > problem. I agree. But history only shows that it is very difficult to achieve this race. And yes, I agree that we may address this issue when it turns out to be relevant. > >> For example, see invoking ops->destroy_inode() after call_rcu(inode_free_by_rcu) but >> *before* call_rcu(i_callback). If destroy_inode() may sleep, can be reached end of the >> grace period? destroy_inode() is *before* call_rcu(i_callback), therefore simultaneous >> access to the inode during path lookup may be performed. Note: I use destroy_inode() as >> *an example* of the idea. I'm not expert at all in fsnotify, posix ACL, VFS in general >> and RCU, too. >> >> In the previous message I only mentioned fsnotify, but it was only as an example. >> I think that destroy_inode() is a better example of the idea I wanted to express. >> >> I repeat that I'm aware that this RFC does not aim to solve this issue. But it can be >> unpleasant to get another UAF in a production environment. > > I'm open to there being another fix needed, or a change to this fix, > but I don't believe the problems you are describing are possible. Of > course it's very possible that I'm wrong, so if you are aware of an > issue I would appreciate a concrete example explaining the code path > and timeline between tasks A and B that would trigger the flaw ... and > yes, patches are always welcome ;) > Oh patches... Even from the message from Dave it can be seen that the cooperation of people from VFS and some very good idea of a solution are needed. Of course, provided that the scheme below was correct. I would be very happy if someone could explain to me why this cannot be so! CPU related to RCU callbacks task A task B ================== ================================= ======================= ... __destroy_inode() ... security_inode_free() ... call_rcu(inode_free_by_rcu) ... ops->destroy_inode() // *suppose* may sleep // end of GP; // inode *can* be used as // i_callback does not // belong to this GP inode_free_by_rcu() ------------------------------------------------------------------------------------------------------ // start of another GP ... ... rcu_read_lock() call_rcu(i_callback) ... ... security_inode_permission() // <-- UAF ... rcu_read_unlock() ... // end of GP; // right now inode is not in use anymore i_callback() Why is it difficult to achieve this race? The GP (grace period) between two call_rcu() calls must come to an end. Again, I chose as an example of this situation destroy_inode() function. But there can be others in the code path, I really don't know. I looked at the destroy_inode() functions (kernel v10) and from a quick look I found overlayfs, which directly calls dput(), see [1]. If it is possible to force printk() to sleep, then it is possible to consider afs [2] and, under certain circumstances, ext4 [3]. After a deeper analysis, maybe even more. I think it is difficult to divide the GP exactly as this situation requires. That's probably why this hasn't appeared yet. Or it is impossible to achieve it. All the better. But that would require an audit of the code between our two call_rcu()'s, whether at some point it cannot come to the end of the GP. And I agree with the opinion that as long as this type of error has not yet occurred, we can just play possum. When it comes to the crunch, we can deal with it more deeply. [1] https://elixir.bootlin.com/linux/v6.10/source/fs/overlayfs/super.c#L181 [2] https://elixir.bootlin.com/linux/v6.10/source/fs/afs/super.c#L718 [3] https://elixir.bootlin.com/linux/v6.10/source/fs/ext4/super.c#L1448
> The problem isn't the call into the filesystem - it's may_lookup() > passing the inode to security modules where we dereference > inode->i_security and assume that it is valid for the life of the > object access being made. > > That's my point - if we have a lookup race and the inode is being > destroyed at this point (i.e. I_FREEING is set) inode->i_security > *is not valid* and should not be accessed by *anyone*. It's valid. The inode_free_security hooks unlist or deregister the context as e.g., selinux_inode_free_security() does. It's fine to access inode->i_security after the individual hooks have been called; before or within the delayed freeing of i_security. And selinux and bpf are the only cases where deregistration happens while also implementing or in the case of bpf allowing to implement security_inode_permission(). > If we decide that every pathwalk accessed object attached to the > inode needs to be RCU freed, then __destroy_inode() is the wrong > place to be freeing them - i_callback() (the call_rcu() inode The superblock might already be gone by the time free_inode() is called. And stuff like selinux accesses the superblock from inode->i_sb during security_inode_free_security(). So moving it in there isn't an option. It needs to be called before. If one wanted it in there the obvious way to do it would be to split deregistration and freeing into two hooks security_inode_deregister() and then move the rcu part of security_inode_free() into i_callback so it gets wasted together with the inode. But i_callback() isn't called for xfs and so the way it's currently done is so far the best.
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 8fd87f823d3a..abe6b0ef892a 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -114,6 +114,7 @@ LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask, unsigned int obj_type) LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode) LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode) +LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *) LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode, struct inode *dir, const struct qstr *qstr, struct xattr *xattrs, int *xattr_count) diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index 3e568126cd48..e2a2e4c7eab6 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -223,7 +223,7 @@ static inline void ima_inode_set_iint(const struct inode *inode, struct ima_iint_cache *ima_iint_find(struct inode *inode); struct ima_iint_cache *ima_inode_get(struct inode *inode); -void ima_inode_free(struct inode *inode); +void ima_inode_free_rcu(void *inode_sec); void __init ima_iintcache_init(void); extern const int read_idmap[]; diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c index e23412a2c56b..54480df90bdc 100644 --- a/security/integrity/ima/ima_iint.c +++ b/security/integrity/ima/ima_iint.c @@ -109,22 +109,18 @@ struct ima_iint_cache *ima_inode_get(struct inode *inode) } /** - * ima_inode_free - Called on inode free - * @inode: Pointer to the inode + * ima_inode_free_rcu - Called to free an inode via a RCU callback + * @inode_sec: The inode::i_security pointer * - * Free the iint associated with an inode. + * Free the IMA data associated with an inode. */ -void ima_inode_free(struct inode *inode) +void ima_inode_free_rcu(void *inode_sec) { - struct ima_iint_cache *iint; + struct ima_iint_cache **iint_p = inode_sec + ima_blob_sizes.lbs_inode; - if (!IS_IMA(inode)) - return; - - iint = ima_iint_find(inode); - ima_inode_set_iint(inode, NULL); - - ima_iint_free(iint); + /* *iint_p should be NULL if !IS_IMA(inode) */ + if (*iint_p) + ima_iint_free(*iint_p); } static void ima_iint_init_once(void *foo) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index f04f43af651c..5b3394864b21 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -1193,7 +1193,7 @@ static struct security_hook_list ima_hooks[] __ro_after_init = { #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request), #endif - LSM_HOOK_INIT(inode_free_security, ima_inode_free), + LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu), }; static const struct lsm_id ima_lsmid = { diff --git a/security/landlock/fs.c b/security/landlock/fs.c index 22d8b7c28074..f583f8cec345 100644 --- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -1198,13 +1198,16 @@ static int current_check_refer_path(struct dentry *const old_dentry, /* Inode hooks */ -static void hook_inode_free_security(struct inode *const inode) +static void hook_inode_free_security_rcu(void *inode_sec) { + struct landlock_inode_security *lisec; + /* * All inodes must already have been untied from their object by * release_inode() or hook_sb_delete(). */ - WARN_ON_ONCE(landlock_inode(inode)->object); + lisec = inode_sec + landlock_blob_sizes.lbs_inode; + WARN_ON_ONCE(lisec->object); } /* Super-block hooks */ @@ -1628,7 +1631,7 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd, } static struct security_hook_list landlock_hooks[] __ro_after_init = { - LSM_HOOK_INIT(inode_free_security, hook_inode_free_security), + LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu), LSM_HOOK_INIT(sb_delete, hook_sb_delete), LSM_HOOK_INIT(sb_mount, hook_sb_mount), diff --git a/security/security.c b/security/security.c index b52e81ac5526..bc6805f7332e 100644 --- a/security/security.c +++ b/security/security.c @@ -1596,9 +1596,8 @@ int security_inode_alloc(struct inode *inode) static void inode_free_by_rcu(struct rcu_head *head) { - /* - * The rcu head is at the start of the inode blob - */ + /* The rcu head is at the start of the inode blob */ + call_void_hook(inode_free_security_rcu, head); kmem_cache_free(lsm_inode_cache, head); } @@ -1606,20 +1605,21 @@ static void inode_free_by_rcu(struct rcu_head *head) * security_inode_free() - Free an inode's LSM blob * @inode: the inode * - * Deallocate the inode security structure and set @inode->i_security to NULL. + * Release any LSM resources associated with @inode, although due to the + * inode's RCU protections it is possible that the resources will not be + * fully released until after the current RCU grace period has elapsed. + * + * It is important for LSMs to note that despite being present in a call to + * security_inode_free(), @inode may still be referenced in a VFS path walk + * and calls to security_inode_permission() may be made during, or after, + * a call to security_inode_free(). For this reason the inode->i_security + * field is released via a call_rcu() callback and any LSMs which need to + * retain inode state for use in security_inode_permission() should only + * release that state in the inode_free_security_rcu() LSM hook callback. */ void security_inode_free(struct inode *inode) { call_void_hook(inode_free_security, inode); - /* - * The inode may still be referenced in a path walk and - * a call to security_inode_permission() can be made - * after inode_free_security() is called. Ideally, the VFS - * wouldn't do this, but fixing that is a much harder - * job. For now, simply free the i_security via RCU, and - * leave the current inode->i_security pointer intact. - * The inode will be freed after the RCU grace period too. - */ if (inode->i_security) call_rcu((struct rcu_head *)inode->i_security, inode_free_by_rcu);
The LSM framework has an existing inode_free_security() hook which is used by LSMs that manage state associated with an inode, but due to the use of RCU to protect the inode, special care must be taken to ensure that the LSMs do not fully release the inode state until it is safe from a RCU perspective. This patch implements a new inode_free_security_rcu() implementation hook which is called when it is safe to free the LSM's internal inode state. Unfortunately, this new hook does not have access to the inode itself as it may already be released, so the existing inode_free_security() hook is retained for those LSMs which require access to the inode. Signed-off-by: Paul Moore <paul@paul-moore.com> --- include/linux/lsm_hook_defs.h | 1 + security/integrity/ima/ima.h | 2 +- security/integrity/ima/ima_iint.c | 20 ++++++++------------ security/integrity/ima/ima_main.c | 2 +- security/landlock/fs.c | 9 ++++++--- security/security.c | 26 +++++++++++++------------- 6 files changed, 30 insertions(+), 30 deletions(-)