Message ID | 8a8b4385143d66feec39e3925a399c118846a686.1701281422.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: cache that we don't have security.capability set | expand |
On Wed, Nov 29, 2023 at 6:10 PM Josef Bacik <josef@toxicpanda.com> wrote: > > When profiling a workload I noticed we were constantly calling getxattr. > These were mostly coming from __remove_privs, which will lookup if > security.capability exists to remove it. However instrumenting getxattr > showed we get called nearly constantly on an idle machine on a lot of > accesses. > > These are wasteful and not free. Other security LSM's have a way to > cache their results, but capability doesn't have this, so it's asking us > all the time for the xattr. > > Fix this by setting a flag in our inode that it doesn't have a > security.capability xattr. We set this on new inodes and after a failed > lookup of security.capability. If we set this xattr at all we'll clear > the flag. > > I haven't found a test in fsperf that this makes a visible difference > on, but I assume fs_mark related tests would show it clearly. This is a > perf report output of the smallfiles100k run where it shows 20% of our > time spent in __remove_privs because we're looking up the non-existent > xattr. > > --21.86%--btrfs_write_check.constprop.0 > --21.62%--__file_remove_privs > --21.55%--security_inode_need_killpriv > --21.54%--cap_inode_need_killpriv > --21.53%--__vfs_getxattr > --20.89%--btrfs_getxattr > > Obviously this is just CPU time in a mostly IO bound test, so the actual > effect of removing this callchain is minimal. However in just normal > testing of an idle system tracing showed around 100 getxattr calls per > minute, and with this patch there are 0. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good, thanks. > --- > fs/btrfs/btrfs_inode.h | 1 + > fs/btrfs/inode.c | 7 +++++ > fs/btrfs/xattr.c | 59 ++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 65 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > index 5572ae52444e..de9f71743b6b 100644 > --- a/fs/btrfs/btrfs_inode.h > +++ b/fs/btrfs/btrfs_inode.h > @@ -69,6 +69,7 @@ enum { > BTRFS_INODE_VERITY_IN_PROGRESS, > /* Set when this inode is a free space inode. */ > BTRFS_INODE_FREE_SPACE_INODE, > + BTRFS_INODE_NO_CAP_XATTR, > }; > > /* in memory btrfs inode */ > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 096b3004a19f..f8647d8271b7 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6225,6 +6225,13 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, > BTRFS_I(inode)->generation = trans->transid; > inode->i_generation = BTRFS_I(inode)->generation; > > + /* > + * We don't have any capability xattrs set here yet, shortcut any > + * queries for the xattrs here. If we add them later via the inode > + * security init path or any other path this flag will be cleared. > + */ > + set_bit(BTRFS_INODE_NO_CAP_XATTR, &BTRFS_I(inode)->runtime_flags); > + > /* > * Subvolumes don't inherit flags from their parent directory. > * Originally this was probably by accident, but we probably can't > diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c > index 3cf236fb40a4..caf8de1158b9 100644 > --- a/fs/btrfs/xattr.c > +++ b/fs/btrfs/xattr.c > @@ -382,6 +382,56 @@ static int btrfs_xattr_handler_set(const struct xattr_handler *handler, > return btrfs_setxattr_trans(inode, name, buffer, size, flags); > } > > +static int btrfs_xattr_handler_get_security(const struct xattr_handler *handler, > + struct dentry *unused, > + struct inode *inode, > + const char *name, void *buffer, > + size_t size) > +{ > + int ret; > + bool is_cap = false; > + > + name = xattr_full_name(handler, name); > + > + /* > + * security.capability doesn't cache the results, so calls into us > + * constantly to see if there's a capability xattr. Cache the result > + * here in order to avoid wasting time doing lookups for xattrs we know > + * don't exist. > + */ > + if (!strcmp(name, XATTR_NAME_CAPS)) { > + is_cap = true; > + if (test_bit(BTRFS_INODE_NO_CAP_XATTR, > + &BTRFS_I(inode)->runtime_flags)) > + return -ENODATA; > + } > + > + ret = btrfs_getxattr(inode, name, buffer, size); > + if (ret == -ENODATA && is_cap) > + set_bit(BTRFS_INODE_NO_CAP_XATTR, > + &BTRFS_I(inode)->runtime_flags); > + return ret; > +} > + > +static int btrfs_xattr_handler_set_security(const struct xattr_handler *handler, > + struct mnt_idmap *idmap, > + struct dentry *unused, > + struct inode *inode, > + const char *name, > + const void *buffer, > + size_t size, int flags) > +{ > + if (btrfs_root_readonly(BTRFS_I(inode)->root)) > + return -EROFS; > + > + name = xattr_full_name(handler, name); > + if (!strcmp(name, XATTR_NAME_CAPS)) > + clear_bit(BTRFS_INODE_NO_CAP_XATTR, > + &BTRFS_I(inode)->runtime_flags); > + > + return btrfs_setxattr_trans(inode, name, buffer, size, flags); > +} > + > static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler, > struct mnt_idmap *idmap, > struct dentry *unused, struct inode *inode, > @@ -420,8 +470,8 @@ static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler, > > static const struct xattr_handler btrfs_security_xattr_handler = { > .prefix = XATTR_SECURITY_PREFIX, > - .get = btrfs_xattr_handler_get, > - .set = btrfs_xattr_handler_set, > + .get = btrfs_xattr_handler_get_security, > + .set = btrfs_xattr_handler_set_security, > }; > > static const struct xattr_handler btrfs_trusted_xattr_handler = { > @@ -473,6 +523,11 @@ static int btrfs_initxattrs(struct inode *inode, > } > strcpy(name, XATTR_SECURITY_PREFIX); > strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name); > + > + if (!strcmp(name, XATTR_NAME_CAPS)) > + clear_bit(BTRFS_INODE_NO_CAP_XATTR, > + &BTRFS_I(inode)->runtime_flags); > + > err = btrfs_setxattr(trans, inode, name, xattr->value, > xattr->value_len, 0); > kfree(name); > -- > 2.41.0 > >
On Wed, Nov 29, 2023 at 01:10:31PM -0500, Josef Bacik wrote: > When profiling a workload I noticed we were constantly calling getxattr. > These were mostly coming from __remove_privs, which will lookup if > security.capability exists to remove it. However instrumenting getxattr > showed we get called nearly constantly on an idle machine on a lot of > accesses. > > These are wasteful and not free. Other security LSM's have a way to > cache their results, but capability doesn't have this, so it's asking us > all the time for the xattr. > > Fix this by setting a flag in our inode that it doesn't have a > security.capability xattr. We set this on new inodes and after a failed > lookup of security.capability. If we set this xattr at all we'll clear > the flag. > > I haven't found a test in fsperf that this makes a visible difference > on, but I assume fs_mark related tests would show it clearly. This is a > perf report output of the smallfiles100k run where it shows 20% of our > time spent in __remove_privs because we're looking up the non-existent > xattr. > > --21.86%--btrfs_write_check.constprop.0 > --21.62%--__file_remove_privs > --21.55%--security_inode_need_killpriv > --21.54%--cap_inode_need_killpriv > --21.53%--__vfs_getxattr > --20.89%--btrfs_getxattr > > Obviously this is just CPU time in a mostly IO bound test, so the actual > effect of removing this callchain is minimal. However in just normal > testing of an idle system tracing showed around 100 getxattr calls per > minute, and with this patch there are 0. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Added to misc-next, thanks. > --- > fs/btrfs/btrfs_inode.h | 1 + > fs/btrfs/inode.c | 7 +++++ > fs/btrfs/xattr.c | 59 ++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 65 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > index 5572ae52444e..de9f71743b6b 100644 > --- a/fs/btrfs/btrfs_inode.h > +++ b/fs/btrfs/btrfs_inode.h > @@ -69,6 +69,7 @@ enum { > BTRFS_INODE_VERITY_IN_PROGRESS, > /* Set when this inode is a free space inode. */ > BTRFS_INODE_FREE_SPACE_INODE, > + BTRFS_INODE_NO_CAP_XATTR, I've added a comment > }; > > /* in memory btrfs inode */ > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 096b3004a19f..f8647d8271b7 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6225,6 +6225,13 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, > BTRFS_I(inode)->generation = trans->transid; > inode->i_generation = BTRFS_I(inode)->generation; > > + /* > + * We don't have any capability xattrs set here yet, shortcut any > + * queries for the xattrs here. If we add them later via the inode > + * security init path or any other path this flag will be cleared. > + */ > + set_bit(BTRFS_INODE_NO_CAP_XATTR, &BTRFS_I(inode)->runtime_flags); > + > /* > * Subvolumes don't inherit flags from their parent directory. > * Originally this was probably by accident, but we probably can't > diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c > index 3cf236fb40a4..caf8de1158b9 100644 > --- a/fs/btrfs/xattr.c > +++ b/fs/btrfs/xattr.c > @@ -382,6 +382,56 @@ static int btrfs_xattr_handler_set(const struct xattr_handler *handler, > return btrfs_setxattr_trans(inode, name, buffer, size, flags); > } > > +static int btrfs_xattr_handler_get_security(const struct xattr_handler *handler, > + struct dentry *unused, > + struct inode *inode, > + const char *name, void *buffer, > + size_t size) > +{ > + int ret; > + bool is_cap = false; > + > + name = xattr_full_name(handler, name); > + > + /* > + * security.capability doesn't cache the results, so calls into us > + * constantly to see if there's a capability xattr. Cache the result > + * here in order to avoid wasting time doing lookups for xattrs we know > + * don't exist. > + */ > + if (!strcmp(name, XATTR_NAME_CAPS)) { Please use "strcmp(...) == 0" everywhere, that way it reads as that the strings match and one does not flip the logic when there's "!strcmp(...)".
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 5572ae52444e..de9f71743b6b 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -69,6 +69,7 @@ enum { BTRFS_INODE_VERITY_IN_PROGRESS, /* Set when this inode is a free space inode. */ BTRFS_INODE_FREE_SPACE_INODE, + BTRFS_INODE_NO_CAP_XATTR, }; /* in memory btrfs inode */ diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 096b3004a19f..f8647d8271b7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6225,6 +6225,13 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)->generation = trans->transid; inode->i_generation = BTRFS_I(inode)->generation; + /* + * We don't have any capability xattrs set here yet, shortcut any + * queries for the xattrs here. If we add them later via the inode + * security init path or any other path this flag will be cleared. + */ + set_bit(BTRFS_INODE_NO_CAP_XATTR, &BTRFS_I(inode)->runtime_flags); + /* * Subvolumes don't inherit flags from their parent directory. * Originally this was probably by accident, but we probably can't diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c index 3cf236fb40a4..caf8de1158b9 100644 --- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -382,6 +382,56 @@ static int btrfs_xattr_handler_set(const struct xattr_handler *handler, return btrfs_setxattr_trans(inode, name, buffer, size, flags); } +static int btrfs_xattr_handler_get_security(const struct xattr_handler *handler, + struct dentry *unused, + struct inode *inode, + const char *name, void *buffer, + size_t size) +{ + int ret; + bool is_cap = false; + + name = xattr_full_name(handler, name); + + /* + * security.capability doesn't cache the results, so calls into us + * constantly to see if there's a capability xattr. Cache the result + * here in order to avoid wasting time doing lookups for xattrs we know + * don't exist. + */ + if (!strcmp(name, XATTR_NAME_CAPS)) { + is_cap = true; + if (test_bit(BTRFS_INODE_NO_CAP_XATTR, + &BTRFS_I(inode)->runtime_flags)) + return -ENODATA; + } + + ret = btrfs_getxattr(inode, name, buffer, size); + if (ret == -ENODATA && is_cap) + set_bit(BTRFS_INODE_NO_CAP_XATTR, + &BTRFS_I(inode)->runtime_flags); + return ret; +} + +static int btrfs_xattr_handler_set_security(const struct xattr_handler *handler, + struct mnt_idmap *idmap, + struct dentry *unused, + struct inode *inode, + const char *name, + const void *buffer, + size_t size, int flags) +{ + if (btrfs_root_readonly(BTRFS_I(inode)->root)) + return -EROFS; + + name = xattr_full_name(handler, name); + if (!strcmp(name, XATTR_NAME_CAPS)) + clear_bit(BTRFS_INODE_NO_CAP_XATTR, + &BTRFS_I(inode)->runtime_flags); + + return btrfs_setxattr_trans(inode, name, buffer, size, flags); +} + static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler, struct mnt_idmap *idmap, struct dentry *unused, struct inode *inode, @@ -420,8 +470,8 @@ static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler, static const struct xattr_handler btrfs_security_xattr_handler = { .prefix = XATTR_SECURITY_PREFIX, - .get = btrfs_xattr_handler_get, - .set = btrfs_xattr_handler_set, + .get = btrfs_xattr_handler_get_security, + .set = btrfs_xattr_handler_set_security, }; static const struct xattr_handler btrfs_trusted_xattr_handler = { @@ -473,6 +523,11 @@ static int btrfs_initxattrs(struct inode *inode, } strcpy(name, XATTR_SECURITY_PREFIX); strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name); + + if (!strcmp(name, XATTR_NAME_CAPS)) + clear_bit(BTRFS_INODE_NO_CAP_XATTR, + &BTRFS_I(inode)->runtime_flags); + err = btrfs_setxattr(trans, inode, name, xattr->value, xattr->value_len, 0); kfree(name);
When profiling a workload I noticed we were constantly calling getxattr. These were mostly coming from __remove_privs, which will lookup if security.capability exists to remove it. However instrumenting getxattr showed we get called nearly constantly on an idle machine on a lot of accesses. These are wasteful and not free. Other security LSM's have a way to cache their results, but capability doesn't have this, so it's asking us all the time for the xattr. Fix this by setting a flag in our inode that it doesn't have a security.capability xattr. We set this on new inodes and after a failed lookup of security.capability. If we set this xattr at all we'll clear the flag. I haven't found a test in fsperf that this makes a visible difference on, but I assume fs_mark related tests would show it clearly. This is a perf report output of the smallfiles100k run where it shows 20% of our time spent in __remove_privs because we're looking up the non-existent xattr. --21.86%--btrfs_write_check.constprop.0 --21.62%--__file_remove_privs --21.55%--security_inode_need_killpriv --21.54%--cap_inode_need_killpriv --21.53%--__vfs_getxattr --20.89%--btrfs_getxattr Obviously this is just CPU time in a mostly IO bound test, so the actual effect of removing this callchain is minimal. However in just normal testing of an idle system tracing showed around 100 getxattr calls per minute, and with this patch there are 0. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/inode.c | 7 +++++ fs/btrfs/xattr.c | 59 ++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 65 insertions(+), 2 deletions(-)