Message ID | 20220608171741.3875418-9-shr@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | io-uring/xfs: support async buffered writes | expand |
On Wed, Jun 08, 2022 at 10:17:35AM -0700, Stefan Roesch wrote: > This adds the function __remove_file_privs, which allows the caller to > pass the kiocb flags parameter. > > No intended functional changes in this patch. > > Signed-off-by: Stefan Roesch <shr@fb.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Jan Kara <jack@suse.cz> > --- Looks good to me, Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Greeting, FYI, we noticed a -4.3% regression of phoronix-test-suite.fio.SequentialWrite.Sync.Yes.No.4KB.DefaultTestDirectory.mb_s due to commit: commit: b6c81e63ec1e668f8df716a91c5386074411dbbe ("[PATCH v8 08/14] fs: add __remove_file_privs() with flags parameter") url: https://github.com/intel-lab-lkp/linux/commits/Stefan-Roesch/io-uring-xfs-support-async-buffered-writes/20220609-013837 patch link: https://lore.kernel.org/io-uring/20220608171741.3875418-9-shr@fb.com in testcase: phoronix-test-suite on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory with following parameters: test: fio-1.14.1 option_a: Sequential Write option_b: Sync option_c: Yes option_d: No option_e: 4KB option_f: Default Test Directory cpufreq_governor: performance ucode: 0x500320a test-description: The Phoronix Test Suite is the most comprehensive testing and benchmarking platform available that provides an extensible framework for which new tests can be easily added. test-url: http://www.phoronix-test-suite.com/ If you fix the issue, kindly add following tag Reported-by: kernel test robot <oliver.sang@intel.com> Details are as below: --------------------------------------------------------------------------------------------------> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests sudo bin/lkp install job.yaml # job file is attached in this email bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run sudo bin/lkp run generated-yaml-file # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state. ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/option_d/option_e/option_f/rootfs/tbox_group/test/testcase/ucode: gcc-11/performance/x86_64-rhel-8.3/Sequential Write/Sync/Yes/No/4KB/Default Test Directory/debian-x86_64-phoronix/lkp-csl-2sp7/fio-1.14.1/phoronix-test-suite/0x500320a commit: 003096ce91 ("fs: Add check for async buffered writes to generic_write_checks") b6c81e63ec ("fs: add __remove_file_privs() with flags parameter") 003096ce910675ce b6c81e63ec1e668f8df716a91c5 ---------------- --------------------------- %stddev %change %stddev \ | \ 245666 -4.3% 235222 phoronix-test-suite.fio.SequentialWrite.Sync.Yes.No.4KB.DefaultTestDirectory.iops 960.00 -4.3% 919.17 phoronix-test-suite.fio.SequentialWrite.Sync.Yes.No.4KB.DefaultTestDirectory.mb_s 1.51e+08 -5.5% 1.427e+08 phoronix-test-suite.time.file_system_outputs 256970 ± 9% +44.3% 370918 ± 11% meminfo.Dirty 267660 ± 11% +39.6% 373743 ± 11% numa-meminfo.node0.Dirty 610732 -4.5% 583080 vmstat.io.bo 67160 ± 11% +38.3% 92904 ± 11% numa-vmstat.node0.nr_dirty 79263 ± 8% +33.0% 105418 ± 10% numa-vmstat.node0.nr_zone_write_pending 32.67 +2.5% 33.48 perf-stat.i.major-faults 5211 +2.3% 5331 perf-stat.i.minor-faults 3509036 -7.2% 3255351 ± 3% perf-stat.i.node-stores 5244 +2.3% 5364 perf-stat.i.page-faults 32.44 +2.5% 33.24 perf-stat.ps.major-faults 5173 +2.3% 5292 perf-stat.ps.minor-faults 3477114 -7.2% 3225288 ± 3% perf-stat.ps.node-stores 5206 +2.3% 5324 perf-stat.ps.page-faults 18872500 -5.5% 17841229 proc-vmstat.nr_dirtied 65070 ± 9% +42.8% 92916 ± 11% proc-vmstat.nr_dirty 827164 +3.2% 853524 proc-vmstat.nr_file_pages 386220 +6.9% 412691 ± 2% proc-vmstat.nr_inactive_file 29963 +2.5% 30724 proc-vmstat.nr_slab_reclaimable 18830707 -5.8% 17734737 proc-vmstat.nr_written 386220 +6.9% 412691 ± 2% proc-vmstat.nr_zone_inactive_file 77263 ± 6% +35.2% 104462 ± 9% proc-vmstat.nr_zone_write_pending 23867810 ± 3% -8.0% 21964544 proc-vmstat.numa_hit 22976263 -4.8% 21871403 proc-vmstat.numa_local 22987067 -4.6% 21929083 proc-vmstat.pgalloc_normal 22684993 -5.0% 21562034 proc-vmstat.pgfree 75323098 -5.8% 70939215 proc-vmstat.pgpgout 9889 ± 9% -17.4% 8171 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev 9.45 ± 51% +247.3% 32.81 ± 72% sched_debug.cfs_rq:/.removed.load_avg.avg 53.83 ± 25% +159.7% 139.81 ± 54% sched_debug.cfs_rq:/.removed.load_avg.stddev 4.30 ± 52% +260.1% 15.49 ± 76% sched_debug.cfs_rq:/.removed.runnable_avg.avg 24.46 ± 33% +172.5% 66.65 ± 57% sched_debug.cfs_rq:/.removed.runnable_avg.stddev 4.30 ± 52% +260.1% 15.49 ± 76% sched_debug.cfs_rq:/.removed.util_avg.avg 24.46 ± 33% +172.5% 66.65 ± 57% sched_debug.cfs_rq:/.removed.util_avg.stddev 820.61 ± 7% +18.9% 975.33 ± 7% sched_debug.cfs_rq:/.runnable_avg.max 9891 ± 9% -17.4% 8171 ± 10% sched_debug.cfs_rq:/.spread0.stddev 818.56 ± 7% +19.1% 975.06 ± 7% sched_debug.cfs_rq:/.util_avg.max 11.42 ± 27% +92.0% 21.92 ± 33% sched_debug.cfs_rq:/.util_est_enqueued.avg 427.50 ± 24% +84.2% 787.44 ± 30% sched_debug.cfs_rq:/.util_est_enqueued.max 58.65 ± 22% +79.2% 105.07 ± 29% sched_debug.cfs_rq:/.util_est_enqueued.stddev 0.00 +0.7 0.74 ± 11% perf-profile.calltrace.cycles-pp.__vfs_getxattr.cap_inode_need_killpriv.security_inode_need_killpriv.__file_remove_privs.file_modified 0.00 +0.8 0.80 ± 9% perf-profile.calltrace.cycles-pp.cap_inode_need_killpriv.security_inode_need_killpriv.__file_remove_privs.file_modified.ext4_buffered_write_iter 0.00 +0.8 0.84 ± 11% perf-profile.calltrace.cycles-pp.security_inode_need_killpriv.__file_remove_privs.file_modified.ext4_buffered_write_iter.new_sync_write 0.00 +0.9 0.90 ± 9% perf-profile.calltrace.cycles-pp.__file_remove_privs.file_modified.ext4_buffered_write_iter.new_sync_write.vfs_write 0.00 +0.9 0.91 ± 9% perf-profile.calltrace.cycles-pp.file_modified.ext4_buffered_write_iter.new_sync_write.vfs_write.ksys_write 0.40 ± 10% -0.1 0.33 ± 13% perf-profile.children.cycles-pp.jbd2_journal_grab_journal_head 0.26 ± 18% -0.1 0.20 ± 13% perf-profile.children.cycles-pp.file_update_time 0.12 ± 8% -0.0 0.08 ± 20% perf-profile.children.cycles-pp.xa_get_order 0.09 ± 16% +0.0 0.12 ± 13% perf-profile.children.cycles-pp.do_filp_open 0.09 ± 17% +0.0 0.12 ± 13% perf-profile.children.cycles-pp.path_openat 0.00 +0.1 0.08 ± 16% perf-profile.children.cycles-pp.up_read 0.00 +0.1 0.09 ± 30% perf-profile.children.cycles-pp.shmem_xattr_handler_get 0.00 +0.1 0.11 ± 31% perf-profile.children.cycles-pp.strlen 0.00 +0.1 0.13 ± 27% perf-profile.children.cycles-pp.simple_xattr_get 0.00 +0.2 0.19 ± 17% perf-profile.children.cycles-pp.down_read 0.00 +0.4 0.36 ± 9% perf-profile.children.cycles-pp.xattr_resolve_name 0.00 +0.4 0.44 ± 11% perf-profile.children.cycles-pp.ext4_xattr_get 0.02 ±141% +0.9 0.91 ± 9% perf-profile.children.cycles-pp.file_modified 0.00 +1.1 1.14 ± 10% perf-profile.children.cycles-pp.__vfs_getxattr 0.00 +1.2 1.22 ± 10% perf-profile.children.cycles-pp.cap_inode_need_killpriv 0.00 +1.3 1.26 ± 10% perf-profile.children.cycles-pp.security_inode_need_killpriv 0.00 +1.4 1.37 ± 10% perf-profile.children.cycles-pp.__file_remove_privs 0.40 ± 11% -0.1 0.32 ± 12% perf-profile.self.cycles-pp.jbd2_journal_grab_journal_head 0.16 ± 19% -0.1 0.09 ± 26% perf-profile.self.cycles-pp.ext4_da_write_begin 0.22 ± 10% -0.1 0.16 ± 14% perf-profile.self.cycles-pp.create_empty_buffers 0.00 +0.1 0.08 ± 16% perf-profile.self.cycles-pp.up_read 0.00 +0.1 0.08 ± 16% perf-profile.self.cycles-pp.ext4_xattr_get 0.00 +0.1 0.09 ± 15% perf-profile.self.cycles-pp.down_read 0.00 +0.1 0.09 ± 22% perf-profile.self.cycles-pp.__file_remove_privs 0.00 +0.1 0.09 ± 22% perf-profile.self.cycles-pp.cap_inode_need_killpriv 0.00 +0.1 0.10 ± 36% perf-profile.self.cycles-pp.strlen 0.00 +0.4 0.35 ± 9% perf-profile.self.cycles-pp.xattr_resolve_name Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
diff --git a/fs/inode.c b/fs/inode.c index 9d9b422504d1..ac1cf5aa78c8 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2010,36 +2010,43 @@ static int __remove_privs(struct user_namespace *mnt_userns, return notify_change(mnt_userns, dentry, &newattrs, NULL); } -/* - * Remove special file priviledges (suid, capabilities) when file is written - * to or truncated. - */ -int file_remove_privs(struct file *file) +static int __file_remove_privs(struct file *file, unsigned int flags) { struct dentry *dentry = file_dentry(file); struct inode *inode = file_inode(file); + int error; int kill; - int error = 0; - /* - * Fast path for nothing security related. - * As well for non-regular files, e.g. blkdev inodes. - * For example, blkdev_write_iter() might get here - * trying to remove privs which it is not allowed to. - */ if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode)) return 0; kill = dentry_needs_remove_privs(dentry); - if (kill < 0) + if (kill <= 0) return kill; - if (kill) - error = __remove_privs(file_mnt_user_ns(file), dentry, kill); + + if (flags & IOCB_NOWAIT) + return -EAGAIN; + + error = __remove_privs(file_mnt_user_ns(file), dentry, kill); if (!error) inode_has_no_xattr(inode); return error; } + +/** + * file_remove_privs - remove special file privileges (suid, capabilities) + * @file: file to remove privileges from + * + * When file is modified by a write or truncation ensure that special + * file privileges are removed. + * + * Return: 0 on success, negative errno on failure. + */ +int file_remove_privs(struct file *file) +{ + return __file_remove_privs(file, 0); +} EXPORT_SYMBOL(file_remove_privs); /** @@ -2090,18 +2097,28 @@ int file_update_time(struct file *file) } EXPORT_SYMBOL(file_update_time); -/* Caller must hold the file's inode lock */ +/** + * file_modified - handle mandated vfs changes when modifying a file + * @file: file that was modified + * + * When file has been modified ensure that special + * file privileges are removed and time settings are updated. + * + * Context: Caller must hold the file's inode lock. + * + * Return: 0 on success, negative errno on failure. + */ int file_modified(struct file *file) { - int err; + int ret; /* * Clear the security bits if the process is not being run by root. * This keeps people from modifying setuid and setgid binaries. */ - err = file_remove_privs(file); - if (err) - return err; + ret = __file_remove_privs(file, 0); + if (ret) + return ret; if (unlikely(file->f_mode & FMODE_NOCMTIME)) return 0;