Message ID | 20241007-brauner-file-rcuref-v2-0-387e24dc9163@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | fs: introduce file_ref_t | expand |
On 10/7/24 8:23 AM, Christian Brauner wrote: > As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has > O(N^2) behaviour under contention with N concurrent operations and it is > in a hot path in __fget_files_rcu(). > > The rcuref infrastructures remedies this problem by using an > unconditional increment relying on safe- and dead zones to make this > work and requiring rcu protection for the data structure in question. > This not just scales better it also introduces overflow protection. > > However, in contrast to generic rcuref, files require a memory barrier > and thus cannot rely on *_relaxed() atomic operations and also require > to be built on atomic_long_t as having massive amounts of reference > isn't unheard of even if it is just an attack. > > As suggested by Linus, add a file specific variant instead of making > this a generic library. > > I've been testing this with will-it-scale using a multi-threaded fstat() > on the same file descriptor on a machine that Jens gave me access (thank > you very much!): > > processor : 511 > vendor_id : AuthenticAMD > cpu family : 25 > model : 160 > model name : AMD EPYC 9754 128-Core Processor > > and I consistently get a 3-5% improvement on workloads with 256+ and > more threads comparing v6.12-rc1 as base with and without these patches > applied. FWIW, I ran this on another box, which is a 2-socket with these CPUs: AMD EPYC 7763 64-Core Processor hence 128 cores, 256 threads. I ran my usual max iops test case, which is 24 threads, each driving a fast drive. If I run without io_uring direct descriptors, then fget/fput is hit decently hard. In that case, I see a net reduction of about 1.2% CPU time for the fget/fput parts. So not as huge a win as mentioned above, but it's also using way fewer threads and different file descriptors. I'd say that's a pretty noticeable win!
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has O(N^2) behaviour under contention with N concurrent operations and it is in a hot path in __fget_files_rcu(). The rcuref infrastructures remedies this problem by using an unconditional increment relying on safe- and dead zones to make this work and requiring rcu protection for the data structure in question. This not just scales better it also introduces overflow protection. However, in contrast to generic rcuref, files require a memory barrier and thus cannot rely on *_relaxed() atomic operations and also require to be built on atomic_long_t as having massive amounts of reference isn't unheard of even if it is just an attack. As suggested by Linus, add a file specific variant instead of making this a generic library. I've been testing this with will-it-scale using a multi-threaded fstat() on the same file descriptor on a machine that Jens gave me access (thank you very much!): processor : 511 vendor_id : AuthenticAMD cpu family : 25 model : 160 model name : AMD EPYC 9754 128-Core Processor and I consistently get a 3-5% improvement on workloads with 256+ and more threads comparing v6.12-rc1 as base with and without these patches applied. In vfs.file now. Signed-off-by: Christian Brauner <brauner@kernel.org> --- Changes in v2: - Don't introduce a separate rcuref_long_t library just implement it only for files for now. - Retain memory barrier by using atomic_long_add_negative() instead of atomic_long_add_negative_relaxed() to order the loads before and after the increment in __fget_files_rcu(). - Link to v1: https://lore.kernel.org/r/20241005-brauner-file-rcuref-v1-0-725d5e713c86@kernel.org --- Christian Brauner (3): fs: protect backing files with rcu fs: add file_ref fs: port files to file_ref drivers/gpu/drm/i915/gt/shmem_utils.c | 2 +- drivers/gpu/drm/vmwgfx/ttm_object.c | 2 +- fs/eventpoll.c | 2 +- fs/file.c | 120 ++++++++++++++++++++++++++++++++-- fs/file_table.c | 23 +++++-- include/linux/file_ref.h | 116 ++++++++++++++++++++++++++++++++ include/linux/fs.h | 9 +-- 7 files changed, 253 insertions(+), 21 deletions(-) --- base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc change-id: 20240927-brauner-file-rcuref-bfa4a4ba915b