Message ID | 20230928110554.34758-3-jlayton@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | fs: new accessor methods for atime and mtime | expand |
Context | Check | Description |
---|---|---|
bpf/vmtest-bpf-next-PR | success | PR summary |
bpf/vmtest-bpf-next-VM_Test-0 | success | Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }} |
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for ShellCheck |
bpf/vmtest-bpf-next-VM_Test-2 | fail | Logs for build for aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-3 | fail | Logs for build for s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-4 | fail | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-5 | fail | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for set-matrix |
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for veristat |
On Thu, Sep 28, 2023 at 2:06 PM Jeff Layton <jlayton@kernel.org> wrote: > > The recent change to use discrete integers instead of struct timespec64 > in struct inode shaved 8 bytes off of it, but it also moves the i_lock > into the previous cacheline, away from the fields that it protects. > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > just after the timestamps, without changing the size of the structure. > Instead of creating an implicit hole, can you please move i_generation to fill the 4 bytes hole. It makes sense in the same cache line with i_ino and I could use the vacant 4 bytes hole above i_fsnotify_mask to expand the mask to 64bit (the 32bit event mask space is running out). Thanks, Amir. > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > include/linux/fs.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index de902ff2938b..3e0fe0f52e7c 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -677,11 +677,11 @@ struct inode { > u32 i_atime_nsec; > u32 i_mtime_nsec; > u32 i_ctime_nsec; > + blkcnt_t i_blocks; > spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ > unsigned short i_bytes; > u8 i_blkbits; > u8 i_write_hint; > - blkcnt_t i_blocks; > > #ifdef __NEED_I_SIZE_ORDERED > seqcount_t i_size_seqcount; > -- > 2.41.0 >
On Thu, 2023-09-28 at 14:35 +0300, Amir Goldstein wrote: > On Thu, Sep 28, 2023 at 2:06 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > The recent change to use discrete integers instead of struct timespec64 > > in struct inode shaved 8 bytes off of it, but it also moves the i_lock > > into the previous cacheline, away from the fields that it protects. > > > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > > just after the timestamps, without changing the size of the structure. > > > > Instead of creating an implicit hole, can you please move i_generation > to fill the 4 bytes hole. > > It makes sense in the same cache line with i_ino and I could > use the vacant 4 bytes hole above i_fsnotify_mask to expand the > mask to 64bit (the 32bit event mask space is running out). > > Thanks, > Amir. > Sounds like a plan. Resulting struct inode size is the same (616 bytes with my kdevops kconfig). BTW: all of these changes are in my "amtime" branch if anyone wants to pull them down. -- Jeff Layton <jlayton@kernel.org>
On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote: > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > just after the timestamps, without changing the size of the structure. I'm sure others have mentioned this, but 'struct inode' is marked with __randomize_layout, so the actual layout may end up being very different. I'm personally not convinced the whole structure randomization is worth it - it's easy enough to figure out for any distro kernel since the seed has to be the same across machines for modules to work, so even if the seed isn't "public", any layout is bound to be fairly easily discoverable. So the whole randomization only really works for private kernel builds, and it adds this kind of pain where "optimizing" the structure layout is kind of pointless depending on various options. I certainly *hope* no distro enables that pointless thing, but it's a worry. Linus
On Thu, 2023-09-28 at 10:41 -0700, Linus Torvalds wrote: > On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote: > > > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > > just after the timestamps, without changing the size of the structure. > > I'm sure others have mentioned this, but 'struct inode' is marked with > __randomize_layout, so the actual layout may end up being very > different. > > I'm personally not convinced the whole structure randomization is > worth it - it's easy enough to figure out for any distro kernel since > the seed has to be the same across machines for modules to work, so > even if the seed isn't "public", any layout is bound to be fairly > easily discoverable. > > So the whole randomization only really works for private kernel > builds, and it adds this kind of pain where "optimizing" the structure > layout is kind of pointless depending on various options. > > I certainly *hope* no distro enables that pointless thing, but it's a worry. > I've never enabled struct randomization and don't know anyone who does. I figure if you turn that on, you get to keep all of the pieces when you start seeing weird performance problems. I think that we have to optimize for that being disabled. Even without that though, turning on and off options can change the layout...and then there are different arches, etc. I'm using a config derived from the Fedora x86_64 kernel images and hope that represents a reasonably common configuration. The only conditional members before the timestamps are based on CONFIG_FS_POSIX_ACL and CONFIG_SECURITY, which are almost always turned on with most distros.
On Thu, Sep 28, 2023 at 10:41:34AM -0700, Linus Torvalds wrote: > On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote: > > > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > > just after the timestamps, without changing the size of the structure. > > I'm sure others have mentioned this, but 'struct inode' is marked with > __randomize_layout, so the actual layout may end up being very > different. > > I'm personally not convinced the whole structure randomization is > worth it - it's easy enough to figure out for any distro kernel since > the seed has to be the same across machines for modules to work, so > even if the seed isn't "public", any layout is bound to be fairly > easily discoverable. > > So the whole randomization only really works for private kernel > builds, and it adds this kind of pain where "optimizing" the structure > layout is kind of pointless depending on various options. > > I certainly *hope* no distro enables that pointless thing, but it's a worry. They don't last we checked. Just last cycle we moved stuff in struct file around to optimize things and we explicitly said we don't give a damn about struct randomization. Anyone who enables this will bleed performance pretty badly, I would reckon.
diff --git a/include/linux/fs.h b/include/linux/fs.h index de902ff2938b..3e0fe0f52e7c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -677,11 +677,11 @@ struct inode { u32 i_atime_nsec; u32 i_mtime_nsec; u32 i_ctime_nsec; + blkcnt_t i_blocks; spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; u8 i_blkbits; u8 i_write_hint; - blkcnt_t i_blocks; #ifdef __NEED_I_SIZE_ORDERED seqcount_t i_size_seqcount;
The recent change to use discrete integers instead of struct timespec64 in struct inode shaved 8 bytes off of it, but it also moves the i_lock into the previous cacheline, away from the fields that it protects. Move i_blocks up above the i_lock, which moves the new 4 byte hole to just after the timestamps, without changing the size of the structure. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)