diff mbox series

[87/87] fs: move i_blocks up a few places in struct inode

Message ID 20230928110554.34758-3-jlayton@kernel.org (mailing list archive)
State Not Applicable
Headers show
Series fs: new accessor methods for atime and mtime | expand

Commit Message

Jeff Layton Sept. 28, 2023, 11:05 a.m. UTC
The recent change to use discrete integers instead of struct timespec64
in struct inode shaved 8 bytes off of it, but it also moves the i_lock
into the previous cacheline, away from the fields that it protects.

Move i_blocks up above the i_lock, which moves the new 4 byte hole to
just after the timestamps, without changing the size of the structure.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Amir Goldstein Sept. 28, 2023, 11:35 a.m. UTC | #1
On Thu, Sep 28, 2023 at 2:06 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> The recent change to use discrete integers instead of struct timespec64
> in struct inode shaved 8 bytes off of it, but it also moves the i_lock
> into the previous cacheline, away from the fields that it protects.
>
> Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> just after the timestamps, without changing the size of the structure.
>

Instead of creating an implicit hole, can you please move i_generation
to fill the 4 bytes hole.

It makes sense in the same cache line with i_ino and I could
use the vacant 4 bytes hole above i_fsnotify_mask to expand the
mask to 64bit (the 32bit event mask space is running out).

Thanks,
Amir.

> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  include/linux/fs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index de902ff2938b..3e0fe0f52e7c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -677,11 +677,11 @@ struct inode {
>         u32                     i_atime_nsec;
>         u32                     i_mtime_nsec;
>         u32                     i_ctime_nsec;
> +       blkcnt_t                i_blocks;
>         spinlock_t              i_lock; /* i_blocks, i_bytes, maybe i_size */
>         unsigned short          i_bytes;
>         u8                      i_blkbits;
>         u8                      i_write_hint;
> -       blkcnt_t                i_blocks;
>
>  #ifdef __NEED_I_SIZE_ORDERED
>         seqcount_t              i_size_seqcount;
> --
> 2.41.0
>
Jeff Layton Sept. 28, 2023, 12:01 p.m. UTC | #2
On Thu, 2023-09-28 at 14:35 +0300, Amir Goldstein wrote:
> On Thu, Sep 28, 2023 at 2:06 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > The recent change to use discrete integers instead of struct timespec64
> > in struct inode shaved 8 bytes off of it, but it also moves the i_lock
> > into the previous cacheline, away from the fields that it protects.
> > 
> > Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> > just after the timestamps, without changing the size of the structure.
> > 
> 
> Instead of creating an implicit hole, can you please move i_generation
> to fill the 4 bytes hole.
> 
> It makes sense in the same cache line with i_ino and I could
> use the vacant 4 bytes hole above i_fsnotify_mask to expand the
> mask to 64bit (the 32bit event mask space is running out).
> 
> Thanks,
> Amir.
> 

Sounds like a plan. Resulting struct inode size is the same (616 bytes
with my kdevops kconfig). BTW: all of these changes are in my "amtime"
branch if anyone wants to pull them down.
--
Jeff Layton <jlayton@kernel.org>
Linus Torvalds Sept. 28, 2023, 5:41 p.m. UTC | #3
On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote:
>
> Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> just after the timestamps, without changing the size of the structure.

I'm sure others have mentioned this, but 'struct inode' is marked with
__randomize_layout, so the actual layout may end up being very
different.

I'm personally not convinced the whole structure randomization is
worth it - it's easy enough to figure out for any distro kernel since
the seed has to be the same across machines for modules to work, so
even if the seed isn't "public", any layout is bound to be fairly
easily discoverable.

So the whole randomization only really works for private kernel
builds, and it adds this kind of pain where "optimizing" the structure
layout is kind of pointless depending on various options.

I certainly *hope* no distro enables that pointless thing, but it's a worry.

               Linus
Jeff Layton Sept. 28, 2023, 6:01 p.m. UTC | #4
On Thu, 2023-09-28 at 10:41 -0700, Linus Torvalds wrote:
> On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> > just after the timestamps, without changing the size of the structure.
> 
> I'm sure others have mentioned this, but 'struct inode' is marked with
> __randomize_layout, so the actual layout may end up being very
> different.
> 
> I'm personally not convinced the whole structure randomization is
> worth it - it's easy enough to figure out for any distro kernel since
> the seed has to be the same across machines for modules to work, so
> even if the seed isn't "public", any layout is bound to be fairly
> easily discoverable.
> 
> So the whole randomization only really works for private kernel
> builds, and it adds this kind of pain where "optimizing" the structure
> layout is kind of pointless depending on various options.
> 
> I certainly *hope* no distro enables that pointless thing, but it's a worry.
> 

I've never enabled struct randomization and don't know anyone who does.
I figure if you turn that on, you get to keep all of the pieces when you
start seeing weird performance problems.

I think that we have to optimize for that being disabled. Even without
that though, turning on and off options can change the layout...and then
there are different arches, etc.

I'm using a config derived from the Fedora x86_64 kernel images and hope
that represents a reasonably common configuration. The only conditional
members before the timestamps are based on CONFIG_FS_POSIX_ACL and
CONFIG_SECURITY, which are almost always turned on with most distros.
Christian Brauner Sept. 29, 2023, 9:32 a.m. UTC | #5
On Thu, Sep 28, 2023 at 10:41:34AM -0700, Linus Torvalds wrote:
> On Thu, 28 Sept 2023 at 04:06, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> > just after the timestamps, without changing the size of the structure.
> 
> I'm sure others have mentioned this, but 'struct inode' is marked with
> __randomize_layout, so the actual layout may end up being very
> different.
> 
> I'm personally not convinced the whole structure randomization is
> worth it - it's easy enough to figure out for any distro kernel since
> the seed has to be the same across machines for modules to work, so
> even if the seed isn't "public", any layout is bound to be fairly
> easily discoverable.
> 
> So the whole randomization only really works for private kernel
> builds, and it adds this kind of pain where "optimizing" the structure
> layout is kind of pointless depending on various options.
> 
> I certainly *hope* no distro enables that pointless thing, but it's a worry.

They don't last we checked. Just last cycle we moved stuff in struct
file around to optimize things and we explicitly said we don't give a
damn about struct randomization. Anyone who enables this will bleed
performance pretty badly, I would reckon.
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index de902ff2938b..3e0fe0f52e7c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -677,11 +677,11 @@  struct inode {
 	u32			i_atime_nsec;
 	u32			i_mtime_nsec;
 	u32			i_ctime_nsec;
+	blkcnt_t		i_blocks;
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	u8			i_blkbits;
 	u8			i_write_hint;
-	blkcnt_t		i_blocks;
 
 #ifdef __NEED_I_SIZE_ORDERED
 	seqcount_t		i_size_seqcount;