diff mbox series

xfs: Remove noinline from #define STATIC

Message ID 7302f4a13c1cbf62b07f636878ce25fcca84b6c4.camel@perches.com (mailing list archive)
State Deferred, archived
Headers show
Series xfs: Remove noinline from #define STATIC | expand

Commit Message

Joe Perches Nov. 11, 2018, 1:21 a.m. UTC
Reduce total object size quite a bit (~32KB) and presumably
improve performance at the same time.

Total object size old vs new (x86-64 defconfig with xfs)

    text	   data	    bss	    dec	    hex	filename
- 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
+ 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)

Signed-off-by: Joe Perches <joe@perches.com>
---
 fs/xfs/xfs_linux.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Sandeen Nov. 12, 2018, 8:12 p.m. UTC | #1
On 11/10/18 7:21 PM, Joe Perches wrote:
> Reduce total object size quite a bit (~32KB) and presumably
> improve performance at the same time.
> 
> Total object size old vs new (x86-64 defconfig with xfs)
> 
>     text	   data	    bss	    dec	    hex	filename
> - 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
> + 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)

And what does it do to maximum stack excursions?

-Eric

> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  fs/xfs/xfs_linux.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index edbd5a210df2..f33c8b626bca 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -242,7 +242,7 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
>  #endif /* XFS_WARN */
>  #endif /* DEBUG */
>  
> -#define STATIC static noinline
> +#define STATIC static
>  
>  #ifdef CONFIG_XFS_RT
>  
>
Joe Perches Nov. 12, 2018, 9:05 p.m. UTC | #2
On Mon, 2018-11-12 at 14:12 -0600, Eric Sandeen wrote:
> On 11/10/18 7:21 PM, Joe Perches wrote:
> > Reduce total object size quite a bit (~32KB) and presumably
> > improve performance at the same time.
> > 
> > Total object size old vs new (x86-64 defconfig with xfs)
> > 
> >     text	   data	    bss	    dec	    hex	filename
> > - 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
> > + 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)
> 
> And what does it do to maximum stack excursions?

It seems to add a maximum of 40 bytes in xfs_reclaim_inodes_ag
and xfs_readdir, but I didn't do more than visually scan
checkstack output.

Using scripts/checkstack.pl on defconfig x86-64 with xfs
and cutting out absolute addresses

 diff -urN xfs_cs_old_2 xfs_cs_new_2 
--- xfs_cs_old_2	2018-11-12 12:55:03.195282512 -0800
+++ xfs_cs_new_2	2018-11-12 12:55:09.219168923 -0800
@@ -1,18 +1,20 @@
 xfs_ag_init_headers [vmlinux]:	464
 xfs_ag_init_headers [vmlinux]:	464
-xfs_inode_ag_walk.isra.19 [vmlinux]:	416
-xfs_inode_ag_walk.isra.19 [vmlinux]:	416
+xfs_inode_ag_walk.isra.21 [vmlinux]:	424
+xfs_inode_ag_walk.isra.21 [vmlinux]:	424
+xfs_reclaim_inodes_ag [vmlinux]:	368
+xfs_reclaim_inodes_ag [vmlinux]:	368
 xfs_trans_committed_bulk [vmlinux]:	336
-xfs_reclaim_inodes_ag [vmlinux]:	328
-xfs_reclaim_inodes_ag [vmlinux]:	328
 xfs_ioc_getfsmap.isra.23 [vmlinux]:	320
 xfs_ioc_getfsmap.isra.23 [vmlinux]:	320
+xfs_file_ioctl [vmlinux]:		320
+xfs_file_ioctl [vmlinux]:		320
+xfs_bmapi_write [vmlinux]:		312
 xfs_getfsmap [vmlinux]:		304
 xfs_qm_dquot_walk.isra.11 [vmlinux]:	296
 xfs_qm_dquot_walk.isra.11 [vmlinux]:	296
-xfs_bmapi_write [vmlinux]:		288
-xfs_file_ioctl [vmlinux]:		288
-xfs_file_ioctl [vmlinux]:		288
+xfs_file_compat_ioctl [vmlinux]:	288
+xfs_file_compat_ioctl [vmlinux]:	288
 xfs_sb_read_verify [vmlinux]:	280
 xfs_sb_read_verify [vmlinux]:	280
 xfs_sb_write_verify [vmlinux]:	272
@@ -21,36 +23,38 @@
 xfs_rmap_convert [vmlinux]:		264
 xfs_rmap_convert_shared [vmlinux]:	256
 xfs_rmap_convert_shared [vmlinux]:	256
-xfs_file_compat_ioctl [vmlinux]:	256
-xfs_file_compat_ioctl [vmlinux]:	256
 xfs_bmap_extents_to_btree [vmlinux]:	248
 xfs_bmap_extents_to_btree [vmlinux]:	248
+xfs_bmap_del_extent_real [vmlinux]:	240
+xfs_alloc_fix_freelist [vmlinux]:	224
+xfs_alloc_fix_freelist [vmlinux]:	224
+xfs_bmap_add_extent_unwritten_real [vmlinux]:224
 xfs_symlink [vmlinux]:		224
 xfs_symlink [vmlinux]:		224
-xfs_alloc_fix_freelist [vmlinux]:	216
-xfs_alloc_fix_freelist [vmlinux]:	216
-xfs_bmap_local_to_extents.constprop.27 [vmlinux]:208
-xfs_bmap_local_to_extents.constprop.27 [vmlinux]:208
-xfs_ialloc_ag_alloc [vmlinux]:	208
-xfs_ialloc_ag_alloc [vmlinux]:	208
-xfs_bmap_add_extent_unwritten_real [vmlinux]:192
-xfs_bmap_add_extent_unwritten_real [vmlinux]:192
-xfs_bmap_add_extent_delay_real [vmlinux]:192
-xfs_bmap_add_extent_delay_real [vmlinux]:192
-xfs_bmap_del_extent_real [vmlinux]:	192
-xfs_bmap_del_extent_real [vmlinux]:	192
-xfs_bmap_btalloc [vmlinux]:		192
-xfs_bmap_btalloc [vmlinux]:		192
+xfs_bmap_btalloc [vmlinux]:		216
+xfs_bmap_btalloc [vmlinux]:		216
+xfs_attr_rmtval_set [vmlinux]:	208
+xfs_bmap_add_extent_delay_real [vmlinux]:208
+xfs_bmap_local_to_extents.constprop.28 [vmlinux]:208
+xfs_bmap_local_to_extents.constprop.28 [vmlinux]:208
+xfs_bulkstat [vmlinux]:		208
+xfs_ialloc_ag_alloc [vmlinux]:	200
+xfs_ialloc_ag_alloc [vmlinux]:	200
 xfs_refcountbt_alloc_block [vmlinux]:192
 xfs_refcountbt_alloc_block [vmlinux]:192
+xfs_reflink_remap_blocks [vmlinux]:	192
+xfs_reflink_remap_blocks [vmlinux]:	192
 __xfs_bunmapi [vmlinux]:		184
 __xfs_bunmapi [vmlinux]:		184
 xfs_rmap_map [vmlinux]:		184
 xfs_rmap_map [vmlinux]:		184
+xfs_readdir [vmlinux]:		184
+xfs_readdir [vmlinux]:		184
 xfs_attr_set [vmlinux]:		176
 xfs_attr_set [vmlinux]:		176
 xfs_attr3_leaf_to_shortform [vmlinux]:176
 xfs_attr3_leaf_to_shortform [vmlinux]:176
+xfs_bmap_add_extent_hole_real [vmlinux]:176
 xfs_attr_shortform_to_leaf [vmlinux]:168
 xfs_attr_shortform_to_leaf [vmlinux]:168
 xfs_btree_delrec [vmlinux]:		168
@@ -59,89 +63,99 @@
 xfs_rmap_unmap [vmlinux]:		168
 xfs_rmap_map_shared [vmlinux]:	168
 xfs_rmap_map_shared [vmlinux]:	168
-xfs_bulkstat [vmlinux]:		168
-xfs_bulkstat [vmlinux]:		168
 xfs_scrub_metadata [vmlinux]:	168
 xfs_scrub_metadata [vmlinux]:	168
 xfs_free_extent_fix_freelist [vmlinux]:160
 xfs_free_extent_fix_freelist [vmlinux]:160
-xfs_bmap_add_extent_hole_real [vmlinux]:160
-xfs_bmap_add_extent_hole_real [vmlinux]:160
+xfs_attr3_leaf_split [vmlinux]:	160
+xfs_attr3_leaf_split [vmlinux]:	160
 xfs_bmap_insert_extents [vmlinux]:	160
 xfs_bmap_insert_extents [vmlinux]:	160
 xfs_bmbt_alloc_block [vmlinux]:	160
 xfs_bmbt_alloc_block [vmlinux]:	160
 xfs_btree_overlapped_query_range [vmlinux]:160
 xfs_btree_overlapped_query_range [vmlinux]:160
+xfs_dialloc [vmlinux]:		160
+xfs_dialloc [vmlinux]:		160
 __xfs_inobt_alloc_block.isra.9 [vmlinux]:160
 __xfs_inobt_alloc_block.isra.9 [vmlinux]:160
 xfs_ag_resv_rmapbt_alloc [vmlinux]:	160
 xfs_ag_resv_rmapbt_alloc [vmlinux]:	160
+xfs_do_writepage [vmlinux]:		160
+xfs_do_writepage [vmlinux]:		160
 xfs_dir2_leaf_readbuf [vmlinux]:	160
 xfs_readlink_bmap_ilocked [vmlinux]:	160
 xfs_readlink_bmap_ilocked [vmlinux]:	160
 xfs_ag_resv_rmapbt_alloc [vmlinux]:	160
 xfs_ag_resv_rmapbt_alloc [vmlinux]:	160
+xfs_da3_split [vmlinux]:		152
+xfs_da3_split [vmlinux]:		152
 xfs_getbmap [vmlinux]:		152
 xfs_getbmap [vmlinux]:		152
 trace_raw_output_xfs_loggrant_class [vmlinux]:144
 trace_raw_output_xfs_alloc_class [vmlinux]:144
 xfs_attr_remove [vmlinux]:		144
 xfs_attr_remove [vmlinux]:		144
-xfs_dialloc_ag_inobt [vmlinux]:	144
-xfs_dialloc_ag_inobt [vmlinux]:	144
+xfs_bmap_split_extent_at [vmlinux]:	144
+__xfs_btree_split.isra.45 [vmlinux]:	144
+__xfs_btree_split.isra.45 [vmlinux]:	144
+xfs_btree_insrec [vmlinux]:		144
+xfs_btree_insrec [vmlinux]:		144
+xfs_da_shrink_inode [vmlinux]:	144
+xfs_da_shrink_inode [vmlinux]:	144
+xfs_attr_list_int_ilocked [vmlinux]:	144
+xfs_attr_list_int_ilocked [vmlinux]:	144
 xfs_swap_extent_rmap [vmlinux]:	144
 xfs_swap_extent_rmap [vmlinux]:	144
-xfs_readdir [vmlinux]:		144
-xfs_readdir [vmlinux]:		144
+xfs_ioc_trim [vmlinux]:		144
+xfs_ioc_trim [vmlinux]:		144
 xfs_inactive_symlink_rmt [vmlinux]:	144
 xfs_inactive_symlink_rmt [vmlinux]:	144
 xfs_attr_get [vmlinux]:		136
 xfs_attr_get [vmlinux]:		136
+xfs_attr_rmtval_get [vmlinux]:	136
+xfs_attr_rmtval_get [vmlinux]:	136
 xfs_bmap_add_attrfork_local [vmlinux]:136
 xfs_bmap_add_attrfork_local [vmlinux]:136
-__xfs_btree_split.isra.38 [vmlinux]:	136
-__xfs_btree_split.isra.38 [vmlinux]:	136
 xfs_btree_split [vmlinux]:		136
 xfs_btree_split [vmlinux]:		136
-xfs_btree_insrec [vmlinux]:		136
-xfs_btree_insrec [vmlinux]:		136
+xfs_rmap_finish_one [vmlinux]:	136
+xfs_rmap_finish_one [vmlinux]:	136
+xfs_attr3_leaf_inactive [vmlinux]:	136
+xfs_attr3_leaf_inactive [vmlinux]:	136
 xfs_rename [vmlinux]:		136
 xfs_rename [vmlinux]:		136
 xfs_attr3_leaf_unbalance [vmlinux]:	120
 xfs_attr3_leaf_unbalance [vmlinux]:	120
-xfs_attr_rmtval_set [vmlinux]:	120
-xfs_attr_rmtval_set [vmlinux]:	120
-xfs_ioc_fsgeometry_v1 [vmlinux]:	120
-xfs_ioc_fsgeometry_v1 [vmlinux]:	120
-xfs_ioc_fsgeometry [vmlinux]:	120
-xfs_ioc_fsgeometry [vmlinux]:	120
+xfs_refcount_merge_extents [vmlinux]:120
+xfs_refcount_merge_extents [vmlinux]:120
 xfs_file_iomap_begin [vmlinux]:	120
 xfs_file_iomap_begin [vmlinux]:	120
-xfs_compat_ioc_fsgeometry_v1 [vmlinux]:120
-xfs_compat_ioc_fsgeometry_v1 [vmlinux]:120
+xfs_ifree [vmlinux]:			120
+xfs_ifree [vmlinux]:			120
+xfs_qm_reset_dqcounts_buf.part.15 [vmlinux]:120
+xfs_qm_reset_dqcounts_buf.part.15 [vmlinux]:120
+xfs_rtallocate_extent [vmlinux]:	120
+xfs_rtallocate_extent [vmlinux]:	120
+xfs_free_ag_extent [vmlinux]:	112
 xfs_attr3_leaf_toosmall [vmlinux]:	112
 xfs_attr3_leaf_toosmall [vmlinux]:	112
-xfs_attr3_leaf_rebalance [vmlinux]:	112
-xfs_attr3_leaf_rebalance [vmlinux]:	112
+xfs_bmap_del_extent_delay [vmlinux]:	112
+xfs_bmap_del_extent_delay [vmlinux]:	112
 xfs_iread_extents [vmlinux]:		112
 xfs_iread_extents [vmlinux]:		112
-xfs_bmap_split_extent_at [vmlinux]:	112
-xfs_bmap_split_extent_at [vmlinux]:	112
 xfs_btree_insert [vmlinux]:		112
 xfs_btree_query_range [vmlinux]:	112
-xfs_da3_swap_lastblock [vmlinux]:	112
-xfs_da3_swap_lastblock [vmlinux]:	112
 xfs_dir2_node_addname [vmlinux]:	112
 xfs_dir2_node_addname [vmlinux]:	112
 xfs_dir2_node_removename [vmlinux]:	112
 xfs_dir2_node_removename [vmlinux]:	112
-xfs_trim_extents [vmlinux]:		112
-xfs_trim_extents [vmlinux]:		112
+xfs_difree [vmlinux]:		112
+xfs_difree [vmlinux]:		112
 xfs_growfs_data [vmlinux]:		112
 xfs_growfs_data [vmlinux]:		112
-xfs_ifree_cluster.isra.21 [vmlinux]:	112
-xfs_ifree_cluster.isra.21 [vmlinux]:	112
+xfs_iget [vmlinux]:			112
+xfs_iget [vmlinux]:			112
 xfs_inumbers [vmlinux]:		112
 xfs_inumbers [vmlinux]:		112
 xfs_reflink_end_cow [vmlinux]:	112
@@ -149,17 +163,19 @@
 xfs_vn_listxattr [vmlinux]:		112
 xfs_vn_listxattr [vmlinux]:		112
 xfsaild [vmlinux]:			112
+xfs_growfs_rt [vmlinux]:		112
+xfs_growfs_rt [vmlinux]:		112
 trace_raw_output_xfs_dquot_class [vmlinux]:104
 xfs_bmapi_reserve_delalloc [vmlinux]:104
 xfs_bmapi_reserve_delalloc [vmlinux]:104
+xfs_btree_lshift [vmlinux]:		104
+xfs_btree_lshift [vmlinux]:		104
 xfs_da_grow_inode_int [vmlinux]:	104
 xfs_da_grow_inode_int [vmlinux]:	104
 xfs_dir2_leaf_removename [vmlinux]:	104
 xfs_dir2_leaf_removename [vmlinux]:	104
 xfs_dir2_leafn_split [vmlinux]:	104
 xfs_dir2_leafn_split [vmlinux]:	104
-xfs_iget [vmlinux]:			104
-xfs_iget [vmlinux]:			104
 xfs_ioc_space [vmlinux]:		104
 xfs_ioc_space [vmlinux]:		104
 xfs_emerg [vmlinux]:			104
@@ -169,15 +185,9 @@
 xfs_warn [vmlinux]:			104
 xfs_notice [vmlinux]:		104
 xfs_info [vmlinux]:			104
-xfs_reflink_remap_extent [vmlinux]:	104
-xfs_reflink_remap_extent [vmlinux]:	104
-xfs_reflink_dirty_extents [vmlinux]:	104
-xfs_reflink_dirty_extents [vmlinux]:	104
 xfs_reflink_cancel_cow_blocks [vmlinux]:104
 xfs_reflink_cancel_cow_blocks [vmlinux]:104
 xfs_log_quiesce [vmlinux]:		104
 xfs_log_quiesce [vmlinux]:		104
-xfs_rtallocate_extent_near [vmlinux]:104
-xfs_rtallocate_extent_near [vmlinux]:104
 xfs_set_acl [vmlinux]:		104
 xfs_set_acl [vmlinux]:		104
Dave Chinner Nov. 12, 2018, 9:45 p.m. UTC | #3
On Mon, Nov 12, 2018 at 02:12:08PM -0600, Eric Sandeen wrote:
> On 11/10/18 7:21 PM, Joe Perches wrote:
> > Reduce total object size quite a bit (~32KB) and presumably
> > improve performance at the same time.
> > 
> > Total object size old vs new (x86-64 defconfig with xfs)
> > 
> >     text	   data	    bss	    dec	    hex	filename
> > - 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
> > + 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)
> 
> And what does it do to maximum stack excursions?

Better yet: what does it do to corruption stack traces and debugging
tools like profiling traces?

i.e. this noinline directive isn't about stack usage, this is about
being able to debug production code. Basically the compiler inliner
is so agressive on static functions that it makes it impossible to
decipher the stack traces. It flattens them way too much to
be able to tell how we got to a specific location in the code.

In reality, being able to find problems quickly and efficiently is
far more important to us than being able to run everything at
ludicrous speed....

Cheers,

Dave.
Joe Perches Nov. 12, 2018, 10:30 p.m. UTC | #4
On Tue, 2018-11-13 at 08:45 +1100, Dave Chinner wrote:
> On Mon, Nov 12, 2018 at 02:12:08PM -0600, Eric Sandeen wrote:
> > On 11/10/18 7:21 PM, Joe Perches wrote:
> > > Reduce total object size quite a bit (~32KB) and presumably
> > > improve performance at the same time.
> > > 
> > > Total object size old vs new (x86-64 defconfig with xfs)
> > > 
> > >     text	   data	    bss	    dec	    hex	filename
> > > - 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
> > > + 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)
> > 
> > And what does it do to maximum stack excursions?
> 
> Better yet: what does it do to corruption stack traces and debugging
> tools like profiling traces?
> 
> i.e. this noinline directive isn't about stack usage, this is about
> being able to debug production code. Basically the compiler inliner
> is so agressive on static functions that it makes it impossible to
> decipher the stack traces. It flattens them way too much to
> be able to tell how we got to a specific location in the code.
> 
> In reality, being able to find problems quickly and efficiently is
> far more important to us than being able to run everything at
> ludicrous speed....

Is that really a compelling argument given thw ~50:50
split of static/STATIC uses in xfs?

$ git grep -w STATIC fs/xfs/ | wc -l
1064
$ git grep -w static fs/xfs/ | wc -l
942
Dave Chinner Nov. 13, 2018, 1:18 a.m. UTC | #5
On Mon, Nov 12, 2018 at 02:30:01PM -0800, Joe Perches wrote:
> On Tue, 2018-11-13 at 08:45 +1100, Dave Chinner wrote:
> > On Mon, Nov 12, 2018 at 02:12:08PM -0600, Eric Sandeen wrote:
> > > On 11/10/18 7:21 PM, Joe Perches wrote:
> > > > Reduce total object size quite a bit (~32KB) and presumably
> > > > improve performance at the same time.
> > > > 
> > > > Total object size old vs new (x86-64 defconfig with xfs)
> > > > 
> > > >     text	   data	    bss	    dec	    hex	filename
> > > > - 959351	 165573	    632	1125556	 112cb4	(TOTALS) (old)
> > > > + 924683	 165669	    632	1090984	 10a5a8	(TOTALS) (new)
> > > 
> > > And what does it do to maximum stack excursions?
> > 
> > Better yet: what does it do to corruption stack traces and debugging
> > tools like profiling traces?
> > 
> > i.e. this noinline directive isn't about stack usage, this is about
> > being able to debug production code. Basically the compiler inliner
> > is so agressive on static functions that it makes it impossible to
> > decipher the stack traces. It flattens them way too much to
> > be able to tell how we got to a specific location in the code.
> > 
> > In reality, being able to find problems quickly and efficiently is
> > far more important to us than being able to run everything at
> > ludicrous speed....
> 
> Is that really a compelling argument given thw ~50:50
> split of static/STATIC uses in xfs?

Historically, yes. We're talking about code with call chains that
can go 50-60 functions deep here. If that gets flattened to 10-20
functions by compiler inlining (which is the sort of thing that
happens) then we lose a huge amount of visibility into the workings
of the code. This affects profiling, stack traces on corruption,
dynamic debug probes, kernel crash dump analysis, etc.

I'm not interested in making code fast if distro support engineers
can't debug problems on user systems easily. Optimising for
performance over debuggability is a horrible trade off for us to
make because it means users and distros end up much more reliant on
single points of expertise for debugging problems. And that means
the majority of the load of problem triage falls directly on very
limited resources - the core XFS development team. A little bit of
thought about how to make code easier to triage and debug goes a
long, long way....

Indeed, this is not a new problem - we've been using techniques like
STATIC in one form or another to stop compiler inlining and/or
function hiding since XFS was first ported to linux 20 years ago.
In fact, STATIC was inherited from Irix because it helped with
debugging via the userspace simulator that the initial XFS code was
developed on. i.e.  STATIC was present in the initial XFS commit
made way back in 1993, and we've been using it ever since...

Cheers,

Dave.
Theodore Ts'o Nov. 13, 2018, 1:54 a.m. UTC | #6
On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> I'm not interested in making code fast if distro support engineers
> can't debug problems on user systems easily. Optimising for
> performance over debuggability is a horrible trade off for us to
> make because it means users and distros end up much more reliant on
> single points of expertise for debugging problems. And that means
> the majority of the load of problem triage falls directly on very
> limited resources - the core XFS development team. A little bit of
> thought about how to make code easier to triage and debug goes a
> long, long way....

So at least in my experience, if the kernels are compiled with
CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
scripts/decode_stracktrace.sh seems to do a very nice job with inlined
functions.  Now, ext4 generally only has about 3 or 4 nested inlines,
and so I don't know how it works with 20 or 30 nested inlined
functions, so perhaps this is not applicable for XFS.

But it perhaps toolchain technology has advanced since the Irix days
such that it's no longer as necessary to force the non-inlining of
functions for easing debugging?

					- Ted
Dave Chinner Nov. 13, 2018, 3:09 a.m. UTC | #7
On Mon, Nov 12, 2018 at 08:54:10PM -0500, Theodore Y. Ts'o wrote:
> On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> > I'm not interested in making code fast if distro support engineers
> > can't debug problems on user systems easily. Optimising for
> > performance over debuggability is a horrible trade off for us to
> > make because it means users and distros end up much more reliant on
> > single points of expertise for debugging problems. And that means
> > the majority of the load of problem triage falls directly on very
> > limited resources - the core XFS development team. A little bit of
> > thought about how to make code easier to triage and debug goes a
> > long, long way....
> 
> So at least in my experience, if the kernels are compiled with
> CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
> scripts/decode_stracktrace.sh seems to do a very nice job with inlined

That doesn't help with kernel profiling and other such things that
are based on callgraphs...

> functions.  Now, ext4 generally only has about 3 or 4 nested inlines,
> and so I don't know how it works with 20 or 30 nested inlined
> functions, so perhaps this is not applicable for XFS.
> 
> But it perhaps toolchain technology has advanced since the Irix days
> such that it's no longer as necessary to force the non-inlining of
> functions for easing debugging?

Not that I've noticed.

Indeed, modern toolchains are moving the opposite direction - have
you ever tried to debug a binary with gdb that was compiled with LTO
enabled? Or maybe even just tried to profile it with perf?

Cheers,

Dave.
Joe Perches Nov. 13, 2018, 4:23 a.m. UTC | #8
On Tue, 2018-11-13 at 14:09 +1100, Dave Chinner wrote:
> On Mon, Nov 12, 2018 at 08:54:10PM -0500, Theodore Y. Ts'o wrote:
> > On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> > > I'm not interested in making code fast if distro support engineers
> > > can't debug problems on user systems easily. Optimising for
> > > performance over debuggability is a horrible trade off for us to
> > > make because it means users and distros end up much more reliant on
> > > single points of expertise for debugging problems. And that means
> > > the majority of the load of problem triage falls directly on very
> > > limited resources - the core XFS development team. A little bit of
> > > thought about how to make code easier to triage and debug goes a
> > > long, long way....
> > 
> > So at least in my experience, if the kernels are compiled with
> > CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
> > scripts/decode_stracktrace.sh seems to do a very nice job with inlined
> 
> That doesn't help with kernel profiling and other such things that
> are based on callgraphs...

If that's really the case:

I rather suspect the xfs static v STATIC function marking is not
particularly curated and the marking is somewhat arbitrary.

So perhaps given the large number of static, but not STATIC
functions, perhaps a sed of s/static/STATIC/ should be done
when it's not inline for all xfs functions.
Dave Chinner Nov. 13, 2018, 5:26 a.m. UTC | #9
On Mon, Nov 12, 2018 at 08:23:42PM -0800, Joe Perches wrote:
> On Tue, 2018-11-13 at 14:09 +1100, Dave Chinner wrote:
> > On Mon, Nov 12, 2018 at 08:54:10PM -0500, Theodore Y. Ts'o wrote:
> > > On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> > > > I'm not interested in making code fast if distro support engineers
> > > > can't debug problems on user systems easily. Optimising for
> > > > performance over debuggability is a horrible trade off for us to
> > > > make because it means users and distros end up much more reliant on
> > > > single points of expertise for debugging problems. And that means
> > > > the majority of the load of problem triage falls directly on very
> > > > limited resources - the core XFS development team. A little bit of
> > > > thought about how to make code easier to triage and debug goes a
> > > > long, long way....
> > > 
> > > So at least in my experience, if the kernels are compiled with
> > > CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
> > > scripts/decode_stracktrace.sh seems to do a very nice job with inlined
> > 
> > That doesn't help with kernel profiling and other such things that
> > are based on callgraphs...
> 
> If that's really the case:
> 
> I rather suspect the xfs static v STATIC function marking is not
> particularly curated and the marking is somewhat arbitrary.

That's a common opinion for an outsider to form when they come
across something unfamiliar they don't really understand. "I don't
understand this, so I must rewrite it" is an unfortunate habit that
programmers have.

> So perhaps given the large number of static, but not STATIC
> functions, perhaps a sed of s/static/STATIC/ should be done
> when it's not inline for all xfs functions.

That's just as bad as removing them all, if not worse.

If you are writing new code or reworking existing code, then we'll
consider the usage of STATIC/static in the context of that work.
Otherwise, we leave it alone.

It if ain't broke, don't fix it. And it sure as hell isn't broken
right now. We've got more than enough bugs to fix without having to
deal with drive-by bikeshed painting...

-Dave.
Joe Perches Nov. 13, 2018, 5:31 a.m. UTC | #10
On Tue, 2018-11-13 at 16:26 +1100, Dave Chinner wrote:
> On Mon, Nov 12, 2018 at 08:23:42PM -0800, Joe Perches wrote:
> > On Tue, 2018-11-13 at 14:09 +1100, Dave Chinner wrote:
> > > On Mon, Nov 12, 2018 at 08:54:10PM -0500, Theodore Y. Ts'o wrote:
> > > > On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> > > > > I'm not interested in making code fast if distro support engineers
> > > > > can't debug problems on user systems easily. Optimising for
> > > > > performance over debuggability is a horrible trade off for us to
> > > > > make because it means users and distros end up much more reliant on
> > > > > single points of expertise for debugging problems. And that means
> > > > > the majority of the load of problem triage falls directly on very
> > > > > limited resources - the core XFS development team. A little bit of
> > > > > thought about how to make code easier to triage and debug goes a
> > > > > long, long way....
> > > > 
> > > > So at least in my experience, if the kernels are compiled with
> > > > CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
> > > > scripts/decode_stracktrace.sh seems to do a very nice job with inlined
> > > 
> > > That doesn't help with kernel profiling and other such things that
> > > are based on callgraphs...
> > 
> > If that's really the case:
> > 
> > I rather suspect the xfs static v STATIC function marking is not
> > particularly curated and the marking is somewhat arbitrary.
> 
> That's a common opinion for an outsider to form when they come
> across something unfamiliar they don't really understand. "I don't
> understand this, so I must rewrite it" is an unfortunate habit that
> programmers have.

Silly.

> > So perhaps given the large number of static, but not STATIC
> > functions, perhaps a sed of s/static/STATIC/ should be done
> > when it's not inline for all xfs functions.
> 
> That's just as bad as removing them all, if not worse. 

Why?

> If you are writing new code or reworking existing code, then we'll
> consider the usage of STATIC/static in the context of that work.
> Otherwise, we leave it alone.

If your statement is as described above, and
the STATIC use to enable call stack tracing i
useful, why shouldn't it be systemic?

> It if ain't broke, don't fix it.

A generically lazy statement.
Darrick J. Wong Nov. 13, 2018, 5:44 a.m. UTC | #11
On Mon, Nov 12, 2018 at 09:31:51PM -0800, Joe Perches wrote:
> On Tue, 2018-11-13 at 16:26 +1100, Dave Chinner wrote:
> > On Mon, Nov 12, 2018 at 08:23:42PM -0800, Joe Perches wrote:
> > > On Tue, 2018-11-13 at 14:09 +1100, Dave Chinner wrote:
> > > > On Mon, Nov 12, 2018 at 08:54:10PM -0500, Theodore Y. Ts'o wrote:
> > > > > On Tue, Nov 13, 2018 at 12:18:05PM +1100, Dave Chinner wrote:
> > > > > > I'm not interested in making code fast if distro support engineers
> > > > > > can't debug problems on user systems easily. Optimising for
> > > > > > performance over debuggability is a horrible trade off for us to
> > > > > > make because it means users and distros end up much more reliant on
> > > > > > single points of expertise for debugging problems. And that means
> > > > > > the majority of the load of problem triage falls directly on very
> > > > > > limited resources - the core XFS development team. A little bit of
> > > > > > thought about how to make code easier to triage and debug goes a
> > > > > > long, long way....
> > > > > 
> > > > > So at least in my experience, if the kernels are compiled with
> > > > > CONFIG_DEBUG_INFO and/or CONFIG_DEBUG_INFO_REDUCED,
> > > > > scripts/decode_stracktrace.sh seems to do a very nice job with inlined
> > > > 
> > > > That doesn't help with kernel profiling and other such things that
> > > > are based on callgraphs...
> > > 
> > > If that's really the case:
> > > 
> > > I rather suspect the xfs static v STATIC function marking is not
> > > particularly curated and the marking is somewhat arbitrary.

I disagree.  I've added plenty of code over the past couple of years.
Short functions with few or no branches (e.g. converters) are 'static';
longer functions (loops, iterators, "decide what to do with this"
functions, etc.) with many branches are STATIC to make it easier for me
to ftrace their decisions over a given dataset.

> > That's a common opinion for an outsider to form when they come
> > across something unfamiliar they don't really understand. "I don't
> > understand this, so I must rewrite it" is an unfortunate habit that
> > programmers have.
> 
> Silly.

Yet frequent.

> > > So perhaps given the large number of static, but not STATIC
> > > functions, perhaps a sed of s/static/STATIC/ should be done
> > > when it's not inline for all xfs functions.
> > 
> > That's just as bad as removing them all, if not worse. 
> 
> Why?
> 
> > If you are writing new code or reworking existing code, then we'll
> > consider the usage of STATIC/static in the context of that work.
> > Otherwise, we leave it alone.
> 
> If your statement is as described above, and
> the STATIC use to enable call stack tracing i
> useful, why shouldn't it be systemic?
> 
> > It if ain't broke, don't fix it.
> 
> A generically lazy statement.

Please everyone let's take a breather from this thread for a few hours.
A 3.7% reduction in code size is not worth getting worked up over, IMO.

--D

> 
>
Christoph Hellwig Nov. 15, 2018, 10:12 a.m. UTC | #12
On Tue, Nov 13, 2018 at 04:26:51PM +1100, Dave Chinner wrote:
> That's just as bad as removing them all, if not worse.
> 
> If you are writing new code or reworking existing code, then we'll
> consider the usage of STATIC/static in the context of that work.
> Otherwise, we leave it alone.
> 
> It if ain't broke, don't fix it. And it sure as hell isn't broken
> right now. We've got more than enough bugs to fix without having to
> deal with drive-by bikeshed painting...

Agreed.  That being said I think we should aim to add manual
noline annotations to those functions where we really care while we
go through the code instead of the weird STATIC that just seems to
cause this kind of confusion.

And if Joe or somone else can come up with a few patches where removing
the noinline (aka STATIC) makes a huge difference for a small number
of functions we should consider it as well.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index edbd5a210df2..f33c8b626bca 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -242,7 +242,7 @@  static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 #endif /* XFS_WARN */
 #endif /* DEBUG */
 
-#define STATIC static noinline
+#define STATIC static
 
 #ifdef CONFIG_XFS_RT