Message ID | 20240415012136.3636671-1-leo.lilong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xarray: inline xas_descend to improve performance | expand |
On Mon, 15 Apr 2024 09:21:36 +0800 Long Li <leo.lilong@huawei.com> wrote: > The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") > modified the xas_descend function in such a way that it was no longer > being compiled as an inline function, because it increased the size of > xas_descend(), and the compiler no longer optimizes it as inline. This > had a negative impact on performance, xas_descend is called frequently > to traverse downwards in the xarray tree, making it a hot function. > > Inlining xas_descend has been shown to significantly improve performance > by approximately 4.95% in the iozone write test. > > Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz > #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest > > Before this patch: > > kB reclen write rewrite read reread > 67108864 16384 2230080 3637689 6 315197 5496027 > > After this patch: > > kB reclen write rewrite read reread > 67108864 16384 2340360 3666175 6272401 5460782 > > Percentage change: > 4.95% 0.78% -0.68% -0.64% > > This patch introduces inlining to the xas_descend function. While this > change increases the size of lib/xarray.o, the performance gains in > critical workloads make this an acceptable trade-off. > > Size comparison before and after patch: > .text .data .bss file > 0x3502 0 0 lib/xarray.o.before > 0x3602 0 0 lib/xarray.o.after > > ... > > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -200,7 +200,7 @@ static void *xas_start(struct xa_state *xas) > return entry; > } > > -static void *xas_descend(struct xa_state *xas, struct xa_node *node) > +static inline void *xas_descend(struct xa_state *xas, struct xa_node *node) > { > unsigned int offset = get_offset(xas->xa_index, node); > void *entry = xa_entry(xas->xa, node, offset); I thought gcc nowadays treats `inline' as avisory and still makes up its own mind? Perhaps we should use __always_inline here?
On Mon, Apr 15, 2024 at 01:10:53PM -0700, Andrew Morton wrote: > On Mon, 15 Apr 2024 09:21:36 +0800 Long Li <leo.lilong@huawei.com> wrote: > > > The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") > > modified the xas_descend function in such a way that it was no longer > > being compiled as an inline function, because it increased the size of > > xas_descend(), and the compiler no longer optimizes it as inline. This > > had a negative impact on performance, xas_descend is called frequently > > to traverse downwards in the xarray tree, making it a hot function. > > > > Inlining xas_descend has been shown to significantly improve performance > > by approximately 4.95% in the iozone write test. > > > > Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz > > #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest > > > > Before this patch: > > > > kB reclen write rewrite read reread > > 67108864 16384 2230080 3637689 6 315197 5496027 > > > > After this patch: > > > > kB reclen write rewrite read reread > > 67108864 16384 2340360 3666175 6272401 5460782 > > > > Percentage change: > > 4.95% 0.78% -0.68% -0.64% > > > > This patch introduces inlining to the xas_descend function. While this > > change increases the size of lib/xarray.o, the performance gains in > > critical workloads make this an acceptable trade-off. > > > > Size comparison before and after patch: > > .text .data .bss file > > 0x3502 0 0 lib/xarray.o.before > > 0x3602 0 0 lib/xarray.o.after > > > > ... > > > > --- a/lib/xarray.c > > +++ b/lib/xarray.c > > @@ -200,7 +200,7 @@ static void *xas_start(struct xa_state *xas) > > return entry; > > } > > > > -static void *xas_descend(struct xa_state *xas, struct xa_node *node) > > +static inline void *xas_descend(struct xa_state *xas, struct xa_node *node) > > { > > unsigned int offset = get_offset(xas->xa_index, node); > > void *entry = xa_entry(xas->xa, node, offset); > > I thought gcc nowadays treats `inline' as avisory and still makes up > its own mind? > > Perhaps we should use __always_inline here? Yes, I agree with you, I will send a new version. thanks!
diff --git a/lib/xarray.c b/lib/xarray.c index fbf1d1dd83bc..31e95b461c03 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -200,7 +200,7 @@ static void *xas_start(struct xa_state *xas) return entry; } -static void *xas_descend(struct xa_state *xas, struct xa_node *node) +static inline void *xas_descend(struct xa_state *xas, struct xa_node *node) { unsigned int offset = get_offset(xas->xa_index, node); void *entry = xa_entry(xas->xa, node, offset);
The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") modified the xas_descend function in such a way that it was no longer being compiled as an inline function, because it increased the size of xas_descend(), and the compiler no longer optimizes it as inline. This had a negative impact on performance, xas_descend is called frequently to traverse downwards in the xarray tree, making it a hot function. Inlining xas_descend has been shown to significantly improve performance by approximately 4.95% in the iozone write test. Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest Before this patch: kB reclen write rewrite read reread 67108864 16384 2230080 3637689 6 315197 5496027 After this patch: kB reclen write rewrite read reread 67108864 16384 2340360 3666175 6272401 5460782 Percentage change: 4.95% 0.78% -0.68% -0.64% This patch introduces inlining to the xas_descend function. While this change increases the size of lib/xarray.o, the performance gains in critical workloads make this an acceptable trade-off. Size comparison before and after patch: .text .data .bss file 0x3502 0 0 lib/xarray.o.before 0x3602 0 0 lib/xarray.o.after Signed-off-by: Long Li <leo.lilong@huawei.com> --- lib/xarray.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)