
percpu: fix clang modpost warning in pcpu_build_alloc_info()

Message ID 20201231212852.3175381-1-dennis@kernel.org
State New, archived
Series percpu: fix clang modpost warning in pcpu_build_alloc_info()

Commit Message

Dennis Zhou Dec. 31, 2020, 9:28 p.m. UTC
This is an unusual situation so I thought it best to explain it in a
separate patch.

"percpu: reduce the number of cpu distance comparisons" introduces a
dependency on cpumask helper functions in __init code. This code
references a struct cpumask annotated __initdata. When the function is
inlined (gcc), everything is fine, but clang decides not to inline these
function calls. This causes modpost to warn about an __initdata access
by a function not annotated with __init [1].

Ways I thought about fixing it:
1. figure out why clang thinks this inlining is too costly.
2. create a wrapper function annotated __init (this).
3. annotate cpumask with __refdata.

Ultimately it comes down to if it's worth saving the cpumask memory and
allowing it to be freed. IIUC, __refdata won't be freed, so option 3 is
just a little wasteful. 1 is out of my depth, leaving 2. I don't feel
great about this behavior being dependent on inlining semantics, but
cpumask helpers are small and probably should be inlined.
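A minimal userspace sketch of option 2, assuming an ELF target such as Linux with GCC or clang. The demo_* names and the section names are hypothetical stand-ins for __init, __initdata and cpumask_clear_cpu(); the sketch only shows the shape of the wrapper and does not reproduce the modpost check itself. Whether the helper ends up inlined into the wrapper is still the compiler's choice.

```c
#include <assert.h>
#include <limits.h>

/*
 * Userspace stand-ins, NOT the kernel's definitions: the real __init and
 * __initdata put code and data in sections that are freed after boot, and
 * modpost flags references into them from ordinary .text. Here they only
 * select named sections that an ELF linker folds into .text and .data.
 */
#define demo_init     __attribute__((section(".text.demo_init")))
#define demo_initdata __attribute__((section(".data.demo_init")))

#define DEMO_NR_CPUS 128
#define DEMO_BPL     (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS   ((DEMO_NR_CPUS + DEMO_BPL - 1) / DEMO_BPL)

struct demo_cpumask {
	unsigned long bits[DEMO_LONGS];
};

/*
 * Generic helper, analogous to cpumask_clear_cpu(): 'inline' is only a
 * hint, so the compiler may still emit an out-of-line copy in .text.
 */
static inline void demo_cpumask_clear_cpu(int cpu, struct demo_cpumask *m)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	m->bits[cpu / DEMO_BPL] &= ~(1UL << (cpu % DEMO_BPL));
}

/*
 * Option 2: a thin wrapper that is itself init code, mirroring
 * pcpu_cpumask_clear_cpu() in the patch. The init function below hands
 * its init data only to this wrapper.
 */
static void demo_init demo_pcpu_cpumask_clear_cpu(int cpu, struct demo_cpumask *m)
{
	demo_cpumask_clear_cpu(cpu, m);
}

/* Fill a function-local init-data mask, clear one CPU, count what is left. */
static int demo_init demo_count_after_clear(int cpu)
{
	static struct demo_cpumask mask demo_initdata;
	int i, n = 0;

	for (i = 0; i < DEMO_NR_CPUS; i++)
		mask.bits[i / DEMO_BPL] |= 1UL << (i % DEMO_BPL);

	demo_pcpu_cpumask_clear_cpu(cpu, &mask);

	for (i = 0; i < DEMO_NR_CPUS; i++)
		n += (mask.bits[i / DEMO_BPL] >> (i % DEMO_BPL)) & 1UL;
	return n;
}
```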

modpost complaint:
  WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask
  The function cpumask_clear_cpu() references
  the variable __initdata pcpu_build_alloc_info.mask.
  This is often because cpumask_clear_cpu lacks a __initdata
  annotation or the annotation of pcpu_build_alloc_info.mask is wrong.

clang output:
  mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline]

[1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
This is on top of percpu#for-5.12.

 mm/percpu.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Nathan Chancellor Jan. 4, 2021, 11:46 p.m. UTC | #1
On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> This is an unusual situation so I thought it best to explain it in a
> separate patch.
> 
> "percpu: reduce the number of cpu distance comparisons" introduces a
> dependency on cpumask helper functions in __init code. This code
> references a struct cpumask annotated __initdata. When the function is
> inlined (gcc), everything is fine, but clang decides not to inline these
> function calls. This causes modpost to warn about an __initdata access
> by a function not annotated with __init [1].
> 
> Ways I thought about fixing it:
> 1. figure out why clang thinks this inlining is too costly.
> 2. create a wrapper function annotated __init (this).
> 3. annotate cpumask with __refdata.
> 
> Ultimately it comes down to if it's worth saving the cpumask memory and
> allowing it to be freed. IIUC, __refdata won't be freed, so option 3 is
> just a little wasteful. 1 is out of my depth, leaving 2. I don't feel
> great about this behavior being dependent on inlining semantics, but
> cpumask helpers are small and probably should be inlined.
> 
> modpost complaint:
>   WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask
>   The function cpumask_clear_cpu() references
>   the variable __initdata pcpu_build_alloc_info.mask.
>   This is often because cpumask_clear_cpu lacks a __initdata
>   annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
> 
> clang output:
>   mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline]
> 
> [1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
> This is on top of percpu#for-5.12.
> 
>  mm/percpu.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 80f8f885a990..357977c4cb00 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2642,6 +2642,18 @@ early_param("percpu_alloc", percpu_alloc_setup);
>  
>  /* pcpu_build_alloc_info() is used by both embed and page first chunk */
>  #if defined(BUILD_EMBED_FIRST_CHUNK) || defined(BUILD_PAGE_FIRST_CHUNK)
> +
> +/*
> + * This wrapper is to avoid a warning where cpumask_clear_cpu() is not inlined
> + * when compiling with clang causing modpost to warn about accessing __initdata
> + * from a non __init function.  By doing this, we allow the struct cpumask to be
> + * freed instead of it taking space by annotating with __refdata.
> + */
> +static void __init pcpu_cpumask_clear_cpu(int cpu, struct cpumask *mask)
> +{
> +	cpumask_clear_cpu(cpu, mask);
> +}
> +
>  /**
>   * pcpu_build_alloc_info - build alloc_info considering distances between CPUs
>   * @reserved_size: the size of reserved percpu area in bytes
> @@ -2713,7 +2725,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  		cpu = cpumask_first(&mask);
>  		group_map[cpu] = group;
>  		group_cnt[group]++;
> -		cpumask_clear_cpu(cpu, &mask);
> +		pcpu_cpumask_clear_cpu(cpu, &mask);
>  
>  		for_each_cpu(tcpu, &mask) {
>  			if (!cpu_distance_fn ||
> @@ -2721,7 +2733,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  			     cpu_distance_fn(tcpu, cpu) == LOCAL_DISTANCE)) {
>  				group_map[tcpu] = group;
>  				group_cnt[group]++;
> -				cpumask_clear_cpu(tcpu, &mask);
> +				pcpu_cpumask_clear_cpu(tcpu, &mask);
>  			}
>  		}
>  	}
> -- 
> 2.29.2.729.g45daf8777d-goog
> 

Hi Dennis,

I did a bisect of the problematic config against defconfig and it points
out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
config, which makes some sense as that will mess with clang's inlining
heuristics. It does not appear to be the single config that makes a
difference but it gives some clarity.

I do not personally have any strong opinions around the patch but is it
really that much wasted memory to just annotate mask with __refdata?

Cheers,
Nathan
Dennis Zhou Jan. 5, 2021, 12:55 a.m. UTC | #2
On Mon, Jan 04, 2021 at 04:46:51PM -0700, Nathan Chancellor wrote:
> On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> > This is an unusual situation so I thought it best to explain it in a
> > separate patch.
> > 
> > "percpu: reduce the number of cpu distance comparisons" introduces a
> > dependency on cpumask helper functions in __init code. This code
> > references a struct cpumask annotated __initdata. When the function is
> > inlined (gcc), everything is fine, but clang decides not to inline these
> > function calls. This causes modpost to warn about an __initdata access
> > by a function not annotated with __init [1].
> > 
> > Ways I thought about fixing it:
> > 1. figure out why clang thinks this inlining is too costly.
> > 2. create a wrapper function annotated __init (this).
> > 3. annotate cpumask with __refdata.
> > 
> > Ultimately it comes down to if it's worth saving the cpumask memory and
> > allowing it to be freed. IIUC, __refdata won't be freed, so option 3 is
> > just a little wasteful. 1 is out of my depth, leaving 2. I don't feel
> > great about this behavior being dependent on inlining semantics, but
> > cpumask helpers are small and probably should be inlined.
> > 
> > modpost complaint:
> >   WARNING: modpost: vmlinux.o(.text+0x735425): Section mismatch in reference from the function cpumask_clear_cpu() to the variable .init.data:pcpu_build_alloc_info.mask
> >   The function cpumask_clear_cpu() references
> >   the variable __initdata pcpu_build_alloc_info.mask.
> >   This is often because cpumask_clear_cpu lacks a __initdata
> >   annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
> > 
> > clang output:
> >   mm/percpu.c:2724:5: remark: cpumask_clear_cpu not inlined into pcpu_build_alloc_info because too costly to inline (cost=725, threshold=325) [-Rpass-missed=inline]
> > 
> > [1] https://lore.kernel.org/linux-mm/202012220454.9F6Bkz9q-lkp@intel.com/
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > ---
> > This is on top of percpu#for-5.12.
> > 
> >  mm/percpu.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 80f8f885a990..357977c4cb00 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -2642,6 +2642,18 @@ early_param("percpu_alloc", percpu_alloc_setup);
> >  
> >  /* pcpu_build_alloc_info() is used by both embed and page first chunk */
> >  #if defined(BUILD_EMBED_FIRST_CHUNK) || defined(BUILD_PAGE_FIRST_CHUNK)
> > +
> > +/*
> > + * This wrapper is to avoid a warning where cpumask_clear_cpu() is not inlined
> > + * when compiling with clang causing modpost to warn about accessing __initdata
> > + * from a non __init function.  By doing this, we allow the struct cpumask to be
> > + * freed instead of it taking space by annotating with __refdata.
> > + */
> > +static void __init pcpu_cpumask_clear_cpu(int cpu, struct cpumask *mask)
> > +{
> > +	cpumask_clear_cpu(cpu, mask);
> > +}
> > +
> >  /**
> >   * pcpu_build_alloc_info - build alloc_info considering distances between CPUs
> >   * @reserved_size: the size of reserved percpu area in bytes
> > @@ -2713,7 +2725,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
> >  		cpu = cpumask_first(&mask);
> >  		group_map[cpu] = group;
> >  		group_cnt[group]++;
> > -		cpumask_clear_cpu(cpu, &mask);
> > +		pcpu_cpumask_clear_cpu(cpu, &mask);
> >  
> >  		for_each_cpu(tcpu, &mask) {
> >  			if (!cpu_distance_fn ||
> > @@ -2721,7 +2733,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
> >  			     cpu_distance_fn(tcpu, cpu) == LOCAL_DISTANCE)) {
> >  				group_map[tcpu] = group;
> >  				group_cnt[group]++;
> > -				cpumask_clear_cpu(tcpu, &mask);
> > +				pcpu_cpumask_clear_cpu(tcpu, &mask);
> >  			}
> >  		}
> >  	}
> > -- 
> > 2.29.2.729.g45daf8777d-goog
> > 

Hi Nathan,

> 
> Hi Dennis,
> 
> I did a bisect of the problematic config against defconfig and it points
> out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
> config, which makes some sense as that will mess with clang's inlining
> heuristics. It does not appear to be the single config that makes a
> difference but it gives some clarity.
> 

Ah, thanks. To me it's kind of a corner case that I don't have a lot of
insight into. __init code is pretty limited and this warning is really
at the compiler's whim. However, in this case only clang throws this
warning.

> I do not personally have any strong opinions around the patch but is it
> really that much wasted memory to just annotate mask with __refdata?

It's really not much memory: 1 bit per CPU up to NR_CPUS. The reported
config is on the extreme side, compiling with NR_CPUS=8192, so 1 KiB. I'm
just not in love with the idea of a patch meant to improve readability
costing idle memory to resolve a compile-time warning.
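The figure follows from struct cpumask being a bitmap of NR_CPUS bits stored in whole unsigned longs (DECLARE_BITMAP()), so NR_CPUS=8192 gives 8192 / 8 = 1024 bytes. A quick standalone check; demo_cpumask_bytes() is a hypothetical helper that mirrors the round-up to whole longs:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long (64 on LP64 targets, 32 on ILP32). */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes occupied by a bitmap with one bit per possible CPU, rounded up to
 * whole unsigned longs the way DECLARE_BITMAP() sizes its array.
 */
static size_t demo_cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

	assert(nr_cpus > 0);
	return longs * sizeof(unsigned long);
}
```

For NR_CPUS values that are a multiple of the word size the result is simply NR_CPUS / 8, on both 32-bit and 64-bit targets.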

If no one else chimes in in the next few days, I'll probably just apply
it and go from there. If another issue comes up I'll drop this and tag
it as __refdata.

Thanks,
Dennis
Arnd Bergmann Jan. 25, 2021, 11:07 a.m. UTC | #3
On Tue, Jan 5, 2021 at 1:55 AM Dennis Zhou <dennis@kernel.org> wrote:
>
> On Mon, Jan 04, 2021 at 04:46:51PM -0700, Nathan Chancellor wrote:
> > On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> > >
>
> Hi Nathan,
>
> >
> > Hi Dennis,
> >
> > I did a bisect of the problematic config against defconfig and it points
> > out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
> > config, which makes some sense as that will mess with clang's inlining
> > heuristics. It does not appear to be the single config that makes a
> > difference but it gives some clarity.
> >
>
> Ah, thanks. To me it's kind of a corner case that I don't have a lot of
> insight into. __init code is pretty limited and this warning is really
> at the compiler's whim. However, in this case only clang throws this
> warning.
>
> > I do not personally have any strong opinions around the patch but is it
> > really that much wasted memory to just annotate mask with __refdata?
>
> It's really not much memory: 1 bit per CPU up to NR_CPUS. The reported
> config is on the extreme side, compiling with NR_CPUS=8192, so 1 KiB. I'm
> just not in love with the idea of a patch meant to improve readability
> costing idle memory to resolve a compile-time warning.
>
> If no one else chimes in in the next few days, I'll probably just apply
> it and go from there. If another issue comes up I'll drop this and tag
> it as __refdata.

I've come across this one again in linux-next today, and found that
I had an old patch for it already, that I had never submitted:

From 7d6f40414490092b86f1a64d8c42426ee350da1a Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 7 Dec 2020 23:24:20 +0100
Subject: [PATCH] mm: percpu: fix section mismatch warning

Building with arm64 clang sometimes (fairly rarely) shows a
warning about the pcpu_build_alloc_info() function:

WARNING: modpost: vmlinux.o(.text+0x21697c): Section mismatch in
reference from the function cpumask_clear_cpu() to the variable
.init.data:pcpu_build_alloc_info.mask
The function cpumask_clear_cpu() references
the variable __initdata pcpu_build_alloc_info.mask.
This is often because cpumask_clear_cpu lacks a __initdata
annotation or the annotation of pcpu_build_alloc_info.mask is wrong.

What appears to be going on here is that the compiler decides to not
inline the cpumask_clear_cpu() function that is marked 'inline' but not
'always_inline', and it then produces a specialized version of it that
references the static mask unconditionally as an optimization.

Marking cpumask_clear_cpu() as __always_inline would fix it, as would
removing the __initdata annotation on the variable.  I went for marking
the function as __attribute__((flatten)) instead because all functions
called from it are really meant to be inlined here, and it prevents
the same problem happening here again. This is unlikely to be a problem
elsewhere because there are very few function-local static __initdata
variables in the kernel.

Fixes: 6c207504ae79 ("percpu: reduce the number of cpu distance comparisons")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/mm/percpu.c b/mm/percpu.c
index 5ede8dd407d5..527181c46b08 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2662,10 +2662,9 @@ early_param("percpu_alloc", percpu_alloc_setup);
  * On success, pointer to the new allocation_info is returned.  On
  * failure, ERR_PTR value is returned.
  */
-static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
-                               size_t reserved_size, size_t dyn_size,
-                               size_t atom_size,
-                               pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
+static struct pcpu_alloc_info * __init __attribute__((flatten))
+pcpu_build_alloc_info(size_t reserved_size, size_t dyn_size, size_t atom_size,
+                     pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
 {
        static int group_map[NR_CPUS] __initdata;
        static int group_cnt[NR_CPUS] __initdata;


Not sure if this would be any better than your patch.
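A userspace sketch of the flatten approach from the patch above, assuming GCC or clang. The demo_* helpers, DEMO_NR_CPUS and the cpus_per_node grouping rule are hypothetical stand-ins for the cpumask helpers and cpu_distance_fn(); the mask is a plain function-local static rather than __initdata so the example stays portable. The attribute asks the compiler to inline every call made from the flattened function where possible, so no separate copy of a helper needs to reference the mask.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_NR_CPUS 64
#define DEMO_BPL     (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS   ((DEMO_NR_CPUS + DEMO_BPL - 1) / DEMO_BPL)

struct demo_cpumask {
	unsigned long bits[DEMO_LONGS];
};

/* Small helpers marked plain 'inline', like the generic cpumask ones. */
static inline void demo_set_cpu(int cpu, struct demo_cpumask *m)
{
	m->bits[cpu / DEMO_BPL] |= 1UL << (cpu % DEMO_BPL);
}

static inline void demo_clear_cpu(int cpu, struct demo_cpumask *m)
{
	m->bits[cpu / DEMO_BPL] &= ~(1UL << (cpu % DEMO_BPL));
}

static inline int demo_test_cpu(int cpu, const struct demo_cpumask *m)
{
	return (m->bits[cpu / DEMO_BPL] >> (cpu % DEMO_BPL)) & 1UL;
}

/* Like cpumask_first(): returns DEMO_NR_CPUS when the mask is empty. */
static inline int demo_first_cpu(const struct demo_cpumask *m)
{
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (demo_test_cpu(cpu, m))
			return cpu;
	return DEMO_NR_CPUS;
}

/*
 * flatten asks GCC/clang to inline every call in this body where possible.
 * CPUs in the same block of cpus_per_node count as "local" (a stand-in for
 * cpu_distance_fn() == LOCAL_DISTANCE); returns the number of groups.
 */
static int __attribute__((flatten)) demo_count_groups(int cpus_per_node)
{
	static struct demo_cpumask mask;	/* __initdata in the kernel */
	int cpu, tcpu, groups = 0;

	assert(cpus_per_node > 0);
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_set_cpu(cpu, &mask);

	while ((cpu = demo_first_cpu(&mask)) < DEMO_NR_CPUS) {
		demo_clear_cpu(cpu, &mask);
		for (tcpu = 0; tcpu < DEMO_NR_CPUS; tcpu++)
			if (demo_test_cpu(tcpu, &mask) &&
			    tcpu / cpus_per_node == cpu / cpus_per_node)
				demo_clear_cpu(tcpu, &mask);
		groups++;
	}
	return groups;
}
```

Compared with marking each helper __always_inline, the attribute sits on the one caller that owns the static data, so the generic helpers keep their ordinary 'inline' semantics everywhere else.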

       Arnd
Nick Desaulniers Jan. 25, 2021, 6:27 p.m. UTC | #4
On Mon, Jan 25, 2021 at 3:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Jan 5, 2021 at 1:55 AM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > On Mon, Jan 04, 2021 at 04:46:51PM -0700, Nathan Chancellor wrote:
> > > On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> > > >
> >
> > Hi Nathan,
> >
> > >
> > > Hi Dennis,
> > >
> > > I did a bisect of the problematic config against defconfig and it points
> > > out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
> > > config, which makes some sense as that will mess with clang's inlining
> > > heuristics. It does not appear to be the single config that makes a
> > > difference but it gives some clarity.
> > >
> >
> > Ah, thanks. To me it's kind of a corner case that I don't have a lot of
> > insight into. __init code is pretty limited and this warning is really
> > at the compiler's whim. However, in this case only clang throws this
> > warning.
> >
> > > I do not personally have any strong opinions around the patch but is it
> > > really that much wasted memory to just annotate mask with __refdata?
> >
> > It's really not much memory: 1 bit per CPU up to NR_CPUS. The reported
> > config is on the extreme side, compiling with NR_CPUS=8192, so 1 KiB. I'm
> > just not in love with the idea of a patch meant to improve readability
> > costing idle memory to resolve a compile-time warning.
> >
> > If no one else chimes in in the next few days, I'll probably just apply
> > it and go from there. If another issue comes up I'll drop this and tag
> > it as __refdata.
>
> I've come across this one again in linux-next today, and found that
> I had an old patch for it already, that I had never submitted:
>
> From 7d6f40414490092b86f1a64d8c42426ee350da1a Mon Sep 17 00:00:00 2001
> From: Arnd Bergmann <arnd@arndb.de>
> Date: Mon, 7 Dec 2020 23:24:20 +0100
> Subject: [PATCH] mm: percpu: fix section mismatch warning
>
> Building with arm64 clang sometimes (fairly rarely) shows a
> warning about the pcpu_build_alloc_info() function:
>
> WARNING: modpost: vmlinux.o(.text+0x21697c): Section mismatch in
> reference from the function cpumask_clear_cpu() to the variable
> .init.data:pcpu_build_alloc_info.mask
> The function cpumask_clear_cpu() references
> the variable __initdata pcpu_build_alloc_info.mask.
> This is often because cpumask_clear_cpu lacks a __initdata
> annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
>
> What appears to be going on here is that the compiler decides to not
> inline the cpumask_clear_cpu() function that is marked 'inline' but not
> 'always_inline', and it then produces a specialized version of it that
> references the static mask unconditionally as an optimization.
>
> Marking cpumask_clear_cpu() as __always_inline would fix it, as would
> removing the __initdata annotation on the variable.  I went for marking
> the function as __attribute__((flatten)) instead because all functions

I had to look this one up; it's new to me!
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
https://awesomekling.github.io/Smarter-C++-inlining-with-attribute-flatten/

Seems pretty cool/flexible to control inlining on the caller side!

At the least though, we should avoid open coding the function attributes.  See
include/linux/compiler_attributes.h

Testing quickly in godbolt, __flatten__ has been supported since at
least clang 3.5 and gcc 4.4, FWIW (so it doesn't need a
__has_attribute guard).
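Following the suggestion to avoid open-coding the attribute, a sketch of what a compiler_attributes.h-style macro and its use could look like. The __flatten spelling, the #ifndef guard and the demo_* functions are assumptions for illustration, not copied from the kernel tree:

```c
#include <assert.h>

/*
 * Hypothetical wrapper in the style of include/linux/compiler_attributes.h.
 * Per the note above, GCC 4.4+ and clang 3.5+ accept flatten, so this
 * sketch does not wrap it in a __has_attribute() check.
 */
#ifndef __flatten
#define __flatten __attribute__((flatten))
#endif

static inline int demo_square(int x)
{
	return x * x;
}

/* The caller spells the attribute once, via the macro. */
static int __flatten demo_sum_of_squares(const int *v, int n)
{
	int i, sum = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		sum += demo_square(v[i]);
	return sum;
}
```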

> called from it are really meant to be inlined here, and it prevents
> the same problem happening here again. This is unlikely to be a problem
> elsewhere because there are very few function-local static __initdata
> variables in the kernel.
>
> Fixes: 6c207504ae79 ("percpu: reduce the number of cpu distance comparisons")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 5ede8dd407d5..527181c46b08 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2662,10 +2662,9 @@ early_param("percpu_alloc", percpu_alloc_setup);
>   * On success, pointer to the new allocation_info is returned.  On
>   * failure, ERR_PTR value is returned.
>   */
> -static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
> -                               size_t reserved_size, size_t dyn_size,
> -                               size_t atom_size,
> -                               pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
> +static struct pcpu_alloc_info * __init __attribute__((flatten))
> +pcpu_build_alloc_info(size_t reserved_size, size_t dyn_size, size_t atom_size,
> +                     pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
>  {
>         static int group_map[NR_CPUS] __initdata;
>         static int group_cnt[NR_CPUS] __initdata;
>
>
> Not sure if this would be any better than your patch.
>
>        Arnd
>
Dennis Zhou Jan. 26, 2021, 5:04 a.m. UTC | #5
On Mon, Jan 25, 2021 at 12:07:24PM +0100, Arnd Bergmann wrote:
> On Tue, Jan 5, 2021 at 1:55 AM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > On Mon, Jan 04, 2021 at 04:46:51PM -0700, Nathan Chancellor wrote:
> > > On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> > > >
> >
> > Hi Nathan,
> >
> > >
> > > Hi Dennis,
> > >
> > > I did a bisect of the problematic config against defconfig and it points
> > > out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
> > > config, which makes some sense as that will mess with clang's inlining
> > > heuristics. It does not appear to be the single config that makes a
> > > difference but it gives some clarity.
> > >
> >
> > Ah, thanks. To me it's kind of a corner case that I don't have a lot of
> > insight into. __init code is pretty limited and this warning is really
> > at the compiler's whim. However, in this case only clang throws this
> > warning.
> >
> > > I do not personally have any strong opinions around the patch but is it
> > > really that much wasted memory to just annotate mask with __refdata?
> >
> > It's really not much memory: 1 bit per CPU up to NR_CPUS. The reported
> > config is on the extreme side, compiling with NR_CPUS=8192, so 1 KiB. I'm
> > just not in love with the idea of a patch meant to improve readability
> > costing idle memory to resolve a compile-time warning.
> >
> > If no one else chimes in in the next few days, I'll probably just apply
> > it and go from there. If another issue comes up I'll drop this and tag
> > it as __refdata.
> 
> I've come across this one again in linux-next today, and found that
> I had an old patch for it already, that I had never submitted:
> 
> From 7d6f40414490092b86f1a64d8c42426ee350da1a Mon Sep 17 00:00:00 2001
> From: Arnd Bergmann <arnd@arndb.de>
> Date: Mon, 7 Dec 2020 23:24:20 +0100
> Subject: [PATCH] mm: percpu: fix section mismatch warning
> 
> Building with arm64 clang sometimes (fairly rarely) shows a
> warning about the pcpu_build_alloc_info() function:
> 
> WARNING: modpost: vmlinux.o(.text+0x21697c): Section mismatch in
> reference from the function cpumask_clear_cpu() to the variable
> .init.data:pcpu_build_alloc_info.mask
> The function cpumask_clear_cpu() references
> the variable __initdata pcpu_build_alloc_info.mask.
> This is often because cpumask_clear_cpu lacks a __initdata
> annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
> 
> What appears to be going on here is that the compiler decides to not
> inline the cpumask_clear_cpu() function that is marked 'inline' but not
> 'always_inline', and it then produces a specialized version of it that
> references the static mask unconditionally as an optimization.
> 
> Marking cpumask_clear_cpu() as __always_inline would fix it, as would
> removing the __initdata annotation on the variable.  I went for marking
> the function as __attribute__((flatten)) instead because all functions
> called from it are really meant to be inlined here, and it prevents
> the same problem happening here again. This is unlikely to be a problem
> elsewhere because there are very few function-local static __initdata
> variables in the kernel.
> 
> Fixes: 6c207504ae79 ("percpu: reduce the number of cpu distance comparisons")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 5ede8dd407d5..527181c46b08 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2662,10 +2662,9 @@ early_param("percpu_alloc", percpu_alloc_setup);
>   * On success, pointer to the new allocation_info is returned.  On
>   * failure, ERR_PTR value is returned.
>   */
> -static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
> -                               size_t reserved_size, size_t dyn_size,
> -                               size_t atom_size,
> -                               pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
> +static struct pcpu_alloc_info * __init __attribute__((flatten))
> +pcpu_build_alloc_info(size_t reserved_size, size_t dyn_size, size_t atom_size,
> +                     pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
>  {
>         static int group_map[NR_CPUS] __initdata;
>         static int group_cnt[NR_CPUS] __initdata;
> 
> 
> Not sure if this would be any better than your patch.
> 
>        Arnd

Hi Arnd,

I like this solution a lot more than mine because it is a lot less
fragile.

Thanks,
Dennis
Dennis Zhou Jan. 26, 2021, 5:11 a.m. UTC | #6
Hi Nick,

On Mon, Jan 25, 2021 at 10:27:11AM -0800, Nick Desaulniers wrote:
> On Mon, Jan 25, 2021 at 3:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Jan 5, 2021 at 1:55 AM Dennis Zhou <dennis@kernel.org> wrote:
> > >
> > > On Mon, Jan 04, 2021 at 04:46:51PM -0700, Nathan Chancellor wrote:
> > > > On Thu, Dec 31, 2020 at 09:28:52PM +0000, Dennis Zhou wrote:
> > > > >
> > >
> > > Hi Nathan,
> > >
> > > >
> > > > Hi Dennis,
> > > >
> > > > I did a bisect of the problematic config against defconfig and it points
> > > > out that CONFIG_GCOV_PROFILE_ALL is in the bad config but not the good
> > > > config, which makes some sense as that will mess with clang's inlining
> > > > heuristics. It does not appear to be the single config that makes a
> > > > difference but it gives some clarity.
> > > >
> > >
> > > Ah, thanks. To me it's kind of a corner case that I don't have a lot of
> > > insight into. __init code is pretty limited and this warning is really
> > > at the compiler's whim. However, in this case only clang throws this
> > > warning.
> > >
> > > > I do not personally have any strong opinions around the patch but is it
> > > > really that much wasted memory to just annotate mask with __refdata?
> > >
> > > It's really not much memory: 1 bit per CPU up to NR_CPUS. The reported
> > > config is on the extreme side, compiling with NR_CPUS=8192, so 1 KiB. I'm
> > > just not in love with the idea of a patch meant to improve readability
> > > costing idle memory to resolve a compile-time warning.
> > >
> > > If no one else chimes in in the next few days, I'll probably just apply
> > > it and go from there. If another issue comes up I'll drop this and tag
> > > it as __refdata.
> >
> > I've come across this one again in linux-next today, and found that
> > I had an old patch for it already, that I had never submitted:
> >
> > From 7d6f40414490092b86f1a64d8c42426ee350da1a Mon Sep 17 00:00:00 2001
> > From: Arnd Bergmann <arnd@arndb.de>
> > Date: Mon, 7 Dec 2020 23:24:20 +0100
> > Subject: [PATCH] mm: percpu: fix section mismatch warning
> >
> > Building with arm64 clang sometimes (fairly rarely) shows a
> > warning about the pcpu_build_alloc_info() function:
> >
> > WARNING: modpost: vmlinux.o(.text+0x21697c): Section mismatch in
> > reference from the function cpumask_clear_cpu() to the variable
> > .init.data:pcpu_build_alloc_info.mask
> > The function cpumask_clear_cpu() references
> > the variable __initdata pcpu_build_alloc_info.mask.
> > This is often because cpumask_clear_cpu lacks a __initdata
> > annotation or the annotation of pcpu_build_alloc_info.mask is wrong.
> >
> > What appears to be going on here is that the compiler decides to not
> > inline the cpumask_clear_cpu() function that is marked 'inline' but not
> > 'always_inline', and it then produces a specialized version of it that
> > references the static mask unconditionally as an optimization.
> >
> > Marking cpumask_clear_cpu() as __always_inline would fix it, as would
> > removing the __initdata annotation on the variable.  I went for marking
> > the function as __attribute__((flatten)) instead because all functions
> 
> I had to look this one up; it's new to me!
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
> https://awesomekling.github.io/Smarter-C++-inlining-with-attribute-flatten/
> 
> Seems pretty cool/flexible to control inlining on the caller side!
> 
> At the least though, we should avoid open coding the function attributes.  See
> include/linux/compiler_attributes.h
> 

Arnd, do you mind spinning a new version that adds __flatten to
compiler_attributes.h?

> Testing quickly in godbolt, __flatten__ has been supported since at
> least clang 3.5 and gcc 4.4, FWIW (so it doesn't need a
> __has_attribute guard).
> 

Thanks for testing this!

Thanks,
Dennis

Patch

diff --git a/mm/percpu.c b/mm/percpu.c
index 80f8f885a990..357977c4cb00 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2642,6 +2642,18 @@  early_param("percpu_alloc", percpu_alloc_setup);
 
 /* pcpu_build_alloc_info() is used by both embed and page first chunk */
 #if defined(BUILD_EMBED_FIRST_CHUNK) || defined(BUILD_PAGE_FIRST_CHUNK)
+
+/*
+ * This wrapper is to avoid a warning where cpumask_clear_cpu() is not inlined
+ * when compiling with clang causing modpost to warn about accessing __initdata
+ * from a non __init function.  By doing this, we allow the struct cpumask to be
+ * freed instead of it taking space by annotating with __refdata.
+ */
+static void __init pcpu_cpumask_clear_cpu(int cpu, struct cpumask *mask)
+{
+	cpumask_clear_cpu(cpu, mask);
+}
+
 /**
  * pcpu_build_alloc_info - build alloc_info considering distances between CPUs
  * @reserved_size: the size of reserved percpu area in bytes
@@ -2713,7 +2725,7 @@  static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 		cpu = cpumask_first(&mask);
 		group_map[cpu] = group;
 		group_cnt[group]++;
-		cpumask_clear_cpu(cpu, &mask);
+		pcpu_cpumask_clear_cpu(cpu, &mask);
 
 		for_each_cpu(tcpu, &mask) {
 			if (!cpu_distance_fn ||
@@ -2721,7 +2733,7 @@  static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 			     cpu_distance_fn(tcpu, cpu) == LOCAL_DISTANCE)) {
 				group_map[tcpu] = group;
 				group_cnt[group]++;
-				cpumask_clear_cpu(tcpu, &mask);
+				pcpu_cpumask_clear_cpu(tcpu, &mask);
 			}
 		}
 	}