Message ID | 20210929225850.3889950-1-ndesaulniers@google.com (mailing list archive)
---|---
State | New, archived
Series | modpost: add allow list for llvm IPSCCP
On Wed, Sep 29, 2021 at 3:59 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> +static const struct secref_exception secref_allowlist[] = {
> +	{ .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
> +	{ .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
> +	{ .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
> +	{ .fromsym = "early_get_smp_config", .tosym = "x86_init" },
> +	{ .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
> +};

This list is basically made-up and random.

Why did those functions not get inlined? Wouldn't it be better to make
them always-inline?

Or, like in at least the early_get_smp_config() case, just make it be
marked __init, so that if it doesn't get inlined it gets the right
section?

It seems silly to add random source mappings to a checking program.

It was bad for the gcc constprop hack, but at least there it was a
clear case of "this inlining failed". This ad-hoc list has cases of
things that are clearly wrong in general ("test_bit()" must not use
initdata), and cases of "ok, the function just doesn't have the right
section marker".

(All of get_smp_config/early_get_smp_config/find_smp_config should be
__init, since they most definitely cannot work after __init time - but
why a compiler doesn't just inline them when they are one single
indirect call, I don't really get)

              Linus
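For reference, Linus's __init suggestion amounts to roughly the
following sketch (wrapper bodies paraphrased from
arch/x86/include/asm/mpspec.h; untested, and not the eventual fix):

/* Sketch, untested: assumed shape of the mpspec.h wrappers. With
 * __init, an uninlined copy is emitted into .init.text, so its
 * reference to the __init-data x86_init hook is no longer a
 * section mismatch. */
static inline void __init early_get_smp_config(void)
{
	x86_init.mpparse.get_smp_config(1);	/* one indirect call */
}

static inline void __init get_smp_config(void)
{
	x86_init.mpparse.get_smp_config(0);
}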
On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Sep 29, 2021 at 3:59 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > +static const struct secref_exception secref_allowlist[] = {
> > +	{ .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
> > +	{ .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
> > +	{ .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
> > +	{ .fromsym = "early_get_smp_config", .tosym = "x86_init" },
> > +	{ .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
> > +};

Thanks for your feedback. This has been a long-standing issue with no
clear path forward; I was looking forward to your input.

> This list is basically made-up and random.

Definitely brittle. And it contains checks that are specific to
basically one set of configs for one arch. It sucks to pay that cost
for unaffected architectures.

> Why did those functions not get inlined?

$ make LLVM=1 -j72 allmodconfig
$ make LLVM=1 -j72 arch/x86/mm/amdtopology.o KCFLAGS=-Rpass-missed=inline
...
arch/x86/mm/amdtopology.c:110:7: remark: 'test_bit' not inlined into
'amd_numa_init' because too costly to inline (cost=115, threshold=45)
[-Rpass-missed=inline]
        if (node_isset(nodeid, numa_nodes_parsed)) {
            ^
arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
into 'amd_numa_init' because too costly to inline (cost=60,
threshold=45) [-Rpass-missed=inline]
        if (!nodes_weight(numa_nodes_parsed))
            ^
arch/x86/mm/amdtopology.c:171:2: remark: 'early_get_smp_config' not
inlined into 'amd_numa_init' because too costly to inline (cost=85,
threshold=45) [-Rpass-missed=inline]
        early_get_smp_config();
        ^
arch/x86/mm/amdtopology.c:178:2: remark: '__first_node' not inlined
into 'amd_numa_init' because too costly to inline (cost=70,
threshold=45) [-Rpass-missed=inline]
        for_each_node_mask(i, numa_nodes_parsed)
        ^
arch/x86/mm/amdtopology.c:178:2: remark: '__next_node' not inlined
into 'amd_numa_init' because too costly to inline (cost=95,
threshold=45) [-Rpass-missed=inline]

I.e. for allmodconfig, the sanitizers add so much instrumentation to
the callees that they become too large to be considered profitable to
inline by the cost model. Note that LLVM's inliner works bottom up,
not top down.

Though for the defconfig case...somehow the cost is more than with the
sanitizers...

arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
into 'amd_numa_init' because too costly to inline (cost=930,
threshold=45) [-Rpass-missed=inline]
        if (!nodes_weight(numa_nodes_parsed))
            ^

Looking at the output of `make LLVM=1 -j72
arch/x86/mm/amdtopology.ll`, @__nodes_weight is just some inline asm
(.altinstructions). I wonder if I need to teach the cost model about
`asm inline`...

For the allmodconfig build it looks like `__nodes_weight` calls
`__bitmap_weight` and the code coverage runtime hooks.

> Wouldn't it be better to make
> them always-inline?

Perhaps; see what that might look like:
https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807260475
Does that look better?

> Or, like in at least the early_get_smp_config() case, just make it be
> marked __init, so that if it doesn't get inlined it gets the right
> section?

In the case of early_get_smp_config(), that's what Boris suggested:
https://lore.kernel.org/lkml/20210225114533.GA380@zn.tnic/

> It seems silly to add random source mappings to a checking program.
> It was bad for the gcc constprop hack, but at least there it was a

Part of me feels like modpost not warning on those is permitting a
"memory leak," in so far as code that's only called from .init callers
is never reclaimed. Or leaving behind gadgets...

> clear case of "this inlining failed". This ad-hoc list has cases of
> things that are clearly wrong in general ("test_bit()" must not use
> initdata), and cases of "ok, the function just doesn't have the right
> section marker".

Sorry, what do you mean "test_bit() must not use initdata"? Because it
can lead to problems like this? Or...?

include/linux/nodemask.h has a comment that I'd bet predates that
modpost "Pattern 5" gcc constprop hack:
https://github.com/ClangBuiltLinux/linux/blob/83d09ad4b950651a95d37697f1493c00d888d0db/include/linux/nodemask.h#L119-L125

> (All of get_smp_config/early_get_smp_config/find_smp_config should be
> __init, since they most definitely cannot work after __init time - but
> why a compiler doesn't just inline them when they are one single
> indirect call, I don't really get)
>
>               Linus
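The issue comment linked above boils down to a change of roughly this
shape (sketch against include/linux/nodemask.h; helper signature
paraphrased, untested):

#define nodes_weight(nodemask) __nodes_weight(&(nodemask), MAX_NUMNODES)

/* Sketch: __always_inline overrides the cost model, so sanitizer
 * instrumentation inside bitmap_weight() can no longer leave this
 * tiny wrapper out of line in .text referencing .init.data. */
static __always_inline int __nodes_weight(const nodemask_t *srcp,
					  unsigned int nbits)
{
	return bitmap_weight(srcp->bits, nbits);
}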
On 30/09/2021 02.18, Nick Desaulniers wrote:
> On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
> Though for the defconfig case...somehow the cost is more than with the
> sanitizers...
>
> arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
> into 'amd_numa_init' because too costly to inline (cost=930,
> threshold=45) [-Rpass-missed=inline]
>         if (!nodes_weight(numa_nodes_parsed))
>             ^
>
> Looking at the output of `make LLVM=1 -j72
> arch/x86/mm/amdtopology.ll`, @__nodes_weight is just some inline asm
> (.altinstructions). I wonder if I need to teach the cost model about
> `asm inline`...

Remind me, does clang understand 'asm inline("foo")'?

Regardless, it seems that the

  asm (ALTERNATIVE("call __sw_hweight32", ...
  asm (ALTERNATIVE("call __sw_hweight64", ...

in arch/x86/include/asm/arch_hweight.h could/should be made
asm_inline, at least for gcc's sake.

Somewhat related: I really think we should remove __cold from the
definition of __init: It hurts boot time (on a simple board with quite
reproducible boot timing I measured 1-3% some time ago), and it is
likely at least partially responsible for the never-ending tsunami of
functions-that-obviously-should-have-been-inlined(TM) that were not,
because the caller is being optimized for size. Whatever small cost in
extra .text there is gets reclaimed after init - and those who are
concerned about the size of the kernel image itself probably build
with CONFIG_OPTIMIZE_FOR_SIZE=y, and I see no change in such an image
whether __init includes __cold or not.

Rasmus
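Concretely, Rasmus's asm_inline suggestion would look roughly like
this (sketch, untested; the function body is paraphrased from
arch/x86/include/asm/arch_hweight.h, and the exact constraints may
differ):

/* Sketch: asm_inline tells the compiler to cost the asm as a single
 * instruction, instead of counting the bytes of the ALTERNATIVE()
 * string, so callers like __nodes_weight() look cheap to inline. */
static __always_inline unsigned int __arch_hweight32(unsigned int w)
{
	unsigned int res;

	asm_inline (ALTERNATIVE("call __sw_hweight32",
				"popcntl %1, %0", X86_FEATURE_POPCNT)
		    : "=" REG_OUT (res)
		    : REG_IN (w));

	return res;
}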
On Thu, Sep 30, 2021 at 9:19 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
[...]
> > Or, like in at least the early_get_smp_config() case, just make it be
> > marked __init, so that if it doesn't get inlined it gets the right
> > section?
> In the case of early_get_smp_config(), that's what Boris suggested:
> https://lore.kernel.org/lkml/20210225114533.GA380@zn.tnic/

__init works, particularly for early_get_smp_config().

For static inline helpers that are called from both __init and
non-__init functions, maybe __ref will work. In my understanding, the
.ref.text section is not freed, but modpost bypasses the section
mismatch checks for it.

I am not sure which is a better approach for the generic cases:
__always_inline, __ref, or something else?

I am not a big fan of this patch, at least... (The reason was already
stated by Linus.)
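As an illustration, the __ref route Masahiro describes could look
roughly like this (hypothetical helper name, untested sketch):

/* Hypothetical sketch: a helper reached from both __init and
 * non-__init code. __ref places it in .ref.text, which modpost skips
 * when checking references into .init.data - at the cost that the
 * code is never freed, so the annotation needs a comment justifying
 * why the access cannot actually happen after init. */
static int __ref count_parsed_numa_nodes(void)
{
	/* numa_nodes_parsed lives in .init.data; callers must only
	 * reach this before free_initmem(). */
	return nodes_weight(numa_nodes_parsed);
}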
On Wed, Sep 29, 2021 at 5:19 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> ...
> arch/x86/mm/amdtopology.c:110:7: remark: 'test_bit' not inlined into
> 'amd_numa_init' because too costly to inline (cost=115, threshold=45)
> [-Rpass-missed=inline]
>         if (node_isset(nodeid, numa_nodes_parsed)) {

Yeah, I think that we should just do the __always_inline thing.

I'd rather have the stupid debug code overhead in the caller - which
may end up knowing what the pointer actually is, so that the debug
code goes away - than have "test_bit()" uninlined because there's so
much crazy debug code in it.

I also happen to believe that we have too much crazy "instrumentation" crap.

Why is that test_bit() word read so magical that it merits an
"instrument_atomic_read()"?

But I absolutely detest how KCSAN and some other tooling seems to get
a free pass on doing stupid things, just because they generated bad
warnings, so then they can freely generate these much more fundamental
problems, because the result is a f*cking mess.

> Though for the defconfig case...somehow the cost is more than with the
> sanitizers...

Maybe the solution is that if you have some of the crazy sanitizers,
we just say "the end result is not worth even checking". And stop
checking all the section mismatches, and all the stack size things.

Because it looks like this is more of a real issue:

> arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined
> into 'amd_numa_init' because too costly to inline (cost=930,
> threshold=45) [-Rpass-missed=inline]
>         if (!nodes_weight(numa_nodes_parsed))
>             ^

Hmm. That's just a "bitmap_weight()", and that function in turn is
__always_inline.

And the *reason* it is __always_inline is that it really wants to act
as a macro, and look at the second argument and do special things if
it is a small constant value.

And it looks like clang messes things up by simply not doing enough
simplification before inlining decisions, so it all looks very
complicated to clang, even though when you actually generate code, you
have one (of two) very simple code sequences.

> > Wouldn't it be better to make
> > them always-inline?
>
> Perhaps; see what that might look like:
> https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807260475
> Does that look better?

I suspect that in this case, because of clang deficiencies,
__always_inline actually is the right thing to do, at least on
__nodes_weight.

Looking at your comment lower down

  https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807757878

I really think this is a clang bug, and that you need to do certain
simplifications both before _and_ after inlining.

Before, because of the inlining cost decisions, particularly wrt
constant arguments.

After, because successful inlining changes things completely.

Marking __nodes_weight() __always_inline just works around clang being
broken in this regard.

It is _possible_ that it might help to make bitmap_weight() a macro
instead of an inline function, but it's a kind of sad state of affairs
if that is required.

And it might well fail - if you don't do the constant propagation
before making inlining decisions, you'll _still_ end up thinking that
bitmap_weight() is very costly because you don't do that
__builtin_constant_p() lowering.

And then you end up using the (much more expensive) generic function
instead of the cheap "look, for single words this is trivial" thing.
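For reference, the definition being discussed looks roughly like this
(paraphrased from include/linux/bitmap.h):

/* Acts like a macro: when nbits is a small compile-time constant the
 * whole call folds down to one hweight_long(); otherwise it falls
 * back to the out-of-line generic implementation. */
static __always_inline int bitmap_weight(const unsigned long *src,
					 unsigned int nbits)
{
	if (small_const_nbits(nbits))
		return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
	return __bitmap_weight(src, nbits);
}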
> Part of me feels like modpost not warning on those is permitting a
> "memory leak," in so far as code that's only called from .init callers
> is never reclaimed. Or leaving behind gadgets...

I think we can just treat modpost as a "good heuristic". If it catches
all the normal cases, it's fine - but it must not have false positives.

That's basically true of all warnings. False positive warnings make a
warning worthless. That's just *basic*.

So the gcc thing is a "ok, we know compilers mess this up if they do
partial inlining with constant propagation, so we will suppress what
is quite likely a false positive for that case".

That clang patch, in comparison? That's just a hack enumerating random
cases. There is no logic to it, and there is absolutely zero
maintainability. It will cause us to forever just add other random
cases to the list, making the whole tooling entirely pointless.

See the difference?

Maybe clang should just adopt the gcc naming convention, so that we
can just use the gcc heuristic.

> > clear case of "this inlining failed". This ad-hoc list has cases of
> > things that are clearly wrong in general ("test_bit()" must not use
> > initdata), and cases of "ok, the function just doesn't have the right
> > section marker".
>
> Sorry, what do you mean "test_bit() must not use initdata"? Because it
> can lead to problems like this? Or...?

No, I mean that it is completely unacceptable to add some crazy rule
like "you can access this init-data from any context, as long as you
use test_bit to do so".

That's basically what your rule does. And it's a FUNDAMENTALLY invalid
rule. It's simply not true. The rule is invalid, it's just that clang
has made such a mess of it that in one particular case it happens to
be true.

The gcc "rule" is much more reasonable: it's *not* saying "it's ok to
access this init-data from test_bit". The gcc rule says "we know gcc
messes up our heuristics when out-of-lining with constprop, so we just
won't warn, because false positives are bad, bad, bad".

One rule is fundamentally garbage and wrong. The other rule is a
generic "we know this situation cannot be tested for".

Very different.

                 Linus
On Thu, Sep 30, 2021 at 11:54 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Hmm. That's just a "bitmap_weight()", and that function in turn is
> __always_inline.
>
> And the *reason* it is __always_inline is that it really wants to act
> as a macro, and look at the second argument and do special things if
> it is a small constant value.

Looking around, it's not the only one. A lot of the bitmap functions
do that, but it looks like we're missing a few __always_inline cases.

I wonder if we should have a macro to generate those "do X or Y
depending on small_const_nbits()" functions - and have it generate
__always_inline functions.

Of course, some of those functions have more complex "check at build
time" cases, like that bitmap_clear/set() thing that has a special
case for when it just turns into "memset()".

We have a lot of these kinds of situations where we have a "generic"
function that specializes itself based on arguments. And yes, they are
often recursive, so that you need more than one level of inlining to
actually determine what the arguments are.

I don't know if we might have some way to mark these (and detect the
cases where they don't get inlined and we lose the very basic
optimizations).

It's kind of similar to the _Generic() thing that does different
things based on static types, it's just that it does it based on
argument ranges.

              Linus
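One possible shape for such a generator macro - a hypothetical,
untested sketch; the macro name and argument list are made up for
illustration:

/* Sketch: emit the "small constant nbits vs. generic helper" dispatch
 * as an __always_inline function, so the specialization cannot be
 * lost to a cost-model decision. */
#define DEFINE_SMALL_CONST_NBITS_OP(name, small_expr, generic_fn)	\
static __always_inline int name(const unsigned long *src,		\
				unsigned int nbits)			\
{									\
	if (small_const_nbits(nbits))					\
		return (small_expr);					\
	return generic_fn(src, nbits);					\
}

/* Hypothetical usage:
 * DEFINE_SMALL_CONST_NBITS_OP(bitmap_weight,
 *	hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)),
 *	__bitmap_weight);
 */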
On Wed, Sep 29, 2021 at 05:18:49PM -0700, Nick Desaulniers wrote:
> On Wed, Sep 29, 2021 at 4:28 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Or, like in at least the early_get_smp_config() case, just make it be
> > marked __init, so that if it doesn't get inlined it gets the right
> > section?
>
> In the case of early_get_smp_config(), that's what Boris suggested:
> https://lore.kernel.org/lkml/20210225114533.GA380@zn.tnic/

Sorry, I misremembered the above thread. That's what *Arnd* had suggested.
On Thu, Sep 30, 2021 at 11:54 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Sep 29, 2021 at 5:19 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
[...]
> Looking at your comment lower down
>
>   https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807757878
>
> I really think this is a clang bug, and that you need to do certain
> simplifications both before _and_ after inlining.
>
> Before, because of the inlining cost decisions, particularly wrt
> constant arguments.
>
> After, because successful inlining changes things completely.

I made that comment because our LowerConstantIntrinsics pass was
simplifying our internal representation of __builtin_constant_p and
__builtin_object_size in the same pass, after inlining.

For the case of (defconfig) __nodes_weight not being inlined into
amd_numa_init, it's because hweight_long is a huge block of code all
predicated on __builtin_constant_p (BCP). Because we evaluate BCP
AFTER inlining - which is very much necessary for the semantics of BCP
- we don't eliminate that entire block before costing the callee.
Playing with the cost model though... we basically have the pattern:

void caller (void) { callee(42); }
void callee (int x) {
    if (__builtin_constant_p(x)) {
        // M "instructions"
    } else {
        // N "instructions"
    }
}

The current cost model says that the cost of inlining callee into
caller is ~ M + N + C. If we knew that BCP would fold away based on
inlining, and which way, we might be able to consider the cost just
M + C or N + C.

In the case of "(defconfig) __nodes_weight not being inlined into
amd_numa_init", M >> N, and yet after inlining BCP evaluates to false
(so the estimated cost was M + N but in actuality was closer to N).
(See how big __const_hweight64 is before BCP evaluation; in this case
callee is the analog of hweight64.)

I guess if the argument to BCP is a parameter of callee (brittle),
then we perhaps should be able to figure out whether BCP would
evaluate to true or false, then properly select M or N. Ok, let me see
if I can build that into the cost model... and see if that actually
helps any of these cases...

Though I do wonder what happens when there's more than one level
here... i.e.

void caller (void) { mid(42); }
void mid (int x) { callee(x); }
void callee (int x) { if (__builtin_constant_p(x)) /* M */; else /* N */; }

or

void caller0 (void) { mid(42); }  // BCP would be true
void caller1 (int x) { mid(x); }  // BCP would be false
void mid (int x) { callee(x); }   // looking just at the call site, BCP would be false
void callee (int x) { if (__builtin_constant_p(x)) /* M */; else /* N */; }

(Sorry for the brain dump; this is more for me than for you. Good chat!)
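For reference, the hweight pattern Nick describes is roughly this
(paraphrased from include/asm-generic/bitops/const_hweight.h and
include/linux/bitops.h); the constant branch is the analog of M, the
lone call the analog of N:

/* Paraphrased: the "M" branch is __const_hweight64()'s large
 * constant-folding expression tree; the "N" branch is a single call.
 * Neither side can be dropped until __builtin_constant_p() is folded,
 * which happens only after inlining. */
#define hweight64(w) \
	(__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))

static inline unsigned long hweight_long(unsigned long w)
{
	return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
}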
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index cb8ab7d91d30..c3d0395315ef 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -865,6 +865,29 @@ static int match(const char *sym, const char * const pat[])
 	return 0;
 }
 
+struct secref_exception {
+	const char * const fromsym, * const tosym;
+};
+
+static const struct secref_exception secref_allowlist[] = {
+	{ .fromsym = "__first_node", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "__next_node", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "__nodes_weight", .tosym = "numa_nodes_parsed" },
+	{ .fromsym = "early_get_smp_config", .tosym = "x86_init" },
+	{ .fromsym = "test_bit", .tosym = "numa_nodes_parsed" },
+};
+
+static int match_allowlist(const char *fromsym, const char *tosym)
+{
+	int i = 0, e = ARRAY_SIZE(secref_allowlist);
+
+	for (; i != e; ++i)
+		if (!strcmp(secref_allowlist[i].fromsym, fromsym) &&
+		    !strcmp(secref_allowlist[i].tosym, tosym))
+			return 1;
+	return 0;
+}
+
 /* sections that we do not want to do full section mismatch check on */
 static const char *const section_white_list[] =
 {
@@ -1204,6 +1227,8 @@ static const struct sectioncheck *section_mismatch(
  *   tosec    = init section
  *   fromsec  = text section
  *   refsymname = *.constprop.*
+ *   LLVM will do similar constant propagation, but it will not rename the
+ *   transformed callee.
  *
  * Pattern 6:
  *   Hide section mismatch warnings for ELF local symbols.  The goal
@@ -1216,7 +1241,8 @@ static const struct sectioncheck *section_mismatch(
  **/
 static int secref_whitelist(const struct sectioncheck *mismatch,
 			    const char *fromsec, const char *fromsym,
-			    const char *tosec, const char *tosym)
+			    const char *tosec, const char *tosym,
+			    _Bool isclang)
 {
 	/* Check for pattern 1 */
 	if (match(tosec, init_data_sections) &&
@@ -1247,9 +1273,10 @@ static int secref_whitelist(const struct sectioncheck *mismatch,
 
 	/* Check for pattern 5 */
 	if (match(fromsec, text_sections) &&
-	    match(tosec, init_sections) &&
-	    match(fromsym, optim_symbols))
-		return 0;
+	    match(tosec, init_sections))
+		if (match(fromsym, optim_symbols) ||
+		    (isclang && match_allowlist(fromsym, tosym)))
+			return 0;
 
 	/* Check for pattern 6 */
 	if (strstarts(fromsym, ".L"))
@@ -1573,6 +1600,21 @@ static void report_sec_mismatch(const char *modname,
 	fprintf(stderr, "\n");
 }
 
+static _Bool is_clang(struct elf_info *elf)
+{
+	Elf_Sym *sym;
+
+	for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
+		if (is_shndx_special(sym->st_shndx))
+			continue;
+		if (strcmp(sec_name(elf, get_secindex(elf, sym)), ".comment") != 0)
+			continue;
+		return strstr(sym_get_data(elf, sym), "clang") != NULL;
+	}
+
+	return false;
+}
+
 static void default_mismatch_handler(const char *modname, struct elf_info *elf,
 				     const struct sectioncheck* const mismatch,
 				     Elf_Rela *r, Elf_Sym *sym, const char *fromsec)
@@ -1582,6 +1624,7 @@ static void default_mismatch_handler(const char *modname, struct elf_info *elf,
 	Elf_Sym *from;
 	const char *tosym;
 	const char *fromsym;
+	_Bool isclang;
 
 	from = find_elf_symbol2(elf, r->r_offset, fromsec);
 	fromsym = sym_name(elf, from);
@@ -1592,10 +1635,11 @@ static void default_mismatch_handler(const char *modname, struct elf_info *elf,
 	tosec = sec_name(elf, get_secindex(elf, sym));
 	to = find_elf_symbol(elf, r->r_addend, sym);
 	tosym = sym_name(elf, to);
+	isclang = is_clang(elf);
 
 	/* check whitelist - we may ignore it */
-	if (secref_whitelist(mismatch,
-			     fromsec, fromsym, tosec, tosym)) {
+	if (secref_whitelist(mismatch, fromsec, fromsym, tosec, tosym,
+			     isclang)) {
 		report_sec_mismatch(modname, mismatch,
 				    fromsec, r->r_offset, fromsym,
 				    is_function(from), tosec, tosym,
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 0c47ff95c0e2..d8afc912fd92 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -214,3 +214,5 @@ void modpost_log(enum loglevel loglevel, const char *fmt, ...);
 #define warn(fmt, args...)	modpost_log(LOG_WARN, fmt, ##args)
 #define error(fmt, args...)	modpost_log(LOG_ERROR, fmt, ##args)
 #define fatal(fmt, args...)	modpost_log(LOG_FATAL, fmt, ##args)
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
Modpost validation of calls from .text to .init sections relies on
GCC renaming constant-propagation-specialized functions to have
.constprop.* suffixes. See the comment on Pattern 5 in
scripts/mod/modpost.c:

  GCC may optimize static inlines when fed constant arg(s) resulting
  in functions like cpumask_empty() -- generating an associated symbol
  cpumask_empty.constprop.3 that appears in the audit. If the const
  that is passed in comes from __init, like say nmi_ipi_mask, we get a
  meaningless section warning.

LLVM does similar optimizations (inter-procedural sparse conditional
constant propagation; IPSCCP), but doesn't rename the specialized
functions, so we still observe modpost warnings.

Add checks to modpost that test whether the .comment section contains
the string "clang" (i.e. was the object file built with clang?), and
if so additionally check an allow list.

Fixes the following modpost warnings observed on clang-13+:

allmodconfig:

WARNING: modpost: vmlinux.o(.text+0x*): Section mismatch in reference
from the function
  test_bit() to the variable .init.data:numa_nodes_parsed
  __first_node() to the variable .init.data:numa_nodes_parsed
  __next_node() to the variable .init.data:numa_nodes_parsed
  __nodes_weight() to the variable .init.data:numa_nodes_parsed
  early_get_smp_config() to the variable .init.data:x86_init

defconfig:

  __nodes_weight() to the variable .init.data:numa_nodes_parsed

Link: https://github.com/ClangBuiltLinux/linux/issues/1302
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 scripts/mod/modpost.c | 56 ++++++++++++++++++++++++++++++++++++++-----
 scripts/mod/modpost.h |  2 ++
 2 files changed, 52 insertions(+), 6 deletions(-)