diff mbox series

[RFC] kbuild: Prevent compiler mismatch with external modules

Message ID fff056a7c9e6050c2d60910f70b6d99602f3bec4.1611863075.git.jpoimboe@redhat.com (mailing list archive)
State New, archived
Headers show
Series [RFC] kbuild: Prevent compiler mismatch with external modules | expand

Commit Message

Josh Poimboeuf Jan. 28, 2021, 8:08 p.m. UTC
When building an external module, if the compiler version differs from
what the kernel was built with, bad things can happen.  Many kernel
features change based on available compiler features.  Silently removing
a compiler-dependent feature in the external module build can cause
unpredictable behavior.  Right now there are no checks to help prevent
such mismatches.

On the other hand, when a user is building an external module against a
distro kernel, the exact compiler version may not be installed, or in
some cases not even released yet.  In fact it's quite common for
external modules to be built with a slightly different version of GCC
than the kernel.

A minor version mismatch should be ok.  User space does it all the time.
New compiler features aren't added within a major version.

Add a check for compiler mismatch, but only check the major version.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
This is related to the previous RFC I posted:

  https://lkml.kernel.org/r/efe6b039a544da8215d5e54aa7c4b6d1986fc2b0.1611607264.git.jpoimboe@redhat.com

The discussion revealed gaps between developer perceptions and distro
realities with respect to external (out-of-tree) modules...

Backing up a bit, let's please decide on what exactly is supported (or
not supported) with respect to mismatched compiler versions.  Then let's
try to enforce and/or document the decision.

Please stick to technical arguments...


 Makefile | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Linus Torvalds Jan. 28, 2021, 8:24 p.m. UTC | #1
On Thu, Jan 28, 2021 at 12:08 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Add a check for compiler mismatch, but only check the major version.

I think this is wrong for multiple reasons.

The most fundamental reason is that it's pointless and doesn't
actually do what you claim it does.

Just doing a "make oldconfig" will reset the CONFIG_xyz_VERSION to
whatever is installed, and now your check doesn't actually do
anything, since you're not actually checking what the kernel was
compiled with!

So I think that check is pointless and entirely misleading. It doesn't
do what you want it to do, and what you claim it does.

I'm not convinced about the whole magic vs minor argument either. The
whole "new compiler features" thing is a red herring - even if you do
have new compiler features, that in itself has very little to do with
whether the resulting object files are compatible or not.

So I say NAK, on the basis that the patch is nonsensical, tests the
wrong thing, and doesn't really have a technical reason for it.

             Linus
Josh Poimboeuf Jan. 28, 2021, 8:52 p.m. UTC | #2
On Thu, Jan 28, 2021 at 12:24:45PM -0800, Linus Torvalds wrote:
> On Thu, Jan 28, 2021 at 12:08 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > Add a check for compiler mismatch, but only check the major version.
> 
> I think this is wrong for multiple reasons.
> 
> The most fundamental reason is that it's pointless and doesn't
> actually do what you claim it does.
> 
> Just doing a "make oldconfig" will reset the CONFIG_xyz_VERSION to
> whatever is installed, and now your check doesn't actually do
> anything, since you're not actually checking what the kernel was
> compiled with!

Huh?  Why would you do a "make oldconfig" on a distro-released kernel
before building an OOT module?

Usually you build an OOT module against /lib/modules/$(uname -r)/build,
and the .config isn't even writable.

> So I think that check is pointless and entirely misleading. It doesn't
> do what you want it to do, and what you claim it does.

It's not a magic bullet, but doesn't it catch the vast majority of use
cases?  Which makes it a lot better than what we have now (nothing).

> I'm not convinced about the whole magic vs minor argument either. The
> whole "new compiler features" thing is a red herring - even if you do
> have new compiler features, that in itself has very little to do with
> whether the resulting object files are compatible or not.

Hm?  Are you saying the check is too strict, since GCC9 binaries _might_
be compatible with GCC10?
Linus Torvalds Jan. 28, 2021, 9:03 p.m. UTC | #3
On Thu, Jan 28, 2021 at 12:52 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Huh?  Why would you do a "make oldconfig" on a distro-released kernel
> before building an OOT module?

I guarantee you that this patch will *make* people do that.

> Hm?  Are you saying the check is too strict, since GCC9 binaries _might_
> be compatible with GCC10?

I'm saying that your argument about minor and major versions is bogus.

There is absolutely nothing that makes gcc9 object files not
compatible with gcc10.

And this is not just some theoretical issue: this is a fundamental
fact of EVERY SINGLE LIBRARY OUT THERE.

Do you think that when you compile your binaries with gcc-10, you need
to link against a standard library that has been compiled with gcc-10?

I really think the whole compiler version check is purely voodoo programming.

                 Linus
Linus Torvalds Jan. 28, 2021, 9:23 p.m. UTC | #4
On Thu, Jan 28, 2021 at 1:03 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I really think the whole compiler version check is purely voodoo programming.

.. but there are obviously potentially things we - in the kernel - do
that may make certain compiler versions incompatible. We long long ago
used to have things like "you can't have an empty struct because gcc
version x.y.z doesn't support it", so even a UP spinlock would be

       typedef struct { int gcc_is_buggy; } raw_spinlock_t;

but only if you compiled it with a version of gcc older than 3.0. So
compiling one file with one compiler, and another with a newer one,
would result in the data structures simply not having the same layout.

That's not because of compiler versions per se, it's because of our
version checks.

THAT workaround is long gone, but I didn't check what other ones we
might have now. But the gcc version checks we _do_ have are not
necessarily about major versions at all (ie I trivially found checks
for 4.9, 4.9.2, 5.1, 7.2 and 9.1).

               Linus
Josh Poimboeuf Jan. 28, 2021, 9:34 p.m. UTC | #5
On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote:
> On Thu, Jan 28, 2021 at 1:03 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I really think the whole compiler version check is purely voodoo programming.
> 
> .. but there are obviously potentially things we - in the kernel - do
> that may make certain compiler versions incompatible. We long long ago
> used to have things like "you can't have an empty struct because gcc
> version x.y.z doesn't support it", so even a UP spinlock would be
> 
>        typedef struct { int gcc_is_buggy; } raw_spinlock_t;
> 
> but only if you compiled it with a version of gcc older than 3.0. So
> compiling one file with one compiler, and another with a newer one,
> would result in the data structures simply not having the same layout.
> 
> That's not because of compiler versions per se, it's because of our
> version checks.

Right, this is what I'm trying to say.  We have features based on
compiler version checks.  Peterz pointed out asm goto as a previous
example.

> THAT workaround is long gone, but I didn't check what other ones we
> might have now. But the gcc version checks we _do_ have are not
> necessarily about major versions at all (ie I trivially found checks
> for 4.9, 4.9.2, 5.1, 7.2 and 9.1).

Then maybe the check should be same major.minor?

And convert it to a strongly worded warning/disclaimer?
Linus Torvalds Jan. 28, 2021, 9:45 p.m. UTC | #6
On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote:
> > THAT workaround is long gone, but I didn't check what other ones we
> > might have now. But the gcc version checks we _do_ have are not
> > necessarily about major versions at all (ie I trivially found checks
> > for 4.9, 4.9.2, 5.1, 7.2 and 9.1).
>
> Then maybe the check should be same major.minor?

Well, how many of them are actually about things that generate
incompatible object code?

The main one I can think of is the KASAN ABI version checks, but
honestly, I think that's irrelevant. I really hope no distros enable
KASAN in user kernels.

Another version check I looked at was the one that just checks whether
the compiler natively supports __builtin_mul_overflow() or not - it
doesn't generate incompatible object code, it just takes advantage of
a compiler feature if one exists. You can mix and match those kinds of
things well enough.

So I'd really like to hear actual hard technical reasons with
examples, for why you'd want to add this test in the first place.

No hand-waving "different compiler versions don't work together".
Because that's simply not true.

> And convert it to a strongly worded warning/disclaimer?

A warning might be better for the simple reason that it wouldn't cause
people to just fix it with "make oldconfig".

Maybe you haven't looked at people who compile external modules, but
they always have various "this is how to work around issues with
version XYZ". That "make oldconfig" would simply just become the
workaround for any build errors.

And a warning might be more palatable even if different compiler
version work fine together. Just a heads up on "it looks like you
might be mixing compiler versions" is a valid note, and isn't
necessarily wrong. Even when they work well together, maybe you want
to have people at least _aware_ of it.

              Linus
Josh Poimboeuf Jan. 28, 2021, 10:08 p.m. UTC | #7
On Thu, Jan 28, 2021 at 01:45:51PM -0800, Linus Torvalds wrote:
> On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote:
> > > THAT workaround is long gone, but I didn't check what other ones we
> > > might have now. But the gcc version checks we _do_ have are not
> > > necessarily about major versions at all (ie I trivially found checks
> > > for 4.9, 4.9.2, 5.1, 7.2 and 9.1).
> >
> > Then maybe the check should be same major.minor?
> 
> Well, how many of them are actually about things that generate
> incompatible object code?
> 
> The main one I can think of is the KASAN ABI version checks, but
> honestly, I think that's irrelevant. I really hope no distros enable
> KASAN in user kernels.
> 
> Another version check I looked at was the one that just checks whether
> the compiler natively supports __builtin_mul_overflow() or not - it
> doesn't generate incompatible object code, it just takes advantage of
> a compiler feature if one exists. You can mix and match those kinds of
> things well enough.
> 
> So I'd really like to hear actual hard technical reasons with
> examples, for why you'd want to add this test in the first place.

Unfortunately I don't have technical reasons beyond what we've already
discussed, found from code inspection.

This patch was born from a discussion where wildly different opinions
were expressed about whether such a mismatch scenario (or even external
modules in general!) would be supported at all.

So I guess the goal is to clarify (in the code base) to what extent
compiler mismatches are supported/allowed/encouraged.  Because they
definitely happen in the real world, but a lot of people seem to be
sticking their head in the sand about it.

If we decide it's not a cut-and-dry makefile check, then the policy
should at least be documented.

I'd prefer the warning though, since nobody's going to read the docs.

> No hand-waving "different compiler versions don't work together".
> Because that's simply not true.
> 
> > And convert it to a strongly worded warning/disclaimer?
> 
> A warning might be better for the simple reason that it wouldn't cause
> people to just fix it with "make oldconfig".
> 
> Maybe you haven't looked at people who compile external modules, but
> they always have various "this is how to work around issues with
> version XYZ". That "make oldconfig" would simply just become the
> workaround for any build errors.
> 
> And a warning might be more palatable even if different compiler
> version work fine together. Just a heads up on "it looks like you
> might be mixing compiler versions" is a valid note, and isn't
> necessarily wrong. Even when they work well together, maybe you want
> to have people at least _aware_ of it.

Sounds reasonable.
Masahiro Yamada Jan. 28, 2021, 11:17 p.m. UTC | #8
On Fri, Jan 29, 2021 at 7:08 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Thu, Jan 28, 2021 at 01:45:51PM -0800, Linus Torvalds wrote:
> > On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote:
> > > > THAT workaround is long gone, but I didn't check what other ones we
> > > > might have now. But the gcc version checks we _do_ have are not
> > > > necessarily about major versions at all (ie I trivially found checks
> > > > for 4.9, 4.9.2, 5.1, 7.2 and 9.1).
> > >
> > > Then maybe the check should be same major.minor?
> >
> > Well, how many of them are actually about things that generate
> > incompatible object code?
> >
> > The main one I can think of is the KASAN ABI version checks, but
> > honestly, I think that's irrelevant. I really hope no distros enable
> > KASAN in user kernels.
> >
> > Another version check I looked at was the one that just checks whether
> > the compiler natively supports __builtin_mul_overflow() or not - it
> > doesn't generate incompatible object code, it just takes advantage of
> > a compiler feature if one exists. You can mix and match those kinds of
> > things well enough.
> >
> > So I'd really like to hear actual hard technical reasons with
> > examples, for why you'd want to add this test in the first place.
>
> Unfortunately I don't have technical reasons beyond what we've already
> discussed, found from code inspection.
>
> This patch was born from a discussion where wildly different opinions
> were expressed about whether such a mismatch scenario (or even external
> modules in general!) would be supported at all.
>
> So I guess the goal is to clarify (in the code base) to what extent
> compiler mismatches are supported/allowed/encouraged.  Because they
> definitely happen in the real world, but a lot of people seem to be
> sticking their head in the sand about it.
>
> If we decide it's not a cut-and-dry makefile check, then the policy
> should at least be documented.
>
> I'd prefer the warning though, since nobody's going to read the docs.
>
> > No hand-waving "different compiler versions don't work together".
> > Because that's simply not true.
> >
> > > And convert it to a strongly worded warning/disclaimer?
> >
> > A warning might be better for the simple reason that it wouldn't cause
> > people to just fix it with "make oldconfig".
> >
> > Maybe you haven't looked at people who compile external modules, but
> > they always have various "this is how to work around issues with
> > version XYZ". That "make oldconfig" would simply just become the
> > workaround for any build errors.
> >
> > And a warning might be more palatable even if different compiler
> > version work fine together. Just a heads up on "it looks like you
> > might be mixing compiler versions" is a valid note, and isn't
> > necessarily wrong. Even when they work well together, maybe you want
> > to have people at least _aware_ of it.
>
> Sounds reasonable.
>
> --
> Josh
>

[1]

First, let me explain how Kbuild works w.r.t the compiler version
check.

When working on the kernel tree, Kbuild automatically detects
the compiler upgrade (this is done by comparing the output
of '$(CC) --version'), and invokes Kconfig to sync the configuration.
So, the .config is updated even if you do not explicitly
do "make oldconfig".


When working on external modules, in contrast,
Kbuild does not attempt to update anything in the kernel tree.
This makes sense since the build tree, /lib/modules/$(uname -r)/build/
is read-only.
You cannot manually run Kconfig either because the config targets
are hidden for external modules.

$ make M=../qemu-build/extmod  oldconfig
make: *** No rule to make target 'oldconfig'.  Stop.



[2]

As for this patch, it is wrong to do this check in the Makefile
parse stage.

"make M=...  clean"
"make M=...  help"

etc. will fail.
Such targets do not require the compiler in the first place.

This check must be done before starting building something,


Also, this patch is not applicable.
gcc-version.sh and clang-version.sh do not exist.
See linux-next.



[3]
Peterz already pointed out asm-goto as an example of ABI mismatch.

I remember a trouble reported in the past due
to the mismatch of -mstack-protector-guard-offset.

https://bugzilla.kernel.org/show_bug.cgi?id=201891

This has already been fixed,
and it will no longer happen though.





--
Best Regards




Masahiro Yamada
Josh Poimboeuf Feb. 1, 2021, 9:13 p.m. UTC | #9
On Fri, Jan 29, 2021 at 08:17:51AM +0900, Masahiro Yamada wrote:
> [3]
> Peterz already pointed out asm-goto as an example of ABI mismatch.
> 
> I remember a trouble reported in the past due
> to the mismatch of -mstack-protector-guard-offset.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=201891
> 
> This has already been fixed,
> and it will no longer happen though.

This is kind of concerning though.  It would be nice to somehow store
KCLAGS in the config and warn if it changes unexpectedly.

This can be a problem not only for OOT modules, but for regular kernel
builds which have a .config copied from somewhere.

Because of the toolchain-dependent kconfig options, features can
silently disappear if the toolchain doesn't support them, due to a
different compiler version, or even a missing library.

> [2]
> 
> As for this patch, it is wrong to do this check in the Makefile
> parse stage.
> 
> "make M=...  clean"
> "make M=...  help"
> 
> etc. will fail.
> Such targets do not require the compiler in the first place.
> 
> This check must be done before starting building something,
>
> Also, this patch is not applicable.
> gcc-version.sh and clang-version.sh do not exist.
> See linux-next.

Something like so?

diff --git a/Makefile b/Makefile
index 95ab9856f357..10ca621369fb 100644
--- a/Makefile
+++ b/Makefile
@@ -1721,12 +1721,25 @@ KBUILD_MODULES := 1
 
 build-dirs := $(KBUILD_EXTMOD)
 PHONY += modules
-modules: $(MODORDER)
+modules: ext_compiler_check $(MODORDER)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
 $(MODORDER): descend
 	@:
 
+orig_name   := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG)
+orig_minor  := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100)
+cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC))
+cur_name    := $(word 1,$(cur_namever))
+cur_minor   := $(shell expr $(word 2,$(cur_namever)) / 100)
+PHONY += ext_compiler_check
+ext_compiler_check:
+	@if [ $(orig_name) != $(cur_name) ] || [ $(orig_minor) != $(cur_minor) ]; then \
+		echo >&2 "warning: The compiler differs from the version which was used to build the kernel."; \
+		echo >&2 "warning: Some kernel features are compiler-dependent."; \
+		echo >&2 "warning: It's recommended that you change your compiler to match the version in the .config file."; \
+	fi
+
 PHONY += modules_install
 modules_install: _emodinst_ _emodinst_post
Masahiro Yamada March 5, 2021, 4:28 p.m. UTC | #10
On Tue, Feb 2, 2021 at 6:13 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Fri, Jan 29, 2021 at 08:17:51AM +0900, Masahiro Yamada wrote:
> > [3]
> > Peterz already pointed out asm-goto as an example of ABI mismatch.
> >
> > I remember a trouble reported in the past due
> > to the mismatch of -mstack-protector-guard-offset.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201891
> >
> > This has already been fixed,
> > and it will no longer happen though.
>
> This is kind of concerning though.  It would be nice to somehow store
> KCLAGS in the config and warn if it changes unexpectedly.
>
> This can be a problem not only for OOT modules, but for regular kernel
> builds which have a .config copied from somewhere.
>
> Because of the toolchain-dependent kconfig options, features can
> silently disappear if the toolchain doesn't support them, due to a
> different compiler version, or even a missing library.
>
> > [2]
> >
> > As for this patch, it is wrong to do this check in the Makefile
> > parse stage.
> >
> > "make M=...  clean"
> > "make M=...  help"
> >
> > etc. will fail.
> > Such targets do not require the compiler in the first place.
> >
> > This check must be done before starting building something,
> >
> > Also, this patch is not applicable.
> > gcc-version.sh and clang-version.sh do not exist.
> > See linux-next.
>
> Something like so?

No, I do not think so.


>
> diff --git a/Makefile b/Makefile
> index 95ab9856f357..10ca621369fb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1721,12 +1721,25 @@ KBUILD_MODULES := 1
>
>  build-dirs := $(KBUILD_EXTMOD)
>  PHONY += modules
> -modules: $(MODORDER)
> +modules: ext_compiler_check $(MODORDER)


For the single thread build, yes
GNU Make will run the targets from left to right,
so ext_compiler_check is run before compiling
any build artifact.

If -j <N> option is given, there is no guarantee
that ext_compiler_check finishes first.



>         $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
>
>  $(MODORDER): descend
>         @:
>
> +orig_name   := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG)
> +orig_minor  := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100)
> +cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC))
> +cur_name    := $(word 1,$(cur_namever))
> +cur_minor   := $(shell expr $(word 2,$(cur_namever)) / 100)

These are still calculated by 'make M=... clean' or 'make M=... help'.
Using '=' assignment solves it, but the code is still ugly.


I attached my alternative implementation.


> +PHONY += ext_compiler_check
> +ext_compiler_check:
> +       @if [ $(orig_name) != $(cur_name) ] || [ $(orig_minor) != $(cur_minor) ]; then \
> +               echo >&2 "warning: The compiler differs from the version which was used to build the kernel."; \
> +               echo >&2 "warning: Some kernel features are compiler-dependent."; \
> +               echo >&2 "warning: It's recommended that you change your compiler to match the version in the .config file."; \
> +       fi
> +
>  PHONY += modules_install
>  modules_install: _emodinst_ _emodinst_post
>
>

--
Best Regards
Masahiro Yamada
Josh Poimboeuf March 5, 2021, 7:24 p.m. UTC | #11
On Sat, Mar 06, 2021 at 01:28:22AM +0900, Masahiro Yamada wrote:
> > +orig_name   := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG)
> > +orig_minor  := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100)
> > +cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC))
> > +cur_name    := $(word 1,$(cur_namever))
> > +cur_minor   := $(shell expr $(word 2,$(cur_namever)) / 100)
> 
> These are still calculated by 'make M=... clean' or 'make M=... help'.
> Using '=' assignment solves it, but the code is still ugly.
> 
> 
> I attached my alternative implementation.

Thanks for the attached patch, yours looks much cleaner.  Looks like it
warns on *any* mismatch, rather than just a major.minor mismatch.  But
that's ok with me.

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index b0e4767735dc..f281d2587fa5 100644
--- a/Makefile
+++ b/Makefile
@@ -1744,6 +1744,14 @@  help:
 # no-op for external module builds
 PHONY += prepare modules_prepare
 
+# External module compiler (major) version must match the kernel
+ifneq ($(shell echo $(CONFIG_GCC_VERSION) | cut -c-2), $(shell $(srctree)/scripts/gcc-version.sh $(CC) | cut -c-2))
+  $(error ERROR: Compiler version mismatch in external module build)
+endif
+ifneq ($(shell echo $(CONFIG_CLANG_VERSION) | cut -c-2), $(shell $(srctree)/scripts/clang-version.sh $(CC) | cut -c-2))
+  $(error ERROR: Compiler version mismatch in external module build)
+endif
+
 endif # KBUILD_EXTMOD
 
 # Single targets