Message ID | fff056a7c9e6050c2d60910f70b6d99602f3bec4.1611863075.git.jpoimboe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] kbuild: Prevent compiler mismatch with external modules | expand |
On Thu, Jan 28, 2021 at 12:08 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > Add a check for compiler mismatch, but only check the major version. I think this is wrong for multiple reasons. The most fundamental reason is that it's pointless and doesn't actually do what you claim it does. Just doing a "make oldconfig" will reset the CONFIG_xyz_VERSION to whatever is installed, and now your check doesn't actually do anything, since you're not actually checking what the kernel was compiled with! So I think that check is pointless and entirely misleading. It doesn't do what you want it to do, and what you claim it does. I'm not convinced about the whole magic vs minor argument either. The whole "new compiler features" thing is a red herring - even if you do have new compiler features, that in itself has very little to do with whether the resulting object files are compatible or not. So I say NAK, on the basis that the patch is nonsensical, tests the wrong thing, and doesn't really have a technical reason for it. Linus
On Thu, Jan 28, 2021 at 12:24:45PM -0800, Linus Torvalds wrote: > On Thu, Jan 28, 2021 at 12:08 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > Add a check for compiler mismatch, but only check the major version. > > I think this is wrong for multiple reasons. > > The most fundamental reason is that it's pointless and doesn't > actually do what you claim it does. > > Just doing a "make oldconfig" will reset the CONFIG_xyz_VERSION to > whatever is installed, and now your check doesn't actually do > anything, since you're not actually checking what the kernel was > compiled with! Huh? Why would you do a "make oldconfig" on a distro-released kernel before building an OOT module? Usually you build an OOT module against /lib/modules/$(uname -r)/build, and the .config isn't even writable. > So I think that check is pointless and entirely misleading. It doesn't > do what you want it to do, and what you claim it does. It's not a magic bullet, but doesn't it catch the vast majority of use cases? Which makes it a lot better than what we have now (nothing). > I'm not convinced about the whole magic vs minor argument either. The > whole "new compiler features" thing is a red herring - even if you do > have new compiler features, that in itself has very little to do with > whether the resulting object files are compatible or not. Hm? Are you saying the check is too strict, since GCC9 binaries _might_ be compatible with GCC10?
On Thu, Jan 28, 2021 at 12:52 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > Huh? Why would you do a "make oldconfig" on a distro-released kernel > before building an OOT module? I guarantee you that this patch will *make* people do that. > Hm? Are you saying the check is too strict, since GCC9 binaries _might_ > be compatible with GCC10? I'm saying that your argument about minor and major versions is bogus. There is absolutely nothing that makes gcc9 object files not compatible with gcc10. And this is not just some theoretical issue: this is a fundamental fact of EVERY SINGLE LIBRARY OUT THERE. Do you think that when you compile your binaries with gcc-10, you need to link against a standard library that has been compiled with gcc-10? I really think the whole compiler version check is purely voodoo programming. Linus
On Thu, Jan 28, 2021 at 1:03 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I really think the whole compiler version check is purely voodoo programming. .. but there are obviously potentially things we - in the kernel - do that may make certain compiler versions incompatible. We long long ago used to have things like "you can't have an empty struct because gcc version x.y.z doesn't support it", so even a UP spinlock would be typedef struct { int gcc_is_buggy; } raw_spinlock_t; but only if you compiled it with a version of gcc older than 3.0. So compiling one file with one compiler, and another with a newer one, would result in the data structures simply not having the same layout. That's not because of compiler versions per se, it's because of our version checks. THAT workaround is long gone, but I didn't check what other ones we might have now. But the gcc version checks we _do_ have are not necessarily about major versions at all (ie I trivially found checks for 4.9, 4.9.2, 5.1, 7.2 and 9.1). Linus
On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote: > On Thu, Jan 28, 2021 at 1:03 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > I really think the whole compiler version check is purely voodoo programming. > > .. but there are obviously potentially things we - in the kernel - do > that may make certain compiler versions incompatible. We long long ago > used to have things like "you can't have an empty struct because gcc > version x.y.z doesn't support it", so even a UP spinlock would be > > typedef struct { int gcc_is_buggy; } raw_spinlock_t; > > but only if you compiled it with a version of gcc older than 3.0. So > compiling one file with one compiler, and another with a newer one, > would result in the data structures simply not having the same layout. > > That's not because of compiler versions per se, it's because of our > version checks. Right, this is what I'm trying to say. We have features based on compiler version checks. Peterz pointed out asm goto as a previous example. > THAT workaround is long gone, but I didn't check what other ones we > might have now. But the gcc version checks we _do_ have are not > necessarily about major versions at all (ie I trivially found checks > for 4.9, 4.9.2, 5.1, 7.2 and 9.1). Then maybe the check should be same major.minor? And convert it to a strongly worded warning/disclaimer?
On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote: > > THAT workaround is long gone, but I didn't check what other ones we > > might have now. But the gcc version checks we _do_ have are not > > necessarily about major versions at all (ie I trivially found checks > > for 4.9, 4.9.2, 5.1, 7.2 and 9.1). > > Then maybe the check should be same major.minor? Well, how many of them are actually about things that generate incompatible object code? The main one I can think of is the KASAN ABI version checks, but honestly, I think that's irrelevant. I really hope no distros enable KASAN in user kernels. Another version check I looked at was the one that just checks whether the compiler natively supports __builtin_mul_overflow() or not - it doesn't generate incompatible object code, it just takes advantage of a compiler feature if one exists. You can mix and match those kinds of things well enough. So I'd really like to hear actual hard technical reasons with examples, for why you'd want to add this test in the first place. No hand-waving "different compiler versions don't work together". Because that's simply not true. > And convert it to a strongly worded warning/disclaimer? A warning might be better for the simple reason that it wouldn't cause people to just fix it with "make oldconfig". Maybe you haven't looked at people who compile external modules, but they always have various "this is how to work around issues with version XYZ". That "make oldconfig" would simply just become the workaround for any build errors. And a warning might be more palatable even if different compiler version work fine together. Just a heads up on "it looks like you might be mixing compiler versions" is a valid note, and isn't necessarily wrong. Even when they work well together, maybe you want to have people at least _aware_ of it. Linus
On Thu, Jan 28, 2021 at 01:45:51PM -0800, Linus Torvalds wrote: > On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote: > > > THAT workaround is long gone, but I didn't check what other ones we > > > might have now. But the gcc version checks we _do_ have are not > > > necessarily about major versions at all (ie I trivially found checks > > > for 4.9, 4.9.2, 5.1, 7.2 and 9.1). > > > > Then maybe the check should be same major.minor? > > Well, how many of them are actually about things that generate > incompatible object code? > > The main one I can think of is the KASAN ABI version checks, but > honestly, I think that's irrelevant. I really hope no distros enable > KASAN in user kernels. > > Another version check I looked at was the one that just checks whether > the compiler natively supports __builtin_mul_overflow() or not - it > doesn't generate incompatible object code, it just takes advantage of > a compiler feature if one exists. You can mix and match those kinds of > things well enough. > > So I'd really like to hear actual hard technical reasons with > examples, for why you'd want to add this test in the first place. Unfortunately I don't have technical reasons beyond what we've already discussed, found from code inspection. This patch was born from a discussion where wildly different opinions were expressed about whether such a mismatch scenario (or even external modules in general!) would be supported at all. So I guess the goal is to clarify (in the code base) to what extent compiler mismatches are supported/allowed/encouraged. Because they definitely happen in the real world, but a lot of people seem to be sticking their head in the sand about it. If we decide it's not a cut-and-dry makefile check, then the policy should at least be documented. I'd prefer the warning though, since nobody's going to read the docs. > No hand-waving "different compiler versions don't work together". > Because that's simply not true. > > > And convert it to a strongly worded warning/disclaimer? > > A warning might be better for the simple reason that it wouldn't cause > people to just fix it with "make oldconfig". > > Maybe you haven't looked at people who compile external modules, but > they always have various "this is how to work around issues with > version XYZ". That "make oldconfig" would simply just become the > workaround for any build errors. > > And a warning might be more palatable even if different compiler > version work fine together. Just a heads up on "it looks like you > might be mixing compiler versions" is a valid note, and isn't > necessarily wrong. Even when they work well together, maybe you want > to have people at least _aware_ of it. Sounds reasonable.
On Fri, Jan 29, 2021 at 7:08 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Thu, Jan 28, 2021 at 01:45:51PM -0800, Linus Torvalds wrote: > > On Thu, Jan 28, 2021 at 1:34 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > On Thu, Jan 28, 2021 at 01:23:11PM -0800, Linus Torvalds wrote: > > > > THAT workaround is long gone, but I didn't check what other ones we > > > > might have now. But the gcc version checks we _do_ have are not > > > > necessarily about major versions at all (ie I trivially found checks > > > > for 4.9, 4.9.2, 5.1, 7.2 and 9.1). > > > > > > Then maybe the check should be same major.minor? > > > > Well, how many of them are actually about things that generate > > incompatible object code? > > > > The main one I can think of is the KASAN ABI version checks, but > > honestly, I think that's irrelevant. I really hope no distros enable > > KASAN in user kernels. > > > > Another version check I looked at was the one that just checks whether > > the compiler natively supports __builtin_mul_overflow() or not - it > > doesn't generate incompatible object code, it just takes advantage of > > a compiler feature if one exists. You can mix and match those kinds of > > things well enough. > > > > So I'd really like to hear actual hard technical reasons with > > examples, for why you'd want to add this test in the first place. > > Unfortunately I don't have technical reasons beyond what we've already > discussed, found from code inspection. > > This patch was born from a discussion where wildly different opinions > were expressed about whether such a mismatch scenario (or even external > modules in general!) would be supported at all. > > So I guess the goal is to clarify (in the code base) to what extent > compiler mismatches are supported/allowed/encouraged. Because they > definitely happen in the real world, but a lot of people seem to be > sticking their head in the sand about it. > > If we decide it's not a cut-and-dry makefile check, then the policy > should at least be documented. > > I'd prefer the warning though, since nobody's going to read the docs. > > > No hand-waving "different compiler versions don't work together". > > Because that's simply not true. > > > > > And convert it to a strongly worded warning/disclaimer? > > > > A warning might be better for the simple reason that it wouldn't cause > > people to just fix it with "make oldconfig". > > > > Maybe you haven't looked at people who compile external modules, but > > they always have various "this is how to work around issues with > > version XYZ". That "make oldconfig" would simply just become the > > workaround for any build errors. > > > > And a warning might be more palatable even if different compiler > > version work fine together. Just a heads up on "it looks like you > > might be mixing compiler versions" is a valid note, and isn't > > necessarily wrong. Even when they work well together, maybe you want > > to have people at least _aware_ of it. > > Sounds reasonable. > > -- > Josh > [1] First, let me explain how Kbuild works w.r.t the compiler version check. When working on the kernel tree, Kbuild automatically detects the compiler upgrade (this is done by comparing the output of '$(CC) --version'), and invokes Kconfig to sync the configuration. So, the .config is updated even if you do not explicitly do "make oldconfig". When working on external modules, in contrast, Kbuild does not attempt to update anything in the kernel tree. This makes sense since the build tree, /lib/modules/$(uname -r)/build/ is read-only. You cannot manually run Kconfig either because the config targets are hidden for external modules. $ make M=../qemu-build/extmod oldconfig make: *** No rule to make target 'oldconfig'. Stop. [2] As for this patch, it is wrong to do this check in the Makefile parse stage. "make M=... clean" "make M=... help" etc. will fail. Such targets do not require the compiler in the first place. This check must be done before starting building something, Also, this patch is not applicable. gcc-version.sh and clang-version.sh do not exist. See linux-next. [3] Peterz already pointed out asm-goto as an example of ABI mismatch. I remember a trouble reported in the past due to the mismatch of -mstack-protector-guard-offset. https://bugzilla.kernel.org/show_bug.cgi?id=201891 This has already been fixed, and it will no longer happen though. -- Best Regards Masahiro Yamada
On Fri, Jan 29, 2021 at 08:17:51AM +0900, Masahiro Yamada wrote: > [3] > Peterz already pointed out asm-goto as an example of ABI mismatch. > > I remember a trouble reported in the past due > to the mismatch of -mstack-protector-guard-offset. > > https://bugzilla.kernel.org/show_bug.cgi?id=201891 > > This has already been fixed, > and it will no longer happen though. This is kind of concerning though. It would be nice to somehow store KCLAGS in the config and warn if it changes unexpectedly. This can be a problem not only for OOT modules, but for regular kernel builds which have a .config copied from somewhere. Because of the toolchain-dependent kconfig options, features can silently disappear if the toolchain doesn't support them, due to a different compiler version, or even a missing library. > [2] > > As for this patch, it is wrong to do this check in the Makefile > parse stage. > > "make M=... clean" > "make M=... help" > > etc. will fail. > Such targets do not require the compiler in the first place. > > This check must be done before starting building something, > > Also, this patch is not applicable. > gcc-version.sh and clang-version.sh do not exist. > See linux-next. Something like so? diff --git a/Makefile b/Makefile index 95ab9856f357..10ca621369fb 100644 --- a/Makefile +++ b/Makefile @@ -1721,12 +1721,25 @@ KBUILD_MODULES := 1 build-dirs := $(KBUILD_EXTMOD) PHONY += modules -modules: $(MODORDER) +modules: ext_compiler_check $(MODORDER) $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost $(MODORDER): descend @: +orig_name := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG) +orig_minor := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100) +cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC)) +cur_name := $(word 1,$(cur_namever)) +cur_minor := $(shell expr $(word 2,$(cur_namever)) / 100) +PHONY += ext_compiler_check +ext_compiler_check: + @if [ $(orig_name) != $(cur_name) ] || [ $(orig_minor) != $(cur_minor) ]; then \ + echo >&2 "warning: The compiler differs from the version which was used to build the kernel."; \ + echo >&2 "warning: Some kernel features are compiler-dependent."; \ + echo >&2 "warning: It's recommended that you change your compiler to match the version in the .config file."; \ + fi + PHONY += modules_install modules_install: _emodinst_ _emodinst_post
On Tue, Feb 2, 2021 at 6:13 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Fri, Jan 29, 2021 at 08:17:51AM +0900, Masahiro Yamada wrote: > > [3] > > Peterz already pointed out asm-goto as an example of ABI mismatch. > > > > I remember a trouble reported in the past due > > to the mismatch of -mstack-protector-guard-offset. > > > > https://bugzilla.kernel.org/show_bug.cgi?id=201891 > > > > This has already been fixed, > > and it will no longer happen though. > > This is kind of concerning though. It would be nice to somehow store > KCLAGS in the config and warn if it changes unexpectedly. > > This can be a problem not only for OOT modules, but for regular kernel > builds which have a .config copied from somewhere. > > Because of the toolchain-dependent kconfig options, features can > silently disappear if the toolchain doesn't support them, due to a > different compiler version, or even a missing library. > > > [2] > > > > As for this patch, it is wrong to do this check in the Makefile > > parse stage. > > > > "make M=... clean" > > "make M=... help" > > > > etc. will fail. > > Such targets do not require the compiler in the first place. > > > > This check must be done before starting building something, > > > > Also, this patch is not applicable. > > gcc-version.sh and clang-version.sh do not exist. > > See linux-next. > > Something like so? No, I do not think so. > > diff --git a/Makefile b/Makefile > index 95ab9856f357..10ca621369fb 100644 > --- a/Makefile > +++ b/Makefile > @@ -1721,12 +1721,25 @@ KBUILD_MODULES := 1 > > build-dirs := $(KBUILD_EXTMOD) > PHONY += modules > -modules: $(MODORDER) > +modules: ext_compiler_check $(MODORDER) For the single thread build, yes GNU Make will run the targets from left to right, so ext_compiler_check is run before compiling any build artifact. If -j <N> option is given, there is no guarantee that ext_compiler_check finishes first. > $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost > > $(MODORDER): descend > @: > > +orig_name := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG) > +orig_minor := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100) > +cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC)) > +cur_name := $(word 1,$(cur_namever)) > +cur_minor := $(shell expr $(word 2,$(cur_namever)) / 100) These are still calculated by 'make M=... clean' or 'make M=... help'. Using '=' assignment solves it, but the code is still ugly. I attached my alternative implementation. > +PHONY += ext_compiler_check > +ext_compiler_check: > + @if [ $(orig_name) != $(cur_name) ] || [ $(orig_minor) != $(cur_minor) ]; then \ > + echo >&2 "warning: The compiler differs from the version which was used to build the kernel."; \ > + echo >&2 "warning: Some kernel features are compiler-dependent."; \ > + echo >&2 "warning: It's recommended that you change your compiler to match the version in the .config file."; \ > + fi > + > PHONY += modules_install > modules_install: _emodinst_ _emodinst_post > > -- Best Regards Masahiro Yamada
On Sat, Mar 06, 2021 at 01:28:22AM +0900, Masahiro Yamada wrote: > > +orig_name := $(if $(CONFIG_CC_IS_GCC),GCC,CLANG) > > +orig_minor := $(shell expr $(if $(CONFIG_CC_IS_GCC),$(CONFIG_GCC_VERSION),$(CONFIG_CLANG_VERSION)) / 100) > > +cur_namever := $(shell $(srctree)/scripts/cc-version.sh $(CC)) > > +cur_name := $(word 1,$(cur_namever)) > > +cur_minor := $(shell expr $(word 2,$(cur_namever)) / 100) > > These are still calculated by 'make M=... clean' or 'make M=... help'. > Using '=' assignment solves it, but the code is still ugly. > > > I attached my alternative implementation. Thanks for the attached patch, yours looks much cleaner. Looks like it warns on *any* mismatch, rather than just a major.minor mismatch. But that's ok with me. Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
diff --git a/Makefile b/Makefile index b0e4767735dc..f281d2587fa5 100644 --- a/Makefile +++ b/Makefile @@ -1744,6 +1744,14 @@ help: # no-op for external module builds PHONY += prepare modules_prepare +# External module compiler (major) version must match the kernel +ifneq ($(shell echo $(CONFIG_GCC_VERSION) | cut -c-2), $(shell $(srctree)/scripts/gcc-version.sh $(CC) | cut -c-2)) + $(error ERROR: Compiler version mismatch in external module build) +endif +ifneq ($(shell echo $(CONFIG_CLANG_VERSION) | cut -c-2), $(shell $(srctree)/scripts/clang-version.sh $(CC) | cut -c-2)) + $(error ERROR: Compiler version mismatch in external module build) +endif + endif # KBUILD_EXTMOD # Single targets
When building an external module, if the compiler version differs from what the kernel was built with, bad things can happen. Many kernel features change based on available compiler features. Silently removing a compiler-dependent feature in the external module build can cause unpredictable behavior. Right now there are no checks to help prevent such mismatches. On the other hand, when a user is building an external module against a distro kernel, the exact compiler version may not be installed, or in some cases not even released yet. In fact it's quite common for external modules to be built with a slightly different version of GCC than the kernel. A minor version mismatch should be ok. User space does it all the time. New compiler features aren't added within a major version. Add a check for compiler mismatch, but only check the major version. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> --- This is related to the previous RFC I posted: https://lkml.kernel.org/r/efe6b039a544da8215d5e54aa7c4b6d1986fc2b0.1611607264.git.jpoimboe@redhat.com The discussion revealed gaps between developer perceptions and distro realities with respect to external (out-of-tree) modules... Backing up a bit, let's please decide on what exactly is supported (or not supported) with respect to mismatched compiler versions. Then let's try to enforce and/or document the decision. Please stick to technical arguments... Makefile | 8 ++++++++ 1 file changed, 8 insertions(+)