Message ID | 20210907183843.33028-1-ndesaulniers@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Revert "Enable '-Werror' by default for all kernel builds" | expand |
On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > This reverts commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151. > > The above commit seems as though it was merged in response to > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/. > > While I can appreciate the intent of enabling -Werror, I don't think it > is the right tool to address the root cause of developers not testing > certain toolchains or configurations, or taking existing reports they're > getting serious enough. > > Having more appropriate CI or processes in place to prevent untested > patches from being merged into mainline may also be worth discussing. I agree that -Werror by default needs more discussion. Default WERROR makes building old kernels with new compilers more painful. CI systems could do a better job surfacing compiler warnings if they don't do it currently. > I'd also like to see such a patch sent formally to the list for > discussion and have time to soak in next rather than be merged directly > into mainline without either. > > -Werror is great for preventing new errors from creeping in when a > codebase is free of warnings for all configs and all targets and the > toolchain is never updated. Unfortunately, none of the above is the case > for the Linux kernel at this time. > > The addition of new compiler diagnostic flags in the -W group to -Wall > make toolchain updates excessively more painful. This can lead to > commits that disable warnings rather than work towards addressing them. > Some diagnostics are useful but take incredible work or churn to > completely free a codebase from them. > > Warning can be upgraded to errors with -Werror=foo or downgraded from > errors back to warnings via -Wno-error=foo. -Wno-error=foo is a double > edged sword; it doesn't help you spot the introduction of additional > instances of that warning easily. > > This change has caused nearly all of our CI to go red, and requires us > to now disable CONFIG_WERROR until every last target and every last > config is addressed. Rather than require everyone to disable the above > config to keep builds going, perhaps certain CI systems should instead > set CFLAGS_KERNEL=-Werror. > > Why don't we just fix every warning? We have been, for years, and we're > still not done yet. See our issue tracker below, contributors wanted. > > With more time/active discussion, we can probably land something more > appropriate. It should involve the Kbuild maintainer and list. > > For instance, I have questions around how should such a config interact > with randconfigs and allconfigs. This config also seems to duplicate the > existing CONFIG_PPC_DISABLE_WERROR without merging the two. > > I do recognize the irony of someone who's spent a lot of time cleaning > up warnings to be advocating for disabling -Werror...it's not lost on > me. Our Pixel (née Nexus) team has been effectively carrying an out of > tree patch enabling -Werror since before I ever contributed to the > kernel. > > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Guenter Roeck <linux@roeck-us.net> > Cc: Kees Cook <keescook@chromium.org> > Cc: Mark Brown <broonie@kernel.org> > Cc: Masahiro Yamada <masahiroy@kernel.org> > Cc: Nathan Chancellor <nathan@kernel.org> > Link: https://github.com/ClangBuiltLinux/linux/issues/1449 > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Makefile | 3 --- > init/Kconfig | 14 -------------- > 2 files changed, 17 deletions(-) > > diff --git a/Makefile b/Makefile > index d45fc2edf186..6bc1c5b17a62 100644 > --- a/Makefile > +++ b/Makefile > @@ -785,9 +785,6 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong > > KBUILD_CFLAGS += $(stackp-flags-y) > > -KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror > -KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) > - > ifdef CONFIG_CC_IS_CLANG > KBUILD_CPPFLAGS += -Qunused-arguments > # The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable. > diff --git a/init/Kconfig b/init/Kconfig > index 8cb97f141b70..e708180e9a59 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -137,20 +137,6 @@ config COMPILE_TEST > here. If you are a user/distributor, say N here to exclude useless > drivers to be distributed. > > -config WERROR > - bool "Compile the kernel with warnings as errors" > - default y > - help > - A kernel build should not cause any compiler warnings, and this > - enables the '-Werror' flag to enforce that rule by default. > - > - However, if you have a new (or very old) compiler with odd and > - unusual warnings, or you have some architecture with problems, > - you may need to disable this config option in order to > - successfully build the kernel. > - > - If in doubt, say Y. > - > config UAPI_HEADER_TEST > bool "Compile test UAPI headers" > depends on HEADERS_INSTALL && CC_CAN_LINK > > base-commit: 4b93c544e90e2b28326182d31ee008eb80e02074 > -- > 2.33.0.153.gba50c8fa24-goog >
On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > The above commit seems as though it was merged in response to > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/. No. It was merged in response of _years_ of pain, with the last one just being the final drop. I'm not going to revert that change. I probably will have to limit it (by making that WERROR option depend on certain expectations), but basically any maintainer who has code that causes warnings should expect that they will have to fix those warnings. If it's clang that generates bogus warnings, then we'll have to start disable clang warnings. The clang people tend to be proud of thir fewer false positives, but so far looking at things, I am not convinced. And I'm most definitely not convinced when the "let's finally enable -Werror after years of talking about it", people end up going "but but but I have thousands of warnings". That's the POINT of that commit. That "but but but I have thousands of warnings" is not acceptable. I spent hours yesterday getting rid of some warnings. It shouldn't be on me fixing peoples code. It shouldn't be on me noticing that people send me crap that warns. And it really shouldn't be "Linus cares about warnings, so configurations that Linus doesn't test can continue for years to have them". My "no warnings" policy isn't exactly new, and people shouldn't be shocked when I then say "time to clean up *YOUR* house too". Linus
On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > The above commit seems as though it was merged in response to > > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/. > > No. It was merged in response of _years_ of pain, with the last one > just being the final drop. Well, I think it's important to enumerate some of the pain points explicitly so that we might all better understand what they are, and come to an agreement together on what the right way to resolve them is. Impulse commits without discussion over a holiday weekend are firmly in the category of "I wish you'd rather not have done that." Let's look closer at the final drop. I think there's a bigger question here: 1. did the CI have enough time to run in multiple configurations with multiple toolchains to even find a warning, let alone report it? 2. what did the developer do with the report (if they even got it, see 1 above)? Ignore it? Proceed with the PR anyways? it looks to me as though the "final drop" fits into 1 above. I'm not sure that -Werror is the correct way to resolve issues from that. > I'm not going to revert that change. I probably will have to limit it > (by making that WERROR option depend on certain expectations), but > basically any maintainer who has code that causes warnings should > expect that they will have to fix those warnings. I'm not 100% against it; I think it could land in a more useful variation. But it would be good to discuss that on-list, and give it time to soak. This is a conversation we should be having with CI maintainers IMO, and folks that maintain the build infra, at least. I'm happy to kick off that discussion with this RFC. > If it's clang that generates bogus warnings, then we'll have to start > disable clang warnings. The clang people tend to be proud of thir > fewer false positives, but so far looking at things, I am not > convinced. Now you're just attacking a strawman. > And I'm most definitely not convinced when the "let's finally enable > -Werror after years of talking about it", people end up going "but but > but I have thousands of warnings". The kernel is full of thousands of warnings at the moment. It might not be for your limited set of configs, targets, and toolchains, but the folks running CI are very very well aware that the kernel is far from enabling -Werror seriously. Any given maintainer sending you a PR cannot (and should not, IMO) know under what combination of configs, targets, and toolchains you'll test under, and -Werror isn't going to help them figure it out. Not every commit that makes its way to mainline has spent equal amounts of soak time in -next. (I suspect there are commits going into mainline that spent zero time in -next, not just from you). Patches merged by maintainers shortly before the merge window open have minimal coverage compared to older commits they've accepted. So CI systems can't find diagnostics (warnings/errors) in various combinations of config/target/toolchain if patches are skipping -next or spending a short amount of time in -next. To be clear, you have merged patches into mainline that broke the build for combinations of configs/targets/toolchains that you are not testing. It's not realistic for you or any one person to test all such combinations either. Other software projects have solved this by relying on bots to handle merges. Bots that don't forget to test combinations of various targets, or enable developers to ignore warnings (if the warnings are even reported). > That's the POINT of that commit. That "but but but I have thousands of > warnings" is not acceptable. > > I spent hours yesterday getting rid of some warnings. It shouldn't be > on me fixing peoples code. It shouldn't be on me noticing that people > send me crap that warns. I agree. I disagree that -Werror will solve that. Developers will get the same report from 0day bot, just now as an error rather than warning, and they will continue to ignore the report because "well it works for me." (Maybe I'm attacking the strawman now). > And it really shouldn't be "Linus cares about warnings, so > configurations that Linus doesn't test can continue for years to have > them". I agree, and we have been making progress on this. I'd say the advent of the various CI systems over the past 5 years is a much welcomed improvement in at least understanding which combinations are exposing issues. > My "no warnings" policy isn't exactly new, and people shouldn't be > shocked when I then say "time to clean up *YOUR* house too". We have been cleaning house, for nearly half a decade in my experience. Fixing warnings in the Linux kernel for all possible configs, targets, and toolchains while the toolchains continue to add more diagnostics is more sisyphean than digging a hole in wet sand.
On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote: > On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: [...] > > I'm not going to revert that change. I probably will have to limit it > > (by making that WERROR option depend on certain expectations), but > > basically any maintainer who has code that causes warnings should > > expect that they will have to fix those warnings. > > I'm not 100% against it; I think it could land in a more useful > variation. But it would be good to discuss that on-list, and give it > time to soak. This is a conversation we should be having with CI > maintainers IMO, and folks that maintain the build infra, at least. > I'm happy to kick off that discussion with this RFC. Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot instances which started failing because of -Werror [1], because syzbot's time is better spent on fuzzing, and having the odd warning in some subsystem penalize fuzzing of the entire kernel is not appropriate. [1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d Getting the kernel built is a hard requirement for any sort of runtime testing. Once the kernel is built, runtime testing of various subsystems can proceed in parallel. A single warning in some odd subsystem penalizing the _entire_ kernel's testing progress is inappropriate. The severity of a use-after-free bug found by runtime testing is orders of magnitude more severe than some "unused variable" warning. Now such a warning would delay finding bugs at runtime on many CI systems that haven't yet disabled the warning. I'm predicting most distributions and runtime-focused CIs will disable the warning. I have formulated this in the form of a patch below, that might move this new Kconfig option towards its appropriate usecases by default. The intent is not to dispute the usefulness of -Werror, but select the appropriate usecases by default, limiting friction for all those who can do little more than say CONFIG_WERROR=n. Thanks, -- Marco ------ >8 ------ From: Marco Elver <elver@google.com> Date: Tue, 7 Sep 2021 23:12:08 +0200 Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST The cross-product of the kernel's supported toolchains, architectures, and configuration options is large. So large, that it's generally accepted to be infeasible to enumerate and build+test them all (many compile-testers rely on randomly generated configs). Without the possibility to enumerate all possible combinations of toolchains, architectures, and configuration options, it is inevitable that compiler warnings in this space exist. With -Werror, this means that an innumerable set of kernels are now broken, yet had been perfectly usable before (confused compilers, code with warnings unused, or luck). Distributors will necessarily pick a point in the toolchain X arch X config space, and if unlucky, will have a broken build. Granted, those will likely disable CONFIG_WERROR and move on. The kernel's default configuration is unlikely to be suitable for all users, but it's inappropriate to force many users to set CONFIG_WERROR=n. This also holds for CI systems which are focused on runtime testing, where the odd warning in some subsystem will disrupt testing of the rest of the kernel. Many of those runtime-focused CI systems run tests or fuzz the kernel using runtime debugging tools. Runtime testing of different subsystems can proceed in parallel, and potentially uncover serious bugs; halting runtime testing of the entire kernel because of the odd warning (now error) in a subsystem or driver is simply inappropriate. Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n as well. The appropriate usecase for -Werror is therefore compile-test focused builds (often done by developers or CI systems). Reflect this in the Kconfig option by making the default value of WERROR match COMPILE_TEST. Signed-off-by: Marco Elver <elver@google.com> --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/init/Kconfig b/init/Kconfig index 8cb97f141b70..11f8a845f259 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -139,7 +139,7 @@ config COMPILE_TEST config WERROR bool "Compile the kernel with warnings as errors" - default y + default COMPILE_TEST help A kernel build should not cause any compiler warnings, and this enables the '-Werror' flag to enforce that rule by default.
On Tue, Sep 7, 2021 at 3:14 PM Marco Elver <elver@google.com> wrote: > > > config WERROR > bool "Compile the kernel with warnings as errors" > - default y > + default COMPILE_TEST That seems reasonable. It very much is about build-testing. Linus
On 9/7/21 3:14 PM, Marco Elver wrote: > On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote: >> On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > [...] >>> I'm not going to revert that change. I probably will have to limit it >>> (by making that WERROR option depend on certain expectations), but >>> basically any maintainer who has code that causes warnings should >>> expect that they will have to fix those warnings. >> >> I'm not 100% against it; I think it could land in a more useful >> variation. But it would be good to discuss that on-list, and give it >> time to soak. This is a conversation we should be having with CI >> maintainers IMO, and folks that maintain the build infra, at least. >> I'm happy to kick off that discussion with this RFC. > > Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot > instances which started failing because of -Werror [1], because syzbot's > time is better spent on fuzzing, and having the odd warning in some > subsystem penalize fuzzing of the entire kernel is not appropriate. > > [1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d > > Getting the kernel built is a hard requirement for any sort of runtime > testing. Once the kernel is built, runtime testing of various subsystems > can proceed in parallel. A single warning in some odd subsystem > penalizing the _entire_ kernel's testing progress is inappropriate. The > severity of a use-after-free bug found by runtime testing is orders of > magnitude more severe than some "unused variable" warning. Now such a > warning would delay finding bugs at runtime on many CI systems that > haven't yet disabled the warning. > > I'm predicting most distributions and runtime-focused CIs will disable > the warning. > > I have formulated this in the form of a patch below, that might move > this new Kconfig option towards its appropriate usecases by default. > > The intent is not to dispute the usefulness of -Werror, but select the > appropriate usecases by default, limiting friction for all those who can > do little more than say CONFIG_WERROR=n. > > Thanks, > -- Marco > > ------ >8 ------ > > From: Marco Elver <elver@google.com> > Date: Tue, 7 Sep 2021 23:12:08 +0200 > Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST > > The cross-product of the kernel's supported toolchains, architectures, > and configuration options is large. So large, that it's generally > accepted to be infeasible to enumerate and build+test them all > (many compile-testers rely on randomly generated configs). > > Without the possibility to enumerate all possible combinations of > toolchains, architectures, and configuration options, it is inevitable > that compiler warnings in this space exist. > > With -Werror, this means that an innumerable set of kernels are now > broken, yet had been perfectly usable before (confused compilers, code > with warnings unused, or luck). > > Distributors will necessarily pick a point in the toolchain X arch X > config space, and if unlucky, will have a broken build. Granted, those > will likely disable CONFIG_WERROR and move on. > > The kernel's default configuration is unlikely to be suitable for all > users, but it's inappropriate to force many users to set CONFIG_WERROR=n. > > This also holds for CI systems which are focused on runtime testing, > where the odd warning in some subsystem will disrupt testing of the rest > of the kernel. Many of those runtime-focused CI systems run tests or > fuzz the kernel using runtime debugging tools. Runtime testing of > different subsystems can proceed in parallel, and potentially uncover > serious bugs; halting runtime testing of the entire kernel because of > the odd warning (now error) in a subsystem or driver is simply > inappropriate. > > Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n > as well. > > The appropriate usecase for -Werror is therefore compile-test focused > builds (often done by developers or CI systems). > > Reflect this in the Kconfig option by making the default value of WERROR > match COMPILE_TEST. > > Signed-off-by: Marco Elver <elver@google.com> I like it. Acked-by: Guenter Roeck <linux@roeck-us.net> > --- > init/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 8cb97f141b70..11f8a845f259 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -139,7 +139,7 @@ config COMPILE_TEST > > config WERROR > bool "Compile the kernel with warnings as errors" > - default y > + default COMPILE_TEST > help > A kernel build should not cause any compiler warnings, and this > enables the '-Werror' flag to enforce that rule by default. >
On 9/7/21 3:18 PM, Linus Torvalds wrote: > On Tue, Sep 7, 2021 at 3:14 PM Marco Elver <elver@google.com> wrote: >> >> >> config WERROR >> bool "Compile the kernel with warnings as errors" >> - default y >> + default COMPILE_TEST > > That seems reasonable. It very much is about build-testing. That and 2 more things IMO: a. having developers be responsible for build warnings, not just build errors b. having maintainers merge them more like they are build errors and not just some warnings that can be overlooked. I don't see enough of a. or b. :(
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote: > Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot > instances which started failing because of -Werror [1], because syzbot's > time is better spent on fuzzing, and having the odd warning in some > subsystem penalize fuzzing of the entire kernel is not appropriate. Similarly, I have to disable -Werror (which various archs and subsystems already use) whenever I test building the kernel with new toolchains. It is the biggest set of kernel patches I keep, already, since many years. I actually have good hopes that a centralised -Werror thing will make this easier :-) Maybe there can be an E=[01] kernel build flag to disable / enable CONFIG_WERROR? Something that will override it for just that command. This would make life easier for many use cases, while at the same time not being something that people can "forget" they did. Segher
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote: > I'm predicting most distributions and runtime-focused CIs will disable > the warning. Yes, indeed. > Date: Tue, 7 Sep 2021 23:12:08 +0200 > Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST > > The cross-product of the kernel's supported toolchains, architectures, > and configuration options is large. So large, that it's generally > accepted to be infeasible to enumerate and build+test them all > (many compile-testers rely on randomly generated configs). Reviwed-by: Mark Brown <broonie@kernel.org>
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote: > On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote: > > On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > [...] > > > I'm not going to revert that change. I probably will have to limit it > > > (by making that WERROR option depend on certain expectations), but > > > basically any maintainer who has code that causes warnings should > > > expect that they will have to fix those warnings. > > > > I'm not 100% against it; I think it could land in a more useful > > variation. But it would be good to discuss that on-list, and give it > > time to soak. This is a conversation we should be having with CI > > maintainers IMO, and folks that maintain the build infra, at least. > > I'm happy to kick off that discussion with this RFC. > > Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot > instances which started failing because of -Werror [1], because syzbot's > time is better spent on fuzzing, and having the odd warning in some > subsystem penalize fuzzing of the entire kernel is not appropriate. > > [1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d > > Getting the kernel built is a hard requirement for any sort of runtime > testing. Once the kernel is built, runtime testing of various subsystems > can proceed in parallel. A single warning in some odd subsystem > penalizing the _entire_ kernel's testing progress is inappropriate. The > severity of a use-after-free bug found by runtime testing is orders of > magnitude more severe than some "unused variable" warning. Now such a > warning would delay finding bugs at runtime on many CI systems that > haven't yet disabled the warning. > > I'm predicting most distributions and runtime-focused CIs will disable > the warning. > > I have formulated this in the form of a patch below, that might move > this new Kconfig option towards its appropriate usecases by default. > > The intent is not to dispute the usefulness of -Werror, but select the > appropriate usecases by default, limiting friction for all those who can > do little more than say CONFIG_WERROR=n. > > Thanks, > -- Marco > > ------ >8 ------ > > From: Marco Elver <elver@google.com> > Date: Tue, 7 Sep 2021 23:12:08 +0200 > Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST > > The cross-product of the kernel's supported toolchains, architectures, > and configuration options is large. So large, that it's generally > accepted to be infeasible to enumerate and build+test them all > (many compile-testers rely on randomly generated configs). > > Without the possibility to enumerate all possible combinations of > toolchains, architectures, and configuration options, it is inevitable > that compiler warnings in this space exist. > > With -Werror, this means that an innumerable set of kernels are now > broken, yet had been perfectly usable before (confused compilers, code > with warnings unused, or luck). > > Distributors will necessarily pick a point in the toolchain X arch X > config space, and if unlucky, will have a broken build. Granted, those > will likely disable CONFIG_WERROR and move on. > > The kernel's default configuration is unlikely to be suitable for all > users, but it's inappropriate to force many users to set CONFIG_WERROR=n. > > This also holds for CI systems which are focused on runtime testing, > where the odd warning in some subsystem will disrupt testing of the rest > of the kernel. Many of those runtime-focused CI systems run tests or > fuzz the kernel using runtime debugging tools. Runtime testing of > different subsystems can proceed in parallel, and potentially uncover > serious bugs; halting runtime testing of the entire kernel because of > the odd warning (now error) in a subsystem or driver is simply > inappropriate. > > Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n > as well. > > The appropriate usecase for -Werror is therefore compile-test focused > builds (often done by developers or CI systems). > > Reflect this in the Kconfig option by making the default value of WERROR > match COMPILE_TEST. > > Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> > --- > init/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 8cb97f141b70..11f8a845f259 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -139,7 +139,7 @@ config COMPILE_TEST > > config WERROR > bool "Compile the kernel with warnings as errors" > - default y > + default COMPILE_TEST > help > A kernel build should not cause any compiler warnings, and this > enables the '-Werror' flag to enforce that rule by default. > -- > 2.33.0.153.gba50c8fa24-goog
On Tue, Sep 07, 2021 at 12:16:22PM -0700, Linus Torvalds wrote: > On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > The above commit seems as though it was merged in response to > > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/. > No. It was merged in response of _years_ of pain, with the last one > just being the final drop. > I'm not going to revert that change. I probably will have to limit it > (by making that WERROR option depend on certain expectations), but > basically any maintainer who has code that causes warnings should > expect that they will have to fix those warnings. Echoing what others have said about runtime testing having -Werror on by default for defconfigs is going to cause issues for bisection and just generally noticing promptly when runtime issues are introduced - people or systems bisecting boot or testsuite issues will encounter more blocks of commits that they skip due to unrelated issues, and it's much more likely that we'll just have gaps in coverage when for example the day's -next has some random warning on some platforms. In terms of prioritisation it feels like the wrong call to have -Werror on everywhere. The bisection really is helpful, especially when the people looking at the problem don't have direct access the systems to be able to run tests themselves - it really increases the quality of reports when the automation in CI services is able to identify the likely commit, and makes it *much* more likely those reports be sent in the first place. Of course this isn't insurmountable, the test systems can always add config fragments to tweak the configurations they're testing (they already need to do this to run lots of kselftest for example) but that's something we've generally tried to minimise since everyone working with custom configs has generally caused friction in the past. > And it really shouldn't be "Linus cares about warnings, so > configurations that Linus doesn't test can continue for years to have > them". > My "no warnings" policy isn't exactly new, and people shouldn't be > shocked when I then say "time to clean up *YOUR* house too". IME pushing on this stuff it's not that people don't care, it's that they naturally have different test coverage due to their particular interests and the resources available to them. Even people who are very dilligent in their own testing and paying a lot of attention to -next and the automated test stuff that's out there are going to introduce some breakage from time to time, and sometimes there's good reasons for processes to get short circuited. Some of this is also on the people doing the more niche stuff to keep on top of reporting issues that come up as they do so, they should be able to expect that people will pay attention when they do so but the more niche you get the less surprising it is when nobody else notices issues.
On Tue, Sep 07, 2021 at 12:16:22PM -0700, Linus Torvalds wrote: > > That's the POINT of that commit. That "but but but I have thousands of > warnings" is not acceptable. I'm actually surprised you did this after the discussion with gcc warning about using "main" as a local variable. > My "no warnings" policy isn't exactly new, and people shouldn't be > shocked when I then say "time to clean up *YOUR* house too". Note, ktest has a way to create a list of current warnings, and then test your code against it, and will fail on any new warning. I run this on all my pull requests to make sure that I do not introduce any new warnings. That said, I still get bug reports on various configs that I did not test, where my code introduces a warning. I hate to be the one that fails their builds. It's not the configs that have something enabled, its the configs that have something not enabled, where another config depends on it. I'm not against the -Werror. I just don't want to be changing local variables called "main" because it breaks someones build due to some daft warning that the compiler is emitting. -- Steve
Hi! > >> config WERROR > >> bool "Compile the kernel with warnings as errors" > >>- default y > >>+ default COMPILE_TEST > > > >That seems reasonable. It very much is about build-testing. > > That and 2 more things IMO: > > a. having developers be responsible for build warnings, not just > build errors > > b. having maintainers merge them more like they are build errors > and not just some warnings that can be overlooked. > > I don't see enough of a. or b. :( Do we really want developers treat warnings as errors? When the code is okay but some random version of gcc dislikes it... Plus, there's question of stable. We already get ton of churn there ("this fixes random warning"). WERROR will only encourage that... Best regards, Pavel
On Mon, Sep 13, 2021 at 11:32:56AM +0200, Pavel Machek wrote: > Hi! > > > >> config WERROR > > >> bool "Compile the kernel with warnings as errors" > > >>- default y > > >>+ default COMPILE_TEST > > > > > >That seems reasonable. It very much is about build-testing. > > > > That and 2 more things IMO: > > > > a. having developers be responsible for build warnings, not just > > build errors > > > > b. having maintainers merge them more like they are build errors > > and not just some warnings that can be overlooked. > > > > I don't see enough of a. or b. :( > > Do we really want developers treat warnings as errors? When the code > is okay but some random version of gcc dislikes it... > > Plus, there's question of stable. We already get ton of churn there > ("this fixes random warning"). WERROR will only encourage that... I will not be backporting this patch to older stable kernels, but I _want_ to see stable builds build with no warnings. When we add warnings, they are almost always things we need to fix up properly. Over time, I have worked to reduce the number of build warnings in older stable kernels. For newer versions of gcc, sometimes that is impossible, but we are close... thanks, greg k-h
* Pavel Machek: > Do we really want developers treat warnings as errors? When the code > is okay but some random version of gcc dislikes it... There are some warnings-as-errors which are quite reasonable, like -Werror=implicit-function-declaration (which we can't make the compiler default without cleaning up userspace first) and perhaps -Werror=implicit-int. Some other warnings can be used to enforce coding style, and there -Werror could make sense as well (-Werror=vla and others). But there are also warnings which are emitted by the GCC middle-end (the optimizers), and turning on -Werror for those is very problematic. These warnings are very target-specific and also depend on compiler version and optimization parameters. Unfortunately that includes the buffer size warnings based on function attributes (which would otherwise be a good fit for the kernel because it uses few external headers). GCC also lacks a facility to suppress warnings if they concern code that was introduced during optimization and removed again later (e.g. inlining, constant propagation, dead code removal). Thanks, Florian
Hi! > > Do we really want developers treat warnings as errors? When the code > > is okay but some random version of gcc dislikes it... > > > > Plus, there's question of stable. We already get ton of churn there > > ("this fixes random warning"). WERROR will only encourage that... > > I will not be backporting this patch to older stable kernels, but I > _want_ to see stable builds build with no warnings. When we add > warnings, they are almost always things we need to fix up properly. Well, everyone _wants_ to see clean builds... unless the price is too high. > Over time, I have worked to reduce the number of build warnings in older > stable kernels. For newer versions of gcc, sometimes that is > impossible, but we are close... You clearly can't backport this patch, but for 5.16-stable, you'll have it in, and now warnings are same as errors... and I don't believe that's good idea for stable. Best regards, Pavel
On Mon, Sep 13, 2021 at 12:02:30PM +0200, Pavel Machek wrote: > Hi! > > > > Do we really want developers treat warnings as errors? When the code > > > is okay but some random version of gcc dislikes it... > > > > > > Plus, there's question of stable. We already get ton of churn there > > > ("this fixes random warning"). WERROR will only encourage that... > > > > I will not be backporting this patch to older stable kernels, but I > > _want_ to see stable builds build with no warnings. When we add > > warnings, they are almost always things we need to fix up properly. > > Well, everyone _wants_ to see clean builds... unless the price is too > high. > > > Over time, I have worked to reduce the number of build warnings in older > > stable kernels. For newer versions of gcc, sometimes that is > > impossible, but we are close... > > You clearly can't backport this patch, but for 5.16-stable, you'll > have it in, and now warnings are same as errors... and I don't believe > that's good idea for stable. I do, it will force us to keep these trees clean over time. And it will be in 5.15, not 5.16 :) Worst case, we disable it in 4 years when gcc 15 or so generates so many errors we can't resolve them in this old kernel. thanks, greg k-h
On 9/13/21 2:32 AM, Pavel Machek wrote: > Hi! > >>>> config WERROR >>>> bool "Compile the kernel with warnings as errors" >>>> - default y >>>> + default COMPILE_TEST >>> >>> That seems reasonable. It very much is about build-testing. >> >> That and 2 more things IMO: >> >> a. having developers be responsible for build warnings, not just >> build errors >> >> b. having maintainers merge them more like they are build errors >> and not just some warnings that can be overlooked. >> >> I don't see enough of a. or b. :( > > Do we really want developers treat warnings as errors? When the code > is okay but some random version of gcc dislikes it... > > Plus, there's question of stable. We already get ton of churn there > ("this fixes random warning"). WERROR will only encourage that... > All Chrome OS builds are already done with -Werror enabled. Having it enabled in the incoming stable releases will reduce our workload when backporting stable releases. I am actually working on making at least chromeos-5.10 "clean" for allmodconfig builds on arm, arm64, and x86 (everything else is hopeless, and even arm may be futile, but arm64 and x86 seem to be doable). I'd rather have warnings fixed in incoming stable releases than having to pull additional patches into our kernels. Guenter
On Mon, Sep 13, 2021 at 2:50 AM Florian Weimer <fweimer@redhat.com> wrote: > > But there are also warnings which are emitted by the GCC middle-end (the > optimizers), and turning on -Werror for those is very problematic. People say that, but let's face it, that's simply not true. There are real problematic warnings, and we just turn those warnings off. People who want the self-flagellation can enable them with W=1 (or bigger values), and spend their life fighting stupid random compiler warnings that have tons of false positives. But the fact is, I've required a warning-free build on x86-64 for anything I notice for the last several years by now, and it really hasn't been a problem. What _has_ been a problem is that (a) build bots don't care about and (b) the configs I don't personally test (other non-x86-64 architectures stand out, but there's certainly been other configuration issues too). But "bogus compiler warnings" is very much *not* in that list of problems. I've looked at a lot of the warnings that are now errors, and while a number of them have made me go "So why didn't we see that on x86-64?" not one of them has actually made me go "-Werror was wrong". Because EVERY single one I've seen has been for something that should have been fixed. Presumably long long ago, but the warning it generated had been ignored. So stop with the "some warnings just happen" crap. Outside of actual compiler bugs, and truly stupid warnings (that we turn off), that's simply not true. And yes, those compiler bugs happen. The new warning already found one issue with current gcc trunk (non-released). So right now the count is "lots of valid warnings, and one compiler bug that was found _thanks_ to me enabling -Werror". Yes, we have issues with having to work around older compiler bugs. Those aren't going away, and yes, -Werror may well mean that non-x86-64 people now have to deal with them. And yes, this is painful. I'm very much aware of that. But we just need to do it. Because the warnings don't go away on their own, and not making them fatal clearly just means that they'll stay around forever. Linus
On Mon, Sep 13, 2021 at 12:50 PM Pavel Machek <pavel@ucw.cz> wrote: > > > Do we really want developers treat warnings as errors? When the code > > > is okay but some random version of gcc dislikes it... > > > > > > Plus, there's question of stable. We already get ton of churn there > > > ("this fixes random warning"). WERROR will only encourage that... > > > > I will not be backporting this patch to older stable kernels, but I > > _want_ to see stable builds build with no warnings. When we add > > warnings, they are almost always things we need to fix up properly. > > Well, everyone _wants_ to see clean builds... unless the price is too > high. > > > Over time, I have worked to reduce the number of build warnings in older > > stable kernels. For newer versions of gcc, sometimes that is > > impossible, but we are close... > > You clearly can't backport this patch, but for 5.16-stable, you'll > have it in, and now warnings are same as errors... and I don't believe > that's good idea for stable. The good thing about the config option is that there's now a single point to enable or disable -Werror. In the past, maintainers sprinkled -Werror all over the various Makefiles in the tree, which means you have to edit multiple files to disable it again. Background: I've been investigating an issue that involved building old v2.6.x kernels. Apart from having to use very old compilers, it still caused compiler warnings that obviously weren't seen with the slightly different compiler versions used by maintainers who added -Werror. Gr{oetje,eeting}s, Geert
diff --git a/Makefile b/Makefile index d45fc2edf186..6bc1c5b17a62 100644 --- a/Makefile +++ b/Makefile @@ -785,9 +785,6 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong KBUILD_CFLAGS += $(stackp-flags-y) -KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror -KBUILD_CFLAGS += $(KBUILD_CFLAGS-y) - ifdef CONFIG_CC_IS_CLANG KBUILD_CPPFLAGS += -Qunused-arguments # The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable. diff --git a/init/Kconfig b/init/Kconfig index 8cb97f141b70..e708180e9a59 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -137,20 +137,6 @@ config COMPILE_TEST here. If you are a user/distributor, say N here to exclude useless drivers to be distributed. -config WERROR - bool "Compile the kernel with warnings as errors" - default y - help - A kernel build should not cause any compiler warnings, and this - enables the '-Werror' flag to enforce that rule by default. - - However, if you have a new (or very old) compiler with odd and - unusual warnings, or you have some architecture with problems, - you may need to disable this config option in order to - successfully build the kernel. - - If in doubt, say Y. - config UAPI_HEADER_TEST bool "Compile test UAPI headers" depends on HEADERS_INSTALL && CC_CAN_LINK
This reverts commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151. The above commit seems as though it was merged in response to https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/. While I can appreciate the intent of enabling -Werror, I don't think it is the right tool to address the root cause of developers not testing certain toolchains or configurations, or taking existing reports they're getting serious enough. Having more appropriate CI or processes in place to prevent untested patches from being merged into mainline may also be worth discussing. I'd also like to see such a patch sent formally to the list for discussion and have time to soak in next rather than be merged directly into mainline without either. -Werror is great for preventing new errors from creeping in when a codebase is free of warnings for all configs and all targets and the toolchain is never updated. Unfortunately, none of the above is the case for the Linux kernel at this time. The addition of new compiler diagnostic flags in the -W group to -Wall make toolchain updates excessively more painful. This can lead to commits that disable warnings rather than work towards addressing them. Some diagnostics are useful but take incredible work or churn to completely free a codebase from them. Warning can be upgraded to errors with -Werror=foo or downgraded from errors back to warnings via -Wno-error=foo. -Wno-error=foo is a double edged sword; it doesn't help you spot the introduction of additional instances of that warning easily. This change has caused nearly all of our CI to go red, and requires us to now disable CONFIG_WERROR until every last target and every last config is addressed. Rather than require everyone to disable the above config to keep builds going, perhaps certain CI systems should instead set CFLAGS_KERNEL=-Werror. Why don't we just fix every warning? We have been, for years, and we're still not done yet. See our issue tracker below, contributors wanted. With more time/active discussion, we can probably land something more appropriate. It should involve the Kbuild maintainer and list. For instance, I have questions around how should such a config interact with randconfigs and allconfigs. This config also seems to duplicate the existing CONFIG_PPC_DISABLE_WERROR without merging the two. I do recognize the irony of someone who's spent a lot of time cleaning up warnings to be advocating for disabling -Werror...it's not lost on me. Our Pixel (née Nexus) team has been effectively carrying an out of tree patch enabling -Werror since before I ever contributed to the kernel. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1449 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Makefile | 3 --- init/Kconfig | 14 -------------- 2 files changed, 17 deletions(-) base-commit: 4b93c544e90e2b28326182d31ee008eb80e02074