diff mbox series

Revert "Enable '-Werror' by default for all kernel builds"

Message ID 20210907183843.33028-1-ndesaulniers@google.com (mailing list archive)
State New, archived
Headers show
Series Revert "Enable '-Werror' by default for all kernel builds" | expand

Commit Message

Nick Desaulniers Sept. 7, 2021, 6:38 p.m. UTC
This reverts commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151.

The above commit seems as though it was merged in response to
https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.

While I can appreciate the intent of enabling -Werror, I don't think it
is the right tool to address the root cause of developers not testing
certain toolchains or configurations, or taking existing reports they're
getting serious enough.

Having more appropriate CI or processes in place to prevent untested
patches from being merged into mainline may also be worth discussing.

I'd also like to see such a patch sent formally to the list for
discussion and have time to soak in next rather than be merged directly
into mainline without either.

-Werror is great for preventing new errors from creeping in when a
codebase is free of warnings for all configs and all targets and the
toolchain is never updated. Unfortunately, none of the above is the case
for the Linux kernel at this time.

The addition of new compiler diagnostic flags in the -W group to -Wall
make toolchain updates excessively more painful. This can lead to
commits that disable warnings rather than work towards addressing them.
Some diagnostics are useful but take incredible work or churn to
completely free a codebase from them.

Warning can be upgraded to errors with -Werror=foo or downgraded from
errors back to warnings via -Wno-error=foo. -Wno-error=foo is a double
edged sword; it doesn't help you spot the introduction of additional
instances of that warning easily.

This change has caused nearly all of our CI to go red, and requires us
to now disable CONFIG_WERROR until every last target and every last
config is addressed. Rather than require everyone to disable the above
config to keep builds going, perhaps certain CI systems should instead
set CFLAGS_KERNEL=-Werror.

Why don't we just fix every warning? We have been, for years, and we're
still not done yet. See our issue tracker below, contributors wanted.

With more time/active discussion, we can probably land something more
appropriate. It should involve the Kbuild maintainer and list.

For instance, I have questions around how should such a config interact
with randconfigs and allconfigs. This config also seems to duplicate the
existing CONFIG_PPC_DISABLE_WERROR without merging the two.

I do recognize the irony of someone who's spent a lot of time cleaning
up warnings to be advocating for disabling -Werror...it's not lost on
me. Our Pixel (née Nexus) team has been effectively carrying an out of
tree patch enabling -Werror since before I ever contributed to the
kernel.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1449
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 Makefile     |  3 ---
 init/Kconfig | 14 --------------
 2 files changed, 17 deletions(-)


base-commit: 4b93c544e90e2b28326182d31ee008eb80e02074

Comments

Fangrui Song Sept. 7, 2021, 6:55 p.m. UTC | #1
On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> This reverts commit 3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151.
>
> The above commit seems as though it was merged in response to
> https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.
>
> While I can appreciate the intent of enabling -Werror, I don't think it
> is the right tool to address the root cause of developers not testing
> certain toolchains or configurations, or taking existing reports they're
> getting serious enough.
>
> Having more appropriate CI or processes in place to prevent untested
> patches from being merged into mainline may also be worth discussing.

I agree that -Werror by default needs more discussion.
Default WERROR makes building old kernels with new compilers more painful.

CI systems could do a better job surfacing compiler warnings if they
don't do it currently.

> I'd also like to see such a patch sent formally to the list for
> discussion and have time to soak in next rather than be merged directly
> into mainline without either.
>
> -Werror is great for preventing new errors from creeping in when a
> codebase is free of warnings for all configs and all targets and the
> toolchain is never updated. Unfortunately, none of the above is the case
> for the Linux kernel at this time.
>
> The addition of new compiler diagnostic flags in the -W group to -Wall
> make toolchain updates excessively more painful. This can lead to
> commits that disable warnings rather than work towards addressing them.
> Some diagnostics are useful but take incredible work or churn to
> completely free a codebase from them.
>
> Warning can be upgraded to errors with -Werror=foo or downgraded from
> errors back to warnings via -Wno-error=foo. -Wno-error=foo is a double
> edged sword; it doesn't help you spot the introduction of additional
> instances of that warning easily.
>
> This change has caused nearly all of our CI to go red, and requires us
> to now disable CONFIG_WERROR until every last target and every last
> config is addressed. Rather than require everyone to disable the above
> config to keep builds going, perhaps certain CI systems should instead
> set CFLAGS_KERNEL=-Werror.
>
> Why don't we just fix every warning? We have been, for years, and we're
> still not done yet. See our issue tracker below, contributors wanted.
>
> With more time/active discussion, we can probably land something more
> appropriate. It should involve the Kbuild maintainer and list.
>
> For instance, I have questions around how should such a config interact
> with randconfigs and allconfigs. This config also seems to duplicate the
> existing CONFIG_PPC_DISABLE_WERROR without merging the two.
>
> I do recognize the irony of someone who's spent a lot of time cleaning
> up warnings to be advocating for disabling -Werror...it's not lost on
> me. Our Pixel (née Nexus) team has been effectively carrying an out of
> tree patch enabling -Werror since before I ever contributed to the
> kernel.
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Masahiro Yamada <masahiroy@kernel.org>
> Cc: Nathan Chancellor <nathan@kernel.org>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1449
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
>  Makefile     |  3 ---
>  init/Kconfig | 14 --------------
>  2 files changed, 17 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index d45fc2edf186..6bc1c5b17a62 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -785,9 +785,6 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG)      := -fstack-protector-strong
>
>  KBUILD_CFLAGS += $(stackp-flags-y)
>
> -KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
> -KBUILD_CFLAGS += $(KBUILD_CFLAGS-y)
> -
>  ifdef CONFIG_CC_IS_CLANG
>  KBUILD_CPPFLAGS += -Qunused-arguments
>  # The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable.
> diff --git a/init/Kconfig b/init/Kconfig
> index 8cb97f141b70..e708180e9a59 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -137,20 +137,6 @@ config COMPILE_TEST
>           here. If you are a user/distributor, say N here to exclude useless
>           drivers to be distributed.
>
> -config WERROR
> -       bool "Compile the kernel with warnings as errors"
> -       default y
> -       help
> -         A kernel build should not cause any compiler warnings, and this
> -         enables the '-Werror' flag to enforce that rule by default.
> -
> -         However, if you have a new (or very old) compiler with odd and
> -         unusual warnings, or you have some architecture with problems,
> -         you may need to disable this config option in order to
> -         successfully build the kernel.
> -
> -         If in doubt, say Y.
> -
>  config UAPI_HEADER_TEST
>         bool "Compile test UAPI headers"
>         depends on HEADERS_INSTALL && CC_CAN_LINK
>
> base-commit: 4b93c544e90e2b28326182d31ee008eb80e02074
> --
> 2.33.0.153.gba50c8fa24-goog
>
Linus Torvalds Sept. 7, 2021, 7:16 p.m. UTC | #2
On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> The above commit seems as though it was merged in response to
> https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.

No. It was merged in response of _years_ of pain, with the last one
just being the final drop.

I'm not going to revert that change. I probably will have to limit it
(by making that WERROR option depend on certain expectations), but
basically any maintainer who has code that causes warnings should
expect that they will have to fix those warnings.

If it's clang that generates bogus warnings, then we'll have to start
disable clang warnings. The clang people tend to be proud of thir
fewer false positives, but so far looking at things, I am not
convinced.

And I'm most definitely not convinced when the "let's finally enable
-Werror after years of talking about it", people end up going "but but
but I have thousands of warnings".

That's the POINT of that commit. That "but but but I have thousands of
warnings" is not acceptable.

I spent hours yesterday getting rid of some warnings. It shouldn't be
on me fixing peoples code. It shouldn't be on me noticing that people
send me crap that warns.

And it really shouldn't be "Linus cares about warnings, so
configurations that Linus doesn't test can continue for years to have
them".

My "no warnings" policy isn't exactly new, and people shouldn't be
shocked when I then say "time to clean up *YOUR* house too".

            Linus
Nick Desaulniers Sept. 7, 2021, 8:30 p.m. UTC | #3
On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > The above commit seems as though it was merged in response to
> > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.
>
> No. It was merged in response of _years_ of pain, with the last one
> just being the final drop.

Well, I think it's important to enumerate some of the pain points
explicitly so that we might all better understand what they are, and
come to an agreement together on what the right way to resolve them
is.

Impulse commits without discussion over a holiday weekend are firmly
in the category of "I wish you'd rather not have done that."

Let's look closer at the final drop.  I think there's a bigger question here:
1. did the CI have enough time to run in multiple configurations with
multiple toolchains to even find a warning, let alone report it?
2. what did the developer do with the report (if they even got it, see
1 above)? Ignore it? Proceed with the PR anyways?

it looks to me as though the "final drop" fits into 1 above. I'm not
sure that -Werror is the correct way to resolve issues from that.

> I'm not going to revert that change. I probably will have to limit it
> (by making that WERROR option depend on certain expectations), but
> basically any maintainer who has code that causes warnings should
> expect that they will have to fix those warnings.

I'm not 100% against it; I think it could land in a more useful
variation. But it would be good to discuss that on-list, and give it
time to soak. This is a conversation we should be having with CI
maintainers IMO, and folks that maintain the build infra, at least.
I'm happy to kick off that discussion with this RFC.

> If it's clang that generates bogus warnings, then we'll have to start
> disable clang warnings. The clang people tend to be proud of thir
> fewer false positives, but so far looking at things, I am not
> convinced.

Now you're just attacking a strawman.

> And I'm most definitely not convinced when the "let's finally enable
> -Werror after years of talking about it", people end up going "but but
> but I have thousands of warnings".

The kernel is full of thousands of warnings at the moment.  It might
not be for your limited set of configs, targets, and toolchains, but
the folks running CI are very very well aware that the kernel is far
from enabling -Werror seriously.

Any given maintainer sending you a PR cannot (and should not, IMO)
know under what combination of configs, targets, and toolchains you'll
test under, and -Werror isn't going to help them figure it out.

Not every commit that makes its way to mainline has spent equal
amounts of soak time in -next. (I suspect there are commits going into
mainline that spent zero time in -next, not just from you).  Patches
merged by maintainers shortly before the merge window open have
minimal coverage compared to older commits they've accepted.  So CI
systems can't find diagnostics (warnings/errors) in various
combinations of config/target/toolchain if patches are skipping -next
or spending a short amount of time in -next.

To be clear, you have merged patches into mainline that broke the
build for combinations of configs/targets/toolchains that you are not
testing.  It's not realistic for you or any one person to test all
such combinations either.  Other software projects have solved this by
relying on bots to handle merges. Bots that don't forget to test
combinations of various targets, or enable developers to ignore
warnings (if the warnings are even reported).

> That's the POINT of that commit. That "but but but I have thousands of
> warnings" is not acceptable.
>
> I spent hours yesterday getting rid of some warnings. It shouldn't be
> on me fixing peoples code. It shouldn't be on me noticing that people
> send me crap that warns.

I agree. I disagree that -Werror will solve that.  Developers will get
the same report from 0day bot, just now as an error rather than
warning, and they will continue to ignore the report because "well it
works for me."  (Maybe I'm attacking the strawman now).

> And it really shouldn't be "Linus cares about warnings, so
> configurations that Linus doesn't test can continue for years to have
> them".

I agree, and we have been making progress on this. I'd say the advent
of the various CI systems over the past 5 years is a much welcomed
improvement in at least understanding which combinations are exposing
issues.

> My "no warnings" policy isn't exactly new, and people shouldn't be
> shocked when I then say "time to clean up *YOUR* house too".

We have been cleaning house, for nearly half a decade in my
experience. Fixing warnings in the Linux kernel for all possible
configs, targets, and toolchains while the toolchains continue to add
more diagnostics is more sisyphean than digging a hole in wet sand.
Marco Elver Sept. 7, 2021, 10:14 p.m. UTC | #4
On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote:
> On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
[...]
> > I'm not going to revert that change. I probably will have to limit it
> > (by making that WERROR option depend on certain expectations), but
> > basically any maintainer who has code that causes warnings should
> > expect that they will have to fix those warnings.
> 
> I'm not 100% against it; I think it could land in a more useful
> variation. But it would be good to discuss that on-list, and give it
> time to soak. This is a conversation we should be having with CI
> maintainers IMO, and folks that maintain the build infra, at least.
> I'm happy to kick off that discussion with this RFC.

Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot
instances which started failing because of -Werror [1], because syzbot's
time is better spent on fuzzing, and having the odd warning in some
subsystem penalize fuzzing of the entire kernel is not appropriate.

[1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d

Getting the kernel built is a hard requirement for any sort of runtime
testing. Once the kernel is built, runtime testing of various subsystems
can proceed in parallel. A single warning in some odd subsystem
penalizing the _entire_ kernel's testing progress is inappropriate. The
severity of a use-after-free bug found by runtime testing is orders of
magnitude more severe than some "unused variable" warning. Now such a
warning would delay finding bugs at runtime on many CI systems that
haven't yet disabled the warning.

I'm predicting most distributions and runtime-focused CIs will disable
the warning.

I have formulated this in the form of a patch below, that might move
this new Kconfig option towards its appropriate usecases by default.

The intent is not to dispute the usefulness of -Werror, but select the
appropriate usecases by default, limiting friction for all those who can
do little more than say CONFIG_WERROR=n.

Thanks,
-- Marco

------ >8 ------

From: Marco Elver <elver@google.com>
Date: Tue, 7 Sep 2021 23:12:08 +0200
Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST

The cross-product of the kernel's supported toolchains, architectures,
and configuration options is large. So large, that it's generally
accepted to be infeasible to enumerate and build+test them all
(many compile-testers rely on randomly generated configs).

Without the possibility to enumerate all possible combinations of
toolchains, architectures, and configuration options, it is inevitable
that compiler warnings in this space exist.

With -Werror, this means that an innumerable set of kernels are now
broken, yet had been perfectly usable before (confused compilers, code
with warnings unused, or luck).

Distributors will necessarily pick a point in the toolchain X arch X
config space, and if unlucky, will have a broken build. Granted, those
will likely disable CONFIG_WERROR and move on.

The kernel's default configuration is unlikely to be suitable for all
users, but it's inappropriate to force many users to set CONFIG_WERROR=n.

This also holds for CI systems which are focused on runtime testing,
where the odd warning in some subsystem will disrupt testing of the rest
of the kernel. Many of those runtime-focused CI systems run tests or
fuzz the kernel using runtime debugging tools. Runtime testing of
different subsystems can proceed in parallel, and potentially uncover
serious bugs; halting runtime testing of the entire kernel because of
the odd warning (now error) in a subsystem or driver is simply
inappropriate.

Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
as well.

The appropriate usecase for -Werror is therefore compile-test focused
builds (often done by developers or CI systems).

Reflect this in the Kconfig option by making the default value of WERROR
match COMPILE_TEST.

Signed-off-by: Marco Elver <elver@google.com>
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8cb97f141b70..11f8a845f259 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -139,7 +139,7 @@ config COMPILE_TEST
 
 config WERROR
 	bool "Compile the kernel with warnings as errors"
-	default y
+	default COMPILE_TEST
 	help
 	  A kernel build should not cause any compiler warnings, and this
 	  enables the '-Werror' flag to enforce that rule by default.
Linus Torvalds Sept. 7, 2021, 10:18 p.m. UTC | #5
On Tue, Sep 7, 2021 at 3:14 PM Marco Elver <elver@google.com> wrote:
>
>
>  config WERROR
>         bool "Compile the kernel with warnings as errors"
> -       default y
> +       default COMPILE_TEST

That seems reasonable. It very much is about build-testing.

              Linus
Guenter Roeck Sept. 7, 2021, 10:28 p.m. UTC | #6
On 9/7/21 3:14 PM, Marco Elver wrote:
> On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote:
>> On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> [...]
>>> I'm not going to revert that change. I probably will have to limit it
>>> (by making that WERROR option depend on certain expectations), but
>>> basically any maintainer who has code that causes warnings should
>>> expect that they will have to fix those warnings.
>>
>> I'm not 100% against it; I think it could land in a more useful
>> variation. But it would be good to discuss that on-list, and give it
>> time to soak. This is a conversation we should be having with CI
>> maintainers IMO, and folks that maintain the build infra, at least.
>> I'm happy to kick off that discussion with this RFC.
> 
> Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot
> instances which started failing because of -Werror [1], because syzbot's
> time is better spent on fuzzing, and having the odd warning in some
> subsystem penalize fuzzing of the entire kernel is not appropriate.
> 
> [1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d
> 
> Getting the kernel built is a hard requirement for any sort of runtime
> testing. Once the kernel is built, runtime testing of various subsystems
> can proceed in parallel. A single warning in some odd subsystem
> penalizing the _entire_ kernel's testing progress is inappropriate. The
> severity of a use-after-free bug found by runtime testing is orders of
> magnitude more severe than some "unused variable" warning. Now such a
> warning would delay finding bugs at runtime on many CI systems that
> haven't yet disabled the warning.
> 
> I'm predicting most distributions and runtime-focused CIs will disable
> the warning.
> 
> I have formulated this in the form of a patch below, that might move
> this new Kconfig option towards its appropriate usecases by default.
> 
> The intent is not to dispute the usefulness of -Werror, but select the
> appropriate usecases by default, limiting friction for all those who can
> do little more than say CONFIG_WERROR=n.
> 
> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Sep 2021 23:12:08 +0200
> Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST
> 
> The cross-product of the kernel's supported toolchains, architectures,
> and configuration options is large. So large, that it's generally
> accepted to be infeasible to enumerate and build+test them all
> (many compile-testers rely on randomly generated configs).
> 
> Without the possibility to enumerate all possible combinations of
> toolchains, architectures, and configuration options, it is inevitable
> that compiler warnings in this space exist.
> 
> With -Werror, this means that an innumerable set of kernels are now
> broken, yet had been perfectly usable before (confused compilers, code
> with warnings unused, or luck).
> 
> Distributors will necessarily pick a point in the toolchain X arch X
> config space, and if unlucky, will have a broken build. Granted, those
> will likely disable CONFIG_WERROR and move on.
> 
> The kernel's default configuration is unlikely to be suitable for all
> users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
> 
> This also holds for CI systems which are focused on runtime testing,
> where the odd warning in some subsystem will disrupt testing of the rest
> of the kernel. Many of those runtime-focused CI systems run tests or
> fuzz the kernel using runtime debugging tools. Runtime testing of
> different subsystems can proceed in parallel, and potentially uncover
> serious bugs; halting runtime testing of the entire kernel because of
> the odd warning (now error) in a subsystem or driver is simply
> inappropriate.
> 
> Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
> as well.
> 
> The appropriate usecase for -Werror is therefore compile-test focused
> builds (often done by developers or CI systems).
> 
> Reflect this in the Kconfig option by making the default value of WERROR
> match COMPILE_TEST.
> 
> Signed-off-by: Marco Elver <elver@google.com>

I like it.

Acked-by: Guenter Roeck <linux@roeck-us.net>

> ---
>   init/Kconfig | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 8cb97f141b70..11f8a845f259 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -139,7 +139,7 @@ config COMPILE_TEST
>   
>   config WERROR
>   	bool "Compile the kernel with warnings as errors"
> -	default y
> +	default COMPILE_TEST
>   	help
>   	  A kernel build should not cause any compiler warnings, and this
>   	  enables the '-Werror' flag to enforce that rule by default.
>
Randy Dunlap Sept. 7, 2021, 10:33 p.m. UTC | #7
On 9/7/21 3:18 PM, Linus Torvalds wrote:
> On Tue, Sep 7, 2021 at 3:14 PM Marco Elver <elver@google.com> wrote:
>>
>>
>>   config WERROR
>>          bool "Compile the kernel with warnings as errors"
>> -       default y
>> +       default COMPILE_TEST
> 
> That seems reasonable. It very much is about build-testing.

That and 2 more things IMO:

a. having developers be responsible for build warnings, not just
    build errors

b. having maintainers merge them more like they are build errors
    and not just some warnings that can be overlooked.

I don't see enough of a. or b.  :(
Segher Boessenkool Sept. 7, 2021, 10:42 p.m. UTC | #8
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote:
> Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot
> instances which started failing because of -Werror [1], because syzbot's
> time is better spent on fuzzing, and having the odd warning in some
> subsystem penalize fuzzing of the entire kernel is not appropriate.

Similarly, I have to disable -Werror (which various archs and subsystems
already use) whenever I test building the kernel with new toolchains.
It is the biggest set of kernel patches I keep, already, since many
years.

I actually have good hopes that a centralised -Werror thing will make
this easier :-)

Maybe there can be an E=[01] kernel build flag to disable / enable
CONFIG_WERROR?  Something that will override it for just that command.
This would make life easier for many use cases, while at the same time
not being something that people can "forget" they did.


Segher
Mark Brown Sept. 7, 2021, 10:55 p.m. UTC | #9
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote:

> I'm predicting most distributions and runtime-focused CIs will disable
> the warning.

Yes, indeed.

> Date: Tue, 7 Sep 2021 23:12:08 +0200
> Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST
> 
> The cross-product of the kernel's supported toolchains, architectures,
> and configuration options is large. So large, that it's generally
> accepted to be infeasible to enumerate and build+test them all
> (many compile-testers rely on randomly generated configs).

Reviwed-by: Mark Brown <broonie@kernel.org>
Nathan Chancellor Sept. 7, 2021, 11 p.m. UTC | #10
On Wed, Sep 08, 2021 at 12:14:19AM +0200, Marco Elver wrote:
> On Tue, Sep 07, 2021 at 01:30PM -0700, Nick Desaulniers wrote:
> > On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> [...]
> > > I'm not going to revert that change. I probably will have to limit it
> > > (by making that WERROR option depend on certain expectations), but
> > > basically any maintainer who has code that causes warnings should
> > > expect that they will have to fix those warnings.
> > 
> > I'm not 100% against it; I think it could land in a more useful
> > variation. But it would be good to discuss that on-list, and give it
> > time to soak. This is a conversation we should be having with CI
> > maintainers IMO, and folks that maintain the build infra, at least.
> > I'm happy to kick off that discussion with this RFC.
> 
> Here's a datapoint: I had to disable CONFIG_WERROR on a bunch of syzbot
> instances which started failing because of -Werror [1], because syzbot's
> time is better spent on fuzzing, and having the odd warning in some
> subsystem penalize fuzzing of the entire kernel is not appropriate.
> 
> [1] https://github.com/google/syzkaller/commit/e096c0a2a414e487412c9669426780ce5acdde9d
> 
> Getting the kernel built is a hard requirement for any sort of runtime
> testing. Once the kernel is built, runtime testing of various subsystems
> can proceed in parallel. A single warning in some odd subsystem
> penalizing the _entire_ kernel's testing progress is inappropriate. The
> severity of a use-after-free bug found by runtime testing is orders of
> magnitude more severe than some "unused variable" warning. Now such a
> warning would delay finding bugs at runtime on many CI systems that
> haven't yet disabled the warning.
> 
> I'm predicting most distributions and runtime-focused CIs will disable
> the warning.
> 
> I have formulated this in the form of a patch below, that might move
> this new Kconfig option towards its appropriate usecases by default.
> 
> The intent is not to dispute the usefulness of -Werror, but select the
> appropriate usecases by default, limiting friction for all those who can
> do little more than say CONFIG_WERROR=n.
> 
> Thanks,
> -- Marco
> 
> ------ >8 ------
> 
> From: Marco Elver <elver@google.com>
> Date: Tue, 7 Sep 2021 23:12:08 +0200
> Subject: [PATCH] kbuild: Only default to -Werror if COMPILE_TEST
> 
> The cross-product of the kernel's supported toolchains, architectures,
> and configuration options is large. So large, that it's generally
> accepted to be infeasible to enumerate and build+test them all
> (many compile-testers rely on randomly generated configs).
> 
> Without the possibility to enumerate all possible combinations of
> toolchains, architectures, and configuration options, it is inevitable
> that compiler warnings in this space exist.
> 
> With -Werror, this means that an innumerable set of kernels are now
> broken, yet had been perfectly usable before (confused compilers, code
> with warnings unused, or luck).
> 
> Distributors will necessarily pick a point in the toolchain X arch X
> config space, and if unlucky, will have a broken build. Granted, those
> will likely disable CONFIG_WERROR and move on.
> 
> The kernel's default configuration is unlikely to be suitable for all
> users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
> 
> This also holds for CI systems which are focused on runtime testing,
> where the odd warning in some subsystem will disrupt testing of the rest
> of the kernel. Many of those runtime-focused CI systems run tests or
> fuzz the kernel using runtime debugging tools. Runtime testing of
> different subsystems can proceed in parallel, and potentially uncover
> serious bugs; halting runtime testing of the entire kernel because of
> the odd warning (now error) in a subsystem or driver is simply
> inappropriate.
> 
> Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
> as well.
> 
> The appropriate usecase for -Werror is therefore compile-test focused
> builds (often done by developers or CI systems).
> 
> Reflect this in the Kconfig option by making the default value of WERROR
> match COMPILE_TEST.
> 
> Signed-off-by: Marco Elver <elver@google.com>

Reviewed-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  init/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 8cb97f141b70..11f8a845f259 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -139,7 +139,7 @@ config COMPILE_TEST
>  
>  config WERROR
>  	bool "Compile the kernel with warnings as errors"
> -	default y
> +	default COMPILE_TEST
>  	help
>  	  A kernel build should not cause any compiler warnings, and this
>  	  enables the '-Werror' flag to enforce that rule by default.
> -- 
> 2.33.0.153.gba50c8fa24-goog
Mark Brown Sept. 7, 2021, 11:35 p.m. UTC | #11
On Tue, Sep 07, 2021 at 12:16:22PM -0700, Linus Torvalds wrote:
> On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:

> > The above commit seems as though it was merged in response to
> > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.

> No. It was merged in response of _years_ of pain, with the last one
> just being the final drop.

> I'm not going to revert that change. I probably will have to limit it
> (by making that WERROR option depend on certain expectations), but
> basically any maintainer who has code that causes warnings should
> expect that they will have to fix those warnings.

Echoing what others have said about runtime testing having -Werror on by
default for defconfigs is going to cause issues for bisection and just
generally noticing promptly when runtime issues are introduced - people
or systems bisecting boot or testsuite issues will encounter more blocks
of commits that they skip due to unrelated issues, and it's much more
likely that we'll just have gaps in coverage when for example the day's
-next has some random warning on some platforms.  In terms of
prioritisation it feels like the wrong call to have -Werror on
everywhere.

The bisection really is helpful, especially when the people looking at
the problem don't have direct access the systems to be able to run tests
themselves - it really increases the quality of reports when the
automation in CI services is able to identify the likely commit, and
makes it *much* more likely those reports be sent in the first place.

Of course this isn't insurmountable, the test systems can always add
config fragments to tweak the configurations they're testing (they
already need to do this to run lots of kselftest for example) but that's
something we've generally tried to minimise since everyone working with
custom configs has generally caused friction in the past.

> And it really shouldn't be "Linus cares about warnings, so
> configurations that Linus doesn't test can continue for years to have
> them".

> My "no warnings" policy isn't exactly new, and people shouldn't be
> shocked when I then say "time to clean up *YOUR* house too".

IME pushing on this stuff it's not that people don't care, it's that
they naturally have different test coverage due to their particular
interests and the resources available to them.  Even people who are very
dilligent in their own testing and paying a lot of attention to -next
and the automated test stuff that's out there are going to introduce
some breakage from time to time, and sometimes there's good reasons for
processes to get short circuited.

Some of this is also on the people doing the more niche stuff to keep on
top of reporting issues that come up as they do so, they should be able
to expect that people will pay attention when they do so but the more
niche you get the less surprising it is when nobody else notices issues.
Steven Rostedt Sept. 8, 2021, 4:12 p.m. UTC | #12
On Tue, Sep 07, 2021 at 12:16:22PM -0700, Linus Torvalds wrote:
> 
> That's the POINT of that commit. That "but but but I have thousands of
> warnings" is not acceptable.

I'm actually surprised you did this after the discussion with gcc warning
about using "main" as a local variable.

> My "no warnings" policy isn't exactly new, and people shouldn't be
> shocked when I then say "time to clean up *YOUR* house too".

Note, ktest has a way to create a list of current warnings, and then test your
code against it, and will fail on any new warning. I run this on all my pull
requests to make sure that I do not introduce any new warnings.

That said, I still get bug reports on various configs that I did not test,
where my code introduces a warning. I hate to be the one that fails their
builds.

It's not the configs that have something enabled, its the configs that have
something not enabled, where another config depends on it.

I'm not against the -Werror. I just don't want to be changing local variables
called "main" because it breaks someones build due to some daft warning that
the compiler is emitting.

-- Steve
Pavel Machek Sept. 13, 2021, 9:32 a.m. UTC | #13
Hi!

> >>  config WERROR
> >>         bool "Compile the kernel with warnings as errors"
> >>-       default y
> >>+       default COMPILE_TEST
> >
> >That seems reasonable. It very much is about build-testing.
> 
> That and 2 more things IMO:
> 
> a. having developers be responsible for build warnings, not just
>    build errors
> 
> b. having maintainers merge them more like they are build errors
>    and not just some warnings that can be overlooked.
> 
> I don't see enough of a. or b.  :(

Do we really want developers treat warnings as errors? When the code
is okay but some random version of gcc dislikes it...

Plus, there's question of stable. We already get ton of churn there
("this fixes random warning"). WERROR will only encourage that...

Best regards,
								Pavel
Greg Kroah-Hartman Sept. 13, 2021, 9:46 a.m. UTC | #14
On Mon, Sep 13, 2021 at 11:32:56AM +0200, Pavel Machek wrote:
> Hi!
> 
> > >>  config WERROR
> > >>         bool "Compile the kernel with warnings as errors"
> > >>-       default y
> > >>+       default COMPILE_TEST
> > >
> > >That seems reasonable. It very much is about build-testing.
> > 
> > That and 2 more things IMO:
> > 
> > a. having developers be responsible for build warnings, not just
> >    build errors
> > 
> > b. having maintainers merge them more like they are build errors
> >    and not just some warnings that can be overlooked.
> > 
> > I don't see enough of a. or b.  :(
> 
> Do we really want developers treat warnings as errors? When the code
> is okay but some random version of gcc dislikes it...
> 
> Plus, there's question of stable. We already get ton of churn there
> ("this fixes random warning"). WERROR will only encourage that...

I will not be backporting this patch to older stable kernels, but I
_want_ to see stable builds build with no warnings.  When we add
warnings, they are almost always things we need to fix up properly.

Over time, I have worked to reduce the number of build warnings in older
stable kernels.  For newer versions of gcc, sometimes that is
impossible, but we are close...

thanks,

greg k-h
Florian Weimer Sept. 13, 2021, 9:50 a.m. UTC | #15
* Pavel Machek:

> Do we really want developers treat warnings as errors? When the code
> is okay but some random version of gcc dislikes it...

There are some warnings-as-errors which are quite reasonable, like
-Werror=implicit-function-declaration (which we can't make the compiler
default without cleaning up userspace first) and perhaps
-Werror=implicit-int.  Some other warnings can be used to enforce coding
style, and there -Werror could make sense as well (-Werror=vla and
others).

But there are also warnings which are emitted by the GCC middle-end (the
optimizers), and turning on -Werror for those is very problematic.
These warnings are very target-specific and also depend on compiler
version and optimization parameters.  Unfortunately that includes the
buffer size warnings based on function attributes (which would otherwise
be a good fit for the kernel because it uses few external headers).

GCC also lacks a facility to suppress warnings if they concern code that
was introduced during optimization and removed again later
(e.g. inlining, constant propagation, dead code removal).

Thanks,
Florian
Pavel Machek Sept. 13, 2021, 10:02 a.m. UTC | #16
Hi!

> > Do we really want developers treat warnings as errors? When the code
> > is okay but some random version of gcc dislikes it...
> > 
> > Plus, there's question of stable. We already get ton of churn there
> > ("this fixes random warning"). WERROR will only encourage that...
> 
> I will not be backporting this patch to older stable kernels, but I
> _want_ to see stable builds build with no warnings.  When we add
> warnings, they are almost always things we need to fix up properly.

Well, everyone _wants_ to see clean builds... unless the price is too
high.

> Over time, I have worked to reduce the number of build warnings in older
> stable kernels.  For newer versions of gcc, sometimes that is
> impossible, but we are close...

You clearly can't backport this patch, but for 5.16-stable, you'll
have it in, and now warnings are same as errors... and I don't believe
that's good idea for stable.

Best regards,
								Pavel
Greg Kroah-Hartman Sept. 13, 2021, 10:51 a.m. UTC | #17
On Mon, Sep 13, 2021 at 12:02:30PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > Do we really want developers treat warnings as errors? When the code
> > > is okay but some random version of gcc dislikes it...
> > > 
> > > Plus, there's question of stable. We already get ton of churn there
> > > ("this fixes random warning"). WERROR will only encourage that...
> > 
> > I will not be backporting this patch to older stable kernels, but I
> > _want_ to see stable builds build with no warnings.  When we add
> > warnings, they are almost always things we need to fix up properly.
> 
> Well, everyone _wants_ to see clean builds... unless the price is too
> high.
> 
> > Over time, I have worked to reduce the number of build warnings in older
> > stable kernels.  For newer versions of gcc, sometimes that is
> > impossible, but we are close...
> 
> You clearly can't backport this patch, but for 5.16-stable, you'll
> have it in, and now warnings are same as errors... and I don't believe
> that's good idea for stable.

I do, it will force us to keep these trees clean over time.

And it will be in 5.15, not 5.16 :)

Worst case, we disable it in 4 years when gcc 15 or so generates so
many errors we can't resolve them in this old kernel.

thanks,

greg k-h
Guenter Roeck Sept. 13, 2021, 2:33 p.m. UTC | #18
On 9/13/21 2:32 AM, Pavel Machek wrote:
> Hi!
> 
>>>>   config WERROR
>>>>          bool "Compile the kernel with warnings as errors"
>>>> -       default y
>>>> +       default COMPILE_TEST
>>>
>>> That seems reasonable. It very much is about build-testing.
>>
>> That and 2 more things IMO:
>>
>> a. having developers be responsible for build warnings, not just
>>     build errors
>>
>> b. having maintainers merge them more like they are build errors
>>     and not just some warnings that can be overlooked.
>>
>> I don't see enough of a. or b.  :(
> 
> Do we really want developers treat warnings as errors? When the code
> is okay but some random version of gcc dislikes it...
> 
> Plus, there's question of stable. We already get ton of churn there
> ("this fixes random warning"). WERROR will only encourage that...
> 

All Chrome OS builds are already done with -Werror enabled. Having it
enabled in the incoming stable releases will reduce our workload when
backporting stable releases. I am actually working on making at
least chromeos-5.10 "clean" for allmodconfig builds on arm, arm64,
and x86 (everything else is hopeless, and even arm may be futile,
but arm64 and x86 seem to be doable).

I'd rather have warnings fixed in incoming stable releases than having
to pull additional patches into our kernels.

Guenter
Linus Torvalds Sept. 13, 2021, 5:42 p.m. UTC | #19
On Mon, Sep 13, 2021 at 2:50 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> But there are also warnings which are emitted by the GCC middle-end (the
> optimizers), and turning on -Werror for those is very problematic.

People say that, but let's face it, that's simply not true.

There are real problematic warnings, and we just turn those warnings
off. People who want the self-flagellation can enable them with W=1
(or bigger values), and spend their life fighting stupid random
compiler warnings that have tons of false positives.

But the fact is, I've required a warning-free build on x86-64 for
anything I notice for the last several years by now, and it really
hasn't been a problem.

What _has_ been a problem is that (a) build bots don't care about and
(b) the configs I don't personally test (other non-x86-64
architectures stand out, but there's certainly been other
configuration issues too).

But "bogus compiler warnings" is very much *not* in that list of problems.

I've looked at a lot of the warnings that are now errors, and while a
number of them have made me go "So why didn't we see that on x86-64?"
not one of them has actually made me go "-Werror was wrong".

Because EVERY single one I've seen has been for something that should
have been fixed. Presumably long long ago, but the warning it
generated had been ignored.

So stop with the "some warnings just happen" crap. Outside of actual
compiler bugs, and truly stupid warnings (that we turn off), that's
simply not true.

And yes, those compiler bugs happen. The new warning already found one
issue with current gcc trunk (non-released). So right now the count is
"lots of valid warnings, and one compiler bug that was found _thanks_
to me enabling -Werror".

Yes, we have issues with having to work around older compiler bugs.
Those aren't going away, and yes, -Werror may well mean that
non-x86-64 people now have to deal with them.

And yes, this is painful. I'm very much aware of that. But we just
need to do it. Because the warnings don't go away on their own, and
not making them fatal clearly just means that they'll stay around
forever.

           Linus
Geert Uytterhoeven Sept. 20, 2021, 4:26 p.m. UTC | #20
On Mon, Sep 13, 2021 at 12:50 PM Pavel Machek <pavel@ucw.cz> wrote:
> > > Do we really want developers treat warnings as errors? When the code
> > > is okay but some random version of gcc dislikes it...
> > >
> > > Plus, there's question of stable. We already get ton of churn there
> > > ("this fixes random warning"). WERROR will only encourage that...
> >
> > I will not be backporting this patch to older stable kernels, but I
> > _want_ to see stable builds build with no warnings.  When we add
> > warnings, they are almost always things we need to fix up properly.
>
> Well, everyone _wants_ to see clean builds... unless the price is too
> high.
>
> > Over time, I have worked to reduce the number of build warnings in older
> > stable kernels.  For newer versions of gcc, sometimes that is
> > impossible, but we are close...
>
> You clearly can't backport this patch, but for 5.16-stable, you'll
> have it in, and now warnings are same as errors... and I don't believe
> that's good idea for stable.

The good thing about the config option is that there's now a single point
to enable or disable -Werror.  In the past, maintainers sprinkled -Werror
all over the various Makefiles in the tree, which means you have to
edit multiple files to disable it again.

Background: I've been investigating an issue that involved building old
v2.6.x kernels. Apart from having to use very old compilers, it still caused
compiler warnings that obviously weren't seen with the slightly different
compiler versions used by maintainers who added -Werror.

Gr{oetje,eeting}s,

                        Geert
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index d45fc2edf186..6bc1c5b17a62 100644
--- a/Makefile
+++ b/Makefile
@@ -785,9 +785,6 @@  stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG)      := -fstack-protector-strong
 
 KBUILD_CFLAGS += $(stackp-flags-y)
 
-KBUILD_CFLAGS-$(CONFIG_WERROR) += -Werror
-KBUILD_CFLAGS += $(KBUILD_CFLAGS-y)
-
 ifdef CONFIG_CC_IS_CLANG
 KBUILD_CPPFLAGS += -Qunused-arguments
 # The kernel builds with '-std=gnu89' so use of GNU extensions is acceptable.
diff --git a/init/Kconfig b/init/Kconfig
index 8cb97f141b70..e708180e9a59 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -137,20 +137,6 @@  config COMPILE_TEST
 	  here. If you are a user/distributor, say N here to exclude useless
 	  drivers to be distributed.
 
-config WERROR
-	bool "Compile the kernel with warnings as errors"
-	default y
-	help
-	  A kernel build should not cause any compiler warnings, and this
-	  enables the '-Werror' flag to enforce that rule by default.
-
-	  However, if you have a new (or very old) compiler with odd and
-	  unusual warnings, or you have some architecture with problems,
-	  you may need to disable this config option in order to
-	  successfully build the kernel.
-
-	  If in doubt, say Y.
-
 config UAPI_HEADER_TEST
 	bool "Compile test UAPI headers"
 	depends on HEADERS_INSTALL && CC_CAN_LINK