Message ID | E1XeWYh-0004Mw-FC@rmk-PC.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
On 10/15/2014 05:56 PM, Russell King wrote:
> These stock GCC versions miscompile the kernel by incorrectly optimising
> the function epilogue code - by first increasing the stack pointer, and
> then loading entries from below the stack.  This means that an opportune
> interrupt or exception will corrupt the current function's saved state,
> which may result in the parent function seeing different register
> values.
>
> As this bug has been known to result in corrupted filesystems, and these
> buggy compiler versions seem to be frequently used, we have little
> option but to blacklist these compiler versions.
>
> Distributions may have fixed PR58854, but as their compilers are totally
> indistinguishable from the buggy versions, it is unfortunate that this
> also results in those also being blacklisted.  Given the filesystem
> corruption potential of the original, this is the lesser evil.  People
> who want to build with their fixed compiler versions will need to adjust
> the kernel source.  (Distros need to think about the implications of
> fixing such a compiler bug, and consider how to ensure that their fixed
> compiler versions can be detected if they wish to avoid this.)
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>
> This is what I came up with - this places the build check right at the
> beginning of the kernel build, rather than at some point where
> linux/compiler.h gets included.  Note that this is where we have
> previous ARM specific GCC version blacklisting.  I'm blacklisting
> GCC 4.8.0 to GCC 4.8.2 inclusive, which seems to be the right versions
> for stock GCC.
>
> I was in two minds whether to include 4.8.3 as Linaro released a buggy
> toolchain which identifies itself as 4.8.3, but I decided that's also
> a distro problem.  IMHO Linaro should really think about taking that
> compiler down given the seriousness of this bug and it being
> indistinguishable from the fixed stock version.

Maybe it's unfair to blame them; Linaro just took a snapshot and
released what was there.

If gcc is going to retain the "change release number then add all the
new features" model, some kind of prerelease indicator would help
eliminate this kind of problem. And that indicator should be both
a preprocessor define and parseable from the command line :)

>  arch/arm/kernel/asm-offsets.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 713e807621d2..e14c1a12b414 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -10,6 +10,7 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> +#include <linux/compiler.h>
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> @@ -39,10 +40,19 @@
>   * GCC 3.2.x: miscompiles NEW_AUX_ENT in fs/binfmt_elf.c
>   *            (http://gcc.gnu.org/PR8896) and incorrect structure
>   *            initialisation in fs/jffs2/erase.c
> + * GCC 4.8.0-4.8.2: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> + *            miscompiles find_get_entry(), and can result in EXT3 and EXT4
> + *            filesystem corruption (possibly other FS too).
>   */
> +#ifdef __GNUC__
>  #if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
>  #error Your compiler is too buggy; it is known to miscompile kernels.
> -#error    Known good compilers: 3.3
> +#error    Known good compilers: 3.3, 4.x
> +#endif
> +#if GCC_VERSION >= 40800 || GCC_VERSION < 40803
                            ^^
                            &&
> +#error Your compiler is too buggy; it is known to miscompile kernels
> +#error and result in filesystem corruption and oopses.
> +#endif
>  #endif
>
>  int main(void)
>

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
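
The `||` versus `&&` correction flagged in the reply above can be checked with a small sketch. It assumes the `major * 10000 + minor * 100 + patchlevel` encoding that the kernel's `GCC_VERSION` macro uses; the function names are hypothetical and exist only for illustration, and the two conditions are written as ordinary C expressions rather than `#if` tests so that they can be evaluated for several versions.

```c
#include <assert.h>

/* Encode a version the way GCC_VERSION does: 4.8.2 becomes 40802. */
int gcc_version_code(int major, int minor, int patch)
{
	assert(minor >= 0 && minor < 100 && patch >= 0 && patch < 100);
	return major * 10000 + minor * 100 + patch;
}

/* Condition as posted in the patch: every version satisfies one side. */
int blacklisted_as_posted(int v)
{
	return v >= 40800 || v < 40803;
}

/* Condition with the suggested fix: only 4.8.0 through 4.8.2 match. */
int blacklisted_fixed(int v)
{
	return v >= 40800 && v < 40803;
}
```

With these definitions, `blacklisted_as_posted()` is true for every version, because any number is either at least 40800 or below 40803, whereas `blacklisted_fixed()` is true only for 4.8.0 through 4.8.2.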
On Wed, Oct 15, 2014 at 06:18:30PM -0400, Peter Hurley wrote:
> On 10/15/2014 05:56 PM, Russell King wrote:
> > I was in two minds whether to include 4.8.3 as Linaro released a buggy
> > toolchain which identifies itself as 4.8.3, but I decided that's also
> > a distro problem.  IMHO Linaro should really think about taking that
> > compiler down given the seriousness of this bug and it being
> > indistinguishable from the fixed stock version.
>
> Maybe it's unfair to blame them; Linaro just took a snapshot and
> released what was there.
>
> If gcc is going to retain the "change release number then add all the
> new features" model, some kind of prerelease indicator would help
> eliminate this kind of problem. And that indicator should be both
> a preprocessor define and parseable from the command line :)

My comment is not to attribute blame to them; it is entirely on a
technical level.

My reasoning is that the bug is just as prevalent in userspace, though it
will occur less often.  Any program which uses signal handlers is a
candidate for exactly the same kind of corruption, since you can receive
that signal between the point that the stack pointer is modified and the
function loads the parent context.

Of course, there are ways around that: don't use signal handlers, or if
you do, use alternate signal stacks.  Neither of those can be guaranteed
for any program though.

So, let me put this another way: a compiler with this bug is _completely_
unsuitable for use for compiling programs for use under the Linux
kernel _as well_ as the Linux kernel itself.

The difference is that the Linaro compilers come with an expectation
that they are usable on ARM... whereas stock versions cover a lot more
and so the ARM arch is probably a very small number of their users.

Hence why I recommend that Linaro takes down their buggy compiler.
Their 4.8.3 version should not be used *anywhere*, just the same as
the stock 4.8 to 4.8.2 inclusive should also not be used anywhere on
ARM either.
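
The "alternate signal stacks" workaround mentioned above can be sketched for userspace as follows. This is a minimal illustration under stated assumptions, not something taken from the thread: the function name `demo_alt_signal_stack()`, the 64 KiB stack size and the choice of `SIGUSR1` are all hypothetical. The handler is installed with `SA_ONSTACK`, so it runs on a separate stack and cannot overwrite data that a miscompiled epilogue has left just below the interrupted function's stack pointer.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

#define ALT_STACK_SIZE (64 * 1024)	/* hypothetical size, chosen for the demo */

static char *alt_stack_base;
static volatile sig_atomic_t handler_ran_on_alt_stack;

/* Record whether a local variable of the handler lives on the alternate stack. */
static void on_sigusr1(int signo)
{
	char marker;
	uintptr_t p = (uintptr_t)&marker;
	uintptr_t lo = (uintptr_t)alt_stack_base;

	(void)signo;
	handler_ran_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int demo_alt_signal_stack(void)
{
	stack_t ss;
	struct sigaction sa;

	/* The buffer stays registered as the signal stack, so it is not freed. */
	alt_stack_base = malloc(ALT_STACK_SIZE);
	if (alt_stack_base == NULL)
		return -1;

	ss.ss_sp = alt_stack_base;
	ss.ss_size = ALT_STACK_SIZE;
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_ONSTACK;	/* deliver this signal on the alternate stack */
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	raise(SIGUSR1);			/* the handler runs before raise() returns */
	return handler_ran_on_alt_stack;
}
```

As the message above notes, this cannot be guaranteed for arbitrary programs: every handler in the program, including those installed by libraries, would have to be set up this way.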
On 10/15/2014 08:18 PM, Russell King - ARM Linux wrote:
> Hence why I recommend that Linaro takes down their buggy compiler.
> Their 4.8.3 version should not be used *anywhere*, just the same as
> the stock 4.8 to 4.8.2 inclusive should also not be used anywhere on
> ARM either.

Completely agree.
From: Russell King
> These stock GCC versions miscompile the kernel by incorrectly optimising
> the function epilogue code - by first increasing the stack pointer, and
> then loading entries from below the stack.  This means that an opportune
> interrupt or exception will corrupt the current function's saved state,
> which may result in the parent function seeing different register
> values.
>
> As this bug has been known to result in corrupted filesystems, and these
> buggy compiler versions seem to be frequently used, we have little
> option but to blacklist these compiler versions.
>
> Distributions may have fixed PR58854, but as their compilers are totally
> indistinguishable from the buggy versions, it is unfortunate that this
> also results in those also being blacklisted.  Given the filesystem
> corruption potential of the original, this is the lesser evil.  People
> who want to build with their fixed compiler versions will need to adjust
> the kernel source.  (Distros need to think about the implications of
> fixing such a compiler bug, and consider how to ensure that their fixed
> compiler versions can be detected if they wish to avoid this.)
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>
> This is what I came up with - this places the build check right at the
> beginning of the kernel build, rather than at some point where
> linux/compiler.h gets included.  Note that this is where we have
> previous ARM specific GCC version blacklisting.  I'm blacklisting
> GCC 4.8.0 to GCC 4.8.2 inclusive, which seems to be the right versions
> for stock GCC.
>
> I was in two minds whether to include 4.8.3 as Linaro released a buggy
> toolchain which identifies itself as 4.8.3, but I decided that's also
> a distro problem.  IMHO Linaro should really think about taking that
> compiler down given the seriousness of this bug and it being
> indistinguishable from the fixed stock version.
>
>  arch/arm/kernel/asm-offsets.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 713e807621d2..e14c1a12b414 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -10,6 +10,7 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> +#include <linux/compiler.h>
>  #include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <linux/dma-mapping.h>
> @@ -39,10 +40,19 @@
>   * GCC 3.2.x: miscompiles NEW_AUX_ENT in fs/binfmt_elf.c
>   *            (http://gcc.gnu.org/PR8896) and incorrect structure
>   *            initialisation in fs/jffs2/erase.c
> + * GCC 4.8.0-4.8.2: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> + *            miscompiles find_get_entry(), and can result in EXT3 and EXT4
> + *            filesystem corruption (possibly other FS too).
>   */
> +#ifdef __GNUC__
>  #if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
>  #error Your compiler is too buggy; it is known to miscompile kernels.
> -#error    Known good compilers: 3.3
> +#error    Known good compilers: 3.3, 4.x

Except that isn't true since 4.8.0 isn't a good compiler.

> +#endif
> +#if GCC_VERSION >= 40800 || GCC_VERSION < 40803
> +#error Your compiler is too buggy; it is known to miscompile kernels
> +#error and result in filesystem corruption and oopses.
> +#endif
>  #endif

You are mixing the style of the version check.
Why not the single test:
#if GCC_VERSION < 30300 || (GCC_VERSION >= 40800 && GCC_VERSION < 40803)
#error Your compiler is too buggy; it is known to miscompile code.
#error Known good compilers: 3.3 onwards excluding 4.8.0 through 4.8.2
#endif

	David

>
>  int main(void)
> --
> 1.8.3.1
On Thu, Oct 16, 2014 at 10:30:30AM +0000, David Laight wrote:
> > +#ifdef __GNUC__
> >  #if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
> >  #error Your compiler is too buggy; it is known to miscompile kernels.
> > -#error    Known good compilers: 3.3
> > +#error    Known good compilers: 3.3, 4.x
>
> Except that isn't true since 4.8.0 isn't a good compiler.
>
> > +#endif
> > +#if GCC_VERSION >= 40800 || GCC_VERSION < 40803
> > +#error Your compiler is too buggy; it is known to miscompile kernels
> > +#error and result in filesystem corruption and oopses.
> > +#endif
> >  #endif
>
> You are mixing the style of the version check.
> Why not the single test:
> #if GCC_VERSION < 30300 || (GCC_VERSION >= 40800 && GCC_VERSION < 40803)
> #error Your compiler is too buggy; it is known to miscompile code.
> #error Known good compilers: 3.3 onwards excluding 4.8.0 through 4.8.2
> #endif

As you have just nicely demonstrated, changing what's already there can
cause unintentional changes to creep in.

The original test tested for GCC 3 compilers older than GCC 3.3,
excluding GCC 2.96 and older.  Your version blacklists everything
before GCC 3.3.

Such a change should be a separate patch.  Moreover, if we really do
want to blacklist all compilers older than GCC 3.3, then that too needs
an explanation in the comments above this block, and in commit message.
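
The difference between the two tests described above can be made concrete with a short sketch. The function names are hypothetical, and the conditions are written as ordinary C expressions instead of `#if` tests so that they can be evaluated for versions other than the compiler at hand.

```c
/* The test already in asm-offsets.c: only GCC 3.0 through 3.2 match. */
int rejected_by_original_test(int gnuc, int gnuc_minor)
{
	return gnuc == 3 && gnuc_minor < 3;
}

/* The first half of the proposed single test: everything before 3.3 matches. */
int rejected_by_single_test(int gcc_version)
{
	return gcc_version < 30300;
}
```

For GCC 2.96 (encoded as 29600), `rejected_by_original_test(2, 96)` is 0 while `rejected_by_single_test(29600)` is 1, which is the unintended behaviour change pointed out above. For GCC 3.2 both tests agree.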
On Thu, 16 Oct 2014, Russell King - ARM Linux wrote:

> So, let me put this another way: a compiler with this bug is _completely_
> unsuitable for use for compiling programs for use under the Linux
> kernel _as well_ as the Linux kernel itself.
>
> The difference is that the Linaro compilers come with an expectation
> that they are usable on ARM... whereas stock versions cover a lot more
> and so the ARM arch is probably a very small number of their users.
>
> Hence why I recommend that Linaro takes down their buggy compiler.
> Their 4.8.3 version should not be used *anywhere*, just the same as
> the stock 4.8 to 4.8.2 inclusive should also not be used anywhere on
> ARM either.

Here's the answer from the toolchain team:

On Thu, 16 Oct 2014, Yvan Roux <yvan.roux@linaro.org> wrote:

| Hi Nicolas,
|
| thanks for bringing this to our knowledge.
|
| The fix for PR58854 was included in our releases since the GCC
| 4.8-2013.12 which is based on a 4.8.3 prerelease version (at svn
| revision 205577).  As we are doing monthly releases based on a
| revision of the related FSF branch, using GCC_VERSION (__GNUC__,
| __GNUC_MINOR__, __GNUC_PATCHLEVEL__) is not accurate enough to identify
| the release version (all our releases between November 2013 and May
| 2014 will be 4.8.3); the __VERSION__ predefined macro is a bit more
| accurate here ("4.8.3 20131202 (prerelease)") but not completely
| satisfactory.
|
| I completely agree that we should at least mention in our impacted
| releases download pages that this bug is present.

Nicolas
On Thu, Oct 16, 2014 at 10:58:06AM -0400, Nicolas Pitre wrote:
> On Thu, 16 Oct 2014, Russell King - ARM Linux wrote:
>
> > So, let me put this another way: a compiler with this bug is _completely_
> > unsuitable for use for compiling programs for use under the Linux
> > kernel _as well_ as the Linux kernel itself.
> >
> > The difference is that the Linaro compilers come with an expectation
> > that they are usable on ARM... whereas stock versions cover a lot more
> > and so the ARM arch is probably a very small number of their users.
> >
> > Hence why I recommend that Linaro takes down their buggy compiler.
> > Their 4.8.3 version should not be used *anywhere*, just the same as
> > the stock 4.8 to 4.8.2 inclusive should also not be used anywhere on
> > ARM either.
>
> Here's the answer from the toolchain team:
>
> On Thu, 16 Oct 2014, Yvan Roux <yvan.roux@linaro.org> wrote:
>
> | Hi Nicolas,
> |
> | thanks for bringing this to our knowledge.
> |
> | The fix for PR58854 was included in our releases since the GCC
> | 4.8-2013.12 which is based on a 4.8.3 prerelease version (at svn
> | revision 205577).  As we are doing monthly releases based on a
> | revision of the related FSF branch, using GCC_VERSION (__GNUC__,
> | __GNUC_MINOR__, __GNUC_PATCHLEVEL__) is not accurate enough to identify
> | the release version (all our releases between November 2013 and May
> | 2014 will be 4.8.3); the __VERSION__ predefined macro is a bit more
> | accurate here ("4.8.3 20131202 (prerelease)") but not completely
> | satisfactory.
> |
> | I completely agree that we should at least mention in our impacted
> | releases download pages that this bug is present.

Unfortunately, __VERSION__ doesn't help us identify that the bug has
been fixed - it's not something which can be tested at preprocessor
time.

Those which identify themselves as 4.8.3 won't be impacted by the patch,
which means that the Linaro 4.8.3 GCC versions will continue to build
the kernel just fine.  Even the buggy versions.

I don't see the point of continuing to offer the buggy versions for
download though - the compiler is totally unsuitable for building any
Linux related binaries, be that kernel or userspace, and so should not
be used under any circumstances, period.
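
The reason `__VERSION__` cannot be tested at preprocessor time is that it expands to a string literal, and `#if` only evaluates integer constant expressions. The sketch below, with hypothetical function names, shows the kind of check that remains possible: a string search performed at run time, which comes too late to stop a build with `#error`.

```c
#include <string.h>

/* __VERSION__ is a string literal predefined by GCC, so it can only be
 * inspected by code that runs, not by #if. */
const char *compiler_version_string(void)
{
	return __VERSION__;
}

/* Return 1 if the version string contains the given substring. */
int version_string_contains(const char *version, const char *needle)
{
	return strstr(version, needle) != NULL;
}
```

For example, `version_string_contains("4.8.3 20131202 (prerelease)", "prerelease")` returns 1, using the string quoted in the message from the toolchain team. Nothing in the thread says which strings correspond to fixed builds, so a real check would need that mapping from the vendor.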
On Fri, Oct 17, 2014 at 02:41:12PM +0100, Russell King - ARM Linux wrote:
> On Thu, Oct 16, 2014 at 10:58:06AM -0400, Nicolas Pitre wrote:
> > On Thu, 16 Oct 2014, Russell King - ARM Linux wrote:
> >
> > > So, let me put this another way: a compiler with this bug is _completely_
> > > unsuitable for use for compiling programs for use under the Linux
> > > kernel _as well_ as the Linux kernel itself.
> > >
> > > The difference is that the Linaro compilers come with an expectation
> > > that they are usable on ARM... whereas stock versions cover a lot more
> > > and so the ARM arch is probably a very small number of their users.
> > >
> > > Hence why I recommend that Linaro takes down their buggy compiler.
> > > Their 4.8.3 version should not be used *anywhere*, just the same as
> > > the stock 4.8 to 4.8.2 inclusive should also not be used anywhere on
> > > ARM either.
> >
> > Here's the answer from the toolchain team:
> >
> > On Thu, 16 Oct 2014, Yvan Roux <yvan.roux@linaro.org> wrote:
> >
> > | Hi Nicolas,
> > |
> > | thanks for bringing this to our knowledge.
> > |
> > | The fix for PR58854 was included in our releases since the GCC
> > | 4.8-2013.12 which is based on a 4.8.3 prerelease version (at svn
> > | revision 205577).  As we are doing monthly releases based on a
> > | revision of the related FSF branch, using GCC_VERSION (__GNUC__,
> > | __GNUC_MINOR__, __GNUC_PATCHLEVEL__) is not accurate enough to identify
> > | the release version (all our releases between November 2013 and May
> > | 2014 will be 4.8.3); the __VERSION__ predefined macro is a bit more
> > | accurate here ("4.8.3 20131202 (prerelease)") but not completely
> > | satisfactory.
> > |
> > | I completely agree that we should at least mention in our impacted
> > | releases download pages that this bug is present.
>
> Unfortunately, __VERSION__ doesn't help us identify that the bug has
> been fixed - it's not something which can be tested at preprocessor
> time.
>
> Those which identify themselves as 4.8.3 won't be impacted by the patch,
> which means that the Linaro 4.8.3 GCC versions will continue to build
> the kernel just fine.  Even the buggy versions.
>
> I don't see the point of continuing to offer the buggy versions for
> download though - the compiler is totally unsuitable for building any
> Linux related binaries, be that kernel or userspace, and so should not
> be used under any circumstances, period.

Right, with the test fixed (&& instead of ||), I've just pushed it out
this morning, and received the results from Olof's kernel builder,
which shows that he's using a gcc 4.8.x version which gets caught by
the test.

I think Olof needs to do something about his gcc version there.
On Sun, Oct 19, 2014 at 08:55:14AM -0700, Linus Torvalds wrote:
> On Oct 19, 2014 2:56 AM, "Russell King - ARM Linux" <linux@arm.linux.org.uk>
> wrote:
> >
> > Right, with the test fixed (&& instead of ||), I've just pushed it out
> > this morning, and received the results from Olof's kernel builder,
> > which shows that he's using a gcc 4.8.x version which gets caught by
> > the test.
> >
> > I think Olof needs to do something about his gcc version there.
>
> ... or maybe you could just add a "I have a working compiler" config option
> override to show people to say "I have one of the affected versions, but my
> compiler has the fix"?

Does that work for things like allnoconfig and randconfig?  I guess people
/could/ seed those appropriately, but that seems to be something that Olof
wants to avoid doing.
On Sun, Oct 19, 2014 at 11:51 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
>
> Does that work for things like allnoconfig and randconfig?  I guess people
> /could/ seed those appropriately, but that seems to be something that Olof
> wants to avoid doing.

You can force particular config options for those, using something like

    export KCONFIG_ALLCONFIG=custom.config

and filling the custom config file with the required config setting.

So the downside is that you do need to have this custom support to
seed particular things, but yes, it's doable, and it only requires a
one-time setup with an environment variable, so it doesn't need
changes to the actual testing infrastructure.

             Linus
On Sun, Oct 19, 2014 at 01:04:20PM -0700, Linus Torvalds wrote:
> On Sun, Oct 19, 2014 at 11:51 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> >
> > Does that work for things like allnoconfig and randconfig?  I guess people
> > /could/ seed those appropriately, but that seems to be something that Olof
> > wants to avoid doing.
>
> You can force particular config options for those, using something like
>
>     export KCONFIG_ALLCONFIG=custom.config
>
> and filling the custom config file with the required config setting.

Yes, this is what I meant above by "seed"ing the configurations, and it
is something I do all the time with my autobuilder here for things like
allnoconfig where I want to ensure that certain options have specific
values.

When I talked to Olof about it, he seemed to be against using the
facility in his autobuilder.  I'm not clear on why that was, but I
suspect it's because Olof wants to test real allnoconfig builds.

An alternative is we could have a HAVE_GCC_PR58854_FIX preprocessor
conditional, and pass in KCFLAGS=-DHAVE_GCC_PR58854_FIX to the kernel
to bypass the check.

Either way, I think we need Olof's input on this to say what he's
willing to do.
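
The HAVE_GCC_PR58854_FIX alternative floated above might look roughly like the sketch below. It is only an illustration of the idea and was not part of the posted patch; the fallback definition of `GCC_VERSION` and the helper `pr58854_check_passed()` are assumptions added so that the fragment compiles outside the kernel tree.

```c
/* Outside the kernel tree <linux/compiler.h> is unavailable, so derive
 * GCC_VERSION from the compiler's own macros (assumption for this sketch). */
#ifndef GCC_VERSION
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#endif

/* Reject stock 4.8.0 through 4.8.2 unless the builder asserts, via
 * -DHAVE_GCC_PR58854_FIX, that the compiler carries the PR58854 fix. */
#if GCC_VERSION >= 40800 && GCC_VERSION < 40803 && !defined(HAVE_GCC_PR58854_FIX)
#error Your compiler is too buggy; it is known to miscompile kernels
#error and result in filesystem corruption and oopses.
#endif

/* Hypothetical helper: reaching this point means the check let the build through. */
int pr58854_check_passed(void)
{
	return 1;
}
```

A builder with a patched 4.8.0 to 4.8.2 compiler would then pass `KCFLAGS=-DHAVE_GCC_PR58854_FIX` on the make command line, as described in the message above. The trade-off is that the override is an unchecked promise: nothing verifies that the compiler really has the fix.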
Hi,

On Sun, Oct 19, 2014 at 1:14 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Oct 19, 2014 at 01:04:20PM -0700, Linus Torvalds wrote:
>> On Sun, Oct 19, 2014 at 11:51 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> >
>> > Does that work for things like allnoconfig and randconfig?  I guess people
>> > /could/ seed those appropriately, but that seems to be something that Olof
>> > wants to avoid doing.
>>
>> You can force particular config options for those, using something like
>>
>>     export KCONFIG_ALLCONFIG=custom.config
>>
>> and filling the custom config file with the required config setting.
>
> Yes, this is what I meant above by "seed"ing the configurations, and it
> is something I do all the time with my autobuilder here for things like
> allnoconfig where I want to ensure that certain options have specific
> values.
>
> When I talked to Olof about it, he seemed to be against using the
> facility in his autobuilder.  I'm not clear on why that was, but I
> suspect it's because Olof wants to test real allnoconfig builds.
>
> An alternative is we could have a HAVE_GCC_PR58854_FIX preprocessor
> conditional, and pass in KCFLAGS=-DHAVE_GCC_PR58854_FIX to the kernel
> to bypass the check.
>
> Either way, I think we need Olof's input on this to say what he's
> willing to do.

I definitely don't mind adjusting the script, I've got some plans to
add debug info on the builds that I boot as well, so I need to deal
with a few fragments anyway. I haven't had much time to do that just
yet, so this might raise it on the list of priorities a bit.

However, in this particular case I think I'll just upgrade gcc
instead. Last time I tried 4.9, it spewed a lot of new warnings. As
long as they're mostly gone I'll just switch over.

Russell: You might get a few duplicate reports as I try out the new toolchain.

-Olof
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 713e807621d2..e14c1a12b414 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -10,6 +10,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
@@ -39,10 +40,19 @@
  * GCC 3.2.x: miscompiles NEW_AUX_ENT in fs/binfmt_elf.c
  *            (http://gcc.gnu.org/PR8896) and incorrect structure
  *            initialisation in fs/jffs2/erase.c
+ * GCC 4.8.0-4.8.2: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
+ *            miscompiles find_get_entry(), and can result in EXT3 and EXT4
+ *            filesystem corruption (possibly other FS too).
  */
+#ifdef __GNUC__
 #if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
 #error Your compiler is too buggy; it is known to miscompile kernels.
-#error    Known good compilers: 3.3
+#error    Known good compilers: 3.3, 4.x
+#endif
+#if GCC_VERSION >= 40800 || GCC_VERSION < 40803
+#error Your compiler is too buggy; it is known to miscompile kernels
+#error and result in filesystem corruption and oopses.
+#endif
 #endif
 
 int main(void)
These stock GCC versions miscompile the kernel by incorrectly optimising
the function epilogue code - by first increasing the stack pointer, and
then loading entries from below the stack.  This means that an opportune
interrupt or exception will corrupt the current function's saved state,
which may result in the parent function seeing different register
values.

As this bug has been known to result in corrupted filesystems, and these
buggy compiler versions seem to be frequently used, we have little
option but to blacklist these compiler versions.

Distributions may have fixed PR58854, but as their compilers are totally
indistinguishable from the buggy versions, it is unfortunate that this
also results in those also being blacklisted.  Given the filesystem
corruption potential of the original, this is the lesser evil.  People
who want to build with their fixed compiler versions will need to adjust
the kernel source.  (Distros need to think about the implications of
fixing such a compiler bug, and consider how to ensure that their fixed
compiler versions can be detected if they wish to avoid this.)

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---

This is what I came up with - this places the build check right at the
beginning of the kernel build, rather than at some point where
linux/compiler.h gets included.  Note that this is where we have
previous ARM specific GCC version blacklisting.  I'm blacklisting
GCC 4.8.0 to GCC 4.8.2 inclusive, which seems to be the right versions
for stock GCC.

I was in two minds whether to include 4.8.3 as Linaro released a buggy
toolchain which identifies itself as 4.8.3, but I decided that's also
a distro problem.  IMHO Linaro should really think about taking that
compiler down given the seriousness of this bug and it being
indistinguishable from the fixed stock version.

 arch/arm/kernel/asm-offsets.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)