Message ID | 20200509120707.188595-2-arnd@arndb.de (mailing list archive) |
---|---|
State | Accepted |
Commit | 32221df6765b3773ff1af37c77f8531ebc48f246 |
Delegated to: | Kalle Valo |
Headers | show |
Series | [net-next,1/2] ath10k: fix gcc-10 zero-length-bounds warnings | expand |
Arnd Bergmann <arnd@arndb.de> writes: > gcc-10 correctly points out a bug with a zero-length array in > struct ath10k_pci: > > drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove': > drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0 > is outside the bounds of an interior zero-length array 'struct > ath10k_ahb[0]' [-Werror=zero-length-bounds] > 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0]; > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > In file included from drivers/net/wireless/ath/ath10k/ahb.c:13: > drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb' > 185 | struct ath10k_ahb ahb[0]; > | ^~~ > > The last addition to the struct ignored the comments and added > new members behind the array that must remain last. > > Change it to a flexible-array member and move it last again to > make it work correctly, prevent the same thing from happening > again (all compilers warn about flexible-array members in the > middle of a struct) and get it to build without warnings. Very good find, thanks! This bug would cause all sort of strange memory corruption issues.
Kalle Valo <kvalo@codeaurora.org> writes: > Arnd Bergmann <arnd@arndb.de> writes: > >> gcc-10 correctly points out a bug with a zero-length array in >> struct ath10k_pci: >> >> drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove': >> drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0 >> is outside the bounds of an interior zero-length array 'struct >> ath10k_ahb[0]' [-Werror=zero-length-bounds] >> 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0]; >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> In file included from drivers/net/wireless/ath/ath10k/ahb.c:13: >> drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb' >> 185 | struct ath10k_ahb ahb[0]; >> | ^~~ >> >> The last addition to the struct ignored the comments and added >> new members behind the array that must remain last. >> >> Change it to a flexible-array member and move it last again to >> make it work correctly, prevent the same thing from happening >> again (all compilers warn about flexible-array members in the >> middle of a struct) and get it to build without warnings. > > Very good find, thanks! This bug would cause all sort of strange memory > corruption issues. This motivated me to switch to using GCC 10.x and I noticed that you had already upgraded crosstool so it was a trivial thing to do, awesome :) https://mirrors.edge.kernel.org/pub/tools/crosstool/ I use crosstool like this using GNUmakefile: CROSS_COMPILE=/opt/cross/gcc-10.1.0-nolibc/x86_64-linux/bin/x86_64-linux- include Makefile I think it's handy trick and would be good to mention that in the crosstool main page. That way I could just point people to the crosstool main page when they are using ancient compilers and would need to upgrade.
On Mon, May 11, 2020 at 2:17 PM Kalle Valo <kvalo@codeaurora.org> wrote: > > Kalle Valo <kvalo@codeaurora.org> writes: > >> > >> Change it to a flexible-array member and move it last again to > >> make it work correctly, prevent the same thing from happening > >> again (all compilers warn about flexible-array members in the > >> middle of a struct) and get it to build without warnings. > > > > Very good find, thanks! This bug would cause all sort of strange memory > > corruption issues. > > This motivated me to switch to using GCC 10.x and I noticed that you had > already upgraded crosstool so it was a trivial thing to do, awesome :) > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ > > I use crosstool like this using GNUmakefile: > > CROSS_COMPILE=/opt/cross/gcc-10.1.0-nolibc/x86_64-linux/bin/x86_64-linux- > include Makefile Right, I have something similar (with many more additional things) in a local makefile here. I mainly use that to pick the correct cross toolchain based on ${ARCH}, and to build multiple randconfig kernels in parallel with 'make -j${NR_CPUS}' for better CPU utilization. > I think it's handy trick and would be good to mention that in the > crosstool main page. That way I could just point people to the crosstool > main page when they are using ancient compilers and would need to > upgrade. I actually started working on a script that I'd like to include the kernel sources to list the installed compilers, automatically pick on that works for the current architecture, or download one for local installation. Arnd
(trimming CC, changing title) Kalle Valo <kvalo@codeaurora.org> writes: > Kalle Valo <kvalo@codeaurora.org> writes: > >> Arnd Bergmann <arnd@arndb.de> writes: >> >>> gcc-10 correctly points out a bug with a zero-length array in >>> struct ath10k_pci: >>> >>> drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove': >>> drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0 >>> is outside the bounds of an interior zero-length array 'struct >>> ath10k_ahb[0]' [-Werror=zero-length-bounds] >>> 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0]; >>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> In file included from drivers/net/wireless/ath/ath10k/ahb.c:13: >>> drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb' >>> 185 | struct ath10k_ahb ahb[0]; >>> | ^~~ >>> >>> The last addition to the struct ignored the comments and added >>> new members behind the array that must remain last. >>> >>> Change it to a flexible-array member and move it last again to >>> make it work correctly, prevent the same thing from happening >>> again (all compilers warn about flexible-array members in the >>> middle of a struct) and get it to build without warnings. >> >> Very good find, thanks! This bug would cause all sort of strange memory >> corruption issues. > > This motivated me to switch to using GCC 10.x and I noticed that you had > already upgraded crosstool so it was a trivial thing to do, awesome :) > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ And now I have a problem :) I first noticed that my x86 testbox is not booting when I compile the kernel with GCC 10.1.0 from crosstool. I didn't get any error messages so I just downgraded the compiler and the kernel was booting fine again. Next I decided to try GCC 10.1 with my x86 laptop and it also failed to boot, but this time I got kernel logs and saw this: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secodary+0x178/0x180 Call Trace: dump_stack panic ? _raw_spin_unlock_irqrestore ? start_secondary __stack_chk_fail start_secondary secondary_startup (I wrote the above messages manually from a picture so expect typos) Then also on my x86 laptop I downgraded the compiler to GCC 8.1.0 (from crosstool), rebuilt the exactly same kernel version and the kernel booted without issues. I'm using 5.7.0-rc4-wt-ath+ which is basically v5.7-rc4 plus latest wireless patches, and I doubt the wireless patches are making any difference this early in the boot. All compilers I use are prebuilt binaries from kernel.org crosstool repo[1] with addition of ccache v3.4.1 to speed up my builds. Any ideas? How should I debug this further? [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/
On Wed, May 13, 2020 at 8:50 AM Kalle Valo <kvalo@codeaurora.org> wrote: > > Kalle Valo <kvalo@codeaurora.org> writes: > > > This motivated me to switch to using GCC 10.x and I noticed that you had > > already upgraded crosstool so it was a trivial thing to do, awesome :) > > > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ > > And now I have a problem :) I first noticed that my x86 testbox is not > booting when I compile the kernel with GCC 10.1.0 from crosstool. I > didn't get any error messages so I just downgraded the compiler and the > kernel was booting fine again. Next I decided to try GCC 10.1 with my > x86 laptop and it also failed to boot, but this time I got kernel logs > and saw this: > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secodary+0x178/0x180 > > Call Trace: > dump_stack > panic > ? _raw_spin_unlock_irqrestore > ? start_secondary > __stack_chk_fail > start_secondary > secondary_startup > > (I wrote the above messages manually from a picture so expect typos) > > Then also on my x86 laptop I downgraded the compiler to GCC 8.1.0 (from > crosstool), rebuilt the exactly same kernel version and the kernel > booted without issues. > > I'm using 5.7.0-rc4-wt-ath+ which is basically v5.7-rc4 plus latest > wireless patches, and I doubt the wireless patches are making any > difference this early in the boot. All compilers I use are prebuilt > binaries from kernel.org crosstool repo[1] with addition of ccache > v3.4.1 to speed up my builds. > > Any ideas? How should I debug this further? At least if it fails reproducibly, it's probably not too hard to drill down further. Some ideas: * I'd first try to reproduce it in qemu. Since you don't even need any user space or modules, I would simply try $ qemu-system-x86_64 -nographic -monitor none -append "console=ttyS0" -serial stdio -smp 4 -kernel arch/x86/boot/bzImage I tried it here with an x86 defconfig linux-next kernel but did not run into the problem you described. If you share your .config, I can try reproducing with that as well. Once there is a reproducer in qemu, it should be trivial to step through it using gdb. * There are still two prerelease compiler versions on kernel.org, from February and from April. You can try each one to see if this was a recent regression. It's also possible that there is a problem with my specific builds of gcc-10.1, and that the compiler is actually fine for others.The gcc-10 packages in Fedora/Debian/Ubuntu are probably better tested. Arnd
Arnd Bergmann <arnd@arndb.de> writes: > On Wed, May 13, 2020 at 8:50 AM Kalle Valo <kvalo@codeaurora.org> wrote: >> >> Kalle Valo <kvalo@codeaurora.org> writes: >> >> > This motivated me to switch to using GCC 10.x and I noticed that you had >> > already upgraded crosstool so it was a trivial thing to do, awesome :) >> > >> > https://mirrors.edge.kernel.org/pub/tools/crosstool/ >> >> And now I have a problem :) I first noticed that my x86 testbox is not >> booting when I compile the kernel with GCC 10.1.0 from crosstool. I >> didn't get any error messages so I just downgraded the compiler and the >> kernel was booting fine again. Next I decided to try GCC 10.1 with my >> x86 laptop and it also failed to boot, but this time I got kernel logs >> and saw this: >> >> Kernel panic - not syncing: stack-protector: Kernel stack is >> corrupted in: start_secodary+0x178/0x180 >> >> Call Trace: >> dump_stack >> panic >> ? _raw_spin_unlock_irqrestore >> ? start_secondary >> __stack_chk_fail >> start_secondary >> secondary_startup >> >> (I wrote the above messages manually from a picture so expect typos) >> >> Then also on my x86 laptop I downgraded the compiler to GCC 8.1.0 (from >> crosstool), rebuilt the exactly same kernel version and the kernel >> booted without issues. >> >> I'm using 5.7.0-rc4-wt-ath+ which is basically v5.7-rc4 plus latest >> wireless patches, and I doubt the wireless patches are making any >> difference this early in the boot. All compilers I use are prebuilt >> binaries from kernel.org crosstool repo[1] with addition of ccache >> v3.4.1 to speed up my builds. >> >> Any ideas? How should I debug this further? > > At least if it fails reproducibly, it's probably not too hard to drill > down further. Some ideas: > > * I'd first try to reproduce it in qemu. Since you don't even need > any user space or modules, I would simply try > $ qemu-system-x86_64 -nographic -monitor none -append > "console=ttyS0" -serial stdio -smp 4 -kernel arch/x86/boot/bzImage > I tried it here with an x86 defconfig linux-next kernel but did not > run into the problem you described. Thanks, I'll try that but I expect it will take few days before I can do it. > If you share your .config, I can try reproducing with that as well. > Once there is a reproducer in qemu, it should be trivial to step > through it using gdb. I have attached the .config I used with GCC 10.1. If you are able to test it please do let me know how it went. > * There are still two prerelease compiler versions on kernel.org, > from February and from April. You can try each one to see > if this was a recent regression. It's also possible that there is > a problem with my specific builds of gcc-10.1, and that the > compiler is actually fine for others.The gcc-10 packages in > Fedora/Debian/Ubuntu are probably better tested. I'm still using Ubuntu 16.04 so not sure how easy it is to find a package for that, but maybe this is a good reason to finally my upgrade my laptop :)
On Wed, May 13, 2020 at 2:57 PM Kalle Valo <kvalo@codeaurora.org> wrote: > > Arnd Bergmann <arnd@arndb.de> writes: > > > On Wed, May 13, 2020 at 8:50 AM Kalle Valo <kvalo@codeaurora.org> wrote: > >> > >> Kalle Valo <kvalo@codeaurora.org> writes: > > > > At least if it fails reproducibly, it's probably not too hard to drill > > down further. Some ideas: > > > > * I'd first try to reproduce it in qemu. Since you don't even need > > any user space or modules, I would simply try > > $ qemu-system-x86_64 -nographic -monitor none -append > > "console=ttyS0" -serial stdio -smp 4 -kernel arch/x86/boot/bzImage > > I tried it here with an x86 defconfig linux-next kernel but did not > > run into the problem you described. > > Thanks, I'll try that but I expect it will take few days before I can do > it. > > > If you share your .config, I can try reproducing with that as well. > > Once there is a reproducer in qemu, it should be trivial to step > > through it using gdb. > > I have attached the .config I used with GCC 10.1. If you are able to > test it please do let me know how it went. Yes, I see the same problem now, but have not investigated any further. > > * There are still two prerelease compiler versions on kernel.org, > > from February and from April. You can try each one to see > > if this was a recent regression. It's also possible that there is > > a problem with my specific builds of gcc-10.1, and that the > > compiler is actually fine for others.The gcc-10 packages in > > Fedora/Debian/Ubuntu are probably better tested. > > I'm still using Ubuntu 16.04 so not sure how easy it is to find a > package for that, but maybe this is a good reason to finally my upgrade > my laptop :) I checked with the gcc-10 package from Ubuntu 20.04, same result as with my version, at least that indicates it's not my fault ;-) Arnd
Arnd Bergmann <arnd@arndb.de> writes: > On Wed, May 13, 2020 at 2:57 PM Kalle Valo <kvalo@codeaurora.org> wrote: >> >> Arnd Bergmann <arnd@arndb.de> writes: >> >> > On Wed, May 13, 2020 at 8:50 AM Kalle Valo <kvalo@codeaurora.org> wrote: >> >> >> >> Kalle Valo <kvalo@codeaurora.org> writes: >> > >> > At least if it fails reproducibly, it's probably not too hard to drill >> > down further. Some ideas: >> > >> > * I'd first try to reproduce it in qemu. Since you don't even need >> > any user space or modules, I would simply try >> > $ qemu-system-x86_64 -nographic -monitor none -append >> > "console=ttyS0" -serial stdio -smp 4 -kernel arch/x86/boot/bzImage >> > I tried it here with an x86 defconfig linux-next kernel but did not >> > run into the problem you described. >> >> Thanks, I'll try that but I expect it will take few days before I can do >> it. >> >> > If you share your .config, I can try reproducing with that as well. >> > Once there is a reproducer in qemu, it should be trivial to step >> > through it using gdb. >> >> I have attached the .config I used with GCC 10.1. If you are able to >> test it please do let me know how it went. > > Yes, I see the same problem now, but have not investigated > any further. Great, so it's not a problem due to my setup. >> > * There are still two prerelease compiler versions on kernel.org, >> > from February and from April. You can try each one to see >> > if this was a recent regression. It's also possible that there is >> > a problem with my specific builds of gcc-10.1, and that the >> > compiler is actually fine for others.The gcc-10 packages in >> > Fedora/Debian/Ubuntu are probably better tested. >> >> I'm still using Ubuntu 16.04 so not sure how easy it is to find a >> package for that, but maybe this is a good reason to finally my upgrade >> my laptop :) > > I checked with the gcc-10 package from Ubuntu 20.04, same > result as with my version, at least that indicates it's not my fault ;-) That's good to know as well. And now I can delay my laptop upgrade even more ;)
On Wed, May 13, 2020 at 09:50:03AM +0300, Kalle Valo wrote: > (trimming CC, changing title) > > Kalle Valo <kvalo@codeaurora.org> writes: > > > Kalle Valo <kvalo@codeaurora.org> writes: > > > >> Arnd Bergmann <arnd@arndb.de> writes: > >> > >>> gcc-10 correctly points out a bug with a zero-length array in > >>> struct ath10k_pci: > >>> > >>> drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove': > >>> drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0 > >>> is outside the bounds of an interior zero-length array 'struct > >>> ath10k_ahb[0]' [-Werror=zero-length-bounds] > >>> 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0]; > >>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >>> In file included from drivers/net/wireless/ath/ath10k/ahb.c:13: > >>> drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb' > >>> 185 | struct ath10k_ahb ahb[0]; > >>> | ^~~ > >>> > >>> The last addition to the struct ignored the comments and added > >>> new members behind the array that must remain last. > >>> > >>> Change it to a flexible-array member and move it last again to > >>> make it work correctly, prevent the same thing from happening > >>> again (all compilers warn about flexible-array members in the > >>> middle of a struct) and get it to build without warnings. > >> > >> Very good find, thanks! This bug would cause all sort of strange memory > >> corruption issues. > > > > This motivated me to switch to using GCC 10.x and I noticed that you had > > already upgraded crosstool so it was a trivial thing to do, awesome :) > > > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ > > And now I have a problem :) I first noticed that my x86 testbox is not > booting when I compile the kernel with GCC 10.1.0 from crosstool. I > didn't get any error messages so I just downgraded the compiler and the > kernel was booting fine again. Next I decided to try GCC 10.1 with my > x86 laptop and it also failed to boot, but this time I got kernel logs > and saw this: > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secodary+0x178/0x180 > See https://lore.kernel.org/lkml/20200423161126.GD26021@zn.tnic/
On Wed, May 13, 2020 at 5:31 PM Kalle Valo <kvalo@codeaurora.org> wrote: > Arnd Bergmann <arnd@arndb.de> writes: > > On Wed, May 13, 2020 at 2:57 PM Kalle Valo <kvalo@codeaurora.org> wrote: > >> > >> Arnd Bergmann <arnd@arndb.de> writes: > >> > >> > If you share your .config, I can try reproducing with that as well. > >> > Once there is a reproducer in qemu, it should be trivial to step > >> > through it using gdb. > >> > >> I have attached the .config I used with GCC 10.1. If you are able to > >> test it please do let me know how it went. > > > > Yes, I see the same problem now, but have not investigated > > any further. > > Great, so it's not a problem due to my setup. I investigated a little more: This does happen with 'defconfig' after all, in my first try I must have missed the '-smp 2' argument to qemu, and it ended up working correctly with just one CPU but fails now. Stepping through the boot process, I see where it crashes in start_secondary: | /* to prevent fake stack check failure in clock setup */ | boot_init_stack_canary(); | | x86_cpuinit.setup_percpu_clockev(); | | wmb(); | cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); The call to cpu_startup_entry() does not succeed, instead it jumps to __stack_chk_fail() from there. Arnd
From: Arnd Bergmann > Sent: 13 May 2020 17:00 > On Wed, May 13, 2020 at 5:31 PM Kalle Valo <kvalo@codeaurora.org> wrote: ... > I investigated a little more: This does happen with 'defconfig' > after all, in my first try I must have missed the '-smp 2' argument > to qemu, and it ended up working correctly with just one CPU > but fails now. > > Stepping through the boot process, I see where it crashes > in start_secondary: > > | /* to prevent fake stack check failure in clock setup */ > | boot_init_stack_canary(); > | > | x86_cpuinit.setup_percpu_clockev(); > | > | wmb(); > | cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); > > The call to cpu_startup_entry() does not succeed, instead > it jumps to __stack_chk_fail() from there. Hasn't this already been fixed? Add: asm(""); after cpu_startup_entry() to stop it being tail-called. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, May 13, 2020 at 5:48 PM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > On Wed, May 13, 2020 at 09:50:03AM +0300, Kalle Valo wrote: > > And now I have a problem :) I first noticed that my x86 testbox is not > > booting when I compile the kernel with GCC 10.1.0 from crosstool. I > > didn't get any error messages so I just downgraded the compiler and the > > kernel was booting fine again. Next I decided to try GCC 10.1 with my > > x86 laptop and it also failed to boot, but this time I got kernel logs > > and saw this: > > > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secodary+0x178/0x180 > > > > See https://lore.kernel.org/lkml/20200423161126.GD26021@zn.tnic/ Thanks! I see the patch in linux-next but not in mainline. I suppose we want it in v5.7 and backported to stable kernels so they can boot when built with gcc-10? I suppose the only reason that the other architectures don't run into the problem is that they don't call boot_init_stack_canary() in start_secondary() though they probably should? Arnd
On Wed, May 13, 2020 at 11:28:09PM +0200, Arnd Bergmann wrote: > I see the patch in linux-next but not in mainline. I suppose we want > it in v5.7 and backported to stable kernels so they can boot when > built with gcc-10? It is queued for 5.8. For a good reason, if you read the whole thread Arvind pointed you to. Lemme guess: gcc10 got released in the meantime (hohumm, website says so) and so we probably should expedite this and send it to Linus now...?
On Wed, May 13, 2020 at 11:41 PM Borislav Petkov <bp@suse.de> wrote: > > On Wed, May 13, 2020 at 11:28:09PM +0200, Arnd Bergmann wrote: > > I see the patch in linux-next but not in mainline. I suppose we want > > it in v5.7 and backported to stable kernels so they can boot when > > built with gcc-10? > > It is queued for 5.8. For a good reason, if you read the whole thread > Arvind pointed you to. > > Lemme guess: gcc10 got released in the meantime (hohumm, website says > so) and so we probably should expedite this and send it to Linus now...? Right, in particular since Linus started building with gcc-10 already and would likely soon run into that problem if he hasn't already ;-) Arnd
On Wed, May 13, 2020 at 11:49:49PM +0200, Arnd Bergmann wrote: > Right, in particular since Linus started building with gcc-10 already and > would likely soon run into that problem if he hasn't already ;-) Oh noo, we don't want Linus' kernel broken. ;-) We will send him the fix this weekend. Looking at that branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/build I think we can send the whole thing even. Linus, shout if you'd prefer only the last three commits there: 950a37078aa0 x86/build: Use $(CONFIG_SHELL) f670269a42bf x86: Fix early boot crash on gcc-10, next try 73da86741e7f x86/build: Check whether the compiler is sane but the other three from Masahiro are simple cleanups: 675a59b7dec6 x86/boot/build: Add phony targets in arch/x86/boot/Makefile to PHONY 30ce434e44d7 x86/boot/build: Make 'make bzlilo' not depend on vmlinux or $(obj)/bzImage e3c7c1052271 x86/boot/build: Add cpustr.h to targets and remove clean-files which should be ok to take now too AFAICT. Thx.
On Thu, May 14, 2020 at 12:20:38AM +0200, Borislav Petkov wrote: > On Wed, May 13, 2020 at 11:49:49PM +0200, Arnd Bergmann wrote: > > Right, in particular since Linus started building with gcc-10 already and > > would likely soon run into that problem if he hasn't already ;-) > > Oh noo, we don't want Linus' kernel broken. ;-) > > We will send him the fix this weekend. > > Looking at that branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/build > > I think we can send the whole thing even. > > Linus, shout if you'd prefer only the last three commits there: > > 950a37078aa0 x86/build: Use $(CONFIG_SHELL) > f670269a42bf x86: Fix early boot crash on gcc-10, next try > 73da86741e7f x86/build: Check whether the compiler is sane > > but the other three from Masahiro are simple cleanups: > > 675a59b7dec6 x86/boot/build: Add phony targets in arch/x86/boot/Makefile to PHONY > 30ce434e44d7 x86/boot/build: Make 'make bzlilo' not depend on vmlinux or $(obj)/bzImage > e3c7c1052271 x86/boot/build: Add cpustr.h to targets and remove clean-files > > which should be ok to take now too AFAICT. > > Thx. > > -- > Regards/Gruss, > Boris. > > SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg I notice that doesn't add it to start_kernel (init/main.c). That's not currently affected, but I think we should it add for consistency and future-proofing, at least for 5.8?
On Wed, May 13, 2020 at 2:50 PM Arnd Bergmann <arnd@arndb.de> wrote: > > Right, in particular since Linus started building with gcc-10 already and > would likely soon run into that problem if he hasn't already ;-) I don't happen to have stack canaries on the configs I actually boot, so I didn't notice. But yes, we should get this before 5.7 and mark it for stable, since F32 is now gcc-10, so people will start hitting it if they enable stack canaries. Linus
On Wed, May 13, 2020 at 3:20 PM Borislav Petkov <bp@suse.de> wrote: > > Linus, shout if you'd prefer only the last three commits there: > > 950a37078aa0 x86/build: Use $(CONFIG_SHELL) > f670269a42bf x86: Fix early boot crash on gcc-10, next try > 73da86741e7f x86/build: Check whether the compiler is sane Do we really need that sanity check? Are there known compilers that fail that check? Because honestly, that sounds unlikely to me to begin with, but if it does happen then that just means that the prevent_tail_call_optimization() thing is broken. The check itself doesn't seem worth it. If your worry is that an empty asm() can be optimized away, then don't use an empty asm! In other words, the only reason for that check seems to be a worry that simply isn't worth having. In fact, I think the check is wrong anyway, since the main thing I can see that would do a tailcall despite the empty asm is link-time optimizations that that check doesn't even check for! So everything I see there just screams "the check is bogus" to me. The check doesn't work, and if it were to work it only means that the prevent_tail_call_optimization() thing is too fragile. Just put a full memory barrier in there, with an actual "mfence" instruction or whatever, so that you know that the check is pointless, and so that you know that a link-time optimizer can't turn the call+return into a tailcall. Don't send me the broken check. Linus
On Wed, May 13, 2020 at 04:13:53PM -0700, Linus Torvalds wrote: > The check itself doesn't seem worth it. If your worry is that an empty > asm() can be optimized away, then don't use an empty asm! gcc guys said we should use that since the first attempt using __attribute__((optimize("-fno-stack-protector"))) didn't work because, well, that attribute turned out to be "not suitable in production code". :) Full thread here: https://lore.kernel.org/lkml/20200314164451.346497-1-slyfox@gentoo.org/ > In other words, the only reason for that check seems to be a worry > that simply isn't worth having. Yes, that was me asking for a way to check whether any future gccs would violate that. But if they'd do that, they would break a lot of code depending on it. > In fact, I think the check is wrong anyway, since the main thing I can > see that would do a tailcall despite the empty asm is link-time > optimizations that that check doesn't even check for! > > So everything I see there just screams "the check is bogus" to me. The > check doesn't work, and if it were to work it only means that the > prevent_tail_call_optimization() thing is too fragile. So I did test it trivially by removing the asm("") and then it would tailcall optimize. But we didn't think about LTO so hm, that would probably break it. > Just put a full memory barrier in there, with an actual "mfence" > instruction or whatever, so that you know that the check is pointless, > and so that you know that a link-time optimizer can't turn the > call+return into a tailcall. Right, the intention here was to have it arch-agnostic in include/linux/compiler.h because powerpc might need it too soon: arch/powerpc/kernel/smp.c:1296: boot_init_stack_canary(); Looking at them, they do have an mb() too so how about this then instead? #define prevent_tail_call_optimization() mb() Thx.
On Wed, May 13, 2020 at 4:36 PM Borislav Petkov <bp@suse.de> wrote: > > > Looking at them, they do have an mb() too so how about this then > instead? > > #define prevent_tail_call_optimization() mb() Yeah, I think a full mb() is likely safe, because that's pretty much always going to be a real instruction with real semantics, and no amount of link-time optimizations can move it around a call instruction. I could imagine some completely UP in-order CPU that doesn't need to serialize with anything at all, and even "mb()" might be empty. I think you can compile old ARM kernels for that. But realistically I think we can ignore them at least for now - I'm not sure the link-time optimization will even do things like that tailcall conversion, and I'm not convinced that old pre-ARMv7 systems will be relevant by the time (if) it ever does. Linus
On Wed, May 13, 2020 at 5:11 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, May 13, 2020 at 4:36 PM Borislav Petkov <bp@suse.de> wrote: > > > > > > Looking at them, they do have an mb() too so how about this then > > instead? > > > > #define prevent_tail_call_optimization() mb() > > Yeah, I think a full mb() is likely safe, because that's pretty much > always going to be a real instruction with real semantics, and no > amount of link-time optimizations can move it around a call > instruction. Are you sure LTO treats empty asm statements differently than full memory barriers in regards to preventing tail calls? (I'll take your word for it, I don't actually know, but seeing an example of real code run through a production compiler is much much more convincing). The TL;DR of the very long thread is that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 is a proper fix, on the GCC side. Adding arbitrary empty asm statements to work around it? Hacks. Full memory barriers? Hacks. I'm happy that GCC does an optimization that Clang does not. At the same time, it sucks to pay a penalty for a bug we don't trigger. This is the same reason why `asm_volatile_goto` expands differently between GCC and Clang (and why I tried to undo that like a year ago). If Clang realizes the same optimization GCC is doing here (related to tailcalls) tomorrow, well we already support __attribute__((no_stack_protector)) which can be added to the callees we don't want tail called in this case (i.e. allowing tail calls). I should send a patch adding that to include/linux/compiler_attributes.h and annotate the callees in question, before we forget about this issue. Sprinkling empty asm statements or full memory barriers should be treated with the same hesitancy as adding sleep()s to "work around" concurrency bugs. Red flag. And LTO is fun; we've been shipping it in Android for years (and need to attempt upstreaming again). Just today we found an ODR violation in one of the most important symbols in the kernel. Will be sending a patch for that tomorrow. > > I could imagine some completely UP in-order CPU that doesn't need to > serialize with anything at all, and even "mb()" might be empty. I > think you can compile old ARM kernels for that. But realistically I > think we can ignore them at least for now - I'm not sure the link-time > optimization will even do things like that tailcall conversion, and > I'm not convinced that old pre-ARMv7 systems will be relevant by the > time (if) it ever does. > > Linus
On Wed, May 13, 2020 at 5:51 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > Are you sure LTO treats empty asm statements differently than full > memory barriers in regards to preventing tail calls? It had better. At link-time, there is nothing left of an empty asm statement. So by the time the linker runs, it only sees call xyz ret in the object code. At that point, it's somewhat reasonable for any link-time optimizer (or an optimizing assembler, for that matter) to say "I'll just turn that sequence into a simple 'jmp xyz' instead". Now, don't get me wrong - I'm not convinced any existing LTO does that. But I'd also not be shocked by something like that. In contrast, if it's a real mb(), the linker won't see just a 'call+ret" sequence. It will see something like call xyz mfence ret (ok, the mfence may actually be something else, and we'll have a label on it and an alternatives table pointing to it, but the point is, unlike an empty asm, there's something _there_). Now, if the linker turns that 'call' into a 'jmp', the linker is just plain buggy. See the difference? > The TL;DR of the very long thread is that > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 is a proper fix, on > the GCC side. Adding arbitrary empty asm statements to work around > it? Hacks. Full memory barriers? Hacks. BS. A compiler person might call it a "hack". But said compiler person will be _wrong_. An intelligent developer knows that it will take years for compilers to give us all the infrastructure we need, and even then the compiler won't actually give us everything - and people will be using the old compilers for years anyway. That's why inline asm's exist. They are the escape from the excessive confines of "let's wait for the compiler person to solve this for us" - which they'll never do completely anyway. It's a bit like unsafe C type casts and allowing people to write "non-portable code". Some compiler people will say that it's bad, and unsafe. Sure, it can be unsafe, but the point is that it allows you to do things that aren't necessarily _possible_ to do in an overly restrictive language. Sometimes you need to break the rules. There's a reason everybody writes library routines in "unsafe" languages like C. Because you need those kinds of escapes in order to actually do something like a memory allocator etc. And that's also why we have inline asm - because the compiler will never know everything or be able to generate code for everything people wants to do. And anybody that _thinks_ that the compiler will always know better and should be in complete control shouldn't be working on compilers. They should probably be repeating kindergarten, sitting in a corner eating paste with their friends. So no. Using inline asms to work around the compiler not understanding what is going on isn't a "hack". It's the _point_ of inline asm. Is it perfect? No. But it's not like there are many alternatives. Linus
> On May 13, 2020, at 7:20 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, May 13, 2020 at 5:51 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: >> >> Are you sure LTO treats empty asm statements differently than full >> memory barriers in regards to preventing tail calls? > > It had better. > > At link-time, there is nothing left of an empty asm statement. So by > the time the linker runs, it only sees > > call xyz > ret > > in the object code. At that point, it's somewhat reasonable for any > link-time optimizer (or an optimizing assembler, for that matter) to > say "I'll just turn that sequence into a simple 'jmp xyz' instead". > What, what? LTO isn’t a linker taking regular .o files full of regular machine code and optimizing it. That’s nuts. LTO takes an intermediate representation and optimizes *that*. This will contain actual indications that something is inline asm. If LTO starts rewriting inline asm, then I bet all kinds of things will go wrong and this is the least of our worries. Also, trying to do the kinds of stuff LTO does by looking at just machine code isn't going to work. So the difference between: asm volatile ("nop"); and asm volatile (""); will be, literally, the absence of the nop. (And alignment changes, etc.)
On Wed, May 13, 2020 at 09:52:07PM -0700, Linus Torvalds wrote: > On Wed, May 13, 2020, 20:50 Andy Lutomirski <luto@amacapital.net> wrote: > > > > > LTO isn’t a linker taking regular .o files full of regular machine > > code and optimizing it. That’s nuts. > > > > Yeah, you're right. I wear originally thinking just an optimizing > assembler, and then started thinking about link-time optimizations in that > sense, but it was wrong to then go from that to LTO which has a very > specific meaning. > > We do have assemblers that do some optimizations, but they tend to all be > at the single instruction level (eg things like turning "add $128" into > "sub $-128" which fits in a byte constant). > > Linus > > > The gcc docs [1,2] at least don't inspire much confidence that this will continue working with plain asm("") though: "Note that GCC’s optimizers can move asm statements relative to other code, including across jumps." ... "Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions." Even if we don't include an instruction in it I think it should at least have a memory clobber, to stop the compiler from deciding that it can be moved before the call so it can do the tail-call optimization. [1] https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html#Basic-Asm [2] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
From: Linus Torvalds > Sent: 14 May 2020 03:20 > On Wed, May 13, 2020 at 5:51 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > Are you sure LTO treats empty asm statements differently than full > > memory barriers in regards to preventing tail calls? > > It had better. > > At link-time, there is nothing left of an empty asm statement. So by > the time the linker runs, it only sees > > call xyz > ret > > in the object code. At that point, it's somewhat reasonable for any > link-time optimizer (or an optimizing assembler, for that matter) to > say "I'll just turn that sequence into a simple 'jmp xyz' instead". Except is sees: call xyz canary_check_code ret There's also almost certainly some stack frame tidyup. Which it would have to detect and convert. And, in principle, the function is allowed to access the stack space than contains the canary. > Now, don't get me wrong - I'm not convinced any existing LTO does > that. But I'd also not be shocked by something like that. > > In contrast, if it's a real mb(), the linker won't see just a > 'call+ret" sequence. It will see something like > > call xyz > mfence > ret > > (ok, the mfence may actually be something else, and we'll have a label > on it and an alternatives table pointing to it, but the point is, > unlike an empty asm, there's something _there_). Not if you've an architecture that doesn't have any memory barriers. In that case mb() may not even be asm(""). (although it might have to be asm ("":::memory)). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, May 14, 2020 at 7:22 AM Arvind Sankar <nivedita@alum.mit.edu> wrote: > On Wed, May 13, 2020 at 09:52:07PM -0700, Linus Torvalds wrote: > > On Wed, May 13, 2020, 20:50 Andy Lutomirski <luto@amacapital.net> wrote: > The gcc docs [1,2] at least don't inspire much confidence that this will > continue working with plain asm("") though: > > "Note that GCC’s optimizers can move asm statements relative to other > code, including across jumps." > ... > "Note that the compiler can move even volatile asm instructions relative > to other code, including across jump instructions." > > Even if we don't include an instruction in it I think it should at least > have a memory clobber, to stop the compiler from deciding that it can be > moved before the call so it can do the tail-call optimization. I think LTO would still be able to notice that cpu_startup_entry() can be annotated __attribute__((noreturn)) and optimize the callers accordingly, which in turn would allow a tail call again after dead code elimination. Arnd
Kalle Valo [13.05.2020 17:31]:
> Great, so it's not a problem due to my setup.
I see the same thing on two machines, using a self-compiled gcc 10.1.0.
Glad to hear it's not just me. Switched back to 9.3.0 for the time being.
On Thu, May 14, 2020 at 10:40:44AM +0200, Arnd Bergmann wrote: > On Thu, May 14, 2020 at 7:22 AM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > On Wed, May 13, 2020 at 09:52:07PM -0700, Linus Torvalds wrote: > > > On Wed, May 13, 2020, 20:50 Andy Lutomirski <luto@amacapital.net> wrote: > > The gcc docs [1,2] at least don't inspire much confidence that this will > > continue working with plain asm("") though: > > > > "Note that GCC’s optimizers can move asm statements relative to other > > code, including across jumps." > > ... > > "Note that the compiler can move even volatile asm instructions relative > > to other code, including across jump instructions." > > > > Even if we don't include an instruction in it I think it should at least > > have a memory clobber, to stop the compiler from deciding that it can be > > moved before the call so it can do the tail-call optimization. > > I think LTO would still be able to notice that cpu_startup_entry() can > be annotated __attribute__((noreturn)) and optimize the callers > accordingly, which in turn would allow a tail call again after dead code > elimination. > > Arnd Yes, with LTO the only solution is to actually compile the caller without stack checking I think. Although at present gcc actually doesn't tail-call optimize calls to noreturn functions that could easily change.
diff --git a/drivers/net/wireless/ath/ath10k/pci.h b/drivers/net/wireless/ath/ath10k/pci.h index e3cbd259a2dc..862d0901c5b8 100644 --- a/drivers/net/wireless/ath/ath10k/pci.h +++ b/drivers/net/wireless/ath/ath10k/pci.h @@ -178,15 +178,16 @@ struct ath10k_pci { */ u32 (*targ_cpu_to_ce_addr)(struct ath10k *ar, u32 addr); + struct ce_attr *attr; + struct ce_pipe_config *pipe_config; + struct ce_service_to_pipe *serv_to_pipe; + /* Keep this entry in the last, memory for struct ath10k_ahb is * allocated (ahb support enabled case) in the continuation of * this struct. */ - struct ath10k_ahb ahb[0]; + struct ath10k_ahb ahb[]; - struct ce_attr *attr; - struct ce_pipe_config *pipe_config; - struct ce_service_to_pipe *serv_to_pipe; }; static inline struct ath10k_pci *ath10k_pci_priv(struct ath10k *ar)
gcc-10 correctly points out a bug with a zero-length array in struct ath10k_pci: drivers/net/wireless/ath/ath10k/ahb.c: In function 'ath10k_ahb_remove': drivers/net/wireless/ath/ath10k/ahb.c:30:9: error: array subscript 0 is outside the bounds of an interior zero-length array 'struct ath10k_ahb[0]' [-Werror=zero-length-bounds] 30 | return &((struct ath10k_pci *)ar->drv_priv)->ahb[0]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/wireless/ath/ath10k/ahb.c:13: drivers/net/wireless/ath/ath10k/pci.h:185:20: note: while referencing 'ahb' 185 | struct ath10k_ahb ahb[0]; | ^~~ The last addition to the struct ignored the comments and added new members behind the array that must remain last. Change it to a flexible-array member and move it last again to make it work correctly, prevent the same thing from happening again (all compilers warn about flexible-array members in the middle of a struct) and get it to build without warnings. Fixes: 521fc37be3d8 ("ath10k: Avoid override CE5 configuration for QCA99X0 chipsets") Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- drivers/net/wireless/ath/ath10k/pci.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)