Message ID | 20171220205213.1025257-1-arnd@arndb.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Herbert Xu |
Headers | show |
On 20 December 2017 at 20:52, Arnd Bergmann <arnd@arndb.de> wrote: > While testing other changes, I discovered that gcc-7.2.1 produces badly > optimized code for aes_encrypt/aes_decrypt. This is especially true when > CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely > large stack usage that in turn might cause kernel stack overflows: > > crypto/aes_generic.c: In function 'aes_encrypt': > crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=] > crypto/aes_generic.c: In function 'aes_decrypt': > crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=] > > I verified that this problem exists on all architectures that are > supported by gcc-7.2, though arm64 in particular is less affected than > the others. I also found that gcc-7.1 and gcc-8 do not show the extreme > stack usage but still produce worse code than earlier versions for this > file, apparently because of optimization passes that generally provide > a substantial improvement in object code quality but understandably fail > to find any shortcuts in the AES algorithm. > > Turning off the tree-pre and tree-sra optimization steps seems to > reverse the effect, and could be used as a workaround in case we > don't get a good gcc fix. > > Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > Cc: Richard Biener <rguenther@suse.de> > Cc: Jakub Jelinek <jakub@gcc.gnu.org> > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Jakub and Richard have done a more detailed analysis of this, and are > working on ways to improve the code again. In the meantime, I'm sending > this workaround to the Linux crypto maintainers to make them aware of > this issue and for testing. > > What would be a good way to test the performance of the AES code with > the various combinations of compiler versions, as well as UBSAN and this > patch enabled or disabled? You can use the tcrypt.ko module to benchmark AES. modprobe tcrypt mode=200 sec=1 to run a (lengthy) AES benchmark in various modes. AES-128 in ECB mode using the largest block size tested is what I usually use for comparison. On my Cortex-A57, the generic AES code runs at ~18 cycles per byte. Note that we have alternative scalar implementations on ARM and arm64 that are faster so the performance of aes-generic is not really relevant (and so it is essentially dead code) > --- > crypto/aes_generic.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..35f973ba9878 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); > f_rl(bo, bi, 3, k); \ > } while (0) > > +#if __GNUC__ >= 7 > +/* > + * Newer compilers try to optimize integer arithmetic more aggressively, > + * which generally improves code quality a lot, but in this specific case > + * ends up hurting more than it helps, in some configurations drastically > + * so. This turns off two optimization steps that have been shown to > + * lead to rather badly optimized code with gcc-7. > + * > + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > + */ > +#pragma GCC optimize("-fno-tree-pre") > +#pragma GCC optimize("-fno-tree-sra") > +#endif > + > static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > -- > 2.9.0 >
On Wed, Dec 20, 2017 at 10:14 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 20 December 2017 at 20:52, Arnd Bergmann <arnd@arndb.de> wrote: > > You can use the tcrypt.ko module to benchmark AES. > > modprobe tcrypt mode=200 sec=1 Ok, that's what I was looking for. I don't think I'll have time to analyze this before my Christmas break (I'm only here one more day, and have not set up a test environment for this) > to run a (lengthy) AES benchmark in various modes. AES-128 in ECB mode > using the largest block size tested is what I usually use for > comparison. > > On my Cortex-A57, the generic AES code runs at ~18 cycles per byte. > Note that we have alternative scalar implementations on ARM and arm64 > that are faster so the performance of aes-generic is not really > relevant (and so it is essentially dead code) Ok. arm64 is also the least affected by this problem out of all architectures, but it most architectures don't have an optimized implementation. Arnd
On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..35f973ba9878 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); > f_rl(bo, bi, 3, k); \ > } while (0) > > +#if __GNUC__ >= 7 > +/* > + * Newer compilers try to optimize integer arithmetic more aggressively, > + * which generally improves code quality a lot, but in this specific case > + * ends up hurting more than it helps, in some configurations drastically > + * so. This turns off two optimization steps that have been shown to > + * lead to rather badly optimized code with gcc-7. > + * > + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > + */ > +#pragma GCC optimize("-fno-tree-pre") > +#pragma GCC optimize("-fno-tree-sra") So do it only when UBSAN is enabled? GCC doesn't have a particular predefined macro for those (only for asan and tsan), but either the kernel does have something already, or could have something added in the corresponding Makefile. Jakub
On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: >> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c >> index ca554d57d01e..35f973ba9878 100644 >> --- a/crypto/aes_generic.c >> +++ b/crypto/aes_generic.c >> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); >> f_rl(bo, bi, 3, k); \ >> } while (0) >> >> +#if __GNUC__ >= 7 >> +/* >> + * Newer compilers try to optimize integer arithmetic more aggressively, >> + * which generally improves code quality a lot, but in this specific case >> + * ends up hurting more than it helps, in some configurations drastically >> + * so. This turns off two optimization steps that have been shown to >> + * lead to rather badly optimized code with gcc-7. >> + * >> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 >> + */ >> +#pragma GCC optimize("-fno-tree-pre") >> +#pragma GCC optimize("-fno-tree-sra") > > So do it only when UBSAN is enabled? GCC doesn't have a particular > predefined macro for those (only for asan and tsan), but either the kernel > does have something already, or could have something added in the > corresponding Makefile. My original interpretation of the resulting object code suggested that disabling those two optimizations produced better results for this particular file even without UBSAN, on both gcc-7 and gcc-8 (but not gcc-6), so my patch might have been better, but I did some measurements now as Ard suggested, showing cycles/byte for AES256/CBC with 8KB blocks: default ubsan patched patched+ubsan gcc-4.3.6 14.9 ---- 14.9 ---- gcc-4.6.4 15.0 ---- 15.8 ---- gcc-4.9.4 15.5 20.7 15.9 20.9 gcc-5.5.0 15.6 47.3 86.4 48.8 gcc-6.3.1 14.6 49.4 94.3 50.9 gcc-7.1.1 13.5 54.6 15.2 52.0 gcc-7.2.1 16.8 124.7 92.0 52.2 gcc-8.0.0 15.0 no boot 15.3 no boot I checked that there are actually three significant digits on the measurements, detailed output is available at https://pastebin.com/eFsWYjQp It seems that I was wrong about the interpretation that disabling the optimization would be a win on gcc-7 and higher, it almost always makes things worse even with UBSAN. Making that check "#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)" would help here, or we could list the file as an exception for UBSAN and never sanitize it. Looking at the 'default' column, I wonder if anyone would be interested in looking at why the throughput regressed with gcc-7.2 and gcc-8. Arnd
On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote: > On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: >>> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c >>> index ca554d57d01e..35f973ba9878 100644 >>> --- a/crypto/aes_generic.c >>> +++ b/crypto/aes_generic.c >>> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); >>> f_rl(bo, bi, 3, k); \ >>> } while (0) >>> >>> +#if __GNUC__ >= 7 >>> +/* >>> + * Newer compilers try to optimize integer arithmetic more aggressively, >>> + * which generally improves code quality a lot, but in this specific case >>> + * ends up hurting more than it helps, in some configurations drastically >>> + * so. This turns off two optimization steps that have been shown to >>> + * lead to rather badly optimized code with gcc-7. >>> + * >>> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 >>> + */ >>> +#pragma GCC optimize("-fno-tree-pre") >>> +#pragma GCC optimize("-fno-tree-sra") >> >> So do it only when UBSAN is enabled? GCC doesn't have a particular >> predefined macro for those (only for asan and tsan), but either the kernel >> does have something already, or could have something added in the >> corresponding Makefile. > > My original interpretation of the resulting object code suggested that disabling > those two optimizations produced better results for this particular > file even without > UBSAN, on both gcc-7 and gcc-8 (but not gcc-6), so my patch might have > been better, but I did some measurements now as Ard suggested, showing > cycles/byte for AES256/CBC with 8KB blocks: > > > default ubsan patched patched+ubsan > gcc-4.3.6 14.9 ---- 14.9 ---- > gcc-4.6.4 15.0 ---- 15.8 ---- > gcc-4.9.4 15.5 20.7 15.9 20.9 > gcc-5.5.0 15.6 47.3 86.4 48.8 > gcc-6.3.1 14.6 49.4 94.3 50.9 > gcc-7.1.1 13.5 54.6 15.2 52.0 > gcc-7.2.1 16.8 124.7 92.0 52.2 > gcc-8.0.0 15.0 no boot 15.3 no boot > > I checked that there are actually three significant digits on the measurements, > detailed output is available at https://pastebin.com/eFsWYjQp > > It seems that I was wrong about the interpretation that disabling > the optimization would be a win on gcc-7 and higher, it almost > always makes things worse even with UBSAN. Making that > check "#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)" > would help here, or we could list the file as an exception for > UBSAN and never sanitize it. > > Looking at the 'default' column, I wonder if anyone would be interested > in looking at why the throughput regressed with gcc-7.2 and gcc-8. > Thanks for the elaborate benchmarks. Looking at the bugzilla entry, it appears the UBSAN code inserts runtime checks to ensure that certain u8 variables don't assume values <0 or >255, which seems like a rather pointless exercise to me. But even if it didn't, I think it is justified to disable UBSAN on all of the low-level cipher implementations, given that they are hardly ever modified, and typically don't suffer from the issues UBSAN tries to identify. So my vote is to disable UBSAN for all such cipher implementations: aes_generic, but also aes_ti, which has a similar 256 byte lookup table [although it does not seem to be affected by the same issue as aes_generic], and possibly others as well. Perhaps it makes sense to move core cipher code into a separate sub-directory, and disable UBSAN at the directory level? It would involve the following files crypto/aes_generic.c crypto/aes_ti.c crypto/anubis.c crypto/arc4.c crypto/blowfish_generic.c crypto/camellia_generic.c crypto/cast5_generic.c crypto/cast6_generic.c crypto/des_generic.c crypto/fcrypt.c crypto/khazad.c crypto/seed.c crypto/serpent_generic.c crypto/tea.c crypto/twofish_generic.c
Hi Ard, On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote: >> On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: >>>> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c >>>> index ca554d57d01e..35f973ba9878 100644 >>>> --- a/crypto/aes_generic.c >>>> +++ b/crypto/aes_generic.c >>>> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); >>>> f_rl(bo, bi, 3, k); \ >>>> } while (0) >>>> >>>> +#if __GNUC__ >= 7 >>>> +/* >>>> + * Newer compilers try to optimize integer arithmetic more aggressively, >>>> + * which generally improves code quality a lot, but in this specific case >>>> + * ends up hurting more than it helps, in some configurations drastically >>>> + * so. This turns off two optimization steps that have been shown to >>>> + * lead to rather badly optimized code with gcc-7. >>>> + * >>>> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 >>>> + */ >>>> +#pragma GCC optimize("-fno-tree-pre") >>>> +#pragma GCC optimize("-fno-tree-sra") >>> >>> So do it only when UBSAN is enabled? GCC doesn't have a particular >>> predefined macro for those (only for asan and tsan), but either the kernel >>> does have something already, or could have something added in the >>> corresponding Makefile. >> >> My original interpretation of the resulting object code suggested that disabling >> those two optimizations produced better results for this particular >> file even without >> UBSAN, on both gcc-7 and gcc-8 (but not gcc-6), so my patch might have >> been better, but I did some measurements now as Ard suggested, showing >> cycles/byte for AES256/CBC with 8KB blocks: >> >> >> default ubsan patched patched+ubsan >> gcc-4.3.6 14.9 ---- 14.9 ---- >> gcc-4.6.4 15.0 ---- 15.8 ---- >> gcc-4.9.4 15.5 20.7 15.9 20.9 >> gcc-5.5.0 15.6 47.3 86.4 48.8 >> gcc-6.3.1 14.6 49.4 94.3 50.9 >> gcc-7.1.1 13.5 54.6 15.2 52.0 >> gcc-7.2.1 16.8 124.7 92.0 52.2 >> gcc-8.0.0 15.0 no boot 15.3 no boot >> >> I checked that there are actually three significant digits on the measurements, >> detailed output is available at https://pastebin.com/eFsWYjQp >> >> It seems that I was wrong about the interpretation that disabling >> the optimization would be a win on gcc-7 and higher, it almost >> always makes things worse even with UBSAN. Making that >> check "#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)" >> would help here, or we could list the file as an exception for >> UBSAN and never sanitize it. >> >> Looking at the 'default' column, I wonder if anyone would be interested >> in looking at why the throughput regressed with gcc-7.2 and gcc-8. >> > > Thanks for the elaborate benchmarks. Looking at the bugzilla entry, it > appears the UBSAN code inserts runtime checks to ensure that certain > u8 variables don't assume values <0 or >255, which seems like a rather > pointless exercise to me. But even if it didn't, I think it is > justified to disable UBSAN on all of the low-level cipher > implementations, given that they are hardly ever modified, and > typically don't suffer from the issues UBSAN tries to identify. > > So my vote is to disable UBSAN for all such cipher implementations: > aes_generic, but also aes_ti, which has a similar 256 byte lookup > table [although it does not seem to be affected by the same issue as > aes_generic], and possibly others as well. > > Perhaps it makes sense to move core cipher code into a separate > sub-directory, and disable UBSAN at the directory level? > > It would involve the following files > > crypto/aes_generic.c > crypto/aes_ti.c > crypto/anubis.c > crypto/arc4.c > crypto/blowfish_generic.c > crypto/camellia_generic.c > crypto/cast5_generic.c > crypto/cast6_generic.c > crypto/des_generic.c > crypto/fcrypt.c > crypto/khazad.c > crypto/seed.c > crypto/serpent_generic.c > crypto/tea.c > crypto/twofish_generic.c As *SAN is enabled only on developer setup, is such a change required? Looks like I am missing something here. Can you explain what value it provides? Regards, PrasannaKumar
On 21 December 2017 at 13:47, PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> wrote: > Hi Ard, > > On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote: >>> On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>>> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: >>>>> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c >>>>> index ca554d57d01e..35f973ba9878 100644 >>>>> --- a/crypto/aes_generic.c >>>>> +++ b/crypto/aes_generic.c >>>>> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); >>>>> f_rl(bo, bi, 3, k); \ >>>>> } while (0) >>>>> >>>>> +#if __GNUC__ >= 7 >>>>> +/* >>>>> + * Newer compilers try to optimize integer arithmetic more aggressively, >>>>> + * which generally improves code quality a lot, but in this specific case >>>>> + * ends up hurting more than it helps, in some configurations drastically >>>>> + * so. This turns off two optimization steps that have been shown to >>>>> + * lead to rather badly optimized code with gcc-7. >>>>> + * >>>>> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 >>>>> + */ >>>>> +#pragma GCC optimize("-fno-tree-pre") >>>>> +#pragma GCC optimize("-fno-tree-sra") >>>> >>>> So do it only when UBSAN is enabled? GCC doesn't have a particular >>>> predefined macro for those (only for asan and tsan), but either the kernel >>>> does have something already, or could have something added in the >>>> corresponding Makefile. >>> >>> My original interpretation of the resulting object code suggested that disabling >>> those two optimizations produced better results for this particular >>> file even without >>> UBSAN, on both gcc-7 and gcc-8 (but not gcc-6), so my patch might have >>> been better, but I did some measurements now as Ard suggested, showing >>> cycles/byte for AES256/CBC with 8KB blocks: >>> >>> >>> default ubsan patched patched+ubsan >>> gcc-4.3.6 14.9 ---- 14.9 ---- >>> gcc-4.6.4 15.0 ---- 15.8 ---- >>> gcc-4.9.4 15.5 20.7 15.9 20.9 >>> gcc-5.5.0 15.6 47.3 86.4 48.8 >>> gcc-6.3.1 14.6 49.4 94.3 50.9 >>> gcc-7.1.1 13.5 54.6 15.2 52.0 >>> gcc-7.2.1 16.8 124.7 92.0 52.2 >>> gcc-8.0.0 15.0 no boot 15.3 no boot >>> >>> I checked that there are actually three significant digits on the measurements, >>> detailed output is available at https://pastebin.com/eFsWYjQp >>> >>> It seems that I was wrong about the interpretation that disabling >>> the optimization would be a win on gcc-7 and higher, it almost >>> always makes things worse even with UBSAN. Making that >>> check "#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)" >>> would help here, or we could list the file as an exception for >>> UBSAN and never sanitize it. >>> >>> Looking at the 'default' column, I wonder if anyone would be interested >>> in looking at why the throughput regressed with gcc-7.2 and gcc-8. >>> >> >> Thanks for the elaborate benchmarks. Looking at the bugzilla entry, it >> appears the UBSAN code inserts runtime checks to ensure that certain >> u8 variables don't assume values <0 or >255, which seems like a rather >> pointless exercise to me. But even if it didn't, I think it is >> justified to disable UBSAN on all of the low-level cipher >> implementations, given that they are hardly ever modified, and >> typically don't suffer from the issues UBSAN tries to identify. >> >> So my vote is to disable UBSAN for all such cipher implementations: >> aes_generic, but also aes_ti, which has a similar 256 byte lookup >> table [although it does not seem to be affected by the same issue as >> aes_generic], and possibly others as well. >> >> Perhaps it makes sense to move core cipher code into a separate >> sub-directory, and disable UBSAN at the directory level? >> >> It would involve the following files >> >> crypto/aes_generic.c >> crypto/aes_ti.c >> crypto/anubis.c >> crypto/arc4.c >> crypto/blowfish_generic.c >> crypto/camellia_generic.c >> crypto/cast5_generic.c >> crypto/cast6_generic.c >> crypto/des_generic.c >> crypto/fcrypt.c >> crypto/khazad.c >> crypto/seed.c >> crypto/serpent_generic.c >> crypto/tea.c >> crypto/twofish_generic.c > > As *SAN is enabled only on developer setup, is such a change required? > Looks like I am missing something here. Can you explain what value it > provides? > Well, in this particular case, the value it provides is that the kernel can still boot and invoke the AES code without overflowing the kernel stack. Of course, this is a compiler issue that hopefully gets fixed, but I think it may be reasonable to exclude some C code from UBSAN by default.
On Fri, Dec 22, 2017 at 4:47 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 21 December 2017 at 13:47, PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> wrote: >> On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >>> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote: >>> >>> So my vote is to disable UBSAN for all such cipher implementations: >>> aes_generic, but also aes_ti, which has a similar 256 byte lookup >>> table [although it does not seem to be affected by the same issue as >>> aes_generic], and possibly others as well. >>> >>> Perhaps it makes sense to move core cipher code into a separate >>> sub-directory, and disable UBSAN at the directory level? >>> >>> It would involve the following files >>> >>> crypto/aes_generic.c >>> crypto/aes_ti.c >>> crypto/anubis.c >>> crypto/arc4.c >>> crypto/blowfish_generic.c >>> crypto/camellia_generic.c >>> crypto/cast5_generic.c >>> crypto/cast6_generic.c >>> crypto/des_generic.c >>> crypto/fcrypt.c >>> crypto/khazad.c >>> crypto/seed.c >>> crypto/serpent_generic.c >>> crypto/tea.c >>> crypto/twofish_generic.c >> >> As *SAN is enabled only on developer setup, is such a change required? >> Looks like I am missing something here. Can you explain what value it >> provides? >> > > Well, in this particular case, the value it provides is that the > kernel can still boot and invoke the AES code without overflowing the > kernel stack. Of course, this is a compiler issue that hopefully gets > fixed, but I think it may be reasonable to exclude some C code from > UBSAN by default. Any idea how to proceed here? I've retested with the latest gcc snapshot and verified that the problem is still there. No idea what the chance of getting it fixed before the 7.3 release is. From the performance tests I've done, the patch I posted is pretty much useless, it causes significant performance regressions on most other compiler versions. A minimal patch would be to disable UBSAN specifically for aes-generic.c for gcc-7.2+ but not gcc-8 to avoid the potential stack overflow. We could also force building with -Os on gcc-7, and leave UBSAN enabled, this would improve performance some 3-5% on x86 with gcc-7 (both 7.1 and 7.2.1) and avoid the stack overflow. For the performance regression in gcc-7.2.1 on this file, I've opened a separate gcc PR now, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651 I've also tested the libressl version of their generic AES code, with mixed results (it's appears to be much slower than the kernel version to start with, and while it has further performance regressions with recent compilers, those are with a different set of versions compared to the kernel implementation, and it does not suffer from the high stack usage). Arnd
On 3 January 2018 at 16:37, Arnd Bergmann <arnd@arndb.de> wrote: > On Fri, Dec 22, 2017 at 4:47 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: >> On 21 December 2017 at 13:47, PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> wrote: >>> On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >>>> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote: >>>> >>>> So my vote is to disable UBSAN for all such cipher implementations: >>>> aes_generic, but also aes_ti, which has a similar 256 byte lookup >>>> table [although it does not seem to be affected by the same issue as >>>> aes_generic], and possibly others as well. >>>> >>>> Perhaps it makes sense to move core cipher code into a separate >>>> sub-directory, and disable UBSAN at the directory level? >>>> >>>> It would involve the following files >>>> >>>> crypto/aes_generic.c >>>> crypto/aes_ti.c >>>> crypto/anubis.c >>>> crypto/arc4.c >>>> crypto/blowfish_generic.c >>>> crypto/camellia_generic.c >>>> crypto/cast5_generic.c >>>> crypto/cast6_generic.c >>>> crypto/des_generic.c >>>> crypto/fcrypt.c >>>> crypto/khazad.c >>>> crypto/seed.c >>>> crypto/serpent_generic.c >>>> crypto/tea.c >>>> crypto/twofish_generic.c >>> >>> As *SAN is enabled only on developer setup, is such a change required? >>> Looks like I am missing something here. Can you explain what value it >>> provides? >>> >> >> Well, in this particular case, the value it provides is that the >> kernel can still boot and invoke the AES code without overflowing the >> kernel stack. Of course, this is a compiler issue that hopefully gets >> fixed, but I think it may be reasonable to exclude some C code from >> UBSAN by default. > > Any idea how to proceed here? I've retested with the latest gcc snapshot > and verified that the problem is still there. No idea what the chance of > getting it fixed before the 7.3 release is. From the performance tests > I've done, the patch I posted is pretty much useless, it causes significant > performance regressions on most other compiler versions. > > A minimal patch would be to disable UBSAN specifically for aes-generic.c > for gcc-7.2+ but not gcc-8 to avoid the potential stack overflow. We could > also force building with -Os on gcc-7, and leave UBSAN enabled, > this would improve performance some 3-5% on x86 with gcc-7 (both > 7.1 and 7.2.1) and avoid the stack overflow. > Can't we just disable UBSAN for that file for all GCC versions and be done with it? It is not a production feature, and that code is unlikely to change in ways where UBSAN would make a difference anyway, nor is it ever executed on 99.9% of systems running Linux. > For the performance regression in gcc-7.2.1 on this file, I've opened > a separate gcc PR now, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651 > I've also tested the libressl version of their generic AES code, with > mixed results (it's appears to be much slower than the kernel version > to start with, and while it has further performance regressions with recent > compilers, those are with a different set of versions compared to the > kernel implementation, and it does not suffer from the high stack usage). >
On Wed, Jan 3, 2018 at 6:36 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 3 January 2018 at 16:37, Arnd Bergmann <arnd@arndb.de> wrote: >> On Fri, Dec 22, 2017 at 4:47 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> >> A minimal patch would be to disable UBSAN specifically for aes-generic.c >> for gcc-7.2+ but not gcc-8 to avoid the potential stack overflow. We could >> also force building with -Os on gcc-7, and leave UBSAN enabled, >> this would improve performance some 3-5% on x86 with gcc-7 (both >> 7.1 and 7.2.1) and avoid the stack overflow. >> > > Can't we just disable UBSAN for that file for all GCC versions and be > done with it? It is not a production feature, and that code is > unlikely to change in ways where UBSAN would make a difference anyway, > nor is it ever executed on 99.9% of systems running Linux. It's up to Herbert in the end. I'm leaning to the -Os change in the Makefile, since it improves the performance for the common case as well, and once we start disabling UBSAN for performance-critical files, it becomes a slippery slope. I'll send the patch with the -Os change as v2, and note the ubsan-disabling alternative in the changelog. Arnd
diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c index ca554d57d01e..35f973ba9878 100644 --- a/crypto/aes_generic.c +++ b/crypto/aes_generic.c @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); f_rl(bo, bi, 3, k); \ } while (0) +#if __GNUC__ >= 7 +/* + * Newer compilers try to optimize integer arithmetic more aggressively, + * which generally improves code quality a lot, but in this specific case + * ends up hurting more than it helps, in some configurations drastically + * so. This turns off two optimization steps that have been shown to + * lead to rather badly optimized code with gcc-7. + * + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 + */ +#pragma GCC optimize("-fno-tree-pre") +#pragma GCC optimize("-fno-tree-sra") +#endif + static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
While testing other changes, I discovered that gcc-7.2.1 produces badly optimized code for aes_encrypt/aes_decrypt. This is especially true when CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely large stack usage that in turn might cause kernel stack overflows: crypto/aes_generic.c: In function 'aes_encrypt': crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=] crypto/aes_generic.c: In function 'aes_decrypt': crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=] I verified that this problem exists on all architectures that are supported by gcc-7.2, though arm64 in particular is less affected than the others. I also found that gcc-7.1 and gcc-8 do not show the extreme stack usage but still produce worse code than earlier versions for this file, apparently because of optimization passes that generally provide a substantial improvement in object code quality but understandably fail to find any shortcuts in the AES algorithm. Turning off the tree-pre and tree-sra optimization steps seems to reverse the effect, and could be used as a workaround in case we don't get a good gcc fix. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 Cc: Richard Biener <rguenther@suse.de> Cc: Jakub Jelinek <jakub@gcc.gnu.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- Jakub and Richard have done a more detailed analysis of this, and are working on ways to improve the code again. In the meantime, I'm sending this workaround to the Linux crypto maintainers to make them aware of this issue and for testing. What would be a good way to test the performance of the AES code with the various combinations of compiler versions, as well as UBSAN and this patch enabled or disabled? --- crypto/aes_generic.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)