Message ID | 1470989957-23671-1-git-send-email-npiggin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory.

Hmm.

The main kernel image is guaranteed to be contiguous in physical memory
for all sorts of reasons, so this really isn't a concern for the kernel
itself.

Modules, however, are a different matter, as they are mapped in using
individual pages, and are most likely to be non-contiguous in physical
memory. The kernel's module linker knows nothing about this errata,
so it'll generally just fix up the relocations in the most basic of
ways.

So, I think we should always use this --no-fix-cortex-a8 option where
the linker supports it irrespective of whether we're running on a core
needing this workaround, but we probably need to fix the kernel module
linker to know about this.
On Fri, 12 Aug 2016 13:33:14 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory.
>
> Hmm.
>
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

That's what it *seems* like. I wanted to be conservative because I don't
know the architecture nor have actually looked at the errata docs. You
can probably make stronger guarantees to avoid it. Perhaps enabling just
for modules would be workable.

> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory. The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
>
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

It looks like it would be a bit of work to go that route. The linker of
course would not give you relocations or stubs for the branches you
need them.

Anyway do what you think best with the patch. It seems to eliminate the
link time regression on ARM allyesconfig when using thin archives for
building which is the main thing I was concerned about.

Thanks,
Nick
On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Fri, 12 Aug 2016 13:33:14 +0100
> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>
>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> > This patch adds an option which defaults to "y" in cases where we
>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>> > In reality the workaround might not be required at all for the kernel
>> > if virtual instruction memory is linear in physical memory.
>>
>> Hmm.
>>
>> The main kernel image is guaranteed to be contiguous in physical memory
>> for all sorts of reasons, so this really isn't a concern for the kernel
>> itself.
>
> That's what it *seems* like. I wanted to be conservative because I don't
> know the architecture nor have actually looked at the errata docs. You
> can probably make stronger guarantees to avoid it. Perhaps enabling just
> for modules would be workable.
>
>
>> Modules, however, are a different matter, as they are mapped in using
>> individual pages, and are most likely to be non-contiguous in physical
>> memory. The kernel's module linker knows nothing about this errata,
>> so it'll generally just fix up the relocations in the most basic of
>> ways.
>>
>> So, I think we should always use this --no-fix-cortex-a8 option where
>> the linker supports it irrespective of whether we're running on a core
>> needing this workaround, but we probably need to fix the kernel module
>> linker to know about this.
>
> It looks like it would be a bit of work to go that route. The linker of
> course would not give you relocations or stubs for the branches you
> need them.
>

We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
branch via a PLT entry if an affected instruction is encountered.
However, this only covers branch instructions that are covered by
relocations, so we'd still need to scan the module .text to look for
affected instructions whose targets have been resolved at compile time.

Running this
On 12 August 2016 at 15:49, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 August 2016 at 15:15, Nicholas Piggin <npiggin@gmail.com> wrote:
>> On Fri, 12 Aug 2016 13:33:14 +0100
>> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
>>
>>> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>>> > This patch adds an option which defaults to "y" in cases where we
>>> > could possibly be running Cortex A8 and using Thumb2 instructions.
>>> > In reality the workaround might not be required at all for the kernel
>>> > if virtual instruction memory is linear in physical memory.
>>>
>>> Hmm.
>>>
>>> The main kernel image is guaranteed to be contiguous in physical memory
>>> for all sorts of reasons, so this really isn't a concern for the kernel
>>> itself.
>>
>> That's what it *seems* like. I wanted to be conservative because I don't
>> know the architecture nor have actually looked at the errata docs. You
>> can probably make stronger guarantees to avoid it. Perhaps enabling just
>> for modules would be workable.
>>
>>
>>> Modules, however, are a different matter, as they are mapped in using
>>> individual pages, and are most likely to be non-contiguous in physical
>>> memory. The kernel's module linker knows nothing about this errata,
>>> so it'll generally just fix up the relocations in the most basic of
>>> ways.
>>>
>>> So, I think we should always use this --no-fix-cortex-a8 option where
>>> the linker supports it irrespective of whether we're running on a core
>>> needing this workaround, but we probably need to fix the kernel module
>>> linker to know about this.
>>
>> It looks like it would be a bit of work to go that route. The linker of
>> course would not give you relocations or stubs for the branches you
>> need them.
>>
>
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets have been resolved at compile time.
>
> Running this

$ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less

I get numerous instances of b.w that are not covered by any
relocations, so I assume that will be the case for modules as well.
On Fri, Aug 12, 2016 at 03:49:15PM +0200, Ard Biesheuvel wrote:
> We could enable CONFIG_ARM_MODULE_PLTS in this case, and force a
> branch via a PLT entry if an affected instruction is encountered.
> However, this only covers branch instructions that are covered by
> relocations, so we'd still need to scan the module .text to look for
> affected instructions whose targets have been resolved at compile time.

Only if it's combined with detecting which regions of the .text section
are really instructions - just looking for something that appears to be
a branch instruction will be unreliable as the .text contains literal
data, and literal data could look like a branch instruction.
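To make the concern about literal data concrete, here is a minimal Python sketch (not kernel code) of how a naive scanner might recognise an unconditional Thumb-2 B.W. The helper name decode_thumb2_bw is hypothetical, the bit layout is the T4 branch encoding as I understand it from the ARM architecture manual, and a little-endian image is assumed. The same two halfwords decode as a branch whether they came from an instruction or from a 32-bit literal, so a pattern match alone cannot tell them apart.

```python
def decode_thumb2_bw(addr, hw1, hw2):
    """Return the branch target if (hw1, hw2) looks like an unconditional
    Thumb-2 B.W (assumed encoding T4), otherwise None.

    addr is the address of the first halfword. This is an illustrative
    helper, not an existing kernel or binutils function.
    """
    # First halfword: 11110 S imm10; second halfword: 10 J1 1 J2 imm11.
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0x9000:
        return None
    s = (hw1 >> 10) & 1
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    imm = ((s << 24) | (i1 << 23) | (i2 << 22)
           | ((hw1 & 0x3FF) << 12) | ((hw2 & 0x7FF) << 1))
    if s:
        imm -= 1 << 25  # sign-extend the 25-bit offset
    # In Thumb state the PC reads as the instruction address plus 4.
    return (addr + 4 + imm) & 0xFFFFFFFF


# A real instruction: halfwords f7ff bff8 at c036b098.
insn_target = decode_thumb2_bw(0xC036B098, 0xF7FF, 0xBFF8)

# A 32-bit literal 0xBFF8F7FF stored little-endian yields the same halfwords.
literal = 0xBFF8F7FF
data_target = decode_thumb2_bw(0xC036B098, literal & 0xFFFF, literal >> 16)

print(hex(insn_target), hex(data_target))
```

Both calls return the same address, which is the problem: telling code from data needs extra information. The $t/$d mapping symbols that ARM ELF objects carry are one candidate, though whether they are usable at module load time is an assumption that would need checking.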
On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
>
> I get numerous instances of b.w that are not covered by any
> relocations, so I assume that will be the case for modules as well.

Not surprising. vmlinux is fully linked. There's no relocations.
On 12 August 2016 at 15:52, Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
> On Fri, Aug 12, 2016 at 03:50:17PM +0200, Ard Biesheuvel wrote:
>> $ objdump -dr vmlinux |grep -A1 -E \\sb\.w |less
>>
>> I get numerous instances of b.w that are not covered by any
>> relocations, so I assume that will be the case for modules as well.
>
> Not surprising. vmlinux is fully linked. There's no relocations.
>

My bad. It does if you link it with --emit-relocs. Random snippet:

c036b098:       f7ff bff8       b.w     c036b08c <rcu_barrier>
                        c036b098: R_ARM_THM_JUMP24      rcu_barrier
--
c036b5ae:       f7ff ba5b       b.w     c036aa68 <rcu_gp_kthread_wake>
c036b5b2:       bf00            nop
--
c036bc28:       f1c6 b6da       b.w     c09329e0 <_raw_spin_unlock_irqrestore>
                        c036bc28: R_ARM_THM_JUMP24      _raw_spin_unlock_irqrestore
--
c036bcbc:       f7ee bc2e       b.w     c035a51c <swake_up>
                        c036bcbc: R_ARM_THM_JUMP24      swake_up
--
c036c63a:       f7ff bfd5       b.w     c036c5e8 <synchronize_sched>
                        c036c63a: R_ARM_THM_JUMP24      synchronize_sched
--
c036c640:       f7ff bff0       b.w     c036c624 <cond_synchronize_sched>
                        c036c640: R_ARM_THM_JUMP24      cond_synchronize_sched
--
c036c644:       f7ff bb3e       b.w     c036bcc4 <synchronize_sched_expedited>
                        c036c644: R_ARM_THM_JUMP24      synchronize_sched_expedited
--
c036c690:       f7dc ba0a       b.w     c0348aa8 <resched_cpu>
                        c036c690: R_ARM_THM_JUMP24      resched_cpu
--
c036c6a2:       f7ff baad       b.w     c036bc00 <rcu_report_exp_cpu_mult.constprop.22>
c036c6a6:       bf00            nop
--
c036c716:       f7ff ba73       b.w     c036bc00 <rcu_report_exp_cpu_mult.constprop.22>
c036c71a:       bf00            nop
--
c036cf3a:       f7fd bc4d       b.w     c036a7d8 <rcu_eqs_enter_common>
c036cf3e:       bf00            nop

So some branches are relocated, some have already been resolved at
compile time.
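The split between relocated and already-resolved branches can be counted mechanically rather than by eye. The following is a rough Python sketch, assuming `objdump -dr` prints each relocation on the line directly after the instruction it applies to, as in the snippet above. The function name classify_bw and the regular expressions are illustrative only and have not been checked against every objdump version.

```python
import re

# Matches a 32-bit b.w line, e.g. "c036b098: f7ff bff8 b.w c036b08c <rcu_barrier>"
INSN = re.compile(
    r"^\s*([0-9a-f]+):\s+[0-9a-f]{4}\s+[0-9a-f]{4}\s+b\.w\s+[0-9a-f]+\s+<([^>]+)>")
# Matches a relocation line, e.g. "c036b098: R_ARM_THM_JUMP24 rcu_barrier"
RELOC = re.compile(r"^\s*([0-9a-f]+):\s+(R_ARM_\w+)\s+(\S+)")


def classify_bw(lines):
    """Split b.w instructions into (relocated, resolved) lists of
    (address, symbol) pairs, based on whether the following line is a
    relocation at the same address."""
    lines = list(lines)
    relocated, resolved = [], []
    for i, line in enumerate(lines):
        m = INSN.match(line)
        if not m:
            continue
        addr, sym = m.groups()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        r = RELOC.match(nxt)
        if r and r.group(1) == addr:
            relocated.append((addr, sym))
        else:
            resolved.append((addr, sym))
    return relocated, resolved


SAMPLE = """\
c036b098:  f7ff bff8  b.w  c036b08c <rcu_barrier>
           c036b098: R_ARM_THM_JUMP24  rcu_barrier
--
c036b5ae:  f7ff ba5b  b.w  c036aa68 <rcu_gp_kthread_wake>
c036b5b2:  bf00       nop
"""

relocated, resolved = classify_bw(SAMPLE.splitlines())
print("relocated:", relocated)
print("resolved: ", resolved)
```

In practice the input would be the output of `objdump -dr` on a vmlinux linked with --emit-relocs, or on a module object. The resolved list is the set that a relocation-based fixup in the module loader would never see.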
On 12/08/16 13:33, Russell King - ARM Linux wrote:
> On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
>> This patch adds an option which defaults to "y" in cases where we
>> could possibly be running Cortex A8 and using Thumb2 instructions.
>> In reality the workaround might not be required at all for the kernel
>> if virtual instruction memory is linear in physical memory.
>
> Hmm.
>
> The main kernel image is guaranteed to be contiguous in physical memory
> for all sorts of reasons, so this really isn't a concern for the kernel
> itself.

I'm not sure being contiguous matters much - looking at the errata doc,
the implication is that the branch is supposed to use bits 31:12 of the
address of the first page, but under the erratum conditions ends up
taking bits 31:12 of the address of the _second_ page instead. There
doesn't seem to be any importance of where those pages actually are
relative to each other.

> Modules, however, are a different matter, as they are mapped in using
> individual pages, and are most likely to be non-contiguous in physical
> memory. The kernel's module linker knows nothing about this errata,
> so it'll generally just fix up the relocations in the most basic of
> ways.
>
> So, I think we should always use this --no-fix-cortex-a8 option where
> the linker supports it irrespective of whether we're running on a core
> needing this workaround, but we probably need to fix the kernel module
> linker to know about this.

Given the above, I'm not convinced that sounds safe, but then I can't
claim to have first-hand experience with this bug either.

Robin.
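As a rough illustration of the condition under discussion, the check below encodes only what the Kconfig help text in this patch states: a 32-bit Thumb-2 branch that spans two 4KB regions and whose target lies in the first region. It is a Python sketch with hypothetical function names. The erratum document itself may list further conditions, and the check deliberately says nothing about physical contiguity, in line with the reading above that only address bits 31:12 are involved.

```python
PAGE_MASK = 0xFFF  # low 12 bits: offset within a 4KB region


def spans_4k_boundary(insn_addr):
    """A 4-byte, halfword-aligned instruction straddles a 4KB boundary
    only if it starts 2 bytes before the end of the region."""
    return (insn_addr & PAGE_MASK) == 0xFFE


def matches_657417_pattern(insn_addr, target):
    """True if a 32-bit branch at insn_addr spans two 4KB regions and its
    target falls within the first of them (assumed condition, taken from
    the Kconfig help text only)."""
    same_first_region = (target & ~PAGE_MASK) == (insn_addr & ~PAGE_MASK)
    return spans_4k_boundary(insn_addr) and same_first_region


# Branch straddling the c036b000/c036c000 boundary, jumping back into c036b000.
print(matches_657417_pattern(0xC036BFFE, 0xC036B08C))
# Same target, but the branch sits in the middle of the region.
print(matches_657417_pattern(0xC036B098, 0xC036B08C))
```

A module loader that wanted to apply its own workaround would need a test along these lines for every 32-bit branch it can identify, both those with relocations and those already resolved.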
On Fri, Aug 12, 2016 at 03:17:06PM +0100, Robin Murphy wrote:
> On 12/08/16 13:33, Russell King - ARM Linux wrote:
> > On Fri, Aug 12, 2016 at 06:19:17PM +1000, Nicholas Piggin wrote:
> >> This patch adds an option which defaults to "y" in cases where we
> >> could possibly be running Cortex A8 and using Thumb2 instructions.
> >> In reality the workaround might not be required at all for the kernel
> >> if virtual instruction memory is linear in physical memory.
> >
> > Hmm.
> >
> > The main kernel image is guaranteed to be contiguous in physical memory
> > for all sorts of reasons, so this really isn't a concern for the kernel
> > itself.
>
> I'm not sure being contiguous matters much - looking at the errata doc,
> the implication is that the branch is supposed to use bits 31:12 of the
> address of the first page, but under the erratum conditions ends up
> taking bits 31:12 of the address of the _second_ page instead. There
> doesn't seem to be any importance of where those pages actually are
> relative to each other.

I've not actually looked at the errata document - I need to jump through
all sorts of stupid hoops to get it through the ARM website. Ever since
I requested a change of my email address, it now wants all sorts of
personal information that I'm refusing to type in again. I've no idea
why ARM Ltd wiped out all that information just because I asked for my
email address to be changed.
On Fri, Aug 12, 2016 at 03:23:25PM +0100, Russell King - ARM Linux wrote:
> I've not actually looked at the errata document - I need to jump through
> all sorts of stupid hoops to get it through the ARM website. Ever since
> I requested a change of my email address, it now wants all sorts of
> personal information that I'm refusing to type in again. I've no idea
> why ARM Ltd wiped out all that information just because I asked for my
> email address to be changed.

... oh, and it's forgotten that I'm supposed to be able to access that
information too - it wants me to apply for approval to access errata
documents. Maybe someone can send the appropriate document(s) my way
instead.
On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> Erratum 657417 is worked around by the linker by inserting additional
> branch trampolines to avoid problematic branch target locations. This
> results in much higher linking time and presumably slower and larger
> generated code. The workaround also seems to only be required when
> linking thumb2 code, but the linker applies it for non-thumb2 code as
> well.
>
> The workaround today is left to the linker to apply, which is overly
> conservative.
>
> https://sourceware.org/ml/binutils/2009-05/msg00297.html
>
> This patch adds an option which defaults to "y" in cases where we
> could possibly be running Cortex A8 and using Thumb2 instructions.
> In reality the workaround might not be required at all for the kernel
> if virtual instruction memory is linear in physical memory. However it
> is more conservative to keep the workaround, and it may be the case
> that the TLB lookup would be required in order to catch branches to
> unmapped or no-execute pages.
>
> In an allyesconfig build, this workaround causes a large load on
> the linker's branch stub hash and slows down the final link by a
> factor of 5.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>

Thanks a lot for finding this issue. I can confirm that your patch
helps noticeably in all configurations, reducing time for a relink
from 18 minutes to 9 minutes on my machine in the best case, but
the factor 10 slowdown of the final link with your thin archives
and gc-sections patches remains.

I suspect there is still something else going on besides the 657417
slowing things down, but it's also possible that I'm doing something
wrong here.

Aside from that, I notice that for the purpose of speeding up
"allyesconfig", we don't actually need to make this user
configurable, it's sufficient to disable the workaround when
CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
and all the defconfig files (but not randconfig) use. I also found
that using THUMB2_KERNEL itself causes a 50% slowdown. I have
patches on my "randconfig" test tree that have the side-effect
of enabling THUMB2_KERNEL for allyesconfig, which is one reason
I have been getting worse results than others.

I could also try to revive an older patch I started, to annotate
the specific CPU core on each ARMv7 platform. I think I have all the
information we need for that, and there are other advantages in
doing it: we could be more selective with all the ARMv7 errata,
and automatically determine whether some optional CPU features
(LPAE, virtualization, integer divide) are available on all
of the selected CPU cores.

	Arnd

---
Full link timing results follow

|| THUMB2, thin archive + gc-sections, before: 18 minutes

09:56:47   LINK    vmlinux
09:56:47   AR      built-in.o
09:56:49   LD      vmlinux.o
10:04:27   MODPOST vmlinux.o
10:04:29   GEN     .version
10:04:29   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
10:04:29   CC      init/version.o
10:04:29   AR      init/built-in.o
10:07:39   KSYM    .tmp_kallsyms1.o
10:11:05   KSYM    .tmp_kallsyms2.o
10:11:16   LD      vmlinux
10:14:30   SORTEX  vmlinux
10:14:30   SYSMAP  System.map

|| THUMB2, thin archive + gc-sections, after: 9 minutes

10:16:01   CHK     include/generated/uapi/linux/version.h
10:16:02   LINK    vmlinux
10:16:02   AR      built-in.o
10:16:03   LD      vmlinux.o
10:23:43   MODPOST vmlinux.o
10:23:46   GEN     .version
10:23:46   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
10:23:46   CC      init/version.o
10:23:47   AR      init/built-in.o
10:24:04   KSYM    .tmp_kallsyms1.o
10:24:32   KSYM    .tmp_kallsyms2.o
10:24:45   LD      vmlinux
10:25:00   SORTEX  vmlinux
10:25:00   SYSMAP  System.map

|| THUMB2, no thin archive + gc-sections, before: 93 seconds

10:44:35   CHK     include/generated/uapi/linux/version.h
10:44:35   LINK    vmlinux
10:44:35   LD      vmlinux.o
10:44:39   MODPOST vmlinux.o
10:44:41   GEN     .version
10:44:41   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
10:44:41   CC      init/version.o
10:44:41   LD      init/built-in.o
10:45:02   KSYM    .tmp_kallsyms1.o
10:45:35   KSYM    .tmp_kallsyms2.o
10:45:47   LD      vmlinux
10:46:06   SORTEX  vmlinux
10:46:06   SYSMAP  System.map
10:46:08   OBJCOPY arch/arm/boot/Image

|| THUMB2, no thin archive + gc-sections, after: 52 seconds

10:41:46   LINK    vmlinux
10:41:46   LD      vmlinux.o
10:41:49   MODPOST vmlinux.o
10:41:52   GEN     .version
10:41:52   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
10:41:52   CC      init/version.o
10:41:52   LD      init/built-in.o
10:41:58   KSYM    .tmp_kallsyms1.o
10:42:17   KSYM    .tmp_kallsyms2.o
10:42:31   LD      vmlinux
10:42:36   SORTEX  vmlinux
10:42:36   SYSMAP  System.map
10:42:38   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, before: 59 seconds

11:25:05   LINK    vmlinux
11:25:05   LD      vmlinux.o
11:25:07   MODPOST vmlinux.o
11:25:10   GEN     .version
11:25:10   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
11:25:10   CC      init/version.o
11:25:10   LD      init/built-in.o
11:25:19   KSYM    .tmp_kallsyms1.o
11:25:41   KSYM    .tmp_kallsyms2.o
11:25:53   LD      vmlinux
11:26:03   SORTEX  vmlinux
11:26:03   SYSMAP  System.map
  Building modules, stage 2.

|| THUMB2_KERNEL disabled, no thin archives + gc-sections, after: 46 seconds

11:27:36   LINK    vmlinux
11:27:36   LD      vmlinux.o
11:27:39   MODPOST vmlinux.o
11:27:41   GEN     .version
11:27:41   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
11:27:41   CC      init/version.o
11:27:41   LD      init/built-in.o
11:27:46   KSYM    .tmp_kallsyms1.o
11:28:04   KSYM    .tmp_kallsyms2.o
11:28:15   LD      vmlinux
11:28:20   SORTEX  vmlinux
11:28:20   SYSMAP  System.map
11:28:22   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, before: 12 minutes

13:18:39   LINK    vmlinux
13:18:39   AR      built-in.o
13:18:40   LD      vmlinux.o
13:24:44   MODPOST vmlinux.o
13:24:46   GEN     .version
13:24:46   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
13:24:46   CC      init/version.o
13:24:46   AR      init/built-in.o
13:26:34   KSYM    .tmp_kallsyms1.o
13:28:32   KSYM    .tmp_kallsyms2.o
13:28:43   LD      vmlinux
13:30:31   SORTEX  vmlinux
13:30:31   SYSMAP  System.map
13:30:33   OBJCOPY arch/arm/boot/Image

|| THUMB2_KERNEL disabled, thin archives+gc-sections, after: 7 minutes

12:43:15   LINK    vmlinux
12:43:15   AR      built-in.o
12:43:16   LD      vmlinux.o
12:49:19   MODPOST vmlinux.o
12:49:21   GEN     .version
12:49:21   CHK     include/generated/compile.h
           UPD     include/generated/compile.h
12:49:22   CC      init/version.o
12:49:22   AR      init/built-in.o
12:49:33   KSYM    .tmp_kallsyms1.o
12:49:56   KSYM    .tmp_kallsyms2.o
12:50:07   LD      vmlinux
12:50:19   SORTEX  vmlinux
12:50:19   SYSMAP  System.map
12:50:21   OBJCOPY arch/arm/boot/Image
  Building modules, stage 2.
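To see where the time goes within one of these runs, the gaps between consecutive timestamps can be computed directly. This is a small Python sketch under two assumptions that the log itself does not confirm: each timestamp marks the moment the line was printed (so the gap to the next line approximates that step's duration), and a run does not cross midnight. The function name step_durations is hypothetical. A gap may also cover commands that the build did not print.

```python
from datetime import datetime


def step_durations(log):
    """Return [(step, seconds)] where seconds is the gap between a
    timestamped line and the next timestamped line. Lines without a
    leading HH:MM:SS timestamp are skipped."""
    stamps = []
    for line in log.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        try:
            when = datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue  # e.g. "Building modules, stage 2."
        stamps.append((when, " ".join(parts[1].split())))
    return [(name, int((nxt - when).total_seconds()))
            for (when, name), (nxt, _) in zip(stamps, stamps[1:])]


# First lines of the "THUMB2, thin archive + gc-sections, before" run.
SAMPLE = """\
09:56:47 LINK vmlinux
09:56:47 AR built-in.o
09:56:49 LD vmlinux.o
10:04:27 MODPOST vmlinux.o
"""

for step, seconds in step_durations(SAMPLE):
    print(f"{seconds:5d}s  {step}")
```

On this sample the gap after "LD vmlinux.o" is 458 seconds. That step appears in both the before and after thin-archive runs with a similar gap, which would be consistent with the remark that something other than the 657417 workaround is still slowing the link down.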
On Tue, 23 Aug 2016 14:01:29 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Friday, August 12, 2016 6:19:17 PM CEST Nicholas Piggin wrote:
> > Erratum 657417 is worked around by the linker by inserting additional
> > branch trampolines to avoid problematic branch target locations. This
> > results in much higher linking time and presumably slower and larger
> > generated code. The workaround also seems to only be required when
> > linking thumb2 code, but the linker applies it for non-thumb2 code as
> > well.
> >
> > The workaround today is left to the linker to apply, which is overly
> > conservative.
> >
> > https://sourceware.org/ml/binutils/2009-05/msg00297.html
> >
> > This patch adds an option which defaults to "y" in cases where we
> > could possibly be running Cortex A8 and using Thumb2 instructions.
> > In reality the workaround might not be required at all for the kernel
> > if virtual instruction memory is linear in physical memory. However it
> > is more conservative to keep the workaround, and it may be the case
> > that the TLB lookup would be required in order to catch branches to
> > unmapped or no-execute pages.
> >
> > In an allyesconfig build, this workaround causes a large load on
> > the linker's branch stub hash and slows down the final link by a
> > factor of 5.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >
>
> Thanks a lot for finding this issue. I can confirm that your patch
> helps noticeably in all configurations, reducing time for a relink
> from 18 minutes to 9 minutes on my machine in the best case, but
> the factor 10 slowdown of the final link with your thin archives
> and gc-sections patches remains.
>
> I suspect there is still something else going on besides the 657417
> slowing things down, but it's also possible that I'm doing something
> wrong here.

Okay, I was only testing thin archives. gc-sections I didn't look at.
With thin archives, one final arm allyesconfig link with this patch is
not showing a regression.

gc-sections must be causing something else ARM specific, because
powerpc seems to link fast with gc-sections. Can you send your latest
ARM patch to enable this and I'll have a look at it?

> Aside from that, I notice that for the purpose of speeding up
> "allyesconfig", we don't actually need to make this user
> configurable, it's sufficient to disable the workaround when
> CONFIG_THUMB2_KERNEL is disabled, which is what allyesconfig
> and all the defconfig files (but not randconfig) use. I also found
> that using THUMB2_KERNEL itself causes a 50% slowdown. I have
> patches on my "randconfig" test tree that have the side-effect
> of enabling THUMB2_KERNEL for allyesconfig, which is one reason
> I have been getting worse results than others.
>
> I could also try to revive an older patch I started, to annotate
> the specific CPU core on each ARMv7 platform. I think I have all the
> information we need for that, and there are other advantages in
> doing it: we could be more selective with all the ARMv7 errata,
> and automatically determine whether some optional CPU features
> (LPAE, virtualization, integer divide) are available on all
> of the selected CPU cores.

Yeah I was just trying to follow existing pattern and be conservative
with the workaround. From my brief look, it does seem like there is
room to optimize build options by having more fine grained selection
of target CPU/platform.

Thanks,
Nick
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 90542db..3c7dde1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1033,6 +1033,20 @@ config ARM_ERRATA_460075
 	  ACTLR register. Note that setting specific bits in the ACTLR register
 	  may not be available in non-secure mode.
 
+config ARM_ERRATA_657417
+	bool "ARM errata: A 32-bit branch instruction that spans two 4K regions can result in an incorrect operation"
+	depends on CPU_V7
+	depends on THUMB2_KERNEL
+	default y
+	help
+	  This option enables the workaround for the 657417 Cortex-A8 erratum.
+	  If, while executing code in Thumb or ThumbEE state, a 32-bit Thumb-2
+	  branch instruction is executed that spans two 4KB regions, and the
+	  target address of the branch falls within the first region, it is
+	  possible for the processor to behave incorrectly. This workaround
+	  enables a linker workaround that adds branch trampolines that bounce
+	  offending branches via a safe location.
+
 config ARM_ERRATA_742230
 	bool "ARM errata: DMB operation may be faulty"
 	depends on CPU_V7 && SMP
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 274e8a6..b49a2e0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -43,6 +43,13 @@ ifeq ($(CONFIG_FRAME_POINTER),y)
 KBUILD_CFLAGS	+=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
 endif
 
+ifneq ($(CONFIG_ARM_ERRATA_657417),y)
+# ld-option has to run before we override LD otherwise it fails on
+# cross compile with mismatched endian (C compiler outputs one endian,
+# LD accepts another)
+LDFLAGS		+=$(call ld-option, --no-fix-cortex-a8,)
+endif
+
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS	+= -mbig-endian
 AS		+= -EB
Erratum 657417 is worked around by the linker by inserting additional
branch trampolines to avoid problematic branch target locations. This
results in much higher linking time and presumably slower and larger
generated code. The workaround also seems to only be required when
linking thumb2 code, but the linker applies it for non-thumb2 code as
well.

The workaround today is left to the linker to apply, which is overly
conservative.

https://sourceware.org/ml/binutils/2009-05/msg00297.html

This patch adds an option which defaults to "y" in cases where we
could possibly be running Cortex A8 and using Thumb2 instructions.
In reality the workaround might not be required at all for the kernel
if virtual instruction memory is linear in physical memory. However it
is more conservative to keep the workaround, and it may be the case
that the TLB lookup would be required in order to catch branches to
unmapped or no-execute pages.

In an allyesconfig build, this workaround causes a large load on
the linker's branch stub hash and slows down the final link by a
factor of 5.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/arm/Kconfig  | 14 ++++++++++++++
 arch/arm/Makefile |  7 +++++++
 2 files changed, 21 insertions(+)