Message ID | 20181213172036.14504-1-ard.biesheuvel@linaro.org (mailing list archive)
---|---
Series | arm64: kpti: use nG mappings unless KPTI is force disabled
Hi Ard,

On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> John reports [0] that recent changes to the way the linear region is mapped
> are causing ~30 second stalls on boot, and this turns out to be due to the
> way we remap the linear space with nG mappings when enabling KPTI, which
> occurs with the MMU and caches off. Recent kernels map the linear region down
> to pages by default, so that restricted permissions can be applied at page
> granularity (so that module .text and .rodata are no longer writable via
> the linear map), and so the number of page table entries that require
> updating from G to nG is much higher, resulting in the observed delays.
>
> This series refactors the logic that tests whether KPTI should be enabled
> so that the conditions that apply regardless of CPU identification are
> evaluated early, allowing us to create the mappings of the kernel and
> the linear region with nG attributes from the outset.
>
> Patches #1 to #3 implement this so that we only create these nG mappings
> if KPTI is force enabled.
>
> As a followup, #4 may be applied, which inverts the logic so that nG mappings
> are always created, unless KPTI is forced off. This allows us to get rid
> of the slow and complex asm routines that make this change later on, which
> is where the boot delays occur.

I think I'd like to keep those for now, since it would be nice to use global
mappings for the vast majority of CPUs that are not affected by kpti.

I had a crack at the problem and ended up with a slightly simpler diff
below.

Will

--->8

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 7689c7aa1d77..feb05f7b46b1 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_MMU_H
 #define __ASM_MMU_H
 
+#include <asm/cputype.h>
+
 #define MMCF_AARCH32 0x1 /* mm context flag for AArch32 executables */
 #define USER_ASID_BIT 48
 #define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
@@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
        cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
 }
 
+static inline bool arm64_kernel_use_ng_mappings(void)
+{
+        bool tx1_bug = false;
+
+        if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+                return false;
+
+        if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+                return arm64_kernel_unmapped_at_el0();
+
+        if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
+                extern const struct midr_range cavium_erratum_27456_cpus[];
+
+                tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
+                        cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
+                        is_midr_in_range_list(read_cpuid_id(),
+                                              cavium_erratum_27456_cpus);
+        }
+
+        return !tx1_bug;
+}
+
 typedef void (*bp_hardening_cb_t)(void);
 
 struct bp_hardening_data {
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 78b942c1bea4..986e41c4c32b 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -37,8 +37,8 @@
 #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
 
-#define PTE_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
-#define PMD_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
+#define PTE_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
+#define PMD_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
 
 #define PROT_DEFAULT (_PROT_DEFAULT | PTE_MAYBE_NG)
 #define PROT_SECT_DEFAULT (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ff2fda3a98e1..2c42f57cf529 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
 #endif
 
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
-static const struct midr_range cavium_erratum_27456_cpus[] = {
+const struct midr_range cavium_erratum_27456_cpus[] = {
         /* Cavium ThunderX, T88 pass 1.x - 2.1 */
         MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
         /* Cavium ThunderX, T81 pass 1.0 */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7ea0b2f20262..c68f6ad6722f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
         if (kpti_applied)
                 return;
 
+        if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
+                return;
+
         remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
 
         cpu_install_idmap();
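For a sense of the scale behind the stall described in the quoted cover letter: with 4 KB pages, every 2 MB of linear map that a single block entry used to cover now needs 512 individual page entries, so a machine with, say, 16 GB of RAM (an illustrative figure, not one taken from John's report) has on the order of four million entries to rewrite from G to nG with the MMU and caches off, versus roughly eight thousand block entries before the switch to page-granular mappings.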
On Thu, 13 Dec 2018 at 19:05, Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ard,
>
> On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> > John reports [0] that recent changes to the way the linear region is mapped
> > are causing ~30 second stalls on boot, and this turns out to be due to the
> > way we remap the linear space with nG mappings when enabling KPTI, which
> > occurs with the MMU and caches off. Recent kernels map the linear region down
> > to pages by default, so that restricted permissions can be applied at page
> > granularity (so that module .text and .rodata are no longer writable via
> > the linear map), and so the number of page table entries that require
> > updating from G to nG is much higher, resulting in the observed delays.
> >
> > This series refactors the logic that tests whether KPTI should be enabled
> > so that the conditions that apply regardless of CPU identification are
> > evaluated early, allowing us to create the mappings of the kernel and
> > the linear region with nG attributes from the outset.
> >
> > Patches #1 to #3 implement this so that we only create these nG mappings
> > if KPTI is force enabled.
> >
> > As a followup, #4 may be applied, which inverts the logic so that nG mappings
> > are always created, unless KPTI is forced off. This allows us to get rid
> > of the slow and complex asm routines that make this change later on, which
> > is where the boot delays occur.
>
> I think I'd like to keep those for now, since it would be nice to use global
> mappings for the vast majority of CPUs that are not affected by kpti.
>
> I had a crack at the problem and ended up with a slightly simpler diff
> below.
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 7689c7aa1d77..feb05f7b46b1 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -16,6 +16,8 @@
> #ifndef __ASM_MMU_H
> #define __ASM_MMU_H
>
> +#include <asm/cputype.h>
> +
> #define MMCF_AARCH32 0x1 /* mm context flag for AArch32 executables */
> #define USER_ASID_BIT 48
> #define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
> @@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
>        cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
> }
>
> +static inline bool arm64_kernel_use_ng_mappings(void)
> +{
> +        bool tx1_bug = false;
> +
> +        if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
> +                return false;
> +
> +        if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> +                return arm64_kernel_unmapped_at_el0();
> +
This is all pretty subtle, but it looks correct to me. We are relying on the

> +        if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
> +                extern const struct midr_range cavium_erratum_27456_cpus[];
> +
> +                tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
> +                        cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
> +                        is_midr_in_range_list(read_cpuid_id(),
> +                                              cavium_erratum_27456_cpus);
> +        }
> +
> +        return !tx1_bug;

Should we make this !tx1_bug && kaslr_offset() > 0 ? That way, we take
'nokaslr' on the command line into account as well (see my patch #1)

> +}
> +
> typedef void (*bp_hardening_cb_t)(void);
>
> struct bp_hardening_data {
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index 78b942c1bea4..986e41c4c32b 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -37,8 +37,8 @@
> #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
> #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>
> -#define PTE_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
> -#define PMD_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
> +#define PTE_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
> +#define PMD_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
>
> #define PROT_DEFAULT (_PROT_DEFAULT | PTE_MAYBE_NG)
> #define PROT_SECT_DEFAULT (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ff2fda3a98e1..2c42f57cf529 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> #endif
>
> #ifdef CONFIG_CAVIUM_ERRATUM_27456
> -static const struct midr_range cavium_erratum_27456_cpus[] = {
> +const struct midr_range cavium_erratum_27456_cpus[] = {
>         /* Cavium ThunderX, T88 pass 1.x - 2.1 */
>         MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
>         /* Cavium ThunderX, T81 pass 1.0 */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 7ea0b2f20262..c68f6ad6722f 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
>         if (kpti_applied)
>                 return;
>
> +        if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
> +                return;
> +

This is only called if the capability is detected, right? So how could
__kpti_forced be negative in this case?

In any case, I don't think __kpti_forced is relevant here: if we enter
this function, we can exit early if arm64_kernel_use_ng_mappings() is
guaranteed never to have returned 'false' at any point, and since we
know we won't enter this function on TX1, testing for KASLR should be
sufficient (but please include kaslr_offset() > 0)

>         remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
>
>         cpu_install_idmap();
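Folding Ard's suggestion about the final return into Will's helper gives roughly the sketch below. This is only an illustration of the proposal under discussion, not a hunk from either posted diff, and it assumes kaslr_offset() (from asm/memory.h) is usable in this header:

/*
 * Sketch only: Will's arm64_kernel_use_ng_mappings() with the suggested
 * "&& kaslr_offset() > 0" folded into the final return, so that booting
 * with 'nokaslr' (a KASLR offset of zero) keeps global kernel mappings.
 */
static inline bool arm64_kernel_use_ng_mappings(void)
{
        bool tx1_bug = false;

        if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
                return false;

        /* Without KASLR, this simply tracks whether KPTI is enabled. */
        if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
                return arm64_kernel_unmapped_at_el0();

        /* ThunderX cannot use nG kernel mappings (erratum 27456). */
        if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
                extern const struct midr_range cavium_erratum_27456_cpus[];

                tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
                        cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
                        is_midr_in_range_list(read_cpuid_id(),
                                              cavium_erratum_27456_cpus);
        }

        /* Use nG from the outset only if the kernel was actually randomized. */
        return !tx1_bug && kaslr_offset() > 0;
}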
On Thu, 13 Dec 2018 at 19:45, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Thu, 13 Dec 2018 at 19:05, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Hi Ard,
> >
> > On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> > > John reports [0] that recent changes to the way the linear region is mapped
> > > are causing ~30 second stalls on boot, and this turns out to be due to the
> > > way we remap the linear space with nG mappings when enabling KPTI, which
> > > occurs with the MMU and caches off. Recent kernels map the linear region down
> > > to pages by default, so that restricted permissions can be applied at page
> > > granularity (so that module .text and .rodata are no longer writable via
> > > the linear map), and so the number of page table entries that require
> > > updating from G to nG is much higher, resulting in the observed delays.
> > >
> > > This series refactors the logic that tests whether KPTI should be enabled
> > > so that the conditions that apply regardless of CPU identification are
> > > evaluated early, allowing us to create the mappings of the kernel and
> > > the linear region with nG attributes from the outset.
> > >
> > > Patches #1 to #3 implement this so that we only create these nG mappings
> > > if KPTI is force enabled.
> > >
> > > As a followup, #4 may be applied, which inverts the logic so that nG mappings
> > > are always created, unless KPTI is forced off. This allows us to get rid
> > > of the slow and complex asm routines that make this change later on, which
> > > is where the boot delays occur.
> >
> > I think I'd like to keep those for now, since it would be nice to use global
> > mappings for the vast majority of CPUs that are not affected by kpti.
> >
> > I had a crack at the problem and ended up with a slightly simpler diff
> > below.
> >
> > Will
> >
> > --->8
> >
> > diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> > index 7689c7aa1d77..feb05f7b46b1 100644
> > --- a/arch/arm64/include/asm/mmu.h
> > +++ b/arch/arm64/include/asm/mmu.h
> > @@ -16,6 +16,8 @@
> > #ifndef __ASM_MMU_H
> > #define __ASM_MMU_H
> >
> > +#include <asm/cputype.h>
> > +
> > #define MMCF_AARCH32 0x1 /* mm context flag for AArch32 executables */
> > #define USER_ASID_BIT 48
> > #define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
> > @@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
> >        cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
> > }
> >
> > +static inline bool arm64_kernel_use_ng_mappings(void)
> > +{
> > +        bool tx1_bug = false;
> > +
> > +        if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
> > +                return false;
> > +
> > +        if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > +                return arm64_kernel_unmapped_at_el0();
> > +
> This is all pretty subtle, but it looks correct to me. We are relying on the
>

.... fact that arm64_kernel_unmapped_at_el0() will not return true before the
CPU feature check has completed.

> > +        if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
> > +                extern const struct midr_range cavium_erratum_27456_cpus[];
> > +
> > +                tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
> > +                        cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
> > +                        is_midr_in_range_list(read_cpuid_id(),
> > +                                              cavium_erratum_27456_cpus);
> > +        }
> > +
> > +        return !tx1_bug;
>
> Should we make this !tx1_bug && kaslr_offset() > 0 ? That way, we take
> 'nokaslr' on the command line into account as well (see my patch #1)
>
> > +}
> > +
> > typedef void (*bp_hardening_cb_t)(void);
> >
> > struct bp_hardening_data {
> > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> > index 78b942c1bea4..986e41c4c32b 100644
> > --- a/arch/arm64/include/asm/pgtable-prot.h
> > +++ b/arch/arm64/include/asm/pgtable-prot.h
> > @@ -37,8 +37,8 @@
> > #define _PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
> > #define _PROT_SECT_DEFAULT (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
> >
> > -#define PTE_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
> > -#define PMD_MAYBE_NG (arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
> > +#define PTE_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
> > +#define PMD_MAYBE_NG (arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
> >
> > #define PROT_DEFAULT (_PROT_DEFAULT | PTE_MAYBE_NG)
> > #define PROT_SECT_DEFAULT (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index ff2fda3a98e1..2c42f57cf529 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> > #endif
> >
> > #ifdef CONFIG_CAVIUM_ERRATUM_27456
> > -static const struct midr_range cavium_erratum_27456_cpus[] = {
> > +const struct midr_range cavium_erratum_27456_cpus[] = {
> >         /* Cavium ThunderX, T88 pass 1.x - 2.1 */
> >         MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
> >         /* Cavium ThunderX, T81 pass 1.0 */
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 7ea0b2f20262..c68f6ad6722f 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
> >         if (kpti_applied)
> >                 return;
> >
> > +        if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
> > +                return;
> > +
>
> This is only called if the capability is detected, right? So how could
> __kpti_forced be negative in this case?
>
> In any case, I don't think __kpti_forced is relevant here: if we enter
> this function, we can exit early if arm64_kernel_use_ng_mappings() is
> guaranteed never to have returned 'false' at any point, and since we
> know we won't enter this function on TX1, testing for KASLR should be
> sufficient (but please include kaslr_offset() > 0)
>
> >         remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
> >
> >         cpu_install_idmap();
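Ard's closing remark points at an early exit in kpti_install_ng_mappings() along the lines of the sketch below. The surrounding lines are reproduced for context only and the exact condition was still being discussed in this thread; it assumes the test he asks for is the same kaslr_offset() > 0 check as above:

        if (kpti_applied)
                return;

        /*
         * Sketch of the suggested early exit: if KASLR is configured and
         * actually randomized the kernel (non-zero offset), the page tables
         * were already created with nG attributes at boot, so the slow
         * idmap-based remap below can be skipped entirely.
         */
        if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
                return;

        remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);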