
[RFC,0/4] arm64: kpti: use nG mappings unless KPTI is force disabled

Message ID: 20181213172036.14504-1-ard.biesheuvel@linaro.org

Message

Ard Biesheuvel Dec. 13, 2018, 5:20 p.m. UTC
John reports [0] that recent changes to the way the linear region is mapped
are causing ~30 second stalls on boot, and this turns out to be due to the
way we remap the linear region with nG mappings when enabling KPTI, which
occurs with the MMU and caches off. Recent kernels map the linear region down
to pages by default, so that restricted permissions can be applied at page
granularity (so that module .text and .rodata are no longer writable via
the linear map), and so the number of page table entries that require
updating from G to nG is much higher, resulting in the observed delays.

This series refactors the logic that tests whether KPTI should be enabled
so that the conditions that apply regardless of CPU identification are
evaluated early, allowing us to create the mappings of the kernel and
the linear region with nG attributes from the outset.

Patches #1 to #3 implement this so that we only create these nG mappings
if KPTI is force enabled.

As a followup, #4 may be applied, which inverts the logic so that nG mappings
are always created, unless KPTI is forced off. This allows us to get rid
of the slow and complex asm routines that make this change later on, which
is where the boot delays occur.
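
As a rough sketch of the end state (using the helper name from the diff
posted later in this thread, not necessarily the literal contents of the
series), the nG decision becomes part of the protection macros used when
the page tables are first created:

	#define PTE_MAYBE_NG	(arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
	#define PMD_MAYBE_NG	(arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)

rather than a late rewrite of every live kernel page table entry from G to
nG with the MMU and caches off.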

[0] https://marc.info/?l=linux-arm-kernel&m=154463433605723

Cc: Will Deacon <will.deacon@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>

Ard Biesheuvel (4):
  arm64: kpti: enable KPTI only when KASLR is truly enabled.
  arm64: kpti: add helper to decide whether nG mappings should be used
    early
  arm64: kpti: use nG mappings from the outset if kpti is force enabled
  arm64: kpti: use non-global mappings unless KPTI is forced off

 arch/arm64/include/asm/cpufeature.h |   7 +
 arch/arm64/kernel/cpufeature.c      |  63 +++----
 arch/arm64/mm/mmu.c                 |  14 ++
 arch/arm64/mm/proc.S                | 189 --------------------
 4 files changed, 48 insertions(+), 225 deletions(-)

Comments

Will Deacon Dec. 13, 2018, 6:05 p.m. UTC | #1
Hi Ard,

On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> John reports [0] that recent changes to the way the linear region is mapped
> are causing ~30 second stalls on boot, and this turns out to be due to the
> way we remap the linear region with nG mappings when enabling KPTI, which
> occurs with the MMU and caches off. Recent kernels map the linear region down
> to pages by default, so that restricted permissions can be applied at page
> granularity (so that module .text and .rodata are no longer writable via
> the linear map), and so the number of page table entries that require
> updating from G to nG is much higher, resulting in the observed delays.
> 
> This series refactors the logic that tests whether KPTI should be enabled
> so that the conditions that apply regardless of CPU identification are
> evaluated early, allowing us to create the mappings of the kernel and
> the linear region with nG attributes from the outset.
> 
> Patches #1 to #3 implement this so that we only create these nG mappings
> if KPTI is force enabled.
> 
> As a followup, #4 may be applied, which inverts the logic so that nG mappings
> are always created, unless KPTI is forced off. This allows us to get rid
> of the slow and complex asm routines that make this change later on, which
> is where the boot delays occur.

I think I'd like to keep those for now, since it would be nice to use global
mappings for the vast majority of CPUs that are not affected by kpti.

I had a crack at the problem and ended up with a slightly simpler diff
below.

Will

--->8

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 7689c7aa1d77..feb05f7b46b1 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_MMU_H
 #define __ASM_MMU_H
 
+#include <asm/cputype.h>
+
 #define MMCF_AARCH32	0x1	/* mm context flag for AArch32 executables */
 #define USER_ASID_BIT	48
 #define USER_ASID_FLAG	(UL(1) << USER_ASID_BIT)
@@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
 	       cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
 }
 
+static inline bool arm64_kernel_use_ng_mappings(void)
+{
+	bool tx1_bug = false;
+
+	if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+		return false;
+
+	if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+		return arm64_kernel_unmapped_at_el0();
+
+	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
+		extern const struct midr_range cavium_erratum_27456_cpus[];
+
+		tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
+			  cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
+			  is_midr_in_range_list(read_cpuid_id(),
+						cavium_erratum_27456_cpus);
+	}
+
+	return !tx1_bug;
+}
+
 typedef void (*bp_hardening_cb_t)(void);
 
 struct bp_hardening_data {
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 78b942c1bea4..986e41c4c32b 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -37,8 +37,8 @@
 #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
 
-#define PTE_MAYBE_NG		(arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
-#define PMD_MAYBE_NG		(arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
+#define PTE_MAYBE_NG		(arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
+#define PMD_MAYBE_NG		(arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
 
 #define PROT_DEFAULT		(_PROT_DEFAULT | PTE_MAYBE_NG)
 #define PROT_SECT_DEFAULT	(_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ff2fda3a98e1..2c42f57cf529 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
 #endif
 
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
-static const struct midr_range cavium_erratum_27456_cpus[] = {
+const struct midr_range cavium_erratum_27456_cpus[] = {
 	/* Cavium ThunderX, T88 pass 1.x - 2.1 */
 	MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
 	/* Cavium ThunderX, T81 pass 1.0 */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7ea0b2f20262..c68f6ad6722f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
 	if (kpti_applied)
 		return;
 
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
+		return;
+
 	remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
 
 	cpu_install_idmap();
Ard Biesheuvel Dec. 13, 2018, 6:45 p.m. UTC | #2
On Thu, 13 Dec 2018 at 19:05, Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ard,
>
> On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> > John reports [0] that recent changes to the way the linear region is mapped
> > are causing ~30 second stalls on boot, and this turns out to be due to the
> > way we remap the linear region with nG mappings when enabling KPTI, which
> > occurs with the MMU and caches off. Recent kernels map the linear region down
> > to pages by default, so that restricted permissions can be applied at page
> > granularity (so that module .text and .rodata are no longer writable via
> > the linear map), and so the number of page table entries that require
> > updating from G to nG is much higher, resulting in the observed delays.
> >
> > This series refactors the logic that tests whether KPTI should be enabled
> > so that the conditions that apply regardless of CPU identification are
> > evaluated early, allowing us to create the mappings of the kernel and
> > the linear region with nG attributes from the outset.
> >
> > Patches #1 to #3 implement this so that we only create these nG mappings
> > if KPTI is force enabled.
> >
> > As a followup, #4 may be applied, which inverts the logic so that nG mappings
> > are always created, unless KPTI is forced off. This allows us to get rid
> > of the slow and complex asm routines that make this change later on, which
> > is where the boot delays occur.
>
> I think I'd like to keep those for now, since it would be nice to use global
> mappings for the vast majority of CPUs that are not affected by kpti.
>
> I had a crack at the problem and ended up with a slightly simpler diff
> below.
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 7689c7aa1d77..feb05f7b46b1 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -16,6 +16,8 @@
>  #ifndef __ASM_MMU_H
>  #define __ASM_MMU_H
>
> +#include <asm/cputype.h>
> +
>  #define MMCF_AARCH32   0x1     /* mm context flag for AArch32 executables */
>  #define USER_ASID_BIT  48
>  #define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
> @@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
>                cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
>  }
>
> +static inline bool arm64_kernel_use_ng_mappings(void)
> +{
> +       bool tx1_bug = false;
> +
> +       if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
> +               return false;
> +
> +       if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> +               return arm64_kernel_unmapped_at_el0();
> +

This is all pretty subtle, but it looks correct to me. We are relying on the

> +       if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
> +               extern const struct midr_range cavium_erratum_27456_cpus[];
> +
> +               tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
> +                         cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
> +                         is_midr_in_range_list(read_cpuid_id(),
> +                                               cavium_erratum_27456_cpus);
> +       }
> +
> +       return !tx1_bug;

Should we make this !tx1_bug && kaslr_offset() > 0 ? That way, we take
'nokaslr' on the command line into account as well (see my patch #1)
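
i.e., something like (sketch; this assumes kaslr_offset() is in scope in
mmu.h):

	/* only use nG mappings if KASLR actually randomized the kernel */
	return !tx1_bug && kaslr_offset() > 0;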

> +}
> +
>  typedef void (*bp_hardening_cb_t)(void);
>
>  struct bp_hardening_data {
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index 78b942c1bea4..986e41c4c32b 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -37,8 +37,8 @@
>  #define _PROT_DEFAULT          (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>  #define _PROT_SECT_DEFAULT     (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>
> -#define PTE_MAYBE_NG           (arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
> -#define PMD_MAYBE_NG           (arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
> +#define PTE_MAYBE_NG           (arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
> +#define PMD_MAYBE_NG           (arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
>
>  #define PROT_DEFAULT           (_PROT_DEFAULT | PTE_MAYBE_NG)
>  #define PROT_SECT_DEFAULT      (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ff2fda3a98e1..2c42f57cf529 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
>  #endif
>
>  #ifdef CONFIG_CAVIUM_ERRATUM_27456
> -static const struct midr_range cavium_erratum_27456_cpus[] = {
> +const struct midr_range cavium_erratum_27456_cpus[] = {
>         /* Cavium ThunderX, T88 pass 1.x - 2.1 */
>         MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
>         /* Cavium ThunderX, T81 pass 1.0 */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 7ea0b2f20262..c68f6ad6722f 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
>         if (kpti_applied)
>                 return;
>
> +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
> +               return;
> +

This is only called if the capability is detected, right? So how could
__kpti_forced be negative in this case?

In any case, I don't think __kpti_forced is relevant here: if we enter
this function, we can exit early if arm64_kernel_use_ng_mappings() is
guaranteed never to have returned 'false' at any point, and since we
know we won't enter this function on TX1, testing for KASLR should be
sufficient (but please include kaslr_offset() > 0)
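
Concretely, something like this at the top of kpti_install_ng_mappings()
(sketch, with the kaslr_offset() check folded in as suggested above):

	/*
	 * If the kernel mappings were created nG from the outset,
	 * there is nothing left to remap here.
	 */
	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
		return;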


>         remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
>
>         cpu_install_idmap();
Ard Biesheuvel Dec. 13, 2018, 8:10 p.m. UTC | #3
On Thu, 13 Dec 2018 at 19:45, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Thu, 13 Dec 2018 at 19:05, Will Deacon <will.deacon@arm.com> wrote:
> >
> > Hi Ard,
> >
> > On Thu, Dec 13, 2018 at 06:20:31PM +0100, Ard Biesheuvel wrote:
> > > John reports [0] that recent changes to the way the linear region is mapped
> > > are causing ~30 second stalls on boot, and this turns out to be due to the
> > > way we remap the linear region with nG mappings when enabling KPTI, which
> > > occurs with the MMU and caches off. Recent kernels map the linear region down
> > > to pages by default, so that restricted permissions can be applied at page
> > > granularity (so that module .text and .rodata are no longer writable via
> > > the linear map), and so the number of page table entries that require
> > > updating from G to nG is much higher, resulting in the observed delays.
> > >
> > > This series refactors the logic that tests whether KPTI should be enabled
> > > so that the conditions that apply regardless of CPU identification are
> > > evaluated early, allowing us to create the mappings of the kernel and
> > > the linear region with nG attributes from the outset.
> > >
> > > Patches #1 to #3 implement this so that we only create these nG mappings
> > > if KPTI is force enabled.
> > >
> > > As a followup, #4 may be applied, which inverts the logic so that nG mappings
> > > are always created, unless KPTI is forced off. This allows us to get rid
> > > of the slow and complex asm routines that make this change later on, which
> > > is where the boot delays occur.
> >
> > I think I'd like to keep those for now, since it would be nice to use global
> > mappings for the vast majority of CPUs that are not affected by kpti.
> >
> > I had a crack at the problem and ended up with a slightly simpler diff
> > below.
> >
> > Will
> >
> > --->8
> >
> > diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> > index 7689c7aa1d77..feb05f7b46b1 100644
> > --- a/arch/arm64/include/asm/mmu.h
> > +++ b/arch/arm64/include/asm/mmu.h
> > @@ -16,6 +16,8 @@
> >  #ifndef __ASM_MMU_H
> >  #define __ASM_MMU_H
> >
> > +#include <asm/cputype.h>
> > +
> >  #define MMCF_AARCH32   0x1     /* mm context flag for AArch32 executables */
> >  #define USER_ASID_BIT  48
> >  #define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
> > @@ -44,6 +46,28 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
> >                cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
> >  }
> >
> > +static inline bool arm64_kernel_use_ng_mappings(void)
> > +{
> > +       bool tx1_bug = false;
> > +
> > +       if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
> > +               return false;
> > +
> > +       if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > +               return arm64_kernel_unmapped_at_el0();
> > +
>
> This is all pretty subtle, but it looks correct to me. We are relying on the
>

.... fact that arm64_kernel_unmapped_at_el0() will not return true
before the CPU feature check has completed.

> > +       if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
> > +               extern const struct midr_range cavium_erratum_27456_cpus[];
> > +
> > +               tx1_bug = static_branch_likely(&arm64_const_caps_ready) ?
> > +                         cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_27456) :
> > +                         is_midr_in_range_list(read_cpuid_id(),
> > +                                               cavium_erratum_27456_cpus);
> > +       }
> > +
> > +       return !tx1_bug;
>
> Should we make this !tx1_bug && kaslr_offset() > 0 ? That way, we take
> 'nokaslr' on the command line into account as well (see my patch #1)
>
> > +}
> > +
> >  typedef void (*bp_hardening_cb_t)(void);
> >
> >  struct bp_hardening_data {
> > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> > index 78b942c1bea4..986e41c4c32b 100644
> > --- a/arch/arm64/include/asm/pgtable-prot.h
> > +++ b/arch/arm64/include/asm/pgtable-prot.h
> > @@ -37,8 +37,8 @@
> >  #define _PROT_DEFAULT          (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
> >  #define _PROT_SECT_DEFAULT     (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
> >
> > -#define PTE_MAYBE_NG           (arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
> > -#define PMD_MAYBE_NG           (arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
> > +#define PTE_MAYBE_NG           (arm64_kernel_use_ng_mappings() ? PTE_NG : 0)
> > +#define PMD_MAYBE_NG           (arm64_kernel_use_ng_mappings() ? PMD_SECT_NG : 0)
> >
> >  #define PROT_DEFAULT           (_PROT_DEFAULT | PTE_MAYBE_NG)
> >  #define PROT_SECT_DEFAULT      (_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index ff2fda3a98e1..2c42f57cf529 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -539,7 +539,7 @@ static const struct midr_range arm64_harden_el2_vectors[] = {
> >  #endif
> >
> >  #ifdef CONFIG_CAVIUM_ERRATUM_27456
> > -static const struct midr_range cavium_erratum_27456_cpus[] = {
> > +const struct midr_range cavium_erratum_27456_cpus[] = {
> >         /* Cavium ThunderX, T88 pass 1.x - 2.1 */
> >         MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
> >         /* Cavium ThunderX, T81 pass 1.0 */
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 7ea0b2f20262..c68f6ad6722f 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1006,6 +1006,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
> >         if (kpti_applied)
> >                 return;
> >
> > +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && __kpti_forced <= 0)
> > +               return;
> > +
>
> This is only called if the capability is detected, right? So how could
> __kpti_forced be negative in this case?
>
> In any case, I don't think __kpti_forced is relevant here: if we enter
> this function, we can exit early if arm64_kernel_use_ng_mappings() is
> guaranteed never to have returned 'false' at any point, and since we
> know we won't enter this function on TX1, testing for KASLR should be
> sufficient (but please include kaslr_offset() > 0)
>
>
> >         remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
> >
> >         cpu_install_idmap();