
arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear

Message ID 1564496445-53486-1-git-send-email-julien.thierry.kdev@gmail.com (mailing list archive)
State New, archived
Series arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear

Commit Message

Julien July 30, 2019, 2:20 p.m. UTC
From: Marc Zyngier <marc.zyngier@arm.com>

The GICv3 architecture specification is incredibly misleading when it
comes to PMR and the requirement for a DSB. It turns out that this DSB
is only required if the CPU interface sends an Upstream Control
message to the redistributor in order to update the RD's view of PMR.

This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
the case in Linux. It can still be set from EL3, so some special care
is required. But the upshot is that in the (hopefully large) majority
of the cases, we can drop the DSB altogether.

This requires yet another capability and some more runtime patching.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[JT: rebased on top of priority masking fixes,
     factorize pmr synchronization]
Signed-off-by: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/barrier.h   |  8 ++++++++
 arch/arm64/include/asm/cpucaps.h   |  3 ++-
 arch/arm64/include/asm/daifflags.h |  3 ++-
 arch/arm64/include/asm/irqflags.h  | 19 ++++++++++---------
 arch/arm64/include/asm/kvm_host.h  |  3 +--
 arch/arm64/kernel/cpufeature.c     | 33 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/entry.S          |  4 ++--
 arch/arm64/kvm/hyp/switch.c        |  3 ++-
 include/linux/irqchip/arm-gic-v3.h |  2 ++
 9 files changed, 62 insertions(+), 16 deletions(-)

Testing this on d05, there seems to be a ~15% improvement on hackbench.
This brings us on par with the no CONFIG_ARM64_PSEUDO_NMI build (or
even better):

Command: hackbench 200 process 1000

Average over 20 runs:
- v5.3-rc2, no irq priorities: 8.57345 sec
- v5.3-rc2, irq priorities: 9.99225 sec
- v5.3-rc2, "relaxed": 8.26705 sec

--
1.9.1

Comments

Will Deacon Aug. 1, 2019, 10:41 a.m. UTC | #1
On Tue, Jul 30, 2019 at 03:20:45PM +0100, Julien Thierry wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
> 
> The GICv3 architecture specification is incredibly misleading when it
> comes to PMR and the requirement for a DSB. It turns out that this DSB
> is only required if the CPU interface sends an Upstream Control
> message to the redistributor in order to update the RD's view of PMR.
> 
> This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
> the case in Linux. It can still be set from EL3, so some special care
> is required. But the upshot is that in the (hopefully large) majority
> of the cases, we can drop the DSB altogether.
> 
> This requires yet another capability and some more runtime patching.

Hmm, does this actually require explicit runtime patching, or can we make
things a bit simpler with a static key?

Will
Marc Zyngier Aug. 1, 2019, 10:51 a.m. UTC | #2
On 01/08/2019 11:41, Will Deacon wrote:
> On Tue, Jul 30, 2019 at 03:20:45PM +0100, Julien Thierry wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> The GICv3 architecture specification is incredibly misleading when it
>> comes to PMR and the requirement for a DSB. It turns out that this DSB
>> is only required if the CPU interface sends an Upstream Control
>> message to the redistributor in order to update the RD's view of PMR.
>>
>> This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
>> the case in Linux. It can still be set from EL3, so some special care
>> is required. But the upshot is that in the (hopefully large) majority
>> of the cases, we can drop the DSB altogether.
>>
>> This requires yet another capability and some more runtime patching.
> 
> Hmm, does this actually require explicit runtime patching, or can we make
> things a bit simpler with a static key?

The hunk in entry.S is the blocker, AFAICS. Do we have a way to express
static keys in asm?

	M.
Julien Aug. 1, 2019, 10:56 a.m. UTC | #3
On Thu, 1 Aug 2019 at 11:51, Marc Zyngier <marc.zyngier@arm.com> wrote:
>
> On 01/08/2019 11:41, Will Deacon wrote:
> > On Tue, Jul 30, 2019 at 03:20:45PM +0100, Julien Thierry wrote:
> >> From: Marc Zyngier <marc.zyngier@arm.com>
> >>
> >> The GICv3 architecture specification is incredibly misleading when it
> >> comes to PMR and the requirement for a DSB. It turns out that this DSB
> >> is only required if the CPU interface sends an Upstream Control
> >> message to the redistributor in order to update the RD's view of PMR.
> >>
> >> This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
> >> the case in Linux. It can still be set from EL3, so some special care
> >> is required. But the upshot is that in the (hopefully large) majority
> >> of the cases, we can drop the DSB altogether.
> >>
> >> This requires yet another capability and some more runtime patching.
> >
> > Hmm, does this actually require explicit runtime patching, or can we make
> > things a bit simpler with a static key?
>
> The hunk in entry.S is the blocker, AFAICS. Do we have a way to express
> static keys in asm?
>

Not that I'm aware of. I could leave the alternative in entry.S and
use a static_key for the pmr_sync() macro.

Does it change much overall? I don't see the static key simplifying
things too much, but I don't mind using that instead.

Cheers,
Marc Zyngier Aug. 1, 2019, 11:07 a.m. UTC | #4
On 01/08/2019 11:56, Julien Thierry wrote:
> On Thu, 1 Aug 2019 at 11:51, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>
>> On 01/08/2019 11:41, Will Deacon wrote:
>>> On Tue, Jul 30, 2019 at 03:20:45PM +0100, Julien Thierry wrote:
>>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>>>
>>>> The GICv3 architecture specification is incredibly misleading when it
>>>> comes to PMR and the requirement for a DSB. It turns out that this DSB
>>>> is only required if the CPU interface sends an Upstream Control
>>>> message to the redistributor in order to update the RD's view of PMR.
>>>>
>>>> This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
>>>> the case in Linux. It can still be set from EL3, so some special care
>>>> is required. But the upshot is that in the (hopefully large) majority
>>>> of the cases, we can drop the DSB altogether.
>>>>
>>>> This requires yet another capability and some more runtime patching.
>>>
>>> Hmm, does this actually require explicit runtime patching, or can we make
>>> things a bit simpler with a static key?
>>
>> The hunk in entry.S is the blocker, AFAICS. Do we have a way to express
>> static keys in asm?
>>
> 
> Not that I'm aware of. I could leave the alternative in entry.S and
> use a static_key for the pmr_sync() macro.

I'm not sure that helps. It means we end up with two mechanisms to keep
in sync instead of a single one.

> Does it change much over all? I don't see the static key simplifying
> things too much, but I don't mind using that instead.

The complexity is the same. The added benefit is that we can control it
from the GIC code rather than the architecture code. But that's assuming
we can do it all using a static key...

Thanks,

	M.
Will Deacon Aug. 1, 2019, 11:13 a.m. UTC | #5
On Thu, Aug 01, 2019 at 12:07:58PM +0100, Marc Zyngier wrote:
> On 01/08/2019 11:56, Julien Thierry wrote:
> > On Thu, 1 Aug 2019 at 11:51, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>
> >> On 01/08/2019 11:41, Will Deacon wrote:
> >>> On Tue, Jul 30, 2019 at 03:20:45PM +0100, Julien Thierry wrote:
> >>>> From: Marc Zyngier <marc.zyngier@arm.com>
> >>>>
> >>>> The GICv3 architecture specification is incredibly misleading when it
> >>>> comes to PMR and the requirement for a DSB. It turns out that this DSB
> >>>> is only required if the CPU interface sends an Upstream Control
> >>>> message to the redistributor in order to update the RD's view of PMR.
> >>>>
> >>>> This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
> >>>> the case in Linux. It can still be set from EL3, so some special care
> >>>> is required. But the upshot is that in the (hopefully large) majority
> >>>> of the cases, we can drop the DSB altogether.
> >>>>
> >>>> This requires yet another capability and some more runtime patching.
> >>>
> >>> Hmm, does this actually require explicit runtime patching, or can we make
> >>> things a bit simpler with a static key?
> >>
> >> The hunk in entry.S is the blocker, AFAICS. Do we have a way to express
> >> static keys in asm?
> >>
> > 
> > Not that I'm aware of. I could leave the alternative in entry.S and
> > use a static_key for the pmr_sync() macro.
> 
> I'm not sure that helps. It means we end up with two mechanisms to keep
> in sync instead of a single one.

Yes, I missed the entry.S part initially.

> > Does it change much overall? I don't see the static key simplifying
> > things too much, but I don't mind using that instead.
> 
> The complexity is the same. The added benefit is that we can control it
> from the GIC code rather than the architecture code. But that's assuming
> we can do it all using a static key...

Well I think we should look at the numbers for static key + conditional
branch in entry.S. If entry is being hammered, the predictor should do
its job (ignoring Spectre-V2). If entry isn't being hammered, then it
shouldn't matter.

Will

Patch

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index e0e2b19..bca9faf 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -29,6 +29,14 @@ 
 						 SB_BARRIER_INSN"nop\n",	\
 						 ARM64_HAS_SB))

+#ifdef CONFIG_ARM64_PSEUDO_NMI
+#define pmr_sync()	asm volatile(ALTERNATIVE("nop",		\
+						 "dsb	sy",	\
+						 ARM64_PMR_REQUIRES_SYNC))
+#else
+#define pmr_sync()	do {} while (0)
+#endif
+
 #define mb()		dsb(sy)
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index f19fe4b..616437d 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -52,7 +52,8 @@ 
 #define ARM64_HAS_IRQ_PRIO_MASKING		42
 #define ARM64_HAS_DCPODP			43
 #define ARM64_WORKAROUND_1463225		44
+#define ARM64_PMR_REQUIRES_SYNC			45

-#define ARM64_NCAPS				45
+#define ARM64_NCAPS				46

 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 987926e..00b1679 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -8,6 +8,7 @@ 
 #include <linux/irqflags.h>

 #include <asm/arch_gicv3.h>
+#include <asm/barrier.h>
 #include <asm/cpufeature.h>

 #define DAIF_PROCCTX		0
@@ -63,7 +64,7 @@  static inline void local_daif_restore(unsigned long flags)

 		if (system_uses_irq_prio_masking()) {
 			gic_write_pmr(GIC_PRIO_IRQON);
-			dsb(sy);
+			pmr_sync();
 		}
 	} else if (system_uses_irq_prio_masking()) {
 		u64 pmr;
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 7872f26..a5e7115 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -8,6 +8,7 @@ 
 #ifdef __KERNEL__

 #include <asm/alternative.h>
+#include <asm/barrier.h>
 #include <asm/ptrace.h>
 #include <asm/sysreg.h>

@@ -36,14 +37,14 @@  static inline void arch_local_irq_enable(void)
 	}

 	asm volatile(ALTERNATIVE(
-		"msr	daifclr, #2		// arch_local_irq_enable\n"
-		"nop",
-		__msr_s(SYS_ICC_PMR_EL1, "%0")
-		"dsb	sy",
+		"msr	daifclr, #2		// arch_local_irq_enable",
+		__msr_s(SYS_ICC_PMR_EL1, "%0"),
 		ARM64_HAS_IRQ_PRIO_MASKING)
 		:
 		: "r" ((unsigned long) GIC_PRIO_IRQON)
 		: "memory");
+
+	pmr_sync();
 }

 static inline void arch_local_irq_disable(void)
@@ -118,14 +119,14 @@  static inline unsigned long arch_local_irq_save(void)
 static inline void arch_local_irq_restore(unsigned long flags)
 {
 	asm volatile(ALTERNATIVE(
-			"msr	daif, %0\n"
-			"nop",
-			__msr_s(SYS_ICC_PMR_EL1, "%0")
-			"dsb	sy",
-			ARM64_HAS_IRQ_PRIO_MASKING)
+		"msr	daif, %0",
+		__msr_s(SYS_ICC_PMR_EL1, "%0"),
+		ARM64_HAS_IRQ_PRIO_MASKING)
 		:
 		: "r" (flags)
 		: "memory");
+
+	pmr_sync();
 }

 #endif
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f656169..5ecb091 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -600,8 +600,7 @@  static inline void kvm_arm_vhe_guest_enter(void)
 	 * local_daif_mask() already sets GIC_PRIO_PSR_I_SET, we just need a
 	 * dsb to ensure the redistributor is forwards EL2 IRQs to the CPU.
 	 */
-	if (system_uses_irq_prio_masking())
-		dsb(sy);
+	pmr_sync();
 }

 static inline void kvm_arm_vhe_guest_exit(void)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index f29f36a..b1c036f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1246,6 +1246,26 @@  static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 {
 	return enable_pseudo_nmi && has_useable_gicv3_cpuif(entry, scope);
 }
+
+static bool check_icc_ctlr_pmhe(const struct arm64_cpu_capabilities *entry,
+				int scope)
+{
+	bool res = can_use_gic_priorities(entry, scope);
+
+	if (res) {
+		u64 val;
+
+		/*
+		 * Linux itself doesn't use 1:N distribution, so has
+		 * no need to set PMHE. The only reason to have it set
+		 * is if EL3 requires it (and we can't change it)
+		 */
+		val = read_sysreg_s(SYS_ICC_CTLR_EL1);
+		res &= !!(val & ICC_CTLR_EL1_PMHE_MASK);
+	}
+
+	return res;
+}
 #endif

 static const struct arm64_cpu_capabilities arm64_features[] = {
@@ -1547,6 +1567,19 @@  static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 		.sign = FTR_UNSIGNED,
 		.min_field_value = 1,
 	},
+	{
+		/*
+		 * Depends on using IRQ priority masking
+		 */
+		.desc = "IRQ priority masking requires synchronization",
+		.capability = ARM64_PMR_REQUIRES_SYNC,
+		.type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
+		.matches = check_icc_ctlr_pmhe,
+		.sys_reg = SYS_ID_AA64PFR0_EL1,
+		.field_pos = ID_AA64PFR0_GIC_SHIFT,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = 1,
+	},
 #endif
 	{},
 };
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 320a30d..d35ceee 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -269,9 +269,9 @@  alternative_else_nop_endif
 alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	ldr	x20, [sp, #S_PMR_SAVE]
 	msr_s	SYS_ICC_PMR_EL1, x20
-	/* Ensure priority change is seen by redistributor */
-	dsb	sy
 alternative_else_nop_endif
+	/* Ensure priority change is seen by redistributor */
+alternative_insn nop, "dsb sy", ARM64_PMR_REQUIRES_SYNC

 	ldp	x21, x22, [sp, #S_PC]		// load ELR, SPSR
 	.if	\el == 0
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index adaf266..5bb9312 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -13,6 +13,7 @@ 
 #include <kvm/arm_psci.h>

 #include <asm/arch_gicv3.h>
+#include <asm/barrier.h>
 #include <asm/cpufeature.h>
 #include <asm/kprobes.h>
 #include <asm/kvm_asm.h>
@@ -605,7 +606,7 @@  int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	 */
 	if (system_uses_irq_prio_masking()) {
 		gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
-		dsb(sy);
+		pmr_sync();
 	}

 	vcpu = kern_hyp_va(vcpu);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 67c4b98..74a8a3a 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -460,6 +460,8 @@ 
 #define ICC_CTLR_EL1_EOImode_MASK	(1 << ICC_CTLR_EL1_EOImode_SHIFT)
 #define ICC_CTLR_EL1_CBPR_SHIFT		0
 #define ICC_CTLR_EL1_CBPR_MASK		(1 << ICC_CTLR_EL1_CBPR_SHIFT)
+#define ICC_CTLR_EL1_PMHE_SHIFT		6
+#define ICC_CTLR_EL1_PMHE_MASK		(1 << ICC_CTLR_EL1_PMHE_SHIFT)
 #define ICC_CTLR_EL1_PRI_BITS_SHIFT	8
 #define ICC_CTLR_EL1_PRI_BITS_MASK	(0x7 << ICC_CTLR_EL1_PRI_BITS_SHIFT)
 #define ICC_CTLR_EL1_ID_BITS_SHIFT	11