
[v2] KVM: arm64: Don't set HCR_EL2.TVM when S2FWB is supported

Message ID 20191028130541.30536-1-christoffer.dall@arm.com (mailing list archive)
State New, archived
Series [v2] KVM: arm64: Don't set HCR_EL2.TVM when S2FWB is supported

Commit Message

Christoffer Dall Oct. 28, 2019, 1:05 p.m. UTC
On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
tables to override the memory attributes of memory accesses, regardless
of the stage 1 page table configurations, and also when the stage 1 MMU
is turned off.  This results in all memory accesses to RAM being
cacheable, including during early boot of the guest.

On CPUs without this feature, memory accesses were non-cacheable during
boot until the guest turned on the stage 1 MMU, and we had to detect
when the guest turned on the MMU, such that we could invalidate all cache
entries and ensure a consistent view of memory with the MMU turned on.
When the guest turned on the caches, we would call stage2_flush_vm()
from kvm_toggle_cache().

However, stage2_flush_vm() walks all the stage 2 tables, and calls
__kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
nothing.

We can avoid that whole song and dance, and simply not set TVM when
creating a VM on a system that has S2FWB.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
I was only able to test this on the model with cache modeling enabled,
but even removing TVM from HCR_EL2 without having FWB also worked with
that setup, so the testing of this has been light.  It seems like it
should obviously work, but it would be good if someone with access to
appropriate hardware could give this a spin.

 arch/arm64/include/asm/kvm_arm.h     |  3 +--
 arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)
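For readers less familiar with the pre-FWB path this patch short-circuits: a
condensed sketch of what the TVM trap is used for today. This is a paraphrase
of access_vm_reg()/kvm_toggle_cache() from this era's sys_regs.c and
virt/kvm/arm/mmu.c, not the verbatim sources.

/*
 * Sketch (paraphrased): with HCR_EL2.TVM set, guest writes to SCTLR_EL1 and
 * the other "VM ops" registers trap to EL2.  On each trapped write KVM checks
 * whether the MMU/caches just got enabled, and if so flushes stage 2 once so
 * that data written while the caches were off is not shadowed by stale lines.
 */
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
{
	bool now_enabled = vcpu_has_cache_enabled(vcpu);

	/* Clean+invalidate the whole stage 2 on an M+C transition. */
	if (now_enabled != was_enabled)
		stage2_flush_vm(vcpu->kvm);

	/* Caches are on now, no need to keep trapping VM ops. */
	if (now_enabled)
		*vcpu_hcr(vcpu) &= ~HCR_TVM;
}

On an S2FWB system the stage2_flush_vm() call above degenerates to a no-op
walk, which is the "song and dance" the commit message refers to.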

Comments

Mark Rutland Oct. 28, 2019, 1:28 p.m. UTC | #1
On Mon, Oct 28, 2019 at 02:05:41PM +0100, Christoffer Dall wrote:
> On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> tables to override the memory attributes of memory accesses, regardless
> of the stage 1 page table configurations, and also when the stage 1 MMU
> is turned off.  This results in all memory accesses to RAM being
> cacheable, including during early boot of the guest.
> 
> On CPUs without this feature, memory accesses were non-cacheable during
> boot until the guest turned on the stage 1 MMU, and we had to detect
> when the guest turned on the MMU, such that we could invalidate all cache
> entries and ensure a consistent view of memory with the MMU turned on.
> When the guest turned on the caches, we would call stage2_flush_vm()
> from kvm_toggle_cache().
> 
> However, stage2_flush_vm() walks all the stage 2 tables, and calls
> __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
> nothing.
> 
> We can avoid that whole song and dance, and simply not set TVM when
> creating a VM on a system that has S2FWB.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> ---
> I was only able to test this on the model with cache modeling enabled,
> but even removing TVM from HCR_EL2 without having FWB also worked with
> that setup, so the testing of this has been light.  It seems like it
> should obviously work, but it would be good if someone with access to
> appropriate hardware could give this a spin.
> 
>  arch/arm64/include/asm/kvm_arm.h     |  3 +--
>  arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index ddf9d762ac62..6e5d839f42b5 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -61,7 +61,6 @@
>   * RW:		64bit by default, can be overridden for 32bit VMs
>   * TAC:		Trap ACTLR
>   * TSC:		Trap SMC
> - * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
>   * TSW:		Trap cache operations by set/way
>   * TWE:		Trap WFE
>   * TWI:		Trap WFI
> @@ -74,7 +73,7 @@
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> -			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
>  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
>  			 HCR_FMO | HCR_IMO)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d69c1efc63e7..70509799a2a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		/* trap error record accesses */
>  		vcpu->arch.hcr_el2 |= HCR_TERR;
>  	}
> -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
>  		vcpu->arch.hcr_el2 |= HCR_FWB;
> +	} else {
> +		/*
> +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> +		 * get set in SCTLR_EL1 such that we can detect when the guest
> +		 * MMU gets turned off and do the necessary cache maintenance

Typo: s/off/on/ -- this is to make sure the accesses made when the MMU
was off (which aren't cacheable) are visible once the MMU is turned on
(e.g. not shadowed by stale clean cache lines).

Otherwise, this looks good to me, and my R-B stands.

Thanks,
Mark.

> +		 * then.
> +		 */
> +		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> +	}
>  
>  	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>  		vcpu->arch.hcr_el2 &= ~HCR_RW;
> -- 
> 2.18.0
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Marc Zyngier Oct. 28, 2019, 1:45 p.m. UTC | #2
On Mon, 28 Oct 2019 13:05:41 +0000,
Christoffer Dall <christoffer.dall@arm.com> wrote:
> 
> On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> tables to override the memory attributes of memory accesses, regardless
> of the stage 1 page table configurations, and also when the stage 1 MMU
> is turned off.  This results in all memory accesses to RAM being
> cacheable, including during early boot of the guest.
> 
> On CPUs without this feature, memory accesses were non-cacheable during
> boot until the guest turned on the stage 1 MMU, and we had to detect
> when the guest turned on the MMU, such that we could invalidate all cache
> entries and ensure a consistent view of memory with the MMU turned on.
> When the guest turned on the caches, we would call stage2_flush_vm()
> from kvm_toggle_cache().
> 
> However, stage2_flush_vm() walks all the stage 2 tables, and calls
> __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely

s/FWD/FWB/

> nothing.
> 
> We can avoid that whole song and dance, and simply not set TVM when
> creating a VM on a system that has S2FWB.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Apart from the nit above, and Mark's other remark, it looks good to
me. I'll fix them up when applying the patch.

Thanks,

	M.
Alexandru Elisei Oct. 28, 2019, 3:12 p.m. UTC | #3
Hi,

Resending this email, because I replied only to the kvmarm list by accident,
instead of replying to everyone involved.

On 10/28/19 1:05 PM, Christoffer Dall wrote:
> On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> tables to override the memory attributes of memory accesses, regardless
> of the stage 1 page table configurations, and also when the stage 1 MMU
> is turned off.  This results in all memory accesses to RAM being
> cacheable, including during early boot of the guest.
>
> On CPUs without this feature, memory accesses were non-cacheable during
> boot until the guest turned on the stage 1 MMU, and we had to detect
> when the guest turned on the MMU, such that we could invalidate all cache
> entries and ensure a consistent view of memory with the MMU turned on.
> When the guest turned on the caches, we would call stage2_flush_vm()
> from kvm_toggle_cache().
>
> However, stage2_flush_vm() walks all the stage 2 tables, and calls
> __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
> nothing.
>
> We can avoid that whole song and dance, and simply not set TVM when
> creating a VM on a system that has S2FWB.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> ---
> I was only able to test this on the model with cache modeling enabled,
> but even removing TVM from HCR_EL2 without having FWB also worked with
> that setup, so the testing of this has been light.  It seems like it
> should obviously work, but it would be good if someone with access to
> appropriate hardware could give this a spin.
>
>  arch/arm64/include/asm/kvm_arm.h     |  3 +--
>  arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index ddf9d762ac62..6e5d839f42b5 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -61,7 +61,6 @@
>   * RW:		64bit by default, can be overridden for 32bit VMs
>   * TAC:		Trap ACTLR
>   * TSC:		Trap SMC
> - * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
>   * TSW:		Trap cache operations by set/way
>   * TWE:		Trap WFE
>   * TWI:		Trap WFI
> @@ -74,7 +73,7 @@
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> -			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
>  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
>  			 HCR_FMO | HCR_IMO)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d69c1efc63e7..70509799a2a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		/* trap error record accesses */
>  		vcpu->arch.hcr_el2 |= HCR_TERR;
>  	}
> -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
>  		vcpu->arch.hcr_el2 |= HCR_FWB;
> +	} else {
> +		/*
> +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> +		 * get set in SCTLR_EL1 such that we can detect when the guest
> +		 * MMU gets turned off and do the necessary cache maintenance
> +		 * then.
> +		 */
> +		vcpu->arch.hcr_el2 &= ~HCR_TVM;

Don't we want to set the bit here, so we're consistent with the previous behaviour and the comment? Because with this patch, we never set HCR_EL2.TVM...

Thanks,
Alex

> +	}
>  
>  	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>  		vcpu->arch.hcr_el2 &= ~HCR_RW;
Marc Zyngier Oct. 28, 2019, 4:19 p.m. UTC | #4
On Mon, 28 Oct 2019 15:12:39 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> Resending this email, because I replied only to the kvmarm list by accident,
> instead of replying to everyone involved.
> 
> On 10/28/19 1:05 PM, Christoffer Dall wrote:
> > On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> > tables to override the memory attributes of memory accesses, regardless
> > of the stage 1 page table configurations, and also when the stage 1 MMU
> > is turned off.  This results in all memory accesses to RAM being
> > cacheable, including during early boot of the guest.
> >
> > On CPUs without this feature, memory accesses were non-cacheable during
> > boot until the guest turned on the stage 1 MMU, and we had to detect
> > when the guest turned on the MMU, such that we could invalidate all cache
> > entries and ensure a consistent view of memory with the MMU turned on.
> > When the guest turned on the caches, we would call stage2_flush_vm()
> > from kvm_toggle_cache().
> >
> > However, stage2_flush_vm() walks all the stage 2 tables, and calls
> > __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
> > nothing.
> >
> > We can avoid that whole song and dance, and simply not set TVM when
> > creating a VM on a system that has S2FWB.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> > I was only able to test this on the model with cache modeling enabled,
> > but even removing TVM from HCR_EL2 without having FWB also worked with
> > that setup, so the testing of this has been light.  It seems like it
> > should obviously work, but it would be good if someone with access to
> > appropriate hardware could give this a spin.
> >
> >  arch/arm64/include/asm/kvm_arm.h     |  3 +--
> >  arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> > index ddf9d762ac62..6e5d839f42b5 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -61,7 +61,6 @@
> >   * RW:		64bit by default, can be overridden for 32bit VMs
> >   * TAC:		Trap ACTLR
> >   * TSC:		Trap SMC
> > - * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
> >   * TSW:		Trap cache operations by set/way
> >   * TWE:		Trap WFE
> >   * TWI:		Trap WFI
> > @@ -74,7 +73,7 @@
> >   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
> >   */
> >  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> > -			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> > +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
> >  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
> >  			 HCR_FMO | HCR_IMO)
> >  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index d69c1efc63e7..70509799a2a9 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> >  		/* trap error record accesses */
> >  		vcpu->arch.hcr_el2 |= HCR_TERR;
> >  	}
> > -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > +
> > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> >  		vcpu->arch.hcr_el2 |= HCR_FWB;
> > +	} else {
> > +		/*
> > +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> > +		 * get set in SCTLR_EL1 such that we can detect when the guest
> > +		 * MMU gets turned off and do the necessary cache maintenance
> > +		 * then.
> > +		 */
> > +		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> 
> Don't we want to set the bit here, so we're consistent with the
> previous behaviour and the comment? Because with this patch, we
> never set HCR_EL2.TVM...

Of course you're right. This is how I plan to fix it:

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 47c774c2d18b..7b835337f78b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -63,7 +63,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 		 * MMU gets turned on and do the necessary cache maintenance
 		 * then.
 		 */
-		vcpu->arch.hcr_el2 &= ~HCR_TVM;
+		vcpu->arch.hcr_el2 |= HCR_TVM;
 	}
 
 	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))


Christoffer, please shout if you disagree.

	M.
Mark Rutland Oct. 28, 2019, 5:09 p.m. UTC | #5
On Mon, Oct 28, 2019 at 04:19:55PM +0000, Marc Zyngier wrote:
> On Mon, 28 Oct 2019 15:12:39 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > On 10/28/19 1:05 PM, Christoffer Dall wrote:
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > index d69c1efc63e7..70509799a2a9 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> > >  		/* trap error record accesses */
> > >  		vcpu->arch.hcr_el2 |= HCR_TERR;
> > >  	}
> > > -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > > +
> > > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> > >  		vcpu->arch.hcr_el2 |= HCR_FWB;
> > > +	} else {
> > > +		/*
> > > +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> > > +		 * get set in SCTLR_EL1 such that we can detect when the guest
> > > +		 * MMU gets turned off and do the necessary cache maintenance
> > > +		 * then.
> > > +		 */
> > > +		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> > 
> > Don't we want to set the bit here, so we're consistent with the
> > previous behaviour and the comment? Because with this patch, we
> > never set HCR_EL2.TVM...
> 
> Of course you're right. This is how I plan to fix it:
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 47c774c2d18b..7b835337f78b 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -63,7 +63,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		 * MMU gets turned on and do the necessary cache maintenance
>  		 * then.
>  		 */
> -		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> +		vcpu->arch.hcr_el2 |= HCR_TVM;
>  	}

Ouch, yes. That was as suggested for v1, and I missed it when saying my
R-B held. :(

Mark.
Alexandru Elisei Nov. 6, 2019, 1:02 p.m. UTC | #6
Hi,

On 10/28/19 1:05 PM, Christoffer Dall wrote:
> On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> tables to override the memory attributes of memory accesses, regardless
> of the stage 1 page table configurations, and also when the stage 1 MMU
> is turned off.  This results in all memory accesses to RAM being
> cacheable, including during early boot of the guest.
>
> On CPUs without this feature, memory accesses were non-cacheable during
> boot until the guest turned on the stage 1 MMU, and we had to detect
> when the guest turned on the MMU, such that we could invalidate all cache
> entries and ensure a consistent view of memory with the MMU turned on.
> When the guest turned on the caches, we would call stage2_flush_vm()
> from kvm_toggle_cache().
>
> However, stage2_flush_vm() walks all the stage 2 tables, and calls
> __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
> nothing.
>
> We can avoid that whole song and dance, and simply not set TVM when
> creating a VM on a system that has S2FWB.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> ---
> I was only able to test this on the model with cache modeling enabled,
> but even removing TVM from HCR_EL2 without having FWB also worked with
> that setup, so the testing of this has been light.  It seems like it
> should obviously work, but it would be good if someone with access to
> appropriate hardware could give this a spin.
>
>  arch/arm64/include/asm/kvm_arm.h     |  3 +--
>  arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index ddf9d762ac62..6e5d839f42b5 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -61,7 +61,6 @@
>   * RW:		64bit by default, can be overridden for 32bit VMs
>   * TAC:		Trap ACTLR
>   * TSC:		Trap SMC
> - * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
>   * TSW:		Trap cache operations by set/way
>   * TWE:		Trap WFE
>   * TWI:		Trap WFI
> @@ -74,7 +73,7 @@
>   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
>   */
>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> -			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
>  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
>  			 HCR_FMO | HCR_IMO)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d69c1efc63e7..70509799a2a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		/* trap error record accesses */
>  		vcpu->arch.hcr_el2 |= HCR_TERR;
>  	}
> -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
>  		vcpu->arch.hcr_el2 |= HCR_FWB;
> +	} else {
> +		/*
> +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> +		 * get set in SCTLR_EL1 such that we can detect when the guest
> +		 * MMU gets turned off and do the necessary cache maintenance
> +		 * then.
> +		 */
> +		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> +	}
>  
>  	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>  		vcpu->arch.hcr_el2 &= ~HCR_RW;

This patch makes sense to me: when FWB is available, the guest memory is cacheable
even when the stage 1 MMU is disabled, which means it's now impossible to have a
situation where the data in memory is newer than the data in the cache.
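To make the attribute rules being relied on concrete, here is an
illustrative-only helper (not kernel code, and restricted to the RAM mappings
KVM actually sets up; the architecture defines many more combinations):

enum attr { NON_CACHEABLE, WRITE_BACK };

static enum attr guest_ram_attr(bool s2fwb, bool stage1_mmu_on, enum attr stage1_attr)
{
	/* FWB: stage 2 forces Normal Write-Back, even with the stage 1 MMU off. */
	if (s2fwb)
		return WRITE_BACK;

	/* No FWB, stage 1 MMU off: the access behaves as non-cacheable. */
	if (!stage1_mmu_on)
		return NON_CACHEABLE;

	/* No FWB, MMU on: stage 1 and stage 2 combine, the weaker attribute wins. */
	return stage1_attr == WRITE_BACK ? WRITE_BACK : NON_CACHEABLE;
}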

I tested the patch with the fix suggested by Marc by doing a linux boot and then a
'ls -R /', and by running kvm-unit-tests in a loop a couple dozen times. For what
it's worth:

Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>

I do need to point out that I haven't been able to make a guest misbehave when FWB
is not enabled *and* KVM doesn't do a stage2_flush_vm when the stage 1 MMU is
enabled. I tried to write two different tests in kvm-unit-tests:

1. With the MMU initially off, the test tells the host to read a value from memory
(so a cache line is allocated), writes another value to the same memory location,
and then enables the MMU and reads the memory back. I always got the latest value
that was written while the MMU was off.

2. One thread tells the host to read the memory location in a loop (to make sure
that the cache line doesn't get evicted), while the other thread writes a value
with the MMU off, enables the MMU and reads the memory back. I still got the
latest value written with the MMU off.

I can share the source code for the tests, if anyone is interested; I'm also open
to other suggestions.

Regards,
Alex
Christoffer Dall Nov. 6, 2019, 3:20 p.m. UTC | #7
Hi Alexandru,

On Wed, Nov 06, 2019 at 01:02:21PM +0000, Alexandru Elisei wrote:
> 
> On 10/28/19 1:05 PM, Christoffer Dall wrote:
> > On CPUs that support S2FWB (Armv8.4+), KVM configures the stage 2 page
> > tables to override the memory attributes of memory accesses, regardless
> > of the stage 1 page table configurations, and also when the stage 1 MMU
> > is turned off.  This results in all memory accesses to RAM being
> > cacheable, including during early boot of the guest.
> >
> > On CPUs without this feature, memory accesses were non-cacheable during
> > boot until the guest turned on the stage 1 MMU, and we had to detect
> > when the guest turned on the MMU, such that we could invalidate all cache
> > entries and ensure a consistent view of memory with the MMU turned on.
> > When the guest turned on the caches, we would call stage2_flush_vm()
> > from kvm_toggle_cache().
> >
> > However, stage2_flush_vm() walks all the stage 2 tables, and calls
> > __kvm_flush-dcache_pte, which on a system with S2FWD does ... absolutely
> > nothing.
> >
> > We can avoid that whole song and dance, and simply not set TVM when
> > creating a VM on a system that has S2FWB.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> > I was only able to test this on the model with cache modeling enabled,
> > but even removing TVM from HCR_EL2 without having FWB also worked with
> > that setup, so the testing of this has been light.  It seems like it
> > should obviously work, but it would be good if someone with access to
> > appropriate hardware could give this a spin.
> >
> >  arch/arm64/include/asm/kvm_arm.h     |  3 +--
> >  arch/arm64/include/asm/kvm_emulate.h | 12 +++++++++++-
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> > index ddf9d762ac62..6e5d839f42b5 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -61,7 +61,6 @@
> >   * RW:		64bit by default, can be overridden for 32bit VMs
> >   * TAC:		Trap ACTLR
> >   * TSC:		Trap SMC
> > - * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
> >   * TSW:		Trap cache operations by set/way
> >   * TWE:		Trap WFE
> >   * TWI:		Trap WFI
> > @@ -74,7 +73,7 @@
> >   * SWIO:	Turn set/way invalidates into set/way clean+invalidate
> >   */
> >  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> > -			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> > +			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
> >  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
> >  			 HCR_FMO | HCR_IMO)
> >  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index d69c1efc63e7..70509799a2a9 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -53,8 +53,18 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> >  		/* trap error record accesses */
> >  		vcpu->arch.hcr_el2 |= HCR_TERR;
> >  	}
> > -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > +
> > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> >  		vcpu->arch.hcr_el2 |= HCR_FWB;
> > +	} else {
> > +		/*
> > +		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
> > +		 * get set in SCTLR_EL1 such that we can detect when the guest
> > +		 * MMU gets turned off and do the necessary cache maintenance
> > +		 * then.
> > +		 */
> > +		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> > +	}
> >  
> >  	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
> >  		vcpu->arch.hcr_el2 &= ~HCR_RW;
> 
> This patch makes sense to me: when FWB is available, the guest memory is cacheable
> even when the stage 1 MMU is disabled, which means it's now impossible to have a
> situation where the data in memory is newer than the data in the cache.
> 
> I tested the patch with the fix suggested by Marc by doing a linux boot and then a
> 'ls -R /', and by running kvm-unit-tests in a loop a couple dozen times. For what
> it's worth:
> 
> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
> 
> I do need to point out that I haven't been able to make a guest misbehave when FWB
> is not enabled *and* KVM doesn't do a stage2_flush_vm when the stage 1 MMU is
> enabled. I tried to write two different tests in kvm-unit-tests:
> 
> 1. With the MMU never enabled, the test tells the host to read a value from memory
> (so a cache line is allocated), writes another value to the same memory location,
> and then enables the MMU and reads the memory back. I always got the latest value
> that was written while the MMU was off.
> 
> 2. One thread tells the host to read the memory location in a loop (to make sure
> that the cache line doesn't get evicted), while the other thread writes a value
> with the MMU off, enables the MMU and reads the memory back. I still got the
> latest value written with the MMU off.
> 
> I can share the source code for the tests, if anyone is interested; I'm also open
> to other suggestions.
> 

Thanks for the thoroughness here.  I also wasn't able to produce an
error on the model, so I think we can conclude that it's at least no
worse than the original code, and since we all agree that this should be
correct, then I think it's fair that Marc has merged the patch.

It can't hurt to post the code you wrote for the test; someone might
pick that up in the future to test something similar.


Thanks,

    Christoffer

Patch

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index ddf9d762ac62..6e5d839f42b5 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -61,7 +61,6 @@ 
  * RW:		64bit by default, can be overridden for 32bit VMs
  * TAC:		Trap ACTLR
  * TSC:		Trap SMC
- * TVM:		Trap VM ops (until M+C set in SCTLR_EL1)
  * TSW:		Trap cache operations by set/way
  * TWE:		Trap WFE
  * TWI:		Trap WFI
@@ -74,7 +73,7 @@ 
  * SWIO:	Turn set/way invalidates into set/way clean+invalidate
  */
 #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
-			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
+			 HCR_BSU_IS | HCR_FB | HCR_TAC | \
 			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
 			 HCR_FMO | HCR_IMO)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d69c1efc63e7..70509799a2a9 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -53,8 +53,18 @@  static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 		/* trap error record accesses */
 		vcpu->arch.hcr_el2 |= HCR_TERR;
 	}
-	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+
+	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
 		vcpu->arch.hcr_el2 |= HCR_FWB;
+	} else {
+		/*
+		 * For non-FWB CPUs, we trap VM ops (HCR_EL2.TVM) until M+C
+		 * get set in SCTLR_EL1 such that we can detect when the guest
+		 * MMU gets turned off and do the necessary cache maintenance
+		 * then.
+		 */
+		vcpu->arch.hcr_el2 &= ~HCR_TVM;
+	}
 
 	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
 		vcpu->arch.hcr_el2 &= ~HCR_RW;