Message ID | 20170510064257.3xibvlcxjt37ulib@oak.ozlabs.ibm.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, May 10, 2017 at 04:42:57PM +1000, Paul Mackerras wrote:
> POWER9 running a radix guest will take some hypervisor interrupts
> without going to real mode (turning off the MMU). This means that
> early hypercall handlers may now be called in virtual mode. Most of
> the handlers work just fine in both modes, but there are some that
> can crash the host if called in virtual mode, notably the TCE (IOMMU)
> hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT. These
> already have both a real-mode and a virtual-mode version, so we
> arrange for the real-mode version to return H_TOO_HARD for radix
> guests, which will result in the virtual-mode version being called.
>
> The other hypercall which is sensitive to the MMU mode is H_RANDOM.
> It doesn't have a virtual-mode version, so this adds code to enable
> it to be called in either mode.
>
> An alternative solution was considered which would refuse to call any
> of the early hypercall handlers when doing a virtual-mode exit from a
> radix guest. However, the XICS-on-XIVE code depends on the XICS
> hypercalls being handled early even for virtual-mode exits, because
> the handlers need to be called before the XIVE vCPU state has been
> pulled off the hardware. Therefore that solution would have become
> quite invasive and complicated, and was rejected in favour of the
> simpler, though less elegant, solution presented here.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>

Longer term we want a saner dispatch path for hypercalls in radix
mode, which will avoid this problem, but this is a worthwhile interim
fix.

> ---
> This version applies to my kvm-ppc-next branch. I'll post a backport
> to 4.12 as well.
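The patch leans on the existing convention that a real-mode hcall handler may return H_TOO_HARD to push the call out to its virtual-mode counterpart. The fragment below is a simplified, hypothetical illustration of that fallback, not the actual KVM dispatch path (which is split between the real-mode assembly and the C exit code); the dispatcher function name is made up, while kvmppc_rm_h_put_tce(), kvmppc_h_put_tce() and kvmppc_get_gpr() are the real handlers and accessors.

#include <linux/kvm_host.h>
#include <asm/kvm_ppc.h>	/* kvmppc_get_gpr(), TCE handler prototypes */
#include <asm/kvm_book3s.h>	/* H_TOO_HARD */

/* Illustrative sketch only -- not the kernel's actual dispatch code. */
static long try_h_put_tce_dispatch(struct kvm_vcpu *vcpu)
{
	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
	unsigned long ioba  = kvmppc_get_gpr(vcpu, 5);
	unsigned long tce   = kvmppc_get_gpr(vcpu, 6);
	long ret;

	/*
	 * Fast path: the real-mode handler.  With this patch it bails
	 * out immediately with H_TOO_HARD for radix guests.
	 */
	ret = kvmppc_rm_h_put_tce(vcpu, liobn, ioba, tce);
	if (ret != H_TOO_HARD)
		return ret;

	/* Slow path: the full virtual-mode handler, safe with the MMU on. */
	return kvmppc_h_put_tce(vcpu, liobn, ioba, tce);
}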
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index eda0a8f6fae8..3adfd2f5301c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -301,6 +301,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	liobn, ioba, tce); */
 
+	/* For radix, we might be in virtual mode, so punt */
+	if (kvm_is_radix(vcpu->kvm))
+		return H_TOO_HARD;
+
 	stt = kvmppc_find_table(vcpu->kvm, liobn);
 	if (!stt)
 		return H_TOO_HARD;
@@ -381,6 +385,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	bool prereg = false;
 	struct kvmppc_spapr_tce_iommu_table *stit;
 
+	/* For radix, we might be in virtual mode, so punt */
+	if (kvm_is_radix(vcpu->kvm))
+		return H_TOO_HARD;
+
 	stt = kvmppc_find_table(vcpu->kvm, liobn);
 	if (!stt)
 		return H_TOO_HARD;
@@ -491,6 +499,10 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 	long i, ret;
 	struct kvmppc_spapr_tce_iommu_table *stit;
 
+	/* For radix, we might be in virtual mode, so punt */
+	if (kvm_is_radix(vcpu->kvm))
+		return H_TOO_HARD;
+
 	stt = kvmppc_find_table(vcpu->kvm, liobn);
 	if (!stt)
 		return H_TOO_HARD;
@@ -527,6 +539,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 	return H_SUCCESS;
 }
 
+/* This can be called in either virtual mode or real mode */
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 846b40cb3a62..aea540c53607 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -206,7 +206,14 @@ EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
 
 long kvmppc_h_random(struct kvm_vcpu *vcpu)
 {
-	if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+	int r;
+
+	/* Only need to do the expensive mfmsr() on radix */
+	if (kvm_is_radix(vcpu->kvm) && (mfmsr() & MSR_IR))
+		r = powernv_get_random_long(&vcpu->arch.gpr[4]);
+	else
+		r = powernv_get_random_real_mode(&vcpu->arch.gpr[4]);
+	if (r)
 		return H_SUCCESS;
 
 	return H_HARDWARE;
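The kvmppc_h_random() hunk above is the one place that has to detect the entry mode at run time. As a rough reference, the test it performs can be read as the hypothetical helper below (the helper name is made up; the kernel does not define it). The kvm_is_radix() check keeps the comparatively expensive mfmsr() off the HPT-guest path, which is always entered in real mode.

#include <asm/reg.h>		/* mfmsr(), MSR_IR */
#include <asm/kvm_book3s_64.h>	/* kvm_is_radix() */

/*
 * Hypothetical helper mirroring the test added to kvmppc_h_random().
 * MSR[IR] set means instruction relocation (the MMU) is on, i.e. the
 * hcall was taken in virtual mode.  HPT guests always exit to real
 * mode, so the mfmsr() is skipped for them.
 */
static bool hcall_entered_in_virtual_mode(struct kvm_vcpu *vcpu)
{
	return kvm_is_radix(vcpu->kvm) && (mfmsr() & MSR_IR);
}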
POWER9 running a radix guest will take some hypervisor interrupts
without going to real mode (turning off the MMU). This means that
early hypercall handlers may now be called in virtual mode. Most of
the handlers work just fine in both modes, but there are some that
can crash the host if called in virtual mode, notably the TCE (IOMMU)
hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT. These
already have both a real-mode and a virtual-mode version, so we
arrange for the real-mode version to return H_TOO_HARD for radix
guests, which will result in the virtual-mode version being called.

The other hypercall which is sensitive to the MMU mode is H_RANDOM.
It doesn't have a virtual-mode version, so this adds code to enable
it to be called in either mode.

An alternative solution was considered which would refuse to call any
of the early hypercall handlers when doing a virtual-mode exit from a
radix guest. However, the XICS-on-XIVE code depends on the XICS
hypercalls being handled early even for virtual-mode exits, because
the handlers need to be called before the XIVE vCPU state has been
pulled off the hardware. Therefore that solution would have become
quite invasive and complicated, and was rejected in favour of the
simpler, though less elegant, solution presented here.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
This version applies to my kvm-ppc-next branch. I'll post a backport
to 4.12 as well.

 arch/powerpc/kvm/book3s_64_vio_hv.c  | 13 +++++++++++++
 arch/powerpc/kvm/book3s_hv_builtin.c |  9 ++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)
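For context on where the value written into vcpu->arch.gpr[4] ends up: on the guest side, H_RANDOM is issued through the normal hcall interface and the result comes back in the first word of the return buffer. The fragment below is a loose sketch in the spirit of the pseries hardware-RNG driver, not a copy of it; the function name is illustrative and error handling is trimmed.

#include <linux/errno.h>
#include <asm/hvcall.h>		/* plpar_hcall(), H_RANDOM, H_SUCCESS */

/* Guest-side sketch: ask the hypervisor for one random long. */
static int get_hypervisor_random(unsigned long *value)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];

	if (plpar_hcall(H_RANDOM, retbuf) != H_SUCCESS)
		return -EIO;

	*value = retbuf[0];	/* filled from the host's vcpu->arch.gpr[4] */
	return 0;
}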