Message ID | 1375869826-17509-3-git-send-email-Bharat.Bhushan@freescale.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, 2013-08-07 at 15:33 +0530, Bharat Bhushan wrote:
> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
> However, the Linux PTEs for the range being flushed are still valid at
> that point. We are not supposed to establish any new references to pages
> in the range until the ...range_end() notifier gets called.
> The PPC-specific KVM code doesn't get any explicit notification of that;
> instead, we are supposed to use mmu_notifier_retry() to test whether we
> are or have been inside a range flush notifier pair while we have been
> referencing a page.
>
> This patch calls the mmu_notifier_retry() while mapping the guest
> page to ensure we are not referencing a page when in range invalidation.
>
> This call is inside a region locked with kvm->mmu_lock, which is the
> same lock that is called by the KVM MMU notifier functions, thus
> ensuring that no new notification can proceed while we are in the
> locked region.
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
>  arch/powerpc/kvm/e500_mmu_host.c | 19 +++++++++++++++++--
>  1 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index ff6dd66..ae4eaf6 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -329,8 +329,14 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	int tsize = BOOK3E_PAGESZ_4K;
>  	unsigned long tsize_pages = 0;
>  	pte_t *ptep;
> -	int wimg = 0;
> +	int wimg = 0, ret = 0;
>  	pgd_t *pgdir;
> +	unsigned long mmu_seq;
> +	struct kvm *kvm = vcpu_e500->vcpu.kvm;
> +
> +	/* used to check for invalidations in progress */
> +	mmu_seq = kvm->mmu_notifier_seq;
> +	smp_rmb();
>
>  	/*
>  	 * Translate guest physical to true physical, acquiring
> @@ -458,6 +464,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  				(long)gfn, pfn);
>  		return -EINVAL;
>  	}
> +
> +	spin_lock(&kvm->mmu_lock);
> +	if (mmu_notifier_retry(kvm, mmu_seq)) {
> +		ret = -EAGAIN;
> +		goto out;
> +	}
> +
>  	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
>
>  	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
> @@ -466,10 +479,12 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	/* Clear i-cache for new pages */
>  	kvmppc_mmu_flush_icache(pfn);
>
> +out:
> +	spin_unlock(&kvm->mmu_lock);
>  	/* Drop refcount on page, so that mmu notifiers can clear it */
>  	kvm_release_pfn_clean(pfn);
>
> -	return 0;
> +	return ret;
>  }

Acked-by: Scott Wood <scottwood@freescale.com> since it's currently the
standard KVM approach, though I'm not happy about the busy-waiting
aspect of it. What if we preempted the thread responsible for
decrementing mmu_notifier_count? What if we did so being a SCHED_FIFO
task of higher priority than the decrementing thread?

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 07.08.2013, at 12:03, Bharat Bhushan wrote:

> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
> However, the Linux PTEs for the range being flushed are still valid at
> that point. We are not supposed to establish any new references to pages
> in the range until the ...range_end() notifier gets called.
> The PPC-specific KVM code doesn't get any explicit notification of that;
> instead, we are supposed to use mmu_notifier_retry() to test whether we
> are or have been inside a range flush notifier pair while we have been
> referencing a page.
>
> This patch calls the mmu_notifier_retry() while mapping the guest
> page to ensure we are not referencing a page when in range invalidation.
>
> This call is inside a region locked with kvm->mmu_lock, which is the
> same lock that is called by the KVM MMU notifier functions, thus
> ensuring that no new notification can proceed while we are in the
> locked region.
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>

Acked-by: Alexander Graf <agraf@suse.de>

Gleb, Paolo, please queue for 3.12 directly.

Alex

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index ff6dd66..ae4eaf6 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -329,8 +329,14 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	int tsize = BOOK3E_PAGESZ_4K;
 	unsigned long tsize_pages = 0;
 	pte_t *ptep;
-	int wimg = 0;
+	int wimg = 0, ret = 0;
 	pgd_t *pgdir;
+	unsigned long mmu_seq;
+	struct kvm *kvm = vcpu_e500->vcpu.kvm;
+
+	/* used to check for invalidations in progress */
+	mmu_seq = kvm->mmu_notifier_seq;
+	smp_rmb();
 
 	/*
 	 * Translate guest physical to true physical, acquiring
@@ -458,6 +464,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 				(long)gfn, pfn);
 		return -EINVAL;
 	}
+
+	spin_lock(&kvm->mmu_lock);
+	if (mmu_notifier_retry(kvm, mmu_seq)) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
 	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
 
 	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
@@ -466,10 +479,12 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	/* Clear i-cache for new pages */
 	kvmppc_mmu_flush_icache(pfn);
 
+out:
+	spin_unlock(&kvm->mmu_lock);
 	/* Drop refcount on page, so that mmu notifiers can clear it */
 	kvm_release_pfn_clean(pfn);
 
-	return 0;
+	return ret;
 }
 
 /* XXX only map the one-one case, for now use TLB0 */

When the MM code is invalidating a range of pages, it calls the KVM
kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
However, the Linux PTEs for the range being flushed are still valid at
that point. We are not supposed to establish any new references to pages
in the range until the ...range_end() notifier gets called.
The PPC-specific KVM code doesn't get any explicit notification of that;
instead, we are supposed to use mmu_notifier_retry() to test whether we
are or have been inside a range flush notifier pair while we have been
referencing a page.

This patch calls the mmu_notifier_retry() while mapping the guest
page to ensure we are not referencing a page when in range invalidation.

This call is inside a region locked with kvm->mmu_lock, which is the
same lock that is called by the KVM MMU notifier functions, thus
ensuring that no new notification can proceed while we are in the
locked region.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
 arch/powerpc/kvm/e500_mmu_host.c | 19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)
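
For readers unfamiliar with the retry check, the following is a minimal single-threaded userspace sketch of the idea described in the commit message: sample the sequence number before translating, then refuse to install the mapping if an invalidation is in flight or has completed since the sample. It is not the kernel code. The `model_*` names and `struct model_kvm` are hypothetical stand-ins, the field semantics are an assumption based on the names that appear in the patch and in the review (`mmu_notifier_seq`, `mmu_notifier_count`), and the locking and memory barriers (`kvm->mmu_lock`, `smp_rmb()`) are deliberately left out because the model has no concurrency.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for the two struct kvm fields used here. */
struct model_kvm {
	unsigned long mmu_notifier_seq;	/* assumed: bumped when an invalidation ends */
	long mmu_notifier_count;	/* assumed: > 0 while an invalidation is in flight */
};

/* Models the ...range_start() notifier: an invalidation is now in progress. */
static void model_range_start(struct model_kvm *kvm)
{
	kvm->mmu_notifier_count++;
}

/* Models the ...range_end() notifier: bump the sequence, then drop the count. */
static void model_range_end(struct model_kvm *kvm)
{
	assert(kvm->mmu_notifier_count > 0);
	kvm->mmu_notifier_seq++;
	kvm->mmu_notifier_count--;
}

/* Models mmu_notifier_retry(): nonzero means the earlier sample is stale. */
static int model_retry(const struct model_kvm *kvm, unsigned long mmu_seq)
{
	if (kvm->mmu_notifier_count)
		return 1;	/* inside a start/end pair right now */
	return kvm->mmu_notifier_seq != mmu_seq;	/* a pair completed since the sample */
}

/*
 * Models the tail of kvmppc_e500_shadow_map(): mmu_seq was sampled before
 * the translation, and the check runs where the patch holds kvm->mmu_lock.
 */
static int model_shadow_map(const struct model_kvm *kvm, unsigned long mmu_seq)
{
	if (model_retry(kvm, mmu_seq))
		return -EAGAIN;
	/* The shadow TLB entry would be installed here. */
	return 0;
}
```

In this model, a mapping attempt made with a sample taken before `model_range_start()` keeps returning `-EAGAIN` for as long as the count is nonzero, and still returns `-EAGAIN` once after `model_range_end()` because the sequence number has moved; only a fresh sample succeeds. The repeated `-EAGAIN` while the count stays nonzero is the busy-waiting behaviour questioned in the review.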