Message ID | 1378376958-27252-4-git-send-email-xiaoguangrong@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Sep 05, 2013 at 06:29:06PM +0800, Xiao Guangrong wrote:
> Currently, kvm zaps the large spte if write-protected is needed, the later
> read can fault on that spte. Actually, we can make the large spte readonly
> instead of making them un-present, the page fault caused by read access can
> be avoided
>
> The idea is from Avi:
> | As I mentioned before, write-protecting a large spte is a good idea,
> | since it moves some work from protect-time to fault-time, so it reduces
> | jitter. This removes the need for the return value.
>
> This version has fixed the issue reported in 6b73a9606, the reason of that
> issue is that fast_page_fault() directly sets the readonly large spte to
> writable but only dirty the first page into the dirty-bitmap that means
> other pages are missed. Fixed it by only the normal sptes (on the
> PT_PAGE_TABLE_LEVEL level) can be fast fixed
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c | 36 ++++++++++++++++++++----------------
>  arch/x86/kvm/x86.c |  8 ++++++--
>  2 files changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 869f1db..88107ee 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1177,8 +1177,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>
>  /*
>   * Write-protect on the specified @sptep, @pt_protect indicates whether
> - * spte writ-protection is caused by protecting shadow page table.
> - * @flush indicates whether tlb need be flushed.
> + * spte write-protection is caused by protecting shadow page table.
>   *
>   * Note: write protection is difference between drity logging and spte
>   * protection:
> @@ -1187,10 +1186,9 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>   *   - for spte protection, the spte can be writable only after unsync-ing
>   *     shadow page.
>   *
> - * Return true if the spte is dropped.
> + * Return true if tlb need be flushed.
>   */
> -static bool
> -spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
> +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
>  {
>  	u64 spte = *sptep;
>
> @@ -1200,17 +1198,11 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
>
>  	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
>
> -	if (__drop_large_spte(kvm, sptep)) {
> -		*flush |= true;
> -		return true;
> -	}
> -
>  	if (pt_protect)
>  		spte &= ~SPTE_MMU_WRITEABLE;
>  	spte = spte & ~PT_WRITABLE_MASK;
>
> -	*flush |= mmu_spte_update(sptep, spte);
> -	return false;
> +	return mmu_spte_update(sptep, spte);
>  }

Is it necessary for kvm_mmu_unprotect_page to search for an entire large
page range now, instead of a 4k page?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Oct 1, 2013, at 6:39 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Thu, Sep 05, 2013 at 06:29:06PM +0800, Xiao Guangrong wrote:
>> Currently, kvm zaps the large spte if write-protected is needed, the later
>> read can fault on that spte. Actually, we can make the large spte readonly
>> instead of making them un-present, the page fault caused by read access can
>> be avoided
>>
>> The idea is from Avi:
>> | As I mentioned before, write-protecting a large spte is a good idea,
>> | since it moves some work from protect-time to fault-time, so it reduces
>> | jitter. This removes the need for the return value.
>>
>> This version has fixed the issue reported in 6b73a9606, the reason of that
>> issue is that fast_page_fault() directly sets the readonly large spte to
>> writable but only dirty the first page into the dirty-bitmap that means
>> other pages are missed. Fixed it by only the normal sptes (on the
>> PT_PAGE_TABLE_LEVEL level) can be fast fixed
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c | 36 ++++++++++++++++++++----------------
>>  arch/x86/kvm/x86.c |  8 ++++++--
>>  2 files changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 869f1db..88107ee 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1177,8 +1177,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>>
>>  /*
>>   * Write-protect on the specified @sptep, @pt_protect indicates whether
>> - * spte writ-protection is caused by protecting shadow page table.
>> - * @flush indicates whether tlb need be flushed.
>> + * spte write-protection is caused by protecting shadow page table.
>>   *
>>   * Note: write protection is difference between drity logging and spte
>>   * protection:
>> @@ -1187,10 +1186,9 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>>   *   - for spte protection, the spte can be writable only after unsync-ing
>>   *     shadow page.
>>   *
>> - * Return true if the spte is dropped.
>> + * Return true if tlb need be flushed.
>>   */
>> -static bool
>> -spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
>> +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
>>  {
>>  	u64 spte = *sptep;
>>
>> @@ -1200,17 +1198,11 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
>>
>>  	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
>>
>> -	if (__drop_large_spte(kvm, sptep)) {
>> -		*flush |= true;
>> -		return true;
>> -	}
>> -
>>  	if (pt_protect)
>>  		spte &= ~SPTE_MMU_WRITEABLE;
>>  	spte = spte & ~PT_WRITABLE_MASK;
>>
>> -	*flush |= mmu_spte_update(sptep, spte);
>> -	return false;
>> +	return mmu_spte_update(sptep, spte);
>>  }
>
> Is it necessary for kvm_mmu_unprotect_page to search for an entire large
> page range now, instead of a 4k page?

It is unnecessary. kvm_mmu_unprotect_page is used to delete the gfn's shadow
pages, after which the vcpu will re-fault. If any gfn in the large range has
a shadow page, KVM stops using the large mapping for that range, so the
mapping is split into small mappings when the vcpu faults again.
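The reasoning in the reply above can be sketched as a small user-space model. This is only an illustration, not kernel code: the 512-frame range assumes a 2 MB large page over 4 KB frames, and every name (`struct model_large_range`, `model_large_mapping_allowed`, `model_unprotect_gfn`) is hypothetical and does not exist in KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one 2 MB large page covers 512 frames of 4 KB each. */
#define MODEL_FRAMES_PER_LARGE 512u

/* Hypothetical bookkeeping: which frames in the range back a shadow page. */
struct model_large_range {
	bool has_shadow_page[MODEL_FRAMES_PER_LARGE];
};

/*
 * A large mapping is only allowed while no frame in the range has a
 * shadow page; otherwise the fault path falls back to 4 KB mappings.
 */
bool model_large_mapping_allowed(const struct model_large_range *r)
{
	for (size_t i = 0; i < MODEL_FRAMES_PER_LARGE; i++)
		if (r->has_shadow_page[i])
			return false;
	return true;
}

/* Unprotecting touches only the faulting frame, never the whole range. */
void model_unprotect_gfn(struct model_large_range *r, size_t frame)
{
	assert(frame < MODEL_FRAMES_PER_LARGE);
	r->has_shadow_page[frame] = false;
}
```

In this model, a single shadowed frame is enough to force small mappings on the next fault, which is why unprotecting one 4k frame at a time suffices.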
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 869f1db..88107ee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1177,8 +1177,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
 
 /*
  * Write-protect on the specified @sptep, @pt_protect indicates whether
- * spte writ-protection is caused by protecting shadow page table.
- * @flush indicates whether tlb need be flushed.
+ * spte write-protection is caused by protecting shadow page table.
  *
  * Note: write protection is difference between drity logging and spte
  * protection:
@@ -1187,10 +1186,9 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
  *   - for spte protection, the spte can be writable only after unsync-ing
  *     shadow page.
  *
- * Return true if the spte is dropped.
+ * Return true if tlb need be flushed.
  */
-static bool
-spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
+static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 {
 	u64 spte = *sptep;
 
@@ -1200,17 +1198,11 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
 
 	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
 
-	if (__drop_large_spte(kvm, sptep)) {
-		*flush |= true;
-		return true;
-	}
-
 	if (pt_protect)
 		spte &= ~SPTE_MMU_WRITEABLE;
 	spte = spte & ~PT_WRITABLE_MASK;
 
-	*flush |= mmu_spte_update(sptep, spte);
-	return false;
+	return mmu_spte_update(sptep, spte);
 }
 
 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
@@ -1222,11 +1214,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
-			sptep = rmap_get_first(*rmapp, &iter);
-			continue;
-		}
 
+		flush |= spte_write_protect(kvm, sptep, pt_protect);
 		sptep = rmap_get_next(&iter);
 	}
 
@@ -2675,6 +2664,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			break;
 		}
 
+		drop_large_spte(vcpu, iterator.sptep);
+
 		if (!is_shadow_present_pte(*iterator.sptep)) {
 			u64 base_addr = iterator.addr;
 
@@ -2876,6 +2867,19 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;
 
 	/*
+	 * Do not fix write-permission on the large spte since we only dirty
+	 * the first page into the dirty-bitmap in fast_pf_fix_direct_spte()
+	 * that means other pages are missed if its slot is dirty-logged.
+	 *
+	 * Instead, we let the slow page fault path create a normal spte to
+	 * fix the access.
+	 *
+	 * See the comments in kvm_arch_commit_memory_region().
+	 */
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		goto exit;
+
+	/*
 	 * Currently, fast page fault only works for direct mapping since
 	 * the gfn is not stable for indirect shadow page.
 	 * See Documentation/virtual/kvm/locking.txt to get more detail.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5ca72a..6ad0c07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7208,8 +7208,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
 	/*
 	 * Write protect all pages for dirty logging.
-	 * Existing largepage mappings are destroyed here and new ones will
-	 * not be created until the end of the logging.
+	 *
+	 * All the sptes including the large sptes which point to this
+	 * slot are set to readonly. We can not create any new large
+	 * spte on this slot until the end of the logging.
+	 *
+	 * See the comments in fast_page_fault().
 	 */
 	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
 		kvm_mmu_slot_remove_write_access(kvm, mem->slot);
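The behavioural difference between dropping a large spte and clearing only its writable bit, which the spte_write_protect() hunk relies on, can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions, not the kernel implementation: the `MODEL_*` masks stand in for PT_PRESENT_MASK and PT_WRITABLE_MASK, the software-only SPTE_MMU_WRITEABLE bit is left out, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the present and writable bits of an spte (assumed layout). */
#define MODEL_PRESENT  (UINT64_C(1) << 0)
#define MODEL_WRITABLE (UINT64_C(1) << 1)

/* Old behaviour for large sptes: drop the mapping entirely. */
static uint64_t model_zap(uint64_t spte)
{
	(void)spte;
	return 0;
}

/* New behaviour: keep the mapping, clear only the writable bit. */
static uint64_t model_write_protect(uint64_t spte)
{
	return spte & ~MODEL_WRITABLE;
}

static bool model_read_faults(uint64_t spte)
{
	return !(spte & MODEL_PRESENT);
}

static bool model_write_faults(uint64_t spte)
{
	return !(spte & MODEL_PRESENT) || !(spte & MODEL_WRITABLE);
}

/* Usage: compare the two strategies on a present, writable spte. */
void model_demo(void)
{
	uint64_t spte = MODEL_PRESENT | MODEL_WRITABLE;

	assert(model_read_faults(model_zap(spte)));            /* zap: reads fault */
	assert(!model_read_faults(model_write_protect(spte))); /* read-only: reads hit */
	assert(model_write_faults(model_write_protect(spte))); /* writes still trap */
}
```

Both strategies still trap writes, so dirty tracking keeps working in the model; only the read-only variant spares the guest a fault on reads.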
Currently, KVM zaps a large spte when write protection is needed, so a later
read faults on that spte. Instead, we can make the large spte read-only rather
than non-present, which avoids the page fault caused by the read access.

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter. This removes the need for the return value.

This version fixes the issue reported in 6b73a9606. That issue occurred
because fast_page_fault() directly set the read-only large spte to writable
but marked only the first page in the dirty bitmap, so the other pages were
missed. The fix is to allow only normal sptes (on the PT_PAGE_TABLE_LEVEL
level) to be fast fixed.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++++++++++----------------
 arch/x86/kvm/x86.c |  8 ++++++--
 2 files changed, 26 insertions(+), 18 deletions(-)
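To make the dirty-logging hazard described in the commit message concrete, here is a minimal user-space sketch. It is not kernel code and makes two assumptions: a large page is 2 MB backed by 512 frames of 4 KB, and the dirty bitmap is one bit per frame. All names (`struct model_dirty_bitmap`, `model_large_fast_fix_logs`, `model_small_fix_logs`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 2 MB large page backed by 512 frames of 4 KB each. */
#define MODEL_FRAMES_PER_LARGE 512u

/* Hypothetical stand-in for one large page's slice of the dirty bitmap. */
struct model_dirty_bitmap {
	uint64_t words[MODEL_FRAMES_PER_LARGE / 64];
};

static void model_mark_dirty(struct model_dirty_bitmap *bm, unsigned int frame)
{
	assert(frame < MODEL_FRAMES_PER_LARGE);
	bm->words[frame / 64] |= UINT64_C(1) << (frame % 64);
}

static bool model_is_dirty(const struct model_dirty_bitmap *bm,
			   unsigned int frame)
{
	assert(frame < MODEL_FRAMES_PER_LARGE);
	return (bm->words[frame / 64] >> (frame % 64)) & 1;
}

/*
 * Problem case: the fast path makes the whole large spte writable but
 * logs only the first frame. Later writes to other frames no longer
 * fault, so nothing else is logged. Returns whether a write to @frame
 * ends up in the bitmap.
 */
bool model_large_fast_fix_logs(unsigned int frame)
{
	struct model_dirty_bitmap bm;

	memset(&bm, 0, sizeof(bm));
	model_mark_dirty(&bm, 0);	/* the only frame the fast path logs */
	return model_is_dirty(&bm, frame);
}

/*
 * Patched case: the fast path skips large sptes, and the slow path maps
 * one 4 KB frame per fault, so every written frame is logged.
 */
bool model_small_fix_logs(unsigned int frame)
{
	struct model_dirty_bitmap bm;

	memset(&bm, 0, sizeof(bm));
	model_mark_dirty(&bm, frame);	/* each written frame takes its own fault */
	return model_is_dirty(&bm, frame);
}
```

In the model, a write to any frame other than the first goes unlogged in the large-spte case, which is the data loss the `sp->role.level > PT_PAGE_TABLE_LEVEL` check is meant to prevent.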