Message ID | 20130123191811.efba4200.yoshikawa_takuya_b1@lab.ntt.co.jp (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> for zapping mmu pages with mmu_lock held.
>
> Although we need to do conditional rescheduling for completely
> fixing this issue, we can reduce the hold time to some extent by moving
> free_zapped_mmu_pages() out of the protection. Since invalid_list can
> be very long, the effect is not negligible.
>
> Note: this patch does not treat non-trivial cases.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>

Can you describe the case that's biting? Is it

	/*
	 * If memory slot is created, or moved, we need to clear all
	 * mmio sptes.
	 */
	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
		kvm_mmu_zap_all(kvm);
		kvm_reload_remote_mmus(kvm);
	}

Because conditional rescheduling for kvm_mmu_zap_all() might not be
desirable: KVM_SET_USER_MEMORY has low latency requirements.
On Mon, 4 Feb 2013 11:50:00 -0200
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> > We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> > for zapping mmu pages with mmu_lock held.
> >
> > Although we need to do conditional rescheduling for completely
> > fixing this issue, we can reduce the hold time to some extent by moving
> > free_zapped_mmu_pages() out of the protection. Since invalid_list can
> > be very long, the effect is not negligible.
> >
> > Note: this patch does not treat non-trivial cases.
> >
> > Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
>
> Can you describe the case that's biting? Is it

Non-trivial cases: a few functions that indirectly call zap_page() and
cannot access invalid_list outside of mmu_lock. Not worth fixing, I think.

> 	/*
> 	 * If memory slot is created, or moved, we need to clear all
> 	 * mmio sptes.
> 	 */
> 	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
> 		kvm_mmu_zap_all(kvm);
> 		kvm_reload_remote_mmus(kvm);
> 	}
>
> Because conditional rescheduling for kvm_mmu_zap_all() might not be
> desirable: KVM_SET_USER_MEMORY has low latency requirements.

This case is problematic. With huge pages in use, things get improved
to some extent: big guests need TDP and THP anyway.

But as Avi noted once, we need a way to make long mmu_lock holders
break for achieving lock-less TLB flushes. Xiao's work may help the
zap_all() case, but in general, protecting post-zap work with a
spin_lock is not desirable.

I'll think about the inaccurate n_used_mmu_pages problem you pointed out.

Thanks,
	Takuya
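For illustration only, the mmu_lock break discussed above could be wired into kvm_mmu_zap_all() roughly as in the sketch below. This is not part of the patch: the function name kvm_mmu_zap_all_resched, the restart policy, the need_resched()/spin_needbreak()/cond_resched_lock() check, and the assumption that kvm_mmu_commit_zap_page() may be called repeatedly on the same invalid_list (leaving the pages for free_zapped_mmu_pages()) are all taken from this series' structure, not from actual code.

	/*
	 * Illustrative sketch, not actual code: one way to let a long
	 * kvm_mmu_zap_all() drop mmu_lock when others are waiting, while
	 * still freeing the zapped pages outside the lock as this patch does.
	 */
	static void kvm_mmu_zap_all_resched(struct kvm *kvm)
	{
		struct kvm_mmu_page *sp, *node;
		LIST_HEAD(invalid_list);

		spin_lock(&kvm->mmu_lock);
	restart:
		list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
				/*
				 * Commit what has been zapped so far, then let
				 * waiters in; restart the walk because
				 * active_mmu_pages may change while the lock
				 * is dropped.
				 */
				kvm_mmu_commit_zap_page(kvm, &invalid_list);
				cond_resched_lock(&kvm->mmu_lock);
				goto restart;
			}
			if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL))
				goto restart;
		}
		kvm_mmu_commit_zap_page(kvm, &invalid_list);
		spin_unlock(&kvm->mmu_lock);

		/* As in this patch: do the actual freeing outside mmu_lock. */
		free_zapped_mmu_pages(kvm, &invalid_list);
	}

Whether such a break is acceptable on the KVM_SET_USER_MEMORY path, as Marcelo points out, is a separate question from moving the freeing out of the lock.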
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd7b455..7976f55 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2185,7 +2185,6 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list, NULL);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
 	}
 
@@ -2193,6 +2192,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
 
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
@@ -2213,9 +2214,10 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &npos);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
 
+	free_zapped_mmu_pages(kvm, &invalid_list);
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
@@ -2934,10 +2936,11 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list,
 						 NULL);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-			free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		}
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 		spin_unlock(&vcpu->kvm->mmu_lock);
+
+		free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		return;
 	}
 	for (i = 0; i < 4; ++i) {
@@ -2954,9 +2957,10 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 }
@@ -4054,10 +4058,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
@@ -4254,9 +4259,10 @@ restart:
 			goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
@@ -4308,10 +4314,10 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 		kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 
+		free_zapped_mmu_pages(kvm, &invalid_list);
 		list_move_tail(&kvm->vm_list, &vm_list);
 		break;
We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
for zapping mmu pages with mmu_lock held.

Although we need to do conditional rescheduling for completely
fixing this issue, we can reduce the hold time to some extent by moving
free_zapped_mmu_pages() out of the protection. Since invalid_list can
be very long, the effect is not negligible.

Note: this patch does not treat non-trivial cases.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)
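The shape of the change is the same at every call site in the diff above. A condensed before/after sketch, illustrative only: the surrounding code of the real call sites is elided, and the fourth kvm_mmu_prepare_zap_page() argument is shown as NULL (one call site passes &npos instead).

	/* Before: zapped pages are also freed while mmu_lock is held. */
	spin_lock(&kvm->mmu_lock);
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	free_zapped_mmu_pages(kvm, &invalid_list);	/* possibly a long walk */
	spin_unlock(&kvm->mmu_lock);

	/* After: only prepare/commit run under the lock; freeing moves out. */
	spin_lock(&kvm->mmu_lock);
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);

	free_zapped_mmu_pages(kvm, &invalid_list);	/* invalid_list can be very long */

kvm_mmu_commit_zap_page() still runs before the lock is dropped; only the freeing of the already-detached pages on invalid_list is deferred until after spin_unlock().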