[8/8] KVM: MMU: Move free_zapped_mmu_pages() out of the protection of mmu_lock

Message ID 20130123191811.efba4200.yoshikawa_takuya_b1@lab.ntt.co.jp (mailing list archive)
State New, archived

Commit Message

Takuya Yoshikawa Jan. 23, 2013, 10:18 a.m. UTC
We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
to zap mmu pages with mmu_lock held.

Although conditional rescheduling is needed to fix this issue completely,
we can reduce the hold time to some extent by moving
free_zapped_mmu_pages() out of the protection of mmu_lock.  Since
invalid_list can be very long, the effect is not negligible.

Note: this patch does not treat the non-trivial cases.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)
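
The change applies the same pattern at each call site.  A simplified
before/after sketch (single call shown, npos handling and surrounding
logic omitted; the real code paths are in the diff below):

	/* before: zap, TLB flush and freeing all under mmu_lock */
	spin_lock(&kvm->mmu_lock);
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	free_zapped_mmu_pages(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);

	/* after: only the zap and TLB flush stay in the critical section */
	spin_lock(&kvm->mmu_lock);
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, NULL);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);

	/* freeing a long invalid_list no longer adds to mmu_lock hold time */
	free_zapped_mmu_pages(kvm, &invalid_list);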

Comments

Marcelo Tosatti Feb. 4, 2013, 1:50 p.m. UTC | #1
On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> to zap mmu pages with mmu_lock held.
> 
> Although conditional rescheduling is needed to fix this issue completely,
> we can reduce the hold time to some extent by moving
> free_zapped_mmu_pages() out of the protection of mmu_lock.  Since
> invalid_list can be very long, the effect is not negligible.
> 
> Note: this patch does not treat the non-trivial cases.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>

Can you describe the case that's biting?  Is it

        /*
         * If memory slot is created, or moved, we need to clear all
         * mmio sptes.
         */
        if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
                kvm_mmu_zap_all(kvm);
                kvm_reload_remote_mmus(kvm);
        }

Because conditional rescheduling for kvm_mmu_zap_all() might not be
desirable: KVM_SET_USER_MEMORY has low latency requirements.
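
(For reference, "conditional rescheduling" here would mean dropping and
retaking mmu_lock inside kvm_mmu_zap_all()'s restart loop, roughly along
the lines of the sketch below; the placement shown is only an
illustration, not part of this series.)

	/* illustrative only: bound mmu_lock hold time inside the zap loop */
	if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
		kvm_mmu_commit_zap_page(kvm, &invalid_list);
		cond_resched_lock(&kvm->mmu_lock);
		goto restart;	/* active_mmu_pages may have changed while unlocked */
	}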


Takuya Yoshikawa Feb. 5, 2013, 2:21 a.m. UTC | #2
On Mon, 4 Feb 2013 11:50:00 -0200
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Wed, Jan 23, 2013 at 07:18:11PM +0900, Takuya Yoshikawa wrote:
> > We noticed that kvm_mmu_zap_all() could take hundreds of milliseconds
> > to zap mmu pages with mmu_lock held.
> > 
> > Although conditional rescheduling is needed to fix this issue completely,
> > we can reduce the hold time to some extent by moving
> > free_zapped_mmu_pages() out of the protection of mmu_lock.  Since
> > invalid_list can be very long, the effect is not negligible.
> > 
> > Note: this patch does not treat the non-trivial cases.
> > 
> > Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
> 
> Can you describe the case thats biting? Is it

The non-trivial cases are a few functions that indirectly call zap_page()
and cannot access invalid_list outside of mmu_lock.  They are not worth
fixing, I think.

> 
>         /*
>          * If memory slot is created, or moved, we need to clear all
>          * mmio sptes.
>          */
>         if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) {
>                 kvm_mmu_zap_all(kvm);
>                 kvm_reload_remote_mmus(kvm);
>         }
> 
> Because conditional rescheduling for kvm_mmu_zap_all() might not be
> desirable: KVM_SET_USER_MEMORY has low latency requirements.

This case is problematic.  With huge pages in use, things improve to
some extent: big guests need TDP and THP anyway.

But as Avi noted once, we need a way to make long mmu_lock holders take
a break in order to achieve lock-less TLB flushes.  Xiao's work may help
the zap_all() case.


But in general, protecting post-zap work with the spin lock is not
desirable.  I'll think about the inaccurate n_used_mmu_pages problem you
pointed out.

Thanks,
	Takuya

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd7b455..7976f55 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2185,7 +2185,6 @@  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list, NULL);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
 	}
@@ -2193,6 +2192,8 @@  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
 
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
@@ -2213,9 +2214,10 @@  int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &npos);
 	}
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
 
+	free_zapped_mmu_pages(kvm, &invalid_list);
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
@@ -2934,10 +2936,11 @@  static void mmu_free_roots(struct kvm_vcpu *vcpu)
 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
 						 &invalid_list, NULL);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-			free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		}
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 		spin_unlock(&vcpu->kvm->mmu_lock);
+
+		free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 		return;
 	}
 	for (i = 0; i < 4; ++i) {
@@ -2954,9 +2957,10 @@  static void mmu_free_roots(struct kvm_vcpu *vcpu)
 		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
 }
 
@@ -4054,10 +4058,11 @@  void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
 
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	free_zapped_mmu_pages(vcpu->kvm, &invalid_list);
 }
 
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
@@ -4254,9 +4259,10 @@  restart:
 			goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	free_zapped_mmu_pages(kvm, &invalid_list);
 
 	spin_unlock(&kvm->mmu_lock);
+
+	free_zapped_mmu_pages(kvm, &invalid_list);
 }
 
 static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
@@ -4308,10 +4314,10 @@  static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 
 		kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
-		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
+		free_zapped_mmu_pages(kvm, &invalid_list);
 
 		list_move_tail(&kvm->vm_list, &vm_list);
 		break;