Message ID: 20090827155450.GA6312@amt.cnet (mailing list archive)
State: New, archived
On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
> perf report shows heavy overhead from down/up of slots_lock.
>
> Attempted to remove slots_lock by having vcpus stop on a synchronization
> point, but this introduced further complexity (a vcpu can be scheduled
> out before reaching the synchronization point, and can sched back in at
> points which are slots_lock protected, etc).
>
> This patch changes vcpu_enter_guest to conditionally release/acquire
> slots_lock in case a vcpu state bit is set.
>
> vmexit performance improves by 5-10% on UP guest.

Sorry, it looks pretty complex. Have you considered using srcu? It seems
to me down/up_read() can be replaced by srcu_read_lock/unlock(), and
after proper conversion of memslots and io_bus to rcu_assign_pointer(),
we can just add synchronize_srcu() immediately after changing stuff (of
course mmu_lock still needs to be held when updating slots).
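For readers following the argument, a minimal sketch of the scheme being proposed here. The per-VM srcu_struct (kvm->srcu) and the kvm_memslots container type are assumptions for illustration; none of this exists in the tree at this point in the thread.

	/* Read side (e.g. a memslot lookup on the fault path).
	 * kvm->srcu and an RCU-managed kvm->memslots pointer are assumed. */
	struct kvm_memslots *slots;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slots = rcu_dereference(kvm->memslots);
	/* ... resolve the gfn against slots ... */
	srcu_read_unlock(&kvm->srcu, idx);

	/* Update side (e.g. __kvm_set_memory_region): publish the new
	 * table, then wait for every reader of the old one to finish. */
	struct kvm_memslots *old = kvm->memslots;

	rcu_assign_pointer(kvm->memslots, new);
	synchronize_srcu(&kvm->srcu);	/* no vcpu still sees 'old' */
	kfree(old);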
On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>> perf report shows heavy overhead from down/up of slots_lock.
>>
>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>> point, but this introduced further complexity (a vcpu can be scheduled
>> out before reaching the synchronization point, and can sched back in at
>> points which are slots_lock protected, etc).
>>
>> This patch changes vcpu_enter_guest to conditionally release/acquire
>> slots_lock in case a vcpu state bit is set.
>>
>> vmexit performance improves by 5-10% on UP guest.
>
> Sorry, it looks pretty complex.

Why?

> Have you considered using srcu? It seems to me down/up_read() can
> be replaced by srcu_read_lock/unlock(), and after proper conversion
> of memslots and io_bus to rcu_assign_pointer(), we can just add
> synchronize_srcu() immediately after changing stuff (of course
> mmu_lock still needs to be held when updating slots).

I don't see RCU as being suitable because in certain operations you
want to stop writers (on behalf of vcpus), do something, and let them
continue afterwards. The dirty log, for example. Or any operation that
wants to modify lockless vcpu specific data.

Also, synchronize_srcu() is limited to preemptible sections.

io_bus could use RCU, but I think being able to stop vcpus is also a
different requirement. Do you have any suggestion on how to do it in a
better way?
On 08/28/2009 01:59 AM, Marcelo Tosatti wrote:
> On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
>> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>>> perf report shows heavy overhead from down/up of slots_lock.
>>>
>>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>>> point, but this introduced further complexity (a vcpu can be scheduled
>>> out before reaching the synchronization point, and can sched back in at
>>> points which are slots_lock protected, etc).
>>>
>>> This patch changes vcpu_enter_guest to conditionally release/acquire
>>> slots_lock in case a vcpu state bit is set.
>>>
>>> vmexit performance improves by 5-10% on UP guest.
>>
>> Sorry, it looks pretty complex.
>
> Why?

There's a new locking protocol in there. I prefer sticking with the
existing kernel plumbing, or it gets more and more difficult knowing who
protects what and in what order you can do things.

>> Have you considered using srcu? It seems to me down/up_read() can
>> be replaced by srcu_read_lock/unlock(), and after proper conversion
>> of memslots and io_bus to rcu_assign_pointer(), we can just add
>> synchronize_srcu() immediately after changing stuff (of course
>> mmu_lock still needs to be held when updating slots).
>
> I don't see RCU as being suitable because in certain operations you
> want to stop writers (on behalf of vcpus), do something, and let them
> continue afterwards. The dirty log, for example. Or any operation that
> wants to modify lockless vcpu specific data.

kvm_flush_remote_tlbs() (which you'd call after mmu operations), will
get cpus out of guest mode, and synchronize_srcu() will wait for them to
drop the srcu "read lock". So it really happens naturally: do an RCU
update, send some request to all vcpus, synchronize_srcu(), done.

> Also, synchronize_srcu() is limited to preemptible sections.
>
> io_bus could use RCU, but I think being able to stop vcpus is also a
> different requirement. Do you have any suggestion on how to do it in a
> better way?

We don't need to stop vcpus, just kick them out of guest mode to let
them notice the new data. SRCU does that well.
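The update sequence described above could be sketched as follows. As before, kvm->srcu and the RCU-managed kvm->memslots pointer are assumptions, and kvm_flush_remote_tlbs() stands in for "send some request to all vcpus".

	/* Writer sketch: swap the tables, kick vcpus, wait for readers. */
	spin_lock(&kvm->mmu_lock);
	rcu_assign_pointer(kvm->memslots, new_slots);
	kvm_flush_remote_tlbs(kvm);	/* forces vcpus out of guest mode */
	spin_unlock(&kvm->mmu_lock);

	synchronize_srcu(&kvm->srcu);	/* old readers have now drained */
	kfree(old_slots);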
On Fri, Aug 28, 2009 at 09:50:36AM +0300, Avi Kivity wrote:
> On 08/28/2009 01:59 AM, Marcelo Tosatti wrote:
>> On Thu, Aug 27, 2009 at 07:27:48PM +0300, Avi Kivity wrote:
>>> On 08/27/2009 06:54 PM, Marcelo Tosatti wrote:
>>>> perf report shows heavy overhead from down/up of slots_lock.
>>>>
>>>> Attempted to remove slots_lock by having vcpus stop on a synchronization
>>>> point, but this introduced further complexity (a vcpu can be scheduled
>>>> out before reaching the synchronization point, and can sched back in at
>>>> points which are slots_lock protected, etc).
>>>>
>>>> This patch changes vcpu_enter_guest to conditionally release/acquire
>>>> slots_lock in case a vcpu state bit is set.
>>>>
>>>> vmexit performance improves by 5-10% on UP guest.
>>>
>>> Sorry, it looks pretty complex.
>>
>> Why?
>
> There's a new locking protocol in there. I prefer sticking with the
> existing kernel plumbing, or it gets more and more difficult knowing who
> protects what and in what order you can do things.
>
>>> Have you considered using srcu? It seems to me down/up_read() can
>>> be replaced by srcu_read_lock/unlock(), and after proper conversion
>>> of memslots and io_bus to rcu_assign_pointer(), we can just add
>>> synchronize_srcu() immediately after changing stuff (of course
>>> mmu_lock still needs to be held when updating slots).
>>
>> I don't see RCU as being suitable because in certain operations you
>> want to stop writers (on behalf of vcpus), do something, and let them
>> continue afterwards. The dirty log, for example. Or any operation that
>> wants to modify lockless vcpu specific data.
>
> kvm_flush_remote_tlbs() (which you'd call after mmu operations), will
> get cpus out of guest mode, and synchronize_srcu() will wait for them to
> drop the srcu "read lock". So it really happens naturally: do an RCU
> update, send some request to all vcpus, synchronize_srcu(), done.
>
>> Also, synchronize_srcu() is limited to preemptible sections.
>>
>> io_bus could use RCU, but I think being able to stop vcpus is also a
>> different requirement. Do you have any suggestion on how to do it in a
>> better way?
>
> We don't need to stop vcpus, just kick them out of guest mode to let
> them notice the new data. SRCU does that well.

Two problems:

1. The removal of memslots/aliases and zapping of mmu (to remove any
shadow pages with stale sp->gfn) must be atomic from the POV of the
pagefault path. Otherwise something like this can happen:

fault path                          set_memory_region

walk_addr fetches and validates
table_gfns
                                    delete memslot
                                    synchronize_srcu

fetch creates shadow
page with nonexistent sp->gfn

OR

mmu_alloc_roots path                set_memory_region

                                    delete memslot
root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
mmu_check_root(root_gfn)            synchronize_rcu
kvm_mmu_get_page()
                                    kvm_mmu_zap_all

Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all can
see shadow pages with stale gfn.

But, if you still think it's worthwhile to use RCU, at least handling
gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
invalidation to the set_memory_region path will be necessary (so that
gfn_to_pfn validation, in the fault path, is discarded in case of a
memslot/alias update).

2. Another complication is that memslot creation and kvm_iommu_map_pages
are not atomic.

create memslot
synchronize_srcu
                 <----- vcpu grabs gfn reference without
                        iommu mapping.
kvm_iommu_map_pages

Which can be solved by changing kvm_iommu_map_pages (and the new
gfn_to_pfn helper) to use base_gfn,npages,hva information from somewhere
other than the visible kvm->memslots (so that when the slot becomes
visible it is already iommu mapped).

So it appears to me using RCU introduces more complications / subtle
details than mutual exclusion here. The new request bit which the
original patch introduces is limited to enabling/disabling conditional
acquisition of slots_lock (calling it a "new locking protocol" is unfair)
to improve write acquisition latency.

Better ideas/directions welcome.
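One way to express the suggestion for problem 2 is to have the iommu mapping operate on the not-yet-visible slot and only then publish the new table. The slot-taking kvm_iommu_map_pages() signature and the new_slots/old_slots naming below are assumptions for illustration, not the existing interface.

	/* Map the new slot for the iommu before any vcpu can see it. */
	r = kvm_iommu_map_pages(kvm, &new_slots->memslots[mem->slot]);
	if (r)
		goto out_free;

	/* Only now make the slot visible to readers. */
	rcu_assign_pointer(kvm->memslots, new_slots);
	synchronize_srcu(&kvm->srcu);
	kfree(old_slots);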
On 09/11/2009 01:30 AM, Marcelo Tosatti wrote:
>
>> We don't need to stop vcpus, just kick them out of guest mode to let
>> them notice the new data. SRCU does that well.
>>
> Two problems:
>
> 1. The removal of memslots/aliases and zapping of mmu (to remove any
> shadow pages with stale sp->gfn) must be atomic from the POV of the
> pagefault path. Otherwise something like this can happen:
>
> fault path                          set_memory_region

srcu_read_lock()

> walk_addr fetches and validates
> table_gfns
>                                     delete memslot
>                                     synchronize_srcu
>
> fetch creates shadow

srcu_read_unlock()

> page with nonexistent sp->gfn

I think synchronize_srcu() will be deferred until the fault path is
complete (and srcu_read_unlock() runs). Copying someone who knows for
sure.

> OR
>
> mmu_alloc_roots path                set_memory_region

srcu_read_lock()

>                                     delete memslot
> root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
> mmu_check_root(root_gfn)            synchronize_rcu
> kvm_mmu_get_page()

srcu_read_unlock()

>                                     kvm_mmu_zap_all

Ditto, srcu_read_lock() protects us.

> Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all can
> see shadow pages with stale gfn.
>
> But, if you still think it's worthwhile to use RCU, at least handling
> gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
> invalidation to the set_memory_region path will be necessary (so that
> gfn_to_pfn validation, in the fault path, is discarded in case of a
> memslot/alias update).

It really is worthwhile to reuse complex infrastructure instead of
writing new infrastructure.

> 2. Another complication is that memslot creation and kvm_iommu_map_pages
> are not atomic.
>
> create memslot
> synchronize_srcu
>                  <----- vcpu grabs gfn reference without
>                         iommu mapping.
> kvm_iommu_map_pages
>
> Which can be solved by changing kvm_iommu_map_pages (and the new
> gfn_to_pfn helper) to use base_gfn,npages,hva information from somewhere
> other than the visible kvm->memslots (so that when the slot becomes
> visible it is already iommu mapped).

Yes. It can accept a memslots structure instead of deriving it from
kvm->memslots. Then we do a rcu_assign_pointer() to switch the tables.

> So it appears to me using RCU introduces more complications / subtle
> details than mutual exclusion here. The new request bit which the
> original patch introduces is limited to enabling/disabling conditional
> acquisition of slots_lock (calling it a "new locking protocol" is unfair)
> to improve write acquisition latency.

It's true that it is not a new locking protocol. But I feel it is
worthwhile to try to use rcu for this; at least it will make it easier
for newcomers (provided they understand rcu).
On Sun, Sep 13, 2009 at 06:42:49PM +0300, Avi Kivity wrote:
> On 09/11/2009 01:30 AM, Marcelo Tosatti wrote:
>>
>>> We don't need to stop vcpus, just kick them out of guest mode to let
>>> them notice the new data. SRCU does that well.
>>>
>> Two problems:
>>
>> 1. The removal of memslots/aliases and zapping of mmu (to remove any
>> shadow pages with stale sp->gfn) must be atomic from the POV of the
>> pagefault path. Otherwise something like this can happen:
>>
>> fault path                          set_memory_region
>
> srcu_read_lock()
>
>> walk_addr fetches and validates
>> table_gfns
>>                                     delete memslot
>>                                     synchronize_srcu
>>
>> fetch creates shadow
>
> srcu_read_unlock()
>
>> page with nonexistent sp->gfn
>
> I think synchronize_srcu() will be deferred until the fault path is
> complete (and srcu_read_unlock() runs). Copying someone who knows for
> sure.

Yes, synchronize_srcu() will block until srcu_read_unlock() in this
scenario, assuming that the same srcu_struct is used by both.

>> OR
>>
>> mmu_alloc_roots path                set_memory_region
>
> srcu_read_lock()
>
>>                                     delete memslot
>> root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
>> mmu_check_root(root_gfn)            synchronize_rcu
>> kvm_mmu_get_page()
>
> srcu_read_unlock()
>
>> kvm_mmu_zap_all
>
> Ditto, srcu_read_lock() protects us.

Yep!

>> Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all can
>> see shadow pages with stale gfn.
>>
>> But, if you still think it's worthwhile to use RCU, at least handling
>> gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
>> invalidation to the set_memory_region path will be necessary (so that
>> gfn_to_pfn validation, in the fault path, is discarded in case of a
>> memslot/alias update).
>
> It really is worthwhile to reuse complex infrastructure instead of
> writing new infrastructure.

Marcelo, in your first example, is your concern that the fault path
needs to detect the memslot deletion? Or that the use of sp->gfn "leaks"
out of the SRCU read-side critical section?

							Thanx, Paul

>> 2. Another complication is that memslot creation and kvm_iommu_map_pages
>> are not atomic.
>>
>> create memslot
>> synchronize_srcu
>>                  <----- vcpu grabs gfn reference without
>>                         iommu mapping.
>> kvm_iommu_map_pages
>>
>> Which can be solved by changing kvm_iommu_map_pages (and the new
>> gfn_to_pfn helper) to use base_gfn,npages,hva information from somewhere
>> other than the visible kvm->memslots (so that when the slot becomes
>> visible it is already iommu mapped).
>
> Yes. It can accept a memslots structure instead of deriving it from
> kvm->memslots. Then we do a rcu_assign_pointer() to switch the tables.
>
>> So it appears to me using RCU introduces more complications / subtle
>> details than mutual exclusion here. The new request bit which the
>> original patch introduces is limited to enabling/disabling conditional
>> acquisition of slots_lock (calling it a "new locking protocol" is unfair)
>> to improve write acquisition latency.
>
> It's true that it is not a new locking protocol. But I feel it is
> worthwhile to try to use rcu for this; at least it will make it easier
> for newcomers (provided they understand rcu).
>
> --
> error compiling committee.c: too many arguments to function
On Sun, Sep 13, 2009 at 09:26:52AM -0700, Paul E. McKenney wrote:
> On Sun, Sep 13, 2009 at 06:42:49PM +0300, Avi Kivity wrote:
> > On 09/11/2009 01:30 AM, Marcelo Tosatti wrote:
> >>
> >>> We don't need to stop vcpus, just kick them out of guest mode to let
> >>> them notice the new data. SRCU does that well.
> >>>
> >> Two problems:
> >>
> >> 1. The removal of memslots/aliases and zapping of mmu (to remove any
> >> shadow pages with stale sp->gfn) must be atomic from the POV of the
> >> pagefault path. Otherwise something like this can happen:
> >>
> >> fault path                          set_memory_region
> >
> > srcu_read_lock()
> >
> >> walk_addr fetches and validates
> >> table_gfns
> >>                                     delete memslot
> >>                                     synchronize_srcu
> >>
> >> fetch creates shadow
> >
> > srcu_read_unlock()
> >
> >> page with nonexistent sp->gfn
> >
> > I think synchronize_srcu() will be deferred until the fault path is
> > complete (and srcu_read_unlock() runs). Copying someone who knows for
> > sure.
>
> Yes, synchronize_srcu() will block until srcu_read_unlock() in this
> scenario, assuming that the same srcu_struct is used by both.

Right, it will. But this does not stop the fault path from creating
shadow pages with stale sp->gfn (the only way to do that would be mutual
exclusion AFAICS).

> >> OR
> >>
> >> mmu_alloc_roots path                set_memory_region
> >
> > srcu_read_lock()
> >
> >>                                     delete memslot
> >> root_gfn = vcpu->arch.cr3 << PAGE_SHIFT
> >> mmu_check_root(root_gfn)            synchronize_rcu
> >> kvm_mmu_get_page()
> >
> > srcu_read_unlock()
> >
> >> kvm_mmu_zap_all
> >
> > Ditto, srcu_read_lock() protects us.
>
> Yep!

The RCU read-protected side does not stop a new memslots pointer from
being assigned (with rcu_assign_pointer), does it?

> >> Accesses in the window between kvm_mmu_get_page and kvm_mmu_zap_all can
> >> see shadow pages with stale gfn.
> >>
> >> But, if you still think it's worthwhile to use RCU, at least handling
> >> gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry
> >> invalidation to the set_memory_region path will be necessary (so that
> >> gfn_to_pfn validation, in the fault path, is discarded in case of a
> >> memslot/alias update).
> >
> > It really is worthwhile to reuse complex infrastructure instead of
> > writing new infrastructure.
>
> Marcelo, in your first example, is your concern that the fault path
> needs to detect the memslot deletion?

Yes, it needs to invalidate the leakage, which in this case is a shadow
page data structure which was created containing information from a now
deleted memslot.

> Or that the use of sp->gfn "leaks" out of the SRCU read-side critical
> section?

Yes, use of a stale sp->gfn leaks outside of the SRCU read side critical
section and currently the rest of the code is not ready to deal with
that... but it will have to.

> 							Thanx, Paul
>
> >> 2. Another complication is that memslot creation and kvm_iommu_map_pages
> >> are not atomic.
> >>
> >> create memslot
> >> synchronize_srcu
> >>                  <----- vcpu grabs gfn reference without
> >>                         iommu mapping.
> >> kvm_iommu_map_pages
> >>
> >> Which can be solved by changing kvm_iommu_map_pages (and the new
> >> gfn_to_pfn helper) to use base_gfn,npages,hva information from somewhere
> >> other than the visible kvm->memslots (so that when the slot becomes
> >> visible it is already iommu mapped).
> >
> > Yes. It can accept a memslots structure instead of deriving it from
> > kvm->memslots. Then we do a rcu_assign_pointer() to switch the tables.

Alright.

> >> So it appears to me using RCU introduces more complications / subtle
> >> details than mutual exclusion here. The new request bit which the
> >> original patch introduces is limited to enabling/disabling conditional
> >> acquisition of slots_lock (calling it a "new locking protocol" is unfair)
> >> to improve write acquisition latency.
> >
> > It's true that it is not a new locking protocol. But I feel it is
> > worthwhile to try to use rcu for this; at least it will make it easier
> > for newcomers (provided they understand rcu).

Sure.
On 09/14/2009 01:49 AM, Marcelo Tosatti wrote:
>
>>> I think synchronize_srcu() will be deferred until the fault path is
>>> complete (and srcu_read_unlock() runs). Copying someone who knows for
>>> sure.
>>>
>> Yes, synchronize_srcu() will block until srcu_read_unlock() in this
>> scenario, assuming that the same srcu_struct is used by both.
>>
> Right, it will. But this does not stop the fault path from creating
> shadow pages with stale sp->gfn (the only way to do that would be mutual
> exclusion AFAICS).

So we put the kvm_mmu_zap_pages() call as part of the synchronize_srcu()
callback to take advantage of the srcu guarantees. We know that when the
callback is called, all new reads see the new slots and all old readers
have completed.

> The RCU read-protected side does not stop a new memslots pointer from
> being assigned (with rcu_assign_pointer), does it?

It doesn't. It only gives you a point in time where you know no one is
using the old pointer, but before it has been deleted.
On 09/14/2009 08:03 AM, Avi Kivity wrote:
>> Right, it will. But this does not stop the fault path from creating
>> shadow pages with stale sp->gfn (the only way to do that would be mutual
>> exclusion AFAICS).
>
> So we put the kvm_mmu_zap_pages() call as part of the
> synchronize_srcu() callback to take advantage of the srcu guarantees.
> We know that when the callback is called, all new reads see the new
> slots and all old readers have completed.

I think I see your concern - assigning sp->gfn leaks information out of
the srcu critical section. Two ways out:

1) copy kvm->slots into sp->slots and use it when dropping the shadow
page. Intrusive and increases shadow footprint.

1b) Instead of sp->slots, use a 1-bit generation counter. Even uglier
but reduces the shadow footprint.

2) instead of removing the slot in rcu_assign_pointer(), mark it
invalid. gfn_to_page() will fail on such slots but the teardown paths
(like unaccount_shadow) continue to work. Once we've zapped the mmu we
drop the slot completely (can do in place, no need to rcu_assign_pointer).
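Option 2 might look roughly like the sketch below. KVM_MEMSLOT_INVALID, the kvm_dup_memslots() helper and the exact ordering are illustrative assumptions, not code agreed on in this thread.

	/* 1. Publish a copy of the slot array with the slot marked
	 *    invalid; gfn_to_page() fails on it, but teardown paths
	 *    (e.g. unaccount_shadow) still see base_gfn/npages. */
	new = kvm_dup_memslots(kvm);		/* hypothetical helper */
	new->memslots[id].flags |= KVM_MEMSLOT_INVALID;
	old = kvm->memslots;
	rcu_assign_pointer(kvm->memslots, new);
	synchronize_srcu(&kvm->srcu);

	/* 2. Zap shadow pages that may still carry the stale gfn range. */
	kvm_mmu_zap_all(kvm);

	/* 3. Now drop the slot for real; in place is fine, no second
	 *    rcu_assign_pointer() needed. */
	new->memslots[id].npages = 0;
	kfree(old);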
Index: kvm-requests/arch/x86/kvm/vmx.c
===================================================================
--- kvm-requests.orig/arch/x86/kvm/vmx.c
+++ kvm-requests/arch/x86/kvm/vmx.c
@@ -2169,7 +2169,7 @@ static int alloc_apic_access_page(struct
 	struct kvm_userspace_memory_region kvm_userspace_mem;
 	int r = 0;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	if (kvm->arch.apic_access_page)
 		goto out;
 	kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
@@ -2191,7 +2191,7 @@ static int alloc_identity_pagetable(stru
 	struct kvm_userspace_memory_region kvm_userspace_mem;
 	int r = 0;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	if (kvm->arch.ept_identity_pagetable)
 		goto out;
 	kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
Index: kvm-requests/arch/x86/kvm/x86.c
===================================================================
--- kvm-requests.orig/arch/x86/kvm/x86.c
+++ kvm-requests/arch/x86/kvm/x86.c
@@ -1926,7 +1926,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages
 	if (kvm_nr_mmu_pages < KVM_MIN_ALLOC_MMU_PAGES)
 		return -EINVAL;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	spin_lock(&kvm->mmu_lock);
 
 	kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages);
@@ -1982,7 +1982,7 @@ static int kvm_vm_ioctl_set_memory_alias
 	    < alias->target_phys_addr)
 		goto out;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	spin_lock(&kvm->mmu_lock);
 
 	p = &kvm->arch.aliases[alias->slot];
@@ -2137,7 +2137,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kv
 	struct kvm_memory_slot *memslot;
 	int is_dirty = 0;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 
 	r = kvm_get_dirty_log(kvm, log, &is_dirty);
 	if (r)
@@ -2253,7 +2253,7 @@ long kvm_arch_vm_ioctl(struct file *filp
 				   sizeof(struct kvm_pit_config)))
 			goto out;
 	create_pit:
-		down_write(&kvm->slots_lock);
+		kvm_grab_global_lock(kvm);
 		r = -EEXIST;
 		if (kvm->arch.vpit)
 			goto create_pit_unlock;
@@ -3548,7 +3548,7 @@ static void inject_pending_event(struct
 
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
-	int r;
+	int r, dropped_slots_lock = 0;
 	bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
 		vcpu->run->request_interrupt_window;
 
@@ -3616,7 +3616,10 @@ static int vcpu_enter_guest(struct kvm_v
 		kvm_lapic_sync_to_vapic(vcpu);
 	}
 
-	up_read(&vcpu->kvm->slots_lock);
+	if (unlikely(test_bit(KVM_VCPU_DROP_LOCK, &vcpu->vcpu_state))) {
+		dropped_slots_lock = 1;
+		up_read(&vcpu->kvm->slots_lock);
+	}
 
 	kvm_guest_enter();
 
@@ -3668,8 +3671,8 @@ static int vcpu_enter_guest(struct kvm_v
 
 	preempt_enable();
 
-	down_read(&vcpu->kvm->slots_lock);
-
+	if (dropped_slots_lock)
+		down_read(&vcpu->kvm->slots_lock);
 	/*
 	 * Profile KVM exit RIPs:
 	 */
Index: kvm-requests/include/linux/kvm_host.h
===================================================================
--- kvm-requests.orig/include/linux/kvm_host.h
+++ kvm-requests/include/linux/kvm_host.h
@@ -44,6 +44,7 @@
 
 #define KVM_VCPU_GUEST_MODE 0
 #define KVM_VCPU_KICKED 1
+#define KVM_VCPU_DROP_LOCK 2
 
 struct kvm;
 struct kvm_vcpu;
@@ -408,6 +409,7 @@ void kvm_unregister_irq_ack_notifier(str
 				   struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+void kvm_grab_global_lock(struct kvm *kvm);
 
 /* For vcpu->arch.iommu_flags */
 #define KVM_IOMMU_CACHE_COHERENCY	0x1
Index: kvm-requests/virt/kvm/coalesced_mmio.c
===================================================================
--- kvm-requests.orig/virt/kvm/coalesced_mmio.c
+++ kvm-requests/virt/kvm/coalesced_mmio.c
@@ -117,7 +117,7 @@ int kvm_vm_ioctl_register_coalesced_mmio
 	if (dev == NULL)
 		return -EINVAL;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
 		up_write(&kvm->slots_lock);
 		return -ENOBUFS;
@@ -140,7 +140,7 @@ int kvm_vm_ioctl_unregister_coalesced_mm
 	if (dev == NULL)
 		return -EINVAL;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 
 	i = dev->nb_zones;
 	while(i) {
Index: kvm-requests/virt/kvm/eventfd.c
===================================================================
--- kvm-requests.orig/virt/kvm/eventfd.c
+++ kvm-requests/virt/kvm/eventfd.c
@@ -498,7 +498,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, st
 	else
 		p->wildcard = true;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 
 	/* Verify that there isnt a match already */
 	if (ioeventfd_check_collision(kvm, p)) {
@@ -541,7 +541,7 @@ kvm_deassign_ioeventfd(struct kvm *kvm,
 	if (IS_ERR(eventfd))
 		return PTR_ERR(eventfd);
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 
 	list_for_each_entry_safe(p, tmp, &kvm->ioeventfds, list) {
 		bool wildcard = !(args->flags & KVM_IOEVENTFD_FLAG_DATAMATCH);
Index: kvm-requests/virt/kvm/kvm_main.c
===================================================================
--- kvm-requests.orig/virt/kvm/kvm_main.c
+++ kvm-requests/virt/kvm/kvm_main.c
@@ -787,6 +787,22 @@ void kvm_reload_remote_mmus(struct kvm *
 	kvm_vcpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
+void kvm_grab_global_lock(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		set_bit(KVM_VCPU_DROP_LOCK, &vcpu->vcpu_state);
+		barrier();
+		kvm_vcpu_ipi(vcpu);
+	}
+	down_write(&kvm->slots_lock);
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		clear_bit(KVM_VCPU_DROP_LOCK, &vcpu->vcpu_state);
+}
+EXPORT_SYMBOL_GPL(kvm_grab_global_lock);
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
 	struct page *page;
@@ -1286,7 +1302,7 @@ int kvm_set_memory_region(struct kvm *kv
 {
 	int r;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	r = __kvm_set_memory_region(kvm, mem, user_alloc);
 	up_write(&kvm->slots_lock);
 	return r;
@@ -2556,7 +2572,7 @@ int kvm_io_bus_register_dev(struct kvm *
 {
 	int ret;
 
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	ret = __kvm_io_bus_register_dev(bus, dev);
 	up_write(&kvm->slots_lock);
 
@@ -2579,7 +2595,7 @@ void kvm_io_bus_unregister_dev(struct kv
 			       struct kvm_io_bus *bus,
 			       struct kvm_io_device *dev)
 {
-	down_write(&kvm->slots_lock);
+	kvm_grab_global_lock(kvm);
 	__kvm_io_bus_unregister_dev(bus, dev);
 	up_write(&kvm->slots_lock);
 }
perf report shows heavy overhead from down/up of slots_lock.

Attempted to remove slots_lock by having vcpus stop on a synchronization
point, but this introduced further complexity (a vcpu can be scheduled
out before reaching the synchronization point, and can sched back in at
points which are slots_lock protected, etc).

This patch changes vcpu_enter_guest to conditionally release/acquire
slots_lock in case a vcpu state bit is set.

vmexit performance improves by 5-10% on UP guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>