Message ID | 20210427223635.2711774-6-bgardon@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Lazily allocate memslot rmaps | expand |
On 28/04/21 00:36, Ben Gardon wrote: > +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, > + struct kvm_memslots *slots) > +{ > + mutex_lock(&kvm->arch.memslot_assignment_lock); > + rcu_assign_pointer(kvm->memslots[as_id], slots); > + mutex_unlock(&kvm->arch.memslot_assignment_lock); > +} Does the assignment also needs the lock, or only the rmap allocation? I would prefer the hook to be something like kvm_arch_setup_new_memslots. (Also it is useful to have a comment somewhere explaining why the slots_lock does not work. IIUC there would be a deadlock because you'd be taking the slots_lock inside an SRCU critical region, while usually the slots_lock critical section is the one that includes a synchronize_srcu; I should dig that up and document that ordering in Documentation/virt/kvm too). Paolo
On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 28/04/21 00:36, Ben Gardon wrote: > > +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, > > + struct kvm_memslots *slots) > > +{ > > + mutex_lock(&kvm->arch.memslot_assignment_lock); > > + rcu_assign_pointer(kvm->memslots[as_id], slots); > > + mutex_unlock(&kvm->arch.memslot_assignment_lock); > > +} > > Does the assignment also needs the lock, or only the rmap allocation? I > would prefer the hook to be something like kvm_arch_setup_new_memslots. The assignment does need to be under the lock to prevent the following race: 1. Thread 1 (installing a new memslot): Acquires memslot assignment lock (or perhaps in this case rmap_allocation_lock would be more apt.) 2. Thread 1: Check alloc_memslot_rmaps (it is false) 3. Thread 1: doesn't allocate memslot rmaps for new slot 4. Thread 1: Releases memslot assignment lock 5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock 6. Thread 2: Sets alloc_memslot_rmaps = true 7. Thread 2: Allocates rmaps for all existing slots 8. Thread 2: Releases memslot assignment lock 9. Thread 2: Sets shadow_mmu_active = true 10. Thread 1: Installs the new memslots 11. Thread 3: Null pointer dereference when trying to access rmaps on the new slot. Putting the assignment under the lock prevents 5-8 from happening between 2 and 10. I'm open to other ideas as far as how to prevent this race though. I admit this solution is not the most elegant looking. > > (Also it is useful to have a comment somewhere explaining why the > slots_lock does not work. IIUC there would be a deadlock because you'd > be taking the slots_lock inside an SRCU critical region, while usually > the slots_lock critical section is the one that includes a > synchronize_srcu; I should dig that up and document that ordering in > Documentation/virt/kvm too). Yeah, sorry about that. I should have added a comment to that effect. As you suspected, it's because of the slots lock / SRCU deadlock. Using the slots lock was my original implementation, until the deadlock issue came up. I can add comments about that in a v2. > > Paolo >
On 28/04/21 18:40, Ben Gardon wrote: > On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> On 28/04/21 00:36, Ben Gardon wrote: >>> +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, >>> + struct kvm_memslots *slots) >>> +{ >>> + mutex_lock(&kvm->arch.memslot_assignment_lock); >>> + rcu_assign_pointer(kvm->memslots[as_id], slots); >>> + mutex_unlock(&kvm->arch.memslot_assignment_lock); >>> +} >> >> Does the assignment also needs the lock, or only the rmap allocation? I >> would prefer the hook to be something like kvm_arch_setup_new_memslots. > > The assignment does need to be under the lock to prevent the following race: > 1. Thread 1 (installing a new memslot): Acquires memslot assignment > lock (or perhaps in this case rmap_allocation_lock would be more apt.) > 2. Thread 1: Check alloc_memslot_rmaps (it is false) > 3. Thread 1: doesn't allocate memslot rmaps for new slot > 4. Thread 1: Releases memslot assignment lock > 5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock > 6. Thread 2: Sets alloc_memslot_rmaps = true > 7. Thread 2: Allocates rmaps for all existing slots > 8. Thread 2: Releases memslot assignment lock > 9. Thread 2: Sets shadow_mmu_active = true > 10. Thread 1: Installs the new memslots > 11. Thread 3: Null pointer dereference when trying to access rmaps on > the new slot. ... because thread 3 would be under mmu_lock and therefore cannot allocate the rmap itself (you have to do it in mmu_alloc_shadow_roots, as in patch 6). Related to this, your solution does not have to protect kvm_dup_memslots with the new lock, because the first update of the memslots will not go through kvm_arch_prepare_memory_region but it _will_ go through install_new_memslots and therefore through the new hook. But overall I think I'd prefer to have a kvm->slots_arch_lock mutex in generic code, and place the call to kvm_dup_memslots and kvm_arch_prepare_memory_region inside that mutex. That makes the new lock decently intuitive, and easily documented as "Architecture code can use slots_arch_lock if the contents of struct kvm_arch_memory_slot needs to be written outside kvm_arch_prepare_memory_region. Unlike slots_lock, slots_arch_lock can be taken inside a ``kvm->srcu`` read-side critical section". I admit I haven't thought about it very thoroughly, but if something like this is enough, it is relatively pretty: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9b8e30dd5b9b..6e5106365597 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1333,6 +1333,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, rcu_assign_pointer(kvm->memslots[as_id], slots); + mutex_unlock(&kvm->slots_arch_lock); synchronize_srcu_expedited(&kvm->srcu); /* @@ -1399,6 +1398,7 @@ static int kvm_set_memslot(struct kvm *kvm, struct kvm_memslots *slots; int r; + mutex_lock(&kvm->slots_arch_lock); slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change); if (!slots) return -ENOMEM; @@ -1427,6 +1427,7 @@ static int kvm_set_memslot(struct kvm *kvm, * - kvm_is_visible_gfn (mmu_check_root) */ kvm_arch_flush_shadow_memslot(kvm, slot); + mutex_lock(&kvm->slots_arch_lock); } r = kvm_arch_prepare_memory_region(kvm, new, mem, change); It does make the critical section a bit larger, so that the initialization of the shadow page (which is in KVM_RUN context) contends with slightly more code than necessary. However it's all but a performance critical situation, as it will only happen just once per VM. WDYT? Paolo > Putting the assignment under the lock prevents 5-8 from happening > between 2 and 10. > > I'm open to other ideas as far as how to prevent this race though. I > admit this solution is not the most elegant looking.
On Wed, Apr 28, 2021 at 10:46 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 28/04/21 18:40, Ben Gardon wrote: > > On Tue, Apr 27, 2021 at 11:25 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > >> > >> On 28/04/21 00:36, Ben Gardon wrote: > >>> +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, > >>> + struct kvm_memslots *slots) > >>> +{ > >>> + mutex_lock(&kvm->arch.memslot_assignment_lock); > >>> + rcu_assign_pointer(kvm->memslots[as_id], slots); > >>> + mutex_unlock(&kvm->arch.memslot_assignment_lock); > >>> +} > >> > >> Does the assignment also needs the lock, or only the rmap allocation? I > >> would prefer the hook to be something like kvm_arch_setup_new_memslots. > > > > The assignment does need to be under the lock to prevent the following race: > > 1. Thread 1 (installing a new memslot): Acquires memslot assignment > > lock (or perhaps in this case rmap_allocation_lock would be more apt.) > > 2. Thread 1: Check alloc_memslot_rmaps (it is false) > > 3. Thread 1: doesn't allocate memslot rmaps for new slot > > 4. Thread 1: Releases memslot assignment lock > > 5. Thread 2 (allocating a shadow root): Acquires memslot assignment lock > > 6. Thread 2: Sets alloc_memslot_rmaps = true > > 7. Thread 2: Allocates rmaps for all existing slots > > 8. Thread 2: Releases memslot assignment lock > > 9. Thread 2: Sets shadow_mmu_active = true > > 10. Thread 1: Installs the new memslots > > 11. Thread 3: Null pointer dereference when trying to access rmaps on > > the new slot. > > ... because thread 3 would be under mmu_lock and therefore cannot > allocate the rmap itself (you have to do it in mmu_alloc_shadow_roots, > as in patch 6). > > Related to this, your solution does not have to protect kvm_dup_memslots > with the new lock, because the first update of the memslots will not go > through kvm_arch_prepare_memory_region but it _will_ go through > install_new_memslots and therefore through the new hook. But overall I > think I'd prefer to have a kvm->slots_arch_lock mutex in generic code, > and place the call to kvm_dup_memslots and > kvm_arch_prepare_memory_region inside that mutex. That makes sense, and I think it also avoids a bug in this series. Currently, if the rmaps are allocated between kvm_dup_memslots and and install_new_memslots, we run into a problem where the copied slots will have new rmaps allocated for them before installation. Potentially creating all sorts of problems. I had fixed that issue in the past by allocating the array of per-level rmaps at memslot creation, seperate from the memslot. That meant that the copy would get the newly allocated rmaps as well, but I thought that was obsolete after some memslot refactors went in. I hoped we could get away without that change, but I was probably wrong. However, with this enlarged critical section, that should not be an issue for creating memslots since kvm_dup_memslots will either copy memslots with the rmaps already allocated, or the whole thing will happen before the rmaps are allocated. ... However with the locking you propose below, we might still run into issues on a move or delete, which would mean we'd still need the separate memory allocation for the rmaps array. Or we do some shenanigans where we try to copy the rmap pointers from the other set of memslots. I can put together a v2 with the seperate rmap memory and more generic locking and see how that looks. > > That makes the new lock decently intuitive, and easily documented as > "Architecture code can use slots_arch_lock if the contents of struct > kvm_arch_memory_slot needs to be written outside > kvm_arch_prepare_memory_region. Unlike slots_lock, slots_arch_lock can > be taken inside a ``kvm->srcu`` read-side critical section". > > I admit I haven't thought about it very thoroughly, but if something > like this is enough, it is relatively pretty: > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 9b8e30dd5b9b..6e5106365597 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1333,6 +1333,7 @@ static struct kvm_memslots > *install_new_memslots(struct kvm *kvm, > > rcu_assign_pointer(kvm->memslots[as_id], slots); > > + mutex_unlock(&kvm->slots_arch_lock); > synchronize_srcu_expedited(&kvm->srcu); > > /* > @@ -1399,6 +1398,7 @@ static int kvm_set_memslot(struct kvm *kvm, > struct kvm_memslots *slots; > int r; > > + mutex_lock(&kvm->slots_arch_lock); > slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change); > if (!slots) > return -ENOMEM; > @@ -1427,6 +1427,7 @@ static int kvm_set_memslot(struct kvm *kvm, > * - kvm_is_visible_gfn (mmu_check_root) > */ > kvm_arch_flush_shadow_memslot(kvm, slot); > + mutex_lock(&kvm->slots_arch_lock); > } > > r = kvm_arch_prepare_memory_region(kvm, new, mem, change); > > It does make the critical section a bit larger, so that the > initialization of the shadow page (which is in KVM_RUN context) contends > with slightly more code than necessary. However it's all but a > performance critical situation, as it will only happen just once per VM. I agree performance is not a huge concern here. Excluding kvm_arch_flush_shadow_memslot from the critical section also helps a lot because that's where most of the work could be if we're deleting / moving a slot. My only worry is the latency this could add to a nested VM launch, but it seems pretty unlikely that that would be frequently coinciding with a memslot change in practice. > > WDYT? > > Paolo > > > Putting the assignment under the lock prevents 5-8 from happening > > between 2 and 10. > > > > I'm open to other ideas as far as how to prevent this race though. I > > admit this solution is not the most elegant looking. > >
On 28/04/21 22:40, Ben Gardon wrote: > ... However with the locking you propose below, we might still run > into issues on a move or delete, which would mean we'd still need the > separate memory allocation for the rmaps array. Or we do some > shenanigans where we try to copy the rmap pointers from the other set > of memslots. If that's (almost) as easy as passing old to kvm_arch_prepare_memory_region, that would be totally okay. > My only worry is the latency this could add to a nested VM launch, but > it seems pretty unlikely that that would be frequently coinciding with > a memslot change in practice. Right, memslot changes in practice occur only at boot and on hotplug. If that was a problem we could always make the allocation state off/in-progress/on, allowing to check the allocation state out of the lock. This would only potentially slow down the first nested VM launch. Paolo
On Wed, Apr 28, 2021 at 2:41 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 28/04/21 22:40, Ben Gardon wrote: > > ... However with the locking you propose below, we might still run > > into issues on a move or delete, which would mean we'd still need the > > separate memory allocation for the rmaps array. Or we do some > > shenanigans where we try to copy the rmap pointers from the other set > > of memslots. > > If that's (almost) as easy as passing old to > kvm_arch_prepare_memory_region, that would be totally okay. Unfortunately it's not quite that easy because it's all the slots _besides_ the one being modified where we'd need to copy the rmaps. > > > My only worry is the latency this could add to a nested VM launch, but > > it seems pretty unlikely that that would be frequently coinciding with > > a memslot change in practice. > > Right, memslot changes in practice occur only at boot and on hotplug. > If that was a problem we could always make the allocation state > off/in-progress/on, allowing to check the allocation state out of the > lock. This would only potentially slow down the first nested VM launch. > > Paolo >
On 28/04/21 23:46, Ben Gardon wrote: > On Wed, Apr 28, 2021 at 2:41 PM Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> On 28/04/21 22:40, Ben Gardon wrote: >>> ... However with the locking you propose below, we might still run >>> into issues on a move or delete, which would mean we'd still need the >>> separate memory allocation for the rmaps array. Or we do some >>> shenanigans where we try to copy the rmap pointers from the other set >>> of memslots. >> >> If that's (almost) as easy as passing old to >> kvm_arch_prepare_memory_region, that would be totally okay. > > Unfortunately it's not quite that easy because it's all the slots > _besides_ the one being modified where we'd need to copy the rmaps. Ah, now I understand the whole race. And it seems to me that if one kvm_dup_memslots within the new lock fixed a bug, two kvm_dup_memslots within the new lock are going to fix two bugs. :) Seriously: unless I'm missing another case (it's late here...), it's not ugly and it's still relatively easy to explain. diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2799c6660cce..48929dd5fb29 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1270,7 +1270,7 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m return 0; } -static struct kvm_memslots *install_new_memslots(struct kvm *kvm, +static void install_new_memslots(struct kvm *kvm, int as_id, struct kvm_memslots *slots) { struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id); @@ -1280,7 +1280,9 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS; rcu_assign_pointer(kvm->memslots[as_id], slots); + mutex_unlock(&kvm->slots_arch_lock); synchronize_srcu_expedited(&kvm->srcu); + kvfree(old_memslots); /* * Increment the new memslot generation a second time, dropping the @@ -1302,8 +1304,6 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, kvm_arch_memslots_updated(kvm, gen); slots->generation = gen; - - return old_memslots; } /* @@ -1342,6 +1342,7 @@ static int kvm_set_memslot(struct kvm *kvm, struct kvm_memslots *slots; int r; + mutex_lock(&kvm->slots_arch_lock); slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change); if (!slots) return -ENOMEM; @@ -1353,14 +1354,7 @@ static int kvm_set_memslot(struct kvm *kvm, */ slot = id_to_memslot(slots, old->id); slot->flags |= KVM_MEMSLOT_INVALID; - - /* - * We can re-use the old memslots, the only difference from the - * newly installed memslots is the invalid flag, which will get - * dropped by update_memslots anyway. We'll also revert to the - * old memslots if preparing the new memory region fails. - */ - slots = install_new_memslots(kvm, as_id, slots); + install_new_memslots(kvm, as_id, slots); /* From this point no new shadow pages pointing to a deleted, * or moved, memslot will be created. @@ -1370,6 +1364,9 @@ static int kvm_set_memslot(struct kvm *kvm, * - kvm_is_visible_gfn (mmu_check_root) */ kvm_arch_flush_shadow_memslot(kvm, slot); + + mutex_lock(&kvm->slots_arch_lock); + slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change); } r = kvm_arch_prepare_memory_region(kvm, new, mem, change); @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, goto out_slots; update_memslots(slots, new, change); - slots = install_new_memslots(kvm, as_id, slots); + install_new_memslots(kvm, as_id, slots); kvm_arch_commit_memory_region(kvm, mem, old, new, change); - - kvfree(slots); return 0; out_slots: - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { + slot = id_to_memslot(slots, old->id); + slot->flags &= ~KVM_MEMSLOT_INVALID; slots = install_new_memslots(kvm, as_id, slots); + } kvfree(slots); return r; } One could optimize things a bit by reusing the allocation and only doing a memcpy from the new memslots array to the old one under the slots_arch_lock. (Plus the above still lacks a mutex_init and should be split in two patches, with the mutex going in the second; but you get the idea and code sometimes is easier than words). Paolo
On Thu, Apr 29, 2021, Paolo Bonzini wrote: > it's not ugly and it's still relatively easy to explain. LOL, that's debatable. > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 2799c6660cce..48929dd5fb29 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, > goto out_slots; > update_memslots(slots, new, change); > - slots = install_new_memslots(kvm, as_id, slots); > + install_new_memslots(kvm, as_id, slots); > kvm_arch_commit_memory_region(kvm, mem, old, new, change); > - > - kvfree(slots); > return 0; > out_slots: > - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { > + slot = id_to_memslot(slots, old->id); > + slot->flags &= ~KVM_MEMSLOT_INVALID; Modifying flags on an SRCU-protect field outside of said protection is sketchy. It's probably ok to do this prior to the generation update, emphasis on "probably". Of course, the VM is also likely about to be killed in this case... > slots = install_new_memslots(kvm, as_id, slots); This will explode if memory allocation for KVM_MR_MOVE fails. In that case, the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata(). > + } > kvfree(slots); > return r; > } The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop the SRCU lock if activate_shadow_mmu() needs to do work so that it can take slots_lock? That seems simpler and I think would avoid modifying the common memslot code. kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that looks scary, but that should be impossible to reach with the correct MMU context. We could always and an explicit sanity check on the rmaps being avaiable.
On Thu, Apr 29, 2021, Sean Christopherson wrote: > On Thu, Apr 29, 2021, Paolo Bonzini wrote: > > it's not ugly and it's still relatively easy to explain. > > LOL, that's debatable. > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 2799c6660cce..48929dd5fb29 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, > > goto out_slots; > > update_memslots(slots, new, change); > > - slots = install_new_memslots(kvm, as_id, slots); > > + install_new_memslots(kvm, as_id, slots); > > kvm_arch_commit_memory_region(kvm, mem, old, new, change); > > - > > - kvfree(slots); > > return 0; > > out_slots: > > - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > > + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { > > + slot = id_to_memslot(slots, old->id); > > + slot->flags &= ~KVM_MEMSLOT_INVALID; > > Modifying flags on an SRCU-protect field outside of said protection is sketchy. > It's probably ok to do this prior to the generation update, emphasis on > "probably". Of course, the VM is also likely about to be killed in this case... > > > slots = install_new_memslots(kvm, as_id, slots); > > This will explode if memory allocation for KVM_MR_MOVE fails. In that case, > the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata(). Gah, that's all wrong, slots are the second duplicate and the clear happens on the new slot, not the old slot with the same id. Though I still think temporarily dropping the SRCU lock would be simpler. If performance is a concern, it could be mitigated by adding a capability to preallocate the rmaps. > > + } > > kvfree(slots); > > return r; > > } > > The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop > the SRCU lock if activate_shadow_mmu() needs to do work so that it can take > slots_lock? That seems simpler and I think would avoid modifying the common > memslot code. > > kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that > looks scary, but that should be impossible to reach with the correct MMU context. > We could always and an explicit sanity check on the rmaps being avaiable.
On 29/04/21 02:40, Sean Christopherson wrote: > On Thu, Apr 29, 2021, Paolo Bonzini wrote: >> it's not ugly and it's still relatively easy to explain. > > LOL, that's debatable. From your remark below it looks like we have different priorities on what to avoid modifying. I like the locks to be either very coarse or fine-grained enough for them to be leaves, as I find that to be the easiest way to avoid deadlocks and complex hierarchies. For this reason, I treat unlocking in the middle of a large critical section as "scary by default"; you have to worry about which invariants might be required (in the case of RCU, which pointers might be stored somewhere and would be invalidated), and which locks are taken at that point so that the subsequent relocking would change the lock order from AB to BA. This applies to every path leading to the unlock/relock. So instead what matters IMO is shielding architecture code from the races that Ben had to point out to me, _and the possibility to apply easily explained rules_ outside more complex core code. So, well, "relatively easy" because it's indeed subtle. But if you consider what the locking rules are, "you can choose to protect slots->arch data with this mutex and it will have no problematic interactions with the memslot copy/update code" is as simple as it can get. >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 2799c6660cce..48929dd5fb29 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, >> goto out_slots; >> update_memslots(slots, new, change); >> - slots = install_new_memslots(kvm, as_id, slots); >> + install_new_memslots(kvm, as_id, slots); >> kvm_arch_commit_memory_region(kvm, mem, old, new, change); >> - >> - kvfree(slots); >> return 0; >> out_slots: >> - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) >> + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { >> + slot = id_to_memslot(slots, old->id); >> + slot->flags &= ~KVM_MEMSLOT_INVALID; > > Modifying flags on an SRCU-protect field outside of said protection is sketchy. > It's probably ok to do this prior to the generation update, emphasis on > "probably". Of course, the VM is also likely about to be killed in this case... > >> slots = install_new_memslots(kvm, as_id, slots); > > This will explode if memory allocation for KVM_MR_MOVE fails. In that case, > the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata(). I take your subsequent reply as a sort-of-review that the above approach works, though we may disagree on its elegance and complexity. Paolo > The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop > the SRCU lock if activate_shadow_mmu() needs to do work so that it can take > slots_lock? That seems simpler and I think would avoid modifying the common > memslot code. > > kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that > looks scary, but that should be impossible to reach with the correct MMU context. > We could always and an explicit sanity check on the rmaps being avaiable.
On Thu, Apr 29, 2021 at 12:02 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 29/04/21 02:40, Sean Christopherson wrote: > > On Thu, Apr 29, 2021, Paolo Bonzini wrote: > >> it's not ugly and it's still relatively easy to explain. > > > > LOL, that's debatable. > > From your remark below it looks like we have different priorities on > what to avoid modifying. > > I like the locks to be either very coarse or fine-grained enough for > them to be leaves, as I find that to be the easiest way to avoid > deadlocks and complex hierarchies. For this reason, I treat unlocking > in the middle of a large critical section as "scary by default"; you > have to worry about which invariants might be required (in the case of > RCU, which pointers might be stored somewhere and would be invalidated), > and which locks are taken at that point so that the subsequent relocking > would change the lock order from AB to BA. The idea of dropping the srcu read lock to allocate the rmaps scares me too. I have no idea what it protects, in addition to the memslots, and where we might be making assumptions about things staying valid because of it. Is there anywhere that we enumerate the things protected by that SRCU? I wonder if we have complete enough annotations for the SRCU / lockdep checker to find a problem if we were dropping the SRCU read lock in a bad place. > > This applies to every path leading to the unlock/relock. So instead > what matters IMO is shielding architecture code from the races that Ben > had to point out to me, _and the possibility to apply easily explained > rules_ outside more complex core code. > > So, well, "relatively easy" because it's indeed subtle. But if you > consider what the locking rules are, "you can choose to protect > slots->arch data with this mutex and it will have no problematic > interactions with the memslot copy/update code" is as simple as it can get. > > >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > >> index 2799c6660cce..48929dd5fb29 100644 > >> --- a/virt/kvm/kvm_main.c > >> +++ b/virt/kvm/kvm_main.c > >> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, > >> goto out_slots; > >> update_memslots(slots, new, change); > >> - slots = install_new_memslots(kvm, as_id, slots); > >> + install_new_memslots(kvm, as_id, slots); > >> kvm_arch_commit_memory_region(kvm, mem, old, new, change); > >> - > >> - kvfree(slots); > >> return 0; > >> out_slots: > >> - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > >> + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { > >> + slot = id_to_memslot(slots, old->id); > >> + slot->flags &= ~KVM_MEMSLOT_INVALID; > > > > Modifying flags on an SRCU-protect field outside of said protection is sketchy. > > It's probably ok to do this prior to the generation update, emphasis on > > "probably". Of course, the VM is also likely about to be killed in this case... > > > >> slots = install_new_memslots(kvm, as_id, slots); > > > > This will explode if memory allocation for KVM_MR_MOVE fails. In that case, > > the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata(). > > I take your subsequent reply as a sort-of-review that the above approach > works, though we may disagree on its elegance and complexity. I'll try Paolo's suggestion about using a second dup_slots to avoid backing the higher level rmap arrays with dynamic memory and send out a V2. > > Paolo > > > The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop > > the SRCU lock if activate_shadow_mmu() needs to do work so that it can take > > slots_lock? That seems simpler and I think would avoid modifying the common > > memslot code. > > > > kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that > > looks scary, but that should be impossible to reach with the correct MMU context. > > We could always and an explicit sanity check on the rmaps being avaiable. >
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3900dcf2439e..bce7fa152473 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1124,6 +1124,11 @@ struct kvm_arch { #endif /* CONFIG_X86_64 */ bool shadow_mmu_active; + + /* + * Protects kvm->memslots. + */ + struct mutex memslot_assignment_lock; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fc32a7dbe4c4..30234fe96f48 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10649,6 +10649,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) raw_spin_lock_init(&kvm->arch.tsc_write_lock); mutex_init(&kvm->arch.apic_map_lock); spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); + mutex_init(&kvm->arch.memslot_assignment_lock); kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); pvclock_update_vm_gtod_copy(kvm); @@ -10868,6 +10869,16 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot, return -ENOMEM; } + +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, + struct kvm_memslots *slots) +{ + mutex_lock(&kvm->arch.memslot_assignment_lock); + rcu_assign_pointer(kvm->memslots[as_id], slots); + mutex_unlock(&kvm->arch.memslot_assignment_lock); +} + + static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot, unsigned long npages) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 8895b95b6a22..146bb839c754 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -720,6 +720,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, const struct kvm_userspace_memory_region *mem, enum kvm_mr_change change); +void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, + struct kvm_memslots *slots); void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, struct kvm_memory_slot *old, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2799c6660cce..e62a37bc5b90 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1270,6 +1270,12 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m return 0; } +__weak void kvm_arch_assign_memslots(struct kvm *kvm, int as_id, + struct kvm_memslots *slots) +{ + rcu_assign_pointer(kvm->memslots[as_id], slots); +} + static struct kvm_memslots *install_new_memslots(struct kvm *kvm, int as_id, struct kvm_memslots *slots) { @@ -1279,7 +1285,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS); slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS; - rcu_assign_pointer(kvm->memslots[as_id], slots); + kvm_arch_assign_memslots(kvm, as_id, slots); + synchronize_srcu_expedited(&kvm->srcu); /*
Adds a lock around memslots changes. Currently this lock does not have any effect on the syncronization model, but it will be used in a future commit to facilitate lazy rmap allocation. Signed-off-by: Ben Gardon <bgardon@google.com> --- arch/x86/include/asm/kvm_host.h | 5 +++++ arch/x86/kvm/x86.c | 11 +++++++++++ include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 9 ++++++++- 4 files changed, 26 insertions(+), 1 deletion(-)