Message ID | 20221027221752.1683510-10-oliver.upton@linux.dev (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Parallel stage-2 fault handling
On Thu, Oct 27, 2022, Oliver Upton wrote:
> There is no real urgency to free a stage-2 subtree that was pruned.
> Nonetheless, KVM does the tear down in the stage-2 fault path while
> holding the MMU lock.
>
> Free removed stage-2 subtrees after an RCU grace period. To guarantee
> all stage-2 table pages are freed before killing a VM, add an
> rcu_barrier() to the flush path.

This is _very_ misleading. The above paints RCU as an optimization of sorts to
avoid doing work while holding mmu_lock. Freeing page tables in an RCU callback
is _required_ for correctness when allowing parallel page faults to remove page
tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
accessing and/or holds a reference to the to-be-freed page table.

IMO, this patch should be squashed with the previous patch, "Protect stage-2
traversal with RCU". One doesn't make any sense without the other.
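A minimal sketch of the race being described, using hypothetical names (pt_page, walker(), prune()) rather than the actual KVM arm64 walker code: a reader holding mmu_lock only for read may still be dereferencing a just-unlinked table, so the writer must defer the free with call_rcu() instead of freeing immediately.

/* Minimal sketch only -- hypothetical types/names, not the KVM arm64 code. */
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct pt_page {
        struct rcu_head rcu;
        unsigned long *table;
};

/* Reader: a parallel fault walking tables with mmu_lock held for read. */
static void walker(struct pt_page __rcu **slot)
{
        struct pt_page *pt;

        rcu_read_lock();
        pt = rcu_dereference(*slot);    /* may still observe a pruned table */
        if (pt) {
                /* ... walk pt->table ... */
        }
        rcu_read_unlock();
}

static void free_table_rcu(struct rcu_head *head)
{
        struct pt_page *pt = container_of(head, struct pt_page, rcu);

        kfree(pt);      /* safe: all pre-existing readers have finished */
}

/* Writer: unlink the subtree, then defer the free past a grace period. */
static void prune(struct pt_page __rcu **slot)
{
        struct pt_page *pt = rcu_dereference_protected(*slot, true);

        rcu_assign_pointer(*slot, NULL);
        /* Freeing pt right here would race with walker(); call_rcu() does not. */
        call_rcu(&pt->rcu, free_table_rcu);
}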
On Tue, Nov 01, 2022 at 08:28:04PM +0000, Sean Christopherson wrote:
> On Thu, Oct 27, 2022, Oliver Upton wrote:
> > There is no real urgency to free a stage-2 subtree that was pruned.
> > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > holding the MMU lock.
> >

[ copy ]

> This is _very_ misleading. The above paints RCU as an optimization of sorts to
> avoid doing work while holding mmu_lock. Freeing page tables in an RCU callback
> is _required_ for correctness when allowing parallel page faults to remove page
> tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
> accessing and/or holds a reference to the to-be-freed page table.

Agree, but it is still important to reason about what is changing here
too. Moving work out of the vCPU fault path _is_ valuable, though
ancillary to the correctness requirements.

> IMO, this patch should be squashed with the previous patch, "Protect stage-2
> traversal with RCU". One doesn't make any sense without the other.

I had split these up back when this series was a lot more gnarly and
there was too much slop in a single diff. That isn't the case any more,
so yeah I'll squash them.

[ paste ]

> > Free removed stage-2 subtrees after an RCU grace period. To guarantee
> > all stage-2 table pages are freed before killing a VM, add an
> > rcu_barrier() to the flush path.

As an aside, this is flat-out wrong now.

--
Thanks,
Oliver
On Tue, Nov 01, 2022, Oliver Upton wrote:
> On Tue, Nov 01, 2022 at 08:28:04PM +0000, Sean Christopherson wrote:
> > On Thu, Oct 27, 2022, Oliver Upton wrote:
> > > There is no real urgency to free a stage-2 subtree that was pruned.
> > > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > > holding the MMU lock.
> > >
>
> [ copy ]
>
> > This is _very_ misleading. The above paints RCU as an optimization of sorts to
> > avoid doing work while holding mmu_lock. Freeing page tables in an RCU callback
> > is _required_ for correctness when allowing parallel page faults to remove page
> > tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
> > accessing and/or holds a reference to the to-be-freed page table.
>
> Agree, but it is still important to reason about what is changing here
> too. Moving work out of the vCPU fault path _is_ valuable, though
> ancillary to the correctness requirements.

Sure, but that's at best a footnote. Similar to protecting freeing, RCU isn't
the only option for moving work out of the vCPU fault path. In fact, it's
probably one of the worst options because RCU callbacks run with soft IRQs
disabled, i.e. doing _too_ much in an RCU callback is a real problem. If RCU
weren't being used to protect readers, deferring freeing via a workqueue,
kthread, etc... would work just as well, if not better.
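To illustrate the alternative Sean alludes to (a sketch under assumed names such as defer_free(), not code from this series): the grace-period wait can be kept for correctness while the heavyweight freeing moves into process context, e.g. by queueing a work item whose handler calls synchronize_rcu() before tearing the pages down.

/* Sketch only: deferring the free to a workqueue instead of an RCU callback. */
#include <linux/workqueue.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct deferred_free {
        struct work_struct work;
        void *table;
};

static void deferred_free_fn(struct work_struct *work)
{
        struct deferred_free *df = container_of(work, struct deferred_free, work);

        synchronize_rcu();      /* still wait out any existing table walkers */
        kfree(df->table);       /* heavyweight teardown now runs in process context */
        kfree(df);
}

static void defer_free(void *table)
{
        struct deferred_free *df = kzalloc(sizeof(*df), GFP_KERNEL);

        if (!df) {
                /* Fall back to a synchronous grace period on allocation failure. */
                synchronize_rcu();
                kfree(table);
                return;
        }

        df->table = table;
        INIT_WORK(&df->work, deferred_free_fn);
        schedule_work(&df->work);
}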
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d1859e8550df..f35e7d4d73f1 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -65,7 +65,7 @@ typedef kvm_pte_t __rcu *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
 {
-        return rcu_dereference_check(pteref, shared);
+        return rcu_dereference_check(pteref, !shared);
 }
 
 static inline void kvm_pgtable_walk_begin(void)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 175a0e9a04b6..a1a623c5b069 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+        struct page *page = container_of(head, struct page, rcu_head);
+        void *pgtable = page_to_virt(page);
+        u32 level = page_private(page);
+
+        kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
+}
+
 static void stage2_free_removed_table(void *addr, u32 level)
 {
-        kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+        struct page *page = virt_to_page(addr);
+
+        set_page_private(page, (unsigned long)level);
+        call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
 }
 
 static void kvm_host_get_page(void *addr)
There is no real urgency to free a stage-2 subtree that was pruned.
Nonetheless, KVM does the tear down in the stage-2 fault path while
holding the MMU lock.

Free removed stage-2 subtrees after an RCU grace period. To guarantee
all stage-2 table pages are freed before killing a VM, add an
rcu_barrier() to the flush path.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 arch/arm64/kvm/mmu.c                 | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)
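For reference, the rcu_barrier() mentioned in the changelog works roughly as sketched below (a hypothetical teardown helper, not the actual flush-path hook, and note the changelog wording is questioned upthread): rcu_barrier() waits for all call_rcu() callbacks queued so far, so every stage2_free_removed_table_rcu_cb() has run before the VM's page-table accounting is torn down.

#include <linux/kvm_host.h>
#include <linux/rcupdate.h>

/* Hypothetical: queues one call_rcu() callback per removed subtree,
 * as stage2_free_removed_table() does in the diff above.
 */
extern void unlink_all_stage2_tables(struct kvm *kvm);

/* Hypothetical teardown helper -- not the actual KVM flush hook. */
static void example_stage2_teardown(struct kvm *kvm)
{
        unlink_all_stage2_tables(kvm);

        /*
         * Wait for every already-queued RCU callback to finish so that no
         * stage-2 table page outlives the VM.
         */
        rcu_barrier();
}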