Message ID: 20210804222844.1419481-1-dmatlack@google.com
Series: Improve gfn-to-memslot performance during page faults
On 05/08/21 00:28, David Matlack wrote:
> This series improves the performance of gfn-to-memslot lookups during
> page faults. Ben Gardon originally identified this performance gap and
> sufficiently addressed it in Google's kernel by reading the memslot once
> at the beginning of the page fault and passing around the pointer.
>
> This series takes an alternative approach by introducing a per-vCPU
> cache of the least recently used memslot index. This avoids needing to
> binary search the existing memslots multiple times during a page fault.
> Unlike passing around the pointer, the cache has an additional benefit
> in that it speeds up gfn-to-memslot lookups *across* faults and during
> spte prefetching where the gfn changes.
>
> This difference can be seen clearly when looking at the performance of
> fast_page_fault when multiple slots are in play:
>
> Metric                        | Baseline     | Pass*    | Cache**
> ----------------------------- | ------------ | -------- | ----------
> Iteration 2 dirty memory time | 2.8s         | 1.6s     | 0.30s
>
> * Pass: Look up the memslot once per fault and pass it around.
> ** Cache: Cache the last used slot per vCPU (i.e. this series).
>
> (Collected via ./dirty_log_perf_test -v64 -x64)
>
> I plan to also send a follow-up series with a version of Ben's patches
> to pass the pointer to the memslot through the page fault handling code
> rather than looking it up multiple times. Even when applied on top of
> the cache series it has some performance improvements by avoiding a few
> extra memory accesses (mainly kvm->memslots[as_id] and
> slots->used_slots). But it will be a judgement call whether or not it's
> worth the code churn and complexity.

Queued, thanks.

Paolo

> v2:
>  * Rename lru to last_used [Paolo]
>  * Tree-wide replace search_memslots with __gfn_to_memslot [Paolo]
>  * Avoid speculation when accessing slots->memslots [Paolo]
>  * Refactor tdp_set_spte_atomic to leverage vcpu->last_used_slot [Paolo]
>  * Add Paolo's Reviewed-by tags
>  * Fix build failures in mmu_audit.c [kernel test robot]
>
> v1: https://lore.kernel.org/kvm/20210730223707.4083785-1-dmatlack@google.com/
>
> David Matlack (7):
>   KVM: Rename lru_slot to last_used_slot
>   KVM: Move last_used_slot logic out of search_memslots
>   KVM: Cache the last used slot index per vCPU
>   KVM: x86/mmu: Leverage vcpu->last_used_slot in
>     tdp_mmu_map_handle_target_level
>   KVM: x86/mmu: Leverage vcpu->last_used_slot for rmap_add and
>     rmap_recycle
>   KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap
>   KVM: selftests: Support multiple slots in dirty_log_perf_test
>
>  arch/powerpc/kvm/book3s_64_vio.c              |  2 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c           |  2 +-
>  arch/s390/kvm/kvm-s390.c                      |  4 +-
>  arch/x86/kvm/mmu/mmu.c                        | 54 +++++++------
>  arch/x86/kvm/mmu/mmu_audit.c                  |  4 +-
>  arch/x86/kvm/mmu/tdp_mmu.c                    | 42 +++++++---
>  include/linux/kvm_host.h                      | 80 +++++++++++++++----
>  .../selftests/kvm/access_tracking_perf_test.c |  2 +-
>  .../selftests/kvm/demand_paging_test.c        |  2 +-
>  .../selftests/kvm/dirty_log_perf_test.c       | 76 +++++++++++++++---
>  .../selftests/kvm/include/perf_test_util.h    |  2 +-
>  .../selftests/kvm/lib/perf_test_util.c        | 20 +++--
>  .../kvm/memslot_modification_stress_test.c    |  2 +-
>  virt/kvm/kvm_main.c                           | 26 +++++-
>  14 files changed, 238 insertions(+), 80 deletions(-)
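To illustrate the caching scheme the cover letter describes, here is a minimal, userspace-style C sketch: try the per-vCPU last-used slot index first, and fall back to a binary search that refreshes the cache on a miss. The structures and names below are simplified stand-ins for KVM's kvm_memory_slot / kvm_memslots / kvm_vcpu, not the kernel code itself; the only assumption carried over is that slots are sorted by base_gfn in descending order, as KVM's memslot array was at the time of this series.

```c
#include <stdio.h>

/* Simplified stand-ins for KVM's kvm_memory_slot / kvm_memslots. */
struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
};

struct memslots {
	struct memslot *slots;	/* sorted by base_gfn, descending */
	int used_slots;
};

struct vcpu {
	int last_used_slot;	/* per-vCPU cache of the last slot index hit */
};

static int slot_contains(const struct memslot *slot, unsigned long gfn)
{
	return gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages;
}

/*
 * Look up the memslot containing @gfn. The cached index turns the common
 * page-fault pattern (repeated lookups of the same or nearby gfns) into
 * an O(1) hit instead of an O(log n) binary search per lookup.
 */
static struct memslot *gfn_to_memslot(struct vcpu *vcpu, struct memslots *ms,
				      unsigned long gfn)
{
	int idx = vcpu->last_used_slot;
	int start = 0, end = ms->used_slots;

	/*
	 * Fast path: the cached slot still covers this gfn. (The actual
	 * kernel patch additionally passes the index through
	 * array_index_nospec() here; see the "Avoid speculation" item
	 * in the v2 notes.)
	 */
	if (idx >= 0 && idx < ms->used_slots &&
	    slot_contains(&ms->slots[idx], gfn))
		return &ms->slots[idx];

	/* Slow path: find the first slot whose base_gfn is <= gfn. */
	while (start < end) {
		int mid = start + (end - start) / 2;

		if (gfn >= ms->slots[mid].base_gfn)
			end = mid;
		else
			start = mid + 1;
	}

	if (start < ms->used_slots && slot_contains(&ms->slots[start], gfn)) {
		vcpu->last_used_slot = start;	/* refresh the cache on a miss */
		return &ms->slots[start];
	}

	return NULL;	/* gfn is not backed by any memslot */
}

int main(void)
{
	struct memslot slots[] = {
		{ .base_gfn = 0x200, .npages = 0x100 },	/* gfns 0x200-0x2ff */
		{ .base_gfn = 0x000, .npages = 0x100 },	/* gfns 0x000-0x0ff */
	};
	struct memslots ms = { .slots = slots, .used_slots = 2 };
	struct vcpu vcpu = { .last_used_slot = -1 };

	/* First lookup misses the cache and binary searches... */
	struct memslot *slot = gfn_to_memslot(&vcpu, &ms, 0x250);
	/* ...subsequent lookups of nearby gfns hit the cached index. */
	slot = gfn_to_memslot(&vcpu, &ms, 0x251);

	printf("base_gfn=0x%lx cached_index=%d\n",
	       slot ? slot->base_gfn : 0, vcpu.last_used_slot);
	return 0;
}
```

Keeping the cache per vCPU means no synchronization is needed, and a stale index is harmless because the cached slot is always re-validated against the gfn before use, with the binary search as a correct fallback.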