From patchwork Mon Apr 1 23:29:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13613116 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9DBB6CD1288 for ; Mon, 1 Apr 2024 23:30:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID: References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=IZgPZYPBdk/WA3b4KYulUuyqln9aXQgD3lkIHizX7Dg=; b=XdbL7xpJOo//YQz5pLff18NnAy xFvoxAz2APalrJpqwm3ik28sXkq8TsCC7DloCIb0n/K08GG3rb+CdSl8XQJq3d33AHQhCcEVHMWiS e1YxUbDRUnSbbqJWwHDrkS2igKd63WqoYvTKrpAvD7vET+TMmoFDcHGZYgA+t/1CksoRN2LZB9BJW 2yN+MO7E4oqwpensGN3pV0ViUXhzNRFQEtosRG6hNE8/ynVGnwh8ZKIkii1KGwhxdmz65vtVZqt8Q 1IfvF1WGelZUoWkxJUvlLk4Qs33euKLONBhGQ2IbaWcpF211polwVQ0aSFLjR1OWq3ysDtrWgA+2F 6v4hOLtA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1rrR6O-000000097NE-3RNx; Mon, 01 Apr 2024 23:30:08 +0000 Received: from mail-ua1-x94a.google.com ([2607:f8b0:4864:20::94a]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1rrR6G-000000097Fc-1Ro6 for linux-arm-kernel@lists.infradead.org; Mon, 01 Apr 2024 23:30:02 +0000 Received: by mail-ua1-x94a.google.com with SMTP id a1e0cc1a2514c-7e331817bb4so1808140241.0 for ; Mon, 01 Apr 2024 16:29:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1712014199; x=1712618999; darn=lists.infradead.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=PTb42yX5D5jCiyjo92iMcOe2nJRX5BH9KgDDQYg5TSM=; b=OQ4CDXTFjnXlcLy4juSLZGjN48RzbhgfgDs7AmtQRJPckA8TCZ+fuTv1z0/+l9eKDM KnTYnCM5+GT9O8F0NUxuav7seAsHmV/vETByKoJChr/i+0VR4pslAVe37T+WfeYEEkhB bNXygMWbDqncs7zF3u/fymrwEObEWnDRrIgYaPHSg6fCc6eCKil1azPYNtCRMp1JxyXa 9/xi8mang65+yw92MURM7l+UEttmH7s6XZ0jBqLUOWYw78cTJDmA0mchuXMKuFMnlesR sS0yjJrOczXW+D5iYVpgbMf8QIIl2lIXTnoqxgR03+NhbjkCD5THapyM7L018MVsY3nL 93kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712014199; x=1712618999; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=PTb42yX5D5jCiyjo92iMcOe2nJRX5BH9KgDDQYg5TSM=; b=wNDEv+U3zYiXUK6xeXyg+OZ7THhhy1MZnpxnNsbc2jU/nk9Qcc72JpB/pT36hjiDOX ZCJ8qWjXde3DsKxkqDVaISmicydvdYEbDGVXcwQ0OC1LPvejPcdKKLSf5NtsxRXMAUOl pmWpsnx+w7IYLCPfScbKdnTQPBVs99Amr2hAco34dTrvUutAFaOtTVeW8jpYd5+j0EBD 2mOTKLvVmVk80f1u4D0EPEu47SSY2ZwPMCBiFfDAKwLCnYjE5jtOa615FfGh1mht7vLb rY8pr+RuBoJ0oRxfrlTfy1VyZYlTrElFrBWT3/P1d8/0N+JoqeRuiwis5VemRze4+1Ag v2rA== X-Forwarded-Encrypted: i=1; AJvYcCU6zqvqjVvKFI1dmZ1xKO2oX1n+/ThlJReBV9R0u89U7mghB4cAiaL/DS+B6K9eCflNPOIFoDf0qwIo2bGnzjlDwNa0LWRzHhxb1XiBYfYNld26H2Y= X-Gm-Message-State: AOJu0Yzdp+lI986HdaJcJNbkA9PXaiRFbHZPXRAX+Vh8Va7Hf2wgDKvb POKzWwnE15fQkG5xy/IOALzFGPFDzFUKFkR/4btVhtZjtCK3I22wd17KvDYWx0gXpvpmoRCFCb1 KwOWDeKzJwfLuU24YMw== X-Google-Smtp-Source: AGHT+IGmJlmVbaC/lfrLJ5ZQ93VOyi/dOsu4Z25niMdu8RAczUBaed8myhkzYvDdfTTf9baVVIW2FHfRFR9HE1rY X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6102:2d0d:b0:470:5ca5:3e95 with SMTP id ih13-20020a0561022d0d00b004705ca53e95mr1009542vsb.2.1712014198891; Mon, 01 Apr 2024 16:29:58 -0700 (PDT) Date: Mon, 1 Apr 2024 23:29:42 +0000 In-Reply-To: <20240401232946.1837665-1-jthoughton@google.com> Mime-Version: 1.0 References: <20240401232946.1837665-1-jthoughton@google.com> X-Mailer: git-send-email 2.44.0.478.gd926399ef9-goog Message-ID: <20240401232946.1837665-4-jthoughton@google.com> Subject: [PATCH v3 3/7] KVM: Add basic bitmap support into kvm_mmu_notifier_test/clear_young From: James Houghton To: Andrew Morton , Paolo Bonzini Cc: Yu Zhao , David Matlack , Marc Zyngier , Oliver Upton , Sean Christopherson , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Shaoqin Huang , Gavin Shan , Ricardo Koller , Raghavendra Rao Ananta , Ryan Roberts , David Rientjes , Axel Rasmussen , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, James Houghton X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240401_163000_490701_C41FEC43 X-CRM114-Status: GOOD ( 25.65 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Add kvm_arch_prepare_bitmap_age() for architectures to indiciate that they support bitmap-based aging in kvm_mmu_notifier_test_clear_young() and that they do not need KVM to grab the MMU lock for writing. This function allows architectures to do other locking or other preparatory work that it needs. If an architecture does not implement kvm_arch_prepare_bitmap_age() or is unable to do bitmap-based aging at runtime (and marks the bitmap as unreliable): 1. If a bitmap was provided, we inform the caller that the bitmap is unreliable (MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE). 2. If a bitmap was not provided, fall back to the old logic. Also add logic for architectures to easily use the provided bitmap if they are able. The expectation is that the architecture's implementation of kvm_gfn_test_age() will use kvm_gfn_record_young(), and kvm_gfn_age() will use kvm_gfn_should_age(). Suggested-by: Yu Zhao Signed-off-by: James Houghton --- include/linux/kvm_host.h | 60 ++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 92 +++++++++++++++++++++++++++++----------- 2 files changed, 127 insertions(+), 25 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1800d03a06a9..5862fd7b5f9b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1992,6 +1992,26 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[]; extern const struct kvm_stats_header kvm_vcpu_stats_header; extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[]; +/* + * Architectures that support using bitmaps for kvm_age_gfn() and + * kvm_test_age_gfn should return true for kvm_arch_prepare_bitmap_age() + * and do any work they need to prepare. The subsequent walk will not + * automatically grab the KVM MMU lock, so some architectures may opt + * to grab it. + * + * If true is returned, a subsequent call to kvm_arch_finish_bitmap_age() is + * guaranteed. + */ +#ifndef kvm_arch_prepare_bitmap_age +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn) +{ + return false; +} +#endif +#ifndef kvm_arch_finish_bitmap_age +static inline void kvm_arch_finish_bitmap_age(struct mmu_notifier *mn) {} +#endif + #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) { @@ -2076,9 +2096,16 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm, return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq; } +struct test_clear_young_metadata { + unsigned long *bitmap; + unsigned long bitmap_offset_end; + unsigned long end; + bool unreliable; +}; union kvm_mmu_notifier_arg { pte_t pte; unsigned long attributes; + struct test_clear_young_metadata *metadata; }; struct kvm_gfn_range { @@ -2087,11 +2114,44 @@ struct kvm_gfn_range { gfn_t end; union kvm_mmu_notifier_arg arg; bool may_block; + bool lockless; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range); + +static inline void kvm_age_set_unreliable(struct kvm_gfn_range *range) +{ + struct test_clear_young_metadata *args = range->arg.metadata; + + args->unreliable = true; +} +static inline unsigned long kvm_young_bitmap_offset(struct kvm_gfn_range *range, + gfn_t gfn) +{ + struct test_clear_young_metadata *args = range->arg.metadata; + + return hva_to_gfn_memslot(args->end - 1, range->slot) - gfn; +} +static inline void kvm_gfn_record_young(struct kvm_gfn_range *range, gfn_t gfn) +{ + struct test_clear_young_metadata *args = range->arg.metadata; + + WARN_ON_ONCE(gfn < range->start || gfn >= range->end); + if (args->bitmap) + __set_bit(kvm_young_bitmap_offset(range, gfn), args->bitmap); +} +static inline bool kvm_gfn_should_age(struct kvm_gfn_range *range, gfn_t gfn) +{ + struct test_clear_young_metadata *args = range->arg.metadata; + + WARN_ON_ONCE(gfn < range->start || gfn >= range->end); + if (args->bitmap) + return test_bit(kvm_young_bitmap_offset(range, gfn), + args->bitmap); + return true; +} #endif #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d0545d88c802..7d80321e2ece 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -550,6 +550,7 @@ struct kvm_mmu_notifier_range { on_lock_fn_t on_lock; bool flush_on_ret; bool may_block; + bool lockless; }; /* @@ -598,6 +599,8 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, struct kvm_memslots *slots; int i, idx; + BUILD_BUG_ON(sizeof(gfn_range.arg) != sizeof(gfn_range.arg.pte)); + if (WARN_ON_ONCE(range->end <= range->start)) return r; @@ -637,15 +640,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, gfn_range.start = hva_to_gfn_memslot(hva_start, slot); gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot = slot; + gfn_range.lockless = range->lockless; if (!r.found_memslot) { r.found_memslot = true; - KVM_MMU_LOCK(kvm); - if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm); - - if (IS_KVM_NULL_FN(range->handler)) - break; + if (!range->lockless) { + KVM_MMU_LOCK(kvm); + if (!IS_KVM_NULL_FN(range->on_lock)) + range->on_lock(kvm); + + if (IS_KVM_NULL_FN(range->handler)) + break; + } } r.ret |= range->handler(kvm, &gfn_range); } @@ -654,7 +660,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, if (range->flush_on_ret && r.ret) kvm_flush_remote_tlbs(kvm); - if (r.found_memslot) + if (r.found_memslot && !range->lockless) KVM_MMU_UNLOCK(kvm); srcu_read_unlock(&kvm->srcu, idx); @@ -682,19 +688,24 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, return __kvm_handle_hva_range(kvm, &range).ret; } -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler) +static __always_inline int kvm_handle_hva_range_no_flush( + struct mmu_notifier *mn, + unsigned long start, + unsigned long end, + gfn_handler_t handler, + union kvm_mmu_notifier_arg arg, + bool lockless) { struct kvm *kvm = mmu_notifier_to_kvm(mn); const struct kvm_mmu_notifier_range range = { .start = start, .end = end, .handler = handler, + .arg = arg, .on_lock = (void *)kvm_null_fn, .flush_on_ret = false, .may_block = false, + .lockless = lockless, }; return __kvm_handle_hva_range(kvm, &range).ret; @@ -909,15 +920,36 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, kvm_age_gfn); } -static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long start, - unsigned long end, - unsigned long *bitmap) +static int kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end, + unsigned long *bitmap, + bool clear) { - trace_kvm_age_hva(start, end); + if (kvm_arch_prepare_bitmap_age(mn)) { + struct test_clear_young_metadata args = { + .bitmap = bitmap, + .end = end, + .unreliable = false, + }; + union kvm_mmu_notifier_arg arg = { + .metadata = &args + }; + bool young; + + young = kvm_handle_hva_range_no_flush( + mn, start, end, + clear ? kvm_age_gfn : kvm_test_age_gfn, + arg, true); + + kvm_arch_finish_bitmap_age(mn); - /* We don't support bitmaps. Don't test or clear anything. */ + if (!args.unreliable) + return young ? MMU_NOTIFIER_YOUNG_FAST : 0; + } + + /* A bitmap was passed but the architecture doesn't support bitmaps */ if (bitmap) return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE; @@ -934,7 +966,21 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * cadence. If we find this inaccurate, we might come up with a * more sophisticated heuristic later. */ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); + return kvm_handle_hva_range_no_flush( + mn, start, end, clear ? kvm_age_gfn : kvm_test_age_gfn, + KVM_MMU_NOTIFIER_NO_ARG, false); +} + +static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end, + unsigned long *bitmap) +{ + trace_kvm_age_hva(start, end); + + return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap, + true); } static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, @@ -945,12 +991,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, { trace_kvm_test_age_hva(start, end); - /* We don't support bitmaps. Don't test or clear anything. */ - if (bitmap) - return MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE; - - return kvm_handle_hva_range_no_flush(mn, start, end, - kvm_test_age_gfn); + return kvm_mmu_notifier_test_clear_young(mn, mm, start, end, bitmap, + false); } static void kvm_mmu_notifier_release(struct mmu_notifier *mn,