From patchwork Fri Aug 25 09:35:21 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 1/8] arm64: cpufeature: Add API to report system support of HWDBM
Date: Fri, 25 Aug 2023 10:35:21 +0100
Message-ID: <20230825093528.1637-2-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

Though we already have a CPU capability named ARM64_HW_DBM, it is a LOCAL_CPU
cap and is conditionally compiled under CONFIG_ARM64_HW_AFDBM. Add a helper
that reports system-wide support for HW_DBM.
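A minimal usage sketch (not part of this patch; kvm_enable_hwdbm() is a
made-up placeholder): callers are expected to simply gate hardware dirty-bit
management on the new helper:

	if (system_supports_hw_dbm()) {
		/* every CPU advertises FEAT_HAFDBS dirty state management */
		kvm_enable_hwdbm();	/* hypothetical consumer */
	}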
Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/cpufeature.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 96e50227f940..edb04e45e030 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -733,6 +733,21 @@ static inline bool system_supports_mixed_endian(void)
 	return val == 0x1;
 }
 
+static inline bool system_supports_hw_dbm(void)
+{
+	u64 mmfr1;
+	u32 val;
+
+	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+		return false;
+
+	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+	val = cpuid_feature_extract_unsigned_field(mmfr1,
+						   ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
+
+	return val == ID_AA64MMFR1_EL1_HAFDBS_DBM;
+}
+
 static __always_inline bool system_supports_fpsimd(void)
 {
 	return !cpus_have_const_cap(ARM64_HAS_NO_FPSIMD);

From patchwork Fri Aug 25 09:35:22 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 2/8] KVM: arm64: Add KVM_PGTABLE_WALK_HW_DBM for HW DBM support
Date: Fri, 25 Aug 2023 10:35:22 +0100
Message-ID: <20230825093528.1637-3-shameerali.kolothum.thodi@huawei.com>

Add KVM_PGTABLE_WALK_HW_DBM, which indicates that a page table walk is for
HW DBM related updates, and apply HW DBM bit updates to last-level entries
only. No functional changes here; this will be used by a later commit that
adds HW DBM support.
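For illustration, a walker that should only touch HW DBM state is invoked
with the new flag; a later patch in this series does exactly this when
setting the DBM bit (sketch):

	ret = stage2_update_leaf_attrs(pgt, addr, size,
				       KVM_PTE_LEAF_ATTR_HI_S2_DBM, 0,
				       NULL, NULL,
				       KVM_PGTABLE_WALK_HW_DBM |
				       KVM_PGTABLE_WALK_SHARED);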
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_pgtable.h |  3 +++
 arch/arm64/kvm/hyp/pgtable.c         | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d3e354bb8351..3f96bdd2086f 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -219,6 +219,8 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  * @KVM_PGTABLE_WALK_SKIP_CMO:		Visit and update table entries
  *					without Cache maintenance
  *					operations required.
+ * @KVM_PGTABLE_WALK_HW_DBM:		Indicates that the attribute update is
+ *					HW DBM related.
  */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
@@ -228,6 +230,7 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_HANDLE_FAULT		= BIT(4),
 	KVM_PGTABLE_WALK_SKIP_BBM_TLBI		= BIT(5),
 	KVM_PGTABLE_WALK_SKIP_CMO		= BIT(6),
+	KVM_PGTABLE_WALK_HW_DBM			= BIT(7),
 };
 
 struct kvm_pgtable_visit_ctx {
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f155b8c9e98c..1e65b8c97059 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -67,6 +67,11 @@ struct kvm_pgtable_walk_data {
 	const u64 end;
 };
 
+static bool kvm_pgtable_walk_hw_dbm(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	return ctx->flags & KVM_PGTABLE_WALK_HW_DBM;
+}
+
 static bool kvm_pgtable_walk_skip_bbm_tlbi(const struct kvm_pgtable_visit_ctx *ctx)
 {
 	return unlikely(ctx->flags & KVM_PGTABLE_WALK_SKIP_BBM_TLBI);
@@ -1164,6 +1169,11 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!kvm_pte_valid(ctx->old))
 		return -EAGAIN;
 
+	/* Only apply HW DBM updates to the last level */
+	if (kvm_pgtable_walk_hw_dbm(ctx) &&
+	    ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
+		return 0;
+
 	data->level = ctx->level;
 	data->pte = pte;
 	pte &= ~data->attr_clr;

From patchwork Fri Aug 25 09:35:23 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 3/8] KVM: arm64: Add some HW_DBM related pgtable interfaces
Date: Fri, 25 Aug 2023 10:35:23 +0100
Message-ID: <20230825093528.1637-4-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

This adds set_dbm, clear_dbm and sync_dirty interfaces to the pgtable layer.

(1) set_dbm: Set the DBM bit in the last-level PTEs of a specified range.
    The TLB is invalidated afterwards.
(2) clear_dbm: Clear the DBM bit in the last-level PTEs of a specified range.
    No TLB invalidation is performed.
(3) sync_dirty: Scan the last-level PTEs of a specified range and log a page
    as dirty if its PTE is writeable.

In addition, save the dirty state of a PTE if it is invalidated by map or
unmap.

Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_pgtable.h | 45 +++++++++++++
 arch/arm64/kernel/image-vars.h       |  2 +
 arch/arm64/kvm/hyp/pgtable.c         | 98 ++++++++++++++++++++++++++++
 3 files changed, 145 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3f96bdd2086f..a12add002b89 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -578,6 +578,51 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
  */
 int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 
+/**
+ * kvm_pgtable_stage2_clear_dbm() - Clear DBM of guest stage-2 address range
+ *				    without TLB invalidation (last level only).
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to clear DBM.
+ * @size:	Size of the range.
+ *
+ * The offset of @addr within a page is ignored and @size is rounded-up to
+ * the next page boundary.
+ *
+ * Note that it is the caller's responsibility to invalidate the TLB after
+ * calling this function, to ensure that the disabled HW dirty state
+ * management is visible to the CPUs.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_clear_dbm(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
+/**
+ * kvm_pgtable_stage2_set_dbm() - Set DBM of guest stage-2 address range to
+ *				  enable HW dirty state management (last level only).
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to set DBM.
+ * @size:	Size of the range.
+ *
+ * The offset of @addr within a page is ignored and @size is rounded-up to
+ * the next page boundary.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_set_dbm(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
+/**
+ * kvm_pgtable_stage2_sync_dirty() - Sync the HW dirty state into the memslot.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to sync.
+ * @size:	Size of the range.
+ *
+ * The offset of @addr within a page is ignored and @size is rounded-up to
+ * the next page boundary.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_sync_dirty(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
 /**
  * kvm_pgtable_stage2_wrprotect() - Write-protect guest stage-2 address range
  *				    without TLB invalidation.
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 35f3c7959513..2ca600e3d637 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -68,6 +68,8 @@ KVM_NVHE_ALIAS(__hyp_stub_vectors);
 KVM_NVHE_ALIAS(vgic_v2_cpuif_trap);
 KVM_NVHE_ALIAS(vgic_v3_cpuif_trap);
 
+KVM_NVHE_ALIAS(mark_page_dirty);
+
 #ifdef CONFIG_ARM64_PSEUDO_NMI
 /* Static key checked in GIC_PRIO_IRQOFF. */
 KVM_NVHE_ALIAS(gic_nonsecure_priorities);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 1e65b8c97059..d7a46a00a7f6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -9,6 +9,7 @@
 
 #include
 #include
+#include
 #include
@@ -42,6 +43,7 @@
 
 #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
 
+#define KVM_PTE_LEAF_ATTR_HI_S2_DBM	BIT(51)
 #define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
 
 #define KVM_PTE_LEAF_ATTR_HI_S1_GP	BIT(50)
@@ -764,8 +766,44 @@ static bool stage2_pte_is_locked(kvm_pte_t pte)
 	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
 }
 
+static bool stage2_pte_writeable(kvm_pte_t pte)
+{
+	return pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
+}
+
+static void kvm_update_hw_dbm(const struct kvm_pgtable_visit_ctx *ctx,
+			      kvm_pte_t new)
+{
+	kvm_pte_t old_pte, pte = ctx->old;
+
+	/* Only set DBM if page is writeable */
+	if ((new & KVM_PTE_LEAF_ATTR_HI_S2_DBM) && !stage2_pte_writeable(pte))
+		return;
+
+	/* Clear DBM walk is not shared, update */
+	if (!kvm_pgtable_walk_shared(ctx)) {
+		WRITE_ONCE(*ctx->ptep, new);
+		return;
+	}
+
+	do {
+		old_pte = pte;
+		pte = new;
+
+		if (old_pte == pte)
+			break;
+
+		pte = cmpxchg_relaxed(ctx->ptep, old_pte, pte);
+	} while (pte != old_pte);
+}
+
 static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
 {
+	if (kvm_pgtable_walk_hw_dbm(ctx)) {
+		kvm_update_hw_dbm(ctx, new);
+		return true;
+	}
+
 	if (!kvm_pgtable_walk_shared(ctx)) {
 		WRITE_ONCE(*ctx->ptep, new);
 		return true;
@@ -952,6 +990,11 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	    stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
+	/* Save the possible hardware dirty info */
+	if ((ctx->level == KVM_PGTABLE_MAX_LEVELS - 1) &&
+	    stage2_pte_writeable(ctx->old))
+		mark_page_dirty(kvm_s2_mmu_to_kvm(pgt->mmu), ctx->addr >> PAGE_SHIFT);
+
 	stage2_make_pte(ctx, new);
 
 	return 0;
@@ -1125,6 +1168,11 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	 */
 	stage2_unmap_put_pte(ctx, mmu, mm_ops);
 
+	/* Save the possible hardware dirty info */
+	if ((ctx->level == KVM_PGTABLE_MAX_LEVELS - 1) &&
+	    stage2_pte_writeable(ctx->old))
+		mark_page_dirty(kvm_s2_mmu_to_kvm(mmu), ctx->addr >> PAGE_SHIFT);
+
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
 					       kvm_granule_size(ctx->level));
@@ -1230,6 +1278,30 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	return 0;
 }
 
+int kvm_pgtable_stage2_set_dbm(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	int ret;
+	u64 offset;
+
+	ret = stage2_update_leaf_attrs(pgt, addr, size, KVM_PTE_LEAF_ATTR_HI_S2_DBM, 0,
+				       NULL, NULL, KVM_PGTABLE_WALK_HW_DBM |
+				       KVM_PGTABLE_WALK_SHARED);
+	if (!ret)
+		return ret;
+
+	for (offset = 0; offset < size; offset += PAGE_SIZE)
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, pgt->mmu, addr + offset, 3);
+
+	return 0;
+}
+
+int kvm_pgtable_stage2_clear_dbm(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	return stage2_update_leaf_attrs(pgt, addr, size,
+					0, KVM_PTE_LEAF_ATTR_HI_S2_DBM,
+					NULL, NULL, KVM_PGTABLE_WALK_HW_DBM);
+}
+
 int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
@@ -1329,6 +1401,32 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
+static int stage2_sync_dirty_walker(const struct kvm_pgtable_visit_ctx *ctx,
+				    enum kvm_pgtable_walk_flags visit)
+{
+	kvm_pte_t pte = READ_ONCE(*ctx->ptep);
+	struct kvm *kvm = ctx->arg;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	if ((ctx->level == KVM_PGTABLE_MAX_LEVELS - 1) && stage2_pte_writeable(pte))
+		mark_page_dirty(kvm, ctx->addr >> PAGE_SHIFT);
+
+	return 0;
+}
+
+int kvm_pgtable_stage2_sync_dirty(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_sync_dirty_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg	= kvm_s2_mmu_to_kvm(pgt->mmu),
+	};
+
+	return kvm_pgtable_walk(pgt, addr, size, &walker);
+}
+
 static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			       enum kvm_pgtable_walk_flags visit)
 {

From patchwork Fri Aug 25 09:35:24 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 4/8] KVM: arm64: Set DBM for previously writeable pages
Date: Fri, 25 Aug 2023 10:35:24 +0100
Message-ID: <20230825093528.1637-5-shameerali.kolothum.thodi@huawei.com>

We only set DBM if the page is writeable (S2AP[1] == 1). But once migration
starts, the CLEAR_LOG path write-protects the pages (S2AP[1] = 0), and there
is no easy way to tell the writeable pages that were write-protected apart
from genuinely read-only pages, since S2AP[1] is the only bit we can check.

Introduce a ctx->flag, KVM_PGTABLE_WALK_WC_HINT, to identify write-protect
page table walks done for dirty page tracking, and use one of the "Reserved
for software use" bits in the page descriptor to mark a page as
"writeable-clean".

Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_pgtable.h |  5 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 25 ++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a12add002b89..67bcbc5984f9 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -190,6 +190,8 @@ enum kvm_pgtable_prot {
 #define KVM_PGTABLE_PROT_RW	(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
 #define KVM_PGTABLE_PROT_RWX	(KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
 
+#define KVM_PGTABLE_PROT_WC	KVM_PGTABLE_PROT_SW0	/* writeable-clean */
+
 #define PKVM_HOST_MEM_PROT	KVM_PGTABLE_PROT_RWX
 #define PKVM_HOST_MMIO_PROT	KVM_PGTABLE_PROT_RW
 
@@ -221,6 +223,8 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
 *					operations required.
 * @KVM_PGTABLE_WALK_HW_DBM:		Indicates that the attribute update is
 *					HW DBM related.
+ * @KVM_PGTABLE_WALK_WC_HINT:		Mark the page writeable-clean (software
+ *					attribute) when write-protecting a writeable page.
 */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
@@ -231,6 +235,7 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_HANDLE_FAULT		= BIT(4),
 	KVM_PGTABLE_WALK_SKIP_BBM_TLBI		= BIT(5),
 	KVM_PGTABLE_WALK_SKIP_CMO		= BIT(6),
 	KVM_PGTABLE_WALK_HW_DBM			= BIT(7),
+	KVM_PGTABLE_WALK_WC_HINT		= BIT(8),
 };
 
 struct kvm_pgtable_visit_ctx {
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d7a46a00a7f6..4552bfb1f274 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -69,6 +69,11 @@ struct kvm_pgtable_walk_data {
 	const u64 end;
 };
 
+static bool kvm_pgtable_walk_wc_hint(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	return ctx->flags & KVM_PGTABLE_WALK_WC_HINT;
+}
+
 static bool kvm_pgtable_walk_hw_dbm(const struct kvm_pgtable_visit_ctx *ctx)
 {
 	return ctx->flags & KVM_PGTABLE_WALK_HW_DBM;
@@ -771,13 +776,24 @@ static bool stage2_pte_writeable(kvm_pte_t pte)
 {
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
 }
 
+static bool stage2_pte_is_write_clean(kvm_pte_t pte)
+{
+	return kvm_pte_valid(pte) && (pte & KVM_PGTABLE_PROT_WC);
+}
+
+static bool stage2_pte_can_be_write_clean(const struct kvm_pgtable_visit_ctx *ctx,
+					  kvm_pte_t new)
+{
+	return (stage2_pte_writeable(ctx->old) && !stage2_pte_writeable(new));
+}
+
 static void kvm_update_hw_dbm(const struct kvm_pgtable_visit_ctx *ctx,
 			      kvm_pte_t new)
 {
 	kvm_pte_t old_pte, pte = ctx->old;
 
-	/* Only set DBM if page is writeable */
-	if ((new & KVM_PTE_LEAF_ATTR_HI_S2_DBM) && !stage2_pte_writeable(pte))
+	/* Only set DBM if page is writeable-clean */
+	if ((new & KVM_PTE_LEAF_ATTR_HI_S2_DBM) && !stage2_pte_is_write_clean(pte))
 		return;
 
 	/* Clear DBM walk is not shared, update */
@@ -805,6 +821,9 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
 	}
 
 	if (!kvm_pgtable_walk_shared(ctx)) {
+		if (kvm_pgtable_walk_wc_hint(ctx) &&
+		    stage2_pte_can_be_write_clean(ctx, new))
+			new |= KVM_PGTABLE_PROT_WC;
 		WRITE_ONCE(*ctx->ptep, new);
 		return true;
 	}
@@ -1306,7 +1325,7 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL, 0);
+					NULL, NULL, KVM_PGTABLE_WALK_WC_HINT);
 }
 
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
From patchwork Fri Aug 25 09:35:25 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 5/8] KVM: arm64: Add some HW_DBM related mmu interfaces
Date: Fri, 25 Aug 2023 10:35:25 +0100
Message-ID: <20230825093528.1637-6-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

This adds set_dbm, clear_dbm and sync_dirty interfaces to the mmu layer.
They simply wrap the corresponding pgtable-layer interfaces.
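Each wrapper converts a (memslot, gfn_offset, npages) range into an IPA range
and hands it to the pgtable layer. A rough usage sketch, mirroring what the
final patch in this series does for one 64-page granule:

	start_page = dbm_idx << HWDBM_GRANULE_SHIFT;
	npages = 1 << HWDBM_GRANULE_SHIFT;
	npages = min(memslot->npages - start_page, npages);
	kvm_stage2_set_dbm(kvm, memslot, start_page, npages);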
Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_mmu.h |  7 +++++++
 arch/arm64/kvm/mmu.c             | 30 ++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0e1e1ab17b4d..86e1e074337b 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -170,6 +170,13 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void __init free_hyp_pgds(void);
 
+void kvm_stage2_clear_dbm(struct kvm *kvm, struct kvm_memory_slot *slot,
+			  gfn_t gfn_offset, unsigned long npages);
+void kvm_stage2_set_dbm(struct kvm *kvm, struct kvm_memory_slot *slot,
+			gfn_t gfn_offset, unsigned long npages);
+void kvm_stage2_sync_dirty(struct kvm *kvm, struct kvm_memory_slot *slot,
+			   gfn_t gfn_offset, unsigned long npages);
+
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_uninit_stage2_mmu(struct kvm *kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b16aff3f65f6..f5ae4b97df4d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1149,6 +1149,36 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_split_huge_pages(kvm, start, end);
 }
 
+void kvm_stage2_clear_dbm(struct kvm *kvm, struct kvm_memory_slot *slot,
+			  gfn_t gfn_offset, unsigned long npages)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t addr = base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + npages) << PAGE_SHIFT;
+
+	stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_clear_dbm);
+}
+
+void kvm_stage2_set_dbm(struct kvm *kvm, struct kvm_memory_slot *slot,
+			gfn_t gfn_offset, unsigned long npages)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t addr = base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + npages) << PAGE_SHIFT;
+
+	stage2_apply_range(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_set_dbm, false);
+}
+
+void kvm_stage2_sync_dirty(struct kvm *kvm, struct kvm_memory_slot *slot,
+			   gfn_t gfn_offset, unsigned long npages)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t addr = base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + npages) << PAGE_SHIFT;
+
+	stage2_apply_range(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_sync_dirty, false);
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);

From patchwork Fri Aug 25 09:35:26 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 6/8] KVM: arm64: Only write protect selected PTE
Date: Fri, 25 Aug 2023 10:35:26 +0100
Message-ID: <20230825093528.1637-7-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

kvm_arch_mmu_enable_log_dirty_pt_masked() write-protects all PTEs between the
ffs and fls of the mask, even though bits within that range may be unset.
That works fine under a pure software dirty log, as the software dirty log is
not active during this process, but it unexpectedly clears the dirty status
of PTEs when the hardware dirty log is enabled. Change it to write-protect
only the selected PTEs.

Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/kvm/mmu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5ae4b97df4d..34251932560e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1132,10 +1132,17 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
 	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+	int rs, re;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	for_each_set_bitrange(rs, re, &mask, BITS_PER_LONG) {
+		phys_addr_t addr_s, addr_e;
+
+		addr_s = (base_gfn + rs) << PAGE_SHIFT;
+		addr_e = (base_gfn + re) << PAGE_SHIFT;
+		stage2_wp_range(&kvm->arch.mmu, addr_s, addr_e);
+	}
 
 	/*
 	 * Eager-splitting is done when manual-protect is set. We
From patchwork Fri Aug 25 09:35:27 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 7/8] KVM: arm64: Add KVM_CAP_ARM_HW_DBM
Date: Fri, 25 Aug 2023 10:35:27 +0100
Message-ID: <20230825093528.1637-8-shameerali.kolothum.thodi@huawei.com>

Add a capability for userspace to enable hardware DBM support for live
migration.

ToDo: Update the documentation.
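Until the documentation is updated, the expected userspace flow is the usual
capability check/enable sequence. A minimal sketch (assuming <linux/kvm.h>,
<sys/ioctl.h> and an already-created VM file descriptor vm_fd):

	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_HW_DBM };

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_HW_DBM) > 0 &&
	    !ioctl(vm_fd, KVM_ENABLE_CAP, &cap)) {
		/* HW DBM assisted dirty logging will be used for this VM */
	}

If the system does not support HW DBM, KVM_ENABLE_CAP fails with EINVAL.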
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/arm.c              | 13 +++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f623b989ddd1..17ac53150a1d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ struct kvm_s2_mmu {
 	struct kvm_mmu_memory_cache split_page_cache;
 	uint64_t split_page_chunk_size;
 
+	bool hwdbm_enabled;	/* KVM_CAP_ARM_HW_DBM enabled */
+
 	struct kvm_arch *arch;
 };
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fd2af63d788d..0dbf2cda40d7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -115,6 +115,16 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->slots_lock);
 		break;
+	case KVM_CAP_ARM_HW_DBM:
+		mutex_lock(&kvm->slots_lock);
+		if (!system_supports_hw_dbm()) {
+			r = -EINVAL;
+		} else {
+			r = 0;
+			kvm->arch.mmu.hwdbm_enabled = true;
+		}
+		mutex_unlock(&kvm->slots_lock);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -316,6 +326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES:
 		r = kvm_supported_block_sizes();
 		break;
+	case KVM_CAP_ARM_HW_DBM:
+		r = system_supports_hw_dbm();
+		break;
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f089ab290978..99bd5c0420ba 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1192,6 +1192,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_ARM_HW_DBM 230
 
 #ifdef KVM_CAP_IRQ_ROUTING

From patchwork Fri Aug 25 09:35:28 2023
From: Shameer Kolothum
Subject: [RFC PATCH v2 8/8] KVM: arm64: Start up SW/HW combined dirty log
Date: Fri, 25 Aug 2023 10:35:28 +0100
Message-ID: <20230825093528.1637-9-shameerali.kolothum.thodi@huawei.com>

From: Keqian Zhu

When a user has enabled HW DBM support for live migration, set the HW DBM bit
for the nearby pages (64 pages) of a write-faulting page. We track the
DBM-set pages in a separate bitmap and use that during dirty-log sync,
avoiding a full scan of the PTEs.

Signed-off-by: Keqian Zhu
Signed-off-by: Shameer Kolothum
---
 arch/arm64/include/asm/kvm_host.h |   6 ++
 arch/arm64/kvm/arm.c              | 125 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c      |  10 +--
 arch/arm64/kvm/mmu.c              |  11 ++-
 4 files changed, 144 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 17ac53150a1d..5f0be57eebc4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -181,6 +181,8 @@ struct kvm_s2_mmu {
 };
 
 struct kvm_arch_memory_slot {
+	#define HWDBM_GRANULE_SHIFT	6	/* 64 pages per bit */
+	unsigned long *hwdbm_bitmap;
 };
 
 /**
@@ -901,6 +903,10 @@ struct kvm_vcpu_stat {
 	u64 exits;
 };
 
+int kvm_arm_init_hwdbm_bitmap(struct kvm *kvm, struct kvm_memory_slot *memslot);
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot);
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn);
+
 void kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0dbf2cda40d7..ab1e2da3bf0d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1540,9 +1540,134 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	return r;
 }
 
+static unsigned long kvm_hwdbm_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+	unsigned long nbits = DIV_ROUND_UP(memslot->npages, 1 << HWDBM_GRANULE_SHIFT);
+
+	return ALIGN(nbits, BITS_PER_LONG) / 8;
+}
+
+static unsigned long *kvm_second_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long len = kvm_hwdbm_bitmap_bytes(memslot);
+
+	return (void *)memslot->arch.hwdbm_bitmap + len;
+}
+
+/*
+ * Allocate twice the space. Refer to kvm_arch_sync_dirty_log() to see why the
+ * second half is needed.
+ */
+int kvm_arm_init_hwdbm_bitmap(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+	unsigned long bytes = 2 * kvm_hwdbm_bitmap_bytes(memslot);
+
+	if (!kvm->arch.mmu.hwdbm_enabled)
+		return 0;
+
+	if (memslot->arch.hwdbm_bitmap) {
+		/* Inherited from the old memslot */
+		bitmap_clear(memslot->arch.hwdbm_bitmap, 0, bytes * 8);
+	} else {
+		memslot->arch.hwdbm_bitmap = kvzalloc(bytes, GFP_KERNEL_ACCOUNT);
+		if (!memslot->arch.hwdbm_bitmap)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+	if (!memslot->arch.hwdbm_bitmap)
+		return;
+
+	kvfree(memslot->arch.hwdbm_bitmap);
+	memslot->arch.hwdbm_bitmap = NULL;
+}
+
+/* Set DBM for nearby pagetables, but do not cross the memslot boundary */
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *memslot;
+
+	memslot = gfn_to_memslot(kvm, gfn);
+	if (memslot && kvm_slot_dirty_track_enabled(memslot) &&
+	    memslot->arch.hwdbm_bitmap) {
+		unsigned long rel_gfn = gfn - memslot->base_gfn;
+		unsigned long dbm_idx = rel_gfn >> HWDBM_GRANULE_SHIFT;
+		unsigned long start_page, npages;
+
+		if (!test_and_set_bit(dbm_idx, memslot->arch.hwdbm_bitmap)) {
+			start_page = dbm_idx << HWDBM_GRANULE_SHIFT;
+			npages = 1 << HWDBM_GRANULE_SHIFT;
+			npages = min(memslot->npages - start_page, npages);
+			kvm_stage2_set_dbm(kvm, memslot, start_page, npages);
+		}
+	}
+}
+
+/*
+ * We have to find a place to clear hwdbm_bitmap, and clearing hwdbm_bitmap
+ * means clearing the DBM bit of all related pgtables. Note that between
+ * clearing the DBM bit and flushing the TLB, a HW dirty log may still occur,
+ * so we must scan all related pgtables after the TLB flush. Given the above,
+ * the best choice is to clear hwdbm_bitmap before syncing the HW dirty log.
+ */
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
+	unsigned long *second_bitmap = kvm_second_hwdbm_bitmap(memslot);
+	unsigned long start_page, npages;
+	unsigned int end, rs, re;
+	bool has_hwdbm = false;
+
+	if (!memslot->arch.hwdbm_bitmap)
+		return;
+
+	end = kvm_hwdbm_bitmap_bytes(memslot) * 8;
+	bitmap_clear(second_bitmap, 0, end);
+
+	write_lock(&kvm->mmu_lock);
+	for_each_set_bitrange(rs, re, memslot->arch.hwdbm_bitmap, end) {
+		has_hwdbm = true;
+		/*
+		 * Must clear the bitmap before clearing the DBM bits. While we
+		 * clear DBM (which releases the mmu spinlock periodically), SW
+		 * dirty tracking has a chance to add DBM which overlaps what we
+		 * are clearing. So if we cleared the bitmap after clearing DBM,
+		 * we could end up with the bitmap cleared but DBM bits left
+		 * behind, and we would never scan those pgtables again.
+		 */
+		bitmap_clear(memslot->arch.hwdbm_bitmap, rs, re - rs);
+
+		/* Record which bits were cleared */
+		bitmap_set(second_bitmap, rs, re - rs);
+
+		start_page = rs << HWDBM_GRANULE_SHIFT;
+		npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+		npages = min(memslot->npages - start_page, npages);
+		kvm_stage2_clear_dbm(kvm, memslot, start_page, npages);
+	}
+	write_unlock(&kvm->mmu_lock);
+
+	if (!has_hwdbm)
+		return;
+
+	/*
+	 * Ensure vcpu writes that occur after we cleared hwdbm_bitmap can be
+	 * caught by the guest memory abort handler.
+	 */
+	kvm_flush_remote_tlbs_memslot(kvm, memslot);
+
+	read_lock(&kvm->mmu_lock);
+	for_each_set_bitrange(rs, re, second_bitmap, end) {
+		start_page = rs << HWDBM_GRANULE_SHIFT;
+		npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+		npages = min(memslot->npages - start_page, npages);
+		kvm_stage2_sync_dirty(kvm, memslot, start_page, npages);
+	}
+	read_unlock(&kvm->mmu_lock);
 }
 
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4552bfb1f274..330912d647c7 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -651,10 +651,10 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 
 #ifdef CONFIG_ARM64_HW_AFDBM
 	/*
-	 * Enable the Hardware Access Flag management, unconditionally
-	 * on all CPUs. In systems that have asymmetric support for the feature
-	 * this allows KVM to leverage hardware support on the subset of cores
-	 * that implement the feature.
+	 * Enable the Hardware Access Flag management and Dirty State management,
+	 * unconditionally on all CPUs. In systems that have asymmetric support for
+	 * the feature this allows KVM to leverage hardware support on the subset of
+	 * cores that implement the feature.
 	 *
 	 * The architecture requires VTCR_EL2.HA to be RES0 (thus ignored by
 	 * hardware) on implementations that do not advertise support for the
@@ -663,7 +663,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	 * HAFDBS. Here be dragons.
 	 */
 	if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
-		vtcr |= VTCR_EL2_HA;
+		vtcr |= VTCR_EL2_HA | VTCR_EL2_HD;
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
 	/* Set the vmid bits */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 34251932560e..b2fdcd762d70 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1569,14 +1569,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
+	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule) {
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-	else
+		/* Try to enable HW DBM for nearby pages */
+		if (!ret && vma_pagesize == PAGE_SIZE && writable)
+			kvm_arm_enable_nearby_hwdbm(kvm, gfn);
+	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache,
 					     KVM_PGTABLE_WALK_HANDLE_FAULT |
 					     KVM_PGTABLE_WALK_SHARED);
+	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
@@ -2046,11 +2050,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	} while (hva < reg_end);
 	mmap_read_unlock(current->mm);
 
-	return ret;
+	return ret ? : kvm_arm_init_hwdbm_bitmap(kvm, new);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+	kvm_arm_destroy_hwdbm_bitmap(slot);
 }
 
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
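With HWDBM_GRANULE_SHIFT == 6, each hwdbm_bitmap bit covers 64 pages (256 KiB
with 4 KiB pages). A rough worked example of the sizing done by
kvm_hwdbm_bitmap_bytes(), assuming a 16 GiB memslot with 4 KiB pages:

	npages = 16 GiB / 4 KiB            = 4194304
	nbits  = DIV_ROUND_UP(4194304, 64) = 65536
	bytes  = ALIGN(65536, 64) / 8      = 8192 per bitmap, i.e. 16 KiB for both halves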