From patchwork Tue Jan 26 12:44:44 2021
X-Patchwork-Submitter: zhukeqian
X-Patchwork-Id: 12046369
From: Keqian Zhu <zhukeqian1@huawei.com>
To: Marc Zyngier, Will Deacon, Catalin Marinas
Subject: [RFC PATCH 7/7] kvm: arm64: Start up SW/HW combined dirty log
Date: Tue, 26 Jan 2021 20:44:44 +0800
Message-ID: <20210126124444.27136-8-zhukeqian1@huawei.com>
In-Reply-To: <20210126124444.27136-1-zhukeqian1@huawei.com>
References: <20210126124444.27136-1-zhukeqian1@huawei.com>
Cc: Mark Rutland, yubihong@huawei.com, jiangkunkun@huawei.com,
 Suzuki K Poulose, Cornelia Huck, Kirti Wankhede, xiexiangyou@huawei.com,
 zhengchuan@huawei.com, Alex Williamson, James Morse,
 wanghaibin.wang@huawei.com, Robin Murphy

We do not enable hardware dirty tracking at start (the DBM bit is not
set). When a fault occurs on an arbitrary PT, we perform software
tracking for that PT and enable hardware tracking for its nearby PTs
(the DBM bit is set for the nearby 64 PTs). Then, when syncing the
dirty log, we already know which PTs have hardware dirty tracking
enabled, so we do not need to scan all PTs.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |   6 ++
 arch/arm64/kvm/arm.c              | 125 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c              |   7 +-
 arch/arm64/kvm/reset.c            |   8 +-
 4 files changed, 141 insertions(+), 5 deletions(-)
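[Editor's note] To make the granule arithmetic in kvm_arm_enable_nearby_hwdbm()
easier to follow, here is a minimal standalone C sketch (not part of the patch)
of the same indexing scheme: one bitmap bit covers 64 pages, the first write
fault in a granule marks it, and the page count is clamped at the memslot
boundary. The struct toy_memslot, enable_nearby_hwdbm() helper and the sample
numbers below are illustrative assumptions, not KVM code.

/*
 * Standalone sketch of the HWDBM granule-bitmap indexing: one bit covers
 * 1 << HWDBM_GRANULE_SHIFT = 64 pages, and the page count is clamped so
 * DBM is never enabled past the end of the memslot. Toy types only.
 */
#include <stdint.h>
#include <stdio.h>

#define HWDBM_GRANULE_SHIFT 6			/* 64 pages per bitmap bit */
#define GRANULE_PAGES (1UL << HWDBM_GRANULE_SHIFT)

struct toy_memslot {
	uint64_t base_gfn;			/* first guest frame number */
	uint64_t npages;			/* slot size in pages */
	uint8_t  hwdbm_bitmap[32];		/* 1 bit per 64-page granule */
};

/* Return 1 and report the page range if DBM still needs enabling here */
static int enable_nearby_hwdbm(struct toy_memslot *slot, uint64_t gfn,
			       uint64_t *start_page, uint64_t *npages)
{
	uint64_t rel_gfn = gfn - slot->base_gfn;
	uint64_t idx = rel_gfn >> HWDBM_GRANULE_SHIFT;

	if (slot->hwdbm_bitmap[idx / 8] & (1U << (idx % 8)))
		return 0;			/* already hardware-tracked */

	slot->hwdbm_bitmap[idx / 8] |= 1U << (idx % 8);
	*start_page = idx << HWDBM_GRANULE_SHIFT;
	*npages = GRANULE_PAGES;
	/* Do not cross the memslot boundary */
	if (slot->npages - *start_page < *npages)
		*npages = slot->npages - *start_page;
	return 1;
}

int main(void)
{
	struct toy_memslot slot = { .base_gfn = 0x1000, .npages = 200 };
	uint64_t start, n;

	/* Fault on slot page 130: granule 2 covers pages 128..191 */
	if (enable_nearby_hwdbm(&slot, 0x1000 + 130, &start, &n))
		printf("set DBM on pages %llu..%llu\n",
		       (unsigned long long)start,
		       (unsigned long long)(start + n - 1));

	/* Fault on slot page 195: granule 3 is clamped to pages 192..199 */
	if (enable_nearby_hwdbm(&slot, 0x1000 + 195, &start, &n))
		printf("set DBM on pages %llu..%llu\n",
		       (unsigned long long)start,
		       (unsigned long long)(start + n - 1));

	return 0;
}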
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8fcfab0c2567..e9ea5b546326 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -99,6 +99,8 @@ struct kvm_s2_mmu {
 };
 
 struct kvm_arch_memory_slot {
+	#define HWDBM_GRANULE_SHIFT	6 /* 64 pages per bit */
+	unsigned long *hwdbm_bitmap;
 };
 
 struct kvm_arch {
@@ -565,6 +567,10 @@ struct kvm_vcpu_stat {
 	u64 exits;
 };
 
+int kvm_arm_init_hwdbm_bitmap(struct kvm_memory_slot *memslot);
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot);
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn);
+
 int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 04c44853b103..9e05d45fa6be 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1257,9 +1257,134 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	return r;
 }
 
+static unsigned long kvm_hwdbm_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+	unsigned long nbits = DIV_ROUND_UP(memslot->npages, 1 << HWDBM_GRANULE_SHIFT);
+
+	return ALIGN(nbits, BITS_PER_LONG) / 8;
+}
+
+static unsigned long *kvm_second_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long len = kvm_hwdbm_bitmap_bytes(memslot);
+
+	return (void *)memslot->arch.hwdbm_bitmap + len;
+}
+
+/*
+ * Allocate twice the space. See kvm_arch_sync_dirty_log() for why the
+ * second half is needed.
+ */
+int kvm_arm_init_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long bytes = 2 * kvm_hwdbm_bitmap_bytes(memslot);
+
+	if (!system_supports_hw_dbm())
+		return 0;
+
+	if (memslot->arch.hwdbm_bitmap) {
+		/* Inherited from the old memslot */
+		bitmap_clear(memslot->arch.hwdbm_bitmap, 0, bytes * 8);
+	} else {
+		memslot->arch.hwdbm_bitmap = kvzalloc(bytes, GFP_KERNEL_ACCOUNT);
+		if (!memslot->arch.hwdbm_bitmap)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void kvm_arm_destroy_hwdbm_bitmap(struct kvm_memory_slot *memslot)
+{
+	if (!memslot->arch.hwdbm_bitmap)
+		return;
+
+	kvfree(memslot->arch.hwdbm_bitmap);
+	memslot->arch.hwdbm_bitmap = NULL;
+}
+
+/* Add DBM for nearby pagetables, but do not cross the memslot boundary */
+void kvm_arm_enable_nearby_hwdbm(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *memslot;
+
+	memslot = gfn_to_memslot(kvm, gfn);
+	if (memslot && kvm_slot_dirty_track_enabled(memslot) &&
+	    memslot->arch.hwdbm_bitmap) {
+		unsigned long rel_gfn = gfn - memslot->base_gfn;
+		unsigned long dbm_idx = rel_gfn >> HWDBM_GRANULE_SHIFT;
+		unsigned long start_page, npages;
+
+		if (!test_and_set_bit(dbm_idx, memslot->arch.hwdbm_bitmap)) {
+			start_page = dbm_idx << HWDBM_GRANULE_SHIFT;
+			npages = 1 << HWDBM_GRANULE_SHIFT;
+			npages = min(memslot->npages - start_page, npages);
+			kvm_stage2_set_dbm(kvm, memslot, start_page, npages);
+		}
+	}
+}
+
+/*
+ * We have to find a place to clear hwdbm_bitmap, and clearing hwdbm_bitmap
+ * means clearing the DBM bit of all related pgtables. Note that between
+ * clearing the DBM bit and flushing the TLB, HW dirty log may still occur,
+ * so we must scan all related pgtables after the TLB flush. Given the above,
+ * the best choice is to clear hwdbm_bitmap before syncing the HW dirty log.
+ */
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
+	unsigned long *second_bitmap = kvm_second_hwdbm_bitmap(memslot);
+	unsigned long start_page, npages;
+	unsigned int end, rs, re;
+	bool has_hwdbm = false;
+
+	if (!memslot->arch.hwdbm_bitmap)
+		return;
+	end = kvm_hwdbm_bitmap_bytes(memslot) * 8;
+	bitmap_clear(second_bitmap, 0, end);
+
+	spin_lock(&kvm->mmu_lock);
+	bitmap_for_each_set_region(memslot->arch.hwdbm_bitmap, rs, re, 0, end) {
+		has_hwdbm = true;
+
+		/*
+		 * Must clear the bitmap before clearing the DBM bit. While we
+		 * clear DBM (which periodically releases the mmu spinlock),
+		 * SW dirty tracking may set DBM bits that overlap the range
+		 * being cleared. So if we cleared the bitmap after clearing
+		 * DBM, we could end up with a cleared bitmap but leftover DBM
+		 * bits, and we would never scan those pgtables again.
+		 */
+		bitmap_clear(memslot->arch.hwdbm_bitmap, rs, re - rs);
+
+		/* Record which bits of the bitmap were cleared */
+		bitmap_set(second_bitmap, rs, re - rs);
+
+		start_page = rs << HWDBM_GRANULE_SHIFT;
+		npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+		npages = min(memslot->npages - start_page, npages);
+		kvm_stage2_clear_dbm(kvm, memslot, start_page, npages);
+	}
+	spin_unlock(&kvm->mmu_lock);
+
+	if (!has_hwdbm)
+		return;
+
+	/*
+	 * Ensure that vcpu write accesses occurring after we clear
+	 * hwdbm_bitmap can be caught by the guest memory abort handler.
+	 */
+	kvm_flush_remote_tlbs(kvm);
+
+	spin_lock(&kvm->mmu_lock);
+	bitmap_for_each_set_region(second_bitmap, rs, re, 0, end) {
+		start_page = rs << HWDBM_GRANULE_SHIFT;
+		npages = (re - rs) << HWDBM_GRANULE_SHIFT;
+		npages = min(memslot->npages - start_page, npages);
+		kvm_stage2_sync_dirty(kvm, memslot, start_page, npages);
+	}
+	spin_unlock(&kvm->mmu_lock);
 }
 
 void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2f8c6770a4dc..1a8702035ddd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -939,6 +939,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+
+		/* Placed here because nearby PTEs are very likely to be valid */
+		if (!ret && vma_pagesize == PAGE_SIZE && writable)
+			kvm_arm_enable_nearby_hwdbm(kvm, gfn);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
@@ -1407,11 +1411,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	spin_unlock(&kvm->mmu_lock);
 out:
 	mmap_read_unlock(current->mm);
-	return ret;
+	return ret ? : kvm_arm_init_hwdbm_bitmap(memslot);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+	kvm_arm_destroy_hwdbm_bitmap(slot);
 }
 
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f035f3ea..231d11009db7 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -376,11 +376,11 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
 	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
 
 	/*
-	 * Enable the Hardware Access Flag management, unconditionally
-	 * on all CPUs. The features is RES0 on CPUs without the support
-	 * and must be ignored by the CPUs.
+	 * Enable the Hardware Access Flag and Dirty State management
+	 * unconditionally on all CPUs. The features are RES0 on CPUs
+	 * without the support and must be ignored by the CPUs.
 	 */
-	vtcr |= VTCR_EL2_HA;
+	vtcr |= VTCR_EL2_HA | VTCR_EL2_HD;
 
 	/* Set the vmid bits */
 	vtcr |= (kvm_get_vmid_bits() == 16) ?