From patchwork Thu Jul 2 13:55:48 2020
X-Patchwork-Submitter: zhukeqian
X-Patchwork-Id: 11638987
From: Keqian Zhu
Subject: [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM
Date: Thu, 2 Jul 2020 21:55:48 +0800
Message-ID: <20200702135556.36896-1-zhukeqian1@huawei.com>
Cc:
 Suzuki K Poulose, Catalin Marinas, Keqian Zhu, Sean Christopherson,
 Steven Price, liangpeng10@huawei.com, Alexios Zavras,
 zhengxiang9@huawei.com, Mark Brown, James Morse, Marc Zyngier,
 wanghaibin.wang@huawei.com, Thomas Gleixner, Will Deacon,
 Andrew Morton, Julien Thierry

This patch series adds support for dirty logging based on hardware DBM.

It works well under several migration test cases, including VMs backed
by 4K pages or 2M THP. I checked the SHA256 digest of all guest memory
and it is identical for the source VM and the destination VM, which
means no dirty page is missed under hardware DBM.

Some key points:

1. Only hardware updates of dirty status for PTEs are supported. PMDs
   and PUDs are not involved for now.

2. About *performance*: In the RFC patch, I mentioned that for every
   64GB of memory, KVM takes about 40ms to scan all PTEs to collect the
   dirty log. This series addresses that in two ways: HW/SW dynamic
   switch and multi-core offload.

   HW/SW dynamic switch: Give userspace the right to enable/disable the
   hw dirty log. This adds a new KVM cap named KVM_CAP_ARM_HW_DIRTY_LOG.
   We achieve this by changing the kvm->arch.vtcr value and kicking
   vCPUs out so that they reload this value into VTCR_EL2. Userspace can
   then enable the hw dirty log at the beginning and disable it when few
   dirty pages remain and the VM is about to be stopped, so VM downtime
   is not affected.

   Multi-core offload: Offloading the PT scanning workload to multiple
   cores can greatly reduce the scanning time. To guarantee that the
   scan completes in time, I use smp_call_function to implement this
   policy, which uses IPIs to dispatch the workload to other CPUs. On a
   128-CPU Kunpeng 920 platform, it takes only about 5ms to scan the PTs
   of 256GB of RAM (using mempress, with almost all PTs established).
   The workload is dispatched iteratively (each CPU scans the PTs of
   only 512M of RAM per iteration), so it does not affect the physical
   CPUs seriously.

3. About correctness: The DBM bit is only added when the PTE is already
   writable, so read-only PTEs still exist and the mechanisms that rely
   on read-only PTs are not broken.

4. About PT modification races: There are two kinds of PT modification.

   The first is setting or clearing a specific bit, such as AF or RW.
   All of these operations have been converted to be atomic, to avoid
   overwriting dirty status set by hardware.

   The second is replacement, such as unmapping or changing a PTE. All
   of these operations eventually invoke kvm_set_pte. kvm_set_pte has
   been converted to be atomic, and we save the dirty status to the
   underlying bitmap if the dirty status is overwritten.

Change log:

v2:
 - Address Steven's comments.
 - Add support for parallel dirty log sync.
 - Simplify and merge patches of v1.

v1:
 - Address Catalin's comments.

Keqian Zhu (8):
  KVM: arm64: Set DBM bit for writable PTEs
  KVM: arm64: Scan PTEs to sync dirty log
  KVM: arm64: Modify stage2 young mechanism to support hw DBM
  KVM: arm64: Save stage2 PTE dirty status if it is covered
  KVM: arm64: Steply write protect page table by mask bit
  KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
  KVM: arm64: Sync dirty log parallel
  KVM: Omit dirty log sync in log clear if initially all set

 arch/arm64/include/asm/kvm_host.h |   5 +
 arch/arm64/include/asm/kvm_mmu.h  |  43 ++++-
 arch/arm64/kvm/arm.c              |  45 ++++-
 arch/arm64/kvm/mmu.c              | 307 ++++++++++++++++++++++++++++--
 arch/arm64/kvm/reset.c            |   5 +
 include/uapi/linux/kvm.h          |   1 +
 tools/include/uapi/linux/kvm.h    |   1 +
 virt/kvm/kvm_main.c               |   3 +-
 8 files changed, 389 insertions(+), 21 deletions(-)
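
For readers who want to see key point 4 in concrete terms, the following
is a minimal user-space model written with C11 atomics. It is a sketch
only, not the code from the patches: the function names
(s2_pte_hw_dirty, s2_pte_clear_bits, s2_pte_replace) and the flat bitmap
layout are hypothetical, and the bit positions (S2AP[1] at bit 7, AF at
bit 10, DBM at bit 51) are taken from the Arm stage-2 descriptor format
rather than from this series. It assumes that a PTE with DBM set and
write permission granted counts as dirty.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stage-2 descriptor bits; macro names are illustrative only. */
#define S2_PTE_WRITE	(UINT64_C(1) << 7)	/* S2AP[1]: write permission */
#define S2_PTE_AF	(UINT64_C(1) << 10)	/* access flag */
#define S2_PTE_DBM	(UINT64_C(1) << 51)	/* dirty bit modifier */

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Assumption of this model: with DBM set, hardware grants write
 * permission on the first write, so "DBM and writable" means dirty.
 */
static bool s2_pte_hw_dirty(uint64_t pte)
{
	return (pte & S2_PTE_DBM) && (pte & S2_PTE_WRITE);
}

/*
 * First kind of modification: clear specific bits (for example AF).
 * A single atomic read-modify-write leaves every other bit untouched,
 * so a concurrent dirty update is not overwritten the way it could be
 * with a plain load, mask and store. Returns the previous value.
 */
static uint64_t s2_pte_clear_bits(_Atomic uint64_t *ptep, uint64_t bits)
{
	return atomic_fetch_and(ptep, ~bits);
}

/*
 * Second kind of modification: replace the whole PTE. The old value is
 * fetched atomically with the store; if it was dirty, the page is
 * recorded in the (hypothetical) dirty bitmap before the state is lost.
 */
static void s2_pte_replace(_Atomic uint64_t *ptep, uint64_t new_pte,
			   unsigned long *bitmap, size_t page_idx)
{
	uint64_t old;

	assert(bitmap != NULL);
	old = atomic_exchange(ptep, new_pte);
	if (s2_pte_hw_dirty(old))
		bitmap[page_idx / BITS_PER_WORD] |=
			1UL << (page_idx % BITS_PER_WORD);
}
```

In this model, clearing AF on a dirty PTE leaves it dirty, and replacing
a dirty PTE with zero sets the corresponding bit in the bitmap, while
replacing a clean one leaves the bitmap unchanged.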