From patchwork Thu Apr 15 14:03:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: zhukeqian X-Patchwork-Id: 12205583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 125A3C43460 for ; Thu, 15 Apr 2021 14:05:55 +0000 (UTC) Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 87BA961153 for ; Thu, 15 Apr 2021 14:05:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 87BA961153 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding :Content-Type:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-ID:Date: Subject:CC:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=7FokUbHAsvmESpVkTARxYbor3rKF/y0CxDy6ZRWQeF4=; b=SxtafEVV/90zPlhyLVNCBNxBT 3vECXryn0RkAQKD0BNLu1981LumCmDuiyHNTbnjMbs7thDwSAhc8MwsCFz3y9EGiOhntNPu31ygHB PnYydT1YLDR3AMUVAzqgJ3TeZ7N5/IlkJFlpBne1AaUw2OOkxxtfeW7Ba8Ky095MGzGX9X8nT2ElD bmxGWr94m/v2uNV7aiooGX/BTsfd3+oPuHyuBGOrvOue2xBxrvv6CS7m3Q+12EkYu1hkO90HdrDrl wpQDz3ixyGYfJBLzKD2RaHEUJmAteoaOEh9R0EhA1Q5tqicLbmqse2TVwoVGFoWg4MHEefLWO11bB cU+pCAYSQ==; Received: from localhost ([::1] helo=desiato.infradead.org) by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux)) id 1lX2bP-00GN4h-H9; Thu, 15 Apr 2021 14:04:15 +0000 Received: from bombadil.infradead.org ([2607:7c80:54:e::133]) by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lX2b5-00GN0h-KE for linux-arm-kernel@desiato.infradead.org; Thu, 15 Apr 2021 14:03:55 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:CC:To:From:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=Qh7zdw82GVUQTbEN87J0sGMQKwtiRrtbK6kTu3hIPmc=; b=f9OV3RHJqMt05RXZPtOFifklUU ExidW2uzmD391X4y3ImA5GRRIsDryWI07Jotz1c9v94cZdnaLl7SOyQTeBWc3+FbAAadI3t3vxZot A+VbaQoeLLXERvAsMN4p2rWodYrEOL4I0iEPu1SBLj72oY2/2Q61gyNbJWNRxcXz47n2851B7jcLF cVQek4pkteOMVUdYqS8/QPNjyB4JVbDU0Mv1drc4Gl7t8E8e0BeAxQqRPktdPi5nGk2QfoGcAai49 9KqTFnh66PX3lrVBAyu3F8d6Sjd4GKu5OXISEE2pi1xXX1H9Cr1d2T6PGbuyHEWe82r1pZkmFPg76 wva2Kf4w==; Received: from szxga06-in.huawei.com ([45.249.212.32]) by bombadil.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux)) id 1lX2b1-008dK6-FA for linux-arm-kernel@lists.infradead.org; Thu, 15 Apr 2021 14:03:54 +0000 Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.60]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4FLgxq05rgzkjm9; Thu, 15 Apr 2021 22:01:51 +0800 (CST) Received: from DESKTOP-5IS4806.china.huawei.com (10.174.187.224) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.498.0; Thu, 15 Apr 2021 22:03:34 +0800 From: Keqian Zhu To: , , , , Marc Zyngier CC: Subject: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO Date: Thu, 15 Apr 2021 22:03:28 +0800 Message-ID: <20210415140328.24200-3-zhukeqian1@huawei.com> X-Mailer: git-send-email 2.8.4.windows.1 In-Reply-To: <20210415140328.24200-1-zhukeqian1@huawei.com> References: <20210415140328.24200-1-zhukeqian1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.187.224] X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210415_070351_869232_746B467D X-CRM114-Status: GOOD ( 17.66 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org The MMIO region of a device maybe huge (GB level), try to use block mapping in stage2 to speedup both map and unmap. Compared to normal memory mapping, we should consider two more points when try block mapping for MMIO region: 1. For normal memory mapping, the PA(host physical address) and HVA have same alignment within PUD_SIZE or PMD_SIZE when we use the HVA to request hugepage, so we don't need to consider PA alignment when verifing block mapping. But for device memory mapping, the PA and HVA may have different alignment. 2. For normal memory mapping, we are sure hugepage size properly fit into vma, so we don't check whether the mapping size exceeds the boundary of vma. But for device memory mapping, we should pay attention to this. This adds get_vma_page_shift() to get page shift for both normal memory and device MMIO region, and check these two points when selecting block mapping size for MMIO region. Signed-off-by: Keqian Zhu Reviewed-by: Gavin Shan --- arch/arm64/kvm/mmu.c | 61 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 51 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index c59af5ca01b0..5a1cc7751e6d 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot, return PAGE_SIZE; } +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva) +{ + unsigned long pa; + + if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP)) + return huge_page_shift(hstate_vma(vma)); + + if (!(vma->vm_flags & VM_PFNMAP)) + return PAGE_SHIFT; + + VM_BUG_ON(is_vm_hugetlb_page(vma)); + + pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start); + +#ifndef __PAGETABLE_PMD_FOLDED + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) && + ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start && + ALIGN(hva, PUD_SIZE) <= vma->vm_end) + return PUD_SHIFT; +#endif + + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) && + ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start && + ALIGN(hva, PMD_SIZE) <= vma->vm_end) + return PMD_SHIFT; + + return PAGE_SHIFT; +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) @@ -769,7 +798,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - /* Let's check if we will get back a huge page backed by hugetlbfs */ + /* + * Let's check if we will get back a huge page backed by hugetlbfs, or + * get block mapping for device MMIO region. + */ mmap_read_lock(current->mm); vma = find_vma_intersection(current->mm, hva, hva + 1); if (unlikely(!vma)) { @@ -778,15 +810,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (is_vm_hugetlb_page(vma)) - vma_shift = huge_page_shift(hstate_vma(vma)); - else - vma_shift = PAGE_SHIFT; - - if (logging_active || - (vma->vm_flags & VM_PFNMAP)) { + /* + * logging_active is guaranteed to never be true for VM_PFNMAP + * memslots. + */ + if (logging_active) { force_pte = true; vma_shift = PAGE_SHIFT; + } else { + vma_shift = get_vma_page_shift(vma, hva); } switch (vma_shift) { @@ -854,8 +886,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; if (kvm_is_device_pfn(pfn)) { + /* + * If the page was identified as device early by looking at + * the VMA flags, vma_pagesize is already representing the + * largest quantity we can map. If instead it was mapped + * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE + * and must not be upgraded. + * + * In both cases, we don't let transparent_hugepage_adjust() + * change things at the last minute. + */ device = true; - force_pte = true; } else if (logging_active && !write_fault) { /* * Only actually map the page as writable if this was a write @@ -876,7 +917,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * If we are not forced to use page mapping, check if we are * backed by a THP and thus use block mapping if possible. */ - if (vma_pagesize == PAGE_SIZE && !force_pte) + if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) vma_pagesize = transparent_hugepage_adjust(memslot, hva, &pfn, &fault_ipa); if (writable)