From: Chao Peng <chao.p.peng@linux.intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
    Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
    Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng,
    "Kirill A . Shutemov", luto@kernel.org, john.ji@intel.com,
    susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
    ak@linux.intel.com, david@redhat.com
Subject: [PATCH v3 kvm/queue 08/16] KVM: Special handling for fd-based memory invalidation
Date: Thu, 23 Dec 2021 20:30:03 +0800
Message-Id: <20211223123011.41044-9-chao.p.peng@linux.intel.com>
In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
X-Patchwork-Id: 12698226

For fd-based guest memory, the memory backend (e.g. the fd provider)
should notify KVM to unmap/invalidate the private memory from the KVM
secondary MMU when userspace punches a hole in the fd (e.g. when
userspace converts private memory to shared memory).

To support fd-based memory invalidation, the existing hva-based memory
invalidation needs to be extended. A new 'inode' for the fd is passed in
from memfd_falloc_notifier, and 'start/end' then represent the start/end
offsets in the fd instead of an hva range. During the invalidation, KVM
checks this inode against the one in the memslot. The mapping in KVM is
invalidated only when the 'inode' in the memslot equals the passed-in
'inode'.

Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b7a1c4d7eaaa..19736a0013a0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -494,6 +494,7 @@ typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
 struct kvm_useraddr_range {
 	unsigned long start;
 	unsigned long end;
+	struct inode *inode;
 	pte_t pte;
 	gfn_handler_t handler;
 	on_lock_fn_t on_lock;
@@ -544,14 +545,27 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
 		struct interval_tree_node *node;
 
 		slots = __kvm_memslots(kvm, i);
-		useraddr_tree = &slots->hva_tree;
+		useraddr_tree = range->inode ? &slots->ofs_tree : &slots->hva_tree;
 		kvm_for_each_memslot_in_useraddr_range(node, useraddr_tree,
 				range->start, range->end - 1) {
 			unsigned long useraddr_start, useraddr_end;
+			unsigned long useraddr_base;
+
+			if (range->inode) {
+				slot = container_of(node, struct kvm_memory_slot,
+						    ofs_node[slots->node_idx]);
+				if (!slot->file ||
+				    slot->file->f_inode != range->inode)
+					continue;
+				useraddr_base = slot->ofs;
+			} else {
+				slot = container_of(node, struct kvm_memory_slot,
+						    hva_node[slots->node_idx]);
+				useraddr_base = slot->userspace_addr;
+			}
 
-			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
-			useraddr_start = max(range->start, slot->userspace_addr);
-			useraddr_end = min(range->end, slot->userspace_addr +
+			useraddr_start = max(range->start, useraddr_base);
+			useraddr_end = min(range->end, useraddr_base +
 						  (slot->npages << PAGE_SHIFT));
 
 			/*
@@ -568,10 +582,10 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm,
 			 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
 			 */
 			gfn_range.start = useraddr_to_gfn_memslot(useraddr_start,
-								  slot, true);
+								  slot, !range->inode);
 			gfn_range.end = useraddr_to_gfn_memslot(
 						useraddr_end + PAGE_SIZE - 1,
-						slot, true);
+						slot, !range->inode);
 			gfn_range.slot = slot;
 
 			if (!locked) {
@@ -613,6 +627,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 		.on_lock	= (void *)kvm_null_fn,
 		.flush_on_ret	= true,
 		.may_block	= false,
+		.inode		= NULL,
 	};
 
 	return __kvm_handle_useraddr_range(kvm, &range);
@@ -632,6 +647,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
 		.on_lock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= false,
+		.inode		= NULL,
 	};
 
 	return __kvm_handle_useraddr_range(kvm, &range);
@@ -700,6 +716,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.on_lock	= kvm_inc_notifier_count,
 		.flush_on_ret	= true,
 		.may_block	= mmu_notifier_range_blockable(range),
+		.inode		= NULL,
 	};
 
 	trace_kvm_unmap_hva_range(range->start, range->end);
@@ -751,6 +768,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 		.on_lock	= kvm_dec_notifier_count,
 		.flush_on_ret	= false,
 		.may_block	= mmu_notifier_range_blockable(range),
+		.inode		= NULL,
 	};
 	bool wake;
 
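
For readers following the fd-offset path, here is a minimal userspace sketch of what the patch does per memslot: skip slots whose inode does not match, clamp the punched range to the slot's offset window, and convert the result to a half-open gfn range. It is only an illustration under stated assumptions, not the kernel implementation: `struct memslot`, `struct gfn_range`, and `fd_range_to_gfns()` are hypothetical stand-ins, and the offset-to-gfn arithmetic is an assumed model of what `useraddr_to_gfn_memslot()` does for the offset case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-ins for the kernel structures, not the real layouts. */
struct inode { int id; };

struct memslot {
	const struct inode *inode;  /* NULL for an hva-backed slot */
	unsigned long ofs;          /* start offset of the slot in the fd */
	unsigned long base_gfn;     /* first guest frame number of the slot */
	unsigned long npages;       /* slot size in pages */
};

struct gfn_range { unsigned long start, end; };  /* half-open: [start, end) */

/*
 * Map the fd offset range [start, end) onto one memslot.
 * Returns false if the slot belongs to a different inode or does not overlap.
 */
static bool fd_range_to_gfns(const struct memslot *slot,
			     const struct inode *inode,
			     unsigned long start, unsigned long end,
			     struct gfn_range *out)
{
	unsigned long slot_end = slot->ofs + (slot->npages << PAGE_SHIFT);
	unsigned long s, e;

	assert(out != NULL && start < end);

	/* Same filter as the patch: only a slot backed by this inode qualifies. */
	if (!slot->inode || slot->inode != inode)
		return false;

	/* Clamp the range to the slot's window, like the max()/min() pair. */
	s = start > slot->ofs ? start : slot->ofs;
	e = end < slot_end ? end : slot_end;
	if (s >= e)
		return false;

	/* Round the start down and the end up so partial pages are covered. */
	out->start = slot->base_gfn + ((s - slot->ofs) >> PAGE_SHIFT);
	out->end = slot->base_gfn +
		   ((e + PAGE_SIZE - 1 - slot->ofs) >> PAGE_SHIFT);
	return true;
}
```

With a slot at offset 0x10000, base gfn 0x100, and 16 pages, the offset range [0x11800, 0x13001) maps to gfns [0x101, 0x104) under this model, while the same range passed with a different inode is skipped.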