From patchwork Wed Jul 6 08:20:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12907555
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
 qemu-devel@nongnu.org, linux-kselftest@vger.kernel.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
 Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
 Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
 "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
 Chao Peng, "Kirill A . Shutemov", luto@kernel.org,
 jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, ddutile@redhat.com,
 dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com,
 Muchun Song
Subject: [PATCH v7 13/14] KVM: Enable and expose KVM_MEM_PRIVATE
Date: Wed, 6 Jul 2022 16:20:15 +0800
Message-Id: <20220706082016.2603916-14-chao.p.peng@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Register the private memslot with the fd-based memory backing store and
handle the memfile notifiers to zap the existing mappings. Currently the
registration happens at memslot creation time, and the initial support
does not include page migration/swap.

KVM_MEM_PRIVATE is not exposed by default; architecture code can turn it
on by implementing kvm_arch_private_mem_supported().

A 'kvm' reference is added to the memslot structure because in the
memfile_notifier callbacks we can only obtain a memslot reference, while
kvm is needed to do the zapping.

Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 include/linux/kvm_host.h |   1 +
 virt/kvm/kvm_main.c      | 117 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 109 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8f56426aa1e3..4e5a0db68799 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -584,6 +584,7 @@ struct kvm_memory_slot {
 	struct file *private_file;
 	loff_t private_offset;
 	struct memfile_notifier notifier;
+	struct kvm *kvm;
 };
 
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bb714c2a4b06..d6f7e074cab2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -941,6 +941,63 @@ static int kvm_vm_ioctl_set_encrypted_region(struct kvm *kvm, unsigned int ioctl
 
 	return r;
 }
+
+static void kvm_memfile_notifier_invalidate(struct memfile_notifier *notifier,
+					    pgoff_t start, pgoff_t end)
+{
+	struct kvm_memory_slot *slot = container_of(notifier,
+						    struct kvm_memory_slot,
+						    notifier);
+	unsigned long base_pgoff = slot->private_offset >> PAGE_SHIFT;
+	gfn_t start_gfn = slot->base_gfn;
+	gfn_t end_gfn = slot->base_gfn + slot->npages;
+
+
+	if (start > base_pgoff)
+		start_gfn = slot->base_gfn + start - base_pgoff;
+
+	if (end < base_pgoff + slot->npages)
+		end_gfn = slot->base_gfn + end - base_pgoff;
+
+	if (start_gfn >= end_gfn)
+		return;
+
+	kvm_zap_gfn_range(slot->kvm, start_gfn, end_gfn);
+}
+
+static struct memfile_notifier_ops kvm_memfile_notifier_ops = {
+	.invalidate = kvm_memfile_notifier_invalidate,
+};
+
+#define KVM_MEMFILE_FLAGS (MEMFILE_F_USER_INACCESSIBLE | \
+			   MEMFILE_F_UNMOVABLE | \
+			   MEMFILE_F_UNRECLAIMABLE)
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+	slot->notifier.ops = &kvm_memfile_notifier_ops;
+	return memfile_register_notifier(slot->private_file, KVM_MEMFILE_FLAGS,
+					 &slot->notifier);
+}
+
+static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
+{
+	memfile_unregister_notifier(&slot->notifier);
+}
+
+#else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+	WARN_ON_ONCE(1);
+	return -EOPNOTSUPP;
+}
+
+static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
+{
+	WARN_ON_ONCE(1);
+}
+
 #endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
@@ -987,6 +1044,11 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 /* This does not remove the slot from struct kvm_memslots data structures */
 static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+	if (slot->flags & KVM_MEM_PRIVATE) {
+		kvm_private_mem_unregister(slot);
+		fput(slot->private_file);
+	}
+
 	kvm_destroy_dirty_bitmap(slot);
 
 	kvm_arch_free_memslot(kvm, slot);
@@ -1548,10 +1610,16 @@ bool __weak kvm_arch_private_mem_supported(struct kvm *kvm)
 	return false;
 }
 
-static int check_memory_region_flags(const struct kvm_user_mem_region *mem)
+static int check_memory_region_flags(struct kvm *kvm,
+				     const struct kvm_user_mem_region *mem)
 {
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
+#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
+	if (kvm_arch_private_mem_supported(kvm))
+		valid_flags |= KVM_MEM_PRIVATE;
+#endif
+
 #ifdef __KVM_HAVE_READONLY_MEM
 	valid_flags |= KVM_MEM_READONLY;
 #endif
@@ -1627,6 +1695,12 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 {
 	int r;
 
+	if (change == KVM_MR_CREATE && new->flags & KVM_MEM_PRIVATE) {
+		r = kvm_private_mem_register(new);
+		if (r)
+			return r;
+	}
+
 	/*
 	 * If dirty logging is disabled, nullify the bitmap; the old bitmap
 	 * will be freed on "commit". If logging is enabled in both old and
@@ -1655,6 +1729,9 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 	if (r && new && new->dirty_bitmap && (!old || !old->dirty_bitmap))
 		kvm_destroy_dirty_bitmap(new);
 
+	if (r && change == KVM_MR_CREATE && new->flags & KVM_MEM_PRIVATE)
+		kvm_private_mem_unregister(new);
+
 	return r;
 }
 
@@ -1952,7 +2029,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	int as_id, id;
 	int r;
 
-	r = check_memory_region_flags(mem);
+	r = check_memory_region_flags(kvm, mem);
 	if (r)
 		return r;
 
@@ -1971,6 +2048,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size))
 		return -EINVAL;
+	if (mem->flags & KVM_MEM_PRIVATE &&
+	    (mem->private_offset & (PAGE_SIZE - 1) ||
+	     mem->private_offset > U64_MAX - mem->memory_size))
+		return -EINVAL;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
@@ -2009,6 +2090,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages)
 			return -EINVAL;
 	} else { /* Modify an existing slot. */
+		/* Private memslots are immutable, they can only be deleted. */
+		if (mem->flags & KVM_MEM_PRIVATE)
+			return -EINVAL;
 		if ((mem->userspace_addr != old->userspace_addr) ||
 		    (npages != old->npages) ||
 		    ((mem->flags ^ old->flags) & KVM_MEM_READONLY))
@@ -2037,10 +2121,27 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	new->npages = npages;
 	new->flags = mem->flags;
 	new->userspace_addr = mem->userspace_addr;
+	if (mem->flags & KVM_MEM_PRIVATE) {
+		new->private_file = fget(mem->private_fd);
+		if (!new->private_file) {
+			r = -EINVAL;
+			goto out;
+		}
+		new->private_offset = mem->private_offset;
+	}
+
+	new->kvm = kvm;
 
 	r = kvm_set_memslot(kvm, old, new, change);
 	if (r)
-		kfree(new);
+		goto out;
+
+	return 0;
+
+out:
+	if (new->private_file)
+		fput(new->private_file);
+	kfree(new);
 	return r;
 }
 EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
@@ -4712,12 +4813,10 @@ static long kvm_vm_ioctl(struct file *filp,
 			     (u32 __user *)(argp + offsetof(typeof(mem), flags))))
 			goto out;
 
-		if (flags & KVM_MEM_PRIVATE) {
-			r = -EINVAL;
-			goto out;
-		}
-
-		size = sizeof(struct kvm_userspace_memory_region);
+		if (flags & KVM_MEM_PRIVATE)
+			size = sizeof(struct kvm_userspace_memory_region_ext);
+		else
+			size = sizeof(struct kvm_userspace_memory_region);
 
 		if (copy_from_user(&mem, argp, size))
 			goto out;
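
Note for readers: the two arithmetic checks in this patch (the
private_offset validation in __kvm_set_memory_region() and the range
clamping in kvm_memfile_notifier_invalidate()) can be tried out in
userspace with the rough sketch below. It is only an illustration, not
kernel code: the sketch_* names are hypothetical, a 4 KiB page size is
assumed, and plain uint64_t stands in for gfn_t/pgoff_t/loff_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT/PAGE_SIZE instead. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the struct kvm_memory_slot fields used here. */
struct sketch_slot {
	uint64_t base_gfn;       /* first guest frame number of the slot */
	uint64_t npages;         /* slot length in pages */
	uint64_t private_offset; /* byte offset into the backing file */
};

/* Models the offset check the patch adds to __kvm_set_memory_region(). */
static bool sketch_offset_valid(uint64_t private_offset, uint64_t memory_size)
{
	if (private_offset & (SKETCH_PAGE_SIZE - 1))
		return false;	/* offset is not page aligned */
	if (private_offset > UINT64_MAX - memory_size)
		return false;	/* offset + size would wrap around */
	return true;
}

/*
 * Models the clamping in kvm_memfile_notifier_invalidate(): translate the
 * file page range [start, end) into a gfn range limited to the slot.
 * Returns false when the resulting gfn range is empty.
 */
static bool sketch_clamp_range(const struct sketch_slot *slot,
			       uint64_t start, uint64_t end,
			       uint64_t *start_gfn, uint64_t *end_gfn)
{
	uint64_t base_pgoff = slot->private_offset >> SKETCH_PAGE_SHIFT;

	/* Start from the whole slot, then narrow each side if needed. */
	*start_gfn = slot->base_gfn;
	*end_gfn = slot->base_gfn + slot->npages;

	if (start > base_pgoff)
		*start_gfn = slot->base_gfn + start - base_pgoff;
	if (end < base_pgoff + slot->npages)
		*end_gfn = slot->base_gfn + end - base_pgoff;

	return *start_gfn < *end_gfn;
}
```

For a slot with base_gfn 0x100, 32 pages and a file offset of 16 pages,
invalidating file pages [20, 24) maps to gfns [0x104, 0x108), while a
range that covers the whole file maps to the full slot.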