From patchwork Thu Dec 23 12:29:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12698219
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Hugh Dickins,
 Jeff Layton, "J. Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng,
 "Kirill A. Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com, david@redhat.com
Subject: [PATCH v3 kvm/queue 01/16] mm/shmem: Introduce F_SEAL_INACCESSIBLE
Date: Thu, 23 Dec 2021 20:29:56 +0800
Message-Id: <20211223123011.41044-2-chao.p.peng@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>

From: "Kirill A. Shutemov"

Introduce a new seal, F_SEAL_INACCESSIBLE, indicating that the content of
the file is inaccessible from userspace by any means, e.g. read(),
write() or mmap().

It provides the semantics required for KVM guest private memory support:
a file descriptor with this seal set is going to be used as the source of
guest memory in confidential computing environments such as Intel
TDX/AMD SEV, but may not be accessible from host userspace.

At this time only shmem implements this seal.

Signed-off-by: Kirill A. Shutemov
Signed-off-by: Chao Peng
---
 include/uapi/linux/fcntl.h |  1 +
 mm/shmem.c                 | 37 +++++++++++++++++++++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..e2bad051936f 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW	0x0004	/* prevent file from growing */
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
 #define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
+#define F_SEAL_INACCESSIBLE	0x0020  /* prevent access from userspace */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 18f93c2d68f1..faa7e9b1b9bc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1098,6 +1098,10 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
 			return -EPERM;
 
+		if ((info->seals & F_SEAL_INACCESSIBLE) &&
+		    (newsize & ~PAGE_MASK))
+			return -EINVAL;
+
 		if (newsize != oldsize) {
 			error = shmem_reacct_size(SHMEM_I(inode)->flags,
 					oldsize, newsize);
@@ -1364,6 +1368,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		goto redirty;
 	if (!total_swap_pages)
 		goto redirty;
+	if (info->seals & F_SEAL_INACCESSIBLE)
+		goto redirty;
 
 	/*
 	 * Our capabilities prevent regular writeback or sync from ever calling
@@ -2262,6 +2268,9 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
+	if (info->seals & F_SEAL_INACCESSIBLE)
+		return -EPERM;
+
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;
 
@@ -2459,12 +2468,15 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	pgoff_t index = pos >> PAGE_SHIFT;
 
 	/* i_rwsem is held by caller */
-	if (unlikely(info->seals & (F_SEAL_GROW |
-				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
+	if (unlikely(info->seals & (F_SEAL_GROW | F_SEAL_WRITE |
+				    F_SEAL_FUTURE_WRITE |
+				    F_SEAL_INACCESSIBLE))) {
 		if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))
 			return -EPERM;
 		if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
 			return -EPERM;
+		if (info->seals & F_SEAL_INACCESSIBLE)
+			return -EPERM;
 	}
 
 	return shmem_getpage(inode, index, pagep, SGP_WRITE);
@@ -2538,6 +2550,21 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		end_index = i_size >> PAGE_SHIFT;
 		if (index > end_index)
 			break;
+
+		/*
+		 * inode_lock protects setting up seals as well as writes to
+		 * i_size. Setting F_SEAL_INACCESSIBLE is only allowed with
+		 * i_size == 0.
+		 *
+		 * Check F_SEAL_INACCESSIBLE after i_size. It effectively
+		 * serializes read vs. setting F_SEAL_INACCESSIBLE without
+		 * taking inode_lock in the read path.
+		 */
+		if (SHMEM_I(inode)->seals & F_SEAL_INACCESSIBLE) {
+			error = -EPERM;
+			break;
+		}
+
 		if (index == end_index) {
 			nr = i_size & ~PAGE_MASK;
 			if (nr <= offset)
@@ -2663,6 +2690,12 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			goto out;
 		}
 
+		if ((info->seals & F_SEAL_INACCESSIBLE) &&
+		    (offset & ~PAGE_MASK || len & ~PAGE_MASK)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		shmem_falloc.waitq = &shmem_falloc_waitq;
 		shmem_falloc.start = (u64)unmap_start >> PAGE_SHIFT;
 		shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;

From patchwork Thu Dec 23 12:29:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12698220
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Hugh Dickins,
 Jeff Layton, "J. Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng,
 "Kirill A. Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com, david@redhat.com
Subject: [PATCH v3 kvm/queue 02/16] mm/memfd: Introduce MFD_INACCESSIBLE flag
Date: Thu, 23 Dec 2021 20:29:57 +0800
Message-Id: <20211223123011.41044-3-chao.p.peng@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>

Introduce a new memfd_create() flag indicating that the content of the
created memfd is inaccessible from userspace. It does this by forcibly
setting the F_SEAL_INACCESSIBLE seal when the file is created. It also
sets F_SEAL_SEAL to prevent future sealing, which means it cannot
coexist with MFD_ALLOW_SEALING.

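For illustration only, the flag and seal rules described above can be modelled in plain userspace C. This is a minimal sketch, not kernel code: the helper names mfd_model_check_flags() and mfd_model_initial_seals() are hypothetical, the constants are copied from the hunks in this series and from the existing uapi headers, and the model assumes that a shmem-backed memfd starts out with F_SEAL_SEAL set. It also ignores the huge page size encoding that memfd_create() accepts together with MFD_HUGETLB.

```c
#include <errno.h>

/* memfd_create() flags; MFD_INACCESSIBLE is the value proposed in this series. */
#define MFD_CLOEXEC		0x0001U
#define MFD_ALLOW_SEALING	0x0002U
#define MFD_HUGETLB		0x0004U
#define MFD_INACCESSIBLE	0x0008U
#define MFD_ALL_FLAGS		(MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
				 MFD_INACCESSIBLE)

/* Seals; F_SEAL_INACCESSIBLE is the value proposed in patch 01. */
#define F_SEAL_SEAL		0x0001
#define F_SEAL_INACCESSIBLE	0x0020

/*
 * Hypothetical model of the flag validation: unknown flags are rejected,
 * and MFD_INACCESSIBLE may not be combined with MFD_ALLOW_SEALING.
 */
int mfd_model_check_flags(unsigned int flags)
{
	if (flags & ~MFD_ALL_FLAGS)
		return -EINVAL;
	if ((flags & MFD_INACCESSIBLE) && (flags & MFD_ALLOW_SEALING))
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical model of the seals a newly created file carries.
 * Assumption: the file starts with F_SEAL_SEAL, MFD_ALLOW_SEALING clears
 * it, and MFD_INACCESSIBLE forces both F_SEAL_SEAL and F_SEAL_INACCESSIBLE.
 */
unsigned int mfd_model_initial_seals(unsigned int flags)
{
	unsigned int seals = F_SEAL_SEAL;

	if (flags & MFD_ALLOW_SEALING)
		seals &= ~(unsigned int)F_SEAL_SEAL;
	if (flags & MFD_INACCESSIBLE)
		seals |= F_SEAL_SEAL | F_SEAL_INACCESSIBLE;
	return seals;
}
```

Under these assumptions, mfd_model_check_flags(MFD_INACCESSIBLE | MFD_ALLOW_SEALING) yields -EINVAL, and mfd_model_initial_seals(MFD_INACCESSIBLE) yields both seals, so no further seal can be added or removed through fcntl() afterwards.
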
Signed-off-by: Chao Peng
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c                 | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@
 #define MFD_CLOEXEC		0x0001U
 #define MFD_ALLOW_SEALING	0x0002U
 #define MFD_HUGETLB		0x0004U
+#define MFD_INACCESSIBLE	0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 9f80f162791a..c898a007fb76 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -245,7 +245,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+		       MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
@@ -267,6 +268,10 @@ SYSCALL_DEFINE2(memfd_create,
 			return -EINVAL;
 	}
 
+	/* Disallow sealing when MFD_INACCESSIBLE is set. */
+	if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+		return -EINVAL;
+
 	/* length includes terminating zero */
 	len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
 	if (len <= 0)
@@ -315,6 +320,11 @@ SYSCALL_DEFINE2(memfd_create,
 		*file_seals &= ~F_SEAL_SEAL;
 	}
 
+	if (flags & MFD_INACCESSIBLE) {
+		file_seals = memfd_file_seals_ptr(file);
+		*file_seals |= F_SEAL_SEAL | F_SEAL_INACCESSIBLE;
+	}
+
 	fd_install(fd, file);
 	kfree(name);
 	return fd;

From patchwork Thu Dec 23 12:29:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12698221
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Hugh Dickins,
 Jeff Layton, "J. Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng,
 "Kirill A. Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com, david@redhat.com
Subject: [PATCH v3 kvm/queue 03/16] mm/memfd: Introduce MEMFD_OPS
Date: Thu, 23 Dec 2021 20:29:58 +0800
Message-Id: <20211223123011.41044-4-chao.p.peng@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>

From: "Kirill A. Shutemov"

This patch introduces a new MEMFD_OPS facility around files created by
memfd_create() to allow a third kernel component to make use of memory
bookmarked in a memfd and to get notified when the memory in the file
is allocated/invalidated. It will be used by KVM to use a memfd file
descriptor as the guest memory backend, and KVM will use MEMFD_OPS to
interact with the memfd subsystem. In the future there might be other
consumers (e.g. VFIO with encrypted device memory).

It consists of two sets of callbacks:
  - memfd_falloc_notifier: callbacks which are provided by KVM and called
    by memfd when memory gets allocated/invalidated through fallocate().
  - memfd_pfn_ops: callbacks which are provided by memfd and called by
    KVM to request a memory page from memfd.

Locking is needed for the above callbacks to prevent race conditions.
  - get_owner/put_owner is used to ensure the owner is still alive in
    the invalidate_page_range/fallocate callback handlers, using a
    reference mechanism.
  - The page is locked between get_lock_pfn/put_unlock_pfn to ensure the
    pfn is still valid when it's used (e.g. when the KVM page fault
    handler uses it to establish the mapping in the secondary MMU page
    tables).

Userspace is in charge of the guest memory lifecycle: it can allocate
the memory with fallocate() or punch a hole to free memory from the
guest.

The file descriptor is passed down to KVM as the guest memory backend.
KVM registers itself as the owner of the memfd via
memfd_register_falloc_notifier() and provides memfd_falloc_notifier
callbacks that need to be called on fallocate() and hole punching.

memfd_register_falloc_notifier() returns the memfd_pfn_ops callbacks
that KVM needs to use to request a new page from memfd.

At this time only shmem is supported.

Signed-off-by: Kirill A. Shutemov
Signed-off-by: Chao Peng
---
 include/linux/memfd.h    |  22 ++++++
 include/linux/shmem_fs.h |  16 ++++
 mm/Kconfig               |   4 +
 mm/memfd.c               |  21 ++++++
 mm/shmem.c               | 158 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 221 insertions(+)

diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index 4f1600413f91..0007073b53dc 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -13,4 +13,26 @@ static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned long a)
 }
 #endif
 
+#ifdef CONFIG_MEMFD_OPS
+struct memfd_falloc_notifier {
+	void (*invalidate_page_range)(struct inode *inode, void *owner,
+				      pgoff_t start, pgoff_t end);
+	void (*fallocate)(struct inode *inode, void *owner,
+			  pgoff_t start, pgoff_t end);
+	bool (*get_owner)(void *owner);
+	void (*put_owner)(void *owner);
+};
+
+struct memfd_pfn_ops {
+	long (*get_lock_pfn)(struct inode *inode, pgoff_t offset, int *order);
+	void (*put_unlock_pfn)(unsigned long pfn);
+
+};
+
+extern int memfd_register_falloc_notifier(struct inode *inode, void *owner,
+				const struct memfd_falloc_notifier *notifier,
+				const struct memfd_pfn_ops **pfn_ops);
+extern void memfd_unregister_falloc_notifier(struct inode *inode);
+#endif
+
 #endif /* __LINUX_MEMFD_H */
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 166158b6e917..503adc63728c 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -12,6 +12,11 @@
 
 /* inode in-kernel data */
 
+#ifdef CONFIG_MEMFD_OPS
+struct memfd_falloc_notifier;
+struct memfd_pfn_ops;
+#endif
+
 struct shmem_inode_info {
 	spinlock_t		lock;
 	unsigned int		seals;		/* shmem seals */
@@ -24,6 +29,10 @@ struct shmem_inode_info {
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct simple_xattrs	xattrs;		/* list of xattrs */
 	atomic_t		stop_eviction;	/* hold when working on inode */
+#ifdef CONFIG_MEMFD_OPS
+	void			*owner;
+	const struct memfd_falloc_notifier *falloc_notifier;
+#endif
 	struct inode		vfs_inode;
 };
 
@@ -96,6 +105,13 @@ extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end);
 
+#ifdef CONFIG_MEMFD_OPS
+extern int shmem_register_falloc_notifier(struct inode *inode, void *owner,
+				const struct memfd_falloc_notifier *notifier,
+				const struct memfd_pfn_ops **pfn_ops);
+extern void shmem_unregister_falloc_notifier(struct inode *inode);
+#endif
+
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
diff --git a/mm/Kconfig b/mm/Kconfig
index 28edafc820ad..9989904d1b56 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -900,6 +900,10 @@ config IO_MAPPING
 config SECRETMEM
 	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
 
+config MEMFD_OPS
+	bool
+	depends on MEMFD_CREATE
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/memfd.c b/mm/memfd.c
index c898a007fb76..41861870fc21 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -130,6 +130,27 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 	return NULL;
 }
 
+#ifdef CONFIG_MEMFD_OPS
+int memfd_register_falloc_notifier(struct inode *inode, void *owner,
+				   const struct memfd_falloc_notifier *notifier,
+				   const struct memfd_pfn_ops **pfn_ops)
+{
+	if (shmem_mapping(inode->i_mapping))
+		return shmem_register_falloc_notifier(inode, owner,
+						      notifier, pfn_ops);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(memfd_register_falloc_notifier);
+
+void memfd_unregister_falloc_notifier(struct inode *inode)
+{
+	if (shmem_mapping(inode->i_mapping))
+		shmem_unregister_falloc_notifier(inode);
+}
+EXPORT_SYMBOL_GPL(memfd_unregister_falloc_notifier);
+#endif
+
 #define F_ALL_SEALS (F_SEAL_SEAL | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
diff --git a/mm/shmem.c b/mm/shmem.c
index faa7e9b1b9bc..4d8a75c4d037 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -78,6 +78,7 @@ static struct vfsmount *shm_mnt;
 #include
 #include
 #include
+#include
 
 #include
 
@@ -906,6 +907,68 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
 	return split_huge_page(page) >= 0;
 }
 
+static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+#ifdef CONFIG_MEMFD_OPS
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	const struct memfd_falloc_notifier *notifier;
+	void *owner;
+	bool ret;
+
+	if (!info->falloc_notifier)
+		return;
+
+	spin_lock(&info->lock);
+	notifier = info->falloc_notifier;
+	if (!notifier) {
+		spin_unlock(&info->lock);
+		return;
+	}
+
+	owner = info->owner;
+	ret = notifier->get_owner(owner);
+	spin_unlock(&info->lock);
+	if (!ret)
+		return;
+
+	notifier->fallocate(inode, owner, start, end);
+	notifier->put_owner(owner);
+#endif
+}
+
+static void notify_invalidate_page(struct inode *inode, struct page *page,
+				   pgoff_t start, pgoff_t end)
+{
+#ifdef CONFIG_MEMFD_OPS
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	const struct memfd_falloc_notifier *notifier;
+	void *owner;
+	bool ret;
+
+	if (!info->falloc_notifier)
+		return;
+
+	spin_lock(&info->lock);
+	notifier = info->falloc_notifier;
+	if (!notifier) {
+		spin_unlock(&info->lock);
+		return;
+	}
+
+	owner = info->owner;
+	ret = notifier->get_owner(owner);
+	spin_unlock(&info->lock);
+	if (!ret)
+		return;
+
+	start = max(start, page->index);
+	end = min(end, page->index + thp_nr_pages(page));
+
+	notifier->invalidate_page_range(inode, owner, start, end);
+	notifier->put_owner(owner);
+#endif
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -949,6 +1012,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			}
 			index += thp_nr_pages(page) - 1;
 
+			notify_invalidate_page(inode, page, start, end);
+
 			if (!unfalloc || !PageUptodate(page))
 				truncate_inode_page(mapping, page);
 			unlock_page(page);
@@ -1025,6 +1090,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 					index--;
 					break;
 				}
+
+				notify_invalidate_page(inode, page, start, end);
+
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
 				if (shmem_punch_compound(page, start, end))
 					truncate_inode_page(mapping, page);
@@ -2815,6 +2883,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
 		i_size_write(inode, offset + len);
 	inode->i_ctime = current_time(inode);
+	notify_fallocate(inode, start, end);
 undone:
 	spin_lock(&inode->i_lock);
 	inode->i_private = NULL;
@@ -3784,6 +3853,20 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }
 
+#ifdef CONFIG_MIGRATION
+int shmem_migrate_page(struct address_space *mapping, struct page *newpage,
+		       struct page *page, enum migrate_mode mode)
+{
+#ifdef CONFIG_MEMFD_OPS
+	struct inode *inode = mapping->host;
+
+	if (SHMEM_I(inode)->owner)
+		return -EOPNOTSUPP;
+#endif
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
@@ -3798,6 +3881,81 @@ const struct address_space_operations shmem_aops = {
 };
 EXPORT_SYMBOL(shmem_aops);
 
+#ifdef CONFIG_MEMFD_OPS
+static long shmem_get_lock_pfn(struct inode *inode, pgoff_t offset, int *order)
+{
+	struct page *page;
+	int ret;
+
+	ret = shmem_getpage(inode, offset, &page, SGP_NOALLOC);
+	if (ret)
+		return ret;
+
+	*order = thp_order(compound_head(page));
+
+	return page_to_pfn(page);
+}
+
+static void shmem_put_unlock_pfn(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+
+	set_page_dirty(page);
+	unlock_page(page);
+	put_page(page);
+}
+
+static const struct memfd_pfn_ops shmem_pfn_ops = {
+	.get_lock_pfn = shmem_get_lock_pfn,
+	.put_unlock_pfn = shmem_put_unlock_pfn,
+};
+
+int shmem_register_falloc_notifier(struct inode *inode, void *owner,
+				const struct memfd_falloc_notifier *notifier,
+				const struct memfd_pfn_ops **pfn_ops)
+{
+	gfp_t gfp;
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	if (!inode || !owner || !notifier || !pfn_ops ||
+	    !notifier->invalidate_page_range ||
+	    !notifier->fallocate ||
+	    !notifier->get_owner ||
+	    !notifier->put_owner)
+		return -EINVAL;
+
+	spin_lock(&info->lock);
+	if (info->owner && info->owner != owner) {
+		spin_unlock(&info->lock);
+		return -EPERM;
+	}
+
+	info->owner = owner;
+	info->falloc_notifier = notifier;
+	spin_unlock(&info->lock);
+
+	gfp = mapping_gfp_mask(inode->i_mapping);
+	gfp &= ~__GFP_MOVABLE;
+	mapping_set_gfp_mask(inode->i_mapping, gfp);
+	mapping_set_unevictable(inode->i_mapping);
+
+	*pfn_ops = &shmem_pfn_ops;
+	return 0;
+}
+
+void shmem_unregister_falloc_notifier(struct inode *inode)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	spin_lock(&info->lock);
+	info->owner = NULL;
+	info->falloc_notifier = NULL;
+	spin_unlock(&info->lock);
+}
+#endif
+
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
 	.get_unmapped_area = shmem_get_unmapped_area,

From patchwork Thu Dec 23 12:29:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12698222
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Hugh Dickins,
 Jeff Layton, "J. Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng,
 "Kirill A. Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com, david@redhat.com
Subject: [PATCH v3 kvm/queue 04/16] KVM: Extend the memslot to support fd-based private memory
Date: Thu, 23 Dec 2021 20:29:59 +0800
Message-Id: <20211223123011.41044-5-chao.p.peng@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>

Extend the memslot definition to provide fd-based private memory support
by adding two new fields (fd/ofs). A single memslot can then maintain
memory for both shared and private pages. Shared pages are provided in
the existing way, using the userspace_addr (hva) field and
get_user_pages(), while private pages are provided through the new
fields (fd/ofs). Since there is no 'hva' concept for private memory, we
cannot call get_user_pages() to get a pfn; instead we rely on the newly
introduced MEMFD_OPS callbacks to do the same job.

This new extension is indicated by a new flag, KVM_MEM_PRIVATE.

Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 include/linux/kvm_host.h | 10 ++++++++++
 include/uapi/linux/kvm.h | 12 ++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f8ed799e8674..2cd35560c44b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -460,8 +460,18 @@ struct kvm_memory_slot {
 	u32 flags;
 	short id;
 	u16 as_id;
+	u32 fd;
+	struct file *file;
+	u64 ofs;
 };
 
+static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot)
+{
+	if (slot && (slot->flags & KVM_MEM_PRIVATE))
+		return true;
+	return false;
+}
+
 static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *slot)
 {
 	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..41434322fa23 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -103,6 +103,17 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+struct kvm_userspace_memory_region_ext {
+	__u32 slot;
+	__u32 flags;
+	__u64 guest_phys_addr;
+	__u64 memory_size; /* bytes */
+	__u64 userspace_addr; /* hva */
+	__u64 ofs; /* offset into fd */
+	__u32 fd;
+	__u32 padding[5];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
  * other bits are reserved for kvm internal use which are defined in
@@ -110,6 +121,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_PRIVATE		(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {

From patchwork Thu Dec 23 12:30:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12698223
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262696; x=1671798696; h=from:to:cc:subject:date:message-id:in-reply-to: references;
bh=aIJ/jdrIZIbqeWf2gsio5EQmDbnrSHD8LOLFRVTJ4eg=; b=hpxA3eS0cHz65JOvlOIzKNpy6YKdi/vkSnQJzpI4gVyrlJePr5M5MV66 WZbb2Oj17PClHASUkunvV5PUQ2bvmbi/1NYoMgpYrMv1a8cfhTL4e/sMD gQhEA80SFaDV7LPtOA2iTYqMup3pGvMdlhuS7D7kVUXPQPZgBa3w9I4rD 60amC9haCoNa7MxSDQ+TYJW+16aIniVTx+WV3RTRZGntmbNy8F6kewuRt 25y1xwA7f4s//awWeZXqhc/AyiuFyquNfMoBkDUPuij/z3XX2dlU8Gs6o SEP/dqAsSEL7pORNZWcGhnUWvqH0QbzFGZoGuQxjl51U/5bsIuIx9ufPh A==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="240769548" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="240769548" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:31:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078687" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:31:27 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset Date: Thu, 23 Dec 2021 20:30:00 +0800 Message-Id: <20211223123011.41044-6-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Stat-Signature: j433jqewyxheia8etw4yxm3eprywubr5 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 55D0540023 Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=hpxA3eS0; spf=none (imf17.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.65) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1640262684-53389 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Similar to hva_tree for hva ranges, maintain an interval tree, ofs_tree,
for the offset range of each fd-based memslot so that lookup by offset
range stays fast when the memslot count is high.
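
For illustration only (not part of this patch): a minimal userspace
sketch of the query that ofs_tree is meant to speed up. It assumes 4 KiB
pages, and struct demo_ofs_slot, demo_slot_last() and
demo_find_slot_by_ofs() are hypothetical names that do not exist in the
kernel. A linear scan stands in for the interval tree; the inclusive
[start, last] range convention follows what the patch stores in ofs_node.

```c
/* Userspace sketch only; all demo_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct demo_ofs_slot {
	uint64_t ofs;    /* byte offset of the slot within the fd */
	uint64_t npages; /* slot size in pages */
};

/* Last byte offset covered by the slot (inclusive), like ofs_node.last. */
static uint64_t demo_slot_last(const struct demo_ofs_slot *s)
{
	return s->ofs + (s->npages << DEMO_PAGE_SHIFT) - 1;
}

/*
 * Return the first slot overlapping the inclusive range [start, last],
 * or NULL if none does. The scan is O(n) in the number of slots; an
 * interval tree answers the same overlap query without visiting every
 * slot, which is the point of maintaining ofs_tree.
 */
static const struct demo_ofs_slot *
demo_find_slot_by_ofs(const struct demo_ofs_slot *slots, size_t n,
		      uint64_t start, uint64_t last)
{
	assert(start <= last);
	for (size_t i = 0; i < n; i++) {
		/* Two closed intervals overlap iff each starts before the other ends. */
		if (slots[i].ofs <= last && start <= demo_slot_last(&slots[i]))
			return &slots[i];
	}
	return NULL;
}
```

The overlap test is the same predicate an interval tree evaluates while
descending, so the sketch can serve as a reference when checking the
start/last values computed for ofs_node.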
Signed-off-by: Chao Peng --- include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 17 +++++++++++++---- 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2cd35560c44b..3bd875f9669f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -451,6 +451,7 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) struct kvm_memory_slot { struct hlist_node id_node[2]; struct interval_tree_node hva_node[2]; + struct interval_tree_node ofs_node[2]; struct rb_node gfn_node[2]; gfn_t base_gfn; unsigned long npages; @@ -560,6 +561,7 @@ struct kvm_memslots { u64 generation; atomic_long_t last_used_slot; struct rb_root_cached hva_tree; + struct rb_root_cached ofs_tree; struct rb_root gfn_tree; /* * The mapping table from slot id to memslot. diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b0f7e6eb00ff..47e96d1eb233 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1087,6 +1087,7 @@ static struct kvm *kvm_create_vm(unsigned long type) atomic_long_set(&slots->last_used_slot, (unsigned long)NULL); slots->hva_tree = RB_ROOT_CACHED; + slots->ofs_tree = RB_ROOT_CACHED; slots->gfn_tree = RB_ROOT; hash_init(slots->id_hash); slots->node_idx = j; @@ -1363,7 +1364,7 @@ static void kvm_replace_gfn_node(struct kvm_memslots *slots, * With NULL @old this simply adds @new. * With NULL @new this simply removes @old. * - * If @new is non-NULL its hva_node[slots_idx] range has to be set + * If @new is non-NULL its hva/ofs_node[slots_idx] range has to be set * appropriately. 
*/ static void kvm_replace_memslot(struct kvm *kvm, @@ -1377,6 +1378,7 @@ static void kvm_replace_memslot(struct kvm *kvm, if (old) { hash_del(&old->id_node[idx]); interval_tree_remove(&old->hva_node[idx], &slots->hva_tree); + interval_tree_remove(&old->ofs_node[idx], &slots->ofs_tree); if ((long)old == atomic_long_read(&slots->last_used_slot)) atomic_long_set(&slots->last_used_slot, (long)new); @@ -1388,20 +1390,27 @@ static void kvm_replace_memslot(struct kvm *kvm, } /* - * Initialize @new's hva range. Do this even when replacing an @old + * Initialize @new's hva/ofs range. Do this even when replacing an @old * slot, kvm_copy_memslot() deliberately does not touch node data. */ new->hva_node[idx].start = new->userspace_addr; new->hva_node[idx].last = new->userspace_addr + (new->npages << PAGE_SHIFT) - 1; + if (kvm_slot_is_private(new)) { + new->ofs_node[idx].start = new->ofs; + new->ofs_node[idx].last = new->ofs + + (new->npages << PAGE_SHIFT) - 1; + } /* * (Re)Add the new memslot. There is no O(1) interval_tree_replace(), - * hva_node needs to be swapped with remove+insert even though hva can't - * change when replacing an existing slot. + * hva_node/ofs_node needs to be swapped with remove+insert even though + * hva/ofs can't change when replacing an existing slot. 
*/ hash_add(slots->id_hash, &new->id_node[idx], new->id); interval_tree_insert(&new->hva_node[idx], &slots->hva_tree); + if (kvm_slot_is_private(new)) + interval_tree_insert(&new->ofs_node[idx], &slots->ofs_tree); /* * If the memslot gfn is unchanged, rb_replace_node() can be used to From patchwork Thu Dec 23 12:30:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698224 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 774A5C433EF for ; Thu, 23 Dec 2021 12:31:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1150A6B0080; Thu, 23 Dec 2021 07:31:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C5C06B0081; Thu, 23 Dec 2021 07:31:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ECFEE6B0082; Thu, 23 Dec 2021 07:31:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id DCA176B0080 for ; Thu, 23 Dec 2021 07:31:45 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 97D1C77A7F for ; Thu, 23 Dec 2021 12:31:45 +0000 (UTC) X-FDA: 78948995370.24.DC71FB6 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf29.hostedemail.com (Postfix) with ESMTP id 23014120021 for ; Thu, 23 Dec 2021 12:31:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262704; x=1671798704; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=A3hlDdVHRByNHIPvLLa/tw6tEvLnu05NLjt0nV08/ZE=; 
b=dCSqdTRA9sJTz2ibb8+DtbodVQLKZonbs3nP0HBJv0O6nRvDXE106/F4 nfbDIK6x5FFOUy2IB2JOyLwWFWAP1MMlwZJR30wAJBGuurgXR6jRx2dkl A6OCAu/VTasyHIZ0PTE4knvBUbyb9+eEvYd70GO3UICnxxZpXb26EGLgY Z2yQl9FMmQrTCz30bnVDFiQlryQR89vCnwIdVLHqP+VS4TSZ+rZrj2N7A GoEnNq+e6kiIXqsOZ1w+pRQ7qdaDbkT2J/Vya9sjgUQt99Qj09T61XzYn /64SUcyBm/y5AQ+Qhb6uFFSwO7wBtUZb391bFfVdFjPDOAwEmnjtTnCBM Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="240619745" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="240619745" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:31:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078735" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:31:35 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 06/16] KVM: Implement fd-based memory using MEMFD_OPS interfaces Date: Thu, 23 Dec 2021 20:30:01 +0800 Message-Id: <20211223123011.41044-7-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 23014120021 X-Stat-Signature: oxyyxf9ts77k936qqrimyc69qacdpmgz Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dCSqdTRA; spf=none (imf29.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1640262699-646053 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

This patch adds the new memfd facility in KVM using MEMFD_OPS to provide
guest memory from a file descriptor created in userspace with
memfd_create() instead of a traditional userspace hva. It mainly provides
two kinds of functions:
  - Pair/unpair a fd-based memslot to a memory backend that owns the file
    descriptor when such a memslot gets created/deleted.
  - Get/put a pfn to be used in the KVM page fault handler from/to the
    paired memory backend.

At pairing time, KVM and the memfd subsystem exchange callbacks that
each can call into the other side. These callbacks are the major places
to implement fd-based guest memory provisioning.

KVM->memfd:
  - get_pfn: get and lock a page at the specified offset in the fd.
  - put_pfn: put and unlock the pfn.
Note: page needs to be locked between get_pfn/put_pfn to ensure the pfn
is valid when KVM uses it to establish the mapping in the secondary MMU
page table.

memfd->KVM:
  - invalidate_page_range: called when userspace punches a hole in the fd;
    KVM should unmap the related pages in the secondary MMU.
  - fallocate: called when userspace fallocates space on the fd; KVM can
    map the related pages in the secondary MMU.
  - get/put_owner: used to ensure the guest is still alive, via a reference
    mechanism, when calling the above invalidate/fallocate callbacks.

Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
--- arch/x86/kvm/Kconfig | 1 + include/linux/kvm_host.h | 6 +++ virt/kvm/Makefile.kvm | 2 +- virt/kvm/memfd.c | 91 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 99 insertions(+), 1 deletion(-) create mode 100644 virt/kvm/memfd.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 03b2ce34e7f4..86655cd660ca 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -46,6 +46,7 @@ config KVM select SRCU select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM + select MEMFD_OPS help Support hosting fully virtualized guest machines using hardware virtualization extensions. 
You will need a fairly recent diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3bd875f9669f..21f8b1880723 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -806,6 +806,12 @@ static inline void kvm_irqfd_exit(void) { } #endif + +int kvm_memfd_register(struct kvm *kvm, struct kvm_memory_slot *slot); +void kvm_memfd_unregister(struct kvm_memory_slot *slot); +long kvm_memfd_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, int *order); +void kvm_memfd_put_pfn(kvm_pfn_t pfn); + int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module); void kvm_exit(void); diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm index ffdcad3cc97a..8842128d8429 100644 --- a/virt/kvm/Makefile.kvm +++ b/virt/kvm/Makefile.kvm @@ -5,7 +5,7 @@ KVM ?= ../../../virt/kvm -kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o +kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/memfd.o kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c new file mode 100644 index 000000000000..662393a76782 --- /dev/null +++ b/virt/kvm/memfd.c @@ -0,0 +1,91 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * memfd.c: routines for fd based guest memory + * Copyright (c) 2021, Intel Corporation. 
+ * + * Author: + * Chao Peng + */ + +#include +#include + +#ifdef CONFIG_MEMFD_OPS +static const struct memfd_pfn_ops *memfd_ops; + +static void memfd_invalidate_page_range(struct inode *inode, void *owner, + pgoff_t start, pgoff_t end) +{ +} + +static void memfd_fallocate(struct inode *inode, void *owner, + pgoff_t start, pgoff_t end) +{ +} + +static bool memfd_get_owner(void *owner) +{ + return kvm_get_kvm_safe(owner); +} + +static void memfd_put_owner(void *owner) +{ + kvm_put_kvm(owner); +} + +static const struct memfd_falloc_notifier memfd_notifier = { + .invalidate_page_range = memfd_invalidate_page_range, + .fallocate = memfd_fallocate, + .get_owner = memfd_get_owner, + .put_owner = memfd_put_owner, +}; +#endif + +long kvm_memfd_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, int *order) +{ +#ifdef CONFIG_MEMFD_OPS + pgoff_t index = gfn - slot->base_gfn + (slot->ofs >> PAGE_SHIFT); + + return memfd_ops->get_lock_pfn(slot->file->f_inode, index, order); +#else + return -EOPNOTSUPP; +#endif +} + +void kvm_memfd_put_pfn(kvm_pfn_t pfn) +{ +#ifdef CONFIG_MEMFD_OPS + memfd_ops->put_unlock_pfn(pfn); +#endif +} + +int kvm_memfd_register(struct kvm *kvm, struct kvm_memory_slot *slot) +{ +#ifdef CONFIG_MEMFD_OPS + int ret; + struct fd fd = fdget(slot->fd); + + if (!fd.file) + return -EINVAL; + + ret = memfd_register_falloc_notifier(fd.file->f_inode, kvm, + &memfd_notifier, &memfd_ops); + if (ret) + return ret; + + slot->file = fd.file; + return 0; +#else + return -EOPNOTSUPP; +#endif +} + +void kvm_memfd_unregister(struct kvm_memory_slot *slot) +{ +#ifdef CONFIG_MEMFD_OPS + if (slot->file) { + fput(slot->file); + slot->file = NULL; + } +#endif +} From patchwork Thu Dec 23 12:30:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698225 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F089BC433F5 for ; Thu, 23 Dec 2021 12:32:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8DD416B0082; Thu, 23 Dec 2021 07:32:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 88CDD6B0083; Thu, 23 Dec 2021 07:32:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 77C6B6B0085; Thu, 23 Dec 2021 07:32:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id 66DBD6B0082 for ; Thu, 23 Dec 2021 07:32:00 -0500 (EST) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2BDE68249980 for ; Thu, 23 Dec 2021 12:32:00 +0000 (UTC) X-FDA: 78948996000.02.256FF5D Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf26.hostedemail.com (Postfix) with ESMTP id A15E1140005 for ; Thu, 23 Dec 2021 12:31:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262718; x=1671798718; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=XyofphUy/LJlVhXv9gCJ1GZDqZ6qMpJmsB+GMsVzLxI=; b=kLbwQa5sucIx6qRdn6kUK+I2SadwbXeREbA663uNpc1zUSnkEvHXsTMg O29GTyjjqgoYnSIIX4O/qHomp49pJiBThCNwqAqwX7HjChYZ361oaf6YK nRUkWzG6REXhbeQTG3Fk94Lak/nlq8L1lM5DT6hbxQdq661NXSdkaM5Ws xb1iFspWmuvB47D9nUSy1tzSZHDx/vvrEu7rVD2ztuugCcSeDBBUrPlIL KkJ3bAfIUkr2FBBkxGbSHL4MNKwejXgXl4DSSacfDNEhXR286P3kGZJPP jNFfgnxWxob3ioBz3iKJSyPPrnGAypeERLJhQ364RHqjWxdxdmmmvrj1t A==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="227661037" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="227661037" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:31:56 
-0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078768" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:31:42 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 07/16] KVM: Refactor hva based memory invalidation code Date: Thu, 23 Dec 2021 20:30:02 +0800 Message-Id: <20211223123011.41044-8-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=kLbwQa5s; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf26.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: A15E1140005 X-Stat-Signature: 8itf3er8mcu377o8bwunyj7eki4w5o7s X-HE-Tag: 1640262718-459714 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The purpose of this patch is for fd-based memslot to reuse the same mmu_notifier based guest memory invalidation code for private pages. 
No functional changes except renaming 'hva' to more neutral 'useraddr' so that it can also cover 'offset' in a fd that private pages live in. Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- include/linux/kvm_host.h | 8 ++++-- virt/kvm/kvm_main.c | 55 ++++++++++++++++++++++------------------ 2 files changed, 36 insertions(+), 27 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 21f8b1880723..07863ff855cd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1464,9 +1464,13 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn) } static inline gfn_t -hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot) +useraddr_to_gfn_memslot(unsigned long useraddr, struct kvm_memory_slot *slot, + bool addr_is_hva) { - gfn_t gfn_offset = (hva - slot->userspace_addr) >> PAGE_SHIFT; + unsigned long useraddr_base = addr_is_hva ? slot->userspace_addr + : slot->ofs; + + gfn_t gfn_offset = (useraddr - useraddr_base) >> PAGE_SHIFT; return slot->base_gfn + gfn_offset; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 47e96d1eb233..b7a1c4d7eaaa 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -486,16 +486,16 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, srcu_read_unlock(&kvm->srcu, idx); } -typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); +typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, unsigned long end); -struct kvm_hva_range { +struct kvm_useraddr_range { unsigned long start; unsigned long end; pte_t pte; - hva_handler_t handler; + gfn_handler_t handler; on_lock_fn_t on_lock; bool flush_on_ret; bool may_block; @@ -515,13 +515,13 @@ static void kvm_null_fn(void) #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn) /* Iterate over each memslot intersecting [start, last] (inclusive) range */ -#define 
kvm_for_each_memslot_in_hva_range(node, slots, start, last) \ - for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \ +#define kvm_for_each_memslot_in_useraddr_range(node, tree, start, last) \ + for (node = interval_tree_iter_first(tree, start, last); \ node; \ node = interval_tree_iter_next(node, start, last)) \ -static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, - const struct kvm_hva_range *range) +static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm, + const struct kvm_useraddr_range *range) { bool ret = false, locked = false; struct kvm_gfn_range gfn_range; @@ -540,17 +540,19 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, idx = srcu_read_lock(&kvm->srcu); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { + struct rb_root_cached *useraddr_tree; struct interval_tree_node *node; slots = __kvm_memslots(kvm, i); - kvm_for_each_memslot_in_hva_range(node, slots, + useraddr_tree = &slots->hva_tree; + kvm_for_each_memslot_in_useraddr_range(node, useraddr_tree, range->start, range->end - 1) { - unsigned long hva_start, hva_end; + unsigned long useraddr_start, useraddr_end; slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]); - hva_start = max(range->start, slot->userspace_addr); - hva_end = min(range->end, slot->userspace_addr + - (slot->npages << PAGE_SHIFT)); + useraddr_start = max(range->start, slot->userspace_addr); + useraddr_end = min(range->end, slot->userspace_addr + + (slot->npages << PAGE_SHIFT)); /* * To optimize for the likely case where the address @@ -562,11 +564,14 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, gfn_range.may_block = range->may_block; /* - * {gfn(page) | page intersects with [hva_start, hva_end)} = + * {gfn(page) | page intersects with [useraddr_start, useraddr_end)} = * {gfn_start, gfn_start+1, ..., gfn_end-1}. 
*/ - gfn_range.start = hva_to_gfn_memslot(hva_start, slot); - gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); + gfn_range.start = useraddr_to_gfn_memslot(useraddr_start, + slot, true); + gfn_range.end = useraddr_to_gfn_memslot( + useraddr_end + PAGE_SIZE - 1, + slot, true); gfn_range.slot = slot; if (!locked) { @@ -597,10 +602,10 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, unsigned long start, unsigned long end, pte_t pte, - hva_handler_t handler) + gfn_handler_t handler) { struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_hva_range range = { + const struct kvm_useraddr_range range = { .start = start, .end = end, .pte = pte, @@ -610,16 +615,16 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, .may_block = false, }; - return __kvm_handle_hva_range(kvm, &range); + return __kvm_handle_useraddr_range(kvm, &range); } static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, unsigned long start, unsigned long end, - hva_handler_t handler) + gfn_handler_t handler) { struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_hva_range range = { + const struct kvm_useraddr_range range = { .start = start, .end = end, .pte = __pte(0), @@ -629,7 +634,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn .may_block = false, }; - return __kvm_handle_hva_range(kvm, &range); + return __kvm_handle_useraddr_range(kvm, &range); } static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, struct mm_struct *mm, @@ -687,7 +692,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_hva_range hva_range = { + const struct kvm_useraddr_range useraddr_range = { .start = range->start, .end = range->end, .pte = __pte(0), @@ -711,7 +716,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, 
kvm->mn_active_invalidate_count++; spin_unlock(&kvm->mn_invalidate_lock); - __kvm_handle_hva_range(kvm, &hva_range); + __kvm_handle_useraddr_range(kvm, &useraddr_range); return 0; } @@ -738,7 +743,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_hva_range hva_range = { + const struct kvm_useraddr_range useraddr_range = { .start = range->start, .end = range->end, .pte = __pte(0), @@ -749,7 +754,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, }; bool wake; - __kvm_handle_hva_range(kvm, &hva_range); + __kvm_handle_useraddr_range(kvm, &useraddr_range); /* Pairs with the increment in range_start(). */ spin_lock(&kvm->mn_invalidate_lock); From patchwork Thu Dec 23 12:30:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698226 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C6A0C433F5 for ; Thu, 23 Dec 2021 12:32:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A8B436B0072; Thu, 23 Dec 2021 07:32:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A3AD86B0085; Thu, 23 Dec 2021 07:32:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 902CB6B0087; Thu, 23 Dec 2021 07:32:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id 7ECAA6B0072 for ; Thu, 23 Dec 2021 07:32:08 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 
41CF5180E935A for ; Thu, 23 Dec 2021 12:32:08 +0000 (UTC) X-FDA: 78948996336.30.D9BAD51 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf26.hostedemail.com (Postfix) with ESMTP id D6E1B140029 for ; Thu, 23 Dec 2021 12:32:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262727; x=1671798727; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=jFXnBQCQUYkIJxpEwP0R21h1nTh23vZQC9pPmI8hXVw=; b=jBXijGA/jNU+XhqsVggfuZ4MHOnT50ozjrxSJhAqxtNZZu8eveQMn+vy mqC2MlU41MeycklymKaOnas3lA9Xwm38l8Ws7a36cKwCTqUjINbWOHYdt 0pXLu2FL5EhAC+w9ffQr5vOoZgPztHkWZ32yy5CeiLuA4p/yH1x3zkJ+E GT63Igxu8hf3Hv72KHTM69MvkwRMCXA2rEyJVZGptPS6orl63plIM8CFN qpSvZn6ea6g6cXaraSYjjOA9NZJwnXyOYZeAWnLau+X+VXxZJl07mURPE GcFyFYngOdeVH7z3VSAER7X+7oY6EoxWtQzeB3bUgaak882dRuE3OXA/f g==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="227661045" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="227661045" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:31:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078821" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:31:50 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 08/16] KVM: Special handling for fd-based memory invalidation Date: Thu, 23 Dec 2021 20:30:03 +0800 Message-Id: <20211223123011.41044-9-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="jBXijGA/"; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf26.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: D6E1B140029 X-Stat-Signature: 95hf3komg5cpu4pdt6t151jfukwaw3di X-HE-Tag: 1640262726-211425 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For fd-based guest memory, the memory backend (e.g. the fd provider) should notify KVM to unmap/invalidate the private memory from the KVM secondary MMU when userspace punches a hole in the fd (e.g. when userspace converts private memory to shared memory). To support fd-based memory invalidation, the existing hva-based memory invalidation needs to be extended. A new 'inode' for the fd is passed in from memfd_falloc_notifier, and 'start/end' represent the start/end offset in the fd instead of an hva range. During the invalidation KVM needs to check this inode against the one in the memslot. Only when the 'inode' in the memslot equals the passed-in 'inode' should KVM invalidate the mapping. 
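For illustration only (not part of this patch), the matching rule described above can be sketched in standalone userspace C. The names sketch_slot, sketch_clamp_to_slot and SKETCH_PAGE_SHIFT are hypothetical stand-ins rather than KVM definitions, and the explicit overlap check is an assumption of the sketch: in the patch the interval tree walk only yields slots that already overlap the range.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12UL	/* assumed 4 KiB pages, for the sketch only */

/* Hypothetical stand-in for the memslot fields the patch consults. */
struct sketch_slot {
	const void *inode;	/* backing inode, NULL for hva-based slots */
	unsigned long ofs;	/* byte offset of the slot within the fd */
	unsigned long npages;	/* slot size in pages */
};

/*
 * Clamp [start, end) to the part of the slot backed by 'inode'.
 * Returns false when the slot has no inode, belongs to a different
 * inode, or does not overlap the range; otherwise stores the overlap
 * in *out_start and *out_end and returns true.
 */
static bool sketch_clamp_to_slot(const struct sketch_slot *slot,
				 const void *inode,
				 unsigned long start, unsigned long end,
				 unsigned long *out_start,
				 unsigned long *out_end)
{
	unsigned long slot_end = slot->ofs +
				 (slot->npages << SKETCH_PAGE_SHIFT);

	if (!slot->inode || slot->inode != inode)
		return false;

	*out_start = start > slot->ofs ? start : slot->ofs;
	*out_end = end < slot_end ? end : slot_end;
	return *out_start < *out_end;
}
```

A slot backed by a different inode is skipped even when its offsets overlap the punched range, which is the property the inode comparison in the patch is meant to provide.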
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b7a1c4d7eaaa..19736a0013a0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -494,6 +494,7 @@ typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, struct kvm_useraddr_range { unsigned long start; unsigned long end; + struct inode *inode; pte_t pte; gfn_handler_t handler; on_lock_fn_t on_lock; @@ -544,14 +545,27 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm, struct interval_tree_node *node; slots = __kvm_memslots(kvm, i); - useraddr_tree = &slots->hva_tree; + useraddr_tree = range->inode ? &slots->ofs_tree : &slots->hva_tree; kvm_for_each_memslot_in_useraddr_range(node, useraddr_tree, range->start, range->end - 1) { unsigned long useraddr_start, useraddr_end; + unsigned long useraddr_base; + + if (range->inode) { + slot = container_of(node, struct kvm_memory_slot, + ofs_node[slots->node_idx]); + if (!slot->file || + slot->file->f_inode != range->inode) + continue; + useraddr_base = slot->ofs; + } else { + slot = container_of(node, struct kvm_memory_slot, + hva_node[slots->node_idx]); + useraddr_base = slot->userspace_addr; + } - slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]); - useraddr_start = max(range->start, slot->userspace_addr); - useraddr_end = min(range->end, slot->userspace_addr + + useraddr_start = max(range->start, useraddr_base); + useraddr_end = min(range->end, useraddr_base + (slot->npages << PAGE_SHIFT)); /* @@ -568,10 +582,10 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm, * {gfn_start, gfn_start+1, ..., gfn_end-1}. 
*/ gfn_range.start = useraddr_to_gfn_memslot(useraddr_start, - slot, true); + slot, !range->inode); gfn_range.end = useraddr_to_gfn_memslot( useraddr_end + PAGE_SIZE - 1, - slot, true); + slot, !range->inode); gfn_range.slot = slot; if (!locked) { @@ -613,6 +627,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, .on_lock = (void *)kvm_null_fn, .flush_on_ret = true, .may_block = false, + .inode = NULL, }; return __kvm_handle_useraddr_range(kvm, &range); @@ -632,6 +647,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn .on_lock = (void *)kvm_null_fn, .flush_on_ret = false, .may_block = false, + .inode = NULL, }; return __kvm_handle_useraddr_range(kvm, &range); @@ -700,6 +716,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, .on_lock = kvm_inc_notifier_count, .flush_on_ret = true, .may_block = mmu_notifier_range_blockable(range), + .inode = NULL, }; trace_kvm_unmap_hva_range(range->start, range->end); @@ -751,6 +768,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, .on_lock = kvm_dec_notifier_count, .flush_on_ret = false, .may_block = mmu_notifier_range_blockable(range), + .inode = NULL, }; bool wake; From patchwork Thu Dec 23 12:30:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCF6FC43219 for ; Thu, 23 Dec 2021 12:32:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 77E346B0073; Thu, 23 Dec 2021 07:32:19 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 72D386B0085; Thu, 23 Dec 2021 07:32:19 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, 
from userid 63042) id 5F4D96B0087; Thu, 23 Dec 2021 07:32:19 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0214.hostedemail.com [216.40.44.214]) by kanga.kvack.org (Postfix) with ESMTP id 4EC336B0073 for ; Thu, 23 Dec 2021 07:32:19 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 062F78249980 for ; Thu, 23 Dec 2021 12:32:19 +0000 (UTC) X-FDA: 78948996798.07.02D7531 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf18.hostedemail.com (Postfix) with ESMTP id A57641C001E for ; Thu, 23 Dec 2021 12:32:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262738; x=1671798738; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=ALhrDscz25NWpd8el8w795CyW0Z7WyWdszVSzOyAnwk=; b=EHXmD7cy6QNGaGlYedLnH7byKw2MeaN/ECCy6HO3gMx6JHGYeBIWOrdJ dMKgcgeRXfH9hClvANtXXA91IWGkpvFEFZ97kV8Aq8rcjPeYMimS9Rk3/ rJyn3YITJPgM4n8n8u3LsarG6efIDPUBf70xrde2Y9qrjlAPIuGN1k86s 0XXGUV8vIodiWrGPyB9IDP8HGZDYYKzs0nH7hvfBC6s8LsBhzGtBSygvM biOq3mGkUkke5mHnOzc7Ro9uwkHumQ60UXRhqwNtihIjbZEK+TUPhsSI1 TuGe5weVlmjFhY7TyKqAzXgj049Ons/uNRNLzrXUsNbmoNXjaUSvaMykN A==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="227661095" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="227661095" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078921" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:31:58 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , 
Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 09/16] KVM: Split out common memory invalidation code Date: Thu, 23 Dec 2021 20:30:04 +0800 Message-Id: <20211223123011.41044-10-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: A57641C001E X-Stat-Signature: xsno11c997r6razgowszmeytjcjxpmtx Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=EHXmD7cy; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf18.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com X-HE-Tag: 1640262731-383162 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When fd-based memory is enabled, there will be two types of memory invalidation: - memory invalidation from the native MMU through the mmu_notifier callback for hva-based memory, and - memory invalidation from memfd through the memfd_notifier callback for fd-based memory. Some code can be shared between these two types of memory invalidation. This patch moves that shared code into one place so that it can be used for both CONFIG_MMU_NOTIFIER and CONFIG_MEMFD_NOTIFIER. 
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- virt/kvm/kvm_main.c | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 19736a0013a0..7b7530b1ea1e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -469,22 +469,6 @@ void kvm_destroy_vcpus(struct kvm *kvm) EXPORT_SYMBOL_GPL(kvm_destroy_vcpus); #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) -static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) -{ - return container_of(mn, struct kvm, mmu_notifier); -} - -static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, - struct mm_struct *mm, - unsigned long start, unsigned long end) -{ - struct kvm *kvm = mmu_notifier_to_kvm(mn); - int idx; - - idx = srcu_read_lock(&kvm->srcu); - kvm_arch_mmu_notifier_invalidate_range(kvm, start, end); - srcu_read_unlock(&kvm->srcu, idx); -} typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); @@ -611,6 +595,25 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm, /* The notifiers are averse to booleans. 
:-( */ return (int)ret; } +#endif + +#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) +static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) +{ + return container_of(mn, struct kvm, mmu_notifier); +} + +static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int idx; + + idx = srcu_read_lock(&kvm->srcu); + kvm_arch_mmu_notifier_invalidate_range(kvm, start, end); + srcu_read_unlock(&kvm->srcu, idx); +} static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, unsigned long start, From patchwork Thu Dec 23 12:30:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698228 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E3F5C433FE for ; Thu, 23 Dec 2021 12:32:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BEB4D6B0085; Thu, 23 Dec 2021 07:32:19 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B99BE6B0087; Thu, 23 Dec 2021 07:32:19 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A610A6B0088; Thu, 23 Dec 2021 07:32:19 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0134.hostedemail.com [216.40.44.134]) by kanga.kvack.org (Postfix) with ESMTP id 874A56B0087 for ; Thu, 23 Dec 2021 07:32:19 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4CF2988487 for ; Thu, 23 Dec 2021 12:32:19 +0000 (UTC) X-FDA: 78948996798.03.9CEF5B2 Received: from mga18.intel.com (mga18.intel.com 
[134.134.136.126]) by imf31.hostedemail.com (Postfix) with ESMTP id 0DDD820021 for ; Thu, 23 Dec 2021 12:32:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262738; x=1671798738; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4zBy2wqGUUHRa+bg8KFrUxUekBSmOOoI4eennjfoLXQ=; b=Fw3nkNZfh3EOr16a+JImoqvoFJ59BMoAPDcLqF+mMpfoOA5ndoYbkynZ iGxchhpbC0cFbGudkGApMgLnjTjUM84jnPu7+LAliRLBcqszButOGxi0H qrdToqFIzVa2Rv+2Ra+TmxfwTkgly83R8+Fp22hJoa9h9nrz4PbNCzp0y xInXBMY8cCWBDkJf1suPxwtvyDjZ5/QkH3TryDoIPYE8oMiCLeulqd6EC P3/0YKVA0vJKldbITDwtZF05qfqGiprqqUNRB940O2GCsGIVjcCrCCrD4 aaIKDa2gQ6QK+D3asQCT18YoIHW81WePwN2Fk9PRD4/cp66eVgimIIFfP w==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="227661096" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="227661096" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078930" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:32:09 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 10/16] KVM: Implement fd-based memory invalidation Date: Thu, 23 Dec 2021 20:30:05 +0800 Message-Id: <20211223123011.41044-11-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Queue-Id: 0DDD820021 X-Stat-Signature: qq3o7p8ue1qcbt94ptxcbgwzcg7rajeq Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Fw3nkNZf; spf=none (imf31.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.126) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam10 X-HE-Tag: 1640262724-357909 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: KVM gets notified when userspace punches a hole in an fd which is used for guest memory. KVM should then invalidate the mapping in the secondary MMU page tables. This is the same logic as MMU notifier invalidation, except that the fd-related information is carried around to indicate the memory range. KVM can hence reuse most of the existing MMU notifier invalidation code, including looping through the memslots and then calling into kvm_unmap_gfn_range(), which should do whatever is needed for fd-based memory unmapping (e.g. for private memory managed by TDX it may need to call into the SEAM-MODULE). 
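For illustration only (not part of this patch), the flow described above can be modelled in standalone userspace C. Everything here is a hypothetical simplification: inv_slot, inv_vm and inv_invalidate_range are made-up names, the 'mapped' flag stands in for secondary MMU mappings, and the locking around the in-flight counter (mn_invalidate_lock in the patch) and the waiter wakeup are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one fd-backed memslot. */
struct inv_slot {
	const void *inode;	/* backing inode of the slot */
	unsigned long ofs;	/* byte offset in the fd */
	unsigned long len;	/* length in bytes */
	bool mapped;		/* stands in for secondary MMU mappings */
};

/* Hypothetical, simplified model of a VM. */
struct inv_vm {
	struct inv_slot *slots;
	size_t nr_slots;
	int active_invalidations;	/* models mn_active_invalidate_count */
};

/*
 * Drop the "mapping" of every slot of 'inode' that overlaps
 * [start, end). Returns 1 if anything was unmapped (a flush would be
 * needed), 0 otherwise.
 */
static int inv_invalidate_range(struct inv_vm *vm, const void *inode,
				unsigned long start, unsigned long end)
{
	int flushed = 0;
	size_t i;

	/* Keep memslots stable while the walk runs (locking omitted). */
	vm->active_invalidations++;

	for (i = 0; i < vm->nr_slots; i++) {
		struct inv_slot *s = &vm->slots[i];

		if (s->inode != inode)
			continue;
		if (start >= s->ofs + s->len || end <= s->ofs)
			continue;
		if (s->mapped) {
			s->mapped = false;
			flushed = 1;
		}
	}

	vm->active_invalidations--;

	return flushed;
}
```

The increment and decrement bracket the whole walk, mirroring how the patch wraps __kvm_handle_useraddr_range() with the mn_active_invalidate_count helpers.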
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- include/linux/kvm_host.h | 8 ++++- virt/kvm/kvm_main.c | 69 +++++++++++++++++++++++++++++++--------- virt/kvm/memfd.c | 2 ++ 3 files changed, 63 insertions(+), 16 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 07863ff855cd..be567925831b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -233,7 +233,7 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_MEMFD_OPS) struct kvm_gfn_range { struct kvm_memory_slot *slot; gfn_t start; @@ -2012,4 +2012,10 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +#ifdef CONFIG_MEMFD_OPS +int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end); +#endif /* CONFIG_MEMFD_OPS */ + + #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7b7530b1ea1e..f495c1a313bd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -468,7 +468,8 @@ void kvm_destroy_vcpus(struct kvm *kvm) } EXPORT_SYMBOL_GPL(kvm_destroy_vcpus); -#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) +#if defined(CONFIG_MEMFD_OPS) ||\ + (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)) typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); @@ -595,6 +596,30 @@ static __always_inline int __kvm_handle_useraddr_range(struct kvm *kvm, /* The notifiers are averse to booleans. 
:-( */ return (int)ret; } + +static void mn_active_invalidate_count_inc(struct kvm *kvm) +{ + spin_lock(&kvm->mn_invalidate_lock); + kvm->mn_active_invalidate_count++; + spin_unlock(&kvm->mn_invalidate_lock); + +} + +static void mn_active_invalidate_count_dec(struct kvm *kvm) +{ + bool wake; + + spin_lock(&kvm->mn_invalidate_lock); + wake = (--kvm->mn_active_invalidate_count == 0); + spin_unlock(&kvm->mn_invalidate_lock); + + /* + * There can only be one waiter, since the wait happens under + * slots_lock. + */ + if (wake) + rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait); +} #endif #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) @@ -732,9 +757,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, * * Pairs with the decrement in range_end(). */ - spin_lock(&kvm->mn_invalidate_lock); - kvm->mn_active_invalidate_count++; - spin_unlock(&kvm->mn_invalidate_lock); + mn_active_invalidate_count_inc(kvm); __kvm_handle_useraddr_range(kvm, &useraddr_range); @@ -773,21 +796,11 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, .may_block = mmu_notifier_range_blockable(range), .inode = NULL, }; - bool wake; __kvm_handle_useraddr_range(kvm, &useraddr_range); /* Pairs with the increment in range_start(). */ - spin_lock(&kvm->mn_invalidate_lock); - wake = (--kvm->mn_active_invalidate_count == 0); - spin_unlock(&kvm->mn_invalidate_lock); - - /* - * There can only be one waiter, since the wait happens under - * slots_lock. 
- */ - if (wake) - rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait); + mn_active_invalidate_count_dec(kvm); BUG_ON(kvm->mmu_notifier_count < 0); } @@ -872,6 +885,32 @@ static int kvm_init_mmu_notifier(struct kvm *kvm) #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ +#ifdef CONFIG_MEMFD_OPS +int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end) +{ + int ret; + const struct kvm_useraddr_range useraddr_range = { + .start = start, + .end = end, + .pte = __pte(0), + .handler = kvm_unmap_gfn_range, + .on_lock = (void *)kvm_null_fn, + .flush_on_ret = true, + .may_block = false, + .inode = inode, + }; + + + /* Prevent memslot modification */ + mn_active_invalidate_count_inc(kvm); + ret = __kvm_handle_useraddr_range(kvm, &useraddr_range); + mn_active_invalidate_count_dec(kvm); + + return ret; +} +#endif /* CONFIG_MEMFD_OPS */ + #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER static int kvm_pm_notifier_call(struct notifier_block *bl, unsigned long state, diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c index 662393a76782..547f65f5a187 100644 --- a/virt/kvm/memfd.c +++ b/virt/kvm/memfd.c @@ -16,6 +16,8 @@ static const struct memfd_pfn_ops *memfd_ops; static void memfd_invalidate_page_range(struct inode *inode, void *owner, pgoff_t start, pgoff_t end) { + kvm_memfd_invalidate_range(owner, inode, start >> PAGE_SHIFT, + end >> PAGE_SHIFT); } static void memfd_fallocate(struct inode *inode, void *owner, From patchwork Thu Dec 23 12:30:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698229 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51B94C433F5 for ; Thu, 23 Dec 2021 12:32:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
E82116B0074; Thu, 23 Dec 2021 07:32:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E31806B0087; Thu, 23 Dec 2021 07:32:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CD4056B0088; Thu, 23 Dec 2021 07:32:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0091.hostedemail.com [216.40.44.91]) by kanga.kvack.org (Postfix) with ESMTP id BC5626B0074 for ; Thu, 23 Dec 2021 07:32:29 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 7BBEE82FE5 for ; Thu, 23 Dec 2021 12:32:29 +0000 (UTC) X-FDA: 78948997218.26.E6A3BFA Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf11.hostedemail.com (Postfix) with ESMTP id DDDF64002A for ; Thu, 23 Dec 2021 12:32:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262748; x=1671798748; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=nMOnvZvv3gJKoKC9EEW6w1t+Yt8L0I8cTqGCDM02VXY=; b=H9/3tQSouzNUthnq7+kLIttC1cACXnf1WlYVcm71ZUypeG14LiiRkRtC qAsVbNBtcYr1nax2+GdraOHKnjh5vgqf969ZEQMhT20KH+2ViGhL/3ytB zd7dTtC0NXRnlOc2e6+i/RKZNzqtE5MVSKZMpdqW4EqlecZv2f2Km2FDx Zn8rkvhU9oFQ2Pqf7GvW2Srnsarb3cLe1lerjXYnvw2YMl8YZaUBq3Oun 6tkQyB19pzVHYxp/OUKQxwHxoHbYj4ceYeHJPn0cnCIWfWMD7fd1UK1/C sssDZfqUh4FwvnFiOswOFlBU/7uyUaAvdT2BJu3/TK34Jdn14QcZ/75U7 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="238352388" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="238352388" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:24 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078967" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 
04:32:16 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 11/16] KVM: Add kvm_map_gfn_range Date: Thu, 23 Dec 2021 20:30:06 +0800 Message-Id: <20211223123011.41044-12-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: DDDF64002A X-Stat-Signature: 7ahsqt98yaws5idw7barn6exkgjfk17w Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="H9/3tQSo"; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf11.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=chao.p.peng@linux.intel.com X-HE-Tag: 1640262745-546269 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This new function establishes the mapping in KVM page tables for a given gfn range. It can be used in the memory fallocate callback for memfd based memory to establish the mapping for KVM secondary MMU when the pages are allocated in the memory backend. 
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- arch/x86/kvm/mmu/mmu.c | 47 ++++++++++++++++++++++++++++++++++++++++ include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 5 +++++ 3 files changed, 54 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1d275e9d76b5..2856eb662a21 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1568,6 +1568,53 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm, return ret; } +bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + struct kvm_vcpu *vcpu; + kvm_pfn_t pfn; + gfn_t gfn; + int idx; + bool ret = true; + + /* Need vcpu context for kvm_mmu_do_page_fault. */ + vcpu = kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return false; + + vcpu_load(vcpu); + idx = srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + gfn = range->start; + while (gfn < range->end) { + if (signal_pending(current)) { + ret = false; + break; + } + + if (need_resched()) + cond_resched(); + + pfn = kvm_mmu_do_page_fault(vcpu, gfn << PAGE_SHIFT, + PFERR_WRITE_MASK | PFERR_USER_MASK, + false); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) { + ret = false; + break; + } + + gfn++; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + return ret; +} + bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { bool flush = false; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index be567925831b..8c2359175509 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -241,6 +241,8 @@ struct kvm_gfn_range { pte_t pte; bool may_block; }; + +bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c 
index f495c1a313bd..660ce15973ad 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -471,6 +471,11 @@ EXPORT_SYMBOL_GPL(kvm_destroy_vcpus); #if defined(CONFIG_MEMFD_OPS) ||\ (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)) +bool __weak kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + return false; +} + typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, From patchwork Thu Dec 23 12:30:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698230 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75D4EC433FE for ; Thu, 23 Dec 2021 12:32:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C6326B0087; Thu, 23 Dec 2021 07:32:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 076C76B0088; Thu, 23 Dec 2021 07:32:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E810A6B0089; Thu, 23 Dec 2021 07:32:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0221.hostedemail.com [216.40.44.221]) by kanga.kvack.org (Postfix) with ESMTP id D77576B0087 for ; Thu, 23 Dec 2021 07:32:33 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A3E6D180E935A for ; Thu, 23 Dec 2021 12:32:33 +0000 (UTC) X-FDA: 78948997386.25.A14B822 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf14.hostedemail.com (Postfix) with ESMTP id 71D1C100026 for ; Thu, 23 Dec 2021 12:32:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; 
d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262753; x=1671798753; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=YEqJBACRcYF6r6P724iXUNeYRL5G8D47PprkAJHUqo4=; b=kUKw6duq3Mwk9e/E0Nx/6uDxiDWsuV1k15n8yW5Q0B1jovvgrDHr5PlT YFfMjC+zLeFUjnl9IyGVU1wT/tbG1opUYgh775aXmpYGRhrV8dum0BhkM 2jyftlJsuVYMKkwie7+YiHG5jBoyvFTOGSKNiv7/LQ9RZDU1zKc0KTdpX PfF9a+/PJUA0SIjTJjUs0+EtKFckVs33Briqp4sjpK0sJiA0rwjAW1A5m ccX/pFs7Zq/z6T+ZyW0e+QF8mdi/Vtm3LPYbniLHvdc/mMEyvBLNsmTKb bGPjGg4GvxipGxlkzMliwBPN/KF+WbinyBpAX1Lqt4ObNhvR9m0N3IPzz Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="228114739" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="228114739" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522078999" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:32:24 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 12/16] KVM: Implement fd-based memory fallocation Date: Thu, 23 Dec 2021 20:30:07 +0800 Message-Id: <20211223123011.41044-13-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=kUKw6duq; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf14.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.20) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 71D1C100026 X-Stat-Signature: z6nei3hei3krah3wqyi7wg4w6gu73s5p X-HE-Tag: 1640262751-615868 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: KVM gets notified through memfd_notifier when userspace allocates space via fallocate() on the fd which is used for guest memory. KVM can set up the mapping in the secondary MMU page tables at this time. This patch adds a function in KVM to map pfn to gfn when the page is allocated in the memory backend. While it is possible to postpone the secondary MMU mapping to the KVM page fault handler, we can avoid some VM exits by also mapping the secondary page tables when a page is mapped in the primary MMU. It reuses the same code as kvm_memfd_invalidate_range, except that it uses kvm_map_gfn_range as its handler. 
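For illustration only (not part of this patch), the reuse described above amounts to one range walker parameterized by a handler, with two thin wrappers. The standalone userspace C sketch below uses hypothetical names (fa_range, fa_handler_t, fa_handle_range and friends) and a plain page counter in place of the secondary MMU; it shows the shape of the refactoring, not KVM's actual behaviour.

```c
#include <stdbool.h>

/* Hypothetical range descriptor, in pages. */
struct fa_range {
	unsigned long start;
	unsigned long end;
};

/* Handler type: acts on a range and reports success. */
typedef bool (*fa_handler_t)(int *mapped_pages, const struct fa_range *range);

/* Stand-in for a map handler: account the pages as mapped. */
static bool fa_map(int *mapped_pages, const struct fa_range *range)
{
	*mapped_pages += (int)(range->end - range->start);
	return true;
}

/* Stand-in for an unmap handler: account the pages as unmapped. */
static bool fa_unmap(int *mapped_pages, const struct fa_range *range)
{
	*mapped_pages -= (int)(range->end - range->start);
	return true;
}

/* Common walker: builds the range and calls whichever handler it was given. */
static int fa_handle_range(int *mapped_pages, unsigned long start,
			   unsigned long end, fa_handler_t handler)
{
	const struct fa_range range = { .start = start, .end = end };

	return (int)handler(mapped_pages, &range);
}

/* Thin wrappers that differ only in the handler they pass. */
static int fa_fallocate_range(int *mapped_pages, unsigned long start,
			      unsigned long end)
{
	return fa_handle_range(mapped_pages, start, end, fa_map);
}

static int fa_invalidate_range(int *mapped_pages, unsigned long start,
			       unsigned long end)
{
	return fa_handle_range(mapped_pages, start, end, fa_unmap);
}
```

Keeping the walker handler-agnostic means the fallocate and invalidate paths cannot drift apart in how they build and bracket the range.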
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 22 +++++++++++++++++++--- virt/kvm/memfd.c | 2 ++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 8c2359175509..ad89a0e8bf6b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2017,6 +2017,8 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) #ifdef CONFIG_MEMFD_OPS int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, unsigned long start, unsigned long end); +int kvm_memfd_fallocate_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end); #endif /* CONFIG_MEMFD_OPS */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 660ce15973ad..36dd2adcd7fc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -891,15 +891,17 @@ static int kvm_init_mmu_notifier(struct kvm *kvm) #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ #ifdef CONFIG_MEMFD_OPS -int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, - unsigned long start, unsigned long end) +int kvm_memfd_handle_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end, + gfn_handler_t handler) + { int ret; const struct kvm_useraddr_range useraddr_range = { .start = start, .end = end, .pte = __pte(0), - .handler = kvm_unmap_gfn_range, + .handler = handler, .on_lock = (void *)kvm_null_fn, .flush_on_ret = true, .may_block = false, @@ -914,6 +916,20 @@ int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, return ret; } + +int kvm_memfd_invalidate_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end) +{ + return kvm_memfd_handle_range(kvm, inode, start, end, + kvm_unmap_gfn_range); +} + +int kvm_memfd_fallocate_range(struct kvm *kvm, struct inode *inode, + unsigned long start, unsigned long end) +{ + return kvm_memfd_handle_range(kvm, inode, start, 
end, + kvm_map_gfn_range); +} #endif /* CONFIG_MEMFD_OPS */ #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c index 547f65f5a187..91a17c9fbc49 100644 --- a/virt/kvm/memfd.c +++ b/virt/kvm/memfd.c @@ -23,6 +23,8 @@ static void memfd_invalidate_page_range(struct inode *inode, void *owner, static void memfd_fallocate(struct inode *inode, void *owner, pgoff_t start, pgoff_t end) { + kvm_memfd_fallocate_range(owner, inode, start >> PAGE_SHIFT, + end >> PAGE_SHIFT); } static bool memfd_get_owner(void *owner) From patchwork Thu Dec 23 12:30:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60758C433F5 for ; Thu, 23 Dec 2021 12:32:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0299C6B0089; Thu, 23 Dec 2021 07:32:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F1BE46B008A; Thu, 23 Dec 2021 07:32:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE3F66B008C; Thu, 23 Dec 2021 07:32:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0103.hostedemail.com [216.40.44.103]) by kanga.kvack.org (Postfix) with ESMTP id CE1686B0089 for ; Thu, 23 Dec 2021 07:32:41 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 91947180E935A for ; Thu, 23 Dec 2021 12:32:41 +0000 (UTC) X-FDA: 78948997722.11.58F5554 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf08.hostedemail.com (Postfix) with ESMTP id 7A4F7160030 for ; Thu, 23 Dec 2021 12:32:33 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262761; x=1671798761; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=BEfM/tVOpaAZqZG+iExZFe2tWlUrZuoAMwYVM4OH/aE=; b=aDwGCzil/jJcdHYNrN1kqwxB6OcMamZufFVhGmyVfZiBxVRNZKiJlVdk mRbtpHzItQenAXEGIRfSG+MI2tYNNFlP1o0rFME7g6gpRRwfIzaIbuqQz wv6ga6Zgn9/77uFVCeNwiIacSeLPyO3lff9gra50bwpKJkbhyfu/ZpHB5 u63aSPO438VpJqwsaCM5bZZxgdKca8hXYXth3JDKqtrsZDfSHxTFwlVM+ Du5Fiu3Reo/a5+SHC8kdI0WpUlSmXrzY+0GTXl+nc1uHXTYR5XPQ3fcRf NMLkZnXSCUwJIVbf2XDMsYLyJGPaxCTtUEU8h4VXB59/8FjtGWVbAzOxV Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="265026981" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="265026981" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522079039" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:32:31 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 13/16] KVM: Add KVM_EXIT_MEMORY_ERROR exit Date: Thu, 23 Dec 2021 20:30:08 +0800 Message-Id: <20211223123011.41044-14-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Queue-Id: 7A4F7160030 X-Stat-Signature: tgso3ri3zngqjj49xjinpt4atafq9oxf Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=aDwGCzil; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf08.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.88) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspamd-Server: rspam02 X-HE-Tag: 1640262753-943581 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This new exit allows userspace to handle memory-related errors. Currently it supports two types of errors (KVM_EXIT_MEM_MAP_SHARED/PRIVATE), which are used for shared memory <-> private memory conversion in memory encryption usage. After private memory is enabled, there are two places in KVM that can exit to userspace to trigger private <-> shared conversion: - explicit conversion: happens when the guest explicitly calls into KVM to map a range (as private or shared); KVM then exits to userspace to do the map/unmap operations. - implicit conversion: happens in the KVM page fault handler. * If the fault is due to a private memory access, KVM exits to userspace with a shared->private conversion request when the page has not been allocated in the private memory backend.
* If the fault is due to a shared memory access, KVM exits to userspace with a private->shared conversion request when the page has already been allocated in the private memory backend. Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- include/uapi/linux/kvm.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 41434322fa23..d68db3b2eeec 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -243,6 +243,18 @@ struct kvm_xen_exit { } u; }; +struct kvm_memory_exit { +#define KVM_EXIT_MEM_MAP_SHARED 1 +#define KVM_EXIT_MEM_MAP_PRIVATE 2 + __u32 type; + union { + struct { + __u64 gpa; + __u64 size; + } map; + } u; +}; + #define KVM_S390_GET_SKEYS_NONE 1 #define KVM_S390_SKEYS_MAX 1048576 @@ -282,6 +294,7 @@ struct kvm_xen_exit { #define KVM_EXIT_X86_BUS_LOCK 33 #define KVM_EXIT_XEN 34 #define KVM_EXIT_RISCV_SBI 35 +#define KVM_EXIT_MEMORY_ERROR 36 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -499,6 +512,8 @@ struct kvm_run { unsigned long args[6]; unsigned long ret[2]; } riscv_sbi; + /* KVM_EXIT_MEMORY_ERROR */ + struct kvm_memory_exit mem; /* Fix the size of the union.
*/ char padding[256]; }; From patchwork Thu Dec 23 12:30:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698232 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86BC8C433F5 for ; Thu, 23 Dec 2021 12:32:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1158D6B008C; Thu, 23 Dec 2021 07:32:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C4886B0092; Thu, 23 Dec 2021 07:32:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E80A66B0093; Thu, 23 Dec 2021 07:32:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0162.hostedemail.com [216.40.44.162]) by kanga.kvack.org (Postfix) with ESMTP id D8F1E6B008C for ; Thu, 23 Dec 2021 07:32:49 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 92AE68249980 for ; Thu, 23 Dec 2021 12:32:49 +0000 (UTC) X-FDA: 78948998058.20.E94C2F3 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by imf11.hostedemail.com (Postfix) with ESMTP id 65E744002D for ; Thu, 23 Dec 2021 12:32:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262768; x=1671798768; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=pB0U4IoAn2FVqOKKSo9Pzwd6Ru3anEta8cgDh0MsBiY=; b=DJMjQri0C/KLVbTgf8Bp68e2tjO9NT5yTO0iDg27BbHzJ6PM8J09lbau eFJncyypLlboOPgTTSe2zLveyBsCcFHRmtj+EWeH4ZTJspYwjYja2auRD yAGKU5sI9yBdhTTj1V4QPHvgSDQz4Z/kKVURVKRDExtnu3zJ5NYQVrTT8 m7KpGDAhZr/uHLz3/Ix6MviaGtRK3UuqqFPbYwOhKv1Loi+4z0xNbubu2 C9WHQ6TCYN8CxBs5kt7Hzs2LLWAv1xPbrFenl7zcEZvP7PGdtdkQTJaqq 
sYu2lfm/rm3Omt7BHBX3F+v336ekciS9OP4Fc85ggVRUZqlgvKwpqzI0t g==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="240769773" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="240769773" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522079079" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:32:39 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory Date: Thu, 23 Dec 2021 20:30:09 +0800 Message-Id: <20211223123011.41044-15-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=DJMjQri0; spf=none (imf11.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.65) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Queue-Id: 65E744002D X-Stat-Signature: ep6qdkfe1pcoa3qh9tzs44yhwuoyma7y X-Rspamd-Server: rspam04 X-HE-Tag: 1640262765-838187 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When a page fault from the secondary page table happens while the guest is running, in a memslot with KVM_MEM_PRIVATE, we need to take different paths for private access and shared access. - For private access, KVM checks whether the page is already allocated in the memory backend. If so, KVM establishes the mapping; otherwise it exits to userspace to convert a shared page to a private one. - For shared access, KVM also checks whether the page is already allocated in the memory backend. If so, it exits to userspace to convert a private page to a shared one; otherwise the access is treated as traditional hva-based shared memory, and KVM lets the existing code obtain a pfn with get_user_pages() and establish the mapping. The above code assumes private memory is persistent and pre-allocated in the memory backend, so KVM can use this information as an indicator of whether a page is private or shared.
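As a rough aid for reading the two cases above, the decision can be written as a small pure function. This is a userspace sketch, not the patch's code: the demo_* names are invented for the example, and "backend_has_page" stands for the outcome of the backend lookup described in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels; they are not KVM constants. */
enum demo_fault_action {
	DEMO_MAP_PRIVATE_PFN,		/* install the mapping from the fd-based backend */
	DEMO_EXIT_CONVERT_PRIVATE,	/* exit to userspace: shared->private request */
	DEMO_EXIT_CONVERT_SHARED,	/* exit to userspace: private->shared request */
	DEMO_MAP_SHARED_HVA,		/* fall through to the existing hva-based path */
};

/*
 * private_access:   the faulting access targets private memory
 * backend_has_page: the page is already allocated in the memory backend
 */
static enum demo_fault_action demo_classify_fault(bool private_access,
						  bool backend_has_page)
{
	if (private_access)
		return backend_has_page ? DEMO_MAP_PRIVATE_PFN
					: DEMO_EXIT_CONVERT_PRIVATE;

	return backend_has_page ? DEMO_EXIT_CONVERT_SHARED
				: DEMO_MAP_SHARED_HVA;
}
```

The sketch deliberately leaves out everything else the fault path has to do (reference counting on the pfn, mapping level, read-only slots).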
The above check is performed by calling kvm_memfd_get_pfn(), which is currently implemented as a page cache search but could in theory be implemented differently (e.g. when the page is not even mapped into the host page cache, a different implementation would be needed). Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- arch/x86/kvm/mmu/mmu.c | 73 ++++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/paging_tmpl.h | 11 +++-- 2 files changed, 77 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2856eb662a21..fbcdf62f8281 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2920,6 +2920,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, if (max_level == PG_LEVEL_4K) return PG_LEVEL_4K; + if (kvm_slot_is_private(slot)) + return max_level; + host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot); return min(host_level, max_level); } @@ -3950,7 +3953,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r) +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + /* + * At this time private gfn has not been supported yet. Other patch + * that enables it should change this.
+ */ + return false; +} + +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, + bool *is_private_pfn, int *r) +{ + int order; + int mem_convert_type; + struct kvm_memory_slot *slot = fault->slot; + long pfn = kvm_memfd_get_pfn(slot, fault->gfn, &order); + + if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) { + if (pfn < 0) + mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE; + else { + fault->pfn = pfn; + if (slot->flags & KVM_MEM_READONLY) + fault->map_writable = false; + else + fault->map_writable = true; + + if (order == 0) + fault->max_level = PG_LEVEL_4K; + *is_private_pfn = true; + *r = RET_PF_FIXED; + return true; + } + } else { + if (pfn < 0) + return false; + + kvm_memfd_put_pfn(pfn); + mem_convert_type = KVM_EXIT_MEM_MAP_SHARED; + } + + vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR; + vcpu->run->mem.type = mem_convert_type; + vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT; + vcpu->run->mem.u.map.size = PAGE_SIZE; + fault->pfn = -1; + *r = -1; + return true; +} + +static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + bool *is_private_pfn, int *r) { struct kvm_memory_slot *slot = fault->slot; bool async; @@ -3984,6 +4039,10 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, } } + if (kvm_slot_is_private(slot) && + kvm_faultin_pfn_private(vcpu, fault, is_private_pfn, r)) + return *r == RET_PF_FIXED ? 
false : true; + async = false; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, fault->write, &fault->map_writable, @@ -4044,6 +4103,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu); unsigned long mmu_seq; + bool is_private_pfn = false; int r; fault->gfn = fault->addr >> PAGE_SHIFT; @@ -4063,7 +4123,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (kvm_faultin_pfn(vcpu, fault, &r)) + if (kvm_faultin_pfn(vcpu, fault, &is_private_pfn, &r)) return r; if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r)) @@ -4076,7 +4136,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault else write_lock(&vcpu->kvm->mmu_lock); - if (is_page_fault_stale(vcpu, fault, mmu_seq)) + if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq)) goto out_unlock; r = make_mmu_pages_available(vcpu); @@ -4093,7 +4153,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault read_unlock(&vcpu->kvm->mmu_lock); else write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + + if (is_private_pfn) + kvm_memfd_put_pfn(fault->pfn); + else + kvm_release_pfn_clean(fault->pfn); + return r; } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 5b5bdac97c7b..640fd1e2fe4c 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -825,6 +825,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault int r; unsigned long mmu_seq; bool is_self_change_mapping; + bool is_private_pfn = false; + pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code); WARN_ON_ONCE(fault->is_tdp); @@ -873,7 +875,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if 
(kvm_faultin_pfn(vcpu, fault, &r)) + if (kvm_faultin_pfn(vcpu, fault, &is_private_pfn, &r)) return r; if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r)) @@ -901,7 +903,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault r = RET_PF_RETRY; write_lock(&vcpu->kvm->mmu_lock); - if (is_page_fault_stale(vcpu, fault, mmu_seq)) + if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq)) goto out_unlock; kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT); @@ -913,7 +915,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault out_unlock: write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + if (is_private_pfn) + kvm_memfd_put_pfn(fault->pfn); + else + kvm_release_pfn_clean(fault->pfn); return r; } From patchwork Thu Dec 23 12:30:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88E81C433F5 for ; Thu, 23 Dec 2021 12:32:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D978F6B0093; Thu, 23 Dec 2021 07:32:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D47696B0095; Thu, 23 Dec 2021 07:32:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C0EBC6B0096; Thu, 23 Dec 2021 07:32:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0065.hostedemail.com [216.40.44.65]) by kanga.kvack.org (Postfix) with ESMTP id B1C6F6B0093 for ; Thu, 23 Dec 2021 07:32:56 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 7A1BC181AC9C6 for ; 
Thu, 23 Dec 2021 12:32:56 +0000 (UTC) X-FDA: 78948998352.30.3AC0999 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by imf16.hostedemail.com (Postfix) with ESMTP id A699318003D for ; Thu, 23 Dec 2021 12:32:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262775; x=1671798775; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=ZDfyC3JzEks8dbWtdmTIC7qG6w7nH+1ENlm6SIYFT0Y=; b=jkBfnTd0y0UF7NCbe8HsJ4BkzOciZR8j9B86t+PXnqQaPdVu1MrA+jOM 6/4ZfxEMho+yAfH6/chA0kTAOG0wG9WShDoCmpVVcx5vDJQ2v3KsWSX6p 9HSf7FICjxdnymtLH78rD00ZmlIywZBKBSOubXs1qYqLNFsdEkZYvam8S inGY7a5qlcfVO/8u9nWC3TK+z9ufowKA5O007MEeNKgSy37qQ4/0mkpKg IJ3g9W/9bPD9UBx5D70tPRGov1RED+BAxcmI0P2pi6L1/WxTdr5lWqipF 99sDU0QWKfKgu+UgXQo53Jr2s1Klg08vS46oEXtYTCzVn5wQNvdIIg6P/ Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="304188009" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="304188009" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:32:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522079140" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 04:32:47 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 15/16] KVM: Use kvm_userspace_memory_region_ext Date: Thu, 23 Dec 2021 20:30:10 +0800 Message-Id: <20211223123011.41044-16-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: A699318003D X-Stat-Signature: foebcc4otmcbmdd9kx1sx6sowy3ony93 Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=jkBfnTd0; spf=none (imf16.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.100) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1640262775-569920 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Use the new extended memslot structure kvm_userspace_memory_region_ext, which includes two additional fields, fd and ofs, compared to the current kvm_userspace_memory_region. The fd/ofs fields are copied from userspace only when KVM_MEM_PRIVATE is set. Inside KVM, all existing uses of kvm_userspace_memory_region are changed to kvm_userspace_memory_region_ext, since the new extended structure covers all the existing fields in kvm_userspace_memory_region.
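The conditional copy described above can be sketched in userspace C as follows. Everything here is an assumption for illustration: the demo_* struct layouts and the DEMO_MEM_PRIVATE value are invented, memcpy() stands in for copy_from_user(), and the sketch zeroes the destination and checks the caller's buffer length, which is a choice of the sketch and not something the patch is claimed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MEM_PRIVATE (1u << 2)	/* hypothetical flag value */

/* Hypothetical base layout, modelled loosely on a memslot description. */
struct demo_region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

/* Hypothetical extended layout: same leading fields plus ofs/fd. */
struct demo_region_ext {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t ofs;
	uint32_t fd;
	uint32_t padding[5];
};

/*
 * Copy the base part unconditionally, then copy the extension tail only
 * when the private flag is set. Returns 0 on success, -1 when the caller's
 * buffer is too short for what the flags require.
 */
static int demo_copy_region(struct demo_region_ext *dst, const void *user_buf,
			    size_t user_len)
{
	const size_t base = sizeof(struct demo_region);
	const size_t ext_off = offsetof(struct demo_region_ext, ofs);

	memset(dst, 0, sizeof(*dst));
	if (user_len < base)
		return -1;
	memcpy(dst, user_buf, base);

	if (dst->flags & DEMO_MEM_PRIVATE) {
		if (user_len < sizeof(*dst))
			return -1;
		memcpy(&dst->ofs, (const unsigned char *)user_buf + ext_off,
		       sizeof(*dst) - ext_off);
	}
	return 0;
}
```

The point of the two-step copy is that callers which only know the shorter structure keep working, because the tail is never read unless they opt in through the flag.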
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 4 ++-- virt/kvm/kvm_main.c | 19 +++++++++++++------ 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 42bde45a1bc2..52942195def3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11551,7 +11551,7 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, } for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { - struct kvm_userspace_memory_region m; + struct kvm_userspace_memory_region_ext m; m.slot = id | (i << 16); m.flags = 0; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ad89a0e8bf6b..fabab3b77d57 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -981,9 +981,9 @@ enum kvm_mr_change { }; int kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem); + const struct kvm_userspace_memory_region_ext *mem); int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem); + const struct kvm_userspace_memory_region_ext *mem); void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); int kvm_arch_prepare_memory_region(struct kvm *kvm, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 36dd2adcd7fc..cf8dcb3b8c7f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1514,7 +1514,7 @@ static void kvm_replace_memslot(struct kvm *kvm, } } -static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem) +static int check_memory_region_flags(const struct kvm_userspace_memory_region_ext *mem) { u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; @@ -1907,7 +1907,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, * Must be called holding kvm->slots_lock for write. 
*/ int __kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem) + const struct kvm_userspace_memory_region_ext *mem) { struct kvm_memory_slot *old, *new; struct kvm_memslots *slots; @@ -2011,7 +2011,7 @@ int __kvm_set_memory_region(struct kvm *kvm, EXPORT_SYMBOL_GPL(__kvm_set_memory_region); int kvm_set_memory_region(struct kvm *kvm, - const struct kvm_userspace_memory_region *mem) + const struct kvm_userspace_memory_region_ext *mem) { int r; @@ -2023,7 +2023,7 @@ int kvm_set_memory_region(struct kvm *kvm, EXPORT_SYMBOL_GPL(kvm_set_memory_region); static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, - struct kvm_userspace_memory_region *mem) + struct kvm_userspace_memory_region_ext *mem) { if ((u16)mem->slot >= KVM_USER_MEM_SLOTS) return -EINVAL; @@ -4569,12 +4569,19 @@ static long kvm_vm_ioctl(struct file *filp, break; } case KVM_SET_USER_MEMORY_REGION: { - struct kvm_userspace_memory_region kvm_userspace_mem; + struct kvm_userspace_memory_region_ext kvm_userspace_mem; r = -EFAULT; if (copy_from_user(&kvm_userspace_mem, argp, - sizeof(kvm_userspace_mem))) + sizeof(struct kvm_userspace_memory_region))) goto out; + if (kvm_userspace_mem.flags & KVM_MEM_PRIVATE) { + int offset = offsetof( + struct kvm_userspace_memory_region_ext, ofs); + if (copy_from_user(&kvm_userspace_mem.ofs, argp + offset, + sizeof(kvm_userspace_mem) - offset)) + goto out; + } r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem); break; From patchwork Thu Dec 23 12:30:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12698234 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95CD0C433F5 for ; Thu, 23 Dec 2021 12:33:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
310BD6B0096; Thu, 23 Dec 2021 07:33:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2BFD76B0098; Thu, 23 Dec 2021 07:33:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1877F6B0099; Thu, 23 Dec 2021 07:33:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0005.hostedemail.com [216.40.44.5]) by kanga.kvack.org (Postfix) with ESMTP id 07BE26B0096 for ; Thu, 23 Dec 2021 07:33:05 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CB4B389926 for ; Thu, 23 Dec 2021 12:33:04 +0000 (UTC) X-FDA: 78948998688.19.47D87E1 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf19.hostedemail.com (Postfix) with ESMTP id 2ADB01A0015 for ; Thu, 23 Dec 2021 12:33:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640262784; x=1671798784; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=gr3qIQtqhPgZ2sWGOjcNk2j5JakW0dks3WgAyQ1CBMA=; b=G+K84tXEeIZIud3P8up/EW3dCApAyQXpPOHT/sWN+dn6xs9ClB4Ujl7U jUXeEd/gqxY44SOwkuK4Y+SLO8fL33b0r4B8Urxu33SIGzVEqVIdCvSCT hI/Z08QMtakfWHMzaYu1PyA7yhS8ZnsT78FX0KC/EoC1KUfdIagDsmcua Zy1c26JRZLSBvdVQ8W214xOp/aNzZrWYlILtdd6Lfamlt59slwwqakcQV 7Q9TO2OtUBAIppO+HYomPJSxl8YsIFlxqqhQMFus541K1Abia4Hacy7Po N6dcFPSZc9Y2dAQ/sOLHdMETobi0McrnI5WQIItg1Ol5QqZhBNj+iCAUV A==; X-IronPort-AV: E=McAfee;i="6200,9189,10206"; a="239574362" X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="239574362" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Dec 2021 04:33:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,229,1635231600"; d="scan'208";a="522079184" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 23 Dec 2021 
04:32:54 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [PATCH v3 kvm/queue 16/16] KVM: Register/unregister private memory slot to memfd Date: Thu, 23 Dec 2021 20:30:11 +0800 Message-Id: <20211223123011.41044-17-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211223123011.41044-1-chao.p.peng@linux.intel.com> References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> X-Stat-Signature: pwxo8te78kxw4ok7z1itqx1ecexe3xyw X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2ADB01A0015 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=G+K84tXE; spf=none (imf19.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com X-HE-Tag: 1640262783-58023 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Expose the KVM_MEM_PRIVATE flag and register/unregister the private memory slot with memfd when userspace sets the flag. KVM_MEM_PRIVATE is disallowed by default, but architecture code can turn it on by implementing kvm_arch_private_memory_supported().
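The "disallowed by default, overridable by the architecture" arrangement relies on a weak symbol. A minimal userspace sketch, assuming GCC or Clang (the kernel's __weak is a wrapper around the same attribute), might look like this. The demo_* names and flag values are made up for the example and are not the KVM definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MEM_LOG_DIRTY_PAGES (1u << 0)	/* hypothetical flag values */
#define DEMO_MEM_PRIVATE         (1u << 2)

/*
 * Weak default: used only when no other object file provides a strong
 * definition of the same symbol. An "architecture" would override it by
 * defining demo_arch_private_memory_supported() without the attribute.
 */
__attribute__((weak)) bool demo_arch_private_memory_supported(void)
{
	return false;
}

/* Reject any flag that is not in the currently valid set. */
static int demo_check_flags(uint32_t flags)
{
	uint32_t valid = DEMO_MEM_LOG_DIRTY_PAGES;

	if (demo_arch_private_memory_supported())
		valid |= DEMO_MEM_PRIVATE;

	return (flags & ~valid) ? -EINVAL : 0;
}
```

With only the weak default linked in, a request carrying DEMO_MEM_PRIVATE is rejected with -EINVAL; linking in a strong definition that returns true would make the same request pass without touching demo_check_flags().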
Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 34 ++++++++++++++++++++++++++++++++-- 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index fabab3b77d57..5173c52e70d4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1229,6 +1229,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu); int kvm_arch_post_init_vm(struct kvm *kvm); void kvm_arch_pre_destroy_vm(struct kvm *kvm); int kvm_arch_create_vm_debugfs(struct kvm *kvm); +bool kvm_arch_private_memory_supported(struct kvm *kvm); #ifndef __KVM_HAVE_ARCH_VM_ALLOC /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cf8dcb3b8c7f..1caebded52c4 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1514,10 +1514,19 @@ static void kvm_replace_memslot(struct kvm *kvm, } } -static int check_memory_region_flags(const struct kvm_userspace_memory_region_ext *mem) +bool __weak kvm_arch_private_memory_supported(struct kvm *kvm) +{ + return false; +} + +static int check_memory_region_flags(struct kvm *kvm, + const struct kvm_userspace_memory_region_ext *mem) { u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; + if (kvm_arch_private_memory_supported(kvm)) + valid_flags |= KVM_MEM_PRIVATE; + #ifdef __KVM_HAVE_READONLY_MEM valid_flags |= KVM_MEM_READONLY; #endif @@ -1756,6 +1765,8 @@ static void kvm_delete_memslot(struct kvm *kvm, struct kvm_memory_slot *old, struct kvm_memory_slot *invalid_slot) { + if (old->flags & KVM_MEM_PRIVATE) + kvm_memfd_unregister(old); /* * Remove the old memslot (in the inactive memslots) by passing NULL as * the "new" slot, and for the invalid version in the active slots. 
@@ -1836,6 +1847,14 @@ static int kvm_set_memslot(struct kvm *kvm, kvm_invalidate_memslot(kvm, old, invalid_slot); } + if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE) { + r = kvm_memfd_register(kvm, new); + if (r) { + mutex_unlock(&kvm->slots_arch_lock); + return r; + } + } + r = kvm_prepare_memory_region(kvm, old, new, change); if (r) { /* @@ -1850,6 +1869,10 @@ static int kvm_set_memslot(struct kvm *kvm, } else { mutex_unlock(&kvm->slots_arch_lock); } + + if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE) + kvm_memfd_unregister(new); + return r; } @@ -1917,7 +1940,7 @@ int __kvm_set_memory_region(struct kvm *kvm, int as_id, id; int r; - r = check_memory_region_flags(mem); + r = check_memory_region_flags(kvm, mem); if (r) return r; @@ -1974,6 +1997,10 @@ int __kvm_set_memory_region(struct kvm *kvm, if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages) return -EINVAL; } else { /* Modify an existing slot. */ + /* Private memslots are immutable, they can only be deleted. */ + if (mem->flags & KVM_MEM_PRIVATE) + return -EINVAL; + if ((mem->userspace_addr != old->userspace_addr) || (npages != old->npages) || ((mem->flags ^ old->flags) & KVM_MEM_READONLY)) @@ -2002,6 +2029,9 @@ int __kvm_set_memory_region(struct kvm *kvm, new->npages = npages; new->flags = mem->flags; new->userspace_addr = mem->userspace_addr; + new->fd = mem->fd; + new->file = NULL; + new->ofs = mem->ofs; r = kvm_set_memslot(kvm, old, new, change); if (r)