From patchwork Wed Jul 6 08:20:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12907545 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFD2CCCA482 for ; Wed, 6 Jul 2022 08:24:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6D9DB6B0074; Wed, 6 Jul 2022 04:24:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 662EA8E0002; Wed, 6 Jul 2022 04:24:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5038C8E0001; Wed, 6 Jul 2022 04:24:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 414876B0074 for ; Wed, 6 Jul 2022 04:24:28 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 19AF633600 for ; Wed, 6 Jul 2022 08:24:28 +0000 (UTC) X-FDA: 79655988216.13.76E6A87 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by imf13.hostedemail.com (Postfix) with ESMTP id 0E96420048 for ; Wed, 6 Jul 2022 08:24:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657095867; x=1688631867; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4RqYsqOgMgPs4JDgf0SUKlW6JB5EWiu8UDnwronwhT8=; b=mXb0aXZMaHVYftWOAFiv+C1p4rrxILwb7/4pXCTr5MGG326XAX8tH4wm q9/ZUoD7/TMuCVWvZgmmE9L5Ke5JzVV8KI4QzLaDLsplUaQASYUhZOWCm v6nJKIDUIKcD4g6TXcN8qaYgtusOsmIpusoqU4ePmPNSx/Rgm+z+RY8Y2 xwjL61yp4SqO0B2wPW1LF8e09uCB7eFjKC3HjV6zUJsuIuVsrxHzjI91h N7wdlR34nlEnQIRzdHgR54mqSTIA2qXkSH69L07Txtb53hah26bUDoAcD 89ym/3bjDoth0ooPI6XrEOt5GiBMiF+TLLOaVcUoIO8Tba60xg8Kgxdo6 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10399"; a="345365428" X-IronPort-AV: E=Sophos;i="5.92,249,1650956400"; d="scan'208";a="345365428" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2022 01:24:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,249,1650956400"; d="scan'208";a="567967876" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga006.jf.intel.com with ESMTP; 06 Jul 2022 01:24:08 -0700 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, linux-kselftest@vger.kernel.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Shuah Khan , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , Michael Roth , mhocko@suse.com, Muchun Song Subject: [PATCH v7 03/14] mm: Introduce memfile_notifier Date: Wed, 6 Jul 2022 16:20:05 +0800 Message-Id: <20220706082016.2603916-4-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220706082016.2603916-1-chao.p.peng@linux.intel.com> References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1657095867; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=pT5HPA3gvGchH4AOJL3pHVOjtYCjYndK+txwR6gD1MM=; b=Q5DMBbZpbXokaS80uJvPIG/RnzgnzuTkn3HibV4pZ0JhWIdPplZcDlvWQQXZ1Rh6cKUWCe rfP/otIL8HUfNt6frqWe5YtqHOx342tOnJYI1K1mp7LZnPuyNifjLm48Wbltbt3igoShEl rD6we/oL+XeXJx/9uY4BpaqohoYIO6Y= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1657095867; a=rsa-sha256; cv=none; b=q17vYFDgZSRJFp8kxAaOzIqRV0O4SphN6nXs4eNb2AaoUFpUZz4PYJPy0h052UrPLdPDQw 3eOjQBO0Toeb/gNDlHKawm5yFwF7fIpmQsYPZjRX31A17uHbDw3ctRxBCS4vhBr/Qva4cl fCXFDoM65fF5udHGfledxOP90q4EWv0= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=mXb0aXZM; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf13.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.31) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspam-User: Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=mXb0aXZM; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf13.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.31) smtp.mailfrom=chao.p.peng@linux.intel.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 0E96420048 X-Stat-Signature: 5wmr144w9i7nba5wiwf9k64sjy9wx87w X-HE-Tag: 1657095866-607061 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch introduces memfile_notifier facility so existing memory file subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a third kernel component to make use of memory bookmarked in the memory file and gets notified when the pages in the memory file become invalidated. It will be used for KVM to use a file descriptor as the guest memory backing store and KVM will use this memfile_notifier interface to interact with memory file subsystems. In the future there might be other consumers (e.g. VFIO with encrypted device memory). It consists below components: - memfile_backing_store: Each supported memory file subsystem can be implemented as a memory backing store which bookmarks memory and provides callbacks for other kernel systems (memfile_notifier consumers) to interact with. - memfile_notifier: memfile_notifier consumers defines callbacks and associate them to a file using memfile_register_notifier(). - memfile_node: A memfile_node is associated with the file (inode) from the backing store and includes feature flags and a list of registered memfile_notifier for notifying. In KVM usages, userspace is in charge of guest memory lifecycle: it first allocates pages in memory backing store and then passes the fd to KVM and lets KVM register memory slot to memory backing store via memfile_register_notifier. Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. Shutemov Signed-off-by: Chao Peng --- include/linux/memfile_notifier.h | 93 ++++++++++++++++++++++++ mm/Kconfig | 4 + mm/Makefile | 1 + mm/memfile_notifier.c | 121 +++++++++++++++++++++++++++++++ 4 files changed, 219 insertions(+) create mode 100644 include/linux/memfile_notifier.h create mode 100644 mm/memfile_notifier.c diff --git a/include/linux/memfile_notifier.h b/include/linux/memfile_notifier.h new file mode 100644 index 000000000000..c5d66fd8ba53 --- /dev/null +++ b/include/linux/memfile_notifier.h @@ -0,0 +1,93 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_MEMFILE_NOTIFIER_H +#define _LINUX_MEMFILE_NOTIFIER_H + +#include +#include +#include +#include +#include + +/* memory in the file is inaccessible from userspace (e.g. read/write/mmap) */ +#define MEMFILE_F_USER_INACCESSIBLE BIT(0) +/* memory in the file is unmovable (e.g. via pagemigration)*/ +#define MEMFILE_F_UNMOVABLE BIT(1) +/* memory in the file is unreclaimable (e.g. via kswapd) */ +#define MEMFILE_F_UNRECLAIMABLE BIT(2) + +#define MEMFILE_F_ALLOWED_MASK (MEMFILE_F_USER_INACCESSIBLE | \ + MEMFILE_F_UNMOVABLE | \ + MEMFILE_F_UNRECLAIMABLE) + +struct memfile_node { + struct list_head notifiers; /* registered notifiers */ + unsigned long flags; /* MEMFILE_F_* flags */ +}; + +struct memfile_backing_store { + struct list_head list; + spinlock_t lock; + struct memfile_node* (*lookup_memfile_node)(struct file *file); + int (*get_pfn)(struct file *file, pgoff_t offset, pfn_t *pfn, + int *order); + void (*put_pfn)(pfn_t pfn); +}; + +struct memfile_notifier; +struct memfile_notifier_ops { + void (*invalidate)(struct memfile_notifier *notifier, + pgoff_t start, pgoff_t end); +}; + +struct memfile_notifier { + struct list_head list; + struct memfile_notifier_ops *ops; + struct memfile_backing_store *bs; +}; + +static inline void memfile_node_init(struct memfile_node *node) +{ + INIT_LIST_HEAD(&node->notifiers); + node->flags = 0; +} + +#ifdef CONFIG_MEMFILE_NOTIFIER +/* APIs for backing stores */ +extern void memfile_register_backing_store(struct memfile_backing_store *bs); +extern int memfile_node_set_flags(struct file *file, unsigned long flags); +extern void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end); +/*APIs for notifier consumers */ +extern int memfile_register_notifier(struct file *file, unsigned long flags, + struct memfile_notifier *notifier); +extern void memfile_unregister_notifier(struct memfile_notifier *notifier); + +#else /* !CONFIG_MEMFILE_NOTIFIER */ +static inline void memfile_register_backing_store(struct memfile_backing_store *bs) +{ +} + +static inline int memfile_node_set_flags(struct file *file, unsigned long flags) +{ + return -EOPNOTSUPP; +} + +static inline void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end) +{ +} + +static inline int memfile_register_notifier(struct file *file, + unsigned long flags, + struct memfile_notifier *notifier) +{ + return -EOPNOTSUPP; +} + +static inline void memfile_unregister_notifier(struct memfile_notifier *notifier) +{ +} + +#endif /* CONFIG_MEMFILE_NOTIFIER */ + +#endif /* _LINUX_MEMFILE_NOTIFIER_H */ diff --git a/mm/Kconfig b/mm/Kconfig index 169e64192e48..19ab9350f5cb 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1130,6 +1130,10 @@ config PTE_MARKER_UFFD_WP purposes. It is required to enable userfaultfd write protection on file-backed memory types like shmem and hugetlbfs. +config MEMFILE_NOTIFIER + bool + select SRCU + source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 6f9ffa968a1a..b7e3fb5fa85b 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -133,3 +133,4 @@ obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o obj-$(CONFIG_IO_MAPPING) += io-mapping.o obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o +obj-$(CONFIG_MEMFILE_NOTIFIER) += memfile_notifier.o diff --git a/mm/memfile_notifier.c b/mm/memfile_notifier.c new file mode 100644 index 000000000000..799d3197903e --- /dev/null +++ b/mm/memfile_notifier.c @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2022 Intel Corporation. + * Chao Peng + */ + +#include +#include +#include + +DEFINE_STATIC_SRCU(memfile_srcu); +static __ro_after_init LIST_HEAD(backing_store_list); + + +void memfile_notifier_invalidate(struct memfile_node *node, + pgoff_t start, pgoff_t end) +{ + struct memfile_notifier *notifier; + int id; + + id = srcu_read_lock(&memfile_srcu); + list_for_each_entry_srcu(notifier, &node->notifiers, list, + srcu_read_lock_held(&memfile_srcu)) { + if (notifier->ops->invalidate) + notifier->ops->invalidate(notifier, start, end); + } + srcu_read_unlock(&memfile_srcu, id); +} + +void __init memfile_register_backing_store(struct memfile_backing_store *bs) +{ + spin_lock_init(&bs->lock); + list_add_tail(&bs->list, &backing_store_list); +} + +static void memfile_node_update_flags(struct file *file, unsigned long flags) +{ + struct address_space *mapping = file_inode(file)->i_mapping; + gfp_t gfp; + + gfp = mapping_gfp_mask(mapping); + if (flags & MEMFILE_F_UNMOVABLE) + gfp &= ~__GFP_MOVABLE; + else + gfp |= __GFP_MOVABLE; + mapping_set_gfp_mask(mapping, gfp); + + if (flags & MEMFILE_F_UNRECLAIMABLE) + mapping_set_unevictable(mapping); + else + mapping_clear_unevictable(mapping); +} + +int memfile_node_set_flags(struct file *file, unsigned long flags) +{ + struct memfile_backing_store *bs; + struct memfile_node *node; + + if (flags & ~MEMFILE_F_ALLOWED_MASK) + return -EINVAL; + + list_for_each_entry(bs, &backing_store_list, list) { + node = bs->lookup_memfile_node(file); + if (node) { + spin_lock(&bs->lock); + node->flags = flags; + spin_unlock(&bs->lock); + memfile_node_update_flags(file, flags); + return 0; + } + } + + return -EOPNOTSUPP; +} + +int memfile_register_notifier(struct file *file, unsigned long flags, + struct memfile_notifier *notifier) +{ + struct memfile_backing_store *bs; + struct memfile_node *node; + struct list_head *list; + + if (!file || !notifier || !notifier->ops) + return -EINVAL; + if (flags & ~MEMFILE_F_ALLOWED_MASK) + return -EINVAL; + + list_for_each_entry(bs, &backing_store_list, list) { + node = bs->lookup_memfile_node(file); + if (node) { + list = &node->notifiers; + notifier->bs = bs; + + spin_lock(&bs->lock); + if (list_empty(list)) + node->flags = flags; + else if (node->flags ^ flags) { + spin_unlock(&bs->lock); + return -EINVAL; + } + + list_add_rcu(¬ifier->list, list); + spin_unlock(&bs->lock); + memfile_node_update_flags(file, flags); + return 0; + } + } + + return -EOPNOTSUPP; +} +EXPORT_SYMBOL_GPL(memfile_register_notifier); + +void memfile_unregister_notifier(struct memfile_notifier *notifier) +{ + spin_lock(¬ifier->bs->lock); + list_del_rcu(¬ifier->list); + spin_unlock(¬ifier->bs->lock); + + synchronize_srcu(&memfile_srcu); +} +EXPORT_SYMBOL_GPL(memfile_unregister_notifier);