diff mbox series

[v7,03/14] mm: Introduce memfile_notifier

Message ID 20220706082016.2603916-4-chao.p.peng@linux.intel.com (mailing list archive)
State New
Headers show
Series KVM: mm: fd-based approach for supporting KVM guest private memory | expand

Commit Message

Chao Peng July 6, 2022, 8:20 a.m. UTC
This patch introduces memfile_notifier facility so existing memory file
subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a
third kernel component to make use of memory bookmarked in the memory
file and gets notified when the pages in the memory file become
invalidated.

It will be used for KVM to use a file descriptor as the guest memory
backing store and KVM will use this memfile_notifier interface to
interact with memory file subsystems. In the future there might be other
consumers (e.g. VFIO with encrypted device memory).

It consists below components:
 - memfile_backing_store: Each supported memory file subsystem can be
   implemented as a memory backing store which bookmarks memory and
   provides callbacks for other kernel systems (memfile_notifier
   consumers) to interact with.
 - memfile_notifier: memfile_notifier consumers defines callbacks and
   associate them to a file using memfile_register_notifier().
 - memfile_node: A memfile_node is associated with the file (inode) from
   the backing store and includes feature flags and a list of registered
   memfile_notifier for notifying.

In KVM usages, userspace is in charge of guest memory lifecycle: it first
allocates pages in memory backing store and then passes the fd to KVM and
lets KVM register memory slot to memory backing store via
memfile_register_notifier.

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 include/linux/memfile_notifier.h |  93 ++++++++++++++++++++++++
 mm/Kconfig                       |   4 +
 mm/Makefile                      |   1 +
 mm/memfile_notifier.c            | 121 +++++++++++++++++++++++++++++++
 4 files changed, 219 insertions(+)
 create mode 100644 include/linux/memfile_notifier.h
 create mode 100644 mm/memfile_notifier.c

Comments

David Hildenbrand Aug. 5, 2022, 1:22 p.m. UTC | #1
On 06.07.22 10:20, Chao Peng wrote:
> This patch introduces memfile_notifier facility so existing memory file
> subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a
> third kernel component to make use of memory bookmarked in the memory
> file and gets notified when the pages in the memory file become
> invalidated.

Stupid question, but why is this called "memfile_notifier" and not
"memfd_notifier". We're only dealing with memfd's after all ... which
are anonymous files essentially. Or what am I missing? Are there any
other plans for fs than plain memfd support that I am not aware of?

> 
> It will be used for KVM to use a file descriptor as the guest memory
> backing store and KVM will use this memfile_notifier interface to
> interact with memory file subsystems. In the future there might be other
> consumers (e.g. VFIO with encrypted device memory).
> 
> It consists below components:
>  - memfile_backing_store: Each supported memory file subsystem can be
>    implemented as a memory backing store which bookmarks memory and
>    provides callbacks for other kernel systems (memfile_notifier
>    consumers) to interact with.
>  - memfile_notifier: memfile_notifier consumers defines callbacks and
>    associate them to a file using memfile_register_notifier().
>  - memfile_node: A memfile_node is associated with the file (inode) from
>    the backing store and includes feature flags and a list of registered
>    memfile_notifier for notifying.
> 
> In KVM usages, userspace is in charge of guest memory lifecycle: it first
> allocates pages in memory backing store and then passes the fd to KVM and
> lets KVM register memory slot to memory backing store via
> memfile_register_notifier.

Can we add documentation/description in any form how the different
functions exposed in linux/memfile_notifier.h are supposed to be used?

Staring at memfile_node_set_flags() and memfile_notifier_invalidate()
it's not immediately clear to me who's supposed to call that and under
which conditions.
Chao Peng Aug. 10, 2022, 9:22 a.m. UTC | #2
On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > This patch introduces memfile_notifier facility so existing memory file
> > subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a
> > third kernel component to make use of memory bookmarked in the memory
> > file and gets notified when the pages in the memory file become
> > invalidated.
> 
> Stupid question, but why is this called "memfile_notifier" and not
> "memfd_notifier". We're only dealing with memfd's after all ... which
> are anonymous files essentially. Or what am I missing? Are there any
> other plans for fs than plain memfd support that I am not aware of?

There were some discussions on this in v3.
  https://lkml.org/lkml/2021/12/28/484
Sean commented it's OK to abstract it from memfd but he also wants the
kAPI (name) should not bind to memfd to make room for future non-memfd
usages.

> 
> > 
> > It will be used for KVM to use a file descriptor as the guest memory
> > backing store and KVM will use this memfile_notifier interface to
> > interact with memory file subsystems. In the future there might be other
> > consumers (e.g. VFIO with encrypted device memory).
> > 
> > It consists below components:
> >  - memfile_backing_store: Each supported memory file subsystem can be
> >    implemented as a memory backing store which bookmarks memory and
> >    provides callbacks for other kernel systems (memfile_notifier
> >    consumers) to interact with.
> >  - memfile_notifier: memfile_notifier consumers defines callbacks and
> >    associate them to a file using memfile_register_notifier().
> >  - memfile_node: A memfile_node is associated with the file (inode) from
> >    the backing store and includes feature flags and a list of registered
> >    memfile_notifier for notifying.
> > 
> > In KVM usages, userspace is in charge of guest memory lifecycle: it first
> > allocates pages in memory backing store and then passes the fd to KVM and
> > lets KVM register memory slot to memory backing store via
> > memfile_register_notifier.
> 
> Can we add documentation/description in any form how the different
> functions exposed in linux/memfile_notifier.h are supposed to be used?

Yeah, code comments can be added.

> 
> Staring at memfile_node_set_flags() and memfile_notifier_invalidate()
> it's not immediately clear to me who's supposed to call that and under
> which conditions.

I will also amend the commit message.

Chao
> 
> -- 
> Thanks,
> 
> David / dhildenb
David Hildenbrand Aug. 10, 2022, 10:05 a.m. UTC | #3
On 10.08.22 11:22, Chao Peng wrote:
> On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote:
>> On 06.07.22 10:20, Chao Peng wrote:
>>> This patch introduces memfile_notifier facility so existing memory file
>>> subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a
>>> third kernel component to make use of memory bookmarked in the memory
>>> file and gets notified when the pages in the memory file become
>>> invalidated.
>>
>> Stupid question, but why is this called "memfile_notifier" and not
>> "memfd_notifier". We're only dealing with memfd's after all ... which
>> are anonymous files essentially. Or what am I missing? Are there any
>> other plans for fs than plain memfd support that I am not aware of?
> 
> There were some discussions on this in v3.
>   https://lkml.org/lkml/2021/12/28/484
> Sean commented it's OK to abstract it from memfd but he also wants the
> kAPI (name) should not bind to memfd to make room for future non-memfd
> usages.

Sorry, but how is "memfile" any better? memfd abstracted to memfile?! :)

I understand Sean's suggestion about abstracting, but if the new name
makes it harder to grasp and there isn't really an alternative to memfd
in sight, I'm not so sure I enjoy the tried abstraction here.

Otherwise we'd have to get creative now and discuss something like
"file_population_notifer" or "mapping_population_notifer" and I am not
sure that our time is well spent doing so right now.

... as this is kernel-internal, we can always adjust the name as we
please later, once we *actually* now what the abstraction should be.
Until then I'd suggest to KIS and soft-glue this to memfd.

Or am I missing something important?
Sean Christopherson Aug. 10, 2022, 2:38 p.m. UTC | #4
+Will

On Wed, Aug 10, 2022, David Hildenbrand wrote:
> On 10.08.22 11:22, Chao Peng wrote:
> > On Fri, Aug 05, 2022 at 03:22:58PM +0200, David Hildenbrand wrote:
> >> On 06.07.22 10:20, Chao Peng wrote:
> >>> This patch introduces memfile_notifier facility so existing memory file
> >>> subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to allow a
> >>> third kernel component to make use of memory bookmarked in the memory
> >>> file and gets notified when the pages in the memory file become
> >>> invalidated.
> >>
> >> Stupid question, but why is this called "memfile_notifier" and not
> >> "memfd_notifier". We're only dealing with memfd's after all ... which
> >> are anonymous files essentially. Or what am I missing? Are there any
> >> other plans for fs than plain memfd support that I am not aware of?
> > 
> > There were some discussions on this in v3.
> >   https://lkml.org/lkml/2021/12/28/484
> > Sean commented it's OK to abstract it from memfd but he also wants the
> > kAPI (name) should not bind to memfd to make room for future non-memfd
> > usages.
> 
> Sorry, but how is "memfile" any better? memfd abstracted to memfile?! :)

FWIW, I don't really like the memfile name either.

> I understand Sean's suggestion about abstracting, but if the new name
> makes it harder to grasp and there isn't really an alternative to memfd
> in sight, I'm not so sure I enjoy the tried abstraction here.

ARM's pKVM implementation is potentially (hopefully) going to switch to this API
(as a consumer) sooner than later.  If they anticipate being able to use memfd,
then there's unlikely to be a second backing type any time soon.

Quentin, Will?
 
> Otherwise we'd have to get creative now and discuss something like
> "file_population_notifer" or "mapping_population_notifer" and I am not
> sure that our time is well spent doing so right now.
> 
> ... as this is kernel-internal, we can always adjust the name as we
> please later, once we *actually* now what the abstraction should be.
> Until then I'd suggest to KIS and soft-glue this to memfd.
> 
> Or am I missing something important?

I don't think you're missing anything.  I'd still prefer a name that doesn't couple
KVM to memfd, but it's not a sticking point, and I've never been able to come up
with a better name...

With a little bit of cleverness I think we can keep the coupling in KVM to a
minimum, which is what I really care about.
Quentin Perret Aug. 11, 2022, 12:27 p.m. UTC | #5
+CC Fuad

On Wednesday 10 Aug 2022 at 14:38:43 (+0000), Sean Christopherson wrote:
> > I understand Sean's suggestion about abstracting, but if the new name
> > makes it harder to grasp and there isn't really an alternative to memfd
> > in sight, I'm not so sure I enjoy the tried abstraction here.
> 
> ARM's pKVM implementation is potentially (hopefully) going to switch to this API
> (as a consumer) sooner than later.  If they anticipate being able to use memfd,
> then there's unlikely to be a second backing type any time soon.
> 
> Quentin, Will?

Yep, Fuad is currently trying to port the pKVM mm stuff on top of this
series to see how well it fits, so stay tuned. I think there is still
some room for discussion around page conversions (private->shared etc),
and we'll need a clearer idea of what the code might look like to have a
constructive discussion, but so far it does seem like using a memfd (the
new private one or perhaps just memfd_secret, to be discussed) + memfd
notifiers is a promising option.
Chao Peng Aug. 11, 2022, 1:39 p.m. UTC | #6
On Thu, Aug 11, 2022 at 12:27:56PM +0000, Quentin Perret wrote:
> +CC Fuad
> 
> On Wednesday 10 Aug 2022 at 14:38:43 (+0000), Sean Christopherson wrote:
> > > I understand Sean's suggestion about abstracting, but if the new name
> > > makes it harder to grasp and there isn't really an alternative to memfd
> > > in sight, I'm not so sure I enjoy the tried abstraction here.
> > 
> > ARM's pKVM implementation is potentially (hopefully) going to switch to this API
> > (as a consumer) sooner than later.  If they anticipate being able to use memfd,
> > then there's unlikely to be a second backing type any time soon.
> > 
> > Quentin, Will?
> 
> Yep, Fuad is currently trying to port the pKVM mm stuff on top of this
> series to see how well it fits, so stay tuned.

Good to hear that.

>I think there is still
> some room for discussion around page conversions (private->shared etc),
> and we'll need a clearer idea of what the code might look like to have a
> constructive discussion,

That's fine. Looking forward to your feedbacks.

>but so far it does seem like using a memfd (the
> new private one or perhaps just memfd_secret, to be discussed) + memfd
> notifiers is a promising option.

If it still memfd (even memfd_secret), maybe we can use the name
memfd_notifier?

Chao
diff mbox series

Patch

diff --git a/include/linux/memfile_notifier.h b/include/linux/memfile_notifier.h
new file mode 100644
index 000000000000..c5d66fd8ba53
--- /dev/null
+++ b/include/linux/memfile_notifier.h
@@ -0,0 +1,93 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMFILE_NOTIFIER_H
+#define _LINUX_MEMFILE_NOTIFIER_H
+
+#include <linux/pfn_t.h>
+#include <linux/rculist.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/fs.h>
+
+/* memory in the file is inaccessible from userspace (e.g. read/write/mmap) */
+#define MEMFILE_F_USER_INACCESSIBLE	BIT(0)
+/* memory in the file is unmovable (e.g. via pagemigration)*/
+#define MEMFILE_F_UNMOVABLE		BIT(1)
+/* memory in the file is unreclaimable (e.g. via kswapd) */
+#define MEMFILE_F_UNRECLAIMABLE		BIT(2)
+
+#define MEMFILE_F_ALLOWED_MASK		(MEMFILE_F_USER_INACCESSIBLE | \
+					MEMFILE_F_UNMOVABLE | \
+					MEMFILE_F_UNRECLAIMABLE)
+
+struct memfile_node {
+	struct list_head	notifiers;	/* registered notifiers */
+	unsigned long		flags;		/* MEMFILE_F_* flags */
+};
+
+struct memfile_backing_store {
+	struct list_head list;
+	spinlock_t lock;
+	struct memfile_node* (*lookup_memfile_node)(struct file *file);
+	int (*get_pfn)(struct file *file, pgoff_t offset, pfn_t *pfn,
+		       int *order);
+	void (*put_pfn)(pfn_t pfn);
+};
+
+struct memfile_notifier;
+struct memfile_notifier_ops {
+	void (*invalidate)(struct memfile_notifier *notifier,
+			   pgoff_t start, pgoff_t end);
+};
+
+struct memfile_notifier {
+	struct list_head list;
+	struct memfile_notifier_ops *ops;
+	struct memfile_backing_store *bs;
+};
+
+static inline void memfile_node_init(struct memfile_node *node)
+{
+	INIT_LIST_HEAD(&node->notifiers);
+	node->flags = 0;
+}
+
+#ifdef CONFIG_MEMFILE_NOTIFIER
+/* APIs for backing stores */
+extern void memfile_register_backing_store(struct memfile_backing_store *bs);
+extern int memfile_node_set_flags(struct file *file, unsigned long flags);
+extern void memfile_notifier_invalidate(struct memfile_node *node,
+					pgoff_t start, pgoff_t end);
+/*APIs for notifier consumers */
+extern int memfile_register_notifier(struct file *file, unsigned long flags,
+				     struct memfile_notifier *notifier);
+extern void memfile_unregister_notifier(struct memfile_notifier *notifier);
+
+#else /* !CONFIG_MEMFILE_NOTIFIER */
+static inline void memfile_register_backing_store(struct memfile_backing_store *bs)
+{
+}
+
+static inline int memfile_node_set_flags(struct file *file, unsigned long flags)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void memfile_notifier_invalidate(struct memfile_node *node,
+					       pgoff_t start, pgoff_t end)
+{
+}
+
+static inline int memfile_register_notifier(struct file *file,
+					    unsigned long flags,
+					    struct memfile_notifier *notifier)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void memfile_unregister_notifier(struct memfile_notifier *notifier)
+{
+}
+
+#endif /* CONFIG_MEMFILE_NOTIFIER */
+
+#endif /* _LINUX_MEMFILE_NOTIFIER_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 169e64192e48..19ab9350f5cb 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1130,6 +1130,10 @@  config PTE_MARKER_UFFD_WP
 	  purposes.  It is required to enable userfaultfd write protection on
 	  file-backed memory types like shmem and hugetlbfs.
 
+config MEMFILE_NOTIFIER
+	bool
+	select SRCU
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 6f9ffa968a1a..b7e3fb5fa85b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -133,3 +133,4 @@  obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
 obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
+obj-$(CONFIG_MEMFILE_NOTIFIER) += memfile_notifier.o
diff --git a/mm/memfile_notifier.c b/mm/memfile_notifier.c
new file mode 100644
index 000000000000..799d3197903e
--- /dev/null
+++ b/mm/memfile_notifier.c
@@ -0,0 +1,121 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Copyright (C) 2022  Intel Corporation.
+ *             Chao Peng <chao.p.peng@linux.intel.com>
+ */
+
+#include <linux/memfile_notifier.h>
+#include <linux/pagemap.h>
+#include <linux/srcu.h>
+
+DEFINE_STATIC_SRCU(memfile_srcu);
+static __ro_after_init LIST_HEAD(backing_store_list);
+
+
+void memfile_notifier_invalidate(struct memfile_node *node,
+				 pgoff_t start, pgoff_t end)
+{
+	struct memfile_notifier *notifier;
+	int id;
+
+	id = srcu_read_lock(&memfile_srcu);
+	list_for_each_entry_srcu(notifier, &node->notifiers, list,
+				 srcu_read_lock_held(&memfile_srcu)) {
+		if (notifier->ops->invalidate)
+			notifier->ops->invalidate(notifier, start, end);
+	}
+	srcu_read_unlock(&memfile_srcu, id);
+}
+
+void __init memfile_register_backing_store(struct memfile_backing_store *bs)
+{
+	spin_lock_init(&bs->lock);
+	list_add_tail(&bs->list, &backing_store_list);
+}
+
+static void memfile_node_update_flags(struct file *file, unsigned long flags)
+{
+	struct address_space *mapping = file_inode(file)->i_mapping;
+	gfp_t gfp;
+
+	gfp = mapping_gfp_mask(mapping);
+	if (flags & MEMFILE_F_UNMOVABLE)
+		gfp &= ~__GFP_MOVABLE;
+	else
+		gfp |= __GFP_MOVABLE;
+	mapping_set_gfp_mask(mapping, gfp);
+
+	if (flags & MEMFILE_F_UNRECLAIMABLE)
+		mapping_set_unevictable(mapping);
+	else
+		mapping_clear_unevictable(mapping);
+}
+
+int memfile_node_set_flags(struct file *file, unsigned long flags)
+{
+	struct memfile_backing_store *bs;
+	struct memfile_node *node;
+
+	if (flags & ~MEMFILE_F_ALLOWED_MASK)
+		return -EINVAL;
+
+	list_for_each_entry(bs, &backing_store_list, list) {
+		node = bs->lookup_memfile_node(file);
+		if (node) {
+			spin_lock(&bs->lock);
+			node->flags = flags;
+			spin_unlock(&bs->lock);
+			memfile_node_update_flags(file, flags);
+			return 0;
+		}
+	}
+
+	return -EOPNOTSUPP;
+}
+
+int memfile_register_notifier(struct file *file, unsigned long flags,
+			      struct memfile_notifier *notifier)
+{
+	struct memfile_backing_store *bs;
+	struct memfile_node *node;
+	struct list_head *list;
+
+	if (!file || !notifier || !notifier->ops)
+		return -EINVAL;
+	if (flags & ~MEMFILE_F_ALLOWED_MASK)
+		return -EINVAL;
+
+	list_for_each_entry(bs, &backing_store_list, list) {
+		node = bs->lookup_memfile_node(file);
+		if (node) {
+			list = &node->notifiers;
+			notifier->bs = bs;
+
+			spin_lock(&bs->lock);
+			if (list_empty(list))
+				node->flags = flags;
+			else if (node->flags ^ flags) {
+				spin_unlock(&bs->lock);
+				return -EINVAL;
+			}
+
+			list_add_rcu(&notifier->list, list);
+			spin_unlock(&bs->lock);
+			memfile_node_update_flags(file, flags);
+			return 0;
+		}
+	}
+
+	return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(memfile_register_notifier);
+
+void memfile_unregister_notifier(struct memfile_notifier *notifier)
+{
+	spin_lock(&notifier->bs->lock);
+	list_del_rcu(&notifier->list);
+	spin_unlock(&notifier->bs->lock);
+
+	synchronize_srcu(&memfile_srcu);
+}
+EXPORT_SYMBOL_GPL(memfile_unregister_notifier);