From patchwork Mon Aug 5 09:32:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753341 Received: from smtp-fw-52003.amazon.com (smtp-fw-52003.amazon.com [52.119.213.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 127D5149000; Mon, 5 Aug 2024 09:33:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850405; cv=none; b=m2M85Encghc2rW9Av8FSlG5fxlEX1lirKS741Y4rIGA6/e6gR1ocHYcjF5jvvCxWxYn6tOl4u7iwgyeMjy/9i1oxCXkBjNxb+GtqUD/uTGmpGYk+/2PcJfiVy+Is4geIdtZBW0aTC9cW1OaCjk3i5tTLT/SoAHhjM2Dot0Dskho= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850405; c=relaxed/simple; bh=0bFYaHSa+Vz9oSFvO03FVk4NUWF0Y2/7z97wmzfwMmE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=q0x6ZFE9Icu1eBBNhBh3dCQOhpzFL2V9bKiwmB6ANXPWNdPfV7owSe05rW/5mxpgZarLfS0d5dem5PerKIUfCNlfZUfEpUwrQhmis4iDcZg8ATWfQ+/hRpPBTa0BHB27cvO1bQSxUj5P+e9u8DQnGFct2lcBTqVW/QWlWxGI/bs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=hQOus4z7; arc=none smtp.client-ip=52.119.213.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="hQOus4z7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850403; x=1754386403; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CIMC4dBHyqkF/IJtbOCSyvit7T6FcLaiifrH1Oqev8I=; b=hQOus4z7PWl70BuQy228FpjLbq2l+y3TMJEdoZWRKOcQpMCXWq/sCM0r DdjTFak8h0tyrIASJBFf17/QibRnHZo6pjmQ9qZ2Nlo6+pTucIQufHybN 3/bDFS0w1hFmWrqp3cbRpabpinAXTkj6j6GcaAPY6X98pZEqfQfmKeqQZ M=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="16963930" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52003.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:33:20 +0000 Received: from EX19MTAEUA002.ant.amazon.com [10.0.43.254:34036] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.35.73:2525] with esmtp (Farcaster) id 9a93516f-30d6-4fa2-8ce9-a895a0a56cc5; Mon, 5 Aug 2024 09:33:18 +0000 (UTC) X-Farcaster-Flow-ID: 9a93516f-30d6-4fa2-8ce9-a895a0a56cc5 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUA002.ant.amazon.com (10.252.50.126) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:33:18 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:33:09 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 01/10] guestmemfs: Introduce filesystem skeleton Date: Mon, 5 Aug 2024 11:32:36 +0200 Message-ID: <20240805093245.889357-2-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D033UWC004.ant.amazon.com (10.13.139.225) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Add an in-memory filesystem: guestmemfs. Memory is donated to guestmemfs by carving it out of the normal System RAM range with the memmap= cmdline parameter and then giving that same physical range to guestmemfs with the guestmemfs= cmdline parameter. A new filesystem is added; so far it doesn't do much except persist a super block at the start of the donated memory and allows itself to be mounted. A hook to x86 mm init is added to reserve the memory really early on via memblock allocator. There is probably a better arch-independent place to do this... Signed-off-by: James Gowans --- arch/x86/mm/init_64.c | 2 + fs/Kconfig | 1 + fs/Makefile | 1 + fs/guestmemfs/Kconfig | 11 ++++ fs/guestmemfs/Makefile | 6 ++ fs/guestmemfs/guestmemfs.c | 116 +++++++++++++++++++++++++++++++++++++ fs/guestmemfs/guestmemfs.h | 9 +++ include/linux/guestmemfs.h | 16 +++++ 8 files changed, 162 insertions(+) create mode 100644 fs/guestmemfs/Kconfig create mode 100644 fs/guestmemfs/Makefile create mode 100644 fs/guestmemfs/guestmemfs.c create mode 100644 fs/guestmemfs/guestmemfs.h create mode 100644 include/linux/guestmemfs.h diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 8932ba8f5cdd..39fcf017c90c 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -1331,6 +1332,7 @@ static void __init preallocate_vmalloc_pages(void) void __init mem_init(void) { + guestmemfs_reserve_mem(); pci_iommu_alloc(); /* clear_bss() already clear the empty_zero_page */ diff --git a/fs/Kconfig b/fs/Kconfig index a46b0cbc4d8f..727359901da8 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -321,6 +321,7 @@ source "fs/befs/Kconfig" source "fs/bfs/Kconfig" source "fs/efs/Kconfig" source "fs/jffs2/Kconfig" +source "fs/guestmemfs/Kconfig" # UBIFS File system configuration source "fs/ubifs/Kconfig" source "fs/cramfs/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 6ecc9b0a53f2..044524b17d63 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS) += efivarfs/ obj-$(CONFIG_EROFS_FS) += erofs/ obj-$(CONFIG_VBOXSF_FS) += vboxsf/ obj-$(CONFIG_ZONEFS_FS) += zonefs/ +obj-$(CONFIG_GUESTMEMFS_FS) += guestmemfs/ diff --git a/fs/guestmemfs/Kconfig b/fs/guestmemfs/Kconfig new file mode 100644 index 000000000000..d87fca4822cb --- /dev/null +++ b/fs/guestmemfs/Kconfig @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config GUESTMEMFS_FS + bool "Persistent Guest memory filesystem (guestmemfs)" + help + An in-memory filesystem on top of reserved memory specified via + guestmemfs= cmdline argument. Used for storing kernel state and + userspace memory which is preserved across kexec to support + live update of a hypervisor when running guest virtual machines. + Select this if you need the ability to persist memory for guest VMs + across kexec to do live update. diff --git a/fs/guestmemfs/Makefile b/fs/guestmemfs/Makefile new file mode 100644 index 000000000000..6dc820a9d4fe --- /dev/null +++ b/fs/guestmemfs/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# Makefile for persistent kernel filesystem +# + +obj-y += guestmemfs.o diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c new file mode 100644 index 000000000000..3aaada1b8df6 --- /dev/null +++ b/fs/guestmemfs/guestmemfs.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" +#include +#include +#include +#include +#include +#include +#include + +static phys_addr_t guestmemfs_base, guestmemfs_size; +struct guestmemfs_sb *psb; + +static int statfs(struct dentry *root, struct kstatfs *buf) +{ + simple_statfs(root, buf); + buf->f_bsize = PMD_SIZE; + buf->f_blocks = guestmemfs_size / PMD_SIZE; + buf->f_bfree = buf->f_bavail = buf->f_blocks; + return 0; +} + +static const struct super_operations guestmemfs_super_ops = { + .statfs = statfs, +}; + +static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct inode *inode; + struct dentry *dentry; + + psb = kzalloc(sizeof(*psb), GFP_KERNEL); + /* + * Keep a reference to the persistent super block in the + * ephemeral super block. + */ + sb->s_fs_info = psb; + sb->s_op = &guestmemfs_super_ops; + + inode = new_inode(sb); + if (!inode) + return -ENOMEM; + + inode->i_ino = 1; + inode->i_mode = S_IFDIR; + inode->i_op = &simple_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + simple_inode_init_ts(inode); + /* directory inodes start off with i_nlink == 2 (for "." entry) */ + inc_nlink(inode); + + dentry = d_make_root(inode); + if (!dentry) + return -ENOMEM; + sb->s_root = dentry; + + return 0; +} + +static int guestmemfs_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, guestmemfs_fill_super); +} + +static const struct fs_context_operations guestmemfs_context_ops = { + .get_tree = guestmemfs_get_tree, +}; + +static int guestmemfs_init_fs_context(struct fs_context *const fc) +{ + fc->ops = &guestmemfs_context_ops; + return 0; +} + +static struct file_system_type guestmemfs_fs_type = { + .owner = THIS_MODULE, + .name = "guestmemfs", + .init_fs_context = guestmemfs_init_fs_context, + .kill_sb = kill_litter_super, + .fs_flags = FS_USERNS_MOUNT, +}; + +static int __init guestmemfs_init(void) +{ + int ret; + + ret = register_filesystem(&guestmemfs_fs_type); + return ret; +} + +/** + * Format: guestmemfs=: + * Just like: memmap=nn[KMG]!ss[KMG] + */ +static int __init parse_guestmemfs_extents(char *p) +{ + guestmemfs_size = memparse(p, &p); + return 0; +} + +early_param("guestmemfs", parse_guestmemfs_extents); + +void __init guestmemfs_reserve_mem(void) +{ + guestmemfs_base = memblock_phys_alloc(guestmemfs_size, 4 << 10); + if (guestmemfs_base) { + memblock_reserved_mark_noinit(guestmemfs_base, guestmemfs_size); + memblock_mark_nomap(guestmemfs_base, guestmemfs_size); + } else { + pr_warn("Failed to alloc %llu bytes for guestmemfs\n", guestmemfs_size); + } +} + +MODULE_ALIAS_FS("guestmemfs"); +module_init(guestmemfs_init); diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h new file mode 100644 index 000000000000..37d8cf630e0a --- /dev/null +++ b/fs/guestmemfs/guestmemfs.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#define pr_fmt(fmt) "guestmemfs: " KBUILD_MODNAME ": " fmt + +#include + +struct guestmemfs_sb { + /* Will be populated soon... */ +}; diff --git a/include/linux/guestmemfs.h b/include/linux/guestmemfs.h new file mode 100644 index 000000000000..60e769c8e533 --- /dev/null +++ b/include/linux/guestmemfs.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: MIT */ + +#ifndef _LINUX_GUESTMEMFS_H +#define _LINUX_GUESTMEMFS_H + +/* + * Carves out chunks of memory from memblocks for guestmemfs. + * Must be called in early boot before memblocks are freed. + */ +# ifdef CONFIG_GUESTMEMFS_FS +void guestmemfs_reserve_mem(void); +#else +void guestmemfs_reserve_mem(void) { } +#endif + +#endif From patchwork Mon Aug 5 09:32:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753342 Received: from smtp-fw-52005.amazon.com (smtp-fw-52005.amazon.com [52.119.213.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 808B114F9D7; Mon, 5 Aug 2024 09:33:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850415; cv=none; b=dOS0NND58M0OGoJZDUhnP2ZIxW6z7otatCv4jm71rDLCNkOm4FmrHtzIBIcLu/qcHjnh0fu4iQxd/5H+0pHlKIDB2TPjPIsgBvJJCo5MJDaLUSMHEcf6xQ1mL21VXY06fL4N518ZCTCpog/Ah5bB5zkKMQbza8fZGKM+MMa62bI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850415; c=relaxed/simple; bh=nQ1WLF14Y/vRRp7JSxyU66cbzjWNAiG3Po4/qQvOwtM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Q1pINjF2JqkmUaLMEgk2XfOqlgN6px4ss9EUhNsqQ6o2Z9P114IeBeOE2Mf59hB29i2J/RpjpL5o40IgJ18tH5EVztaCclSddzcDi+cqlyUSGi55Y+fJkwp801SHcWJUzRbA2PrsH6NpDkXPlnRBWoGT2IwVfSOKf/IQQSsw+uw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=AeKvkmuC; arc=none smtp.client-ip=52.119.213.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="AeKvkmuC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850414; x=1754386414; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HjMwVBXj1g+VXvnThIAFzLofQdSvF983VdApn1VgvTs=; b=AeKvkmuCoABgBHBi6DsQsnzFeX4RcWNcdbUnSIAHz/lDMN5Ded8t8wqV RywxuNRXkrlOw+iDJOIB0DJQH9qkMO/K0WZag/gl06OlRiHN013NanEcp S0KaRwb08n/2k0u6hF2hZZ+V/vR9OceAXZDV/kVSSQ0tfnHNkWbCxol5w k=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="672010836" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52005.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:33:29 +0000 Received: from EX19MTAEUB001.ant.amazon.com [10.0.10.100:16269] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.35.73:2525] with esmtp (Farcaster) id 2fb10915-ae34-4c72-8260-ae60945f471a; Mon, 5 Aug 2024 09:33:28 +0000 (UTC) X-Farcaster-Flow-ID: 2fb10915-ae34-4c72-8260-ae60945f471a Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB001.ant.amazon.com (10.252.51.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:33:28 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:33:19 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 02/10] guestmemfs: add inode store, files and dirs Date: Mon, 5 Aug 2024 11:32:37 +0200 Message-ID: <20240805093245.889357-3-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D033UWC004.ant.amazon.com (10.13.139.225) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Here inodes are added to the filesystem: inodes for both regular files and directories. This involes supporting the callbacks to create inodes in a directory, as well as being able to list the contents of a directory and lookup and inode by name. The inode store is implemented as a 2 MiB page which is an array of struct guestmemfs_inode. The reason to have a large allocation and put them all in a big flat array is to make persistence easy: when it's time to introduce persistence to the filesystem it will need to persist this one big chunk of inodes across kexec using KHO. Free inodes in the page form a slab type structure, the first free inode pointing to the next free inode, etc. The super block points to the first free, so allocating involves popping the head, and freeing an inode involves pushing a new head. Directories point to the first inode in the directory via a child_inode reference. Subsequent inodes within the same directory are pointed to via a sibling_inode member. Essentially forming a linked list of inodes within the directory. Looking up an inode in a directory involves traversing the sibling_inode linked list until one with a matching name is found. Filesystem stats are updated to account for total and allocated inodes. Signed-off-by: James Gowans --- fs/guestmemfs/Makefile | 2 +- fs/guestmemfs/dir.c | 43 ++++++++++ fs/guestmemfs/guestmemfs.c | 21 ++++- fs/guestmemfs/guestmemfs.h | 36 +++++++- fs/guestmemfs/inode.c | 164 +++++++++++++++++++++++++++++++++++++ 5 files changed, 260 insertions(+), 6 deletions(-) create mode 100644 fs/guestmemfs/dir.c create mode 100644 fs/guestmemfs/inode.c diff --git a/fs/guestmemfs/Makefile b/fs/guestmemfs/Makefile index 6dc820a9d4fe..804997799ce8 100644 --- a/fs/guestmemfs/Makefile +++ b/fs/guestmemfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-y += guestmemfs.o +obj-y += guestmemfs.o inode.o dir.o diff --git a/fs/guestmemfs/dir.c b/fs/guestmemfs/dir.c new file mode 100644 index 000000000000..4acd81421c85 --- /dev/null +++ b/fs/guestmemfs/dir.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" + +static int guestmemfs_dir_iterate(struct file *dir, struct dir_context *ctx) +{ + struct guestmemfs_inode *guestmemfs_inode; + struct super_block *sb = dir->f_inode->i_sb; + + /* Indication from previous invoke that there's no more to iterate. */ + if (ctx->pos == -1) + return 0; + + if (!dir_emit_dots(dir, ctx)) + return 0; + + /* + * Just emitted this dir; go to dir contents. Use pos to smuggle + * the next inode number to emit across iterations. + * -1 indicates no valid inode. Can't use 0 because first loop has pos=0 + */ + if (ctx->pos == 2) { + ctx->pos = guestmemfs_get_persisted_inode(sb, dir->f_inode->i_ino)->child_ino; + /* Empty dir case. */ + if (ctx->pos == 0) + ctx->pos = -1; + } + + while (ctx->pos > 1) { + guestmemfs_inode = guestmemfs_get_persisted_inode(sb, ctx->pos); + dir_emit(ctx, guestmemfs_inode->filename, GUESTMEMFS_FILENAME_LEN, + ctx->pos, DT_UNKNOWN); + ctx->pos = guestmemfs_inode->sibling_ino; + if (!ctx->pos) + ctx->pos = -1; + } + return 0; +} + +const struct file_operations guestmemfs_dir_fops = { + .owner = THIS_MODULE, + .iterate_shared = guestmemfs_dir_iterate, +}; diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c index 3aaada1b8df6..21cb3490a2bd 100644 --- a/fs/guestmemfs/guestmemfs.c +++ b/fs/guestmemfs/guestmemfs.c @@ -18,6 +18,9 @@ static int statfs(struct dentry *root, struct kstatfs *buf) buf->f_bsize = PMD_SIZE; buf->f_blocks = guestmemfs_size / PMD_SIZE; buf->f_bfree = buf->f_bavail = buf->f_blocks; + buf->f_files = PMD_SIZE / sizeof(struct guestmemfs_inode); + buf->f_ffree = buf->f_files - + GUESTMEMFS_PSB(root->d_sb)->allocated_inodes; return 0; } @@ -31,24 +34,34 @@ static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) struct dentry *dentry; psb = kzalloc(sizeof(*psb), GFP_KERNEL); + psb->inodes = kzalloc(2 << 20, GFP_KERNEL); + if (!psb->inodes) + return -ENOMEM; + /* * Keep a reference to the persistent super block in the * ephemeral super block. */ sb->s_fs_info = psb; + spin_lock_init(&psb->allocation_lock); + guestmemfs_initialise_inode_store(sb); + guestmemfs_get_persisted_inode(sb, 1)->flags = GUESTMEMFS_INODE_FLAG_DIR; + strscpy(guestmemfs_get_persisted_inode(sb, 1)->filename, ".", + GUESTMEMFS_FILENAME_LEN); + psb->next_free_ino = 2; + sb->s_op = &guestmemfs_super_ops; - inode = new_inode(sb); + inode = guestmemfs_inode_get(sb, 1); if (!inode) return -ENOMEM; - inode->i_ino = 1; inode->i_mode = S_IFDIR; - inode->i_op = &simple_dir_inode_operations; - inode->i_fop = &simple_dir_operations; + inode->i_fop = &guestmemfs_dir_fops; simple_inode_init_ts(inode); /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); + inode_init_owner(&nop_mnt_idmap, inode, NULL, inode->i_mode); dentry = d_make_root(inode); if (!dentry) diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index 37d8cf630e0a..3a2954d1beec 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -3,7 +3,41 @@ #define pr_fmt(fmt) "guestmemfs: " KBUILD_MODNAME ": " fmt #include +#include + +#define GUESTMEMFS_FILENAME_LEN 255 +#define GUESTMEMFS_PSB(sb) ((struct guestmemfs_sb *)sb->s_fs_info) struct guestmemfs_sb { - /* Will be populated soon... */ + /* Inode number */ + unsigned long next_free_ino; + unsigned long allocated_inodes; + struct guestmemfs_inode *inodes; + spinlock_t allocation_lock; +}; + +// If neither of these are set the inode is not in use. +#define GUESTMEMFS_INODE_FLAG_FILE (1 << 0) +#define GUESTMEMFS_INODE_FLAG_DIR (1 << 1) +struct guestmemfs_inode { + int flags; + /* + * Points to next inode in the same directory, or + * 0 if last file in directory. + */ + unsigned long sibling_ino; + /* + * If this inode is a directory, this points to the + * first inode *in* that directory. + */ + unsigned long child_ino; + char filename[GUESTMEMFS_FILENAME_LEN]; + void *mappings; + int num_mappings; }; + +void guestmemfs_initialise_inode_store(struct super_block *sb); +struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino); +struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, int ino); + +extern const struct file_operations guestmemfs_dir_fops; diff --git a/fs/guestmemfs/inode.c b/fs/guestmemfs/inode.c new file mode 100644 index 000000000000..2360c3a4857d --- /dev/null +++ b/fs/guestmemfs/inode.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" +#include + +const struct inode_operations guestmemfs_dir_inode_operations; + +struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, int ino) +{ + /* + * Inode index starts at 1, so -1 to get memory index. + */ + return GUESTMEMFS_PSB(sb)->inodes + ino - 1; +} + +struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino) +{ + struct inode *inode = iget_locked(sb, ino); + + /* If this inode is cached it is already populated; just return */ + if (!(inode->i_state & I_NEW)) + return inode; + inode->i_op = &guestmemfs_dir_inode_operations; + inode->i_sb = sb; + inode->i_mode = S_IFREG; + unlock_new_inode(inode); + return inode; +} + +static unsigned long guestmemfs_allocate_inode(struct super_block *sb) +{ + + unsigned long next_free_ino = -ENOMEM; + struct guestmemfs_sb *psb = GUESTMEMFS_PSB(sb); + + spin_lock(&psb->allocation_lock); + next_free_ino = psb->next_free_ino; + psb->allocated_inodes += 1; + if (!next_free_ino) + goto out; + psb->next_free_ino = + guestmemfs_get_persisted_inode(sb, next_free_ino)->sibling_ino; +out: + spin_unlock(&psb->allocation_lock); + return next_free_ino; +} + +/* + * Zeroes the inode and makes it the head of the free list. + */ +static void guestmemfs_free_inode(struct super_block *sb, unsigned long ino) +{ + struct guestmemfs_sb *psb = GUESTMEMFS_PSB(sb); + struct guestmemfs_inode *inode = guestmemfs_get_persisted_inode(sb, ino); + + spin_lock(&psb->allocation_lock); + memset(inode, 0, sizeof(struct guestmemfs_inode)); + inode->sibling_ino = psb->next_free_ino; + psb->next_free_ino = ino; + psb->allocated_inodes -= 1; + spin_unlock(&psb->allocation_lock); +} + +/* + * Sets all inodes as free and points each free inode to the next one. + */ +void guestmemfs_initialise_inode_store(struct super_block *sb) +{ + /* Inode store is a PMD sized (ie: 2 MiB) page */ + memset(guestmemfs_get_persisted_inode(sb, 1), 0, PMD_SIZE); + /* Point each inode for the next one; linked-list initialisation. */ + for (unsigned long ino = 2; ino * sizeof(struct guestmemfs_inode) < PMD_SIZE; ino++) + guestmemfs_get_persisted_inode(sb, ino - 1)->sibling_ino = ino; +} + +static int guestmemfs_create(struct mnt_idmap *id, struct inode *dir, + struct dentry *dentry, umode_t mode, bool excl) +{ + unsigned long free_inode; + struct guestmemfs_inode *guestmemfs_inode; + struct inode *vfs_inode; + + free_inode = guestmemfs_allocate_inode(dir->i_sb); + if (free_inode <= 0) + return -ENOMEM; + + guestmemfs_inode = guestmemfs_get_persisted_inode(dir->i_sb, free_inode); + guestmemfs_inode->sibling_ino = + guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino; + guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode; + strscpy(guestmemfs_inode->filename, dentry->d_name.name, GUESTMEMFS_FILENAME_LEN); + guestmemfs_inode->flags = GUESTMEMFS_INODE_FLAG_FILE; + /* TODO: make dynamic */ + guestmemfs_inode->mappings = kzalloc(PAGE_SIZE, GFP_KERNEL); + + vfs_inode = guestmemfs_inode_get(dir->i_sb, free_inode); + d_instantiate(dentry, vfs_inode); + return 0; +} + +static struct dentry *guestmemfs_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags) +{ + struct guestmemfs_inode *guestmemfs_inode; + unsigned long ino; + + guestmemfs_inode = guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino); + ino = guestmemfs_inode->child_ino; + while (ino) { + guestmemfs_inode = guestmemfs_get_persisted_inode(dir->i_sb, ino); + if (!strncmp(guestmemfs_inode->filename, + dentry->d_name.name, + GUESTMEMFS_FILENAME_LEN)) { + d_add(dentry, guestmemfs_inode_get(dir->i_sb, ino)); + break; + } + ino = guestmemfs_inode->sibling_ino; + } + return NULL; +} + +static int guestmemfs_unlink(struct inode *dir, struct dentry *dentry) +{ + unsigned long ino; + struct guestmemfs_inode *inode; + + ino = guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino; + + /* Special case for first file in dir */ + if (ino == dentry->d_inode->i_ino) { + guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = + guestmemfs_get_persisted_inode(dir->i_sb, + dentry->d_inode->i_ino)->sibling_ino; + guestmemfs_free_inode(dir->i_sb, ino); + return 0; + } + + /* + * Although we know exactly the inode to free, because we maintain only + * a singly linked list we need to scan for it to find the previous + * element so it's "next" pointer can be updated. + */ + while (ino) { + inode = guestmemfs_get_persisted_inode(dir->i_sb, ino); + /* We've found the one pointing to the one we want to delete */ + if (inode->sibling_ino == dentry->d_inode->i_ino) { + inode->sibling_ino = + guestmemfs_get_persisted_inode(dir->i_sb, + dentry->d_inode->i_ino)->sibling_ino; + guestmemfs_free_inode(dir->i_sb, dentry->d_inode->i_ino); + break; + } + ino = guestmemfs_get_persisted_inode(dir->i_sb, ino)->sibling_ino; + } + + return 0; +} + +const struct inode_operations guestmemfs_dir_inode_operations = { + .create = guestmemfs_create, + .lookup = guestmemfs_lookup, + .unlink = guestmemfs_unlink, +}; From patchwork Mon Aug 5 09:32:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753343 Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5EFA014D2B3; Mon, 5 Aug 2024 09:34:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.219 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850454; cv=none; b=S1GM8kJ/lYx735IHFBGo3nwRSahDP+s0S/xD49qE2lZx63CdQM2PnjQLCZbUDwtNUo3AiNPmsLUDBQ71Nj3hyzSJ+bn2ZRWwjO6zk97/QV7MJ3YAubrDOvOB38cd19q2lqvs7bMDWcDOsQdyMvd/t8MzEIVS76cNMcNAByDPr/M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850454; c=relaxed/simple; bh=C+2QCWuajYQ/XOlrovRJz5UKrRoJyZS+Th/wsV04b04=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FF+XR5lF29PnrqZDx6LkgarEmP8L0Jo7YElpH5UoSWMvUrC7R76RVgC6DRCp791Fxn9rDi91qxmQ/ptIsmjSYHtVr9VagYexfBs7MlVjWdIzkSxNCA4optqtG307b4OAhDd76Z4MotQG1aLmxQw9CKH8vHcaIpEoDWssI2Jbqro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=K9w/CAIM; arc=none smtp.client-ip=99.78.197.219 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="K9w/CAIM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850453; x=1754386453; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mf4bGqkjJGcuD7WLWM/6xATCnP8HxmWYZb3Uae5eqbU=; b=K9w/CAIMa2st6perqqv1IUClGAN06TjVdr6KQZWOFQ1m1ToMAQ+TPayo jZosfRt3EEA/WnXH8B56n1cETBYJQrpj6D+OVsxtKAFfKE13s3op8lRZp UHWtTzRv1U9tRFiT3QtdAKT0SBQXlZWy46tqI0huQ/bFRjPO28f/eHCj5 Q=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="112401062" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80008.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:34:11 +0000 Received: from EX19MTAEUA001.ant.amazon.com [10.0.43.254:30354] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.25.233:2525] with esmtp (Farcaster) id bfd7aaac-eb93-420b-99d8-e287bbc4fa9e; Mon, 5 Aug 2024 09:34:10 +0000 (UTC) X-Farcaster-Flow-ID: bfd7aaac-eb93-420b-99d8-e287bbc4fa9e Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUA001.ant.amazon.com (10.252.50.192) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:08 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:33:59 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 03/10] guestmemfs: add persistent data block allocator Date: Mon, 5 Aug 2024 11:32:38 +0200 Message-ID: <20240805093245.889357-4-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D044UWB004.ant.amazon.com (10.13.139.134) To EX19D014EUC004.ant.amazon.com (10.252.51.182) In order to assign backing data memory to files there needs to be the ability to allocate blocks of data from the large contiguous reserved memory block of filesystem memory. Here an allocated is added to serve that purpose. For now it's a simple bitmap allocator: each bit corresponds to a 2 MiB chunk in the filesystem data block. On initialisation the bitmap is allocated for a fixed size (TODO: make this dynamic based on filesystem memory size). Allocating a block involves finding and setting the next free bit. Allocations will be done in the next commit which adds support for truncating files. It's quite limiting having a fixed size bitmap, and we perhaps want to look at making this a dynamic and potentially large allocation early in boot using the memblock allocator. It may also turn out that a simple bitmap is too limiting and something with more metadata is needed. Signed-off-by: James Gowans --- fs/guestmemfs/Makefile | 2 +- fs/guestmemfs/allocator.c | 40 ++++++++++++++++++++++++++++++++++++++ fs/guestmemfs/guestmemfs.c | 4 ++++ fs/guestmemfs/guestmemfs.h | 3 +++ 4 files changed, 48 insertions(+), 1 deletion(-) create mode 100644 fs/guestmemfs/allocator.c diff --git a/fs/guestmemfs/Makefile b/fs/guestmemfs/Makefile index 804997799ce8..b357073a60f3 100644 --- a/fs/guestmemfs/Makefile +++ b/fs/guestmemfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-y += guestmemfs.o inode.o dir.o +obj-y += guestmemfs.o inode.o dir.o allocator.o diff --git a/fs/guestmemfs/allocator.c b/fs/guestmemfs/allocator.c new file mode 100644 index 000000000000..3da14d11b60f --- /dev/null +++ b/fs/guestmemfs/allocator.c @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" + +/** + * For allocating blocks from the guestmemfs filesystem. + */ + +static void *guestmemfs_allocations_bitmap(struct super_block *sb) +{ + return GUESTMEMFS_PSB(sb)->allocator_bitmap; +} + +void guestmemfs_zero_allocations(struct super_block *sb) +{ + memset(guestmemfs_allocations_bitmap(sb), 0, (1 << 20)); +} + +/* + * Allocs one 2 MiB block, and returns the block index. + * Index is 2 MiB chunk index. + * Negative error code if unable to alloc. + */ +long guestmemfs_alloc_block(struct super_block *sb) +{ + unsigned long free_bit; + void *allocations_mem = guestmemfs_allocations_bitmap(sb); + + free_bit = bitmap_find_next_zero_area(allocations_mem, + (1 << 20), /* Size */ + 0, /* Start */ + 1, /* Number of zeroed bits to look for */ + 0); /* Alignment mask - none required. */ + + if (free_bit >= PMD_SIZE / 2) + return -ENOMEM; + + bitmap_set(allocations_mem, free_bit, 1); + return free_bit; +} diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c index 21cb3490a2bd..c45c796c497a 100644 --- a/fs/guestmemfs/guestmemfs.c +++ b/fs/guestmemfs/guestmemfs.c @@ -37,6 +37,9 @@ static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) psb->inodes = kzalloc(2 << 20, GFP_KERNEL); if (!psb->inodes) return -ENOMEM; + psb->allocator_bitmap = kzalloc(1 << 20, GFP_KERNEL); + if (!psb->allocator_bitmap) + return -ENOMEM; /* * Keep a reference to the persistent super block in the @@ -45,6 +48,7 @@ static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_fs_info = psb; spin_lock_init(&psb->allocation_lock); guestmemfs_initialise_inode_store(sb); + guestmemfs_zero_allocations(sb); guestmemfs_get_persisted_inode(sb, 1)->flags = GUESTMEMFS_INODE_FLAG_DIR; strscpy(guestmemfs_get_persisted_inode(sb, 1)->filename, ".", GUESTMEMFS_FILENAME_LEN); diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index 3a2954d1beec..af9832390be3 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -13,6 +13,7 @@ struct guestmemfs_sb { unsigned long next_free_ino; unsigned long allocated_inodes; struct guestmemfs_inode *inodes; + void *allocator_bitmap; spinlock_t allocation_lock; }; @@ -37,6 +38,8 @@ struct guestmemfs_inode { }; void guestmemfs_initialise_inode_store(struct super_block *sb); +void guestmemfs_zero_allocations(struct super_block *sb); +long guestmemfs_alloc_block(struct super_block *sb); struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino); struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, int ino); From patchwork Mon Aug 5 09:32:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753344 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5C3A615383D; Mon, 5 Aug 2024 09:34:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.184.29 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850470; cv=none; b=F5GBlaPX72CLnPfsqHl3RDNyqZUKjoM80xsIRWFMBdXjYx7WHicW2WmCotHMeephhxPnZmyCw452JtZ74cJL6FseOdXtbK6F1zweP/TYuF+PoKx5fqQ3TD/SxH072kQgVfOXyW9TvQzlQrzZo7gFK2bi7IBXy2kaNBQPt5/pdyo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850470; c=relaxed/simple; bh=AP2P6uLd9rXrpl3zS13MYDJFQU66oOrwe02O7eR19m0=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NVnXSjTRtK+JdTX6rtcFGKJE37bJQfsBVG1MV6CJDj4x1fAN7sTXgcisnoTXoHFiOayQIIGEzAZm6wM4tIVFQ6yyo7vaWmRuiNMrJ7UIFupAl39Qfp26q/DHe+lkt1H9KU38xjMOe2dBwwMIm1C3pnDipR03ii+kSLb5p/GZVsM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=ZKqObFiz; arc=none smtp.client-ip=207.171.184.29 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="ZKqObFiz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850470; x=1754386470; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rBGc29GJqkanga86wc+CDruFBMWuAWTKAOMI6eWst9M=; b=ZKqObFizk4ScXCUe1MWqPVPs85PgeWoyJP9Ubt6Qpy8nKDJz0yAdafL7 ALMNU61tnDw6ZjI0/YVZyboyDLq4AkpZZvko83CxNlE13m1WE3KHKFq5r JkO8WA7tAT83lQQJVFpR/6crkV1j7x8hza6cJB6tiDPClvZjnoDfsWbR9 Q=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="440962193" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:34:20 +0000 Received: from EX19MTAEUB001.ant.amazon.com [10.0.10.100:16945] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.35.73:2525] with esmtp (Farcaster) id bb9ee3fa-f148-4792-ad18-0aaa1262bc00; Mon, 5 Aug 2024 09:34:18 +0000 (UTC) X-Farcaster-Flow-ID: bb9ee3fa-f148-4792-ad18-0aaa1262bc00 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB001.ant.amazon.com (10.252.51.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:18 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:09 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 04/10] guestmemfs: support file truncation Date: Mon, 5 Aug 2024 11:32:39 +0200 Message-ID: <20240805093245.889357-5-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D044UWB004.ant.amazon.com (10.13.139.134) To EX19D014EUC004.ant.amazon.com (10.252.51.182) In a previous commit a block allocator was added. Now use that block allocator to allocate blocks for files when ftruncate is run on them. To do that a inode_operations is added on the file inodes with a getattr callback handling the ATTR_SIZE attribute. When this is invoked pages are allocated, the indexes of which are put into a mappings block. The mappings block is an array with the index being the file offset block and the value at that index being the pkernfs block backign that file offset. Signed-off-by: James Gowans --- fs/guestmemfs/Makefile | 2 +- fs/guestmemfs/file.c | 52 ++++++++++++++++++++++++++++++++++++++ fs/guestmemfs/guestmemfs.h | 2 ++ fs/guestmemfs/inode.c | 25 +++++++++++++++--- 4 files changed, 77 insertions(+), 4 deletions(-) create mode 100644 fs/guestmemfs/file.c diff --git a/fs/guestmemfs/Makefile b/fs/guestmemfs/Makefile index b357073a60f3..e93e43ba274b 100644 --- a/fs/guestmemfs/Makefile +++ b/fs/guestmemfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-y += guestmemfs.o inode.o dir.o allocator.o +obj-y += guestmemfs.o inode.o dir.o allocator.o file.o diff --git a/fs/guestmemfs/file.c b/fs/guestmemfs/file.c new file mode 100644 index 000000000000..618c93b12196 --- /dev/null +++ b/fs/guestmemfs/file.c @@ -0,0 +1,52 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" + +static int truncate(struct inode *inode, loff_t newsize) +{ + unsigned long free_block; + struct guestmemfs_inode *guestmemfs_inode; + unsigned long *mappings; + + guestmemfs_inode = guestmemfs_get_persisted_inode(inode->i_sb, inode->i_ino); + mappings = guestmemfs_inode->mappings; + i_size_write(inode, newsize); + for (int block_idx = 0; block_idx * PMD_SIZE < newsize; ++block_idx) { + free_block = guestmemfs_alloc_block(inode->i_sb); + if (free_block < 0) + /* TODO: roll back allocations. */ + return -ENOMEM; + *(mappings + block_idx) = free_block; + ++guestmemfs_inode->num_mappings; + } + return 0; +} + +static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) +{ + struct inode *inode = dentry->d_inode; + int error; + + error = setattr_prepare(idmap, dentry, iattr); + if (error) + return error; + + if (iattr->ia_valid & ATTR_SIZE) { + error = truncate(inode, iattr->ia_size); + if (error) + return error; + } + setattr_copy(idmap, inode, iattr); + mark_inode_dirty(inode); + return 0; +} + +const struct inode_operations guestmemfs_file_inode_operations = { + .setattr = inode_setattr, + .getattr = simple_getattr, +}; + +const struct file_operations guestmemfs_file_fops = { + .owner = THIS_MODULE, + .iterate_shared = NULL, +}; diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index af9832390be3..7ea03ac8ecca 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -44,3 +44,5 @@ struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino); struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, int ino); extern const struct file_operations guestmemfs_dir_fops; +extern const struct file_operations guestmemfs_file_fops; +extern const struct inode_operations guestmemfs_file_inode_operations; diff --git a/fs/guestmemfs/inode.c b/fs/guestmemfs/inode.c index 2360c3a4857d..61f70441d82c 100644 --- a/fs/guestmemfs/inode.c +++ b/fs/guestmemfs/inode.c @@ -15,14 +15,28 @@ struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino) { + struct guestmemfs_inode *guestmemfs_inode; struct inode *inode = iget_locked(sb, ino); /* If this inode is cached it is already populated; just return */ if (!(inode->i_state & I_NEW)) return inode; - inode->i_op = &guestmemfs_dir_inode_operations; + guestmemfs_inode = guestmemfs_get_persisted_inode(sb, ino); inode->i_sb = sb; - inode->i_mode = S_IFREG; + + if (guestmemfs_inode->flags & GUESTMEMFS_INODE_FLAG_DIR) { + inode->i_op = &guestmemfs_dir_inode_operations; + inode->i_mode = S_IFDIR; + } else { + inode->i_op = &guestmemfs_file_inode_operations; + inode->i_mode = S_IFREG; + inode->i_fop = &guestmemfs_file_fops; + inode->i_size = guestmemfs_inode->num_mappings * PMD_SIZE; + } + + set_nlink(inode, 1); + + /* Switch based on file type */ unlock_new_inode(inode); return inode; } @@ -103,6 +117,7 @@ static struct dentry *guestmemfs_lookup(struct inode *dir, unsigned int flags) { struct guestmemfs_inode *guestmemfs_inode; + struct inode *vfs_inode; unsigned long ino; guestmemfs_inode = guestmemfs_get_persisted_inode(dir->i_sb, dir->i_ino); @@ -112,7 +127,10 @@ static struct dentry *guestmemfs_lookup(struct inode *dir, if (!strncmp(guestmemfs_inode->filename, dentry->d_name.name, GUESTMEMFS_FILENAME_LEN)) { - d_add(dentry, guestmemfs_inode_get(dir->i_sb, ino)); + vfs_inode = guestmemfs_inode_get(dir->i_sb, ino); + mark_inode_dirty(dir); + inode_update_timestamps(vfs_inode, S_ATIME); + d_add(dentry, vfs_inode); break; } ino = guestmemfs_inode->sibling_ino; @@ -162,3 +180,4 @@ const struct inode_operations guestmemfs_dir_inode_operations = { .lookup = guestmemfs_lookup, .unlink = guestmemfs_unlink, }; + From patchwork Mon Aug 5 09:32:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753345 Received: from smtp-fw-52002.amazon.com (smtp-fw-52002.amazon.com [52.119.213.150]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1FB4154BE7; Mon, 5 Aug 2024 09:34:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.150 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850474; cv=none; b=oQkam1Pn/h+Kbfottc5Pms2yiy28tiSo6NoSmtR7fU4YfMGWyK7az1zdEPX5KkfXV+oS2TbKTRABl86CD1gLdLLr4t1RhBeeaNwB7dW3Bo9MK4BCfgIKP0RhhguEQO2/WI4dQY7OODzzKkLylLNvcKozuL+90mELAiFcwwCcCa0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850474; c=relaxed/simple; bh=JDBxRZkF7qtTIiobsrij9ABz3RpEDD5fpxOB1egzB0g=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oSXG86KvGVsgzmFxRgROgCX0hXIaWddbEYDMg/CLUS/+07SYBSOU63Ob6ZCHkN61f0C2Ypjz3UgDbQsvgPMvUY1hhtT0ty3Z2eIiMu6NHYx7pfK+rkFYBfjU5i7XwW91wFv5d26tAm+EJ0CKr622i85zIC2Xv428NYWc8fIpDhA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=IkUqHxWA; arc=none smtp.client-ip=52.119.213.150 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="IkUqHxWA" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850473; x=1754386473; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QLzX8kQ4Me7VkWkGGUOEVumIPTgCaBccV0Vp9jQZvU0=; b=IkUqHxWAI7TYM9W2XdphdZ415TRyDYS6hLpAGvEqxH5+10J1lkCjK83C C6Kj9OgDLio64suRcfCtH2pPJfN5MDfxsNTFskg/+66TvkOlQHBk6wy5g NRj3lRYATmW8rZPkGp/d3lnCGxnYxIgxxhhv64r2KJARiOLSaTjUNjWWF w=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="650673665" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52002.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:34:29 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.17.79:60629] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.35.73:2525] with esmtp (Farcaster) id d3e48c8d-1f8d-494a-8e05-dd641a46ffbf; Mon, 5 Aug 2024 09:34:28 +0000 (UTC) X-Farcaster-Flow-ID: d3e48c8d-1f8d-494a-8e05-dd641a46ffbf Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.79) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:27 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:18 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 05/10] guestmemfs: add file mmap callback Date: Mon, 5 Aug 2024 11:32:40 +0200 Message-ID: <20240805093245.889357-6-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D044UWB004.ant.amazon.com (10.13.139.134) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Make the file data usable to userspace by adding mmap. That's all that QEMU needs for guest RAM, so that's all be bother implementing for now. When mmaping the file the VMA is marked as PFNMAP to indicate that there are no struct pages for the memory in this VMA. Remap_pfn_range() is used to actually populate the page tables. All PTEs are pre-faulted into the pgtables at mmap time so that the pgtables are usable when this virtual address range is given to VFIO's MAP_DMA. Signed-off-by: James Gowans --- fs/guestmemfs/file.c | 43 +++++++++++++++++++++++++++++++++++++- fs/guestmemfs/guestmemfs.c | 2 +- fs/guestmemfs/guestmemfs.h | 3 +++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/fs/guestmemfs/file.c b/fs/guestmemfs/file.c index 618c93b12196..b1a52abcde65 100644 --- a/fs/guestmemfs/file.c +++ b/fs/guestmemfs/file.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only #include "guestmemfs.h" +#include static int truncate(struct inode *inode, loff_t newsize) { @@ -41,6 +42,46 @@ static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct return 0; } +/* + * To be able to use PFNMAP VMAs for VFIO DMA mapping we need the page tables + * populated with mappings. Pre-fault everything. + */ +static int mmap(struct file *filp, struct vm_area_struct *vma) +{ + int rc; + unsigned long *mappings_block; + struct guestmemfs_inode *guestmemfs_inode; + + guestmemfs_inode = guestmemfs_get_persisted_inode(filp->f_inode->i_sb, + filp->f_inode->i_ino); + + mappings_block = guestmemfs_inode->mappings; + + /* Remap-pfn-range will mark the range VM_IO */ + for (unsigned long vma_addr_offset = vma->vm_start; + vma_addr_offset < vma->vm_end; + vma_addr_offset += PMD_SIZE) { + int block, mapped_block; + unsigned long map_size = min(PMD_SIZE, vma->vm_end - vma_addr_offset); + + block = (vma_addr_offset - vma->vm_start) / PMD_SIZE; + mapped_block = *(mappings_block + block); + /* + * It's wrong to use rempa_pfn_range; this will install PTE-level entries. + * The whole point of 2 MiB allocs is to improve TLB perf! + * We should use something like mm/huge_memory.c#insert_pfn_pmd + * but that is currently static. + * TODO: figure out the best way to install PMDs. + */ + rc = remap_pfn_range(vma, + vma_addr_offset, + (guestmemfs_base >> PAGE_SHIFT) + (mapped_block * 512), + map_size, + vma->vm_page_prot); + } + return 0; +} + const struct inode_operations guestmemfs_file_inode_operations = { .setattr = inode_setattr, .getattr = simple_getattr, @@ -48,5 +89,5 @@ const struct inode_operations guestmemfs_file_inode_operations = { const struct file_operations guestmemfs_file_fops = { .owner = THIS_MODULE, - .iterate_shared = NULL, + .mmap = mmap, }; diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c index c45c796c497a..38f20ad25286 100644 --- a/fs/guestmemfs/guestmemfs.c +++ b/fs/guestmemfs/guestmemfs.c @@ -9,7 +9,7 @@ #include #include -static phys_addr_t guestmemfs_base, guestmemfs_size; +phys_addr_t guestmemfs_base, guestmemfs_size; struct guestmemfs_sb *psb; static int statfs(struct dentry *root, struct kstatfs *buf) diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index 7ea03ac8ecca..0f2788ce740e 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -8,6 +8,9 @@ #define GUESTMEMFS_FILENAME_LEN 255 #define GUESTMEMFS_PSB(sb) ((struct guestmemfs_sb *)sb->s_fs_info) +/* Units of bytes */ +extern phys_addr_t guestmemfs_base, guestmemfs_size; + struct guestmemfs_sb { /* Inode number */ unsigned long next_free_ino; From patchwork Mon Aug 5 09:32:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753346 Received: from smtp-fw-80007.amazon.com (smtp-fw-80007.amazon.com [99.78.197.218]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B81314D2B3; Mon, 5 Aug 2024 09:35:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.218 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850523; cv=none; b=e6iPHAgcjL4CC6msKYTkXVrJR/BmoVBSQNGQb1tng3izZNRtcMYjcctoQmhpug0x92Hx/vG+/3Q9tXzD7L8CCJL2YIdwFAPXxH8Ke+iNzl+cZk5Gz+mfEDCrm6Dj/hQ3HCm4Kc6oB8EPTHh4Js4R+Esu98dgx7+ogxWCeyu9mPw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850523; c=relaxed/simple; bh=EmW4BPUo26OKPsT+/gBshLdYldAX2g0Wn1GmRbHCE6k=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kBGHfLy0wUE9iXbbGt1fBLobLkG/UeFOWJBvq7QASsTaCD0CTa1NeVlqaE6Bfiw6m5iinSO3GJvltNAUkv2a/LRuzZT8AEgogZo2/GzkktDO56Gb8gb+hCDa0s02V8F3LA5w4D7Q8Ly5GYEiXWxQTrI181X7u9epQ9+28gQrK4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=BZaZ630C; arc=none smtp.client-ip=99.78.197.218 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="BZaZ630C" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850522; x=1754386522; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Wges4MT0n6I9CyO780W2I1CcNRjpe5PQ75LHC+blwyY=; b=BZaZ630Czllo7tAoiVOI93WyGbuSy0ixVXym56SPOhUnTDUtsCCWRwaL KBFL1+H3FhMKGYhSOaIpQt+VKzy/5T8LDu9lrkawq2NJQt+M49ghByb7o BLVImtlqykFutxqfqo4hb832aAsumYILUioDCfrZlDkl/E83Ub+XURFGe M=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="318022324" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80007.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:35:09 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.43.254:19549] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.14.223:2525] with esmtp (Farcaster) id 73411eaf-fd8d-4133-a679-10dbf0f4e67a; Mon, 5 Aug 2024 09:35:08 +0000 (UTC) X-Farcaster-Flow-ID: 73411eaf-fd8d-4133-a679-10dbf0f4e67a Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC002.ant.amazon.com (10.252.51.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:08 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:34:58 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 06/10] kexec/kho: Add addr flag to not initialise memory Date: Mon, 5 Aug 2024 11:32:41 +0200 Message-ID: <20240805093245.889357-7-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D046UWB002.ant.amazon.com (10.13.139.181) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Smuggle a flag on the address field. If set the memory region being reserved via KHO will be marked as no init in memblocks so it will not get struct pages, will not get given to the buddy allocator and will not be part of the direct map. This allows drivers to pass memory ranges which the driver has allocated itself from memblocks, independent of the kernel's mm and struct page based memory management. Signed-off-by: James Gowans --- include/uapi/linux/kexec.h | 6 ++++++ kernel/kexec_kho_in.c | 12 +++++++++++- kernel/kexec_kho_out.c | 4 ++++ 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index ad9e95b88b34..1c031a261c2c 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -52,6 +52,12 @@ /* KHO passes an array of kho_mem as "mem cache" to the new kernel */ struct kho_mem { + /* + * Use the last bits for flags; addrs should be at least word + * aligned. + */ +#define KHO_MEM_ADDR_FLAG_NOINIT BIT(0) +#define KHO_MEM_ADDR_FLAG_MASK (BIT(1) - 1) __u64 addr; __u64 len; }; diff --git a/kernel/kexec_kho_in.c b/kernel/kexec_kho_in.c index 5f8e0d9f9e12..943d9483b009 100644 --- a/kernel/kexec_kho_in.c +++ b/kernel/kexec_kho_in.c @@ -75,6 +75,11 @@ __init void kho_populate_refcount(void) */ for (offset = 0; offset < mem_len; offset += sizeof(struct kho_mem)) { struct kho_mem *mem = mem_virt + offset; + + /* No struct pages for this region; nothing to claim. */ + if (mem->addr & KHO_MEM_ADDR_FLAG_NOINIT) + continue; + u64 start_pfn = PFN_DOWN(mem->addr); u64 end_pfn = PFN_UP(mem->addr + mem->len); u64 pfn; @@ -183,8 +188,13 @@ void __init kho_reserve_previous_mem(void) /* Then populate all preserved memory areas as reserved */ for (off = 0; off < mem_len; off += sizeof(struct kho_mem)) { struct kho_mem *mem = mem_virt + off; + __u64 addr = mem->addr & ~KHO_MEM_ADDR_FLAG_MASK; - memblock_reserve(mem->addr, mem->len); + memblock_reserve(addr, mem->len); + if (mem->addr & KHO_MEM_ADDR_FLAG_NOINIT) { + memblock_reserved_mark_noinit(addr, mem->len); + memblock_mark_nomap(addr, mem->len); + } } /* Unreserve the mem cache - we don't need it from here on */ diff --git a/kernel/kexec_kho_out.c b/kernel/kexec_kho_out.c index 2cf5755f5e4a..4d9da501c5dc 100644 --- a/kernel/kexec_kho_out.c +++ b/kernel/kexec_kho_out.c @@ -175,6 +175,10 @@ static int kho_alloc_mem_cache(struct kimage *image, void *fdt) const struct kho_mem *mem = &mems[i]; ulong mstart = PAGE_ALIGN_DOWN(mem->addr); ulong mend = PAGE_ALIGN(mem->addr + mem->len); + + /* Re-apply flags lost during round down. */ + mstart |= mem->addr & KHO_MEM_ADDR_FLAG_MASK; + struct kho_mem cmem = { .addr = mstart, .len = (mend - mstart), From patchwork Mon Aug 5 09:32:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753347 Received: from smtp-fw-2101.amazon.com (smtp-fw-2101.amazon.com [72.21.196.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2403815442A; Mon, 5 Aug 2024 09:35:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=72.21.196.25 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850526; cv=none; b=R22y+dIdNG/ZVxpWnuLP6RzOyWOYUkjhrxSs0wQrr2A7y6NUz3m0rzoKuYMSz5QjFOyF/mnDINTcqgqiZkamr75MN7zzjcDSaormKPbhS2iABLhHofkceeJHdO+d4k9e9IPz1t9I/pDaT7arGpvR9VKUHdoJJ+TcZ+41SBn1BZU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850526; c=relaxed/simple; bh=NWcdHOT6meDsgvRcXp1Ku0orGgpxznWQs9iKhksWhQM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=MEbj1KoJ7gIy7EuXzgfkgcIMdGsBORTU9EiyXnhwgQz+UzNLT+9tC2hX4jkJ7SVvlw2hSwMs0XWD/DG7jjEPnT4Kn/tHPMNEvDiXEvMhDtH1DE8NjxDMrQbXqT81dCmNurbFMj1y9OUfaenj2zrvviI8v+HaEv7ziocWDYUTKMk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=l0N4dnJ9; arc=none smtp.client-ip=72.21.196.25 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="l0N4dnJ9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850524; x=1754386524; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GwcyuHUonPineq63SNtpYz+107twfbb1IfXA3IZYCMA=; b=l0N4dnJ9A+R7uRnHMR5icwiwQLh0cAexhbGXt4mlcADbD2ueg0LDFuSY FZyUv3jjNy6obk/7pwD44imNYq0gQnELfHqNbOL92oJmeMxUXIkRAZnRb qLdahAPa0AFjkm54bQMPbb7wvAhOam8CQS5BN2Z+E5R1bOMbsYtAC7c4A Y=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="419323041" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-2101.iad2.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:35:20 +0000 Received: from EX19MTAEUC001.ant.amazon.com [10.0.17.79:14005] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.35.73:2525] with esmtp (Farcaster) id e982917a-a97c-45f2-b4e4-5a00217a4f1e; Mon, 5 Aug 2024 09:35:18 +0000 (UTC) X-Farcaster-Flow-ID: e982917a-a97c-45f2-b4e4-5a00217a4f1e Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC001.ant.amazon.com (10.252.51.155) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:18 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:08 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 07/10] guestmemfs: Persist filesystem metadata via KHO Date: Mon, 5 Aug 2024 11:32:42 +0200 Message-ID: <20240805093245.889357-8-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D046UWB002.ant.amazon.com (10.13.139.181) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Filesystem metadata consists of: physical memory extents, superblock, inodes block and allocation bitmap. Here serialisation and deserialisation of all of these is done via the KHO framework. A serialisation callback is added which is run when KHO activate is triggered. This creates the device tree blob for the metadata and marks the memory as persistent via struct kho_mem(s). When the filesystem is mounted it attempts to re-hydrate metadata from KHO. Only if this fails (first boot, for example) then it allocates fresh metadata pages. The privatet data struct is switched from holding a reference to the persistent superblock to now referencing the regular struct super_block. This is necessary for the serialisation code. Better would be to be able to define callback private data, if that were possible. Signed-off-by: James Gowans --- fs/guestmemfs/Makefile | 2 + fs/guestmemfs/guestmemfs.c | 72 ++++++--- fs/guestmemfs/guestmemfs.h | 8 + fs/guestmemfs/serialise.c | 296 +++++++++++++++++++++++++++++++++++++ 4 files changed, 355 insertions(+), 23 deletions(-) create mode 100644 fs/guestmemfs/serialise.c diff --git a/fs/guestmemfs/Makefile b/fs/guestmemfs/Makefile index e93e43ba274b..8b95cac34564 100644 --- a/fs/guestmemfs/Makefile +++ b/fs/guestmemfs/Makefile @@ -4,3 +4,5 @@ # obj-y += guestmemfs.o inode.o dir.o allocator.o file.o + +obj-$(CONFIG_KEXEC_KHO) += serialise.o diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c index 38f20ad25286..cf47e5100504 100644 --- a/fs/guestmemfs/guestmemfs.c +++ b/fs/guestmemfs/guestmemfs.c @@ -3,6 +3,7 @@ #include "guestmemfs.h" #include #include +#include #include #include #include @@ -10,7 +11,7 @@ #include phys_addr_t guestmemfs_base, guestmemfs_size; -struct guestmemfs_sb *psb; +struct super_block *guestmemfs_sb; static int statfs(struct dentry *root, struct kstatfs *buf) { @@ -33,26 +34,39 @@ static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) struct inode *inode; struct dentry *dentry; - psb = kzalloc(sizeof(*psb), GFP_KERNEL); - psb->inodes = kzalloc(2 << 20, GFP_KERNEL); - if (!psb->inodes) - return -ENOMEM; - psb->allocator_bitmap = kzalloc(1 << 20, GFP_KERNEL); - if (!psb->allocator_bitmap) - return -ENOMEM; - /* * Keep a reference to the persistent super block in the * ephemeral super block. */ - sb->s_fs_info = psb; - spin_lock_init(&psb->allocation_lock); - guestmemfs_initialise_inode_store(sb); - guestmemfs_zero_allocations(sb); - guestmemfs_get_persisted_inode(sb, 1)->flags = GUESTMEMFS_INODE_FLAG_DIR; - strscpy(guestmemfs_get_persisted_inode(sb, 1)->filename, ".", - GUESTMEMFS_FILENAME_LEN); - psb->next_free_ino = 2; + sb->s_fs_info = guestmemfs_restore_from_kho(); + + if (GUESTMEMFS_PSB(sb)) { + pr_info("Restored super block from KHO\n"); + } else { + struct guestmemfs_sb *psb; + + pr_info("Did not restore from KHO - allocating free\n"); + psb = kzalloc(sizeof(*psb), GFP_KERNEL); + psb->inodes = kzalloc(2 << 20, GFP_KERNEL); + if (!psb->inodes) + return -ENOMEM; + psb->allocator_bitmap = kzalloc(1 << 20, GFP_KERNEL); + if (!psb->allocator_bitmap) + return -ENOMEM; + sb->s_fs_info = psb; + spin_lock_init(&psb->allocation_lock); + guestmemfs_initialise_inode_store(sb); + guestmemfs_zero_allocations(sb); + guestmemfs_get_persisted_inode(sb, 1)->flags = GUESTMEMFS_INODE_FLAG_DIR; + strscpy(guestmemfs_get_persisted_inode(sb, 1)->filename, ".", + GUESTMEMFS_FILENAME_LEN); + GUESTMEMFS_PSB(sb)->next_free_ino = 2; + } + /* + * Keep a reference to this sb; the serialise callback needs it + * and has no oher way to get it. + */ + guestmemfs_sb = sb; sb->s_op = &guestmemfs_super_ops; @@ -98,11 +112,18 @@ static struct file_system_type guestmemfs_fs_type = { .fs_flags = FS_USERNS_MOUNT, }; + +static struct notifier_block trace_kho_nb = { + .notifier_call = guestmemfs_serialise_to_kho, +}; + static int __init guestmemfs_init(void) { int ret; ret = register_filesystem(&guestmemfs_fs_type); + if (IS_ENABLED(CONFIG_FTRACE_KHO)) + register_kho_notifier(&trace_kho_nb); return ret; } @@ -120,13 +141,18 @@ early_param("guestmemfs", parse_guestmemfs_extents); void __init guestmemfs_reserve_mem(void) { - guestmemfs_base = memblock_phys_alloc(guestmemfs_size, 4 << 10); - if (guestmemfs_base) { - memblock_reserved_mark_noinit(guestmemfs_base, guestmemfs_size); - memblock_mark_nomap(guestmemfs_base, guestmemfs_size); - } else { - pr_warn("Failed to alloc %llu bytes for guestmemfs\n", guestmemfs_size); + if (guestmemfs_size) { + guestmemfs_base = memblock_phys_alloc(guestmemfs_size, 4 << 10); + + if (guestmemfs_base) { + memblock_reserved_mark_noinit(guestmemfs_base, guestmemfs_size); + memblock_mark_nomap(guestmemfs_base, guestmemfs_size); + pr_debug("guestmemfs reserved base=%llu from memblocks\n", guestmemfs_base); + } else { + pr_warn("Failed to alloc %llu bytes for guestmemfs\n", guestmemfs_size); + } } + } MODULE_ALIAS_FS("guestmemfs"); diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index 0f2788ce740e..263d995b75ed 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -10,11 +10,14 @@ /* Units of bytes */ extern phys_addr_t guestmemfs_base, guestmemfs_size; +extern struct super_block *guestmemfs_sb; struct guestmemfs_sb { /* Inode number */ unsigned long next_free_ino; unsigned long allocated_inodes; + + /* Ephemeral fields - must be updated on deserialise */ struct guestmemfs_inode *inodes; void *allocator_bitmap; spinlock_t allocation_lock; @@ -46,6 +49,11 @@ long guestmemfs_alloc_block(struct super_block *sb); struct inode *guestmemfs_inode_get(struct super_block *sb, unsigned long ino); struct guestmemfs_inode *guestmemfs_get_persisted_inode(struct super_block *sb, int ino); +int guestmemfs_serialise_to_kho(struct notifier_block *self, + unsigned long cmd, + void *v); +struct guestmemfs_sb *guestmemfs_restore_from_kho(void); + extern const struct file_operations guestmemfs_dir_fops; extern const struct file_operations guestmemfs_file_fops; extern const struct inode_operations guestmemfs_file_inode_operations; diff --git a/fs/guestmemfs/serialise.c b/fs/guestmemfs/serialise.c new file mode 100644 index 000000000000..eb70d496a3eb --- /dev/null +++ b/fs/guestmemfs/serialise.c @@ -0,0 +1,296 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "guestmemfs.h" +#include +#include + +/* + * Responsible for serialisation and deserialisation of filesystem metadata + * to and from KHO to survive kexec. The deserialisation logic needs to mirror + * serialisation, so putting them in the same file. + * + * The format of the device tree structure is: + * + * /guestmemfs + * compatible = "guestmemfs-v1" + * fs_mem { + * mem = [ ... ] + * }; + * superblock { + * mem = [ + * persistent super block, + * inodes, + * allocator_bitmap, + * }; + * mappings_block { + * mem = [ ... ] + * }; + * // For every mappings_block mem, which inode it belongs to. + * mappings_to_inode { + * num_inodes, + * mem = [ ... ], + * } + */ + +static int serialise_superblock(struct super_block *sb, void *fdt) +{ + struct kho_mem mem[3]; + int err = 0; + struct guestmemfs_sb *psb = sb->s_fs_info; + + err |= fdt_begin_node(fdt, "superblock"); + + mem[0].addr = virt_to_phys(psb); + mem[0].len = sizeof(*psb); + + mem[1].addr = virt_to_phys(psb->inodes); + mem[1].len = 2 << 20; + + mem[2].addr = virt_to_phys(psb->allocator_bitmap); + mem[2].len = 1 << 20; + + err |= fdt_property(fdt, "mem", &mem, sizeof(mem)); + err |= fdt_end_node(fdt); + + return err; +} + +static int serialise_mappings_blocks(struct super_block *sb, void *fdt) +{ + struct kho_mem *mappings_mems; + struct kho_mem mappings_to_inode_mem; + struct guestmemfs_sb *psb = sb->s_fs_info; + int inode_idx; + size_t num_inodes = PMD_SIZE / sizeof(struct guestmemfs_inode); + struct guestmemfs_inode *inode; + int err = 0; + int *mappings_to_inode; + int mappings_to_inode_idx = 0; + + mappings_to_inode = kzalloc(PAGE_SIZE, GFP_KERNEL); + + mappings_mems = kcalloc(psb->allocated_inodes, sizeof(struct kho_mem), GFP_KERNEL); + + for (inode_idx = 1; inode_idx < num_inodes; ++inode_idx) { + inode = guestmemfs_get_persisted_inode(sb, inode_idx); + if (inode->flags & GUESTMEMFS_INODE_FLAG_FILE) { + mappings_mems[mappings_to_inode_idx].addr = virt_to_phys(inode->mappings); + mappings_mems[mappings_to_inode_idx].len = PAGE_SIZE; + mappings_to_inode[mappings_to_inode_idx] = inode_idx; + mappings_to_inode_idx++; + } + } + + err |= fdt_begin_node(fdt, "mappings_blocks"); + err |= fdt_property(fdt, "mem", mappings_mems, + sizeof(struct kho_mem) * mappings_to_inode_idx); + err |= fdt_end_node(fdt); + + + err |= fdt_begin_node(fdt, "mappings_to_inode"); + mappings_to_inode_mem.addr = virt_to_phys(mappings_to_inode); + mappings_to_inode_mem.len = PAGE_SIZE; + err |= fdt_property(fdt, "mem", &mappings_to_inode_mem, + sizeof(mappings_to_inode_mem)); + err |= fdt_property(fdt, "num_inodes", &psb->allocated_inodes, + sizeof(psb->allocated_inodes)); + + err |= fdt_end_node(fdt); + + return err; +} + +int guestmemfs_serialise_to_kho(struct notifier_block *self, + unsigned long cmd, + void *v) +{ + static const char compatible[] = "guestmemfs-v1"; + struct kho_mem mem; + void *fdt = v; + int err = 0; + + switch (cmd) { + case KEXEC_KHO_ABORT: + /* No rollback action needed. */ + return NOTIFY_DONE; + case KEXEC_KHO_DUMP: + /* Handled below */ + break; + default: + return NOTIFY_BAD; + } + + err |= fdt_begin_node(fdt, "guestmemfs"); + err |= fdt_property(fdt, "compatible", compatible, sizeof(compatible)); + + err |= fdt_begin_node(fdt, "fs_mem"); + mem.addr = guestmemfs_base | KHO_MEM_ADDR_FLAG_NOINIT; + mem.len = guestmemfs_size; + err |= fdt_property(fdt, "mem", &mem, sizeof(mem)); + err |= fdt_end_node(fdt); + + err |= serialise_superblock(guestmemfs_sb, fdt); + err |= serialise_mappings_blocks(guestmemfs_sb, fdt); + + err |= fdt_end_node(fdt); + + pr_info("Serialised extends [0x%llx + 0x%llx] via KHO: %i\n", + guestmemfs_base, guestmemfs_size, err); + + return err; +} + +static struct guestmemfs_sb *deserialise_superblock(const void *fdt, int root_off) +{ + const struct kho_mem *mem; + int mem_len; + struct guestmemfs_sb *old_sb; + int off; + + off = fdt_subnode_offset(fdt, root_off, "superblock"); + mem = fdt_getprop(fdt, off, "mem", &mem_len); + + if (mem_len != 3 * sizeof(struct kho_mem)) { + pr_err("Incorrect mem_len; got %i\n", mem_len); + return NULL; + } + + old_sb = kho_claim_mem(mem); + old_sb->inodes = kho_claim_mem(mem + 1); + old_sb->allocator_bitmap = kho_claim_mem(mem + 2); + + return old_sb; +} + +static int deserialise_mappings_blocks(const void *fdt, int root_off, + struct guestmemfs_sb *sb) +{ + int off; + int len = 0; + const unsigned long *num_inodes; + const struct kho_mem *mappings_to_inode_mem; + int *mappings_to_inode; + int mappings_block; + const struct kho_mem *mappings_blocks_mems; + + /* + * Array of struct kho_mem - one for each persisted mappings + * blocks. + */ + off = fdt_subnode_offset(fdt, root_off, "mappings_blocks"); + mappings_blocks_mems = fdt_getprop(fdt, off, "mem", &len); + + /* + * Array specifying which inode a specific index into the + * mappings_blocks kho_mem array corresponds to. num_inodes + * indicates the size of the array which is the number of mappings + * blocks which need to be restored. + */ + off = fdt_subnode_offset(fdt, root_off, "mappings_to_inode"); + if (off < 0) { + pr_warn("No fs_mem available in KHO\n"); + return -EINVAL; + } + num_inodes = fdt_getprop(fdt, off, "num_inodes", &len); + if (len != sizeof(num_inodes)) { + pr_warn("Invalid num_inodes len: %i\n", len); + return -EINVAL; + } + mappings_to_inode_mem = fdt_getprop(fdt, off, "mem", &len); + if (len != sizeof(*mappings_to_inode_mem)) { + pr_warn("Invalid mappings_to_inode_mem len: %i\n", len); + return -EINVAL; + } + mappings_to_inode = kho_claim_mem(mappings_to_inode_mem); + + /* + * Re-assigned the mappings block to the inodes. Indexes into + * mappings_to_inode specifies which inode to assign each mappings + * block to. + */ + for (mappings_block = 0; mappings_block < *num_inodes; ++mappings_block) { + int inode = mappings_to_inode[mappings_block]; + + sb->inodes[inode].mappings = kho_claim_mem(&mappings_blocks_mems[mappings_block]); + } + + return 0; +} + +static int deserialise_fs_mem(const void *fdt, int root_off) +{ + int err; + /* Offset into the KHO DT */ + int off; + int len = 0; + const struct kho_mem *mem; + + off = fdt_subnode_offset(fdt, root_off, "fs_mem"); + if (off < 0) { + pr_info("No fs_mem available in KHO\n"); + return -EINVAL; + } + + mem = fdt_getprop(fdt, off, "mem", &len); + if (mem && len == sizeof(*mem)) { + guestmemfs_base = mem->addr & ~KHO_MEM_ADDR_FLAG_MASK; + guestmemfs_size = mem->len; + } else { + pr_err("KHO did not contain a guestmemfs base address and size\n"); + return -EINVAL; + } + + pr_info("Reclaimed [%llx + %llx] via KHO\n", guestmemfs_base, guestmemfs_size); + if (err) { + pr_err("Unable to reserve [0x%llx + 0x%llx] from memblock: %i\n", + guestmemfs_base, guestmemfs_size, err); + return err; + } + return 0; +} +struct guestmemfs_sb *guestmemfs_restore_from_kho(void) +{ + const void *fdt = kho_get_fdt(); + struct guestmemfs_sb *old_sb; + int err; + /* Offset into the KHO DT */ + int off; + + if (!fdt) { + pr_err("Unable to get KHO DT after KHO boot?\n"); + return NULL; + } + + off = fdt_path_offset(fdt, "/guestmemfs"); + pr_info("guestmemfs offset: %i\n", off); + + if (!off) { + pr_info("No guestmemfs data available in KHO\n"); + return NULL; + } + err = fdt_node_check_compatible(fdt, off, "guestmemfs-v1"); + if (err) { + pr_err("Existing KHO superblock format is not compatible with this kernel\n"); + return NULL; + } + + old_sb = deserialise_superblock(fdt, off); + if (!old_sb) { + pr_warn("Failed to restore superblock\n"); + return NULL; + } + + err = deserialise_mappings_blocks(fdt, off, old_sb); + if (err) { + pr_warn("Failed to restore mappings blocks\n"); + return NULL; + } + + err = deserialise_fs_mem(fdt, off); + if (err) { + pr_warn("Failed to restore filesystem memory extents\n"); + return NULL; + } + + return old_sb; +} From patchwork Mon Aug 5 09:32:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753348 Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1580715442A; Mon, 5 Aug 2024 09:35:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850534; cv=none; b=SfWajB2BNttR8XO9lI0CzgDnF91O8ze9iibSDuHY000ZoXBFc8jgCoHcpZ3oGyyXUYdM7ZLbwn4o9xJAn9r4854agECJ2L/rfWAcQYfXQOftX0BrzxrwSXmqEprC5WkHZLCGZ/Tgtsw617j9zyAMOZOF8H7ATVQOPL9bYVN9AVg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850534; c=relaxed/simple; bh=HaZglgonvr41X6YnrF6DoFX0ivd5TdLlUhGuSbscGfM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=jArQT1MYdv7FJRh1tuViGjIOJcjeTVuLfBjMPWr+OCF6vfoPicd9lipQGX0l6ybH+PR8j1bThsXQTvvoov4USauWQBe6E8SCcj1iQWNKJYRn6iyAzFi/SxdaZj/VOsCCKYYZvdoFFPWA5WSOYvVL5FRt5CZInSHCYOlx2WEbsNo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=TuR4j0nd; arc=none smtp.client-ip=99.78.197.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="TuR4j0nd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850533; x=1754386533; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=b/ZDrMWB4VIy9ADRZF2m3pCqPq++QQhsZ6WqVk7Aipc=; b=TuR4j0ndKzOgL0hg3WqnXT0RQqvEPwzHWPYPHcv3ntsuouNSIPkNlsLo 2hwNXAURpd9aCJIpcTnoMizJNZ4LIwD069tg5FBSoqVbFGrt/E/Koz84o gCNZmSnTfqDaftX5u8dH5BkrEVyg4Tns2jVHXN0XYYOUX3xIfijDYmSwC 8=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="112169575" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80009.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:35:29 +0000 Received: from EX19MTAEUA001.ant.amazon.com [10.0.43.254:35726] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.45.111:2525] with esmtp (Farcaster) id ba2c37fa-2737-488d-9a39-9aa5f2392569; Mon, 5 Aug 2024 09:35:28 +0000 (UTC) X-Farcaster-Flow-ID: ba2c37fa-2737-488d-9a39-9aa5f2392569 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUA001.ant.amazon.com (10.252.50.192) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:27 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:18 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 08/10] guestmemfs: Block modifications when serialised Date: Mon, 5 Aug 2024 11:32:43 +0200 Message-ID: <20240805093245.889357-9-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D046UWB002.ant.amazon.com (10.13.139.181) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Once the memory regions for inodes, mappings and allocations have been serialised, further modifications would break the serialised data; it would no longer be valid. Return an error code if attempting to create new files or allocate data for files once serialised. Signed-off-by: James Gowans --- fs/guestmemfs/file.c | 19 ++++++++++++++++--- fs/guestmemfs/guestmemfs.c | 1 + fs/guestmemfs/guestmemfs.h | 1 + fs/guestmemfs/inode.c | 6 ++++++ fs/guestmemfs/serialise.c | 8 +++++++- 5 files changed, 31 insertions(+), 4 deletions(-) diff --git a/fs/guestmemfs/file.c b/fs/guestmemfs/file.c index b1a52abcde65..8707a9d3ad90 100644 --- a/fs/guestmemfs/file.c +++ b/fs/guestmemfs/file.c @@ -8,19 +8,32 @@ static int truncate(struct inode *inode, loff_t newsize) unsigned long free_block; struct guestmemfs_inode *guestmemfs_inode; unsigned long *mappings; + int rc = 0; + struct guestmemfs_sb *psb = GUESTMEMFS_PSB(inode->i_sb); + + spin_lock(&psb->allocation_lock); + + if (psb->serialised) { + rc = -EBUSY; + goto out; + } guestmemfs_inode = guestmemfs_get_persisted_inode(inode->i_sb, inode->i_ino); mappings = guestmemfs_inode->mappings; i_size_write(inode, newsize); for (int block_idx = 0; block_idx * PMD_SIZE < newsize; ++block_idx) { free_block = guestmemfs_alloc_block(inode->i_sb); - if (free_block < 0) + if (free_block < 0) { /* TODO: roll back allocations. */ - return -ENOMEM; + rc = -ENOMEM; + goto out; + } *(mappings + block_idx) = free_block; ++guestmemfs_inode->num_mappings; } - return 0; +out: + spin_unlock(&psb->allocation_lock); + return rc; } static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c index cf47e5100504..d854033bfb7e 100644 --- a/fs/guestmemfs/guestmemfs.c +++ b/fs/guestmemfs/guestmemfs.c @@ -42,6 +42,7 @@ static int guestmemfs_fill_super(struct super_block *sb, struct fs_context *fc) if (GUESTMEMFS_PSB(sb)) { pr_info("Restored super block from KHO\n"); + GUESTMEMFS_PSB(sb)->serialised = 0; } else { struct guestmemfs_sb *psb; diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h index 263d995b75ed..91cc06ae45a5 100644 --- a/fs/guestmemfs/guestmemfs.h +++ b/fs/guestmemfs/guestmemfs.h @@ -21,6 +21,7 @@ struct guestmemfs_sb { struct guestmemfs_inode *inodes; void *allocator_bitmap; spinlock_t allocation_lock; + bool serialised; }; // If neither of these are set the inode is not in use. diff --git a/fs/guestmemfs/inode.c b/fs/guestmemfs/inode.c index 61f70441d82c..d521b35d4992 100644 --- a/fs/guestmemfs/inode.c +++ b/fs/guestmemfs/inode.c @@ -48,6 +48,12 @@ static unsigned long guestmemfs_allocate_inode(struct super_block *sb) struct guestmemfs_sb *psb = GUESTMEMFS_PSB(sb); spin_lock(&psb->allocation_lock); + + if (psb->serialised) { + spin_unlock(&psb->allocation_lock); + return -EBUSY; + } + next_free_ino = psb->next_free_ino; psb->allocated_inodes += 1; if (!next_free_ino) diff --git a/fs/guestmemfs/serialise.c b/fs/guestmemfs/serialise.c index eb70d496a3eb..347eb8049a71 100644 --- a/fs/guestmemfs/serialise.c +++ b/fs/guestmemfs/serialise.c @@ -111,7 +111,7 @@ int guestmemfs_serialise_to_kho(struct notifier_block *self, switch (cmd) { case KEXEC_KHO_ABORT: - /* No rollback action needed. */ + GUESTMEMFS_PSB(guestmemfs_sb)->serialised = 0; return NOTIFY_DONE; case KEXEC_KHO_DUMP: /* Handled below */ @@ -120,6 +120,7 @@ int guestmemfs_serialise_to_kho(struct notifier_block *self, return NOTIFY_BAD; } + spin_lock(&GUESTMEMFS_PSB(guestmemfs_sb)->allocation_lock); err |= fdt_begin_node(fdt, "guestmemfs"); err |= fdt_property(fdt, "compatible", compatible, sizeof(compatible)); @@ -134,6 +135,11 @@ int guestmemfs_serialise_to_kho(struct notifier_block *self, err |= fdt_end_node(fdt); + if (!err) + GUESTMEMFS_PSB(guestmemfs_sb)->serialised = 1; + + spin_unlock(&GUESTMEMFS_PSB(guestmemfs_sb)->allocation_lock); + pr_info("Serialised extends [0x%llx + 0x%llx] via KHO: %i\n", guestmemfs_base, guestmemfs_size, err); From patchwork Mon Aug 5 09:32:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753349 Received: from smtp-fw-9106.amazon.com (smtp-fw-9106.amazon.com [207.171.188.206]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC28D1547CA; Mon, 5 Aug 2024 09:36:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.188.206 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850572; cv=none; b=p/Pbom4hmJMF5WhR2siVd1dPHQXsaSzbknLTU9TCXwpzz8P16Jq+X3Th4ik+P4XLAsBj/rawDwqtYqiwu/KZsxJLKqLIWx+LVLFyMhspuRM+HQzP9hn6wqccuAfXVX5JobNnRxATe7IW2buVaMWsBTpkaedyZtyGHIdCWZ0xFaY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850572; c=relaxed/simple; bh=xwSuldwE4yHRJC8JDggPK/sVRYuHgShOqIYd7zewYKs=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=lKuX6ulaDkXbGN0PL8R05nyTQywOysQKe8dIapjzWev6O0cFaPlcq9VOxcPfSlQzqHTex9+eZUcsmp9PwaNH45RrZj88Efyygxl0oYY8Ne7+KBwAzfSetfCl4xwVPjnyq+mVdV+77+Rqh1iCI/SL/K5k2a53oo5vBgWDDxdPK9s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=aur9+Bpu; arc=none smtp.client-ip=207.171.188.206 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="aur9+Bpu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850572; x=1754386572; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SYxgCtdC1svzDJcgLJLCz3gGK5To5Xu2NrqOWTnvUZY=; b=aur9+Bpuy0I6VRNLR1quWtNir4JC6/+tOrHP/zvkm3mxMDJ/Pv95JTVe sY2dPIj14tLylmshb+78F5E0XfLiz9t02YAcodpFGpz7HzF+GXlSFzZ5T Gx9QmVttKy/cgu3OkX6oaXEPyxpzTylln2shS/Plc+xfqRMBKoCvmxNKj A=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="747199432" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-9106.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:36:11 +0000 Received: from EX19MTAEUB001.ant.amazon.com [10.0.10.100:34899] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.4.201:2525] with esmtp (Farcaster) id 7ef1e7da-8235-4c57-b9f9-879b313d6daf; Mon, 5 Aug 2024 09:36:09 +0000 (UTC) X-Farcaster-Flow-ID: 7ef1e7da-8235-4c57-b9f9-879b313d6daf Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB001.ant.amazon.com (10.252.51.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:36:08 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:35:59 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 09/10] guestmemfs: Add documentation and usage instructions Date: Mon, 5 Aug 2024 11:32:44 +0200 Message-ID: <20240805093245.889357-10-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D031UWC002.ant.amazon.com (10.13.139.212) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Describe the motivation for guestmemfs, the functionality it provides, how to compile it in, how to use it as a source of guest memory, how to persist it across kexec and save/restore a VM. Signed-off-by: James Gowans --- Documentation/filesystems/guestmemfs.rst | 87 ++++++++++++++++++++++++ 1 file changed, 87 insertions(+) create mode 100644 Documentation/filesystems/guestmemfs.rst diff --git a/Documentation/filesystems/guestmemfs.rst b/Documentation/filesystems/guestmemfs.rst new file mode 100644 index 000000000000..d6ce0d194cc8 --- /dev/null +++ b/Documentation/filesystems/guestmemfs.rst @@ -0,0 +1,87 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================================== +Guestmemfs - Persistent in-memory guest RAM filesystem +====================================================== + +Overview +======== + +Guestmemfs is an in-memory filesystem designed specifically for the purpose of +live update of virtual machines by being a persistent across kexec source of +guest VM memory. + +Live update of a hypervisor refers to act of pausing running VMs, serialising +state, kexec-ing into a new hypervisor image, re-hydraing the KVM guests and +resuming them. To achieve this guest memory must be preserved across kexec. + +Additionally, guestmemfs provides: +- secret hiding for guest memory: the physical memory allocated for guestmemfs + is carved out of the direct map early in boot. +- struct page overhead elimination: guestmemfs memory is not allocated by the + buddy allocator and does not have associated struct pages. +- huge page mappings: allocations are done at PMD size and this improves TLB + performance (work in progress.) + +Compilation +=========== + +Guestmemfs is enabled via CONFIG_GUESTMEMFS_FS + +Persistence across kexec is enabled via CONFIG_KEXEC_KHO + +Usage +===== + +On first boot (cold boot), allocate a large contiguous chunk of memory for +guestmemfs via a kernel cmdline argument, eg: +`guestmemfs=10G`. + +Mount guestmemfs: +mount -t guestmemfs guestmemfs /mnt/guestmemfs/ + +Create and truncate a file which will be used for guest RAM: + +touch /mnt/guesttmemfs/guest-ram +truncate -s 500M /mnt/guestmemfs/guest-ram + +Boot a VM with this as the RAM source and the live update option enabled: + +qemu-system-x86_64 ... \ + -object memory-backend-file,id=pc.ram,size=100M,mem-path=/mnt/guestmemfs/guest-ram,share=yes,prealloc=off \ + -migrate-mode-enable cpr-reboot \ + ... + +Suspect the guest and save the state via QEMU monitor: + +migrate_set_parameter mode cpr-reboot +migrate file:/qemu.sav + +Activate KHO to serialise guestmemfs metadata and then kexec to the new +hypervisor image: + +echo 1 > /sys/kernel/kho/active +kexec -s -l --reuse-cmdline +kexec -e + +After the kexec completes remount guestmemfs (or have it added to fstab) +Re-start QEMU in live update restore mode: + +qemu-system-x86_64 ... \ + -object memory-backend-file,id=pc.ram,size=100M,mem-path=/mnt/guestmemfs/guest-ram,share=yes,prealloc=off \ + -migrate-mode-enable cpr-reboot \ + -incoming defer + ... + +Finally restore the VM state and resume it via QEMU console: + +migrate_incoming file:/qemu.sav + +Future Work +=========== +- NUMA awareness and multi-mount point support +- Actually creating PMD-level mappings in page tables +- guest_memfd style interface for confidential computing +- supporting PUD-level allocations and mappings +- MCE handling +- Persisting IOMMU pgtables to allow DMA to guestmemfs during kexec From patchwork Mon Aug 5 09:32:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13753350 Received: from smtp-fw-52005.amazon.com (smtp-fw-52005.amazon.com [52.119.213.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B23CF154BF8; Mon, 5 Aug 2024 09:36:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850583; cv=none; b=opYu9kZjLtrTqJV5eEnuzKVD9VDpnPoiKwFfvdf12CPFmZWiyE35uFkTcfS6R8NSPnpgA9ivnBkQBU0rJoF6nYU4cYHfvDByEj9xgIft8lVB7AnSCqLbwG5QAeZm9alTDAVceFqgAkNAVTm0OW2jTPSfTLfBwmn8aDdamYkGAVc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722850583; c=relaxed/simple; bh=aId4i3+f2oT9WZCHDhoMCMyZw/EFhFEq6i9DKsHg5xU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=c3+1yDjDZm6sy1z60bQxs1OS+3gwOkOSrCsKY9DbySs44IypaDjRIA7nSTS4iomdHjgDnmTDYHNrLhjeyhDgVsF4B5zPFZlx21iKsL5iLGOCME+YoMkoDejCbd2tipp+xJwRhnP37eLusVP2lLxXQ2PQIuk5PY2kPcVp9wZvdN0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=j04t36xV; arc=none smtp.client-ip=52.119.213.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="j04t36xV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1722850582; x=1754386582; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xIwlsPdKypNAzAdQV72OSHi8d6jH55QygowoOsRKvhI=; b=j04t36xVSDhserkIsv3/6bC9mLu2kZXjJh5IGXYB8c1/5CTHKz3Yom3B KRRa6lV4W8Hgzj5Da4J7mP9VHx5j/sOh2GpSA9jMPoF14J6NhfdqgVlAL DjVjgv3OGcp/jaCi5qIaq4xUOt2FXq3wt8z2pEkX1yJ7VMpfRqSOQngfu A=; X-IronPort-AV: E=Sophos;i="6.09,264,1716249600"; d="scan'208";a="672011613" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52005.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Aug 2024 09:36:20 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.17.79:42749] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.22.69:2525] with esmtp (Farcaster) id 9c651255-34c7-4aea-a776-f6d2f5be7b32; Mon, 5 Aug 2024 09:36:18 +0000 (UTC) X-Farcaster-Flow-ID: 9c651255-34c7-4aea-a776-f6d2f5be7b32 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.79) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:36:17 +0000 Received: from u5d18b891348c5b.ant.amazon.com (10.146.13.113) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 5 Aug 2024 09:36:08 +0000 From: James Gowans To: CC: James Gowans , Sean Christopherson , Paolo Bonzini , Alexander Viro , Steve Sistare , Christian Brauner , Jan Kara , "Anthony Yznaga" , Mike Rapoport , "Andrew Morton" , , Jason Gunthorpe , , Usama Arif , , Alexander Graf , David Woodhouse , Paul Durrant , Nicolas Saenz Julienne Subject: [PATCH 10/10] MAINTAINERS: Add maintainers for guestmemfs Date: Mon, 5 Aug 2024 11:32:45 +0200 Message-ID: <20240805093245.889357-11-jgowans@amazon.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240805093245.889357-1-jgowans@amazon.com> References: <20240805093245.889357-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D031UWC002.ant.amazon.com (10.13.139.212) To EX19D014EUC004.ant.amazon.com (10.252.51.182) Signed-off-by: James Gowans --- MAINTAINERS | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 1028eceb59ca..e9c841bb18ba 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9412,6 +9412,14 @@ S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git F: drivers/net/gtp.c +GUESTMEMFS +M: James Gowans +M: Alex Graf +L: linux-fsdevel@vger.kernel.org +S: Maintained +F: Documentation/filesystems/guestmemfs.rst +F: fs/guestmemfs/ + GUID PARTITION TABLE (GPT) M: Davidlohr Bueso L: linux-efi@vger.kernel.org