From: Shivank Garg
Date: Mon, 16 Sep 2024 16:57:41 +0000
Subject: [PATCH RFC 1/3] KVM: guest_memfd: Extend creation API to support NUMA mempolicy
Message-ID: <20240916165743.201087-2-shivankg@amd.com>
In-Reply-To: <20240916165743.201087-1-shivankg@amd.com>

From: Shivansh Dhiman

Extend the guest_memfd creation API with NUMA support, allowing the VMM to
set a memory policy for the memory backing a guest_memfd. The memory policy
determines the NUMA node(s) from which memory is allocated.

The current KVM guest_memfd implementation does not honor NUMA settings
provided by the VMM. While mbind() provides NUMA policy support for
userspace applications, it does not work for guest_memfd because the memory
is not mapped into userspace.

SEV-SNP guests currently use guest_memfd as their memory backend and would
benefit from NUMA support, which enables fine-grained control over memory
placement to optimize performance for specific workloads.

To apply a memory policy to a guest_memfd, extend the KVM_CREATE_GUEST_MEMFD
ioctl with the following mempolicy-related fields:

- mpol_mode is the policy mode (default, bind, interleave, or preferred).
- host_nodes_addr is the userspace address of the nodemask, a bitmask of
  nodes containing up to maxnode bits.
- Bit 0 of flags must be set for the policy to be applied.

Store the mempolicy in i_private_data of the inode's address_space
(inode->i_mapping), which is otherwise unused by guest_memfd.
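As a sketch of how a VMM might fill in the extended structure, the C fragment below mirrors the proposed layout in a hypothetical `struct gmem_numa_create_args` (not a released UAPI type), checks at compile time that it still occupies the original 64 bytes, and builds a nodemask for selected host nodes. It does not issue the ioctl. The `GMEM_NUMA_ENABLE`, `nodemask_set()` and `fill_gmem_numa_args()` names are illustrative assumptions. The `maxnode` value assumes that the kernel's `get_nodes()` helper decrements `maxnode` before use, as it does for `mbind()`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the extended kvm_create_guest_memfd
 * proposed in this patch; not a released UAPI definition. */
struct gmem_numa_create_args {
	uint64_t size;
	uint64_t flags;
	uint64_t host_nodes_addr;	/* userspace address of the nodemask */
	uint16_t maxnode;		/* bit count handed to get_nodes() */
	uint8_t mpol_mode;		/* MPOL_* policy mode */
	uint8_t reserved[37];
};

/* The ioctl number encodes the struct size, so the new fields must keep
 * the struct at its original 64 bytes. */
static_assert(sizeof(struct gmem_numa_create_args) == 64, "ABI size changed");
static_assert(offsetof(struct gmem_numa_create_args, host_nodes_addr) == 16, "");
static_assert(offsetof(struct gmem_numa_create_args, maxnode) == 24, "");
static_assert(offsetof(struct gmem_numa_create_args, mpol_mode) == 26, "");
static_assert(offsetof(struct gmem_numa_create_args, reserved) == 27, "");

#define GMEM_NUMA_ENABLE (1ULL << 0)	/* hypothetical name for bit 0 of flags */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit @node in a nodemask of @nwords unsigned longs; -1 if out of range. */
static int nodemask_set(unsigned long *mask, size_t nwords, unsigned int node)
{
	if (node >= nwords * BITS_PER_ULONG)
		return -1;
	mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
	return 0;
}

/* Fill @args for a guest_memfd of @size bytes whose memory should follow
 * policy @mode over the host nodes set in @mask. */
static void fill_gmem_numa_args(struct gmem_numa_create_args *args,
				uint64_t size, uint8_t mode,
				const unsigned long *mask, size_t nwords)
{
	size_t nbits = nwords * BITS_PER_ULONG;

	assert(nbits + 1 <= UINT16_MAX);	/* maxnode is only 16 bits wide */
	memset(args, 0, sizeof(*args));		/* keep reserved bytes zeroed */
	args->size = size;
	args->flags = GMEM_NUMA_ENABLE;
	args->mpol_mode = mode;
	args->host_nodes_addr = (uint64_t)(uintptr_t)mask;
	/* Assumes get_nodes() decrements maxnode before use (as for mbind()),
	 * so pass one more than the number of bits actually in the mask. */
	args->maxnode = (uint16_t)(nbits + 1);
}
```

For example, after setting nodes 0 and 2 with `nodemask_set()`, `fill_gmem_numa_args(&args, 1ULL << 30, 2, mask, 1)` requests MPOL_BIND (2 in include/uapi/linux/mempolicy.h) for a 1 GiB guest_memfd; on a kernel carrying this patch, `args` would then be passed to `ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args)`, where `vm_fd` is the VM file descriptor.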
Signed-off-by: Shivansh Dhiman
Signed-off-by: Shivank Garg
---
 Documentation/virt/kvm/api.rst | 13 ++++++++-
 include/linux/mempolicy.h      |  4 +++
 include/uapi/linux/kvm.h       |  5 +++-
 mm/mempolicy.c                 | 52 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/kvm.h |  5 +++-
 virt/kvm/guest_memfd.c         | 21 ++++++++++++--
 virt/kvm/kvm_mm.h              |  3 ++
 7 files changed, 97 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b3be87489108..dcb61282c773 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6346,7 +6346,10 @@ and cannot be resized (guest_memfd files do however support PUNCH_HOLE).
   struct kvm_create_guest_memfd {
 	__u64 size;
 	__u64 flags;
-	__u64 reserved[6];
+	__u64 host_nodes_addr;
+	__u16 maxnode;
+	__u8 mpol_mode;
+	__u8 reserved[37];
   };
 
 Conceptually, the inode backing a guest_memfd file represents physical memory,
@@ -6367,6 +6370,14 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+NUMA memory policy support for KVM guest_memfd allows the host to specify
+memory allocation behavior for guest NUMA nodes, similar to mbind(). If bit
+0 of flags (GUEST_MEMFD_NUMA_ENABLE) is set, guest memory is allocated from
+host nodes according to the specified policy:
+
+- mpol_mode is the policy mode: default, preferred, bind, or interleave.
+- host_nodes_addr points to a bitmask of host nodes of up to maxnode bits.
+
 4.143 KVM_PRE_FAULT_MEMORY
 ---------------------------
 
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 1add16f21612..468eeda2ec2f 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -299,4 +299,8 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
 }
 
 #endif /* CONFIG_NUMA */
+
+struct mempolicy *create_mpol_from_args(unsigned char mode,
+					const unsigned long __user *nmask,
+					unsigned short maxnode);
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..fda6cbef0a1d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1561,7 +1561,10 @@ struct kvm_memory_attributes {
 struct kvm_create_guest_memfd {
 	__u64 size;
 	__u64 flags;
-	__u64 reserved[6];
+	__u64 host_nodes_addr;
+	__u16 maxnode;
+	__u8 mpol_mode;
+	__u8 reserved[37];
 };
 
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b858e22b259d..9e9450433fcc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3557,3 +3557,55 @@ static int __init mempolicy_sysfs_init(void)
 
 late_initcall(mempolicy_sysfs_init);
 #endif /* CONFIG_SYSFS */
+
+#ifdef CONFIG_KVM_PRIVATE_MEM
+/**
+ * create_mpol_from_args - create a mempolicy structure from args
+ * @mode: NUMA memory policy mode
+ * @nmask: bitmask of NUMA nodes
+ * @maxnode: number of bits in the nodes bitmask
+ *
+ * Create a mempolicy from the given nodemask and policy mode (default,
+ * preferred, interleave or bind).
+ *
+ * Return: new mempolicy, NULL for MPOL_DEFAULT, or ERR_PTR() on failure.
+ */
+struct mempolicy *create_mpol_from_args(unsigned char mode,
+					const unsigned long __user *nmask,
+					unsigned short maxnode)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned short mode_flags;
+	struct mempolicy *mpol;
+	nodemask_t nodes;
+	int lmode = mode;
+	int err;
+
+	err = sanitize_mpol_flags(&lmode, &mode_flags);
+	if (err)
+		return ERR_PTR(err);
+
+	err = get_nodes(&nodes, nmask, maxnode);
+	if (err)
+		return ERR_PTR(err);
+
+	mpol = mpol_new(lmode, mode_flags, &nodes);
+	if (IS_ERR_OR_NULL(mpol))
+		return mpol;
+
+	NODEMASK_SCRATCH(scratch);
+	err = -ENOMEM;
+	if (scratch) {
+		mmap_write_lock(mm);
+		err = mpol_set_nodemask(mpol, &nodes, scratch);
+		mmap_write_unlock(mm);
+	}
+	NODEMASK_SCRATCH_FREE(scratch);
+	if (err) {
+		mpol_put(mpol);
+		return ERR_PTR(err);
+	}
+	return mpol;
+}
+EXPORT_SYMBOL(create_mpol_from_args);
+#endif
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index e5af8c692dc0..e3effcd1e358 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1546,7 +1546,10 @@ struct kvm_memory_attributes {
 struct kvm_create_guest_memfd {
 	__u64 size;
 	__u64 flags;
-	__u64 reserved[6];
+	__u64 host_nodes_addr;
+	__u16 maxnode;
+	__u8 mpol_mode;
+	__u8 reserved[37];
 };
 
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index e930014b4bdc..8f1877be4976 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 
 #include "kvm_mm.h"
 
@@ -445,7 +446,8 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr = kvm_gmem_setattr,
 };
 
-static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags,
+			     struct mempolicy *pol)
 {
 	const char *anon_name = "[kvm-gmem]";
 	struct kvm_gmem *gmem;
@@ -478,6 +480,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mapping->i_private_data = (void *)pol;
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
@@ -505,7 +508,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
 	loff_t size = args->size;
 	u64 flags = args->flags;
-	u64 valid_flags = 0;
+	u64 valid_flags = GUEST_MEMFD_NUMA_ENABLE;
+	struct mempolicy *mpol = NULL;
 
 	if (flags & ~valid_flags)
 		return -EINVAL;
@@ -513,7 +517,18 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	if (size <= 0 || !PAGE_ALIGNED(size))
 		return -EINVAL;
 
-	return __kvm_gmem_create(kvm, size, flags);
+	if (flags & GUEST_MEMFD_NUMA_ENABLE) {
+		unsigned char mode = args->mpol_mode;
+		unsigned short maxnode = args->maxnode;
+		const unsigned long __user *user_nmask =
+				u64_to_user_ptr(args->host_nodes_addr);
+
+		mpol = create_mpol_from_args(mode, user_nmask, maxnode);
+		if (IS_ERR(mpol))
+			return PTR_ERR(mpol);
+	}
+
+	return __kvm_gmem_create(kvm, size, flags, mpol);
 }
 
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01..3dd8495ae03d 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -36,6 +36,9 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 #endif /* HAVE_KVM_PFNCACHE */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
+/* Flag to apply a NUMA mempolicy when creating a KVM guest_memfd. */
+#define GUEST_MEMFD_NUMA_ENABLE BIT_ULL(0)
+
 void kvm_gmem_init(struct module *module);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
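One subtlety in the error handling: for the MPOL_DEFAULT mode, mpol_new() returns NULL rather than a policy object, and create_mpol_from_args() passes that NULL through. Since PTR_ERR(NULL) is 0, a caller that tests the result with IS_ERR_OR_NULL() would return 0 from the ioctl without ever creating a guest_memfd, while testing with IS_ERR() lets a NULL (default) policy fall through to __kvm_gmem_create(). The sketch below is a userspace model of that control flow only; the `model_*` helpers are hypothetical stand-ins for include/linux/err.h, and the returned "fd" value is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for ERR_PTR()/PTR_ERR()/IS_ERR() from
 * include/linux/err.h; the model_ prefix marks them as illustrations. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);	/* only errno values */
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static bool model_is_err(const void *p)
{
	/* Error pointers occupy the top MODEL_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static bool model_is_err_or_null(const void *p)
{
	return !p || model_is_err(p);
}

/* Models the tail of kvm_gmem_create(): @mpol is what create_mpol_from_args()
 * returned, @strict selects IS_ERR() instead of IS_ERR_OR_NULL(), and
 * *created reports whether a guest_memfd would have been created. */
static int model_gmem_create(void *mpol, bool strict, bool *created)
{
	*created = false;
	if (strict ? model_is_err(mpol) : model_is_err_or_null(mpol))
		return (int)model_ptr_err(mpol);
	*created = true;	/* stands in for __kvm_gmem_create() */
	return 3;		/* placeholder for the new file descriptor */
}
```

Called with a NULL (default) policy, the non-strict variant returns 0 with `*created` left false, which is the silent failure described above; the strict variant creates the file and still propagates genuine errors such as `-EINVAL`.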