From patchwork Thu Dec 12 06:36:30 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904714
From: Michael Roth
Subject: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes
Date: Thu, 12 Dec 2024 00:36:30 -0600
Message-ID: <20241212063635.712877-1-michael.roth@amd.com>
X-Mailer: git-send-email 2.25.1

This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-prepare-thp-rfc1

and is based on top of Paolo's kvm-coco-queue-2024-11 tag which includes
a snapshot of his patches[1] to provide tracking of whether or not
sub-pages of a huge folio need to have kvm_arch_gmem_prepare() hooks
issued before guest access:

  d55475f23cea KVM: gmem: track preparedness a page at a time
  64b46ca6cd6d KVM: gmem: limit hole-punching to ranges within the file
  17df70a5ea65 KVM: gmem: add a complete set of functions to query page preparedness
  e3449f6841ef KVM: gmem: allocate private data for the gmem inode

  [1] https://lore.kernel.org/lkml/20241108155056.332412-1-pbonzini@redhat.com/

This series addresses some of the pending review comments for those
patches (feel free to squash/rework as-needed), and implements a first
real user in the form of a reworked version of Sean's original 2MB THP
support for gmem.
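For readers less familiar with the preparedness concept, here is a rough userspace sketch of per-sub-page tracking for a single 2MB folio. It is only an illustration: the struct and helper names (folio_prep, prep_mark, prep_clear, prep_test_all) are hypothetical and are not taken from the patches referenced above, which may track this state quite differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: 512 4KB sub-pages make up one 2MB folio. */
#define SUBPAGES_PER_2M 512

/* Hypothetical per-folio state: one bit per 4KB sub-page. */
struct folio_prep {
	uint64_t bits[SUBPAGES_PER_2M / 64];
};

/* Mark sub-pages [first, first + n) as prepared.
 * Caller must ensure first + n <= SUBPAGES_PER_2M. */
static void prep_mark(struct folio_prep *p, unsigned int first, unsigned int n)
{
	for (unsigned int i = first; i < first + n; i++)
		p->bits[i / 64] |= UINT64_C(1) << (i % 64);
}

/* Mark sub-pages [first, first + n) as unprepared again,
 * e.g. after the range has been invalidated. */
static void prep_clear(struct folio_prep *p, unsigned int first, unsigned int n)
{
	for (unsigned int i = first; i < first + n; i++)
		p->bits[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

/* Return true only if every sub-page in [first, first + n) is prepared. */
static bool prep_test_all(const struct folio_prep *p, unsigned int first,
			  unsigned int n)
{
	for (unsigned int i = first; i < first + n; i++)
		if (!(p->bits[i / 64] & (UINT64_C(1) << (i % 64))))
			return false;
	return true;
}
```

With state like this, a fault on part of a huge folio would only need to issue the prepare hook for sub-pages whose bits are still clear, and a later invalidation of a sub-range would clear just those bits.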

It is still a bit up in the air as to whether or not gmem should support
THP at all rather than moving straight to 2MB/1GB hugepages in the form
of something like HugeTLB folios[2] or the lower-level PFN range
allocator presented by Yu Zhao during the guest_memfd call last week.
The main argument against THP, as I understand it, is that THPs will
become split over time due to hole-punching and rarely have an
opportunity to get rebuilt, because current CoCo hypervisor
implementations like SNP lack memory migration support (and adding
migration support to resolve that would not necessarily result in a net
performance gain). The current plan for SNP, as discussed during the
first guest_memfd call, is to implement something similar to 2MB
HugeTLB, and disallow hole-punching at sub-2MB granularity.

However, there have also been some discussions during recent PUCK calls
where the KVM maintainers have still expressed some interest in pulling
in gmem THP support in a more official capacity. The thinking there is
that hole-punching is a userspace policy, and that userspace could in
theory avoid hole-punching for sub-2MB GFN ranges to avoid degradation
over time. And if there's a desire to enforce this from the kernel side
by blocking sub-2MB hole-punching, this would provide similar
semantics/behavior to the 2MB HugeTLB-like approach above. So maybe
there is still some room for discussion about these approaches.

Outside that, there are a number of other development areas where it
would be useful to at least have some experimental 2MB support in place
so that those efforts can be pursued in parallel, such as the
preparedness tracking touched on here, and exploring how that will
intersect with other development areas like using gmem for both shared
and private memory, mmap support, guest_memfd library, etc. My hope is
that this approach could be useful for that purpose at least, even if
only as an out-of-tree stop-gap.
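
To make the "hole-punching is a userspace policy" idea above a bit more concrete, here is a minimal sketch of how a VMM might shrink a requested hole-punch range inward to 2MB boundaries before punching, so that no 2MB region is ever partially punched. The names (GMEM_2M, punch_range, shrink_to_2m) are hypothetical and not part of this series or any existing userspace; the sketch also assumes offset + len does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative name for the 2MB granule; not a kernel or KVM constant. */
#define GMEM_2M (UINT64_C(2) << 20)

/* Hypothetical result type: the sub-range that is safe to punch. */
struct punch_range {
	uint64_t offset;
	uint64_t len;
};

/*
 * Shrink [offset, offset + len) inward so both ends fall on 2MB
 * boundaries. Returns false (leaving *out untouched) when the request
 * does not cover any whole 2MB chunk, in which case the policy would be
 * to skip the hole-punch entirely.
 */
static bool shrink_to_2m(uint64_t offset, uint64_t len,
			 struct punch_range *out)
{
	uint64_t start = (offset + GMEM_2M - 1) & ~(GMEM_2M - 1); /* round up */
	uint64_t end = (offset + len) & ~(GMEM_2M - 1);           /* round down */

	if (start >= end)
		return false;

	out->offset = start;
	out->len = end - start;
	return true;
}
```

The resulting range could then be handed to fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE on the guest_memfd file descriptor. Pages in the unaligned head and tail would simply stay allocated, trading some memory for keeping the backing 2MB folios intact.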

Thoughts/comments welcome!

[2] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@google.com/


Testing
-------

Currently, this series does not default to enabling 2M support, but it
can instead be switched on/off dynamically via a module parameter:

  echo 1 >/sys/module/kvm/parameters/gmem_2m_enabled
  echo 0 >/sys/module/kvm/parameters/gmem_2m_enabled

This can be useful for simulating things like host pressure where we
start getting a mix of 4K/2MB allocations. I've used this to help test
that the preparedness-tracking still handles things properly in these
situations.

But if we do decide to pull in THP support upstream it would make more
sense to drop the parameter completely.


----------------------------------------------------------------
Michael Roth (4):
      KVM: gmem: Don't rely on __kvm_gmem_get_pfn() for preparedness
      KVM: gmem: Don't clear pages that have already been prepared
      KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios
      KVM: SEV: Improve handling of large ranges in gmem prepare callback

Sean Christopherson (1):
      KVM: Add hugepage support for dedicated guest memory

 arch/x86/kvm/svm/sev.c   | 163 ++++++++++++++++++++++++++------------------
 include/linux/kvm_host.h |   2 +
 virt/kvm/guest_memfd.c   | 173 ++++++++++++++++++++++++++++++++++-------------
 virt/kvm/kvm_main.c      |   4 ++
 4 files changed, 228 insertions(+), 114 deletions(-)