From: Shivank Garg
Subject: [PATCH RFC V2 0/9] Enhancements to Page Migration with Multi-threading and Batch Offloading to DMA
Date: Wed, 19 Mar 2025 19:22:03 +0000
Message-ID: <20250319192211.10092-1-shivankg@amd.com>

This patchset enhances page migration by batching folio-copy operations
and either spreading the copies across multiple CPU threads or offloading
them to DMA hardware. It builds upon Zi's work on accelerating page
migration via multi-threading[1] and my previous work on enhancing page
migration with batch offloading via DMA[2].

MOTIVATION:
-----------

Page migration costs have become increasingly critical in modern systems
with memory tiers and NUMA nodes:

1. Batching folio copies increases throughput, especially for base-page
   migrations, where kernel work done between individual copies (moving
   folio metadata, updating page table entries) adds overhead. This
   matters most for smaller page sizes (4KB on x86_64/ARM64, 64KB on
   ARM64).

2. The current serial copy pattern underutilizes modern hardware, leaving
   migration bandwidth capped by single-threaded, CPU-bound copying.

These improvements are particularly valuable in:
- Large-scale tiered-memory systems with CXL nodes and HBM
- CPU-GPU coherent systems with GPU memory exposed as NUMA nodes
- Systems where frequent page promotion/demotion occurs

Following the trend of batching operations in the memory migration core
path (batch migration, batch TLB flush), batch-copying folio content is
the logical next step. Modern systems equipped with hardware accelerators
(DMA engines), GPUs, and high CPU core counts offer untapped potential for
accelerating these copies.

DESIGN:
-------

The patchset implements three key enhancements:

1. Batching:

   - Current approach: process each folio individually

       for_each_folio() {
           Copy folio metadata like flags and mappings
           Copy the folio content from src to dst
           Update page tables with dst folio
       }

   - New approach: process in batches

       for_each_folio() {
           Copy folio metadata like flags and mappings
       }
       Batch copy all src folios to dst
       for_each_folio() {
           Update page tables with dst folios
       }

2. Multi-Threading:
   - Distribute folio batch-copy operations across multiple CPU threads.

3. DMA Offload:
   - Leverage DMA engines designed for high copy throughput.
   - Distribute folio batch-copy across multiple DMA channels.

PERFORMANCE RESULTS:
--------------------

System Info:
Testing environment: AMD Zen 3 EPYC server (2 sockets, 32 cores, SMT
enabled), 1 NUMA node per socket, Linux kernel 6.14.0-rc7+, DVFS set to
Performance, PTDMA hardware.

Measurement: Throughput (GB/s)

1. Varying folio size with different numbers of parallel threads/channels:

   Move folios of different sizes (mTHP: 4KB, 16KB, ..., 2MB) such that
   the total transfer size is constant (1GB), with different numbers of
   parallel threads/channels.

   a. Multi-Threaded CPU

  Threads | 4K         | 16K        | 32K        | 64K        | 128K       | 256K       | 512K       | 1M         | 2M
==============================================================================================================================
        1 | 1.72±0.05  | 3.55±0.14  | 4.44±0.07  | 5.19±0.37  | 5.57±0.47  | 6.27±0.02  | 6.43±0.09  | 6.59±0.05  | 10.73±0.07
        2 | 1.93±0.06  | 3.91±0.24  | 5.22±0.03  | 5.76±0.62  | 7.42±0.16  | 7.30±0.93  | 8.08±0.85  | 8.67±0.09  | 17.21±0.28
        4 | 2.00±0.03  | 4.30±0.22  | 6.02±0.10  | 7.61±0.26  | 8.60±0.92  | 9.54±1.11  | 10.03±1.12 | 10.98±0.14 | 29.61±0.43
        8 | 2.07±0.08  | 4.60±0.32  | 6.06±0.85  | 7.52±0.96  | 7.98±1.83  | 8.66±1.94  | 10.99±1.40 | 11.22±1.49 | 37.42±0.70
       16 | 2.04±0.04  | 4.74±0.31  | 6.20±0.39  | 7.51±0.86  | 8.26±1.47  | 10.99±0.11 | 9.72±1.51  | 12.07±0.02 | 37.08±0.53

   b. DMA Offload

 Channels | 4K         | 16K        | 32K        | 64K        | 128K       | 256K       | 512K       | 1M         | 2M
==============================================================================================================================
        1 | 0.46±0.01  | 1.35±0.02  | 1.99±0.02  | 2.76±0.02  | 3.44±0.17  | 3.87±0.20  | 3.98±0.29  | 4.36±0.01  | 11.79±0.05
        2 | 0.66±0.02  | 1.84±0.07  | 2.89±0.10  | 4.02±0.30  | 4.27±0.53  | 5.98±0.05  | 6.15±0.50  | 5.83±0.64  | 13.39±0.08
        4 | 0.91±0.01  | 2.62±0.13  | 3.98±0.17  | 5.57±0.41  | 6.55±0.70  | 8.32±0.04  | 8.91±0.05  | 8.82±0.96  | 24.52±0.22
        8 | 1.14±0.00  | 3.21±0.07  | 4.21±1.09  | 6.07±0.81  | 8.80±0.08  | 8.91±1.38  | 11.03±0.02 | 10.68±1.38 | 39.17±0.58
       16 | 1.19±0.11  | 3.33±0.20  | 4.98±0.33  | 7.65±0.10  | 7.85±1.50  | 8.38±1.35  | 8.94±3.23  | 12.85±0.06 | 55.45±1.20

Inference:
- Throughput increases with folio size, and larger folios benefit more
  from DMA: at 2MB, 16 DMA channels reach 55.45 GB/s versus 37.08 GB/s
  with 16 CPU threads, while at 4KB the CPU path is faster at every
  thread/channel count.
- Multi-threading and DMA offloading both provide significant gains: for
  2MB folios, throughput rises from 10.73 GB/s (1 thread) to 37.42 GB/s
  (8 threads), and from 11.79 GB/s (1 channel) to 55.45 GB/s (16 channels).

2. Varying folio count (total transfer size):

   2MB folio size, using only 1 thread.

   a. CPU Multi-Threaded

      Folio Count | GB/s
      ========================
                1 | 7.56±3.23
                8 | 9.54±1.34
               64 | 9.57±0.39
              256 | 10.09±0.17
              512 | 10.61±0.17
             1024 | 10.77±0.07
             2048 | 10.81±0.08
             8192 | 10.84±0.05

   b. DMA Offload

      Folio Count | GB/s
      ========================
                1 | 8.21±3.68
                8 | 9.92±2.12
               64 | 9.90±0.31
              256 | 11.51±0.32
              512 | 11.67±0.11
             1024 | 11.89±0.06
             2048 | 11.92±0.08
             8192 | 12.03±0.05

Inference:
- Throughput increases with folio count but plateaus beyond a threshold.
  (The migrate_pages() function uses a folio batch size of 512.)

3. CPU thread scheduling:

   Analyze the effect of CPU topology.

   a. Spread across different CCDs

      Threads | GB/s
      ====================
            1 | 10.60±0.06
            2 | 17.21±0.12
            4 | 29.94±0.16
            8 | 37.07±1.62
           16 | 36.19±0.97

   b. Fill one CCD completely before moving to the next one

      Threads | GB/s
      ====================
            1 | 10.44±0.47
            2 | 10.93±0.11
            4 | 10.99±0.04
            8 | 11.08±0.03
           16 | 17.91±0.12

Inference:
- Hardware topology matters.
  On AMD systems, distributing copy threads across CCDs makes much better
  use of memory bandwidth: with 8 threads, spreading reaches 37.07 GB/s
  versus 11.08 GB/s when filling a single CCD.

TODOs:
------

Further experiments are needed to:
- Characterize system behavior and develop heuristics
- Analyze the impact of remote/local CPU scheduling
- Measure DMA setup overheads
- Evaluate costs to userspace
- Study cache hotness/pollution effects
- Measure DMA cost under different system I/O loads

[1] https://lore.kernel.org/linux-mm/20250103172419.4148674-1-ziy@nvidia.com
[2] https://lore.kernel.org/linux-mm/20240614221525.19170-1-shivankg@amd.com
[3] LSFMM Proposal: https://lore.kernel.org/all/cf6fc05d-c0b0-4de3-985e-5403977aa3aa@amd.com

Mike Day (1):
  mm: add support for copy offload for folio Migration

Shivank Garg (4):
  mm: batch folio copying during migration
  mm/migrate: add migrate_folios_batch_move to batch the folio move operations
  dcbm: add dma core batch migrator for batch page offloading
  mtcopy: spread threads across die for testing

Zi Yan (4):
  mm/migrate: factor out code in move_to_new_folio() and migrate_folio_move()
  mm/migrate: revive MIGRATE_NO_COPY in migrate_mode.
  mm/migrate: introduce multi-threaded page copy routine
  adjust NR_MAX_BATCHED_MIGRATION for testing

 drivers/Kconfig                        |   2 +
 drivers/Makefile                       |   3 +
 drivers/migoffcopy/Kconfig             |  17 ++
 drivers/migoffcopy/Makefile            |   2 +
 drivers/migoffcopy/dcbm/Makefile       |   1 +
 drivers/migoffcopy/dcbm/dcbm.c         | 393 ++++++++++++++++++++++++
 drivers/migoffcopy/mtcopy/Makefile     |   1 +
 drivers/migoffcopy/mtcopy/copy_pages.c | 408 +++++++++++++++++++++++++
 include/linux/migrate_mode.h           |   2 +
 include/linux/migrate_offc.h           |  36 +++
 include/linux/mm.h                     |   4 +
 mm/Kconfig                             |   8 +
 mm/Makefile                            |   1 +
 mm/migrate.c                           | 351 ++++++++++++++++++---
 mm/migrate_offc.c                      |  51 ++++
 mm/util.c                              |  41 +++
 16 files changed, 1275 insertions(+), 46 deletions(-)
 create mode 100644 drivers/migoffcopy/Kconfig
 create mode 100644 drivers/migoffcopy/Makefile
 create mode 100644 drivers/migoffcopy/dcbm/Makefile
 create mode 100644 drivers/migoffcopy/dcbm/dcbm.c
 create mode 100644 drivers/migoffcopy/mtcopy/Makefile
 create mode 100644 drivers/migoffcopy/mtcopy/copy_pages.c
 create mode 100644 include/linux/migrate_offc.h
 create mode 100644 mm/migrate_offc.c
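
The three-phase batching flow described under DESIGN can be sketched in
user-space C. This is a minimal illustration under stated assumptions, not
the kernel implementation: `struct sketch_folio`, `batch_copy_fn`,
`serial_batch_copy()` and `migrate_batch()` are hypothetical names, folio
metadata is reduced to two fields, and "updating page tables" is modeled as
repointing entries in an array. The pluggable `copy` callback is also an
assumption, used only to show where a multi-threaded or DMA copier could
slot in; it is not the patchset's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a folio (not the kernel's struct folio). */
struct sketch_folio {
	unsigned long flags;  /* metadata, copied in phase 1 */
	void *mapping;        /* metadata, copied in phase 1 */
	unsigned char *data;  /* contents, copied in phase 2 */
	size_t size;          /* bytes of content */
};

/* Hypothetical batch-copier signature: serial, multi-threaded or DMA. */
typedef void (*batch_copy_fn)(struct sketch_folio *const *dst,
			      struct sketch_folio *const *src, size_t nr);

/* Default copier: copy each folio's contents one after another. */
static void serial_batch_copy(struct sketch_folio *const *dst,
			      struct sketch_folio *const *src, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		assert(dst[i]->size == src[i]->size);
		memcpy(dst[i]->data, src[i]->data, src[i]->size);
	}
}

/*
 * Batched migration of nr folios. table[i] models the page-table entry
 * that currently maps src[i]; after the batch it points at dst[i].
 */
static void migrate_batch(struct sketch_folio **table,
			  struct sketch_folio *const *dst,
			  struct sketch_folio *const *src, size_t nr,
			  batch_copy_fn copy)
{
	/* Phase 1: copy metadata for every folio. */
	for (size_t i = 0; i < nr; i++) {
		dst[i]->flags = src[i]->flags;
		dst[i]->mapping = src[i]->mapping;
	}

	/* Phase 2: one batch copy of all folio contents. */
	copy(dst, src, nr);

	/* Phase 3: repoint every "page-table entry" at its dst folio. */
	for (size_t i = 0; i < nr; i++) {
		assert(table[i] == src[i]);
		table[i] = dst[i];
	}
}
```

Only phase 2 touches folio contents, so it is the single step that a
multi-threaded or DMA copier would replace; phases 1 and 3 remain per-folio
loops on the calling CPU.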
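
The Multi-Threading enhancement distributes a batch copy across CPU
threads. A minimal POSIX-threads sketch follows, assuming each folio is a
plain buffer and that each thread receives a contiguous range of the batch.
`mt_batch_copy()`, `copy_worker()`, `struct copy_work` and
`MAX_COPY_THREADS` are hypothetical names, and the range-splitting policy is
an illustrative choice rather than what copy_pages.c necessarily does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COPY_THREADS 16 /* hypothetical cap for this sketch */

/* One worker's share of the batch: buffers [start, end). */
struct copy_work {
	unsigned char *const *dst;
	unsigned char *const *src;
	const size_t *sizes;
	size_t start, end;
};

static void *copy_worker(void *arg)
{
	const struct copy_work *w = arg;

	for (size_t i = w->start; i < w->end; i++)
		memcpy(w->dst[i], w->src[i], w->sizes[i]);
	return NULL;
}

/*
 * Copy nr buffers (src[i] -> dst[i], sizes[i] bytes) using up to nr_threads
 * threads. Each thread gets a contiguous range; range lengths differ by at
 * most one. If a thread cannot be created, its range is copied inline.
 */
static void mt_batch_copy(unsigned char *const *dst, unsigned char *const *src,
			  const size_t *sizes, size_t nr, unsigned int nr_threads)
{
	pthread_t tids[MAX_COPY_THREADS];
	struct copy_work work[MAX_COPY_THREADS];
	bool created[MAX_COPY_THREADS];
	size_t start = 0;

	assert(nr_threads >= 1);
	if (nr_threads > MAX_COPY_THREADS)
		nr_threads = MAX_COPY_THREADS;

	for (unsigned int t = 0; t < nr_threads; t++) {
		/* The first nr % nr_threads workers take one extra buffer. */
		size_t count = nr / nr_threads + (t < nr % nr_threads ? 1 : 0);

		work[t] = (struct copy_work){ dst, src, sizes, start, start + count };
		start += count;
		created[t] = pthread_create(&tids[t], NULL, copy_worker, &work[t]) == 0;
		if (!created[t])
			copy_worker(&work[t]);
	}

	/* Wait for every worker so the whole batch is copied on return. */
	for (unsigned int t = 0; t < nr_threads; t++)
		if (created[t])
			pthread_join(tids[t], NULL);
}
```

The same partitioning idea would apply to spreading a batch across several
DMA channels, with each range submitted to one channel instead of one
thread; the DMA submission itself is outside the scope of this sketch.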
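
Result 3 (CPU thread scheduling) contrasts two thread placements: spreading
threads across CCDs versus filling one CCD before moving to the next. The
sketch below shows one way to compute a CPU id for each policy, assuming for
illustration 4 CCDs of 8 cores each with contiguous CPU numbering inside a
CCD. `struct ccd_topology`, `cpu_spread()` and `cpu_fill()` are hypothetical
names, and this is not necessarily how the "mtcopy: spread threads across
die for testing" patch selects CPUs.

```c
#include <assert.h>

/*
 * Hypothetical topology for illustration: CCD c owns the contiguous CPU ids
 * [c * cores_per_ccd, (c + 1) * cores_per_ccd). Real CPU numbering is
 * platform-dependent and would have to be read from the system topology.
 */
struct ccd_topology {
	unsigned int nr_ccds;
	unsigned int cores_per_ccd;
};

/*
 * "Spread": thread t goes to CCD t % nr_ccds; successive threads on the
 * same CCD take successive cores, wrapping if threads outnumber cores.
 */
static unsigned int cpu_spread(const struct ccd_topology *topo, unsigned int t)
{
	assert(topo->nr_ccds > 0 && topo->cores_per_ccd > 0);
	unsigned int ccd = t % topo->nr_ccds;
	unsigned int core = (t / topo->nr_ccds) % topo->cores_per_ccd;

	return ccd * topo->cores_per_ccd + core;
}

/* "Fill": use every core of CCD 0, then CCD 1, and so on. */
static unsigned int cpu_fill(const struct ccd_topology *topo, unsigned int t)
{
	assert(topo->nr_ccds > 0 && topo->cores_per_ccd > 0);
	return t % (topo->nr_ccds * topo->cores_per_ccd);
}
```

With 4 CCDs of 8 cores, the first four threads land on CPUs 0, 8, 16 and 24
under cpu_spread() but on CPUs 0-3 (all in CCD 0) under cpu_fill(), which
mirrors the two configurations measured above.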