From patchwork Thu Mar 6 04:42:29 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003817
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan,
	David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts,
	Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu,
	Alistair Popple, Donet Tom
Subject: [RFC 01/11] mm/zone_device: support large zone device private folios
Date: Thu, 6 Mar 2025 15:42:29 +1100
Message-ID: <20250306044239.3874247-2-balbirs@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Add routines to support allocation of large-order zone device private
folios, along with helper functions for zone device folios: one to check
whether a folio is device private, and helpers to get and set a folio's
zone device data.

When large folios are used, the existing page_free() callback in pgmap
is called when the folio is freed; this is true for both PAGE_SIZE and
higher-order pages.
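As an illustrative aside (not part of the patch), a device driver could use
the new accessors roughly as follows; struct my_chunk and the function names
are hypothetical, only the two folio helpers come from this series:

/*
 * Hypothetical driver-side sketch: stash and retrieve per-folio driver
 * bookkeeping with the new helpers. The folio must be device private,
 * which the helpers assert via VM_BUG_ON_FOLIO().
 */
static void my_driver_track_folio(struct folio *folio, struct my_chunk *chunk)
{
	folio_set_zone_device_data(folio, chunk);
}

static struct my_chunk *my_driver_folio_chunk(struct folio *folio)
{
	return folio_zone_device_data(folio);
}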
Signed-off-by: Balbir Singh
---
 include/linux/memremap.h | 22 +++++++++++++++++-
 mm/memremap.c            | 50 +++++++++++++++++++++++++++++-----------
 2 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 4aa151914eab..11d586dd8ef1 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -169,6 +169,18 @@ static inline bool folio_is_device_private(const struct folio *folio)
 	return is_device_private_page(&folio->page);
 }
 
+static inline void *folio_zone_device_data(const struct folio *folio)
+{
+	VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
+	return folio->page.zone_device_data;
+}
+
+static inline void folio_set_zone_device_data(struct folio *folio, void *data)
+{
+	VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
+	folio->page.zone_device_data = data;
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
 	return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
@@ -199,7 +211,7 @@ static inline bool folio_is_fsdax(const struct folio *folio)
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-void zone_device_page_init(struct page *page);
+void init_zone_device_folio(struct folio *folio, unsigned int order);
 void *memremap_pages(struct dev_pagemap *pgmap, int nid);
 void memunmap_pages(struct dev_pagemap *pgmap);
 void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
@@ -209,6 +221,14 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);
 
 unsigned long memremap_compat_align(void);
+
+static inline void zone_device_page_init(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	init_zone_device_folio(folio, 0);
+}
+
 #else
 static inline void *devm_memremap_pages(struct device *dev,
 		struct dev_pagemap *pgmap)
diff --git a/mm/memremap.c b/mm/memremap.c
index 2aebc1b192da..7d98d0a4c0cd 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -459,20 +459,21 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
 void free_zone_device_folio(struct folio *folio)
 {
 	struct dev_pagemap *pgmap = folio->pgmap;
+	unsigned int nr = folio_nr_pages(folio);
+	int i;
+	bool anon = folio_test_anon(folio);
+	struct page *page = folio_page(folio, 0);
 
 	if (WARN_ON_ONCE(!pgmap))
 		return;
 
 	mem_cgroup_uncharge(folio);
 
-	/*
-	 * Note: we don't expect anonymous compound pages yet. Once supported
-	 * and we could PTE-map them similar to THP, we'd have to clear
-	 * PG_anon_exclusive on all tail pages.
-	 */
-	if (folio_test_anon(folio)) {
-		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
-		__ClearPageAnonExclusive(folio_page(folio, 0));
+	WARN_ON_ONCE(folio_test_large(folio) && !anon);
+
+	for (i = 0; i < nr; i++) {
+		if (anon)
+			__ClearPageAnonExclusive(folio_page(folio, i));
 	}
 
 	/*
@@ -496,10 +497,19 @@ void free_zone_device_folio(struct folio *folio)
 
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
+		if (folio_test_large(folio)) {
+			folio_unqueue_deferred_split(folio);
+
+			percpu_ref_put_many(&folio->pgmap->ref, nr - 1);
+		}
+		pgmap->ops->page_free(page);
+		put_dev_pagemap(pgmap);
+		page->mapping = NULL;
+		break;
 	case MEMORY_DEVICE_COHERENT:
 		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free))
 			break;
-		pgmap->ops->page_free(folio_page(folio, 0));
+		pgmap->ops->page_free(page);
 		put_dev_pagemap(pgmap);
 		break;
 
@@ -523,14 +533,28 @@ void free_zone_device_folio(struct folio *folio)
 	}
 }
 
-void zone_device_page_init(struct page *page)
+void init_zone_device_folio(struct folio *folio, unsigned int order)
 {
+	struct page *page = folio_page(folio, 0);
+
+	VM_BUG_ON(order > MAX_ORDER_NR_PAGES);
+
+	WARN_ON_ONCE(order && order != HPAGE_PMD_ORDER);
+
 	/*
 	 * Drivers shouldn't be allocating pages after calling
 	 * memunmap_pages().
 	 */
-	WARN_ON_ONCE(!percpu_ref_tryget_live(&page_pgmap(page)->ref));
-	set_page_count(page, 1);
+	WARN_ON_ONCE(!percpu_ref_tryget_many(&page_pgmap(page)->ref, 1 << order));
+	folio_set_count(folio, 1);
 	lock_page(page);
+
+	/*
+	 * Only PMD level migration is supported for THP migration
+	 */
+	if (order > 1) {
+		prep_compound_page(page, order);
+		folio_set_large_rmappable(folio);
+	}
 }
-EXPORT_SYMBOL_GPL(zone_device_page_init);
+EXPORT_SYMBOL_GPL(init_zone_device_folio);
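For illustration only, a driver that has reserved an HPAGE_PMD_ORDER-aligned,
physically contiguous run of device private pages might initialize it as one
large folio with the new helper; my_driver_alloc_thp() and dpage are
hypothetical, and only init_zone_device_folio() is from this patch:

/*
 * Hypothetical allocation sketch: init_zone_device_folio() takes
 * 1 << order pgmap references, sets the folio refcount to 1, locks
 * the head page, and (for order > 1) preps the compound page.
 */
static struct folio *my_driver_alloc_thp(struct page *dpage)
{
	struct folio *folio = page_folio(dpage);

	init_zone_device_folio(folio, HPAGE_PMD_ORDER);
	return folio;
}

From patchwork Thu Mar 6 04:42:30 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003816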
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan,
	David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts,
	Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu,
	Alistair Popple, Donet Tom
Subject: [RFC 02/11] mm/migrate_device: flags for selecting device private THP pages
Date: Thu, 6 Mar 2025 15:42:30 +1100
Message-ID: <20250306044239.3874247-3-balbirs@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Add flags to mark zone device migration pages. MIGRATE_VMA_SELECT_COMPOUND
will be used to select THP pages during migrate_vma_setup(), and
MIGRATE_PFN_COMPOUND marks migrating device pages as compound pages during
device pfn migration.

Signed-off-by: Balbir Singh
---
 include/linux/migrate.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 61899ec7a9a3..b5e4f51e64c7 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -167,6 +167,7 @@ static inline int migrate_misplaced_folio(struct folio *folio, int node)
 #define MIGRATE_PFN_VALID	(1UL << 0)
 #define MIGRATE_PFN_MIGRATE	(1UL << 1)
 #define MIGRATE_PFN_WRITE	(1UL << 3)
+#define MIGRATE_PFN_COMPOUND	(1UL << 4)
 #define MIGRATE_PFN_SHIFT	6
 
 static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
@@ -185,6 +186,7 @@ enum migrate_vma_direction {
 	MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
 	MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
 	MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2,
+	MIGRATE_VMA_SELECT_COMPOUND = 1 << 3,
 };
 
 struct migrate_vma {
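A hypothetical caller-side sketch (not from this patch): a driver fault
handler asking migrate_vma_setup() to also select compound device private
ranges. All fields besides the new MIGRATE_VMA_SELECT_COMPOUND flag follow
the existing migrate_vma API; my_driver is an invented pgmap owner cookie.

	struct migrate_vma args = {
		.vma		= vma,
		.start		= start,
		.end		= start + HPAGE_PMD_SIZE,
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= my_driver,	/* hypothetical */
		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
				  MIGRATE_VMA_SELECT_COMPOUND,
	};

	if (migrate_vma_setup(&args))
		return VM_FAULT_SIGBUS;

From patchwork Thu Mar 6 04:42:31 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003819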
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan,
	David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts,
	Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu,
	Alistair Popple, Donet Tom
Subject: [RFC 03/11] mm/thp: zone_device awareness in THP handling code
Date: Thu, 6 Mar 2025 15:42:31 +1100
Message-ID: <20250306044239.3874247-4-balbirs@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Make the THP handling code in the mm subsystem aware of zone device pages.
Although the code is designed to be generic when it comes to handling
splitting of pages, it currently works only for THP page sizes
corresponding to HPAGE_PMD_NR.

Modify page_vma_mapped_walk() to return true when a zone device huge entry
is present, enabling try_to_migrate() and other migration code paths to
appropriately process the entry. pmd_pfn() does not work well with zone
device entries, so for zone device entries derive the pfn from the swap
entry encoded in the pmd when checking and comparing entries.

try_to_map_unused_to_zeropage() does not apply to zone device entries;
zone device entries are ignored in that call.

Signed-off-by: Balbir Singh
---
 mm/huge_memory.c     | 151 +++++++++++++++++++++++++++++++------------
 mm/migrate.c         |   2 +
 mm/page_vma_mapped.c |  10 +++
 mm/rmap.c            |  19 +++++-
 4 files changed, 138 insertions(+), 44 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 826bfe907017..d8e018d1bdbd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2247,10 +2247,17 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	} else if (thp_migration_supported()) {
 		swp_entry_t entry;
 
-		VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
 		entry = pmd_to_swp_entry(orig_pmd);
 		folio = pfn_swap_entry_folio(entry);
 		flush_needed = 0;
+
+		VM_BUG_ON(!is_pmd_migration_entry(*pmd) &&
+				!folio_is_device_private(folio));
+
+		if (folio_is_device_private(folio)) {
+			folio_remove_rmap_pmd(folio, folio_page(folio, 0), vma);
+			WARN_ON_ONCE(folio_mapcount(folio) < 0);
+		}
 	} else
 		WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
 
@@ -2264,6 +2271,15 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				-HPAGE_PMD_NR);
 	}
 
+	/*
+	 * Do a folio put on zone device private pages after
+	 * changes to mm_counter, because the folio_put() will
+	 * clean folio->mapping and the folio_test_anon() check
+	 * will not be usable.
+	 */
+	if (folio_is_device_private(folio))
+		folio_put(folio);
+
 	spin_unlock(ptl);
 	if (flush_needed)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
@@ -2392,7 +2408,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct folio *folio = pfn_swap_entry_folio(entry);
 		pmd_t newpmd;
 
-		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
+		VM_BUG_ON(!is_pmd_migration_entry(*pmd) &&
+				!folio_is_device_private(folio));
 		if (is_writable_migration_entry(entry)) {
 			/*
 			 * A protection check is difficult so
@@ -2405,9 +2422,11 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			newpmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*pmd))
 				newpmd = pmd_swp_mksoft_dirty(newpmd);
-		} else {
+		} else if (is_writable_device_private_entry(entry)) {
+			newpmd = swp_entry_to_pmd(entry);
+			entry = make_device_exclusive_entry(swp_offset(entry));
+		} else
 			newpmd = *pmd;
-		}
 
 		if (uffd_wp)
 			newpmd = pmd_swp_mkuffd_wp(newpmd);
@@ -2860,11 +2879,12 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct page *page;
 	pgtable_t pgtable;
 	pmd_t old_pmd, _pmd;
-	bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false;
-	bool anon_exclusive = false, dirty = false;
+	bool young, write, soft_dirty, uffd_wp = false;
+	bool anon_exclusive = false, dirty = false, present = false;
 	unsigned long addr;
 	pte_t *pte;
 	int i;
+	swp_entry_t swp_entry;
 
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
@@ -2918,20 +2938,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	pmd_migration = is_pmd_migration_entry(*pmd);
-	if (unlikely(pmd_migration)) {
-		swp_entry_t entry;
+	present = pmd_present(*pmd);
+	if (unlikely(!present)) {
+		swp_entry = pmd_to_swp_entry(*pmd);
 		old_pmd = *pmd;
-		entry = pmd_to_swp_entry(old_pmd);
-		page = pfn_swap_entry_to_page(entry);
-		write = is_writable_migration_entry(entry);
+
+		folio = pfn_swap_entry_folio(swp_entry);
+		VM_BUG_ON(!is_migration_entry(swp_entry) &&
+				!is_device_private_entry(swp_entry));
+		page = pfn_swap_entry_to_page(swp_entry);
+		write = is_writable_migration_entry(swp_entry);
+
 		if (PageAnon(page))
-			anon_exclusive = is_readable_exclusive_migration_entry(entry);
-		young = is_migration_entry_young(entry);
-		dirty = is_migration_entry_dirty(entry);
+			anon_exclusive =
+				is_readable_exclusive_migration_entry(swp_entry);
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 		uffd_wp = pmd_swp_uffd_wp(old_pmd);
+		young = is_migration_entry_young(swp_entry);
+		dirty = is_migration_entry_dirty(swp_entry);
 	} else {
 		/*
 		 * Up to this point the pmd is present and huge and userland has
@@ -3015,30 +3040,45 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * Note that NUMA hinting access restrictions are not transferred to
	 * avoid any possibility of altering permissions across VMAs.
	 */
-	if (freeze || pmd_migration) {
+	if (freeze || !present) {
 		for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
 			pte_t entry;
-			swp_entry_t swp_entry;
-
-			if (write)
-				swp_entry = make_writable_migration_entry(
-							page_to_pfn(page + i));
-			else if (anon_exclusive)
-				swp_entry = make_readable_exclusive_migration_entry(
-							page_to_pfn(page + i));
-			else
-				swp_entry = make_readable_migration_entry(
-							page_to_pfn(page + i));
-			if (young)
-				swp_entry = make_migration_entry_young(swp_entry);
-			if (dirty)
-				swp_entry = make_migration_entry_dirty(swp_entry);
-			entry = swp_entry_to_pte(swp_entry);
-			if (soft_dirty)
-				entry = pte_swp_mksoft_dirty(entry);
-			if (uffd_wp)
-				entry = pte_swp_mkuffd_wp(entry);
-
+			if (freeze || is_migration_entry(swp_entry)) {
+				if (write)
+					swp_entry = make_writable_migration_entry(
+								page_to_pfn(page + i));
+				else if (anon_exclusive)
+					swp_entry = make_readable_exclusive_migration_entry(
								page_to_pfn(page + i));
+				else
+					swp_entry = make_readable_migration_entry(
+								page_to_pfn(page + i));
+				if (young)
+					swp_entry = make_migration_entry_young(swp_entry);
+				if (dirty)
+					swp_entry = make_migration_entry_dirty(swp_entry);
+				entry = swp_entry_to_pte(swp_entry);
+				if (soft_dirty)
+					entry = pte_swp_mksoft_dirty(entry);
+				if (uffd_wp)
+					entry = pte_swp_mkuffd_wp(entry);
+			} else {
+				VM_BUG_ON(!is_device_private_entry(swp_entry));
+				if (write)
+					swp_entry = make_writable_device_private_entry(
+								page_to_pfn(page + i));
+				else if (anon_exclusive)
+					swp_entry = make_device_exclusive_entry(
+								page_to_pfn(page + i));
+				else
+					swp_entry = make_readable_device_private_entry(
+								page_to_pfn(page + i));
+				entry = swp_entry_to_pte(swp_entry);
+				if (soft_dirty)
+					entry = pte_swp_mksoft_dirty(entry);
+				if (uffd_wp)
+					entry = pte_swp_mkuffd_wp(entry);
+			}
 			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
 			set_pte_at(mm, addr, pte + i, entry);
 		}
@@ -3065,7 +3105,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 	pte_unmap(pte);
 
-	if (!pmd_migration)
+	if (present)
 		folio_remove_rmap_pmd(folio, page, vma);
 	if (freeze)
 		put_page(page);
@@ -3077,6 +3117,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze, struct folio *folio)
 {
+	struct folio *pmd_folio;
 	VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
 	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
 	VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3089,7 +3130,14 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 	 */
 	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
 	    is_pmd_migration_entry(*pmd)) {
-		if (folio && folio != pmd_folio(*pmd))
+		if (folio && !pmd_present(*pmd)) {
+			swp_entry_t swp_entry = pmd_to_swp_entry(*pmd);
+
+			pmd_folio = page_folio(pfn_swap_entry_to_page(swp_entry));
+		} else {
+			pmd_folio = pmd_folio(*pmd);
+		}
+		if (folio && folio != pmd_folio)
 			return;
 		__split_huge_pmd_locked(vma, pmd, address, freeze);
 	}
@@ -3581,11 +3629,16 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 				folio_test_swapcache(origin_folio)) ?
 				folio_nr_pages(release) : 0));
 
+			if (folio_is_device_private(release))
+				percpu_ref_get_many(&release->pgmap->ref,
+							(1 << new_order) - 1);
+
 			if (release == origin_folio)
 				continue;
 
-			lru_add_page_tail(origin_folio, &release->page,
-						lruvec, list);
+			if (!folio_is_device_private(origin_folio))
+				lru_add_page_tail(origin_folio, &release->page,
+							lruvec, list);
 
 			/* Some pages can be beyond EOF: drop them from page cache */
 			if (release->index >= end) {
@@ -4625,7 +4678,10 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 		return 0;
 
 	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
-	pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
+	if (!folio_is_device_private(folio))
+		pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
+	else
+		pmdval = pmdp_huge_clear_flush(vma, address, pvmw->pmd);
 
 	/* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
 	anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
@@ -4675,6 +4731,17 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	entry = pmd_to_swp_entry(*pvmw->pmd);
 	folio_get(folio);
 	pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
+
+	if (unlikely(folio_is_device_private(folio))) {
+		if (pmd_write(pmde))
+			entry = make_writable_device_private_entry(
+						page_to_pfn(new));
+		else
+			entry = make_readable_device_private_entry(
+						page_to_pfn(new));
+		pmde = swp_entry_to_pmd(entry);
+	}
+
 	if (pmd_swp_soft_dirty(*pvmw->pmd))
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_writable_migration_entry(entry))
diff --git a/mm/migrate.c b/mm/migrate.c
index 59e39aaa74e7..0aa1bdb711c3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -200,6 +200,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
 
 	if (PageCompound(page))
 		return false;
+	if (folio_is_device_private(folio))
+		return false;
 	VM_BUG_ON_PAGE(!PageAnon(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e463c3be934a..5dd2e51477d3 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -278,6 +278,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			 * cannot return prematurely, while zap_huge_pmd() has
 			 * cleared *pmd but not decremented compound_mapcount().
 			 */
+			swp_entry_t entry;
+
+			if (!thp_migration_supported())
+				return not_found(pvmw);
+			entry = pmd_to_swp_entry(pmde);
+			if (is_device_private_entry(entry)) {
+				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+				return true;
+			}
+
 			if ((pvmw->flags & PVMW_SYNC) &&
 			    thp_vma_suitable_order(vma, pvmw->address,
 						   PMD_ORDER) &&
diff --git a/mm/rmap.c b/mm/rmap.c
index 67bb273dfb80..67e99dc5f2ef 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2326,8 +2326,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	/* PMD-mapped THP migration entry */
 	if (!pvmw.pte) {
-		subpage = folio_page(folio,
-				pmd_pfn(*pvmw.pmd) - folio_pfn(folio));
+		/*
+		 * Zone device private folios do not work well with
+		 * pmd_pfn() on some architectures due to pte
+		 * inversion.
+		 */
+		if (folio_is_device_private(folio)) {
+			swp_entry_t entry = pmd_to_swp_entry(*pvmw.pmd);
+			unsigned long pfn = swp_offset_pfn(entry);
+
+			subpage = folio_page(folio, pfn
+						- folio_pfn(folio));
+		} else {
+			subpage = folio_page(folio,
+					pmd_pfn(*pvmw.pmd)
+						- folio_pfn(folio));
+		}
+
 		VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) ||
 				!folio_test_pmd_mappable(folio), folio);
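Distilled as a standalone sketch (the patch open-codes this pattern in
rmap.c rather than adding a helper; huge_entry_pfn() is invented here for
illustration): for a device private folio the pfn must come from the swap
entry encoded in the pmd, not from pmd_pfn().

/*
 * Illustrative helper, assuming pmd holds either a present huge
 * mapping or a device private swap entry for @folio.
 */
static unsigned long huge_entry_pfn(pmd_t pmd, struct folio *folio)
{
	if (folio_is_device_private(folio))
		return swp_offset_pfn(pmd_to_swp_entry(pmd));
	return pmd_pfn(pmd);
}

From patchwork Thu Mar 6 04:42:32 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003820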
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan,
	David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts,
	Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu,
	Alistair Popple, Donet Tom
Subject: [RFC 04/11] mm/migrate_device: THP migration of zone device pages
Date: Thu, 6 Mar 2025 15:42:32 +1100
Message-ID: <20250306044239.3874247-5-balbirs@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
The migrate_device code paths go through the collect, setup and finalize
phases of migration. Support for MIGRATE_PFN_COMPOUND was added earlier in
the series to mark THP pages during migration.

The entries in the src and dst arrays passed to these functions still
remain at PAGE_SIZE granularity. When a compound page is passed, the first
entry has the PFN along with MIGRATE_PFN_COMPOUND and other flags set
(MIGRATE_PFN_MIGRATE, MIGRATE_PFN_VALID); the remaining (HPAGE_PMD_NR - 1)
entries are filled with 0's. This representation allows the compound page
to be split into smaller page sizes, as sketched below.

migrate_vma_collect_hole() and migrate_vma_collect_pmd() are now THP-aware.
Two new helper functions, migrate_vma_collect_huge_pmd() and
migrate_vma_insert_huge_pmd_page(), have been added.

migrate_vma_collect_huge_pmd() can collect THP pages, but if this fails for
some reason, there is fallback support to split the folio and migrate it.
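For illustration, the src array for one migrating PMD-sized folio at pfn
would be populated roughly as follows (a sketch distilled from the layout
described above; migrate_pfn() already contributes MIGRATE_PFN_VALID):

	/*
	 * Entry 0 carries the pfn and the flags; the remaining
	 * HPAGE_PMD_NR - 1 entries stay zero so the range can later be
	 * re-expressed as individual small pages if the folio is split.
	 */
	src[0] = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE |
		 MIGRATE_PFN_COMPOUND;
	for (i = 1; i < HPAGE_PMD_NR; i++)
		src[i] = 0;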
migrate_vma_insert_huge_pmd_page() closely follows the logic of
migrate_vma_insert_page().

Support for splitting pages as needed for migration will follow in later
patches in this series.

Signed-off-by: Balbir Singh
---
 mm/migrate_device.c | 436 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 379 insertions(+), 57 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 7d0d64f67cdf..f3fff5d705bd 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
@@ -44,6 +45,23 @@ static int migrate_vma_collect_hole(unsigned long start,
 	if (!vma_is_anonymous(walk->vma))
 		return migrate_vma_collect_skip(start, end, walk);
 
+	if (thp_migration_supported() &&
+	    (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) &&
+	    (IS_ALIGNED(start, HPAGE_PMD_SIZE) &&
+	     IS_ALIGNED(end, HPAGE_PMD_SIZE))) {
+		migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE |
+						MIGRATE_PFN_COMPOUND;
+		migrate->dst[migrate->npages] = 0;
+		migrate->npages++;
+		migrate->cpages++;
+
+		/*
+		 * Collect the remaining entries as holes, in case we
+		 * need to split later
+		 */
+		return migrate_vma_collect_skip(start + PAGE_SIZE, end, walk);
+	}
+
 	for (addr = start; addr < end; addr += PAGE_SIZE) {
 		migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE;
 		migrate->dst[migrate->npages] = 0;
@@ -54,50 +72,145 @@ static int migrate_vma_collect_hole(unsigned long start,
 	return 0;
 }
 
-static int migrate_vma_collect_pmd(pmd_t *pmdp,
-		unsigned long start,
-		unsigned long end,
-		struct mm_walk *walk)
+/**
+ * migrate_vma_collect_huge_pmd - collect THP pages without splitting the
+ * folio for device private pages.
+ * @pmdp: pointer to pmd entry
+ * @start: start address of the range for migration
+ * @end: end address of the range for migration
+ * @walk: mm_walk callback structure
+ *
+ * Collect the huge pmd entry at @pmdp for migration and set the
+ * MIGRATE_PFN_COMPOUND flag in the migrate src entry to indicate that
+ * migration will occur at HPAGE_PMD granularity
+ */
+static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
+		unsigned long end, struct mm_walk *walk)
 {
+	struct mm_struct *mm = walk->mm;
+	struct folio *folio;
 	struct migrate_vma *migrate = walk->private;
-	struct vm_area_struct *vma = walk->vma;
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long addr = start, unmapped = 0;
+	swp_entry_t entry;
+	int ret;
+	unsigned long write = 0;
 	spinlock_t *ptl;
-	pte_t *ptep;
 
-again:
-	if (pmd_none(*pmdp))
+	ptl = pmd_lock(mm, pmdp);
+	if (pmd_none(*pmdp)) {
+		spin_unlock(ptl);
 		return migrate_vma_collect_hole(start, end, -1, walk);
+	}
 
 	if (pmd_trans_huge(*pmdp)) {
-		struct folio *folio;
-
-		ptl = pmd_lock(mm, pmdp);
-		if (unlikely(!pmd_trans_huge(*pmdp))) {
+		if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM)) {
 			spin_unlock(ptl);
-			goto again;
+			return migrate_vma_collect_skip(start, end, walk);
 		}
 
 		folio = pmd_folio(*pmdp);
 		if (is_huge_zero_folio(folio)) {
 			spin_unlock(ptl);
-			split_huge_pmd(vma, pmdp, addr);
-		} else {
-			int ret;
+			return migrate_vma_collect_hole(start, end, -1, walk);
+		}
+		if (pmd_write(*pmdp))
+			write = MIGRATE_PFN_WRITE;
+	} else if (!pmd_present(*pmdp)) {
+		entry = pmd_to_swp_entry(*pmdp);
+		folio = pfn_swap_entry_folio(entry);
+
+		if (!is_device_private_entry(entry) ||
+		    !(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) ||
+		    (folio->pgmap->owner != migrate->pgmap_owner)) {
+			spin_unlock(ptl);
+			return migrate_vma_collect_skip(start, end, walk);
+		}
 
-			folio_get(folio);
(is_migration_entry(entry)) { + migration_entry_wait_on_locked(entry, ptl); spin_unlock(ptl); - if (unlikely(!folio_trylock(folio))) - return migrate_vma_collect_skip(start, end, - walk); - ret = split_folio(folio); - folio_unlock(folio); - folio_put(folio); - if (ret) - return migrate_vma_collect_skip(start, end, - walk); + return -EAGAIN; } + + if (is_writable_device_private_entry(entry)) + write = MIGRATE_PFN_WRITE; + } else { + spin_unlock(ptl); + return -EAGAIN; + } + + folio_get(folio); + if (unlikely(!folio_trylock(folio))) { + spin_unlock(ptl); + folio_put(folio); + return migrate_vma_collect_skip(start, end, walk); + } + + if (thp_migration_supported() && + (migrate->flags & MIGRATE_VMA_SELECT_COMPOUND) && + (IS_ALIGNED(start, HPAGE_PMD_SIZE) && + IS_ALIGNED(end, HPAGE_PMD_SIZE))) { + + struct page_vma_mapped_walk pvmw = { + .ptl = ptl, + .address = start, + .pmd = pmdp, + .vma = walk->vma, + }; + + unsigned long pfn = page_to_pfn(folio_page(folio, 0)); + + migrate->src[migrate->npages] = migrate_pfn(pfn) | write + | MIGRATE_PFN_MIGRATE + | MIGRATE_PFN_COMPOUND; + migrate->dst[migrate->npages++] = 0; + migrate->cpages++; + ret = set_pmd_migration_entry(&pvmw, folio_page(folio, 0)); + if (ret) { + migrate->npages--; + migrate->cpages--; + migrate->src[migrate->npages] = 0; + migrate->dst[migrate->npages] = 0; + goto fallback; + } + migrate_vma_collect_skip(start + PAGE_SIZE, end, walk); + spin_unlock(ptl); + return 0; + } + +fallback: + spin_unlock(ptl); + ret = split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (ret) + return migrate_vma_collect_skip(start, end, walk); + if (pmd_none(pmdp_get_lockless(pmdp))) + return migrate_vma_collect_hole(start, end, -1, walk); + + return -ENOENT; +} + +static int migrate_vma_collect_pmd(pmd_t *pmdp, + unsigned long start, + unsigned long end, + struct mm_walk *walk) +{ + struct migrate_vma *migrate = walk->private; + struct vm_area_struct *vma = walk->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long addr = start, unmapped = 0; + spinlock_t *ptl; + pte_t *ptep; + +again: + if (pmd_trans_huge(*pmdp) || !pmd_present(*pmdp)) { + int ret = migrate_vma_collect_huge_pmd(pmdp, start, end, walk); + + if (ret == -EAGAIN) + goto again; + if (ret == 0) + return 0; } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); @@ -168,8 +281,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0; } - /* FIXME support THP */ - if (!page || !page->mapping || PageTransCompound(page)) { + if (!page || !page->mapping) { mpfn = 0; goto next; } @@ -339,14 +451,6 @@ static bool migrate_vma_check_page(struct page *page, struct page *fault_page) */ int extra = 1 + (page == fault_page); - /* - * FIXME support THP (transparent huge page), it is bit more complex to - * check them than regular pages, because they can be mapped with a pmd - * or with a pte (split pte mapping). 
- */ - if (folio_test_large(folio)) - return false; - /* Page from ZONE_DEVICE have one extra reference */ if (folio_is_zone_device(folio)) extra++; @@ -375,17 +479,24 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns, lru_add_drain(); - for (i = 0; i < npages; i++) { + for (i = 0; i < npages; ) { struct page *page = migrate_pfn_to_page(src_pfns[i]); struct folio *folio; + unsigned int nr = 1; if (!page) { if (src_pfns[i] & MIGRATE_PFN_MIGRATE) unmapped++; - continue; + goto next; } folio = page_folio(page); + nr = folio_nr_pages(folio); + + if (nr > 1) + src_pfns[i] |= MIGRATE_PFN_COMPOUND; + + /* ZONE_DEVICE folios are not on LRU */ if (!folio_is_zone_device(folio)) { if (!folio_test_lru(folio) && allow_drain) { @@ -397,7 +508,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns, if (!folio_isolate_lru(folio)) { src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; restore++; - continue; + goto next; } /* Drop the reference we took in collect */ @@ -416,10 +527,12 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns, src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; restore++; - continue; + goto next; } unmapped++; +next: + i += nr; } for (i = 0; i < npages && restore; i++) { @@ -562,6 +675,146 @@ int migrate_vma_setup(struct migrate_vma *args) } EXPORT_SYMBOL(migrate_vma_setup); +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +/** + * migrate_vma_insert_huge_pmd_page: Insert a huge folio into @migrate->vma->vm_mm + * at @addr. folio is already allocated as a part of the migration process with + * large page. + * + * @folio needs to be initialized and setup after it's allocated. The code bits + * here follow closely the code in __do_huge_pmd_anonymous_page(). This API does + * not support THP zero pages. + * + * @migrate: migrate_vma arguments + * @addr: address where the folio will be inserted + * @folio: folio to be inserted at @addr + * @src: src pfn which is being migrated + * @pmdp: pointer to the pmd + */ +static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate, + unsigned long addr, + struct page *page, + unsigned long *src, + pmd_t *pmdp) +{ + struct vm_area_struct *vma = migrate->vma; + gfp_t gfp = vma_thp_gfp_mask(vma); + struct folio *folio = page_folio(page); + int ret; + spinlock_t *ptl; + pgtable_t pgtable; + pmd_t entry; + bool flush = false; + unsigned long i; + + VM_WARN_ON_FOLIO(!folio, folio); + VM_WARN_ON_ONCE(!pmd_none(*pmdp) && !is_huge_zero_pmd(*pmdp)); + + if (!thp_vma_suitable_order(vma, addr, HPAGE_PMD_ORDER)) + return -EINVAL; + + ret = anon_vma_prepare(vma); + if (ret) + return ret; + + folio_set_order(folio, HPAGE_PMD_ORDER); + folio_set_large_rmappable(folio); + + if (mem_cgroup_charge(folio, migrate->vma->vm_mm, gfp)) { + count_vm_event(THP_FAULT_FALLBACK); + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); + ret = -ENOMEM; + goto abort; + } + + __folio_mark_uptodate(folio); + + pgtable = pte_alloc_one(vma->vm_mm); + if (unlikely(!pgtable)) + goto abort; + + if (folio_is_device_private(folio)) { + swp_entry_t swp_entry; + + if (vma->vm_flags & VM_WRITE) + swp_entry = make_writable_device_private_entry( + page_to_pfn(page)); + else + swp_entry = make_readable_device_private_entry( + page_to_pfn(page)); + entry = swp_entry_to_pmd(swp_entry); + } else { + if (folio_is_zone_device(folio) && + !folio_is_device_coherent(folio)) { + goto abort; + } + entry = mk_pmd(page, vma->vm_page_prot); + if (vma->vm_flags & VM_WRITE) + entry = pmd_mkwrite(pmd_mkdirty(entry), vma); + } + + ptl = pmd_lock(vma->vm_mm, pmdp); 
+ ret = check_stable_address_space(vma->vm_mm); + if (ret) + goto abort; + + /* + * Check for userfaultfd but do not deliver the fault. Instead, + * just back off. + */ + if (userfaultfd_missing(vma)) + goto unlock_abort; + + if (!pmd_none(*pmdp)) { + if (!is_huge_zero_pmd(*pmdp)) + goto unlock_abort; + flush = true; + } else if (!pmd_none(*pmdp)) + goto unlock_abort; + + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE); + if (!folio_is_zone_device(folio)) + folio_add_lru_vma(folio, vma); + folio_get(folio); + + if (flush) { + pte_free(vma->vm_mm, pgtable); + flush_cache_page(vma, addr, addr + HPAGE_PMD_SIZE); + pmdp_invalidate(vma, addr, pmdp); + } else { + pgtable_trans_huge_deposit(vma->vm_mm, pmdp, pgtable); + mm_inc_nr_ptes(vma->vm_mm); + } + set_pmd_at(vma->vm_mm, addr, pmdp, entry); + update_mmu_cache_pmd(vma, addr, pmdp); + + spin_unlock(ptl); + + count_vm_event(THP_FAULT_ALLOC); + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC); + count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC); + + return 0; + +unlock_abort: + spin_unlock(ptl); +abort: + for (i = 0; i < HPAGE_PMD_NR; i++) + src[i] &= ~MIGRATE_PFN_MIGRATE; + return 0; +} +#else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */ +static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate, + unsigned long addr, + struct page *page, + unsigned long *src, + pmd_t *pmdp) +{ + return 0; +} +#endif + /* * This code closely matches the code in: * __handle_mm_fault() @@ -572,9 +825,10 @@ EXPORT_SYMBOL(migrate_vma_setup); */ static void migrate_vma_insert_page(struct migrate_vma *migrate, unsigned long addr, - struct page *page, + unsigned long *dst, unsigned long *src) { + struct page *page = migrate_pfn_to_page(*dst); struct folio *folio = page_folio(page); struct vm_area_struct *vma = migrate->vma; struct mm_struct *mm = vma->vm_mm; @@ -602,8 +856,25 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, pmdp = pmd_alloc(mm, pudp, addr); if (!pmdp) goto abort; - if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp)) - goto abort; + + if (thp_migration_supported() && (*dst & MIGRATE_PFN_COMPOUND)) { + int ret = migrate_vma_insert_huge_pmd_page(migrate, addr, page, + src, pmdp); + if (ret) + goto abort; + return; + } + + if (!pmd_none(*pmdp)) { + if (pmd_trans_huge(*pmdp)) { + if (!is_huge_zero_pmd(*pmdp)) + goto abort; + folio_get(pmd_folio(*pmdp)); + split_huge_pmd(vma, pmdp, addr); + } else if (pmd_leaf(*pmdp)) + goto abort; + } + if (pte_alloc(mm, pmdp)) goto abort; if (unlikely(anon_vma_prepare(vma))) @@ -694,23 +965,24 @@ static void __migrate_device_pages(unsigned long *src_pfns, unsigned long i; bool notified = false; - for (i = 0; i < npages; i++) { + for (i = 0; i < npages; ) { struct page *newpage = migrate_pfn_to_page(dst_pfns[i]); struct page *page = migrate_pfn_to_page(src_pfns[i]); struct address_space *mapping; struct folio *newfolio, *folio; int r, extra_cnt = 0; + unsigned long nr = 1; if (!newpage) { src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; - continue; + goto next; } if (!page) { unsigned long addr; if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE)) - continue; + goto next; /* * The only time there is no vma is when called from @@ -728,15 +1000,47 @@ static void __migrate_device_pages(unsigned long *src_pfns, migrate->pgmap_owner); mmu_notifier_invalidate_range_start(&range); } - migrate_vma_insert_page(migrate, addr, newpage, + + if ((src_pfns[i] & MIGRATE_PFN_COMPOUND) && + (!(dst_pfns[i] & MIGRATE_PFN_COMPOUND))) { + nr = HPAGE_PMD_NR; + 
src_pfns[i] &= ~MIGRATE_PFN_COMPOUND; + src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; + goto next; + } + + migrate_vma_insert_page(migrate, addr, &dst_pfns[i], + &src_pfns[i]); - continue; + goto next; } newfolio = page_folio(newpage); folio = page_folio(page); mapping = folio_mapping(folio); + /* + * If THP migration is enabled, check if both src and dst + * can migrate large pages + */ + if (thp_migration_supported()) { + if ((src_pfns[i] & MIGRATE_PFN_MIGRATE) && + (src_pfns[i] & MIGRATE_PFN_COMPOUND) && + !(dst_pfns[i] & MIGRATE_PFN_COMPOUND)) { + + if (!migrate) { + src_pfns[i] &= ~(MIGRATE_PFN_MIGRATE | + MIGRATE_PFN_COMPOUND); + goto next; + } + src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; + } else if ((src_pfns[i] & MIGRATE_PFN_MIGRATE) && + (dst_pfns[i] & MIGRATE_PFN_COMPOUND) && + !(src_pfns[i] & MIGRATE_PFN_COMPOUND)) { + src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; + } + } + + if (folio_is_device_private(newfolio) || folio_is_device_coherent(newfolio)) { if (mapping) { @@ -749,7 +1053,7 @@ static void __migrate_device_pages(unsigned long *src_pfns, if (!folio_test_anon(folio) || !folio_free_swap(folio)) { src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; - continue; + goto next; } } } else if (folio_is_zone_device(newfolio)) { @@ -757,7 +1061,7 @@ static void __migrate_device_pages(unsigned long *src_pfns, * Other types of ZONE_DEVICE page are not supported. */ src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; - continue; + goto next; } BUG_ON(folio_test_writeback(folio)); @@ -769,6 +1073,8 @@ static void __migrate_device_pages(unsigned long *src_pfns, src_pfns[i] &= ~MIGRATE_PFN_MIGRATE; else folio_migrate_flags(newfolio, folio); +next: + i += nr; } if (notified) @@ -899,24 +1205,40 @@ EXPORT_SYMBOL(migrate_vma_finalize); int migrate_device_range(unsigned long *src_pfns, unsigned long start, unsigned long npages) { - unsigned long i, pfn; + unsigned long i, j, pfn; - for (pfn = start, i = 0; i < npages; pfn++, i++) { - struct folio *folio; + i = 0; + pfn = start; + while (i < npages) { + struct page *page = pfn_to_page(pfn); + struct folio *folio = page_folio(page); + unsigned int nr = 1; folio = folio_get_nontail_page(pfn_to_page(pfn)); if (!folio) { src_pfns[i] = 0; - continue; + goto next; } if (!folio_trylock(folio)) { src_pfns[i] = 0; folio_put(folio); - continue; + goto next; } src_pfns[i] = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE; + nr = folio_nr_pages(folio); + if (nr > 1) { + src_pfns[i] |= MIGRATE_PFN_COMPOUND; + for (j = 1; j < nr; j++) + src_pfns[i+j] = 0; + i += j; + pfn += j; + continue; + } +next: + i++; + pfn++; } migrate_device_unmap(src_pfns, npages, NULL);

From patchwork Thu Mar 6 04:42:33 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003821
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 05/11] mm/memory/fault: Add support for zone device THP fault handling
Date: Thu, 6 Mar 2025 15:42:33 +1100
Message-ID: <20250306044239.3874247-6-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
When the CPU touches a zone device THP entry, the data needs to be migrated back to the CPU. Call migrate_to_ram() on these pages via the new do_huge_pmd_device_private() fault handling helper, as sketched below.
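The helper hands control back to the owning driver through its dev_pagemap_ops, exactly as the existing PTE-sized path does. A minimal sketch of that wiring, with hypothetical my_* names (only the ops structure and callback signatures are taken from the kernel API):

	#include <linux/memremap.h>
	#include <linux/mm.h>

	/* Hypothetical driver callbacks; bodies elided to the bare minimum. */
	static vm_fault_t my_migrate_to_ram(struct vm_fault *vmf)
	{
		/*
		 * vmf->page is the device-private page the CPU touched. With
		 * this patch it can be part of a THP: the new helper takes a
		 * reference on the page and invokes this callback, which is
		 * expected to migrate the data back to system memory.
		 */
		return 0;
	}

	static void my_page_free(struct page *page)
	{
		/* Return the device page to the driver's private allocator. */
	}

	static const struct dev_pagemap_ops my_pagemap_ops = {
		.page_free	= my_page_free,
		.migrate_to_ram	= my_migrate_to_ram,
	};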
Signed-off-by: Balbir Singh --- include/linux/huge_mm.h | 7 +++++++ mm/huge_memory.c | 35 +++++++++++++++++++++++++++++++++++ mm/memory.c | 6 ++++-- 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e893d546a49f..ad0c0ccfcbc2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -479,6 +479,8 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf); + extern struct folio *huge_zero_folio; extern unsigned long huge_zero_pfn; @@ -634,6 +636,11 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) return 0; } +static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf) +{ + return 0; +} + static inline bool is_huge_zero_folio(const struct folio *folio) { return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d8e018d1bdbd..995ac8be5709 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1375,6 +1375,41 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf) return __do_huge_pmd_anonymous_page(vmf); } +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + vm_fault_t ret; + spinlock_t *ptl; + swp_entry_t swp_entry; + struct page *page; + + if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER)) + return VM_FAULT_FALLBACK; + + if (vmf->flags & FAULT_FLAG_VMA_LOCK) { + vma_end_read(vma); + return VM_FAULT_RETRY; + } + + ptl = pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) { + spin_unlock(ptl); + return 0; + } + + swp_entry = pmd_to_swp_entry(vmf->orig_pmd); + page = pfn_swap_entry_to_page(swp_entry); + vmf->page = page; + vmf->pte = NULL; + get_page(page); + spin_unlock(ptl); + ret = page_pgmap(page)->ops->migrate_to_ram(vmf); + put_page(page); + + return ret; +} + static int insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write, pgtable_t pgtable) diff --git a/mm/memory.c b/mm/memory.c index a838c8c44bfd..deaa67b88708 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6149,8 +6149,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, vmf.orig_pmd = pmdp_get_lockless(vmf.pmd); if (unlikely(is_swap_pmd(vmf.orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(vmf.orig_pmd)); + if (is_device_private_entry( + pmd_to_swp_entry(vmf.orig_pmd))) + return do_huge_pmd_device_private(&vmf); + if (is_pmd_migration_entry(vmf.orig_pmd)) pmd_migration_entry_wait(mm, vmf.pmd); return 0;

From patchwork Thu Mar 6 04:42:34 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003822
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 06/11] lib/test_hmm: test cases and support for zone device private THP
Date: Thu, 6 Mar 2025 15:42:34 +1100
Message-ID: <20250306044239.3874247-7-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Enhance the hmm test driver (lib/test_hmm) with support for THP pages. A new free_folios pool has been added to the dmirror device, from which a folio can be handed out when a THP zone device private page is requested. Add compound page awareness to the allocation function for both normal and fault based migration. These routines also copy folio_nr_pages() worth of data when moving data between system memory and device memory.

args.src and args.dst, which hold the migration entries, are now dynamically allocated, as they need to hold HPAGE_PMD_NR entries or more (see the sketch below).

Split and migrate support will be added in future patches in this series.
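Because a single migrate_vma range can now cover a whole PMD, fixed-size on-stack pfn arrays no longer suffice. A short sketch of the allocation pattern, with error handling trimmed and the my_* name purely illustrative (struct migrate_vma and its src/dst fields are the existing kernel API):

	#include <linux/migrate.h>
	#include <linux/slab.h>

	static int my_alloc_migrate_arrays(struct migrate_vma *args,
					   unsigned long npages)
	{
		/* npages may be HPAGE_PMD_NR or more, too large for the stack */
		args->src = kcalloc(npages, sizeof(*args->src), GFP_KERNEL);
		args->dst = kcalloc(npages, sizeof(*args->dst), GFP_KERNEL);
		if (!args->src || !args->dst) {
			kfree(args->src);
			kfree(args->dst);
			return -ENOMEM;
		}
		return 0;
	}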
Signed-off-by: Balbir Singh --- lib/test_hmm.c | 342 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 273 insertions(+), 69 deletions(-) diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 5b144bc5c4ec..a81d2f8a0426 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -119,6 +119,7 @@ struct dmirror_device { unsigned long calloc; unsigned long cfree; struct page *free_pages; + struct folio *free_folios; spinlock_t lock; /* protects the above */ }; @@ -492,7 +493,7 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd) } static int dmirror_allocate_chunk(struct dmirror_device *mdevice, - struct page **ppage) + struct page **ppage, bool is_large) { struct dmirror_chunk *devmem; struct resource *res = NULL; @@ -572,20 +573,45 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice, pfn_first, pfn_last); spin_lock(&mdevice->lock); - for (pfn = pfn_first; pfn < pfn_last; pfn++) { + for (pfn = pfn_first; pfn < pfn_last; ) { struct page *page = pfn_to_page(pfn); + if (is_large && IS_ALIGNED(pfn, HPAGE_PMD_NR) + && (pfn + HPAGE_PMD_NR <= pfn_last)) { + page->zone_device_data = mdevice->free_folios; + mdevice->free_folios = page_folio(page); + pfn += HPAGE_PMD_NR; + continue; + } + page->zone_device_data = mdevice->free_pages; mdevice->free_pages = page; + pfn++; } + + ret = 0; if (ppage) { - *ppage = mdevice->free_pages; - mdevice->free_pages = (*ppage)->zone_device_data; - mdevice->calloc++; + if (is_large) { + if (!mdevice->free_folios) { + ret = -ENOMEM; + goto err_unlock; + } + *ppage = folio_page(mdevice->free_folios, 0); + mdevice->free_folios = (*ppage)->zone_device_data; + mdevice->calloc += HPAGE_PMD_NR; + } else if (mdevice->free_pages) { + *ppage = mdevice->free_pages; + mdevice->free_pages = (*ppage)->zone_device_data; + mdevice->calloc++; + } else { + ret = -ENOMEM; + goto err_unlock; + } } +err_unlock: spin_unlock(&mdevice->lock); - return 0; + return ret; err_release: mutex_unlock(&mdevice->devmem_lock); @@ -598,10 +624,13 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice, return ret; } -static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) +static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, + bool is_large) { struct page *dpage = NULL; struct page *rpage = NULL; + unsigned int order = is_large ? HPAGE_PMD_ORDER : 0; + struct dmirror_device *mdevice = dmirror->mdevice; /* * For ZONE_DEVICE private type, this is a fake device so we allocate @@ -610,49 +639,55 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice) * data and ignore rpage. 
*/ if (dmirror_is_private_zone(mdevice)) { - rpage = alloc_page(GFP_HIGHUSER); + rpage = folio_page(folio_alloc(GFP_HIGHUSER, order), 0); if (!rpage) return NULL; } spin_lock(&mdevice->lock); - if (mdevice->free_pages) { + if (is_large && mdevice->free_folios) { + dpage = folio_page(mdevice->free_folios, 0); + mdevice->free_folios = dpage->zone_device_data; + mdevice->calloc += 1 << order; + spin_unlock(&mdevice->lock); + } else if (!is_large && mdevice->free_pages) { dpage = mdevice->free_pages; mdevice->free_pages = dpage->zone_device_data; mdevice->calloc++; spin_unlock(&mdevice->lock); } else { spin_unlock(&mdevice->lock); - if (dmirror_allocate_chunk(mdevice, &dpage)) + if (dmirror_allocate_chunk(mdevice, &dpage, is_large)) goto error; } - zone_device_page_init(dpage); + init_zone_device_folio(page_folio(dpage), order); dpage->zone_device_data = rpage; return dpage; error: if (rpage) - __free_page(rpage); + __free_pages(rpage, order); return NULL; } static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, struct dmirror *dmirror) { - struct dmirror_device *mdevice = dmirror->mdevice; const unsigned long *src = args->src; unsigned long *dst = args->dst; unsigned long addr; - for (addr = args->start; addr < args->end; addr += PAGE_SIZE, - src++, dst++) { + for (addr = args->start; addr < args->end; ) { struct page *spage; struct page *dpage; struct page *rpage; + bool is_large = *src & MIGRATE_PFN_COMPOUND; + int write = (*src & MIGRATE_PFN_WRITE) ? MIGRATE_PFN_WRITE : 0; + unsigned long nr = 1; if (!(*src & MIGRATE_PFN_MIGRATE)) - continue; + goto next; /* * Note that spage might be NULL which is OK since it is an @@ -662,17 +697,45 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, if (WARN(spage && is_zone_device_page(spage), "page already in device spage pfn: 0x%lx\n", page_to_pfn(spage))) + goto next; + + dpage = dmirror_devmem_alloc_page(dmirror, is_large); + if (!dpage) { + struct folio *folio; + unsigned long i; + unsigned long spfn = *src >> MIGRATE_PFN_SHIFT; + struct page *src_page; + + if (!is_large) + goto next; + + if (!spage && is_large) { + nr = HPAGE_PMD_NR; + } else { + folio = page_folio(spage); + nr = folio_nr_pages(folio); + } + + for (i = 0; i < nr && addr < args->end; i++) { + dpage = dmirror_devmem_alloc_page(dmirror, false); + rpage = BACKING_PAGE(dpage); + rpage->zone_device_data = dmirror; + + *dst = migrate_pfn(page_to_pfn(dpage)) | write; + src_page = pfn_to_page(spfn + i); + + if (spage) + copy_highpage(rpage, src_page); + else + clear_highpage(rpage); + src++; + dst++; + addr += PAGE_SIZE; + } continue; - - dpage = dmirror_devmem_alloc_page(mdevice); - if (!dpage) - continue; + } rpage = BACKING_PAGE(dpage); - if (spage) - copy_highpage(rpage, spage); - else - clear_highpage(rpage); /* * Normally, a device would use the page->zone_device_data to @@ -684,10 +747,42 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args, pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 0x%lx\n", page_to_pfn(spage), page_to_pfn(dpage)); - *dst = migrate_pfn(page_to_pfn(dpage)); - if ((*src & MIGRATE_PFN_WRITE) || - (!spage && args->vma->vm_flags & VM_WRITE)) - *dst |= MIGRATE_PFN_WRITE; + + *dst = migrate_pfn(page_to_pfn(dpage)) | write; + + if (is_large) { + int i; + struct folio *folio = page_folio(dpage); + *dst |= MIGRATE_PFN_COMPOUND; + + if (folio_test_large(folio)) { + for (i = 0; i < folio_nr_pages(folio); i++) { + struct page *dst_page = + pfn_to_page(page_to_pfn(rpage) + i); + struct page *src_page = + 
pfn_to_page(page_to_pfn(spage) + i); + + if (spage) + copy_highpage(dst_page, src_page); + else + clear_highpage(dst_page); + src++; + dst++; + addr += PAGE_SIZE; + } + continue; + } + } + + if (spage) + copy_highpage(rpage, spage); + else + clear_highpage(rpage); + +next: + src++; + dst++; + addr += PAGE_SIZE; } } @@ -734,14 +829,17 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, const unsigned long *src = args->src; const unsigned long *dst = args->dst; unsigned long pfn; + const unsigned long start_pfn = start >> PAGE_SHIFT; + const unsigned long end_pfn = end >> PAGE_SHIFT; /* Map the migrated pages into the device's page tables. */ mutex_lock(&dmirror->mutex); - for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, - src++, dst++) { + for (pfn = start_pfn; pfn < end_pfn; pfn++, src++, dst++) { struct page *dpage; void *entry; + int nr, i; + struct page *rpage; if (!(*src & MIGRATE_PFN_MIGRATE)) continue; @@ -750,13 +848,25 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args, if (!dpage) continue; - entry = BACKING_PAGE(dpage); - if (*dst & MIGRATE_PFN_WRITE) - entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE); - entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC); - if (xa_is_err(entry)) { - mutex_unlock(&dmirror->mutex); - return xa_err(entry); + if (*dst & MIGRATE_PFN_COMPOUND) + nr = folio_nr_pages(page_folio(dpage)); + else + nr = 1; + + WARN_ON_ONCE(end_pfn < start_pfn + nr); + + rpage = BACKING_PAGE(dpage); + VM_BUG_ON(folio_nr_pages(page_folio(rpage)) != nr); + + for (i = 0; i < nr; i++) { + entry = folio_page(page_folio(rpage), i); + if (*dst & MIGRATE_PFN_WRITE) + entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE); + entry = xa_store(&dmirror->pt, pfn + i, entry, GFP_ATOMIC); + if (xa_is_err(entry)) { + mutex_unlock(&dmirror->mutex); + return xa_err(entry); + } } } @@ -829,31 +939,61 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args, unsigned long start = args->start; unsigned long end = args->end; unsigned long addr; + unsigned int order = 0; + int i; - for (addr = start; addr < end; addr += PAGE_SIZE, - src++, dst++) { + for (addr = start; addr < end; ) { struct page *dpage, *spage; spage = migrate_pfn_to_page(*src); if (!spage || !(*src & MIGRATE_PFN_MIGRATE)) - continue; + goto next; if (WARN_ON(!is_device_private_page(spage) && !is_device_coherent_page(spage))) - continue; + goto next; spage = BACKING_PAGE(spage); - dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); - if (!dpage) - continue; - pr_debug("migrating from dev to sys pfn src: 0x%lx pfn dst: 0x%lx\n", - page_to_pfn(spage), page_to_pfn(dpage)); + order = folio_order(page_folio(spage)); + if (order) + dpage = folio_page(vma_alloc_folio(GFP_HIGHUSER_MOVABLE, + order, args->vma, addr), 0); + else + dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); + + /* Try with smaller pages if large allocation fails */ + if (!dpage && order) { + dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr); + if (!dpage) + return VM_FAULT_OOM; + order = 0; + } + + pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 0x%lx\n", + page_to_pfn(spage), page_to_pfn(dpage)); lock_page(dpage); xa_erase(&dmirror->pt, addr >> PAGE_SHIFT); copy_highpage(dpage, spage); *dst = migrate_pfn(page_to_pfn(dpage)); if (*src & MIGRATE_PFN_WRITE) *dst |= MIGRATE_PFN_WRITE; + if (order) + *dst |= MIGRATE_PFN_COMPOUND; + + for (i = 0; i < (1 << order); i++) { + struct page *src_page; + struct page *dst_page; + + src_page = 
pfn_to_page(page_to_pfn(spage) + i); + dst_page = pfn_to_page(page_to_pfn(dpage) + i); + + xa_erase(&dmirror->pt, addr >> PAGE_SHIFT); + copy_highpage(dst_page, src_page); + } +next: + addr += PAGE_SIZE << order; + src += 1 << order; + dst += 1 << order; } return 0; } @@ -939,8 +1079,8 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, unsigned long size = cmd->npages << PAGE_SHIFT; struct mm_struct *mm = dmirror->notifier.mm; struct vm_area_struct *vma; - unsigned long src_pfns[64] = { 0 }; - unsigned long dst_pfns[64] = { 0 }; + unsigned long *src_pfns; + unsigned long *dst_pfns; struct dmirror_bounce bounce; struct migrate_vma args = { 0 }; unsigned long next; @@ -955,6 +1095,18 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, if (!mmget_not_zero(mm)) return -EINVAL; + ret = -ENOMEM; + src_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*src_pfns), + GFP_KERNEL | __GFP_RETRY_MAYFAIL); + if (!src_pfns) + goto free_mem; + + dst_pfns = kmalloc_array(PTRS_PER_PTE, sizeof(*dst_pfns), + GFP_KERNEL | __GFP_RETRY_MAYFAIL); + if (!dst_pfns) + goto free_mem; + + ret = 0; mmap_read_lock(mm); for (addr = start; addr < end; addr = next) { vma = vma_lookup(mm, addr); @@ -962,7 +1114,7 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, ret = -EINVAL; goto out; } - next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT)); + next = min(end, addr + (PTRS_PER_PTE << PAGE_SHIFT)); if (next > vma->vm_end) next = vma->vm_end; @@ -972,7 +1124,8 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, args.start = addr; args.end = next; args.pgmap_owner = dmirror->mdevice; - args.flags = MIGRATE_VMA_SELECT_SYSTEM; + args.flags = MIGRATE_VMA_SELECT_SYSTEM | + MIGRATE_VMA_SELECT_COMPOUND; ret = migrate_vma_setup(&args); if (ret) goto out; @@ -992,7 +1145,7 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, */ ret = dmirror_bounce_init(&bounce, start, size); if (ret) - return ret; + goto free_mem; mutex_lock(&dmirror->mutex); ret = dmirror_do_read(dmirror, start, end, &bounce); mutex_unlock(&dmirror->mutex); @@ -1003,11 +1156,14 @@ static int dmirror_migrate_to_device(struct dmirror *dmirror, } cmd->cpages = bounce.cpages; dmirror_bounce_fini(&bounce); - return ret; + goto free_mem; out: mmap_read_unlock(mm); mmput(mm); +free_mem: + kfree(src_pfns); + kfree(dst_pfns); return ret; } @@ -1200,6 +1356,7 @@ static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) unsigned long i; unsigned long *src_pfns; unsigned long *dst_pfns; + unsigned int order = 0; src_pfns = kvcalloc(npages, sizeof(*src_pfns), GFP_KERNEL | __GFP_NOFAIL); dst_pfns = kvcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL | __GFP_NOFAIL); @@ -1215,13 +1372,25 @@ static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) if (WARN_ON(!is_device_private_page(spage) && !is_device_coherent_page(spage))) continue; + + order = folio_order(page_folio(spage)); spage = BACKING_PAGE(spage); - dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL); + if (src_pfns[i] & MIGRATE_PFN_COMPOUND) { + dpage = folio_page(folio_alloc(GFP_HIGHUSER_MOVABLE, + order), 0); + } else { + dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL); + order = 0; + } + + /* TODO Support splitting here */ lock_page(dpage); - copy_highpage(dpage, spage); dst_pfns[i] = migrate_pfn(page_to_pfn(dpage)); if (src_pfns[i] & MIGRATE_PFN_WRITE) dst_pfns[i] |= MIGRATE_PFN_WRITE; + if (order) + dst_pfns[i] |= MIGRATE_PFN_COMPOUND; + folio_copy(page_folio(dpage), page_folio(spage)); } migrate_device_pages(src_pfns, 
dst_pfns, npages); migrate_device_finalize(src_pfns, dst_pfns, npages); @@ -1234,7 +1403,12 @@ static void dmirror_remove_free_pages(struct dmirror_chunk *devmem) { struct dmirror_device *mdevice = devmem->mdevice; struct page *page; + struct folio *folio; + + for (folio = mdevice->free_folios; folio; folio = folio_zone_device_data(folio)) + if (dmirror_page_to_chunk(folio_page(folio, 0)) == devmem) + mdevice->free_folios = folio_zone_device_data(folio); for (page = mdevice->free_pages; page; page = page->zone_device_data) if (dmirror_page_to_chunk(page) == devmem) mdevice->free_pages = page->zone_device_data; @@ -1265,6 +1439,7 @@ static void dmirror_device_remove_chunks(struct dmirror_device *mdevice) mdevice->devmem_count = 0; mdevice->devmem_capacity = 0; mdevice->free_pages = NULL; + mdevice->free_folios = NULL; kfree(mdevice->devmem_chunks); mdevice->devmem_chunks = NULL; } @@ -1378,18 +1553,29 @@ static void dmirror_devmem_free(struct page *page) { struct page *rpage = BACKING_PAGE(page); struct dmirror_device *mdevice; + struct folio *folio = page_folio(page); + unsigned int order = folio_order(folio); - if (rpage != page) - __free_page(rpage); + if (rpage != page) { + if (order) + __free_pages(rpage, order); + else + __free_page(rpage); + } mdevice = dmirror_page_to_device(page); spin_lock(&mdevice->lock); /* Return page to our allocator if not freeing the chunk */ if (!dmirror_page_to_chunk(page)->remove) { - mdevice->cfree++; - page->zone_device_data = mdevice->free_pages; - mdevice->free_pages = page; + mdevice->cfree += 1 << order; + if (order) { + page->zone_device_data = mdevice->free_folios; + mdevice->free_folios = folio; + } else { + page->zone_device_data = mdevice->free_pages; + mdevice->free_pages = page; + } } spin_unlock(&mdevice->lock); } @@ -1397,11 +1583,10 @@ static void dmirror_devmem_free(struct page *page) static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) { struct migrate_vma args = { 0 }; - unsigned long src_pfns = 0; - unsigned long dst_pfns = 0; struct page *rpage; struct dmirror *dmirror; - vm_fault_t ret; + vm_fault_t ret = 0; + unsigned int order, nr; /* * Normally, a device would use the page->zone_device_data to point to @@ -1412,21 +1597,36 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) dmirror = rpage->zone_device_data; /* FIXME demonstrate how we can adjust migrate range */ + order = folio_order(page_folio(vmf->page)); + nr = 1 << order; + + /* + * Consider a per-cpu cache of src and dst pfns, but with + * large number of cpus that might not scale well. + */ + args.start = ALIGN_DOWN(vmf->address, (1 << (PAGE_SHIFT + order))); args.vma = vmf->vma; - args.start = vmf->address; - args.end = args.start + PAGE_SIZE; - args.src = &src_pfns; - args.dst = &dst_pfns; + args.end = args.start + (PAGE_SIZE << order); + args.src = kcalloc(nr, sizeof(*args.src), GFP_KERNEL); + args.dst = kcalloc(nr, sizeof(*args.dst), GFP_KERNEL); args.pgmap_owner = dmirror->mdevice; args.flags = dmirror_select_device(dmirror); args.fault_page = vmf->page; + if (!args.src || !args.dst) { + ret = VM_FAULT_OOM; + goto err; + } + + if (order) + args.flags |= MIGRATE_VMA_SELECT_COMPOUND; + if (migrate_vma_setup(&args)) return VM_FAULT_SIGBUS; ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror); if (ret) - return ret; + goto err; migrate_vma_pages(&args); /* * No device finalize step is needed since @@ -1434,12 +1634,16 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf) * invalidated the device page table. 
*/ migrate_vma_finalize(&args); - return 0; +err: + kfree(args.src); + kfree(args.dst); + return ret; } static const struct dev_pagemap_ops dmirror_devmem_ops = { .page_free = dmirror_devmem_free, .migrate_to_ram = dmirror_devmem_fault, + .page_free = dmirror_devmem_free, }; static int dmirror_device_init(struct dmirror_device *mdevice, int id) @@ -1465,7 +1669,7 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id) return ret; /* Build a list of free ZONE_DEVICE struct pages */ - return dmirror_allocate_chunk(mdevice, NULL); + return dmirror_allocate_chunk(mdevice, NULL, false); } static void dmirror_device_remove(struct dmirror_device *mdevice)

From patchwork Thu Mar 6 04:42:35 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003823
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song, Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang, Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 07/11] mm/memremap: Add folio_split support
Date: Thu, 6 Mar 2025 15:42:35 +1100
Message-ID: <20250306044239.3874247-8-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
When a zone device page is split (via a huge PMD folio split), the
folio_split driver callback is invoked to let the device driver know
that the folio has been split into a smaller order.

The HMM test driver has been updated to handle the split. Since the
test driver uses backing pages (backing pages are used to create a
mirror device), it needs a mechanism to reorganize the backing pages
into pages of the right order again. This is supported by exporting
prep_compound_page().

Signed-off-by: Balbir Singh
---
 include/linux/memremap.h |  7 +++++++
 include/linux/mm.h       |  1 +
 lib/test_hmm.c           | 35 +++++++++++++++++++++++++++++++++++
 mm/huge_memory.c         |  5 +++++
 mm/page_alloc.c          |  1 +
 5 files changed, 49 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 11d586dd8ef1..2091b754f1da 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -100,6 +100,13 @@ struct dev_pagemap_ops {
 	 */
 	int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
 			      unsigned long nr_pages, int mf_flags);
+
+	/*
+	 * Used for private (un-addressable) device memory only.
+	 * This callback is used when a folio is split into
+	 * a smaller folio
+	 */
+	void (*folio_split)(struct folio *head, struct folio *tail);
 };
 
 #define PGMAP_ALTMAP_VALID	(1 << 0)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98a67488b5fe..3d0e91e0a923 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1415,6 +1415,7 @@ static inline struct folio *virt_to_folio(const void *x)
 void __folio_put(struct folio *folio);
 
 void split_page(struct page *page, unsigned int order);
+void prep_compound_page(struct page *page, unsigned int order);
 void folio_copy(struct folio *dst, struct folio *src);
 int folio_mc_copy(struct folio *dst, struct folio *src);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index a81d2f8a0426..18b6a7b061d7 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -1640,10 +1640,45 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	return ret;
 }
 
+static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
+{
+	struct page *rpage = BACKING_PAGE(folio_page(head, 0));
+	struct folio *new_rfolio;
+	struct folio *rfolio;
+	unsigned long offset = 0;
+
+	if (!rpage) {
+		folio_page(tail, 0)->zone_device_data = NULL;
+		return;
+	}
+
+	offset = folio_pfn(tail) - folio_pfn(head);
+	rfolio = page_folio(rpage);
+	new_rfolio = page_folio(folio_page(rfolio, offset));
+
+	folio_page(tail, 0)->zone_device_data = folio_page(new_rfolio, 0);
+
+	if (folio_pfn(tail) - folio_pfn(head) == 1) {
+		if (folio_order(head))
+			prep_compound_page(folio_page(rfolio, 0),
+					   folio_order(head));
+		folio_set_count(rfolio, 1);
+	}
+	clear_compound_head(folio_page(new_rfolio, 0));
+	if (folio_order(tail))
+		prep_compound_page(folio_page(new_rfolio, 0),
+				   folio_order(tail));
+	folio_set_count(new_rfolio, 1);
+	folio_page(new_rfolio, 0)->mapping = folio_page(rfolio, 0)->mapping;
+	tail->pgmap = head->pgmap;
+}
+
 static const struct dev_pagemap_ops dmirror_devmem_ops = {
 	.page_free	= dmirror_devmem_free,
 	.migrate_to_ram	= dmirror_devmem_fault,
 	.page_free	= dmirror_devmem_free,
+	.folio_split	= dmirror_devmem_folio_split,
 };
 
 static int dmirror_device_init(struct dmirror_device *mdevice, int id)
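For context (not part of the patch): prep_compound_page() is the page
allocator helper that takes an order-aligned run of pages and initializes
the head/tail metadata so the run behaves as one compound page; the
dmirror code above applies it to rebuild its backing folios after a
split. A minimal sketch of the idiom, assuming "first" points at an
order-aligned run of 1 << order backing pages:

/*
 * Sketch only (not from the patch): reinitialize an order-N run of
 * backing pages as a single compound page after a split, the same
 * sequence dmirror_devmem_folio_split() applies per split folio.
 */
static void rebuild_backing(struct page *first, unsigned int order)
{
	clear_compound_head(first);		/* drop stale tail linkage */
	if (order)				/* order-0 needs no head page */
		prep_compound_page(first, order);
	folio_set_count(page_folio(first), 1);	/* one owner reference */
}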
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 995ac8be5709..518a70d1b58a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3655,6 +3655,11 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 					MTHP_STAT_NR_ANON, 1);
 		}
 
+		if (folio_is_device_private(origin_folio) &&
+		    origin_folio->pgmap->ops->folio_split)
+			origin_folio->pgmap->ops->folio_split(
+				origin_folio, release);
+
 		/*
 		 * Unfreeze refcount first. Additional reference from
 		 * page cache.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 17ea8fb27cbf..563f7e39aa79 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -573,6 +573,7 @@ void prep_compound_page(struct page *page, unsigned int order)
 
 	prep_compound_head(page, order);
 }
+EXPORT_SYMBOL_GPL(prep_compound_page);
 
 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
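As a usage illustration for the new op (a sketch under assumptions, not
part of the series): a driver that keeps per-folio state in
zone_device_data might wire up the callback roughly as below. The
mydrv_* names, including the page_free and migrate_to_ram handlers, are
hypothetical:

/* Hypothetical driver sketch; mirrors what the dmirror test does. */
static void mydrv_folio_split(struct folio *head, struct folio *tail)
{
	/* Propagate driver state from the original folio to the new one. */
	folio_page(tail, 0)->zone_device_data =
			folio_page(head, 0)->zone_device_data;
	tail->pgmap = head->pgmap;
}

static const struct dev_pagemap_ops mydrv_pagemap_ops = {
	.page_free	= mydrv_page_free,	/* hypothetical */
	.migrate_to_ram	= mydrv_migrate_to_ram,	/* hypothetical */
	.folio_split	= mydrv_folio_split,
};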
From patchwork Thu Mar 6 04:42:36 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003824
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie,
 Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song,
 Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang,
 Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 08/11] mm/thp: add split during migration support
Date: Thu, 6 Mar 2025 15:42:36 +1100
Message-ID: <20250306044239.3874247-9-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Support splitting pages during THP zone device migration as needed.
The common case is that, after setup, the destination is unable to
allocate MIGRATE_PFN_COMPOUND pages during the migrate phase.

Add a new routine, migrate_vma_split_pages(), to support splitting
pages that have already been isolated. The pages being migrated are
already unmapped and marked for migration during setup (via unmap).
folio_split() and __split_unmapped_folio() take an additional
"isolated" argument to avoid unmapping and remapping these pages and
unlocking/putting the folio. Since unmap/remap is avoided in these
code paths, an extra reference count is added to the split folio
pages; it is dropped in the finalize phase.
Signed-off-by: Balbir Singh
---
 include/linux/huge_mm.h | 11 ++++++--
 mm/huge_memory.c        | 53 +++++++++++++++++++++++++-----------
 mm/migrate_device.c     | 60 ++++++++++++++++++++++++++++++--------
 3 files changed, 94 insertions(+), 30 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ad0c0ccfcbc2..abb8debfb362 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -341,8 +341,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 		vm_flags_t vm_flags);
 
 bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
-int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
-		unsigned int new_order);
+int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order, bool isolated);
 int min_order_for_split(struct folio *folio);
 int split_folio_to_list(struct folio *folio, struct list_head *list);
 bool uniform_split_supported(struct folio *folio, unsigned int new_order,
@@ -351,6 +351,13 @@ bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 		bool warns);
 int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
 		struct list_head *list);
+
+static inline int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order)
+{
+	return __split_huge_page_to_list_to_order(page, list, new_order, false);
+}
+
 /*
  * try_folio_split - try to split a @folio at @page using non uniform split.
  * @folio: folio to be split
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 518a70d1b58a..1a6f0e70acee 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3544,7 +3544,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, pgoff_t end,
 		struct xa_state *xas, struct address_space *mapping,
-		bool uniform_split)
+		bool uniform_split, bool isolated)
 {
 	struct lruvec *lruvec;
 	struct address_space *swap_cache = NULL;
@@ -3586,6 +3586,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 		int old_order = folio_order(folio);
 		struct folio *release;
 		struct folio *end_folio = folio_next(folio);
+		int extra_count = 1;
 
 		/* order-1 anonymous folio is not supported */
 		if (folio_test_anon(folio) && split_order == 1)
@@ -3629,6 +3630,14 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			__split_folio_to_order(folio, old_order, split_order);
 
 after_split:
+		/*
+		 * When a folio is isolated, the split folios will
+		 * not go through unmap/remap, so add the extra
+		 * count here
+		 */
+		if (isolated)
+			extra_count++;
+
 		/*
 		 * Iterate through after-split folios and perform related
 		 * operations. But in buddy allocator like split, the folio
@@ -3665,7 +3674,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			 * page cache.
 			 */
 			folio_ref_unfreeze(release,
-				1 + ((!folio_test_anon(origin_folio) ||
+				extra_count + ((!folio_test_anon(origin_folio) ||
 				     folio_test_swapcache(origin_folio)) ?
 					     folio_nr_pages(release) : 0));
 
@@ -3676,7 +3685,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			if (release == origin_folio)
 				continue;
 
-			if (!folio_is_device_private(origin_folio))
+			if (!isolated && !folio_is_device_private(origin_folio))
 				lru_add_page_tail(origin_folio, &release->page,
 						lruvec, list);
 
@@ -3714,6 +3723,12 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	if (nr_dropped)
 		shmem_uncharge(mapping->host, nr_dropped);
 
+	/*
+	 * Don't remap and unlock isolated folios
+	 */
+	if (isolated)
+		return ret;
+
 	remap_page(origin_folio, 1 << order,
 			folio_test_anon(origin_folio) ?
 				RMP_USE_SHARED_ZEROPAGE : 0);
@@ -3808,6 +3823,7 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
  * @lock_at: a page within @folio to be left locked to caller
  * @list: after-split folios will be put on it if non NULL
  * @uniform_split: perform uniform split or not (non-uniform split)
+ * @isolated: The pages are already unmapped
  *
  * It calls __split_unmapped_folio() to perform uniform and non-uniform split.
  * It is in charge of checking whether the split is supported or not and
@@ -3818,7 +3834,7 @@ bool uniform_split_supported(struct folio *folio, unsigned int new_order,
  */
 static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
-		struct list_head *list, bool uniform_split)
+		struct list_head *list, bool uniform_split, bool isolated)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
@@ -3864,14 +3880,16 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		 * is taken to serialise against parallel split or collapse
 		 * operations.
 		 */
-		anon_vma = folio_get_anon_vma(folio);
-		if (!anon_vma) {
-			ret = -EBUSY;
-			goto out;
+		if (!isolated) {
+			anon_vma = folio_get_anon_vma(folio);
+			if (!anon_vma) {
+				ret = -EBUSY;
+				goto out;
+			}
+			anon_vma_lock_write(anon_vma);
 		}
 		end = -1;
 		mapping = NULL;
-		anon_vma_lock_write(anon_vma);
 	} else {
 		unsigned int min_order;
 		gfp_t gfp;
@@ -3933,7 +3951,8 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		goto out_unlock;
 	}
 
-	unmap_folio(folio);
+	if (!isolated)
+		unmap_folio(folio);
 
 	/* block interrupt reentry in xa_lock and spinlock */
 	local_irq_disable();
@@ -3986,14 +4005,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 
 		ret = __split_unmapped_folio(folio, new_order,
 				split_at, lock_at, list, end, &xas, mapping,
-				uniform_split);
+				uniform_split, isolated);
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:
 		if (mapping)
 			xas_unlock(&xas);
 		local_irq_enable();
-		remap_page(folio, folio_nr_pages(folio), 0);
+		if (!isolated)
+			remap_page(folio, folio_nr_pages(folio), 0);
 		ret = -EAGAIN;
 	}
 
@@ -4059,12 +4079,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
  * Returns -EINVAL when trying to split to an order that is incompatible
  * with the folio. Splitting to order 0 is compatible with all folios.
  */
-int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
-		unsigned int new_order)
+int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+		unsigned int new_order, bool isolated)
 {
 	struct folio *folio = page_folio(page);
 
-	return __folio_split(folio, new_order, &folio->page, page, list, true);
+	return __folio_split(folio, new_order, &folio->page, page, list, true,
+			isolated);
 }
 
 /*
@@ -4093,7 +4114,7 @@ int folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct list_head *list)
 {
 	return __folio_split(folio, new_order, split_at, &folio->page, list,
-			false);
+			false, false);
 }
 
 int min_order_for_split(struct folio *folio)
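The net effect of the rename (sketch, not part of the patch): existing
callers keep the old entry point, which now forwards isolated == false,
while migration code that has already unmapped and frozen the folio
calls the double-underscore variant directly:

/* Regular path: folio still mapped; core handles unmap/remap. */
ret = split_huge_page_to_list_to_order(page, NULL, 0);

/*
 * Migration path (sketch): the folio was already unmapped during
 * migrate_vma_setup(), so skip unmap_folio()/remap_page() and the
 * anon_vma locking; the extra per-page reference taken here is
 * dropped in the finalize phase.
 */
ret = __split_huge_page_to_list_to_order(page, NULL, 0, true);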
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index f3fff5d705bd..e4510bb86b3c 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -804,6 +804,24 @@ static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
 		src[i] &= ~MIGRATE_PFN_MIGRATE;
 	return 0;
 }
+
+static void migrate_vma_split_pages(struct migrate_vma *migrate,
+				    unsigned long idx, unsigned long addr,
+				    struct folio *folio)
+{
+	unsigned long i;
+	unsigned long pfn;
+	unsigned long flags;
+
+	folio_get(folio);
+	split_huge_pmd_address(migrate->vma, addr, true, folio);
+	__split_huge_page_to_list_to_order(folio_page(folio, 0), NULL, 0, true);
+	migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
+	flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
+	pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
+	for (i = 1; i < HPAGE_PMD_NR; i++)
+		migrate->src[i+idx] = migrate_pfn(pfn + i) | flags;
+}
 #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
 static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
 					    unsigned long addr,
@@ -813,6 +831,11 @@ static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
 {
 	return 0;
 }
+
+static void migrate_vma_split_pages(struct migrate_vma *migrate,
+				    unsigned long idx, unsigned long addr,
+				    struct folio *folio)
+{}
 #endif
 
 /*
@@ -962,8 +985,9 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 		struct migrate_vma *migrate)
 {
 	struct mmu_notifier_range range;
-	unsigned long i;
+	unsigned long i, j;
 	bool notified = false;
+	unsigned long addr;
 
 	for (i = 0; i < npages; ) {
 		struct page *newpage = migrate_pfn_to_page(dst_pfns[i]);
@@ -1005,12 +1029,16 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 				(!(dst_pfns[i] & MIGRATE_PFN_COMPOUND))) {
 				nr = HPAGE_PMD_NR;
 				src_pfns[i] &= ~MIGRATE_PFN_COMPOUND;
-				src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
-				goto next;
+			} else {
+				nr = 1;
 			}
 
-			migrate_vma_insert_page(migrate, addr, &dst_pfns[i],
-						&src_pfns[i]);
+			for (j = 0; j < nr && i + j < npages; j++) {
+				src_pfns[i+j] |= MIGRATE_PFN_MIGRATE;
+				migrate_vma_insert_page(migrate,
+					addr + j * PAGE_SIZE,
+					&dst_pfns[i+j], &src_pfns[i+j]);
+			}
 			goto next;
 		}
 
@@ -1032,7 +1060,10 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 					MIGRATE_PFN_COMPOUND);
 				goto next;
 			}
-			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
+			nr = 1 << folio_order(folio);
+			addr = migrate->start + i * PAGE_SIZE;
+			migrate_vma_split_pages(migrate, i, addr, folio);
+			extra_cnt++;
 		} else if ((src_pfns[i] & MIGRATE_PFN_MIGRATE) &&
 			   (dst_pfns[i] & MIGRATE_PFN_COMPOUND) &&
 			   !(src_pfns[i] & MIGRATE_PFN_COMPOUND)) {
@@ -1067,12 +1098,17 @@ static void __migrate_device_pages(unsigned long *src_pfns,
 		BUG_ON(folio_test_writeback(folio));
 
 		if (migrate && migrate->fault_page == page)
-			extra_cnt = 1;
-		r = folio_migrate_mapping(mapping, newfolio, folio, extra_cnt);
-		if (r != MIGRATEPAGE_SUCCESS)
-			src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
-		else
-			folio_migrate_flags(newfolio, folio);
+			extra_cnt++;
+		for (j = 0; j < nr && i + j < npages; j++) {
+			folio = page_folio(migrate_pfn_to_page(src_pfns[i+j]));
+			newfolio = page_folio(migrate_pfn_to_page(dst_pfns[i+j]));
+
+			r = folio_migrate_mapping(mapping, newfolio, folio, extra_cnt);
+			if (r != MIGRATEPAGE_SUCCESS)
+				src_pfns[i+j] &= ~MIGRATE_PFN_MIGRATE;
+			else
+				folio_migrate_flags(newfolio, folio);
+		}
 next:
 		i += nr;
 	}
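The expected driver-side counterpart, as a sketch (the mydrv_alloc_*
helpers are hypothetical): when a compound destination page cannot be
allocated, the driver fills order-0 dst entries without
MIGRATE_PFN_COMPOUND, and the core then calls migrate_vma_split_pages()
to split the source THP in place:

static void mydrv_fill_dst(struct migrate_vma *args, unsigned long i,
			   unsigned long npages)
{
	struct page *dpage = mydrv_alloc_huge_page();	/* hypothetical */
	unsigned long j;

	if (dpage) {
		args->dst[i] = migrate_pfn(page_to_pfn(dpage)) |
			       MIGRATE_PFN_COMPOUND;
		return;
	}

	/* Fall back to small pages; the core splits the source THP. */
	for (j = 0; j < npages; j++) {
		dpage = mydrv_alloc_small_page();	/* hypothetical */
		args->dst[i + j] = dpage ?
			migrate_pfn(page_to_pfn(dpage)) : 0;
	}
}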
From patchwork Thu Mar 6 04:42:37 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003825
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie,
 Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song,
 Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang,
 Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 09/11] lib/test_hmm: add test case for split pages
Date: Thu, 6 Mar 2025 15:42:37 +1100
Message-ID: <20250306044239.3874247-10-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Add a new flag, HMM_DMIRROR_FLAG_FAIL_ALLOC, to emulate failure to
allocate a large page. This exercises the code paths involved in
split migration.
Signed-off-by: Balbir Singh
---
 lib/test_hmm.c      | 12 +++++++++++-
 lib/test_hmm_uapi.h |  3 +++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 18b6a7b061d7..36209184c430 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -92,6 +92,7 @@ struct dmirror {
 	struct xarray			pt;
 	struct mmu_interval_notifier	notifier;
 	struct mutex			mutex;
+	__u64				flags;
 };
 
 /*
@@ -699,7 +700,12 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
 						page_to_pfn(spage)))
 				goto next;
 
-		dpage = dmirror_devmem_alloc_page(dmirror, is_large);
+		if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) {
+			dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC;
+			dpage = NULL;
+		} else
+			dpage = dmirror_devmem_alloc_page(dmirror, is_large);
+
 		if (!dpage) {
 			struct folio *folio;
 			unsigned long i;
@@ -1504,6 +1510,10 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
 		dmirror_device_remove_chunks(dmirror->mdevice);
 		ret = 0;
 		break;
+	case HMM_DMIRROR_FLAGS:
+		dmirror->flags = cmd.npages;
+		ret = 0;
+		break;
 
 	default:
 		return -EINVAL;
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 8c818a2cf4f6..f94c6d457338 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -37,6 +37,9 @@ struct hmm_dmirror_cmd {
 #define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x05, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_RELEASE		_IOWR('H', 0x07, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_FLAGS		_IOWR('H', 0x08, struct hmm_dmirror_cmd)
+
+#define HMM_DMIRROR_FLAG_FAIL_ALLOC	(1ULL << 0)
 
 /*
  * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
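From userspace the new ioctl is exercised as below; the flag value
travels in cmd.npages, which the driver copies into dmirror->flags, and
it is one-shot because dmirror_migrate_alloc_and_copy() clears it on
first use. A minimal sketch, assuming an already-open test_hmm device
fd:

#include <sys/ioctl.h>

static int dmirror_fail_next_alloc(int fd)
{
	struct hmm_dmirror_cmd cmd = { 0 };

	/* Arm a one-shot large-page allocation failure. */
	cmd.npages = HMM_DMIRROR_FLAG_FAIL_ALLOC;
	return ioctl(fd, HMM_DMIRROR_FLAGS, &cmd);
}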
From patchwork Thu Mar 6 04:42:38 2025
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003826
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie,
 Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song,
 Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang,
 Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 10/11] selftests/mm/hmm-tests: new tests for zone device THP
 migration
Date: Thu, 6 Mar 2025 15:42:38 +1100
Message-ID: <20250306044239.3874247-11-balbirs@nvidia.com>
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Add new tests for migrating anon THP pages, including anon_huge,
anon_huge_zero, and error cases involving forced splitting of pages
during migration.

Signed-off-by: Balbir Singh
---
 tools/testing/selftests/mm/hmm-tests.c | 407 +++++++++++++++++++++++++
 1 file changed, 407 insertions(+)

diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index 141bf63cbe05..b79274190022 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -2056,4 +2056,411 @@ TEST_F(hmm, hmm_cow_in_device)
 	hmm_buffer_free(buffer);
 }
 
+
+/*
+ * Migrate private anonymous huge empty page.
+ */
+TEST_F(hmm, migrate_anon_huge_empty)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, size);
+
+	buffer->ptr = mmap(NULL, 2 * size,
+			   PROT_READ,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	old_ptr = buffer->ptr;
+	buffer->ptr = map;
+
+	/* Migrate memory to device. */
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], 0);
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge zero page.
+ */
+TEST_F(hmm, migrate_anon_huge_zero)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+	int val;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, size);
+
+	buffer->ptr = mmap(NULL, 2 * size,
+			   PROT_READ,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	old_ptr = buffer->ptr;
+	buffer->ptr = map;
+
+	/* Initialize a read-only zero huge page. */
+	val = *(int *)buffer->ptr;
+	ASSERT_EQ(val, 0);
+
+	/* Migrate memory to device. */
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], 0);
+
+	/* Fault pages back to system memory and check them. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i) {
+		ASSERT_EQ(ptr[i], 0);
+		/* If it asserts once, it probably will 500,000 times */
+		if (ptr[i] != 0)
+			break;
+	}
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page and free.
+ */
+TEST_F(hmm, migrate_anon_huge_free)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, size);
+
+	buffer->ptr = mmap(NULL, 2 * size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	old_ptr = buffer->ptr;
+	buffer->ptr = map;
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate memory to device. */
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], i);
+
+	/* Try freeing it. */
+	ret = madvise(map, size, MADV_FREE);
+	ASSERT_EQ(ret, 0);
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page and fault back to sysmem.
+ */
+TEST_F(hmm, migrate_anon_huge_fault)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, size);
+
+	buffer->ptr = mmap(NULL, 2 * size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS,
+			   buffer->fd, 0);
+	ASSERT_NE(buffer->ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	old_ptr = buffer->ptr;
+	buffer->ptr = map;
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate memory to device. */
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], i);
+
+	/* Fault pages back to system memory and check them. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], i);
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge page with allocation errors.
+ */
+TEST_F(hmm, migrate_anon_huge_err)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(2 * size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, 2 * size);
+
+	old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
+		       MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+	ASSERT_NE(old_ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)old_ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	buffer->ptr = map;
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate memory to device but force a THP allocation error. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
+	ASSERT_EQ(ret, 0);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i) {
+		ASSERT_EQ(ptr[i], i);
+		if (ptr[i] != i)
+			break;
+	}
+
+	/* Try faulting back a single (PAGE_SIZE) page. */
+	ptr = buffer->ptr;
+	ASSERT_EQ(ptr[2048], 2048);
+
+	/* unmap and remap the region to reset things. */
+	ret = munmap(old_ptr, 2 * size);
+	ASSERT_EQ(ret, 0);
+	old_ptr = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
+		       MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+	ASSERT_NE(old_ptr, MAP_FAILED);
+	map = (void *)ALIGN((uintptr_t)old_ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	buffer->ptr = map;
+
+	/* Initialize buffer in system memory. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Migrate THP to device. */
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/*
+	 * Force an allocation error when faulting back a THP resident in the
+	 * device.
+	 */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
+	ASSERT_EQ(ret, 0);
+	ptr = buffer->ptr;
+	ASSERT_EQ(ptr[2048], 2048);
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
+
+/*
+ * Migrate private anonymous huge zero page with allocation errors.
+ */
+TEST_F(hmm, migrate_anon_huge_zero_err)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	unsigned long i;
+	void *old_ptr;
+	void *map;
+	int *ptr;
+	int ret;
+
+	size = TWOMEG;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->fd = -1;
+	buffer->size = 2 * size;
+	buffer->mirror = malloc(2 * size);
+	ASSERT_NE(buffer->mirror, NULL);
+	memset(buffer->mirror, 0xFF, 2 * size);
+
+	old_ptr = mmap(NULL, 2 * size, PROT_READ,
+		       MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+	ASSERT_NE(old_ptr, MAP_FAILED);
+
+	npages = size >> self->page_shift;
+	map = (void *)ALIGN((uintptr_t)old_ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	buffer->ptr = map;
+
+	/* Migrate memory to device but force a THP allocation error. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
+	ASSERT_EQ(ret, 0);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], 0);
+
+	/* Try faulting back a single (PAGE_SIZE) page. */
+	ptr = buffer->ptr;
+	ASSERT_EQ(ptr[2048], 0);
+
+	/* unmap and remap the region to reset things. */
+	ret = munmap(old_ptr, 2 * size);
+	ASSERT_EQ(ret, 0);
+	old_ptr = mmap(NULL, 2 * size, PROT_READ,
+		       MAP_PRIVATE | MAP_ANONYMOUS, buffer->fd, 0);
+	ASSERT_NE(old_ptr, MAP_FAILED);
+	map = (void *)ALIGN((uintptr_t)old_ptr, size);
+	ret = madvise(map, size, MADV_HUGEPAGE);
+	ASSERT_EQ(ret, 0);
+	buffer->ptr = map;
+
+	/* Initialize buffer in system memory (zero THP page). */
+	ret = ptr[0];
+	ASSERT_EQ(ret, 0);
+
+	/* Migrate memory to device but force a THP allocation error. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
+			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
+	ASSERT_EQ(ret, 0);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Fault the device memory back and check it. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ASSERT_EQ(ptr[i], 0);
+
+	buffer->ptr = old_ptr;
+	hmm_buffer_free(buffer);
+}
 
 TEST_HARNESS_MAIN
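These tests all rely on one idiom for obtaining a THP-backed region:
map twice the huge page size, round the pointer up to the next 2MB
boundary, then madvise(MADV_HUGEPAGE) the aligned half. An expanded
sketch of what the ALIGN() call in the tests does:

size_t size = TWOMEG;	/* 2MB huge page size used by the tests */
void *raw, *aligned;

raw = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
	   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
/* Round up to a 2MB boundary so a PMD mapping is possible. */
aligned = (void *)(((uintptr_t)raw + size - 1) & ~(uintptr_t)(size - 1));
madvise(aligned, size, MADV_HUGEPAGE);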
From patchwork Thu Mar 6 04:42:39 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 14003827
From: Balbir Singh
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 Balbir Singh, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie,
 Simona Vetter, Jérôme Glisse, Shuah Khan, David Hildenbrand, Barry Song,
 Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan, Kefeng Wang,
 Jane Chu, Alistair Popple, Donet Tom
Subject: [RFC 11/11] gpu/drm/nouveau: Add THP migration support
Date: Thu, 6 Mar 2025 15:42:39 +1100
Message-ID: <20250306044239.3874247-12-balbirs@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250306044239.3874247-1-balbirs@nvidia.com>
References: <20250306044239.3874247-1-balbirs@nvidia.com>
Change the code to add support for MIGRATE_VMA_SELECT_COMPOUND and to
handle page sizes appropriately in the migrate/evict code paths. An
illustrative sketch of the driver-side contract this relies on follows
the diff.

Signed-off-by: Balbir Singh
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 244 +++++++++++++++++--------
 drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 3 files changed, 176 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 61d0f411ef84..bf3681f52ce0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -83,9 +83,15 @@ struct nouveau_dmem {
 	struct list_head chunks;
 	struct mutex mutex;
 	struct page *free_pages;
+	struct folio *free_folios;
 	spinlock_t lock;
 };
 
+struct nouveau_dmem_dma_info {
+	dma_addr_t dma_addr;
+	size_t size;
+};
+
 static struct nouveau_dmem_chunk *nouveau_page_to_chunk(struct page *page)
 {
 	return container_of(page_pgmap(page), struct nouveau_dmem_chunk,
@@ -112,10 +118,16 @@ static void nouveau_dmem_page_free(struct page *page)
 {
 	struct nouveau_dmem_chunk *chunk = nouveau_page_to_chunk(page);
 	struct nouveau_dmem *dmem = chunk->drm->dmem;
+	struct folio *folio = page_folio(page);
 
 	spin_lock(&dmem->lock);
-	page->zone_device_data = dmem->free_pages;
-	dmem->free_pages = page;
+	if (folio_order(folio)) {
+		folio_set_zone_device_data(folio, dmem->free_folios);
+		dmem->free_folios = folio;
+	} else {
+		page->zone_device_data = dmem->free_pages;
+		dmem->free_pages = page;
+	}
 
 	WARN_ON(!chunk->callocated);
 	chunk->callocated--;
@@ -139,20 +151,28 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
 	}
 }
 
-static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
-				 struct page *dpage, dma_addr_t *dma_addr)
+static int nouveau_dmem_copy_folio(struct nouveau_drm *drm,
+				   struct folio *sfolio, struct folio *dfolio,
+				   struct nouveau_dmem_dma_info *dma_info)
 {
 	struct device *dev = drm->dev->dev;
+	struct page *dpage = folio_page(dfolio, 0);
+	struct page *spage = folio_page(sfolio, 0);
 
-	lock_page(dpage);
+	folio_lock(dfolio);
 
-	*dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
-	if (dma_mapping_error(dev, *dma_addr))
+	dma_info->dma_addr = dma_map_page(dev, dpage, 0, page_size(dpage),
+					  DMA_BIDIRECTIONAL);
+	dma_info->size = page_size(dpage);
+	if (dma_mapping_error(dev, dma_info->dma_addr))
 		return -EIO;
 
-	if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
-					 NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage))) {
-		dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+	if (drm->dmem->migrate.copy_func(drm, folio_nr_pages(sfolio),
+					 NOUVEAU_APER_HOST, dma_info->dma_addr,
+					 NOUVEAU_APER_VRAM,
+					 nouveau_dmem_page_addr(spage))) {
+		dma_unmap_page(dev, dma_info->dma_addr, page_size(dpage),
+			       DMA_BIDIRECTIONAL);
 		return -EIO;
 	}
 
@@ -165,21 +185,38 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 	struct nouveau_dmem *dmem = drm->dmem;
 	struct nouveau_fence *fence;
 	struct nouveau_svmm *svmm;
-	struct page *spage, *dpage;
-	unsigned long src = 0, dst = 0;
-	dma_addr_t dma_addr = 0;
+	struct page *dpage;
 	vm_fault_t ret = 0;
 	struct migrate_vma args = {
 		.vma = vmf->vma,
-		.start = vmf->address,
-		.end = vmf->address + PAGE_SIZE,
-		.src = &src,
-		.dst = &dst,
 		.pgmap_owner = drm->dev,
 		.fault_page = vmf->page,
 		.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
+		.src = NULL,
+		.dst = NULL,
 	};
-
+	unsigned int order, nr;
+	struct folio *sfolio, *dfolio;
+	struct nouveau_dmem_dma_info dma_info;
+
+	sfolio = page_folio(vmf->page);
+	order = folio_order(sfolio);
+	nr = 1 << order;
+
+	/* Only request a compound migration for an actual THP fault. */
+	if (order)
+		args.flags |= MIGRATE_VMA_SELECT_COMPOUND;
+
+	args.start = ALIGN_DOWN(vmf->address, (1 << (PAGE_SHIFT + order)));
+	args.end = args.start + (PAGE_SIZE << order);
+	args.src = kcalloc(nr, sizeof(*args.src), GFP_KERNEL);
+	args.dst = kcalloc(nr, sizeof(*args.dst), GFP_KERNEL);
+
+	if (!args.src || !args.dst) {
+		ret = VM_FAULT_OOM;
+		goto err;
+	}
 	/*
 	 * FIXME what we really want is to find some heuristic to migrate more
 	 * than just one page on CPU fault. When such fault happens it is very
@@ -190,20 +227,26 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 	if (!args.cpages)
-		return 0;
+		goto err;
 
-	spage = migrate_pfn_to_page(src);
-	if (!spage || !(src & MIGRATE_PFN_MIGRATE))
-		goto done;
-
-	dpage = alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vmf->vma, vmf->address);
-	if (!dpage)
+	if (order) {
+		dfolio = vma_alloc_folio(GFP_HIGHUSER | __GFP_ZERO, order,
+					 vmf->vma, vmf->address);
+		dpage = dfolio ? folio_page(dfolio, 0) : NULL;
+	} else {
+		dpage = alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vmf->vma,
+				       vmf->address);
+		dfolio = dpage ? page_folio(dpage) : NULL;
+	}
+	if (!dpage) {
+		ret = VM_FAULT_OOM;
 		goto done;
+	}
 
-	dst = migrate_pfn(page_to_pfn(dpage));
+	args.dst[0] = migrate_pfn(page_to_pfn(dpage));
+	if (order)
+		args.dst[0] |= MIGRATE_PFN_COMPOUND;
 
-	svmm = spage->zone_device_data;
+	svmm = folio_zone_device_data(sfolio);
 	mutex_lock(&svmm->mutex);
 	nouveau_svmm_invalidate(svmm, args.start, args.end);
-	ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
+	ret = nouveau_dmem_copy_folio(drm, sfolio, dfolio, &dma_info);
 	mutex_unlock(&svmm->mutex);
 	if (ret) {
 		ret = VM_FAULT_SIGBUS;
@@ -213,19 +256,31 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 	nouveau_fence_new(&fence, dmem->migrate.chan);
 	migrate_vma_pages(&args);
 	nouveau_dmem_fence_done(&fence);
-	dma_unmap_page(drm->dev->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+	dma_unmap_page(drm->dev->dev, dma_info.dma_addr, dma_info.size,
+		       DMA_BIDIRECTIONAL);
 
 done:
 	migrate_vma_finalize(&args);
+err:
+	kfree(args.src);
+	kfree(args.dst);
 	return ret;
 }
 
+static void nouveau_dmem_folio_split(struct folio *head, struct folio *tail)
+{
+	tail->pgmap = head->pgmap;
+	folio_set_zone_device_data(tail, folio_zone_device_data(head));
+}
+
 static const struct dev_pagemap_ops nouveau_dmem_pagemap_ops = {
 	.page_free		= nouveau_dmem_page_free,
 	.migrate_to_ram		= nouveau_dmem_migrate_to_ram,
+	.folio_split		= nouveau_dmem_folio_split,
 };
 
 static int
-nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
+nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage,
+			 bool is_large)
 {
 	struct nouveau_dmem_chunk *chunk;
 	struct resource *res;
@@ -279,16 +334,21 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 	pfn_first = chunk->pagemap.range.start >> PAGE_SHIFT;
 	page = pfn_to_page(pfn_first);
 	spin_lock(&drm->dmem->lock);
-	for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) {
-		page->zone_device_data = drm->dmem->free_pages;
-		drm->dmem->free_pages = page;
+
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) || !is_large) {
+		for (i = 0; i < DMEM_CHUNK_NPAGES - 1; ++i, ++page) {
+			page->zone_device_data = drm->dmem->free_pages;
+			drm->dmem->free_pages = page;
+		}
 	}
+
 	*ppage = page;
 	chunk->callocated++;
 	spin_unlock(&drm->dmem->lock);
 
-	NV_INFO(drm, "DMEM: registered %ldMB of device memory\n",
-		DMEM_CHUNK_SIZE >> 20);
+	NV_INFO(drm, "DMEM: registered %ldMB of %sdevice memory %lx %lx\n",
+		DMEM_CHUNK_SIZE >> 20, is_large ? "THP " : "", pfn_first,
+		nouveau_dmem_page_addr(page));
 
 	return 0;
 
@@ -305,27 +365,37 @@ nouveau_dmem_chunk_alloc(struct nouveau_drm *drm, struct page **ppage)
 }
 
 static struct page *
-nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
+nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large)
 {
 	struct nouveau_dmem_chunk *chunk;
 	struct page *page = NULL;
+	struct folio *folio = NULL;
 	int ret;
+	unsigned int order = 0;
 
 	spin_lock(&drm->dmem->lock);
-	if (drm->dmem->free_pages) {
+	if (is_large && drm->dmem->free_folios) {
+		folio = drm->dmem->free_folios;
+		drm->dmem->free_folios = folio_zone_device_data(folio);
+		page = folio_page(folio, 0);
+		chunk = nouveau_page_to_chunk(page);
+		chunk->callocated++;
+		spin_unlock(&drm->dmem->lock);
+		order = ilog2(DMEM_CHUNK_NPAGES);
+	} else if (!is_large && drm->dmem->free_pages) {
 		page = drm->dmem->free_pages;
 		drm->dmem->free_pages = page->zone_device_data;
 		chunk = nouveau_page_to_chunk(page);
 		chunk->callocated++;
 		spin_unlock(&drm->dmem->lock);
+		folio = page_folio(page);
 	} else {
 		spin_unlock(&drm->dmem->lock);
-		ret = nouveau_dmem_chunk_alloc(drm, &page);
+		ret = nouveau_dmem_chunk_alloc(drm, &page, is_large);
 		if (ret)
 			return NULL;
+		folio = page_folio(page);
+		if (is_large)
+			order = ilog2(DMEM_CHUNK_NPAGES);
 	}
 
-	zone_device_page_init(page);
+	init_zone_device_folio(folio, order);
 	return page;
 }
@@ -376,12 +446,12 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
 {
 	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
 	unsigned long *src_pfns, *dst_pfns;
-	dma_addr_t *dma_addrs;
+	struct nouveau_dmem_dma_info *dma_info;
 	struct nouveau_fence *fence;
 
 	src_pfns = kvcalloc(npages, sizeof(*src_pfns), GFP_KERNEL | __GFP_NOFAIL);
 	dst_pfns = kvcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL | __GFP_NOFAIL);
-	dma_addrs = kvcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL | __GFP_NOFAIL);
+	dma_info = kvcalloc(npages, sizeof(*dma_info), GFP_KERNEL | __GFP_NOFAIL);
 
 	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
 			npages);
@@ -389,17 +459,28 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
 	for (i = 0; i < npages; i++) {
 		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
 			struct page *dpage;
+			struct folio *folio = page_folio(
+				migrate_pfn_to_page(src_pfns[i]));
+			unsigned int order = folio_order(folio);
+
+			if (src_pfns[i] & MIGRATE_PFN_COMPOUND) {
+				struct folio *dfolio;
+
+				dfolio = folio_alloc(GFP_HIGHUSER_MOVABLE,
+						     order);
+				if (!dfolio)
+					continue;
+				dpage = folio_page(dfolio, 0);
+			} else {
+				/*
+				 * _GFP_NOFAIL because the GPU is going away
+				 * and there is nothing sensible we can do if
+				 * we can't copy the data back.
+				 */
+				dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
+			}
 
-			/*
-			 * _GFP_NOFAIL because the GPU is going away and there
-			 * is nothing sensible we can do if we can't copy the
-			 * data back.
-			 */
-			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
 			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
-			nouveau_dmem_copy_one(chunk->drm,
-					migrate_pfn_to_page(src_pfns[i]), dpage,
-					&dma_addrs[i]);
+			nouveau_dmem_copy_folio(chunk->drm, folio,
+						page_folio(dpage),
+						&dma_info[i]);
 		}
 	}
 
@@ -410,8 +491,9 @@ nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
 	kvfree(src_pfns);
 	kvfree(dst_pfns);
 	for (i = 0; i < npages; i++)
-		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
-	kvfree(dma_addrs);
+		dma_unmap_page(chunk->drm->dev->dev, dma_info[i].dma_addr,
+			       dma_info[i].size, DMA_BIDIRECTIONAL);
+	kvfree(dma_info);
 }
 
 void
@@ -615,31 +697,35 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
 		struct nouveau_svmm *svmm, unsigned long src,
-		dma_addr_t *dma_addr, u64 *pfn)
+		struct nouveau_dmem_dma_info *dma_info, u64 *pfn)
 {
 	struct device *dev = drm->dev->dev;
 	struct page *dpage, *spage;
 	unsigned long paddr;
+	bool is_large = false;
 
 	spage = migrate_pfn_to_page(src);
 	if (!(src & MIGRATE_PFN_MIGRATE))
 		goto out;
 
-	dpage = nouveau_dmem_page_alloc_locked(drm);
+	is_large = src & MIGRATE_PFN_COMPOUND;
+	dpage = nouveau_dmem_page_alloc_locked(drm, is_large);
 	if (!dpage)
 		goto out;
 
 	paddr = nouveau_dmem_page_addr(dpage);
 	if (spage) {
-		*dma_addr = dma_map_page(dev, spage, 0, page_size(spage),
-					 DMA_BIDIRECTIONAL);
-		if (dma_mapping_error(dev, *dma_addr))
+		dma_info->dma_addr = dma_map_page(dev, spage, 0,
+						  page_size(spage),
+						  DMA_BIDIRECTIONAL);
+		dma_info->size = page_size(spage);
+		if (dma_mapping_error(dev, dma_info->dma_addr))
 			goto out_free_page;
-		if (drm->dmem->migrate.copy_func(drm, 1,
-			NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST, *dma_addr))
+		if (drm->dmem->migrate.copy_func(drm,
+			folio_nr_pages(page_folio(spage)),
+			NOUVEAU_APER_VRAM, paddr, NOUVEAU_APER_HOST,
+			dma_info->dma_addr))
 			goto out_dma_unmap;
 	} else {
-		*dma_addr = DMA_MAPPING_ERROR;
+		dma_info->dma_addr = DMA_MAPPING_ERROR;
 		if (drm->dmem->migrate.clear_func(drm, page_size(dpage),
 				NOUVEAU_APER_VRAM, paddr))
 			goto out_free_page;
@@ -653,7 +739,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
 	return migrate_pfn(page_to_pfn(dpage));
 
 out_dma_unmap:
-	dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+	dma_unmap_page(dev, dma_info->dma_addr, dma_info->size,
+		       DMA_BIDIRECTIONAL);
 out_free_page:
 	nouveau_dmem_page_free_locked(drm, dpage);
 out:
@@ -663,27 +749,33 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
 static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 		struct nouveau_svmm *svmm, struct migrate_vma *args,
-		dma_addr_t *dma_addrs, u64 *pfns)
+		struct nouveau_dmem_dma_info *dma_info, u64 *pfns)
 {
 	struct nouveau_fence *fence;
 	unsigned long addr = args->start, nr_dma = 0, i;
+	unsigned long order = 0;
 
-	for (i = 0; addr < args->end; i++) {
+	for (i = 0; addr < args->end; ) {
+		struct page *spage = migrate_pfn_to_page(args->src[i]);
+
+		order = spage ? folio_order(page_folio(spage)) : 0;
 		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm,
-				args->src[i], dma_addrs + nr_dma, pfns + i);
-		if (!dma_mapping_error(drm->dev->dev, dma_addrs[nr_dma]))
+				args->src[i], dma_info + nr_dma, pfns + i);
+		if (!dma_mapping_error(drm->dev->dev, dma_info[nr_dma].dma_addr))
 			nr_dma++;
-		addr += PAGE_SIZE;
+		i += 1 << order;
+		addr += (1 << order) * PAGE_SIZE;
 	}
 
 	nouveau_fence_new(&fence, drm->dmem->migrate.chan);
 	migrate_vma_pages(args);
 	nouveau_dmem_fence_done(&fence);
-	nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i);
+	nouveau_pfns_map(svmm, args->vma->vm_mm, args->start, pfns, i,
+			 PAGE_SHIFT + order);
 
 	while (nr_dma--) {
-		dma_unmap_page(drm->dev->dev, dma_addrs[nr_dma], PAGE_SIZE,
-			       DMA_BIDIRECTIONAL);
+		dma_unmap_page(drm->dev->dev, dma_info[nr_dma].dma_addr,
+			       dma_info[nr_dma].size, DMA_BIDIRECTIONAL);
 	}
 	migrate_vma_finalize(args);
 }
@@ -697,20 +789,24 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 {
 	unsigned long npages = (end - start) >> PAGE_SHIFT;
 	unsigned long max = min(SG_MAX_SINGLE_ALLOC, npages);
-	dma_addr_t *dma_addrs;
 	struct migrate_vma args = {
 		.vma = vma,
 		.start = start,
 		.pgmap_owner = drm->dev,
-		.flags = MIGRATE_VMA_SELECT_SYSTEM,
+		.flags = MIGRATE_VMA_SELECT_SYSTEM |
+			 MIGRATE_VMA_SELECT_COMPOUND,
 	};
 	unsigned long i;
 	u64 *pfns;
 	int ret = -ENOMEM;
+	struct nouveau_dmem_dma_info *dma_info;
 
 	if (drm->dmem == NULL)
 		return -ENODEV;
 
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		max = max_t(unsigned long, HPAGE_PMD_NR, max);
+
 	args.src = kcalloc(max, sizeof(*args.src), GFP_KERNEL);
 	if (!args.src)
 		goto out;
@@ -718,8 +814,8 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 	if (!args.dst)
 		goto out_free_src;
 
-	dma_addrs = kmalloc_array(max, sizeof(*dma_addrs), GFP_KERNEL);
-	if (!dma_addrs)
+	dma_info = kmalloc_array(max, sizeof(*dma_info), GFP_KERNEL);
+	if (!dma_info)
 		goto out_free_dst;
 
 	pfns = nouveau_pfns_alloc(max);
@@ -737,7 +833,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 			goto out_free_pfns;
 
 		if (args.cpages)
-			nouveau_dmem_migrate_chunk(drm, svmm, &args, dma_addrs,
+			nouveau_dmem_migrate_chunk(drm, svmm, &args, dma_info,
 						   pfns);
 		args.start = args.end;
 	}
@@ -746,7 +842,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 out_free_pfns:
 	nouveau_pfns_free(pfns);
 out_free_dma:
-	kfree(dma_addrs);
+	kfree(dma_info);
 out_free_dst:
 	kfree(args.dst);
 out_free_src:
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 1fed638b9eba..0693179d0a7d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -920,12 +920,14 @@ nouveau_pfns_free(u64 *pfns)
 
 void
 nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
-		 unsigned long addr, u64 *pfns, unsigned long npages)
+		 unsigned long addr, u64 *pfns, unsigned long npages,
+		 unsigned int page_shift)
 {
 	struct nouveau_pfnmap_args *args = nouveau_pfns_to_args(pfns);
 
 	args->p.addr = addr;
-	args->p.size = npages << PAGE_SHIFT;
+	args->p.size = npages << page_shift;
+	args->p.page = page_shift;
 
 	mutex_lock(&svmm->mutex);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.h b/drivers/gpu/drm/nouveau/nouveau_svm.h
index e7d63d7f0c2d..3fd78662f17e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.h
@@ -33,7 +33,8 @@ void nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit);
 u64 *nouveau_pfns_alloc(unsigned long npages);
 void nouveau_pfns_free(u64 *pfns);
 void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
-		      unsigned long addr, u64 *pfns, unsigned long npages);
+		      unsigned long addr, u64 *pfns, unsigned long npages,
+		      unsigned int page_shift);
 #else /* IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM) */
 static inline void nouveau_svm_init(struct nouveau_drm *drm) {}
 static inline void nouveau_svm_fini(struct nouveau_drm *drm) {}
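The migrate, fault and evict paths above all follow the same driver-side
contract for compound migration. The condensed sketch below is
illustrative only, not code from this patch: device_alloc() is a
hypothetical stand-in for nouveau_dmem_page_alloc_locked(), and the
split-on-fallback behaviour is how this series appears to handle a
destination entry that lacks MIGRATE_PFN_COMPOUND:

/*
 * Hedged sketch of filling one migrate_vma destination entry when
 * MIGRATE_VMA_SELECT_COMPOUND is in use. device_alloc() is a
 * hypothetical driver allocator taking a "huge" hint.
 */
static void fill_dst_entry(struct migrate_vma *args, unsigned long i)
{
	bool huge = args->src[i] & MIGRATE_PFN_COMPOUND;
	struct page *dpage;

	if (!(args->src[i] & MIGRATE_PFN_MIGRATE))
		return;

	/* Try to match the source folio's granularity first. */
	dpage = device_alloc(huge);
	if (!dpage && huge) {
		/*
		 * No large device page available: take a small page and
		 * leave MIGRATE_PFN_COMPOUND clear in dst, so the core
		 * splits the source folio and migrates PAGE_SIZE entries.
		 */
		dpage = device_alloc(false);
		huge = false;
	}
	if (!dpage)
		return; /* dst stays 0 and the entry is skipped */

	args->dst[i] = migrate_pfn(page_to_pfn(dpage));
	if (huge)
		args->dst[i] |= MIGRATE_PFN_COMPOUND;
}

Callers then advance through src/dst by 1 << folio_order() per entry and
map the result with a matching page_shift, exactly as
nouveau_dmem_migrate_chunk() and nouveau_pfns_map() do above.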