From patchwork Tue Dec 17 05:12:53 2024
X-Patchwork-Submitter: Alistair Popple
X-Patchwork-Id: 13911116
From: Alistair Popple
To: akpm@linux-foundation.org, dan.j.williams@intel.com, linux-mm@kvack.org
Cc: Alistair Popple, lina@asahilina.net, zhang.lyra@gmail.com,
 gerald.schaefer@linux.ibm.com, vishal.l.verma@intel.com,
 dave.jiang@intel.com, logang@deltatee.com, bhelgaas@google.com,
 jack@suse.cz, jgg@ziepe.ca, catalin.marinas@arm.com, will@kernel.org,
 mpe@ellerman.id.au, npiggin@gmail.com, dave.hansen@linux.intel.com,
 ira.weiny@intel.com, willy@infradead.org, djwong@kernel.org,
 tytso@mit.edu, linmiaohe@huawei.com, david@redhat.com,
 peterx@redhat.com, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
 linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
 jhubbard@nvidia.com, hch@lst.de, david@fromorbit.com
Subject: [PATCH v4 10/25] mm/mm_init: Move p2pdma page refcount
 initialisation to p2pdma
Date: Tue, 17 Dec 2024 16:12:53 +1100
X-Mailer: git-send-email 2.45.2

Currently ZONE_DEVICE page reference counts are initialised by core
memory management code in __init_zone_device_page() as part of the
memremap_pages() call which driver modules make to obtain ZONE_DEVICE
pages. This initialises page refcounts to 1 before returning them to
the driver.

This was presumably done because drivers had a reference of sorts on
the page. It also ensured the page could always be mapped with
vm_insert_page(), for example, and would never get freed (ie. have a
zero refcount), freeing drivers from manipulating page reference
counts.

However it complicates figuring out whether or not a page is free from
the mm perspective because it is no longer possible to just look at the
refcount. Instead the page type must be known and, if GUP is used, a
secondary pgmap reference is also sometimes needed.

To simplify this it is desirable to remove the page reference count for
the driver, so core mm can just use the refcount without having to
account for page type or do other types of tracking. This is possible
because drivers can always assume the page is valid as core kernel will
never offline or remove the struct page.

This means it is now up to drivers to initialise the page refcount as
required.
P2PDMA uses vm_insert_page() to map the page, and that requires a
non-zero reference count when initialising the page so set that when
the page is first mapped.

Signed-off-by: Alistair Popple
Reviewed-by: Dan Williams
Acked-by: David Hildenbrand

---

Changes since v2:

 - Initialise the page refcount for all pages covered by the kaddr
---
 drivers/pci/p2pdma.c | 13 +++++++++++--
 mm/memremap.c        | 17 +++++++++++++----
 mm/mm_init.c         | 22 ++++++++++++++++++----
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 0cb7e0a..04773a8 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -140,13 +140,22 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
 	rcu_read_unlock();
 
 	for (vaddr = vma->vm_start; vaddr < vma->vm_end; vaddr += PAGE_SIZE) {
-		ret = vm_insert_page(vma, vaddr, virt_to_page(kaddr));
+		struct page *page = virt_to_page(kaddr);
+
+		/*
+		 * Initialise the refcount for the freshly allocated page. As
+		 * we have just allocated the page no one else should be
+		 * using it.
+		 */
+		VM_WARN_ON_ONCE_PAGE(page_ref_count(page), page);
+		set_page_count(page, 1);
+		ret = vm_insert_page(vma, vaddr, page);
 		if (ret) {
 			gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
 			return ret;
 		}
 		percpu_ref_get(ref);
-		put_page(virt_to_page(kaddr));
+		put_page(page);
 		kaddr += PAGE_SIZE;
 		len -= PAGE_SIZE;
 	}
diff --git a/mm/memremap.c b/mm/memremap.c
index 40d4547..07bbe0e 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -488,15 +488,24 @@ void free_zone_device_folio(struct folio *folio)
 	folio->mapping = NULL;
 	folio->page.pgmap->ops->page_free(folio_page(folio, 0));
 
-	if (folio->page.pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    folio->page.pgmap->type != MEMORY_DEVICE_COHERENT)
+	switch (folio->page.pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+		put_dev_pagemap(folio->page.pgmap);
+		break;
+
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_GENERIC:
 		/*
 		 * Reset the refcount to 1 to prepare for handing out the page
 		 * again.
 		 */
 		folio_set_count(folio, 1);
-	else
-		put_dev_pagemap(folio->page.pgmap);
+		break;
+
+	case MEMORY_DEVICE_PCI_P2PDMA:
+		break;
+	}
 }
 
 void zone_device_page_init(struct page *page)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 24b68b4..f021e63 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1017,12 +1017,26 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 
 	/*
-	 * ZONE_DEVICE pages are released directly to the driver page allocator
-	 * which will set the page count to 1 when allocating the page.
+	 * ZONE_DEVICE pages other than MEMORY_DEVICE_GENERIC and
+	 * MEMORY_DEVICE_FS_DAX pages are released directly to the driver page
+	 * allocator which will set the page count to 1 when allocating the
+	 * page.
+	 *
+	 * MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX pages automatically
+	 * have their refcount reset to one whenever they are freed (ie.
+	 * after their refcount drops to 0).
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
+	case MEMORY_DEVICE_PCI_P2PDMA:
 		set_page_count(page, 0);
+		break;
+
+	case MEMORY_DEVICE_FS_DAX:
+	case MEMORY_DEVICE_GENERIC:
+		break;
+	}
 }
 
 /*
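
For reviewers who want the resulting per-type refcount policy in one
place, the sketch below models it in plain userspace C. It is only an
illustration under stated assumptions, not kernel code: the enumerator
names mirror the kernel's enum memory_type, while struct model_page and
every model_*() helper are hypothetical names invented for this sketch.
The model assumes vm_insert_page() takes its own page reference for the
mapping, which is why the put_page() in p2pmem_alloc_mmap() leaves the
count at one.

```c
#include <assert.h>

/*
 * Userspace model only, not kernel code. Enumerator names mirror the
 * kernel's enum memory_type; everything prefixed model_ is hypothetical.
 */
enum memory_type {
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_COHERENT,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_PCI_P2PDMA,
};

struct model_page {
	enum memory_type type;
	int refcount;
};

/* Refcount a page is left with by __init_zone_device_page() in this patch. */
int model_refcount_after_init(enum memory_type type)
{
	switch (type) {
	case MEMORY_DEVICE_PRIVATE:
	case MEMORY_DEVICE_COHERENT:
	case MEMORY_DEVICE_PCI_P2PDMA:
		return 0;	/* the driver sets the count when handing the page out */
	case MEMORY_DEVICE_FS_DAX:
	case MEMORY_DEVICE_GENERIC:
		return 1;	/* core mm still leaves these at one */
	}
	return -1;		/* unknown type */
}

/* Refcount a page is left with by free_zone_device_folio() in this patch. */
int model_refcount_after_free(enum memory_type type)
{
	switch (type) {
	case MEMORY_DEVICE_FS_DAX:
	case MEMORY_DEVICE_GENERIC:
		return 1;	/* reset to one, ready to be handed out again */
	case MEMORY_DEVICE_PRIVATE:
	case MEMORY_DEVICE_COHERENT:
	case MEMORY_DEVICE_PCI_P2PDMA:
		return 0;	/* stays free until the driver allocates it again */
	}
	return -1;		/* unknown type */
}

/* Models one iteration of the p2pmem_alloc_mmap() loop for a free page. */
int model_p2pdma_map(struct model_page *page)
{
	assert(page->type == MEMORY_DEVICE_PCI_P2PDMA);
	assert(page->refcount == 0);	/* freshly allocated, nobody else uses it */

	page->refcount = 1;	/* set_page_count(page, 1) */
	page->refcount++;	/* vm_insert_page(): the mapping takes a reference */
	page->refcount--;	/* put_page(): the driver drops its initial reference */
	return page->refcount;
}
```

Calling model_p2pdma_map() on a page whose count is zero leaves the
mapping as the only reference holder, so in this model the page goes
back to zero, and back to the driver, once it is unmapped.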