From patchwork Tue Aug 8 09:15:01 2023
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 13345919
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com, christophe.leroy@csgroup.eu
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma, "Aneesh Kumar K.V"
Subject: [PATCH v8 6/6] mm/memory_hotplug: Embed vmem_altmap details in memory block
Date: Tue, 8 Aug 2023 14:45:01 +0530
Message-ID: <20230808091501.287660-7-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20230808091501.287660-1-aneesh.kumar@linux.ibm.com>
References: <20230808091501.287660-1-aneesh.kumar@linux.ibm.com>

With memmap on memory, some architectures need more details w.r.t. the altmap, such as base_pfn, end_pfn, etc., to unmap vmemmap memory. Instead of computing them again when we remove a memory block, embed the vmem_altmap details in struct memory_block if we are using the memmap on memory block feature.

Acked-by: Michal Hocko
Acked-by: David Hildenbrand
Signed-off-by: Aneesh Kumar K.V
---
 drivers/base/memory.c  | 27 +++++++++++++--------
 include/linux/memory.h |  8 ++-----
 mm/memory_hotplug.c    | 54 ++++++++++++++++++++++++++----------------
 3 files changed, 52 insertions(+), 37 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..8191709c9ad2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -105,7 +105,8 @@ EXPORT_SYMBOL(unregister_memory_notifier);
 static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
-
+	/* Verify that the altmap is freed */
+	WARN_ON(mem->altmap);
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap)
+		nr_vmemmap_pages = mem->altmap->free;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,7 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	mem->altmap = altmap;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +790,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0, NULL);
+				MEM_ONLINE, NULL, NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +825,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +839,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..f53cfdaaaa41 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -77,11 +77,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap *altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +143,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 76b813991bdc..f8d3e7427e32 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,7 +1439,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
 			mhp_altmap.free = memory_block_memmap_on_memory_pages();
-			params.altmap = &mhp_altmap;
+			params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+			if (!params.altmap)
+				goto error;
+
+			memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
 		}
 		/* fallback to not using altmap */
 	}
@@ -1447,13 +1451,13 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &params);
 	if (ret < 0)
-		goto error;
+		goto error_free;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
+	ret = create_memory_block_devices(start, size, params.altmap, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
-		goto error;
+		goto error_free;
 	}
 
 	if (new_node) {
@@ -1490,6 +1494,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
+error_free:
+	kfree(params.altmap);
 error:
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		memblock_remove(start, size);
@@ -2056,12 +2062,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int test_has_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct memory_block **mem_ptr = (struct memory_block **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * return the memblock if we have altmap
+	 * and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap) {
+		*mem_ptr = mem;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2136,10 +2148,9 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
-	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
+	struct memory_block *mem;
 	int rc = 0, nid = NUMA_NO_NODE;
+	struct vmem_altmap *altmap = NULL;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
@@ -2161,25 +2172,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * the same granularity it was added - a single memory block.
 	 */
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		rc = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
+		if (rc) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
+			altmap = mem->altmap;
 			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
+			 * Mark altmap NULL so that we can add a debug
+			 * check on memblock free.
 			 */
-			mhp_altmap.base_pfn = PHYS_PFN(start);
-			mhp_altmap.free = nr_vmemmap_pages;
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
+			mem->altmap = NULL;
 		}
 	}
 
@@ -2196,6 +2202,12 @@ static int __ref try_remove_memory(u64 start, u64 size)
 
 	arch_remove_memory(start, size, altmap);
 
+	/* Verify that all vmemmap pages have actually been freed. */
+	if (altmap) {
+		WARN(altmap->alloc, "Altmap not fully unmapped");
+		kfree(altmap);
+	}
+
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
 		memblock_phys_free(start, size);
 		memblock_remove(start, size);
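The add_memory_resource() hunks replace a pointer to the on-stack mhp_altmap with a heap copy that outlives the function, and add an error_free label that runs before the existing error label. The userspace sketch below models only that ownership and unwinding order; struct altmap_model, struct add_faults and add_range_model() are hypothetical stand-ins, malloc()/free() play the role of kmalloc()/kfree(), and the fault flags simply simulate arch_add_memory() or create_memory_block_devices() failing. Unlike the hunk, which jumps to error without assigning ret, the sketch sets an explicit -ENOMEM on allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct vmem_altmap. */
struct altmap_model {
	unsigned long base_pfn;
	unsigned long end_pfn;   /* last pfn of the range */
	unsigned long free;      /* pfns reserved for the memmap */
	unsigned long alloc;     /* pfns currently handed out */
};

/* Hypothetical switches that force each failure point in a test. */
struct add_faults {
	bool fail_alloc;
	bool fail_arch_add;
	bool fail_create_blocks;
};

/*
 * Models the shape of the add path: copy a stack template to the heap,
 * then unwind in reverse order on failure. On success, *out is owned
 * by the caller (the "memory block") and must be freed on removal.
 */
static int add_range_model(unsigned long base_pfn, unsigned long nr_pages,
			   unsigned long memmap_pages,
			   const struct add_faults *f,
			   struct altmap_model **out)
{
	struct altmap_model tmpl = {
		.base_pfn = base_pfn,
		.end_pfn = base_pfn + nr_pages - 1,
		.free = memmap_pages,
	};
	struct altmap_model *altmap;
	int ret;

	*out = NULL;
	altmap = f->fail_alloc ? NULL : malloc(sizeof(*altmap));
	if (!altmap) {
		ret = -ENOMEM;          /* explicit error code */
		goto error;             /* nothing to free yet */
	}
	memcpy(altmap, &tmpl, sizeof(tmpl));

	if (f->fail_arch_add) {
		ret = -EINVAL;
		goto error_free;        /* the copy exists: release it */
	}
	if (f->fail_create_blocks) {
		ret = -EBUSY;
		goto error_free;
	}

	*out = altmap;              /* ownership moves to the block */
	return 0;

error_free:
	free(altmap);
error:
	return ret;
}
```

Placing error_free above error means every later failure frees the copy and then falls through to the shared cleanup, while the allocation failure skips the free.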
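test_has_altmap_cb() changes what the walk returns: instead of smuggling a page count out through the int return value, it returns 1 as a "stop" flag and hands the matching block back through the void *arg pointer. The sketch below reproduces that pattern in userspace; walk_blocks_model(), has_altmap_cb() and the two structs are hypothetical stand-ins, and the walker assumes the contract the patch relies on (visit blocks in order, stop at the first non-zero return and propagate it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct altmap_model { unsigned long free, alloc; };
struct block_model {
	unsigned long id;
	struct altmap_model *altmap; /* NULL if memmap is not on this block */
};

typedef int (*block_cb)(struct block_model *mem, void *arg);

/*
 * Visit each block in order; stop at the first non-zero callback
 * return value and pass it back to the caller.
 */
static int walk_blocks_model(struct block_model *blocks, size_t n,
			     void *arg, block_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&blocks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Same idea as test_has_altmap_cb(): report the block via arg, then stop. */
static int has_altmap_cb(struct block_model *mem, void *arg)
{
	struct block_model **mem_ptr = arg;

	if (mem->altmap) {
		*mem_ptr = mem;
		return 1;
	}
	return 0;
}
```

A caller passes the address of a block pointer as arg and only dereferences it when the walk returned non-zero, which mirrors how try_remove_memory() uses rc before touching mem.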
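On the removal side, the patch derives nr_vmemmap_pages from altmap->free during online/offline, detaches the altmap from the block before teardown, checks that alloc has dropped back to zero after arch_remove_memory(), frees it, and lets memory_block_release() warn if a block still holds one. The sketch below models that handoff in userspace under these assumptions: each altmap belongs to exactly one block (the try_remove_memory() hunk only accepts single-block ranges here), fake_arch_remove() is a hypothetical stand-in for the arch code returning every vmemmap pfn, and the WARN checks are modeled as boolean results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct altmap_model { unsigned long free, alloc; };
struct block_model { struct altmap_model *altmap; };

/* As in memory_block_online()/offline(): no altmap means no vmemmap pages. */
static unsigned long nr_vmemmap_pages_of(const struct block_model *mem)
{
	return mem->altmap ? mem->altmap->free : 0;
}

/* Stand-in for arch teardown handing every vmemmap pfn back. */
static void fake_arch_remove(struct altmap_model *altmap)
{
	if (altmap)
		altmap->alloc = 0;
}

/*
 * Detach the altmap from the block first, then check and free it after
 * the (simulated) arch teardown. Returns false when pages were still
 * accounted as allocated, i.e. the case the WARN() reports.
 */
static bool remove_block_model(struct block_model *mem, bool leak_pages)
{
	struct altmap_model *altmap = mem->altmap;
	bool clean = true;

	mem->altmap = NULL;          /* the block no longer owns it */
	if (!leak_pages)
		fake_arch_remove(altmap);
	if (altmap) {
		clean = (altmap->alloc == 0);
		free(altmap);
	}
	return clean;
}

/* Mirrors the WARN_ON() in memory_block_release(): altmap must be gone. */
static bool release_ok(const struct block_model *mem)
{
	return mem->altmap == NULL;
}
```

Clearing the block's pointer before freeing is what makes the release-time check meaningful: a block that still points at an altmap at release time indicates a path that skipped the removal handoff.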