From patchwork Thu Jul 6 08:50:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 13303389
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au,
 linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com, christophe.leroy@csgroup.eu
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
 "Aneesh Kumar K.V"
Subject: [PATCH v2 1/5] mm/hotplug: Embed vmem_altmap details in memory block
Date: Thu, 6 Jul 2023 14:20:37 +0530
Message-ID: <20230706085041.826340-2-aneesh.kumar@linux.ibm.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230706085041.826340-1-aneesh.kumar@linux.ibm.com>
References: <20230706085041.826340-1-aneesh.kumar@linux.ibm.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

With memmap on memory, some architectures need more details about the
altmap, such as base_pfn and end_pfn, to unmap vmemmap memory. Embed the
vmem_altmap data structure in struct memory_block and use that instead of
nr_vmemmap_pages.

On memory unplug, if the kernel finds that any memory block in the range
uses vmem_altmap, it fails the unplug unless the request covers exactly
one memory block.

Signed-off-by: Aneesh Kumar K.V
---
 drivers/base/memory.c    | 28 +++++++++++++++++++---------
 include/linux/memory.h   | 25 +++++++++++++++++++------
 include/linux/memremap.h | 18 +-----------------
 mm/memory_hotplug.c      | 31 ++++++++++++++-----------------
 4 files changed, 53 insertions(+), 49 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..523cc1d37c81 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -106,6 +106,7 @@ static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
 
+	WARN(mem->altmap.alloc, "Altmap not fully unmapped");
 	kfree(mem);
 }
 
@@ -183,7 +184,7 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	struct zone *zone;
 	int ret;
 
@@ -200,6 +201,9 @@ static int memory_block_online(struct memory_block *mem)
 	 * stage helps to keep accounting easier to follow - e.g vmemmaps
 	 * belong to the same zone as the memory they backed.
 	 */
+	if (mem->altmap.alloc)
+		nr_vmemmap_pages = mem->altmap.alloc + mem->altmap.reserve;
+
 	if (nr_vmemmap_pages) {
 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
 		if (ret)
@@ -230,7 +234,7 @@ static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret;
 
 	if (!mem->zone)
@@ -240,6 +244,9 @@ static int memory_block_offline(struct memory_block *mem)
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
+	if (mem->altmap.alloc)
+		nr_vmemmap_pages = mem->altmap.alloc + mem->altmap.reserve;
+
 	if (nr_vmemmap_pages)
 		adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
 					  -nr_vmemmap_pages);
@@ -726,7 +733,7 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
 #endif
 
 static int add_memory_block(unsigned long block_id, unsigned long state,
-			    unsigned long nr_vmemmap_pages,
+			    struct vmem_altmap *altmap,
 			    struct memory_group *group)
 {
 	struct memory_block *mem;
@@ -744,7 +751,10 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
 	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	if (altmap)
+		memcpy(&mem->altmap, altmap, sizeof(*altmap));
+	else
+		mem->altmap.alloc = 0;
 	INIT_LIST_HEAD(&mem->group_next);
 
 #ifndef CONFIG_NUMA
@@ -783,14 +793,14 @@ static int __init add_boot_memory_block(unsigned long base_section_nr)
 	if (section_count == 0)
 		return 0;
 	return add_memory_block(memory_block_id(base_section_nr),
-				MEM_ONLINE, 0, NULL);
+				MEM_ONLINE, NULL, NULL);
 }
 
 static int add_hotplug_memory_block(unsigned long block_id,
-				    unsigned long nr_vmemmap_pages,
+				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
 {
-	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
+	return add_memory_block(block_id, MEM_OFFLINE, altmap, group);
 }
 
 static void remove_memory_block(struct memory_block *memory)
@@ -818,7 +828,7 @@ static void remove_memory_block(struct memory_block *memory)
  * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group)
 {
 	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
@@ -832,7 +842,7 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 		return -EINVAL;
 
 	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
+		ret = add_hotplug_memory_block(block_id, altmap, group);
 		if (ret)
 			break;
 	}
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..87f12924250f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -64,6 +64,23 @@ struct memory_group {
 	};
 };
 
+/**
+ * struct vmem_altmap - pre-allocated storage for vmemmap_populate
+ * @base_pfn: base of the entire dev_pagemap mapping
+ * @reserve: pages mapped, but reserved for driver use (relative to @base)
+ * @free: free pages set aside in the mapping for memmap storage
+ * @align: pages reserved to meet allocation alignments
+ * @alloc: track pages consumed, private to vmemmap_populate()
+ */
+struct vmem_altmap {
+	unsigned long base_pfn;
+	const unsigned long end_pfn;
+	const unsigned long reserve;
+	unsigned long free;
+	unsigned long align;
+	unsigned long alloc;
+};
+
 struct memory_block {
 	unsigned long start_section_nr;
 	unsigned long state;		/* serialized by the dev->lock */
@@ -77,11 +94,7 @@ struct memory_block {
 	 */
 	struct zone *zone;
 	struct device dev;
-	/*
-	 * Number of vmemmap pages. These pages
-	 * lay at the beginning of the memory block.
-	 */
-	unsigned long nr_vmemmap_pages;
+	struct vmem_altmap altmap;
 	struct memory_group *group;	/* group (if any) for this block */
 	struct list_head group_next;	/* next block inside memory group */
 #if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_MEMORY_HOTPLUG)
@@ -147,7 +160,7 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
-				unsigned long vmemmap_pages,
+				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1314d9c5f05b..4cb326f85302 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_MEMREMAP_H_
 #define _LINUX_MEMREMAP_H_
 
+#include
 #include
 #include
 #include
@@ -10,23 +11,6 @@
 struct resource;
 struct device;
 
-/**
- * struct vmem_altmap - pre-allocated storage for vmemmap_populate
- * @base_pfn: base of the entire dev_pagemap mapping
- * @reserve: pages mapped, but reserved for driver use (relative to @base)
- * @free: free pages set aside in the mapping for memmap storage
- * @align: pages reserved to meet allocation alignments
- * @alloc: track pages consumed, private to vmemmap_populate()
- */
-struct vmem_altmap {
-	unsigned long base_pfn;
-	const unsigned long end_pfn;
-	const unsigned long reserve;
-	unsigned long free;
-	unsigned long align;
-	unsigned long alloc;
-};
-
 /*
  * Specialize ZONE_DEVICE memory into multiple types each has a different
  * usage.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..c4bac38cc147 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1354,7 +1354,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
+	ret = create_memory_block_devices(start, size, &mhp_altmap,
 					  group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
@@ -1956,12 +1956,18 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	return 0;
 }
 
-static int get_nr_vmemmap_pages_cb(struct memory_block *mem, void *arg)
+static int get_vmemmap_altmap_cb(struct memory_block *mem, void *arg)
 {
+	struct vmem_altmap **altmap = (struct vmem_altmap **)arg;
 	/*
-	 * If not set, continue with the next block.
+	 * If we have any pages allocated from altmap
+	 * return the altmap details and break callback.
 	 */
-	return mem->nr_vmemmap_pages;
+	if (mem->altmap.alloc) {
+		*altmap = &mem->altmap;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_cpu_on_node(int nid)
@@ -2036,9 +2042,8 @@ EXPORT_SYMBOL(try_offline_node);
 
 static int __ref try_remove_memory(u64 start, u64 size)
 {
-	struct vmem_altmap mhp_altmap = {};
+	int ret;
 	struct vmem_altmap *altmap = NULL;
-	unsigned long nr_vmemmap_pages;
 	int rc = 0, nid = NUMA_NO_NODE;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
@@ -2060,24 +2065,16 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
+
 	if (mhp_memmap_on_memory()) {
-		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
-						      get_nr_vmemmap_pages_cb);
-		if (nr_vmemmap_pages) {
+		ret = walk_memory_blocks(start, size, &altmap, get_vmemmap_altmap_cb);
+		if (ret) {
 			if (size != memory_block_size_bytes()) {
 				pr_warn("Refuse to remove %#llx - %#llx,"
 					"wrong granularity\n",
 					start, start + size);
 				return -EINVAL;
 			}
-
-			/*
-			 * Let remove_pmd_table->free_hugepage_table do the
-			 * right thing if we used vmem_altmap when hot-adding
-			 * the range.
-			 */
-			mhp_altmap.alloc = nr_vmemmap_pages;
-			altmap = &mhp_altmap;
 		}
 	}
 
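
Note for readers: the unplug rule in the commit message can be tried outside the kernel with the rough userspace sketch below. It is only a model under stated assumptions, not kernel code. struct memory_block_model, walk_blocks(), get_altmap_cb(), nr_vmemmap_pages() and check_remove() are hypothetical names made up for illustration, the const qualifiers on the altmap fields are dropped for easy initialisation, and "request size equals one memory block" is approximated as "exactly one block in the array".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified copy of the fields the patch moves into include/linux/memory.h. */
struct vmem_altmap {
	unsigned long base_pfn;
	unsigned long end_pfn;
	unsigned long reserve;
	unsigned long free;
	unsigned long align;
	unsigned long alloc;
};

/* Hypothetical stand-in for struct memory_block; only the altmap matters here. */
struct memory_block_model {
	struct vmem_altmap altmap;
};

typedef int (*block_cb)(struct memory_block_model *mem, void *arg);

/* Models walk_memory_blocks(): stop at the first non-zero callback result. */
static int walk_blocks(struct memory_block_model *blocks, size_t nr,
		       void *arg, block_cb cb)
{
	assert(cb != NULL);
	for (size_t i = 0; i < nr; i++) {
		int ret = cb(&blocks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Same decision as get_vmemmap_altmap_cb() in the patch. */
static int get_altmap_cb(struct memory_block_model *mem, void *arg)
{
	struct vmem_altmap **altmap = arg;

	if (mem->altmap.alloc) {
		*altmap = &mem->altmap;
		return 1;
	}
	return 0;
}

/* vmemmap page count as derived in memory_block_online()/offline(). */
static unsigned long nr_vmemmap_pages(const struct memory_block_model *mem)
{
	if (mem->altmap.alloc)
		return mem->altmap.alloc + mem->altmap.reserve;
	return 0;
}

/*
 * Models the granularity check in try_remove_memory(): refuse the request
 * when any block uses an altmap and the range is not a single block.
 */
static int check_remove(struct memory_block_model *blocks, size_t nr,
			struct vmem_altmap **altmap)
{
	*altmap = NULL;
	if (walk_blocks(blocks, nr, altmap, get_altmap_cb) && nr != 1)
		return -EINVAL;
	return 0;
}
```

In this model, a two-block range in which one block has a non-zero altmap.alloc is rejected with -EINVAL, while removing that block on its own succeeds and returns a pointer to the altmap embedded in the block.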