From patchwork Tue Jul 25 10:02:09 2023
X-Patchwork-Submitter: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
X-Patchwork-Id: 13326255
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com,
	christophe.leroy@csgroup.eu
Cc: Oscar Salvador, David Hildenbrand, Michal Hocko, Vishal Verma,
	"Aneesh Kumar K.V"
Subject: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap is
	not aligned to pageblocks
Date: Tue, 25 Jul 2023 15:32:09 +0530
Message-ID: <20230725100212.531277-5-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20230725100212.531277-1-aneesh.kumar@linux.ibm.com>
References: <20230725100212.531277-1-aneesh.kumar@linux.ibm.com>
Currently, the memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full pageblocks. This is because
the memory onlining/offlining code requires the applicable ranges to be
pageblock-aligned, for example, to set the migratetypes properly.

This patch lifts that restriction by reserving more pages than strictly
required for the vmemmap, so that the start address of the hotplugged memory
stays pageblock-aligned across different memory block sizes. Using this
facility means the kernel reserves some pages in every memory block, which
makes the memmap-on-memory feature useful with a much wider range of memory
block sizes. For example, with a 64K page size and a 256MiB memory block
size, we require 4 pages to map the vmemmap; to keep things aligned, we end
up adding a reserve of 28 pages, i.e., 28 out of every 4096 pages get
reserved.

Signed-off-by: Aneesh Kumar K.V
Acked-by: David Hildenbrand
---
 .../admin-guide/mm/memory-hotplug.rst |  12 ++
 mm/memory_hotplug.c                   | 121 ++++++++++++++++--
 2 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index bd77841041af..2994958c7ce8 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -433,6 +433,18 @@ The following module parameters are currently defined:
 			 memory in a way that huge pages in bigger
 			 granularity cannot be formed on hotplugged memory.
+
+			 With value "force" it could result in memory
+			 wastage due to memmap size limitations. For
+			 example, if the memmap for a memory block
+			 requires 1 MiB, but the pageblock size is 2
+			 MiB, 1 MiB of hotplugged memory will be wasted.
+			 Note that there are still cases where the
+			 feature cannot be enforced: for example, if the
+			 memmap is smaller than a single page, or if the
+			 architecture does not support the forced mode
+			 in all configurations.
+
 ``online_policy``	 read-write: Set the basic policy used for
 			 automatic zone selection when onlining memory
 			 blocks without specifying a target zone.
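The reserve arithmetic from the changelog example can be sanity-checked with
a small stand-alone program. This is only a sketch, not kernel code: it
assumes sizeof(struct page) == 64 and a 2 MiB pageblock (32 pages with 64K
base pages), matching the ppc64 geometry described above.

	#include <stdio.h>

	int main(void)
	{
		/* Assumed geometry (not taken from kernel headers) */
		unsigned long page_size = 64UL << 10;		/* 64K base page */
		unsigned long block_size = 256UL << 20;		/* 256MiB memory block */
		unsigned long struct_page_size = 64;		/* assumed sizeof(struct page) */
		unsigned long pageblock_pages = (2UL << 20) / page_size;	/* 32 */

		unsigned long block_pages = block_size / page_size;		/* 4096 */
		unsigned long memmap_pages =
			block_pages * struct_page_size / page_size;		/* 4 */

		/* pageblock_align(): round memmap_pages up to a pageblock multiple */
		unsigned long reserved = (memmap_pages + pageblock_pages - 1) /
					 pageblock_pages * pageblock_pages;	/* 32 */

		printf("memmap pages: %lu, reserved: %lu, wasted: %lu of %lu\n",
		       memmap_pages, reserved, reserved - memmap_pages, block_pages);
		return 0;
	}

Run as-is, this prints "memmap pages: 4, reserved: 32, wasted: 28 of 4096",
which is the 28-page-per-4096 reserve mentioned in the changelog.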
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 457824a6ecb8..5b472e137898 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -41,17 +41,89 @@
 #include "internal.h"
 #include "shuffle.h"
 
+enum {
+	MEMMAP_ON_MEMORY_DISABLE = 0,
+	MEMMAP_ON_MEMORY_ENABLE,
+	MEMMAP_ON_MEMORY_FORCE,
+};
+
+static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
+
+static inline unsigned long memory_block_memmap_pages(void)
+{
+	unsigned long memmap_size;
+
+	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
+	return memmap_size >> PAGE_SHIFT;
+}
+
+static inline unsigned long memory_block_memmap_on_memory_pages(void)
+{
+	unsigned long nr_pages = memory_block_memmap_pages();
+
+	/*
+	 * In "forced" memmap_on_memory mode, we add extra pages to align the
+	 * vmemmap size to cover full pageblocks. That way, we can add memory
+	 * even if the vmemmap size is not properly aligned, however, we
+	 * might waste memory.
+	 */
+	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
+		return pageblock_align(nr_pages);
+	return nr_pages;
+}
+
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
-static bool memmap_on_memory __ro_after_init;
-module_param(memmap_on_memory, bool, 0444);
-MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+static int set_memmap_mode(const char *val, const struct kernel_param *kp)
+{
+	int ret, mode;
+	bool enabled;
+
+	if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
+		mode = MEMMAP_ON_MEMORY_FORCE;
+		goto matched;
+	}
+
+	ret = kstrtobool(val, &enabled);
+	if (ret < 0)
+		return ret;
+	if (enabled)
+		mode = MEMMAP_ON_MEMORY_ENABLE;
+	else
+		mode = MEMMAP_ON_MEMORY_DISABLE;
+
+matched:
+	*((int *)kp->arg) = mode;
+	if (mode == MEMMAP_ON_MEMORY_FORCE) {
+		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
+
+		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
+			memmap_pages - memory_block_memmap_pages());
+	}
+	return 0;
+}
+
+static int get_memmap_mode(char *buffer, const struct kernel_param *kp)
+{
+	if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE)
+		return sprintf(buffer, "force\n");
+	return param_get_bool(buffer, kp);
+}
+
+static const struct kernel_param_ops memmap_mode_ops = {
+	.set = set_memmap_mode,
+	.get = get_memmap_mode,
+};
+module_param_cb(memmap_on_memory, &memmap_mode_ops, &memmap_mode, 0444);
+MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug\n"
+		 "With value \"force\" it could result in memory wastage due "
+		 "to memmap size limitations (Y/N/force)");
 
 static inline bool mhp_memmap_on_memory(void)
 {
-	return memmap_on_memory;
+	return memmap_mode != MEMMAP_ON_MEMORY_DISABLE;
 }
 #else
 static inline bool mhp_memmap_on_memory(void)
@@ -1266,7 +1338,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 {
 	unsigned long nr_vmemmap_pages = size >> PAGE_SHIFT;
 	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
-	unsigned long remaining_size = size - vmemmap_size;
+	unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
 
 	/*
 	 * Besides having arch support and the feature enabled at runtime, we
@@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
 	 * altmap as an alternative source of memory, and we do not exactly
 	 * populate a single PMD.
	 */
-	return mhp_memmap_on_memory() &&
-	       size == memory_block_size_bytes() &&
-	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
-	       arch_supports_memmap_on_memory(size);
+	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+		return false;
+
+	/*
+	 * Make sure the vmemmap allocation is fully contained
+	 * so that we always allocate vmemmap memory from altmap area.
+	 */
+	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
+		return false;
+
+	/*
+	 * start pfn should be pageblock_nr_pages aligned for correctly
+	 * setting migrate types
+	 */
+	if (!pageblock_aligned(memmap_pages))
+		return false;
+
+	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
+		/* No effective hotplugged memory doesn't make sense. */
+		return false;
+
+	return arch_supports_memmap_on_memory(size);
 }
 
 /*
@@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
 	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-	struct vmem_altmap mhp_altmap = {};
+	struct vmem_altmap mhp_altmap = {
+		.base_pfn = PHYS_PFN(res->start),
+		.end_pfn = PHYS_PFN(res->end),
+	};
 	struct memory_group *group = NULL;
 	u64 start, size;
 	bool new_node = false;
@@ -1355,8 +1448,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
-			mhp_altmap.free = PHYS_PFN(size);
-			mhp_altmap.base_pfn = PHYS_PFN(start);
+			mhp_altmap.free = memory_block_memmap_on_memory_pages();
 			params.altmap = &mhp_altmap;
 		}
 		/* fallback to not using altmap */
@@ -1368,8 +1460,7 @@
 		goto error;
 
 	/* create memory block devices after memory was added */
-	ret = create_memory_block_devices(start, size, mhp_altmap.alloc,
-					  group);
+	ret = create_memory_block_devices(start, size, mhp_altmap.free, group);
 	if (ret) {
 		arch_remove_memory(start, size, NULL);
 		goto error;
@@ -2095,6 +2186,8 @@ static int __ref try_remove_memory(u64 start, u64 size)
 		 * right thing if we used vmem_altmap when hot-adding
 		 * the range.
 		 */
+		mhp_altmap.base_pfn = PHYS_PFN(start);
+		mhp_altmap.free = nr_vmemmap_pages;
 		mhp_altmap.alloc = nr_vmemmap_pages;
 		altmap = &mhp_altmap;
 	}
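The gating logic that replaces the single return statement in
mhp_supports_memmap_on_memory() can likewise be modeled in user space. Again
a sketch under assumed geometry (sizeof(struct page) == 64), with
arch_supports_memmap_on_memory() treated as always true; the helper name and
the "x86-like" 128MiB block size below are illustrative assumptions, not
kernel definitions.

	#include <stdbool.h>
	#include <stdio.h>

	static bool supports_memmap_on_memory(unsigned long page_size,
					      unsigned long block_size,
					      unsigned long pageblock_pages,
					      bool force)
	{
		unsigned long nr_pages = block_size / page_size;
		unsigned long vmemmap_size = nr_pages * 64;	/* assumed struct page size */
		unsigned long memmap_pages = vmemmap_size / page_size;

		if (force)	/* forced mode pads the memmap up to a full pageblock */
			memmap_pages = (memmap_pages + pageblock_pages - 1) /
				       pageblock_pages * pageblock_pages;

		if (vmemmap_size % page_size)		/* vmemmap must fill whole pages */
			return false;
		if (memmap_pages % pageblock_pages)	/* hotplugged range must stay pageblock-aligned */
			return false;
		if (memmap_pages == nr_pages)		/* no effective hotplugged memory left */
			return false;
		return true;
	}

	int main(void)
	{
		/* 64K pages, 256MiB block, 2MiB pageblock: fails normally, works forced */
		printf("ppc64-like normal: %d, force: %d\n",
		       supports_memmap_on_memory(64UL << 10, 256UL << 20, 32, false),
		       supports_memmap_on_memory(64UL << 10, 256UL << 20, 32, true));
		/* 4K pages, 128MiB block, 2MiB pageblock: memmap covers exactly one pageblock */
		printf("x86-like normal: %d\n",
		       supports_memmap_on_memory(4UL << 10, 128UL << 20, 512, false));
		return 0;
	}

With the ppc64-like geometry the plain mode is rejected (4 memmap pages do
not fill a pageblock) while "force" succeeds, which is exactly the case this
patch targets; the 4K-page geometry already passes without forcing.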