From patchwork Thu May  6 15:26:19 2021
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 12242443
From: Zi Yan
To: David Hildenbrand, Oscar Salvador
Cc: Michael Ellerman, Benjamin Herrenschmidt, Thomas Gleixner, x86@kernel.org,
    Andy Lutomirski, "Rafael J. Wysocki", Andrew Morton, Mike Rapoport,
    Anshuman Khandual, Michal Hocko, Dan Williams, Wei Yang,
    linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, Zi Yan
Subject: [RFC PATCH 3/7] mm: memory_hotplug: decouple memory_block size from
 section size.
Date: Thu,  6 May 2021 11:26:19 -0400
Message-Id: <20210506152623.178731-4-zi.yan@sent.com>
In-Reply-To: <20210506152623.178731-1-zi.yan@sent.com>
References: <20210506152623.178731-1-zi.yan@sent.com>
Reply-To: Zi Yan

From: Zi Yan

To enable subsection memory online/offline, we need to remove the
assumption that a memory_block is at least as large as a section.
The following changes are made:

1. use a (start_pfn, nr_pages) pair to specify the memory_block size
   instead of start_section_nr.
2. calculate the memory_block id as phys / memory_block_size_bytes()
   instead of deriving it from the section number.

The minimum memory_block size is set to the smaller of 128MB (the old
x86_64 section size) and the section size.
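Not part of the patch itself, but as a minimal illustration of the block-id
arithmetic described in point 2 above: the sketch below is a self-contained
userspace C program, where PAGE_SHIFT, PFN_PHYS() and memory_block_size_bytes()
are stand-ins for the kernel helpers under the assumption of 4KB pages and a
128MB memory block.

    #include <stdio.h>

    /* Illustrative stand-ins for the kernel helpers used by the patch
     * (assumption: 4KB pages, 128MB memory blocks). */
    #define PAGE_SHIFT          12
    #define PFN_PHYS(pfn)       ((unsigned long long)(pfn) << PAGE_SHIFT)

    static unsigned long long memory_block_size_bytes(void)
    {
            return 128ULL << 20;    /* 128MB */
    }

    /* New scheme: the block id is derived from the physical address,
     * not from the section number. */
    static unsigned long phys_to_block_id(unsigned long long phys)
    {
            return phys / memory_block_size_bytes();
    }

    static unsigned long pfn_to_block_id(unsigned long pfn)
    {
            return phys_to_block_id(PFN_PHYS(pfn));
    }

    int main(void)
    {
            /* Example: a pfn 100 pages into memory block 5. */
            unsigned long pfn = ((5UL * (128UL << 20)) >> PAGE_SHIFT) + 100;

            printf("block id = %lu\n", pfn_to_block_id(pfn));  /* prints 5 */
            return 0;
    }

With this mapping the block id no longer depends on sections_per_block, which
is what allows a memory_block to be smaller than a section.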
Signed-off-by: Zi Yan
---
 drivers/base/memory.c  | 176 ++++++++++++++++++++---------------------
 drivers/base/node.c    |   2 +-
 include/linux/memory.h |   8 +-
 mm/memory_hotplug.c    |   6 +-
 4 files changed, 98 insertions(+), 94 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b31b3af5c490..141431eb64a4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -50,19 +50,15 @@ int mhp_online_type_from_str(const char *str)
 
 static int sections_per_block;
 
-static inline unsigned long memory_block_id(unsigned long section_nr)
+static inline unsigned long phys_to_block_id(unsigned long phys)
 {
-	return section_nr / sections_per_block;
+	return phys / memory_block_size_bytes();
 }
 
 static inline unsigned long pfn_to_block_id(unsigned long pfn)
 {
-	return memory_block_id(pfn_to_section_nr(pfn));
-}
-
-static inline unsigned long phys_to_block_id(unsigned long phys)
-{
-	return pfn_to_block_id(PFN_DOWN(phys));
+	/* calculate using memory_block_size_bytes() */
+	return phys_to_block_id(PFN_PHYS(pfn));
 }
 
 static int memory_subsys_online(struct device *dev);
@@ -118,7 +114,7 @@ static ssize_t phys_index_show(struct device *dev,
 	struct memory_block *mem = to_memory_block(dev);
 	unsigned long phys_index;
 
-	phys_index = mem->start_section_nr / sections_per_block;
+	phys_index = pfn_to_section_nr(mem->start_pfn);
 
 	return sysfs_emit(buf, "%08lx\n", phys_index);
 }
@@ -171,8 +167,8 @@ int memory_notify(unsigned long val, void *v)
 
 static int memory_block_online(struct memory_block *mem)
 {
-	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
-	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+	unsigned long start_pfn = mem->start_pfn;
+	unsigned long nr_pages = mem->nr_pages;
 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
 	struct zone *zone;
 	int ret;
@@ -212,8 +208,8 @@ static int memory_block_online(struct memory_block *mem)
 
 static int memory_block_offline(struct memory_block *mem)
 {
-	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
-	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+	unsigned long start_pfn = mem->start_pfn;
+	unsigned long nr_pages = mem->nr_pages;
 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
 	struct zone *zone;
 	int ret;
@@ -260,7 +256,7 @@ memory_block_action(struct memory_block *mem, unsigned long action)
 		break;
 	default:
 		WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
-		     "%ld\n", __func__, mem->start_section_nr, action, action);
+		     "%ld\n", __func__, mem->start_pfn, mem->nr_pages, action);
 		ret = -EINVAL;
 	}
 
@@ -366,7 +362,7 @@ static ssize_t phys_device_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
 	struct memory_block *mem = to_memory_block(dev);
-	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+	unsigned long start_pfn = mem->start_pfn;
 
 	return sysfs_emit(buf, "%d\n",
 			  arch_get_memory_phys_device(start_pfn));
@@ -390,8 +386,8 @@ static ssize_t valid_zones_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
 	struct memory_block *mem = to_memory_block(dev);
-	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
-	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+	unsigned long start_pfn = mem->start_pfn;
+	unsigned long nr_pages = mem->nr_pages;
 	struct zone *default_zone;
 	int len = 0;
 	int nid;
@@ -575,16 +571,6 @@ static struct memory_block *find_memory_block_by_id(unsigned long block_id)
 	return mem;
 }
 
-/*
- * Called under device_hotplug_lock.
- */
-struct memory_block *find_memory_block(struct mem_section *section)
-{
-	unsigned long block_id = memory_block_id(__section_nr(section));
-
-	return find_memory_block_by_id(block_id);
-}
-
 static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_phys_index.attr,
 	&dev_attr_state.attr,
@@ -614,7 +600,7 @@ int register_memory(struct memory_block *memory)
 	int ret;
 
 	memory->dev.bus = &memory_subsys;
-	memory->dev.id = memory->start_section_nr / sections_per_block;
+	memory->dev.id = memory->start_pfn / (memory_block_size_bytes() >> PAGE_SHIFT);
 	memory->dev.release = memory_block_release;
 	memory->dev.groups = memory_memblk_attr_groups;
 	memory->dev.offline = memory->state == MEM_OFFLINE;
@@ -633,57 +619,89 @@ int register_memory(struct memory_block *memory)
 	return ret;
 }
 
-static int init_memory_block(unsigned long block_id, unsigned long state,
+static void unregister_memory(struct memory_block *memory)
+{
+	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+		return;
+
+	WARN_ON(xa_erase(&memory_blocks, memory->dev.id) == NULL);
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
+static int init_memory_blocks(unsigned long start_pfn, unsigned long num_pages, unsigned long state,
 			     unsigned long nr_vmemmap_pages)
 {
 	struct memory_block *mem;
 	int ret = 0;
+	unsigned long block_nr_pages = memory_block_size_bytes() / PAGE_SIZE;
+	unsigned long block_start_pfn;
 
-	mem = find_memory_block_by_id(block_id);
-	if (mem) {
-		put_device(&mem->dev);
-		return -EEXIST;
-	}
-	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
-	if (!mem)
-		return -ENOMEM;
-
-	mem->start_section_nr = block_id * sections_per_block;
-	mem->state = state;
-	mem->nid = NUMA_NO_NODE;
-	mem->nr_vmemmap_pages = nr_vmemmap_pages;
+	for (block_start_pfn = start_pfn; num_pages != 0; block_start_pfn += block_nr_pages) {
+		unsigned long block_id = pfn_to_block_id(block_start_pfn);
 
-	ret = register_memory(mem);
-
-	return ret;
+		mem = find_memory_block_by_id(block_id);
+		if (mem) {
+			put_device(&mem->dev);
+			return -EEXIST;
+		}
+		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+		if (!mem)
+			return -ENOMEM;
+
+		mem->start_pfn = block_start_pfn;
+		mem->nr_pages = min(num_pages, block_nr_pages);
+		mem->state = state;
+		mem->nid = NUMA_NO_NODE;
+		mem->nr_vmemmap_pages = nr_vmemmap_pages;
+
+		ret = register_memory(mem);
+
+		if (ret) {
+			unsigned long unregister_block_pfn;
+
+			for (unregister_block_pfn = start_pfn;
+			     unregister_block_pfn < block_start_pfn;
+			     unregister_block_pfn -= block_nr_pages) {
+				block_id = pfn_to_block_id(unregister_block_pfn);
+				mem = find_memory_block_by_id(block_id);
+				if (WARN_ON_ONCE(!mem))
+					continue;
+				unregister_memory(mem);
+			}
+			return -EINVAL;
+		}
+		if (num_pages > block_nr_pages)
+			num_pages -= block_nr_pages;
+		else
+			num_pages = 0;
+	}
+	return 0;
 }
 
-static int add_memory_block(unsigned long base_section_nr)
+static void add_whole_section_memory_block(unsigned long base_section_nr)
 {
-	int section_count = 0;
-	unsigned long nr;
+	int ret;
+	unsigned long start_pfn = section_nr_to_pfn(base_section_nr);
+	unsigned long nr_pages = 0;
+	struct mem_section *ms = __nr_to_section(base_section_nr);
 
-	for (nr = base_section_nr; nr < base_section_nr + sections_per_block;
-	     nr++)
-		if (present_section_nr(nr))
-			section_count++;
+	if (bitmap_full(ms->usage->subsection_map, SUBSECTIONS_PER_SECTION))
+		nr_pages = PAGES_PER_SECTION;
+	else
+		nr_pages = PAGES_PER_SUBSECTION *
			bitmap_weight(ms->usage->subsection_map, SUBSECTIONS_PER_SECTION);
 
-	if (section_count == 0)
-		return 0;
-	return init_memory_block(memory_block_id(base_section_nr),
-				 MEM_ONLINE, 0);
-}
-static void unregister_memory(struct memory_block *memory)
-{
-	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+	if (!nr_pages)
 		return;
 
-	WARN_ON(xa_erase(&memory_blocks, memory->dev.id) == NULL);
-
-	/* drop the ref. we got via find_memory_block() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
+	ret = init_memory_blocks(start_pfn, nr_pages, MEM_ONLINE, 0);
+	if (ret)
+		panic("%s() failed to add memory block: %d\n", __func__,
+		      ret);
 }
 
 /*
@@ -696,31 +714,16 @@ static void unregister_memory(struct memory_block *memory)
 int create_memory_block_devices(unsigned long start, unsigned long size,
 				unsigned long vmemmap_pages)
 {
-	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
-	unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
-	struct memory_block *mem;
-	unsigned long block_id;
+	unsigned long start_pfn = PFN_DOWN(start);
+	unsigned long end_pfn = PFN_DOWN(start + size);
 	int ret = 0;
 
 	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
 			 !IS_ALIGNED(size, memory_block_size_bytes())))
 		return -EINVAL;
 
-	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
-		ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages);
-		if (ret)
-			break;
-	}
-	if (ret) {
-		end_block_id = block_id;
-		for (block_id = start_block_id; block_id != end_block_id;
-		     block_id++) {
-			mem = find_memory_block_by_id(block_id);
-			if (WARN_ON_ONCE(!mem))
-				continue;
-			unregister_memory(mem);
-		}
-	}
+	ret = init_memory_blocks(start_pfn, end_pfn - start_pfn, MEM_OFFLINE, vmemmap_pages);
+
 	return ret;
 }
 
@@ -807,10 +810,7 @@ void __init memory_dev_init(void)
 	 */
	for (nr = 0; nr <= __highest_present_section_nr;
	     nr += sections_per_block) {
-		ret = add_memory_block(nr);
-		if (ret)
-			panic("%s() failed to add memory block: %d\n", __func__,
-			      ret);
+		add_whole_section_memory_block(nr);
	}
 }
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2c36f61d30bc..76d67b8ddf1b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -809,7 +809,7 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
 					       void *arg)
 {
 	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
-	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	unsigned long start_pfn = mem_blk->start_pfn;
 	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
 	int nid = *(int *)arg;
 	unsigned long pfn;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 97e92e8b556a..e9590c7c6a9e 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -21,10 +21,15 @@
 #include
 #include
 
+#if SECTION_SIZE_BITS > 27 /* 128MB */
+#define MIN_MEMORY_BLOCK_SIZE	(1UL << 27)
+#else
 #define MIN_MEMORY_BLOCK_SIZE	(1UL << SECTION_SIZE_BITS)
+#endif
 
 struct memory_block {
-	unsigned long start_section_nr;
+	unsigned long start_pfn;
+	unsigned long nr_pages;
 	unsigned long state;		/* serialized by the dev->lock */
 	int online_type;		/* for passing data to online routine */
 	int nid;			/* NID for this memory block */
@@ -90,7 +95,6 @@ int create_memory_block_devices(unsigned long start, unsigned long size,
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
-extern struct memory_block *find_memory_block(struct mem_section *);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
			      void *arg, walk_memory_blocks_func_t func);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70620d0dd923..6e93b0ecc5cb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1872,8 +1872,8 @@ static int check_memblock_offlined_cb(struct memory_block *mem, void *arg)
 	if (unlikely(ret)) {
 		phys_addr_t beginpa, endpa;
 
-		beginpa = PFN_PHYS(section_nr_to_pfn(mem->start_section_nr));
-		endpa = beginpa + memory_block_size_bytes() - 1;
+		beginpa = PFN_PHYS(mem->start_pfn);
+		endpa = beginpa + mem->nr_pages * PAGE_SIZE - 1;
 		pr_warn("removing memory fails, because memory [%pa-%pa] is onlined\n",
 			&beginpa, &endpa);
 
@@ -2079,7 +2079,7 @@ static int try_offline_memory_block(struct memory_block *mem, void *arg)
 	 * with multiple zones within one memory block will be rejected
 	 * by offlining code ... so we don't care about that.
 	 */
-	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+	page = pfn_to_online_page(mem->start_pfn);
 	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
 		online_type = MMOP_ONLINE_MOVABLE;
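
As a companion sketch (again not part of the patch): how the new
add_whole_section_memory_block() sizes a partially populated section from its
subsection bitmap. The constants below assume the x86_64 defaults of 4KB pages,
2MB subsections and 128MB sections, and a plain 64-bit word stands in for the
kernel's subsection_map; the real code uses bitmap_full()/bitmap_weight().

    #include <stdio.h>

    /* Assumed x86_64 values with 4KB pages: a 128MB section split into
     * 64 subsections of 2MB each. */
    #define SUBSECTIONS_PER_SECTION 64
    #define PAGES_PER_SUBSECTION    512UL     /* 2MB  / 4KB */
    #define PAGES_PER_SECTION       32768UL   /* 128MB / 4KB */

    /* Size a memory block from a section's subsection presence bitmap,
     * mirroring the logic in add_whole_section_memory_block(). */
    static unsigned long section_nr_pages(unsigned long long subsection_map)
    {
            unsigned int weight = __builtin_popcountll(subsection_map);

            if (weight == SUBSECTIONS_PER_SECTION)    /* fully populated */
                    return PAGES_PER_SECTION;
            return PAGES_PER_SUBSECTION * weight;     /* partially populated */
    }

    int main(void)
    {
            /* Only the first 4 subsections (8MB) are present. */
            printf("%lu pages\n", section_nr_pages(0xFULL));  /* 2048 */
            /* Fully populated section. */
            printf("%lu pages\n", section_nr_pages(~0ULL));   /* 32768 */
            return 0;
    }

A partially populated section thus gets a memory_block whose nr_pages covers
only the present subsections, which is the point of dropping the
"memory_block >= section" assumption.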