From patchwork Fri Mar 22 16:57:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866295 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39A261390 for ; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 153D52A8C0 for ; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 094552A8C6; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C7862A8C0 for ; Fri, 22 Mar 2019 17:10:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E2296B0006; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 191736B0007; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 083586B0008; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id BCFEA6B0006 for ; Fri, 22 Mar 2019 13:10:41 -0400 (EDT) Received: by mail-pf1-f199.google.com with SMTP id b11so2904602pfo.15 for ; Fri, 22 Mar 2019 10:10:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=7tgugivS06FKHN/N1HETjLJIDRie27GVe7ls8gMk6Ak=; b=Tq6vi1v9fhKSAfijDrd7HIEpYzaRBmM8f1pNR+g+G5pNk+C0OaoEyaXC0tOcHRVq8i RNAb9MZCgYowPpM0t4t3B0cm5b7Au4D/VuYrd+FIf595ozmxLC/KId6XBMNvjg7uQUBg tIV6OBnbE2COdsGTBbnVc1oqU6gsRIPcNFdaqnV+zJ/ZYd+8bjYSZoLi/ltrKsb8Yf+M UZfDFk7k+OUNhEi2omQGrBeQBJ5STv0qT2glkOg6n3UztvjFO3SWLIISEYzuD5VAs/dP oHTZ5/M+Rn/mNpbOxpARnE827DtVc6kvao5CkKJgxbDX9Eg8w1cxHdbMZxvmW0RBgMJN 4ssQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAUeIscZxAeFpzT32oSgX1svepRN+MXfKWYyVCCti6xaxWWaAktk Sy18oKCJRRKhzY3H2izRp9LTSe0t1qBk18yT4q8JuUWmzAzt4WeqGQot4X1g1Rd2qCKWlLPwsIR btApPN4XFws7C+gtzXDoHXZiZGEZ+OYOx0WNLPdoh34LndSFebyYYfXvj82V+FjmSFA== X-Received: by 2002:a65:5b47:: with SMTP id y7mr9932894pgr.449.1553274641386; Fri, 22 Mar 2019 10:10:41 -0700 (PDT) X-Google-Smtp-Source: APXvYqw9h6d2Ca+vB3p7/8UGzSbzJtuyzivQxEvVZ8gmgs1aKzVuRTPg8aavnMJN/YkVuc+se9qJ X-Received: by 2002:a65:5b47:: with SMTP id y7mr9932817pgr.449.1553274640462; Fri, 22 Mar 2019 10:10:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274640; cv=none; d=google.com; s=arc-20160816; b=ycHGGQcA3fUaLleLjO2yzfHm42pJhvRi3D7RjVzblK4P4ZDCAa0s2SyprUgg3853kz Uid6ENbARCmpi6+eKv8gwMx0fUPsOJ0JakNfItbg6+irGFp88tDLmOXjitr5tqibHVaI wNOK+JbbUrXygGOIpoMI12S/ddkhPGxTax08lzjlOtRtPOJeZp1mllVcnWBLsGPWTPMK /VMHA49mcnM8inU9KA0BN4bg7aFIFhr2KnYNQ+ehCCmQp/dJHfcfOiuIhUTW7SN6qRQz x8NF4BgHu7f/E2xjkB7/DhqJCdFK4RVtAPkK4EGn7OSOFf9q2XflvbqankFduRu9WALS Sv/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=7tgugivS06FKHN/N1HETjLJIDRie27GVe7ls8gMk6Ak=; b=HA3SCqEqp4yA8j6NaNBsyMj6vNZOYNVYzltXs7pofueQSP2xLIGb/igeViKnLL4HuD u2B7aqWiSyqieGVD3qbgYmchC9KKVEfxYzB1GEQMErOtGr9H16WHAPVZ7R40xje8+mpA RlEQk0aMExd839UxPKWNuznhOq7S1oWrzQiUsbtIlrTGT/H3BVOhxNFY5yt03LXVAdRS mM/dg36MDlNpzxmQNLYO4Ef4lpCngckuyB+MRXzFjQtlWVVVrjh0C/LO1FBs3uONaULc DfnrsLlKl9wj3xS9g9Beaqdsl6ZIc6NOthwYIwqgRgvuSoi23XDZuHB1DkhdFwbi7DiZ cjAw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga03.intel.com (mga03.intel.com. [134.134.136.65]) by mx.google.com with ESMTPS id r7si6891992pfn.144.2019.03.22.10.10.40 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:40 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) client-ip=134.134.136.65; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="157486298" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga001.fm.intel.com with ESMTP; 22 Mar 2019 10:10:38 -0700 Subject: [PATCH v5 01/10] mm/sparsemem: Introduce struct mem_section_usage From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:57:59 -0700 Message-ID: <155327387961.225273.1318113033564648835.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'map_active' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to usemap allocation path, there are no expected behavior changes. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- include/linux/mmzone.h | 23 ++++++++++++-- mm/memory_hotplug.c | 18 ++++++----- mm/page_alloc.c | 2 + mm/sparse.c | 81 ++++++++++++++++++++++++------------------------ 4 files changed, 71 insertions(+), 53 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fba7741533be..151dd7327e0b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1107,6 +1107,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK) #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK) +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG) +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1)) + +struct mem_section_usage { + /* + * SECTION_ACTIVE_SIZE portions of the section that are populated in + * the memmap + */ + unsigned long map_active; + /* See declaration of similar field in struct zone */ + unsigned long pageblock_flags[0]; +}; + struct page; struct page_ext; struct mem_section { @@ -1124,8 +1137,7 @@ struct mem_section { */ unsigned long section_mem_map; - /* See declaration of similar field in struct zone */ - unsigned long *pageblock_flags; + struct mem_section_usage *usage; #ifdef CONFIG_PAGE_EXTENSION /* * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use @@ -1156,6 +1168,11 @@ extern struct mem_section **mem_section; extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif +static inline unsigned long *section_to_usemap(struct mem_section *ms) +{ + return ms->usage->pageblock_flags; +} + static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME @@ -1167,7 +1184,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr) return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; } extern int __section_nr(struct mem_section* ms); -extern unsigned long usemap_size(void); +extern size_t mem_section_usage_size(void); /* * We use the lower bits of the mem_map pointer to store diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f767582af4f8..2541a3a15854 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -164,9 +164,10 @@ void put_page_bootmem(struct page *page) #ifndef CONFIG_SPARSEMEM_VMEMMAP static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -186,10 +187,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, SECTION_INFO); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT; for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, MIX_SECTION_INFO); @@ -198,9 +199,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) #else /* CONFIG_SPARSEMEM_VMEMMAP */ static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -209,10 +211,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT; for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, MIX_SECTION_INFO); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 03fcf73d47da..bf23bc0b8399 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -388,7 +388,7 @@ static inline unsigned long *get_pageblock_bitmap(struct page *page, unsigned long pfn) { #ifdef CONFIG_SPARSEMEM - return __pfn_to_section(pfn)->pageblock_flags; + return section_to_usemap(__pfn_to_section(pfn)); #else return page_zone(page)->pageblock_flags; #endif /* CONFIG_SPARSEMEM */ diff --git a/mm/sparse.c b/mm/sparse.c index 69904aa6165b..cdd2978d0ffe 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -288,33 +288,31 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn static void __meminit sparse_init_one_section(struct mem_section *ms, unsigned long pnum, struct page *mem_map, - unsigned long *pageblock_bitmap) + struct mem_section_usage *usage) { ms->section_mem_map &= ~SECTION_MAP_MASK; ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) | SECTION_HAS_MEM_MAP; - ms->pageblock_flags = pageblock_bitmap; + ms->usage = usage; } -unsigned long usemap_size(void) +static unsigned long usemap_size(void) { return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long); } -#ifdef CONFIG_MEMORY_HOTPLUG -static unsigned long *__kmalloc_section_usemap(void) +size_t mem_section_usage_size(void) { - return kmalloc(usemap_size(), GFP_KERNEL); + return sizeof(struct mem_section_usage) + usemap_size(); } -#endif /* CONFIG_MEMORY_HOTPLUG */ #ifdef CONFIG_MEMORY_HOTREMOVE -static unsigned long * __init +static struct mem_section_usage * __init sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, unsigned long size) { + struct mem_section_usage *usage; unsigned long goal, limit; - unsigned long *p; int nid; /* * A page may contain usemaps for other sections preventing the @@ -330,15 +328,16 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, limit = goal + (1UL << PA_SECTION_SHIFT); nid = early_pfn_to_nid(goal >> PAGE_SHIFT); again: - p = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid); - if (!p && limit) { + usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid); + if (!usage && limit) { limit = 0; goto again; } - return p; + return usage; } -static void __init check_usemap_section_nr(int nid, unsigned long *usemap) +static void __init check_usemap_section_nr(int nid, + struct mem_section_usage *usage) { unsigned long usemap_snr, pgdat_snr; static unsigned long old_usemap_snr; @@ -352,7 +351,7 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap) old_pgdat_snr = NR_MEM_SECTIONS; } - usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT); + usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT); pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); if (usemap_snr == pgdat_snr) return; @@ -380,14 +379,15 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap) usemap_snr, pgdat_snr, nid); } #else -static unsigned long * __init +static struct mem_section_usage * __init sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, unsigned long size) { return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id); } -static void __init check_usemap_section_nr(int nid, unsigned long *usemap) +static void __init check_usemap_section_nr(int nid, + struct mem_section_usage *usage) { } #endif /* CONFIG_MEMORY_HOTREMOVE */ @@ -474,14 +474,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, unsigned long pnum_end, unsigned long map_count) { - unsigned long pnum, usemap_longs, *usemap; + struct mem_section_usage *usage; + unsigned long pnum; struct page *map; - usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS); - usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid), - usemap_size() * - map_count); - if (!usemap) { + usage = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid), + mem_section_usage_size() * map_count); + if (!usage) { pr_err("%s: node[%d] usemap allocation failed", __func__, nid); goto failed; } @@ -497,9 +496,9 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, pnum_begin = pnum; goto failed; } - check_usemap_section_nr(nid, usemap); - sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap); - usemap += usemap_longs; + check_usemap_section_nr(nid, usage); + sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage); + usage = (void *) usage + mem_section_usage_size(); } sparse_buffer_fini(); return; @@ -693,9 +692,9 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, struct vmem_altmap *altmap) { unsigned long section_nr = pfn_to_section_nr(start_pfn); + struct mem_section_usage *usage; struct mem_section *ms; struct page *memmap; - unsigned long *usemap; int ret; /* @@ -709,8 +708,8 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, memmap = kmalloc_section_memmap(section_nr, nid, altmap); if (!memmap) return -ENOMEM; - usemap = __kmalloc_section_usemap(); - if (!usemap) { + usage = kzalloc(mem_section_usage_size(), GFP_KERNEL); + if (!usage) { __kfree_section_memmap(memmap, altmap); return -ENOMEM; } @@ -728,11 +727,11 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); section_mark_present(ms); - sparse_init_one_section(ms, section_nr, memmap, usemap); + sparse_init_one_section(ms, section_nr, memmap, usage); out: if (ret < 0) { - kfree(usemap); + kfree(usage); __kfree_section_memmap(memmap, altmap); } return ret; @@ -769,20 +768,20 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) } #endif -static void free_section_usemap(struct page *memmap, unsigned long *usemap, - struct vmem_altmap *altmap) +static void free_section_usage(struct page *memmap, + struct mem_section_usage *usage, struct vmem_altmap *altmap) { - struct page *usemap_page; + struct page *usage_page; - if (!usemap) + if (!usage) return; - usemap_page = virt_to_page(usemap); + usage_page = virt_to_page(usage); /* * Check to see if allocation came from hot-plug-add */ - if (PageSlab(usemap_page) || PageCompound(usemap_page)) { - kfree(usemap); + if (PageSlab(usage_page) || PageCompound(usage_page)) { + kfree(usage); if (memmap) __kfree_section_memmap(memmap, altmap); return; @@ -801,19 +800,19 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, unsigned long map_offset, struct vmem_altmap *altmap) { struct page *memmap = NULL; - unsigned long *usemap = NULL; + struct mem_section_usage *usage = NULL; if (ms->section_mem_map) { - usemap = ms->pageblock_flags; + usage = ms->usage; memmap = sparse_decode_mem_map(ms->section_mem_map, __section_nr(ms)); ms->section_mem_map = 0; - ms->pageblock_flags = NULL; + ms->usage = NULL; } clear_hwpoisoned_pages(memmap + map_offset, PAGES_PER_SECTION - map_offset); - free_section_usemap(memmap, usemap, altmap); + free_section_usage(memmap, usage, altmap); } #endif /* CONFIG_MEMORY_HOTREMOVE */ #endif /* CONFIG_MEMORY_HOTPLUG */