From patchwork Fri Mar 22 16:57:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866295 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39A261390 for ; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 153D52A8C0 for ; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 094552A8C6; Fri, 22 Mar 2019 17:10:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C7862A8C0 for ; Fri, 22 Mar 2019 17:10:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E2296B0006; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 191736B0007; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 083586B0008; Fri, 22 Mar 2019 13:10:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id BCFEA6B0006 for ; Fri, 22 Mar 2019 13:10:41 -0400 (EDT) Received: by mail-pf1-f199.google.com with SMTP id b11so2904602pfo.15 for ; Fri, 22 Mar 2019 10:10:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=7tgugivS06FKHN/N1HETjLJIDRie27GVe7ls8gMk6Ak=; b=Tq6vi1v9fhKSAfijDrd7HIEpYzaRBmM8f1pNR+g+G5pNk+C0OaoEyaXC0tOcHRVq8i RNAb9MZCgYowPpM0t4t3B0cm5b7Au4D/VuYrd+FIf595ozmxLC/KId6XBMNvjg7uQUBg tIV6OBnbE2COdsGTBbnVc1oqU6gsRIPcNFdaqnV+zJ/ZYd+8bjYSZoLi/ltrKsb8Yf+M UZfDFk7k+OUNhEi2omQGrBeQBJ5STv0qT2glkOg6n3UztvjFO3SWLIISEYzuD5VAs/dP oHTZ5/M+Rn/mNpbOxpARnE827DtVc6kvao5CkKJgxbDX9Eg8w1cxHdbMZxvmW0RBgMJN 4ssQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAUeIscZxAeFpzT32oSgX1svepRN+MXfKWYyVCCti6xaxWWaAktk Sy18oKCJRRKhzY3H2izRp9LTSe0t1qBk18yT4q8JuUWmzAzt4WeqGQot4X1g1Rd2qCKWlLPwsIR btApPN4XFws7C+gtzXDoHXZiZGEZ+OYOx0WNLPdoh34LndSFebyYYfXvj82V+FjmSFA== X-Received: by 2002:a65:5b47:: with SMTP id y7mr9932894pgr.449.1553274641386; Fri, 22 Mar 2019 10:10:41 -0700 (PDT) X-Google-Smtp-Source: APXvYqw9h6d2Ca+vB3p7/8UGzSbzJtuyzivQxEvVZ8gmgs1aKzVuRTPg8aavnMJN/YkVuc+se9qJ X-Received: by 2002:a65:5b47:: with SMTP id y7mr9932817pgr.449.1553274640462; Fri, 22 Mar 2019 10:10:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274640; cv=none; d=google.com; s=arc-20160816; b=ycHGGQcA3fUaLleLjO2yzfHm42pJhvRi3D7RjVzblK4P4ZDCAa0s2SyprUgg3853kz 
Uid6ENbARCmpi6+eKv8gwMx0fUPsOJ0JakNfItbg6+irGFp88tDLmOXjitr5tqibHVaI wNOK+JbbUrXygGOIpoMI12S/ddkhPGxTax08lzjlOtRtPOJeZp1mllVcnWBLsGPWTPMK /VMHA49mcnM8inU9KA0BN4bg7aFIFhr2KnYNQ+ehCCmQp/dJHfcfOiuIhUTW7SN6qRQz x8NF4BgHu7f/E2xjkB7/DhqJCdFK4RVtAPkK4EGn7OSOFf9q2XflvbqankFduRu9WALS Sv/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=7tgugivS06FKHN/N1HETjLJIDRie27GVe7ls8gMk6Ak=; b=HA3SCqEqp4yA8j6NaNBsyMj6vNZOYNVYzltXs7pofueQSP2xLIGb/igeViKnLL4HuD u2B7aqWiSyqieGVD3qbgYmchC9KKVEfxYzB1GEQMErOtGr9H16WHAPVZ7R40xje8+mpA RlEQk0aMExd839UxPKWNuznhOq7S1oWrzQiUsbtIlrTGT/H3BVOhxNFY5yt03LXVAdRS mM/dg36MDlNpzxmQNLYO4Ef4lpCngckuyB+MRXzFjQtlWVVVrjh0C/LO1FBs3uONaULc DfnrsLlKl9wj3xS9g9Beaqdsl6ZIc6NOthwYIwqgRgvuSoi23XDZuHB1DkhdFwbi7DiZ cjAw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga03.intel.com (mga03.intel.com. [134.134.136.65]) by mx.google.com with ESMTPS id r7si6891992pfn.144.2019.03.22.10.10.40 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:40 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) client-ip=134.134.136.65; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="157486298" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga001.fm.intel.com with ESMTP; 22 Mar 2019 10:10:38 -0700 Subject: [PATCH v5 01/10] mm/sparsemem: Introduce struct mem_section_usage From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:57:59 -0700 Message-ID: <155327387961.225273.1318113033564648835.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'map_active' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. 
The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to usemap allocation path, there are no expected behavior changes. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- include/linux/mmzone.h | 23 ++++++++++++-- mm/memory_hotplug.c | 18 ++++++----- mm/page_alloc.c | 2 + mm/sparse.c | 81 ++++++++++++++++++++++++------------------------ 4 files changed, 71 insertions(+), 53 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fba7741533be..151dd7327e0b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1107,6 +1107,19 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK) #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK) +#define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG) +#define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1)) + +struct mem_section_usage { + /* + * SECTION_ACTIVE_SIZE portions of the section that are populated in + * the memmap + */ + unsigned long map_active; + /* See declaration of similar field in struct zone */ + unsigned long pageblock_flags[0]; +}; + struct page; struct page_ext; struct mem_section { @@ -1124,8 +1137,7 @@ struct mem_section { */ unsigned long section_mem_map; - /* See declaration of similar field in struct zone */ - unsigned long *pageblock_flags; + struct mem_section_usage *usage; #ifdef CONFIG_PAGE_EXTENSION /* * If SPARSEMEM, pgdat doesn't have page_ext pointer. 
We use @@ -1156,6 +1168,11 @@ extern struct mem_section **mem_section; extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif +static inline unsigned long *section_to_usemap(struct mem_section *ms) +{ + return ms->usage->pageblock_flags; +} + static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME @@ -1167,7 +1184,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr) return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; } extern int __section_nr(struct mem_section* ms); -extern unsigned long usemap_size(void); +extern size_t mem_section_usage_size(void); /* * We use the lower bits of the mem_map pointer to store diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f767582af4f8..2541a3a15854 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -164,9 +164,10 @@ void put_page_bootmem(struct page *page) #ifndef CONFIG_SPARSEMEM_VMEMMAP static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -186,10 +187,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, SECTION_INFO); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT; for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, MIX_SECTION_INFO); @@ -198,9 +199,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) #else /* CONFIG_SPARSEMEM_VMEMMAP */ static void register_page_bootmem_info_section(unsigned long start_pfn) { - unsigned long *usemap, mapsize, section_nr, i; + unsigned long mapsize, section_nr, i; struct mem_section *ms; struct page *page, *memmap; + struct mem_section_usage *usage; section_nr = pfn_to_section_nr(start_pfn); ms = __nr_to_section(section_nr); @@ -209,10 +211,10 @@ static void register_page_bootmem_info_section(unsigned long start_pfn) register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION); - usemap = ms->pageblock_flags; - page = virt_to_page(usemap); + usage = ms->usage; + page = virt_to_page(usage); - mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + mapsize = PAGE_ALIGN(mem_section_usage_size()) >> PAGE_SHIFT; for (i = 0; i < mapsize; i++, page++) get_page_bootmem(section_nr, page, MIX_SECTION_INFO); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 03fcf73d47da..bf23bc0b8399 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -388,7 +388,7 @@ static inline unsigned long *get_pageblock_bitmap(struct page *page, unsigned long pfn) { #ifdef CONFIG_SPARSEMEM - return __pfn_to_section(pfn)->pageblock_flags; + return section_to_usemap(__pfn_to_section(pfn)); #else return page_zone(page)->pageblock_flags; #endif /* CONFIG_SPARSEMEM */ diff --git a/mm/sparse.c b/mm/sparse.c index 69904aa6165b..cdd2978d0ffe 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -288,33 +288,31 @@ struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pn static void __meminit sparse_init_one_section(struct mem_section *ms, unsigned long pnum, struct page *mem_map, - unsigned long *pageblock_bitmap) + 
struct mem_section_usage *usage) { ms->section_mem_map &= ~SECTION_MAP_MASK; ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) | SECTION_HAS_MEM_MAP; - ms->pageblock_flags = pageblock_bitmap; + ms->usage = usage; } -unsigned long usemap_size(void) +static unsigned long usemap_size(void) { return BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS) * sizeof(unsigned long); } -#ifdef CONFIG_MEMORY_HOTPLUG -static unsigned long *__kmalloc_section_usemap(void) +size_t mem_section_usage_size(void) { - return kmalloc(usemap_size(), GFP_KERNEL); + return sizeof(struct mem_section_usage) + usemap_size(); } -#endif /* CONFIG_MEMORY_HOTPLUG */ #ifdef CONFIG_MEMORY_HOTREMOVE -static unsigned long * __init +static struct mem_section_usage * __init sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, unsigned long size) { + struct mem_section_usage *usage; unsigned long goal, limit; - unsigned long *p; int nid; /* * A page may contain usemaps for other sections preventing the @@ -330,15 +328,16 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, limit = goal + (1UL << PA_SECTION_SHIFT); nid = early_pfn_to_nid(goal >> PAGE_SHIFT); again: - p = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid); - if (!p && limit) { + usage = memblock_alloc_try_nid(size, SMP_CACHE_BYTES, goal, limit, nid); + if (!usage && limit) { limit = 0; goto again; } - return p; + return usage; } -static void __init check_usemap_section_nr(int nid, unsigned long *usemap) +static void __init check_usemap_section_nr(int nid, + struct mem_section_usage *usage) { unsigned long usemap_snr, pgdat_snr; static unsigned long old_usemap_snr; @@ -352,7 +351,7 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap) old_pgdat_snr = NR_MEM_SECTIONS; } - usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT); + usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT); pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); if (usemap_snr == pgdat_snr) return; @@ -380,14 +379,15 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap) usemap_snr, pgdat_snr, nid); } #else -static unsigned long * __init +static struct mem_section_usage * __init sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, unsigned long size) { return memblock_alloc_node(size, SMP_CACHE_BYTES, pgdat->node_id); } -static void __init check_usemap_section_nr(int nid, unsigned long *usemap) +static void __init check_usemap_section_nr(int nid, + struct mem_section_usage *usage) { } #endif /* CONFIG_MEMORY_HOTREMOVE */ @@ -474,14 +474,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, unsigned long pnum_end, unsigned long map_count) { - unsigned long pnum, usemap_longs, *usemap; + struct mem_section_usage *usage; + unsigned long pnum; struct page *map; - usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS); - usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid), - usemap_size() * - map_count); - if (!usemap) { + usage = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid), + mem_section_usage_size() * map_count); + if (!usage) { pr_err("%s: node[%d] usemap allocation failed", __func__, nid); goto failed; } @@ -497,9 +496,9 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, pnum_begin = pnum; goto failed; } - check_usemap_section_nr(nid, usemap); - sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap); - usemap += usemap_longs; + check_usemap_section_nr(nid, usage); + 
sparse_init_one_section(__nr_to_section(pnum), pnum, map, usage); + usage = (void *) usage + mem_section_usage_size(); } sparse_buffer_fini(); return; @@ -693,9 +692,9 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, struct vmem_altmap *altmap) { unsigned long section_nr = pfn_to_section_nr(start_pfn); + struct mem_section_usage *usage; struct mem_section *ms; struct page *memmap; - unsigned long *usemap; int ret; /* @@ -709,8 +708,8 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, memmap = kmalloc_section_memmap(section_nr, nid, altmap); if (!memmap) return -ENOMEM; - usemap = __kmalloc_section_usemap(); - if (!usemap) { + usage = kzalloc(mem_section_usage_size(), GFP_KERNEL); + if (!usage) { __kfree_section_memmap(memmap, altmap); return -ENOMEM; } @@ -728,11 +727,11 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); section_mark_present(ms); - sparse_init_one_section(ms, section_nr, memmap, usemap); + sparse_init_one_section(ms, section_nr, memmap, usage); out: if (ret < 0) { - kfree(usemap); + kfree(usage); __kfree_section_memmap(memmap, altmap); } return ret; @@ -769,20 +768,20 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) } #endif -static void free_section_usemap(struct page *memmap, unsigned long *usemap, - struct vmem_altmap *altmap) +static void free_section_usage(struct page *memmap, + struct mem_section_usage *usage, struct vmem_altmap *altmap) { - struct page *usemap_page; + struct page *usage_page; - if (!usemap) + if (!usage) return; - usemap_page = virt_to_page(usemap); + usage_page = virt_to_page(usage); /* * Check to see if allocation came from hot-plug-add */ - if (PageSlab(usemap_page) || PageCompound(usemap_page)) { - kfree(usemap); + if (PageSlab(usage_page) || PageCompound(usage_page)) { + kfree(usage); if (memmap) __kfree_section_memmap(memmap, altmap); return; @@ -801,19 +800,19 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, unsigned long map_offset, struct vmem_altmap *altmap) { struct page *memmap = NULL; - unsigned long *usemap = NULL; + struct mem_section_usage *usage = NULL; if (ms->section_mem_map) { - usemap = ms->pageblock_flags; + usage = ms->usage; memmap = sparse_decode_mem_map(ms->section_mem_map, __section_nr(ms)); ms->section_mem_map = 0; - ms->pageblock_flags = NULL; + ms->usage = NULL; } clear_hwpoisoned_pages(memmap + map_offset, PAGES_PER_SECTION - map_offset); - free_section_usemap(memmap, usemap, altmap); + free_section_usage(memmap, usage, altmap); } #endif /* CONFIG_MEMORY_HOTREMOVE */ #endif /* CONFIG_MEMORY_HOTPLUG */ From patchwork Fri Mar 22 16:58:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866297 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E926C1390 for ; Fri, 22 Mar 2019 17:10:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CD5682A88F for ; Fri, 22 Mar 2019 17:10:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C0B7A2A8C0; Fri, 22 Mar 2019 17:10:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 40FB92A88F for ; Fri, 22 Mar 2019 17:10:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4A32F6B0007; Fri, 22 Mar 2019 13:10:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 452086B0008; Fri, 22 Mar 2019 13:10:46 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3685E6B000A; Fri, 22 Mar 2019 13:10:46 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200]) by kanga.kvack.org (Postfix) with ESMTP id F3E656B0007 for ; Fri, 22 Mar 2019 13:10:45 -0400 (EDT) Received: by mail-pf1-f200.google.com with SMTP id y2so2889168pfl.16 for ; Fri, 22 Mar 2019 10:10:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=038kwI3sjgsCjDZLXeME3KWp2RPRkDZYPRJ4y9KN7nQ=; b=r3841CCF9dJYbPHbHtgjuV3+sYJEYCwVHNFfmY631eVf891xLtvIpQIyvf3mCAIU5l CbGhK6GsVuHGPPe1UtMy5qbXT1/GU4UbYIzVxypp37+lw0/jW2L65fB3IehmPg314KNt yjPuE3FoS+5APThQ5PnG/GCwWYGtCq0eooLeEs00PzlLrMwZmywhExxbV/i2kGXjVp27 JlZ3QqhuhDBpRfe1NLtDuuxfzfxLYlCPqVsmirUXp09M/Y0KwKwSCK0/PNXT877Y06E9 YNIJE18mwS18vyRBl6cfJBF1MXRnAqxc9gGZDVfmIl3BMGGaj/pVbaIuMnGGw7yp0xWu k5EA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAVl549JrGS+fyPoFlSae1MxY3TCg4FeeR+u8Qk2cNzsmumFdxnS Rx4VNFF1OsHWZKqek6nt6vnXkC+vecyWN5KfT4qm1QJf5ywX6mIhBDUCV93hi05Ws/QLTNPFPzM SozjCyM486t/+VqBMUVv9jA2d8raVePIknntSLT0Jy9oAfunltZjwsS6VDZVrxV80vQ== X-Received: by 2002:a63:4a5a:: with SMTP id j26mr2330555pgl.361.1553274645666; Fri, 22 Mar 2019 10:10:45 -0700 (PDT) X-Google-Smtp-Source: APXvYqxaLkxzZOLfCjqkLiKEqsZfkwSi8iH586qD0kY1aTZYn/5KvYRZPLY5QzNjXMB2wQUb3oMI X-Received: by 2002:a63:4a5a:: with SMTP id j26mr2330493pgl.361.1553274644936; Fri, 22 Mar 2019 10:10:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274644; cv=none; d=google.com; s=arc-20160816; b=Ei/faHONycM/u9ZPhvPwYZbSG89Q2/OPgYVs6vvZu1HDMDTAz6YjqKM7aYXklYYyui eWz9eXb6dbJdUNlee1JyrY1ItckDT0WUFO4P2LuLQ9YpgpU1WuqbYG1AeIIi7gb7JHjc PCuZYKqrze2a4JuND0Uv1CA3ao00kB5WacSEbYgV9JU215IeKm2Y5atEgIRwhHjVA7O/ CCip8B5EWsugV3ciacKagPPw894aSyG0o0FrBlrSPtk6bmj0VKecHvjxUiHUwF/n9oT4 xXlvA0JDH0OyhkjaTNZwIR/mjGXbbVAsBnD36C00+6rfrP5WrfnzMaZBlSWyf0Xfz49k O01Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=038kwI3sjgsCjDZLXeME3KWp2RPRkDZYPRJ4y9KN7nQ=; b=h34PPRonVLk7NUaRykWCXom1vvka38Si5Dps6PP+wAODv4Vb/lTIbi85YcFxfE6sbl gBXkNWQM3COK85tEdll/tfPgWVXJiMconhkpHdyCfCVklpGvwCH51gfcnFk3rjMrAY8H 7JN/NCxDG7+Kf9B4Y/5lAOi2qHAQrdhfamknkkjMnkNwGevTLrGz9O/UcLmfP12W+t/u MrYMYIOHi8HgvWZdnorCxdhBs4JV+kMTULir/7yeGxrbRBBP0HiaPPvghaV27u0pXhxM 
g1icl5aCaCn/NdbCa/lp/z6CEXSe30bC6FQkC5UTmesIYWQMWizsoq9dBpRI5Y8Quf2X rihw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTPS id l34si4179755pgb.574.2019.03.22.10.10.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:44 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) client-ip=192.55.52.93; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.93 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="216629502" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga001.jf.intel.com with ESMTP; 22 Mar 2019 10:10:43 -0700 Subject: [PATCH v5 02/10] mm/sparsemem: Introduce common definitions for the size and mask of a section From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , =?utf-8?b?SsOpcsO0bWU=?= Glisse , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:58:05 -0700 Message-ID: <155327388517.225273.8517440825117584932.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Up-level the local section size and mask from kernel/memremap.c to global definitions. These will be used by the new sub-section hotplug support. 
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Jérôme Glisse Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- include/linux/mmzone.h | 2 ++ kernel/memremap.c | 10 ++++------ mm/hmm.c | 2 -- 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 151dd7327e0b..69b9cb9cb2ed 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1081,6 +1081,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn) * PFN_SECTION_SHIFT pfn to/from section number */ #define PA_SECTION_SHIFT (SECTION_SIZE_BITS) +#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT) +#define PA_SECTION_MASK (~(PA_SECTION_SIZE-1)) #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT) #define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT) diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..dda1367b385d 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -14,8 +14,6 @@ #include static DEFINE_XARRAY(pgmap_array); -#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1) -#define SECTION_SIZE (1UL << PA_SECTION_SHIFT) #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) vm_fault_t device_private_entry_fault(struct vm_area_struct *vma, @@ -98,8 +96,8 @@ static void devm_memremap_pages_release(void *data) put_page(pfn_to_page(pfn)); /* pages are dead and unused, undo the arch mapping */ - align_start = res->start & ~(SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) + align_start = res->start & ~(PA_SECTION_SIZE - 1); + align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - align_start; nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); @@ -154,8 +152,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) if (!pgmap->ref || !pgmap->kill) return ERR_PTR(-EINVAL); - align_start = res->start & ~(SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) + align_start = res->start & ~(PA_SECTION_SIZE - 1); + align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - align_start; align_end = align_start + align_size - 1; diff --git a/mm/hmm.c b/mm/hmm.c index fe1cd87e49ac..ef9e4e6c9f92 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -33,8 +33,6 @@ #include #include -#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT) - #if IS_ENABLED(CONFIG_HMM_MIRROR) static const struct mmu_notifier_ops hmm_mmu_notifier_ops; From patchwork Fri Mar 22 16:58:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866303 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD89B14DE for ; Fri, 22 Mar 2019 17:10:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B99682A88F for ; Fri, 22 Mar 2019 17:10:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id ADA5A2A8C0; Fri, 22 Mar 2019 17:10:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2B94D2A8B6 for ; Fri, 22 Mar 2019 17:10:52 +0000 (UTC) Received: by kanga.kvack.org 
(Postfix) id 0A9F36B0008; Fri, 22 Mar 2019 13:10:51 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 059CA6B000A; Fri, 22 Mar 2019 13:10:51 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EB1E46B000C; Fri, 22 Mar 2019 13:10:50 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f199.google.com (mail-pg1-f199.google.com [209.85.215.199]) by kanga.kvack.org (Postfix) with ESMTP id B3A226B0008 for ; Fri, 22 Mar 2019 13:10:50 -0400 (EDT) Received: by mail-pg1-f199.google.com with SMTP id 14so2676868pgf.22 for ; Fri, 22 Mar 2019 10:10:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=BSK6TVzzGaTdUDjLhcyDZLPy3x+xOocQH7fiYXagYSQ=; b=jvctoEqilEk1WwcPF6pO/rT2O7jJl0Ce8X/2r6xPK1cWHQL5snz69VCcbF8QIZgmcp i19eF7CFV5sdmuL8qpEGeo6jlXyBRq2wRIhiL5H7LQ2L/XmqmxhIP5c81v0zEZ0rOAmC SoKyD+b05MtIte/fWCKs7uxB3FBbsEuNaAWpw3gxGpb70oZS2ZQV4SShlb5M6PdMdrsC zeO/hymoc2LFZH9VRihCYZ/eh+RcbFOlTajx1JCnYh0ve2++uqJGc2OCDKnEc+Hfe4Lm dXn8m7CeuBkHOwOqHfHut1/zy/RF061NAXnTMB2gbpL6aFJFSm4pH0tcK0K55bXbv83W Q7Vg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAXbRdybeIQ5c4ezwlmTOy0pXFZIY/+no6ty5NVa6x9Tf57MMI/B p76FGuXrMtKOibxWtHVMvoJmp3e4oQ9eXaCWrkXcZZmcZi8BRaarkG5hgYvOlJST8ePjafRdb69 g0OzdS0yxDERt6JBqJV2rCdcN96TMKS+8Bhf/NC84NGSTEmcj/IeEQI6knliellBFmw== X-Received: by 2002:a62:ea0f:: with SMTP id t15mr10456110pfh.124.1553274650404; Fri, 22 Mar 2019 10:10:50 -0700 (PDT) X-Google-Smtp-Source: APXvYqyY+TRjGE5qLyPPr/CxOYyu9EHbrLsWlgupgGpbJXTo2KQBUUOH6kmIWfWMJdzswUFjQbKY X-Received: by 2002:a62:ea0f:: with SMTP id t15mr10456048pfh.124.1553274649608; Fri, 22 Mar 2019 10:10:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274649; cv=none; d=google.com; s=arc-20160816; b=a2Z7lhatb5h6/mo8aSCPIctf3OjnlLx4735SZ2Usq6peB4PRD3MekodBildH1prBjm Ek6eqKtlp6spUUMHIiK9TdqQU0LUbQmEzmMHpzz5LLnz2KXoTBACwYqqJYvtXEz2yix4 6X0uWP2Yu+BP5cGIJP5kLuSAfzrelJjZdJklaK1/YTCezBk/2IyO/P0iSHySoyxEOm5G 1tKE2qHFSRd6rJ/VjT0aYFz6ss9UiZ1P0E4laVU/hJ++/ULzyB6KywVFxlOstFiw+rpb /elHWZn7xulamsFcdkWhfeQeDxMQmonbFVF77swKCgR3KHJzsCoV8gwvbOgznEt+bYAD F18Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=BSK6TVzzGaTdUDjLhcyDZLPy3x+xOocQH7fiYXagYSQ=; b=Q9tVMY2813dzN7MQp6IFNHMDzx0wCJfn/Bxxjfs1jLH0YG7oz/LmLpGyk2cbJc11jD Nxyn6puu+LH3UsUYxoRgGhy05d4Rw7bpOCmFbdUsZ/biVErTE0i5QogAwbDejddjTC2t Ri86zFamI28wv7ZkWUYOTjdTW70BDgNmb+N9vjhwRDZJQDOOHo87Kejs541c95Ktbnh6 6f5/TAySVa6fnNs3N1HJvAMwKTjcXm/smQFk3SeEAUBMl0pWgez5iw7xKrnqP8d8PUtX I+5JCpwpRQk1gwXmdK/55YAPBGyzx9/i1DfUczrxGuzDpRjBF2h0O5JUoceJ0EvqfiRx DcpQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from 
mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id t7si6940653pgp.196.2019.03.22.10.10.49 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:49 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.100 as permitted sender) client-ip=134.134.136.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.100 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="144365061" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga002.jf.intel.com with ESMTP; 22 Mar 2019 10:10:49 -0700 Subject: [PATCH v5 03/10] mm/sparsemem: Add helpers track active portions of a section at boot From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:58:10 -0700 Message-ID: <155327389029.225273.1972826189687261996.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Prepare for hot{plug,remove} of sub-ranges of a section by tracking a section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) / map_active bitmask length (64)). If it turns out that 2MB is too large of an active tracking granularity it is trivial to increase the size of the map_active bitmap. The implications of a partially populated section is that pfn_valid() needs to go beyond a valid_section() check and read the sub-section active ranges from the bitmask. 
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- include/linux/mmzone.h | 29 ++++++++++++++++++++++++++++- mm/page_alloc.c | 4 +++- mm/sparse.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 79 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 69b9cb9cb2ed..ae4aa7f63d2e 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1122,6 +1122,8 @@ struct mem_section_usage { unsigned long pageblock_flags[0]; }; +void section_active_init(unsigned long pfn, unsigned long nr_pages); + struct page; struct page_ext; struct mem_section { @@ -1259,12 +1261,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn) extern int __highest_present_section_nr; +static inline int section_active_index(phys_addr_t phys) +{ + return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE; +} + +#ifdef CONFIG_SPARSEMEM_VMEMMAP +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + int idx = section_active_index(PFN_PHYS(pfn)); + + return !!(ms->usage->map_active & (1UL << idx)); +} +#else +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) +{ + return 1; +} +#endif + #ifndef CONFIG_HAVE_ARCH_PFN_VALID static inline int pfn_valid(unsigned long pfn) { + struct mem_section *ms; + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; - return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); + ms = __nr_to_section(pfn_to_section_nr(pfn)); + if (!valid_section(ms)) + return 0; + return pfn_section_valid(ms, pfn); } #endif @@ -1295,6 +1321,7 @@ void sparse_init(void); #else #define sparse_init() do {} while (0) #define sparse_index_init(_sec, _nid) do {} while (0) +#define section_active_init(_pfn, _nr_pages) do {} while (0) #endif /* CONFIG_SPARSEMEM */ /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bf23bc0b8399..508a810fd514 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7221,10 +7221,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn) /* Print out the early node map */ pr_info("Early memory node ranges\n"); - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1); + section_active_init(start_pfn, end_pfn - start_pfn); + } /* Initialise every node */ mminit_verify_pageflags_layout(); diff --git a/mm/sparse.c b/mm/sparse.c index cdd2978d0ffe..3cd7ce46e749 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -210,6 +210,54 @@ static inline unsigned long first_present_section_nr(void) return next_present_section_nr(-1); } +static unsigned long section_active_mask(unsigned long pfn, + unsigned long nr_pages) +{ + int idx_start, idx_size; + phys_addr_t start, size; + + if (!nr_pages) + return 0; + + start = PFN_PHYS(pfn); + size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION + - (pfn & ~PAGE_SECTION_MASK))); + size = ALIGN(size, SECTION_ACTIVE_SIZE); + + idx_start = section_active_index(start); + idx_size = section_active_index(size); + + if (idx_size == 0) + return -1; + return ((1UL << idx_size) - 1) << idx_start; +} + +void section_active_init(unsigned long pfn, unsigned long nr_pages) +{ + int end_sec = pfn_to_section_nr(pfn + nr_pages - 1); + int i, start_sec = pfn_to_section_nr(pfn); + + if (!nr_pages) + return; + + for (i = start_sec; i <= end_sec; i++) { + struct mem_section *ms; + unsigned long 
mask; + unsigned long pfns; + + pfns = min(nr_pages, PAGES_PER_SECTION + - (pfn & ~PAGE_SECTION_MASK)); + mask = section_active_mask(pfn, pfns); + + ms = __nr_to_section(i); + pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, mask); + ms->usage->map_active = mask; + + pfn += pfns; + nr_pages -= pfns; + } +} + /* Record a memory area against a node. */ void __init memory_present(int nid, unsigned long start, unsigned long end) { From patchwork Fri Mar 22 16:58:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866307 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48C20922 for ; Fri, 22 Mar 2019 17:10:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 265392A88F for ; Fri, 22 Mar 2019 17:10:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1A1D12A8C0; Fri, 22 Mar 2019 17:10:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9E3342A88F for ; Fri, 22 Mar 2019 17:10:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 899E76B000A; Fri, 22 Mar 2019 13:10:56 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 847AB6B000C; Fri, 22 Mar 2019 13:10:56 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 762D46B000D; Fri, 22 Mar 2019 13:10:56 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id 32D7E6B000A for ; Fri, 22 Mar 2019 13:10:56 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id u8so2909615pfm.6 for ; Fri, 22 Mar 2019 10:10:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=4laCd5MFQsasg3wAcoeizAGd9ZCiY08UnTwXQ4jobr8=; b=S2+FmuCD9QK2/adybvoKaaIHZoL1+AlvmEV+PEjthH1cEeWpF4aOXsJ5kyKj/vY+9c JYxwjy8MIHFhKtpR2xcU5Wz0AuR7A4z26D6n3+g9rc/72r/NTrfh1cyXCarzF2GaJ1Hn KYGePAtBa5N2wWp6ipqx4VyNxXk/inT/cTXZRIhj0NYDOomC8b4ILtjxQpFTqddzp697 easO90h64MpN5p7NqMLTJA+vPEkLe1gA5JmQ7uZtjjbOvx5AxlFAfABhIsZ//EN23S1N +r3m3vZxXFRgz1qdTSwrCkaZ7ltoqPQCZ1tp+iH+13Za+NifpdL7l76h3OUmoqyddBns 6BUg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAVQDz5wjf0073Yb/sEwNFmpN2RDh5Q6mHKXfuuJN3jhylHqxSOd xZilShwBBwvG1qKLnsoNkNGx8yCoszHYQOjLh6zn8glCAIyYL7RMOTb+VTM73WoUHOInZtvfvIN DhBChtJI6zardrwTV5mX2sHTEQrFM2M1TiXrXgtZLcjeuf6twS7dreoVSgxzfvQVYlA== X-Received: by 2002:a17:902:b788:: with SMTP id 
e8mr310624pls.339.1553274655877; Fri, 22 Mar 2019 10:10:55 -0700 (PDT) X-Google-Smtp-Source: APXvYqx2oqWlgj1JMMdHYU+04AGUMpF6XrZgcIMfJ7LXFaGW+iYUxqvQyw2wqYNxez0bwaY9Gr8w X-Received: by 2002:a17:902:b788:: with SMTP id e8mr310571pls.339.1553274655197; Fri, 22 Mar 2019 10:10:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274655; cv=none; d=google.com; s=arc-20160816; b=xBUIMd6uATrLL/d5BWnScP3RaP2tls1ZwfdAiUJrMapwX80gNVUsbKm4u/KUBGUKad kKfnpimJEs4pGSC/alJjDM3M7y2/lPHvQdJJdae4cihtNPpVoOCkWJQTYRI8N7Bi8I/R Zv0g9MLUlGeXfodQKPSXZ47r7iAdoaIzVD9K3NSN/sfVxCjUgvZYJbNFGOVRsFG0iI3j ZzwZmEZPbGJuIXUyeQowr9cerzmx7730XY/mr5HVD8Oh5E+RHvL7hXgQcmWhVmwHmNIP 8vIez6LDulhEsLBO2qvT73IpOif0EhLjW550liYu/qTQ1t/tm2aDF2V9hCHXE0Pn4i7m KiZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=4laCd5MFQsasg3wAcoeizAGd9ZCiY08UnTwXQ4jobr8=; b=QMeOX8cwe/Xqf5omflgcy3cF6jkWj/Cu0Ajz68CqmqLZVpdbO37U1a7VOCjHRP3B4e ki8PcfGobCzWjpNiWj2ZIhKpvvQM7wdNOKmS6+usVdlS1oScdSZFZEXZ7Xj7w7d4SOHx gUTamjXr40C5KSMVEUJEjyY1cftxNAPKzXKFYxNMCa3KQCevtIy6dmuO5Bw4LBQj60Lv J+04ZooFHZYWe7s8u0BRZ806zEEobvFkkV9qTw+PVPlV67su1rETJ1ctVi8tZqeawCPz SK0zgy8VNwewF5vBGc8jSoEnQG4ZJMRYe6xZdYhiH2JzQ/isHf6uP8L2JvisKFv6Ngt4 zlqQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga05.intel.com (mga05.intel.com. [192.55.52.43]) by mx.google.com with ESMTPS id s4si6966753pgs.566.2019.03.22.10.10.55 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:55 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) client-ip=192.55.52.43; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 192.55.52.43 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="136390571" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga003.jf.intel.com with ESMTP; 22 Mar 2019 10:10:54 -0700 Subject: [PATCH v5 04/10] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:58:15 -0700 Message-ID: <155327389539.225273.8758677172387750805.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Sub-section hotplug support 
reduces the unit of operation of hotplug from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units (PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not valid_section(), can toggle. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- include/linux/mmzone.h | 2 ++ mm/memory_hotplug.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index ae4aa7f63d2e..067ee217c692 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1111,6 +1111,8 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec) #define SECTION_ACTIVE_SIZE ((1UL << SECTION_SIZE_BITS) / BITS_PER_LONG) #define SECTION_ACTIVE_MASK (~(SECTION_ACTIVE_SIZE - 1)) +#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE) +#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1)) struct mem_section_usage { /* diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2541a3a15854..0ea3bb58d223 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -326,10 +326,10 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, { struct mem_section *ms; - for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { + for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUB_SECTION) { ms = __pfn_to_section(start_pfn); - if (unlikely(!valid_section(ms))) + if (unlikely(!pfn_valid(start_pfn))) continue; if (unlikely(pfn_to_nid(start_pfn) != nid)) @@ -354,10 +354,10 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone, /* pfn is the end pfn of a memory section. */ pfn = end_pfn - 1; - for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) { + for (; pfn >= start_pfn; pfn -= PAGES_PER_SUB_SECTION) { ms = __pfn_to_section(pfn); - if (unlikely(!valid_section(ms))) + if (unlikely(!pfn_valid(pfn))) continue; if (unlikely(pfn_to_nid(pfn) != nid)) @@ -416,10 +416,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, * it check the zone has only hole or not. */ pfn = zone_start_pfn; - for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) { + for (; pfn < zone_end_pfn; pfn += PAGES_PER_SUB_SECTION) { ms = __pfn_to_section(pfn); - if (unlikely(!valid_section(ms))) + if (unlikely(!pfn_valid(pfn))) continue; if (page_zone(pfn_to_page(pfn)) != zone) @@ -484,10 +484,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, * has only hole or not. 
*/ pfn = pgdat_start_pfn; - for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) { + for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SUB_SECTION) { ms = __pfn_to_section(pfn); - if (unlikely(!valid_section(ms))) + if (unlikely(!pfn_valid(pfn))) continue; if (pfn_to_nid(pfn) != nid) From patchwork Fri Mar 22 16:58:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866311 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 778E21390 for ; Fri, 22 Mar 2019 17:11:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 511B62A88F for ; Fri, 22 Mar 2019 17:11:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4514B2A8C0; Fri, 22 Mar 2019 17:11:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 80BC12A88F for ; Fri, 22 Mar 2019 17:11:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5154A6B000C; Fri, 22 Mar 2019 13:11:01 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 4C4776B000D; Fri, 22 Mar 2019 13:11:01 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B3A06B000E; Fri, 22 Mar 2019 13:11:01 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id 001386B000C for ; Fri, 22 Mar 2019 13:11:00 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id u78so2902957pfa.12 for ; Fri, 22 Mar 2019 10:11:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=lCf4CrbJlu33GZuHjmJIOCXSsYddlrQERtP4xWNIGuc=; b=oHcxfbwV26GySke4fJhATV3Cwzarjoe4XKv7qbWWGO/uyI1Iv0SINC0uF1JYKFnafl ZwILqvtcdSTcLFn+kIR8ybSzD+0AbLAvw6+x1K0Pd1x8P6wuZe5sTql+lE5hQy6gpf/9 fe2lsMsfUanlqY2XYkRa7lSfY7TA9EAEfzdWa3I6nyin5/PhLVLPtweesolEuA0m1ZyU OlX4nvu5EABcbOhb76Rdilub7MPQ65CPWA01cGEgYWDKmN042TtO/IULzz8WNnk9JEx5 cIxpqj4675cj4s2CcQmclJtdG5/8HG3bYF4eMT5OY109MqOYJG0pmcY9yD9jtDDSfU9w Hhpg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAUtDYhhkKdpnyIKP6HOZaCFCBSkDZjMTgZk17FqwNHjiyYQw+as 1RC8yGJaH8pOpkyWz5Ui62r5ckTq38gTw+FO52g3II5g0VrFcBjc3zTVVVPS+AFcaT/KC4vs8tD XKTAVmGhd0iL5In0Ko2DgIkQX9oHLolvNPVzKps7phDcSqeGF6G7hCiDLtq/DPkx/kQ== X-Received: by 2002:a17:902:b60c:: with SMTP id b12mr10409757pls.261.1553274660652; Fri, 22 Mar 2019 10:11:00 -0700 (PDT) X-Google-Smtp-Source: 
APXvYqx/Vn18Pb1r9wqt7D2bJXOXrhDwNjo6yjrX34LxnnxMvHJ2NvfaizZymPGOhu8vkBXXZEVO X-Received: by 2002:a17:902:b60c:: with SMTP id b12mr10409688pls.261.1553274659779; Fri, 22 Mar 2019 10:10:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553274659; cv=none; d=google.com; s=arc-20160816; b=HNB7aWTp+sxNwPcsKqiVajQHi+pg90GFeAzAh6X3EJiXOyicd6w9TFcRi6TG6weOIj S34y/dcURAP5f32UC2cDEugh28VkChGJTE48vuUbKAUBUURprm1kjp2Cdae8rHclLPnb oRpWoatmFAs9r4MGW/5lLcozfYWtOqkUOo+gV3zoNeVt9rNhBfpFczHdUDx9HACrJLwD 11OedQmvOU9KNaLyiKtUzRoMU/+fHu0NxAECHxFqWhYO93XM/MYhOgT7/8pshOQe8ndR nR6VMBvq6qIL9wjEvjJcb0Ao+F0Tm1QOSh//dD+SmbvoCnnaZDA5DFVQwNMMTr9MLbFI eKwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:message-id:date:cc:to:from:subject; bh=lCf4CrbJlu33GZuHjmJIOCXSsYddlrQERtP4xWNIGuc=; b=1LCq1TvzwcqDB/1EIXtS7u4ooj7o+t1UFV0QUbQAoid0fnMQdgclZD1Utvo/Fw5R3r +t1UnfHrCJDfOmTbDYK0jrcGDIGOOh2FMGBdbFcRCyWrl7rVw4PJ3C8qawNsbPt/j8TN DCo6OvSyqCKgvSvLmddjB/OwFGhnh6HiFggdf8Wd2YVjYxUCT2kYqiRWEB9a0TPuLsnw Ffq7DjJWM4pnC/+qUmZqKUjmhIETtuKXYww2MMMuOYKiGtMDNBnkj8LSs61hTZLhbqn7 KfZ5K/JHqRaX8rGXURwMza9o5PAJz22KpUuz7nqbnJyzT6Z5Wsoy1vJ1mAr84KpVqx+T S3Mw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga06.intel.com (mga06.intel.com. [134.134.136.31]) by mx.google.com with ESMTPS id h12si6957599pgs.207.2019.03.22.10.10.59 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 22 Mar 2019 10:10:59 -0700 (PDT) Received-SPF: pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) client-ip=134.134.136.31; Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.31 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Mar 2019 10:10:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,256,1549958400"; d="scan'208";a="129304803" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga006.jf.intel.com with ESMTP; 22 Mar 2019 10:10:59 -0700 Subject: [PATCH v5 05/10] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap() From: Dan Williams To: akpm@linux-foundation.org Cc: Michal Hocko , Vlastimil Babka , Logan Gunthorpe , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org Date: Fri, 22 Mar 2019 09:58:20 -0700 Message-ID: <155327390049.225273.851253292223555625.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-2-gc94f MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Allow sub-section sized ranges to be added to the memmap. 
populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- arch/x86/mm/init_64.c | 4 ++- include/linux/mm.h | 4 ++- mm/sparse-vmemmap.c | 21 +++++++++++------ mm/sparse.c | 61 +++++++++++++++++++++++++++++++------------------ 4 files changed, 57 insertions(+), 33 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index bccff68e3267..799887eada60 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1461,7 +1461,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, { int err; - if (boot_cpu_has(X86_FEATURE_PSE)) + if (end - start < PAGES_PER_SECTION * sizeof(struct page)) + err = vmemmap_populate_basepages(start, end, node); + else if (boot_cpu_has(X86_FEATURE_PSE)) err = vmemmap_populate_hugepages(start, end, node, altmap); else if (altmap) { pr_err_once("%s: no cpu support for altmap allocations\n", diff --git a/include/linux/mm.h b/include/linux/mm.h index 76769749b5a5..76ba638ceda8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2666,8 +2666,8 @@ const char * arch_vma_name(struct vm_area_struct *vma); void print_vma_addr(char *prefix, unsigned long rip); void *sparse_buffer_alloc(unsigned long size); -struct page *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap); +struct page * __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 7fec05796796..dcb023aa23d1 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -245,19 +245,26 @@ int __meminit vmemmap_populate_basepages(unsigned long start, return 0; } -struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page * __meminit __populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long start; unsigned long end; - struct page *map; - map = pfn_to_page(pnum * PAGES_PER_SECTION); - start = (unsigned long)map; - end = (unsigned long)(map + PAGES_PER_SECTION); + /* + * The minimum granularity of memmap extensions is + * SECTION_ACTIVE_SIZE as allocations are tracked in the + * 'map_active' bitmap of the section. 
+ */ + end = ALIGN(pfn + nr_pages, PHYS_PFN(SECTION_ACTIVE_SIZE)); + pfn &= PHYS_PFN(SECTION_ACTIVE_MASK); + nr_pages = end - pfn; + + start = (unsigned long) pfn_to_page(pfn); + end = start + nr_pages * sizeof(struct page); if (vmemmap_populate(start, end, nid, altmap)) return NULL; - return map; + return pfn_to_page(pfn); } diff --git a/mm/sparse.c b/mm/sparse.c index 3cd7ce46e749..38f80639c6cc 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -452,8 +452,8 @@ static unsigned long __init section_map_size(void) return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); } -struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +struct page __init *__populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { unsigned long size = section_map_size(); struct page *map = sparse_buffer_alloc(size); @@ -534,10 +534,13 @@ static void __init sparse_init_nid(int nid, unsigned long pnum_begin, } sparse_buffer_init(map_count * section_map_size(), nid); for_each_present_section_nr(pnum_begin, pnum) { + unsigned long pfn = section_nr_to_pfn(pnum); + if (pnum >= pnum_end) break; - map = sparse_mem_map_populate(pnum, nid, NULL); + map = __populate_section_memmap(pfn, PAGES_PER_SECTION, + nid, NULL); if (!map) { pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.", __func__, nid); @@ -637,17 +640,17 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn) #endif #ifdef CONFIG_SPARSEMEM_VMEMMAP -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, - struct vmem_altmap *altmap) +static struct page *populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { - /* This will make the necessary allocations eventually. 
*/ - return sparse_mem_map_populate(pnum, nid, altmap); + return __populate_section_memmap(pfn, nr_pages, nid, altmap); } -static void __kfree_section_memmap(struct page *memmap, + +static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap) { - unsigned long start = (unsigned long)memmap; - unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION); + unsigned long start = (unsigned long) pfn_to_page(pfn); + unsigned long end = start + nr_pages * sizeof(struct page); vmemmap_free(start, end, altmap); } @@ -661,11 +664,18 @@ static void free_map_bootmem(struct page *memmap) } #endif /* CONFIG_MEMORY_HOTREMOVE */ #else -static struct page *__kmalloc_section_memmap(void) +struct page *populate_section_memmap(unsigned long pfn, + unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { struct page *page, *ret; unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION; + if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) { + WARN(1, "%s: called with section unaligned parameters\n", + __func__); + return NULL; + } + page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size)); if (page) goto got_map_page; @@ -682,15 +692,17 @@ static struct page *__kmalloc_section_memmap(void) return ret; } -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, +static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap) { - return __kmalloc_section_memmap(); -} + struct page *memmap = pfn_to_page(pfn); + + if ((pfn & ~PAGE_SECTION_MASK) || nr_pages != PAGES_PER_SECTION) { + WARN(1, "%s: called with section unaligned parameters\n", + __func__); + return; + } -static void __kfree_section_memmap(struct page *memmap, - struct vmem_altmap *altmap) -{ if (is_vmalloc_addr(memmap)) vfree(memmap); else @@ -753,12 +765,13 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, if (ret < 0 && ret != -EEXIST) return ret; ret = 0; - memmap = kmalloc_section_memmap(section_nr, nid, altmap); + memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid, + altmap); if (!memmap) return -ENOMEM; usage = kzalloc(mem_section_usage_size(), GFP_KERNEL); if (!usage) { - __kfree_section_memmap(memmap, altmap); + depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap); return -ENOMEM; } @@ -780,7 +793,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, out: if (ret < 0) { kfree(usage); - __kfree_section_memmap(memmap, altmap); + depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap); } return ret; } @@ -817,7 +830,8 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) #endif static void free_section_usage(struct page *memmap, - struct mem_section_usage *usage, struct vmem_altmap *altmap) + struct mem_section_usage *usage, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap) { struct page *usage_page; @@ -831,7 +845,7 @@ static void free_section_usage(struct page *memmap, if (PageSlab(usage_page) || PageCompound(usage_page)) { kfree(usage); if (memmap) - __kfree_section_memmap(memmap, altmap); + depopulate_section_memmap(pfn, nr_pages, altmap); return; } @@ -860,7 +874,8 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, clear_hwpoisoned_pages(memmap + map_offset, PAGES_PER_SECTION - map_offset); - free_section_usage(memmap, usage, altmap); + free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)), + PAGES_PER_SECTION, 
altmap); } #endif /* CONFIG_MEMORY_HOTREMOVE */ #endif /* CONFIG_MEMORY_HOTPLUG */

From patchwork Fri Mar 22 16:58:25 2019
Subject: [PATCH v5 06/10] mm/sparsemem: Prepare for sub-section ranges
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Date: Fri, 22 Mar 2019 09:58:25 -0700
Message-ID: <155327390559.225273.10974961998965315841.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>

Prepare the memory hot-{add,remove} paths for handling sub-section ranges by plumbing the starting page frame and number of pages being handled through arch_{add,remove}_memory() to sparse_{add,remove}_one_section(). This is simply plumbing, small cleanups, and some identifier renames. No intended functional changes.
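For illustration, a userspace model (not from the patch) of the per-section walk that __add_pages() and __remove_pages() adopt in the diff below: an arbitrary (pfn, nr_pages) range is clipped at each section boundary and the per-section span is handed down, mirroring min(nr_pages, PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK)). The constants and example pfn values are assumptions based on x86_64 defaults (4K pages, 128MB sections).

/*
 * Illustration only -- not from the patch. Splits a pfn range into
 * per-section spans the way the reworked hotplug loop does.
 */
#include <stdio.h>

#define PAGES_PER_SECTION  32768UL
#define PAGE_SECTION_MASK  (~(PAGES_PER_SECTION - 1))

static void add_section_range(unsigned long pfn, unsigned long nr_pages)
{
        /* stand-in for sparse_add_section(nid, pfn, nr_pages, altmap) */
        printf("  section %lu: pfn %#lx + %lu pages\n",
               pfn / PAGES_PER_SECTION, pfn, nr_pages);
}

int main(void)
{
        /* a range that starts mid-section and crosses a section boundary */
        unsigned long pfn = 0x207e00, nr_pages = 2048;

        printf("add pfn %#lx + %lu pages:\n", pfn, nr_pages);
        while (nr_pages) {
                unsigned long in_section =
                        PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK);
                unsigned long pfns =
                        nr_pages < in_section ? nr_pages : in_section;

                add_section_range(pfn, pfns);
                pfn += pfns;
                nr_pages -= pfns;
        }
        return 0;
}

The example range is split into a 512-page span in section 64 followed by a 1536-page span in section 65, which is the same clipping the hunks below add to the hotplug core.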
Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- arch/x86/mm/init_64.c | 11 +++++ include/linux/memory_hotplug.h | 7 ++- mm/memory_hotplug.c | 85 ++++++++++++++++++++++------------------ mm/sparse.c | 7 ++- 4 files changed, 66 insertions(+), 44 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 799887eada60..4ae817a79ac3 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -781,6 +781,17 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, { int ret; + /* + * Only allow partial section hotplug for !memblock ranges, + * since register_new_memory() requires section alignment, and + * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully + * populated. + */ + if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) || want_memblock) + && ((start_pfn & ~PAGE_SECTION_MASK) + || (nr_pages & ~PAGE_SECTION_MASK))) + return -EINVAL; + ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); WARN_ON_ONCE(ret); diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 8ade08c50d26..83ee937fb67f 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -336,9 +336,10 @@ extern int arch_add_memory(int nid, u64 start, u64 size, extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern bool is_memblock_offlined(struct memory_block *mem); -extern int sparse_add_one_section(int nid, unsigned long start_pfn, - struct vmem_altmap *altmap); -extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, +extern int sparse_add_section(int nid, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap); +extern void sparse_remove_section(struct zone *zone, struct mem_section *ms, + unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0ea3bb58d223..e093348f5d04 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -250,22 +250,23 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat) } #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */ -static int __meminit __add_section(int nid, unsigned long phys_start_pfn, - struct vmem_altmap *altmap, bool want_memblock) +static int __meminit __add_section(int nid, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap, + bool want_memblock) { int ret; - if (pfn_valid(phys_start_pfn)) + if (pfn_valid(pfn)) return -EEXIST; - ret = sparse_add_one_section(nid, phys_start_pfn, altmap); + ret = sparse_add_section(nid, pfn, nr_pages, altmap); if (ret < 0) return ret; if (!want_memblock) return 0; - return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn)); + return hotplug_memory_register(nid, __pfn_to_section(pfn)); } /* @@ -274,23 +275,18 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn, * call this function after deciding the zone to which to * add the new pages. 
*/ -int __ref __add_pages(int nid, unsigned long phys_start_pfn, - unsigned long nr_pages, struct vmem_altmap *altmap, - bool want_memblock) +int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages, + struct vmem_altmap *altmap, bool want_memblock) { unsigned long i; int err = 0; int start_sec, end_sec; - /* during initialize mem_map, align hot-added range to section */ - start_sec = pfn_to_section_nr(phys_start_pfn); - end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1); - if (altmap) { /* * Validate altmap is within bounds of the total request */ - if (altmap->base_pfn != phys_start_pfn + if (altmap->base_pfn != pfn || vmem_altmap_offset(altmap) > nr_pages) { pr_warn_once("memory add fail, invalid altmap\n"); err = -EINVAL; @@ -299,9 +295,16 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn, altmap->alloc = 0; } + start_sec = pfn_to_section_nr(pfn); + end_sec = pfn_to_section_nr(pfn + nr_pages - 1); for (i = start_sec; i <= end_sec; i++) { - err = __add_section(nid, section_nr_to_pfn(i), altmap, - want_memblock); + unsigned long pfns; + + pfns = min(nr_pages, PAGES_PER_SECTION + - (pfn & ~PAGE_SECTION_MASK)); + err = __add_section(nid, pfn, pfns, altmap, want_memblock); + pfn += pfns; + nr_pages -= pfns; /* * EEXIST is finally dealt with by ioresource collision @@ -506,10 +509,10 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, pgdat->node_spanned_pages = 0; } -static void __remove_zone(struct zone *zone, unsigned long start_pfn) +static void __remove_zone(struct zone *zone, unsigned long start_pfn, + unsigned long nr_pages) { struct pglist_data *pgdat = zone->zone_pgdat; - int nr_pages = PAGES_PER_SECTION; unsigned long flags; pgdat_resize_lock(zone->zone_pgdat, &flags); @@ -518,11 +521,11 @@ static void __remove_zone(struct zone *zone, unsigned long start_pfn) pgdat_resize_unlock(zone->zone_pgdat, &flags); } -static int __remove_section(struct zone *zone, struct mem_section *ms, - unsigned long map_offset, struct vmem_altmap *altmap) +static int __remove_section(struct zone *zone, unsigned long pfn, + unsigned long nr_pages, unsigned long map_offset, + struct vmem_altmap *altmap) { - unsigned long start_pfn; - int scn_nr; + struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn)); int ret = -EINVAL; if (!valid_section(ms)) @@ -532,18 +535,16 @@ static int __remove_section(struct zone *zone, struct mem_section *ms, if (ret) return ret; - scn_nr = __section_nr(ms); - start_pfn = section_nr_to_pfn((unsigned long)scn_nr); - __remove_zone(zone, start_pfn); + __remove_zone(zone, pfn, nr_pages); - sparse_remove_one_section(zone, ms, map_offset, altmap); + sparse_remove_section(zone, ms, pfn, nr_pages, map_offset, altmap); return 0; } /** * __remove_pages() - remove sections of pages from a zone * @zone: zone from which pages need to be removed - * @phys_start_pfn: starting pageframe (must be aligned to start of a section) + * @pfn: starting pageframe (must be aligned to start of a section) * @nr_pages: number of pages to remove (must be multiple of section size) * @altmap: alternative device page map or %NULL if default memmap is used * @@ -552,12 +553,11 @@ static int __remove_section(struct zone *zone, struct mem_section *ms, * sure that pages are marked reserved and zones are adjust properly by * calling offline_pages(). 
*/ -int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, +int __remove_pages(struct zone *zone, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap) { - unsigned long i; unsigned long map_offset = 0; - int sections_to_remove, ret = 0; + int i, start_sec, end_sec, ret = 0; /* In the ZONE_DEVICE case device driver owns the memory region */ if (is_dev_zone(zone)) { @@ -566,7 +566,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, } else { resource_size_t start, size; - start = phys_start_pfn << PAGE_SHIFT; + start = pfn << PAGE_SHIFT; size = nr_pages * PAGE_SIZE; ret = release_mem_region_adjustable(&iomem_resource, start, @@ -582,18 +582,27 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, clear_zone_contiguous(zone); /* - * We can only remove entire sections + * Only ZONE_DEVICE memory is enabled to remove + * section-unaligned ranges. See register_new_memory() which + * assumes section alignment and is skipped for ZONE_DEVICE + * ranges. */ - BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK); - BUG_ON(nr_pages % PAGES_PER_SECTION); + if (!is_dev_zone(zone) && ((pfn | nr_pages) & ~PAGE_SECTION_MASK)) { + WARN(1, "section unaligned removal not supported\n"); + return -EINVAL; + } - sections_to_remove = nr_pages / PAGES_PER_SECTION; - for (i = 0; i < sections_to_remove; i++) { - unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION; + start_sec = pfn_to_section_nr(pfn); + end_sec = pfn_to_section_nr(pfn + nr_pages - 1); + for (i = start_sec; i <= end_sec; i++) { + unsigned long pfns; cond_resched(); - ret = __remove_section(zone, __pfn_to_section(pfn), map_offset, - altmap); + pfns = min(nr_pages, PAGES_PER_SECTION + - (pfn & ~PAGE_SECTION_MASK)); + ret = __remove_section(zone, pfn, pfns, map_offset, altmap); + pfn += pfns; + nr_pages -= pfns; map_offset = 0; if (ret) break; diff --git a/mm/sparse.c b/mm/sparse.c index 38f80639c6cc..767713c88cf5 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -748,8 +748,8 @@ static void free_map_bootmem(struct page *memmap) * set. If this is <=0, then that means that the passed-in * map was not consumed and must be freed. 
*/ -int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, - struct vmem_altmap *altmap) +int __meminit sparse_add_section(int nid, unsigned long start_pfn, + unsigned long nr_pages, struct vmem_altmap *altmap) { unsigned long section_nr = pfn_to_section_nr(start_pfn); struct mem_section_usage *usage; @@ -858,7 +858,8 @@ static void free_section_usage(struct page *memmap, free_map_bootmem(memmap); } -void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, +void sparse_remove_section(struct zone *zone, struct mem_section *ms, + unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap) { struct page *memmap = NULL; From patchwork Fri Mar 22 16:58:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866319 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84314922 for ; Fri, 22 Mar 2019 17:11:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 60EBE2A88F for ; Fri, 22 Mar 2019 17:11:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 548652A8C0; Fri, 22 Mar 2019 17:11:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 79CE02A88F for ; Fri, 22 Mar 2019 17:11:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 369A76B000E; Fri, 22 Mar 2019 13:11:12 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3186D6B0010; Fri, 22 Mar 2019 13:11:12 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2083D6B0266; Fri, 22 Mar 2019 13:11:12 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by kanga.kvack.org (Postfix) with ESMTP id D261B6B000E for ; Fri, 22 Mar 2019 13:11:11 -0400 (EDT) Received: by mail-pf1-f198.google.com with SMTP id e5so2881353pfi.23 for ; Fri, 22 Mar 2019 10:11:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=hImD7Y00gaX81YIfSMmqjk6E4HgjxDbyb28VwVwb7xc=; b=aGrzfsZ3XyR5nxzvMcJEKSXxUsLBT0lSJop0xl6BIVrEhudO67bLSyrpJoe9lh5jlL /EaIKBMYz9MvnTJZ9ZAYEmmd6yQwiTtZW4BQRJxtVWs/mp/Gi3Xn59HQZA/WwDgki+8s abj3J0lv8Nl7zjb0Q9cxBSUnw3dM3J23QCVFlG+/D4G1NJcbdMoviSJrcL5W/Q7MOJ/6 TXDGyRT1xtZmfEuWUl6PPdNbkJr59Bnxj7w4PRvaNfqhLwsMWWu06NvDKGoquE5QPyTd sHRuDbdAlVdvXtPBnVrun7aPkrnU+FW4atY7qySpJxKSaSry+DOtCg8WCgkuqXILdQgS GXDA== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=dan.j.williams@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com 
Subject: [PATCH v5 07/10] mm/sparsemem: Support sub-section hotplug
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Vlastimil Babka, Logan Gunthorpe, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Date: Fri, 22 Mar 2019 09:58:30 -0700
Message-ID: <155327391072.225273.15649820215289276904.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>

The libnvdimm sub-system has suffered a series of hacks and broken workarounds for the memory-hotplug implementation's awkward section-aligned (128MB) granularity. For example, the following backtrace is emitted when attempting arch_add_memory() with physical address ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

 WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0 devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200] [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 devm_memremap_pages+0x3b5/0x4c0 __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap] pmem_attach_disk+0x19a/0x440 [nd_pmem]

Recently it was discovered that the problem goes beyond RAM vs PMEM collisions as some platforms produce PMEM vs PMEM collisions within a given section. The libnvdimm workaround for that case revealed that the libnvdimm section-alignment-padding implementation has been broken for a long while. A fix for that long-standing breakage introduces as many problems as it solves as it would require a backward-incompatible change to the namespace metadata interpretation. Instead of that dubious route [1], address the root problem in the memory-hotplug implementation.
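For illustration, a small userspace sketch (not part of the patch) of the sub-section accounting this patch introduces: each 128MB section carries a 64-bit 'map_active' bitmap, section_activate() sets the bits covering a hot-added pfn range, section_deactivate() clears them, and the memmap/usage are torn down only once the bitmap drains to zero. subsection_mask() is an illustrative stand-in for the series' section_active_mask(), the pfn values are made up, and the model assumes the range stays within one section.

/*
 * Illustration only -- not part of the patch. Models the 'map_active'
 * sub-section bitmap of a single 128MB section.
 */
#include <stdio.h>
#include <stdint.h>

#define SUBSECTIONS_PER_SECTION  64UL
#define PAGES_PER_SUBSECTION     512UL

static uint64_t subsection_mask(unsigned long pfn, unsigned long nr_pages)
{
        unsigned long start = (pfn / PAGES_PER_SUBSECTION) % SUBSECTIONS_PER_SECTION;
        unsigned long count = (nr_pages + PAGES_PER_SUBSECTION - 1) /
                               PAGES_PER_SUBSECTION;

        if (count >= SUBSECTIONS_PER_SECTION)
                return ~0ULL;
        return ((1ULL << count) - 1) << start;
}

int main(void)
{
        uint64_t map_active = 0;

        /* two 2MB hot-adds that land in the same 128MB section */
        map_active |= subsection_mask(0x208000, 512);  /* activate */
        map_active |= subsection_mask(0x208200, 512);  /* activate */
        printf("after add:    map_active=%#llx\n",
               (unsigned long long)map_active);

        map_active ^= subsection_mask(0x208000, 512);  /* deactivate */
        printf("after remove: map_active=%#llx, section still in use: %s\n",
               (unsigned long long)map_active, map_active ? "yes" : "no");

        map_active ^= subsection_mask(0x208200, 512);  /* deactivate */
        printf("drained:      map_active=%#llx, free memmap/usage now\n",
               (unsigned long long)map_active);
        return 0;
}

After the first removal the section stays live because the second range still owns a sub-section, the mixed-use case the old section-aligned code could not express.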
[1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com Cc: Michal Hocko Cc: Vlastimil Babka Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- mm/sparse.c | 235 ++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 158 insertions(+), 77 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index 767713c88cf5..d41ad9643f86 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid) unsigned long root = SECTION_NR_TO_ROOT(section_nr); struct mem_section *section; + /* + * An existing section is possible in the sub-section hotplug + * case. First hot-add instantiates, follow-on hot-add reuses + * the existing section. + * + * The mem_hotplug_lock resolves the apparent race below. + */ if (mem_section[root]) - return -EEXIST; + return 0; section = sparse_index_alloc(nid); if (!section) @@ -338,6 +345,15 @@ static void __meminit sparse_init_one_section(struct mem_section *ms, unsigned long pnum, struct page *mem_map, struct mem_section_usage *usage) { + /* + * Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug, + * ->section_mem_map can not be guaranteed to point to a full + * section's worth of memory. The field is only valid / used + * in the SPARSEMEM_VMEMMAP=n case. + */ + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) + mem_map = NULL; + ms->section_mem_map &= ~SECTION_MAP_MASK; ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) | SECTION_HAS_MEM_MAP; @@ -743,58 +759,164 @@ static void free_map_bootmem(struct page *memmap) #endif /* CONFIG_MEMORY_HOTREMOVE */ #endif /* CONFIG_SPARSEMEM_VMEMMAP */ -/* - * returns the number of sections whose mem_maps were properly - * set. If this is <=0, then that means that the passed-in - * map was not consumed and must be freed. +#ifndef CONFIG_MEMORY_HOTREMOVE +static void free_map_bootmem(struct page *memmap) +{ +} +#endif + +static bool is_early_section(struct mem_section *ms) +{ + struct page *usage_page; + + usage_page = virt_to_page(ms->usage); + if (PageSlab(usage_page) || PageCompound(usage_page)) + return false; + else + return true; +} + +static void section_deactivate(unsigned long pfn, unsigned long nr_pages, + int nid, struct vmem_altmap *altmap) +{ + unsigned long mask = section_active_mask(pfn, nr_pages); + struct mem_section *ms = __pfn_to_section(pfn); + bool early_section = is_early_section(ms); + struct page *memmap = NULL; + + if (WARN(!ms->usage || (ms->usage->map_active & mask) != mask, + "section already deactivated: active: %#lx mask: %#lx\n", + ms->usage ? ms->usage->map_active : 0, mask)) + return; + + if (WARN(!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) + && nr_pages < PAGES_PER_SECTION, + "partial memory section removal not supported\n")) + return; + + /* + * There are 3 cases to handle across two configurations + * (SPARSEMEM_VMEMMAP={y,n}): + * + * 1/ deactivation of a partial hot-added section (only possible + * in the SPARSEMEM_VMEMMAP=y case). + * a/ section was present at memory init + * b/ section was hot-added post memory init + * 2/ deactivation of a complete hot-added section + * 3/ deactivation of a complete section from memory init + * + * For 1/, when map_active does not go to zero we will not be + * freeing the usage map, but still need to free the vmemmap + * range. 
+ * + * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified + */ + ms->usage->map_active ^= mask; + if (ms->usage->map_active == 0) { + unsigned long section_nr = pfn_to_section_nr(pfn); + + if (!early_section) { + kfree(ms->usage); + ms->usage = NULL; + } + memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); + ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr); + } + + if (early_section && memmap) + free_map_bootmem(memmap); + else + depopulate_section_memmap(pfn, nr_pages, altmap); +} + +static struct page * __meminit section_activate(int nid, unsigned long pfn, + unsigned long nr_pages, struct vmem_altmap *altmap) +{ + unsigned long mask = section_active_mask(pfn, nr_pages); + struct mem_section *ms = __pfn_to_section(pfn); + struct mem_section_usage *usage = NULL; + struct page *memmap; + int rc = 0; + + if (!ms->usage) { + usage = kzalloc(mem_section_usage_size(), GFP_KERNEL); + if (!usage) + return ERR_PTR(-ENOMEM); + ms->usage = usage; + } + + if (!mask) + rc = -EINVAL; + else if (mask & ms->usage->map_active) + rc = -EEXIST; + else + ms->usage->map_active |= mask; + + if (rc) { + if (usage) + ms->usage = NULL; + kfree(usage); + return ERR_PTR(rc); + } + + /* + * The early init code does not consider partially populated + * initial sections, it simply assumes that memory will never be + * referenced. If we hot-add memory into such a section then we + * do not need to populate the memmap and can simply reuse what + * is already there. + */ + if (nr_pages < PAGES_PER_SECTION && is_early_section(ms)) + return pfn_to_page(pfn); + + memmap = populate_section_memmap(pfn, nr_pages, nid, altmap); + if (!memmap) { + section_deactivate(pfn, nr_pages, nid, altmap); + return ERR_PTR(-ENOMEM); + } + + return memmap; +} + +/** + * sparse_add_section() - create a new memmap section, or populate an + * existing one + * @zone: host zone for the new memory mapping + * @start_pfn: first pfn to add (section aligned if zone != ZONE_DEVICE) + * @nr_pages: number of new pages to add + * + * returns the number of sections whose mem_maps were properly set. If + * this is <=0, then that means that the passed-in map was not consumed + * and must be freed. */ int __meminit sparse_add_section(int nid, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap) { unsigned long section_nr = pfn_to_section_nr(start_pfn); - struct mem_section_usage *usage; - struct mem_section *ms; + struct mem_section *ms = __pfn_to_section(start_pfn); struct page *memmap; int ret; - /* - * no locking for this, because it does its own - * plus, it does a kmalloc - */ ret = sparse_index_init(section_nr, nid); if (ret < 0 && ret != -EEXIST) return ret; - ret = 0; - memmap = populate_section_memmap(start_pfn, PAGES_PER_SECTION, nid, - altmap); - if (!memmap) - return -ENOMEM; - usage = kzalloc(mem_section_usage_size(), GFP_KERNEL); - if (!usage) { - depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap); - return -ENOMEM; - } - ms = __pfn_to_section(start_pfn); - if (ms->section_mem_map & SECTION_MARKED_PRESENT) { - ret = -EEXIST; - goto out; - } + memmap = section_activate(nid, start_pfn, nr_pages, altmap); + if (IS_ERR(memmap)) + return PTR_ERR(memmap); + ret = 0; /* * Poison uninitialized struct pages in order to catch invalid flags * combinations. 
*/ - page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); + page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages); section_mark_present(ms); - sparse_init_one_section(ms, section_nr, memmap, usage); + sparse_init_one_section(ms, section_nr, memmap, ms->usage); -out: - if (ret < 0) { - kfree(usage); - depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap); - } + if (ret < 0) + section_deactivate(start_pfn, nr_pages, nid, altmap); return ret; } @@ -829,54 +951,13 @@ static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) } #endif -static void free_section_usage(struct page *memmap, - struct mem_section_usage *usage, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap) -{ - struct page *usage_page; - - if (!usage) - return; - - usage_page = virt_to_page(usage); - /* - * Check to see if allocation came from hot-plug-add - */ - if (PageSlab(usage_page) || PageCompound(usage_page)) { - kfree(usage); - if (memmap) - depopulate_section_memmap(pfn, nr_pages, altmap); - return; - } - - /* - * The usemap came from bootmem. This is packed with other usemaps - * on the section which has pgdat at boot time. Just keep it as is now. - */ - - if (memmap) - free_map_bootmem(memmap); -} - void sparse_remove_section(struct zone *zone, struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap) { - struct page *memmap = NULL; - struct mem_section_usage *usage = NULL; - - if (ms->section_mem_map) { - usage = ms->usage; - memmap = sparse_decode_mem_map(ms->section_mem_map, - __section_nr(ms)); - ms->section_mem_map = 0; - ms->usage = NULL; - } - - clear_hwpoisoned_pages(memmap + map_offset, - PAGES_PER_SECTION - map_offset); - free_section_usage(memmap, usage, section_nr_to_pfn(__section_nr(ms)), - PAGES_PER_SECTION, altmap); + clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset, + nr_pages - map_offset); + section_deactivate(pfn, nr_pages, zone_to_nid(zone), altmap); } #endif /* CONFIG_MEMORY_HOTREMOVE */ #endif /* CONFIG_MEMORY_HOTPLUG */ From patchwork Fri Mar 22 16:58:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866323 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD8A41390 for ; Fri, 22 Mar 2019 17:11:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8BA462A88F for ; Fri, 22 Mar 2019 17:11:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 803F72A8C2; Fri, 22 Mar 2019 17:11:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BB6BE2A88F for ; Fri, 22 Mar 2019 17:11:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 899906B0010; Fri, 22 Mar 2019 13:11:17 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 847CD6B0266; Fri, 22 Mar 2019 13:11:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org 
Subject: [PATCH v5 08/10] mm/devm_memremap_pages: Enable sub-section remap
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Toshi Kani, Jérôme Glisse, Logan Gunthorpe, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Date: Fri, 22 Mar 2019 09:58:36 -0700
Message-ID: <155327391603.225273.924677730380586912.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>

Teach devm_memremap_pages() about the new sub-section capabilities of arch_{add,remove}_memory(). Effectively, just replace all usage of align_start, align_end, and align_size with res->start, res->end, and resource_size(res). The existing sanity check will still make sure that the two separate remap attempts do not collide within a sub-section (2MB on x86).
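For illustration, a standalone sketch (not from the patch) of the range calculation being deleted here: the old section padding versus the raw resource now passed to arch_{add,remove}_memory(). The example address and size are made up; PA_SECTION_SIZE assumes the 128MB x86_64 section size.

/*
 * Illustration only -- not from the patch. Before/after view of the
 * range devm_memremap_pages() maps.
 */
#include <stdio.h>

#define PA_SECTION_SIZE  (128ULL << 20)   /* 128MB */

int main(void)
{
        /* a 48MB PMEM namespace starting 16MB into a 128MB section */
        unsigned long long start = 0x241000000ULL;
        unsigned long long size  = 48ULL << 20;

        /* the padding arithmetic this patch removes */
        unsigned long long align_start = start & ~(PA_SECTION_SIZE - 1);
        unsigned long long align_size  =
                ((start + size + PA_SECTION_SIZE - 1) &
                 ~(PA_SECTION_SIZE - 1)) - align_start;

        printf("resource    : [%#llx, %#llx) %lluMB\n",
               start, start + size, size >> 20);
        printf("old, padded : [%#llx, %#llx) %lluMB\n",
               align_start, align_start + align_size, align_size >> 20);
        printf("new, as-is  : [%#llx, %#llx) %lluMB\n",
               start, start + size, size >> 20);
        return 0;
}

With the old padding the 48MB namespace claims the whole 128MB section and collides with anything else mapped there; with sub-section hotplug only the resource itself is remapped.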
Cc: Michal Hocko Cc: Toshi Kani Cc: Jérôme Glisse Cc: Logan Gunthorpe Signed-off-by: Dan Williams --- kernel/memremap.c | 55 +++++++++++++++++++++-------------------------------- 1 file changed, 22 insertions(+), 33 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index dda1367b385d..08344869e717 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -59,7 +59,7 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) struct vmem_altmap *altmap = &pgmap->altmap; unsigned long pfn; - pfn = res->start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); if (pgmap->altmap_valid) pfn += vmem_altmap_offset(altmap); return pfn; @@ -87,7 +87,6 @@ static void devm_memremap_pages_release(void *data) struct dev_pagemap *pgmap = data; struct device *dev = pgmap->dev; struct resource *res = &pgmap->res; - resource_size_t align_start, align_size; unsigned long pfn; int nid; @@ -96,25 +95,21 @@ static void devm_memremap_pages_release(void *data) put_page(pfn_to_page(pfn)); /* pages are dead and unused, undo the arch mapping */ - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); + nid = page_to_nid(pfn_to_page(PHYS_PFN(res->start))); mem_hotplug_begin(); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - pfn = align_start >> PAGE_SHIFT; + pfn = PHYS_PFN(res->start); __remove_pages(page_zone(pfn_to_page(pfn)), pfn, - align_size >> PAGE_SHIFT, NULL); + PHYS_PFN(resource_size(res)), NULL); } else { - arch_remove_memory(nid, align_start, align_size, + arch_remove_memory(nid, res->start, resource_size(res), pgmap->altmap_valid ? &pgmap->altmap : NULL); - kasan_remove_zero_shadow(__va(align_start), align_size); + kasan_remove_zero_shadow(__va(res->start), resource_size(res)); } mem_hotplug_done(); - untrack_pfn(NULL, PHYS_PFN(align_start), align_size); + untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); pgmap_array_delete(res); dev_WARN_ONCE(dev, pgmap->altmap.alloc, "%s: failed to free all reserved pages\n", __func__); @@ -141,7 +136,6 @@ static void devm_memremap_pages_release(void *data) */ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) { - resource_size_t align_start, align_size, align_end; struct vmem_altmap *altmap = pgmap->altmap_valid ? 
&pgmap->altmap : NULL; struct resource *res = &pgmap->res; @@ -152,26 +146,21 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) if (!pgmap->ref || !pgmap->kill) return ERR_PTR(-EINVAL); - align_start = res->start & ~(PA_SECTION_SIZE - 1); - align_size = ALIGN(res->start + resource_size(res), PA_SECTION_SIZE) - - align_start; - align_end = align_start + align_size - 1; - - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_start), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); return ERR_PTR(-ENOMEM); } - conflict_pgmap = get_dev_pagemap(PHYS_PFN(align_end), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL); if (conflict_pgmap) { dev_WARN(dev, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); return ERR_PTR(-ENOMEM); } - is_ram = region_intersects(align_start, align_size, + is_ram = region_intersects(res->start, resource_size(res), IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE); if (is_ram != REGION_DISJOINT) { @@ -192,8 +181,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) if (nid < 0) nid = numa_mem_id(); - error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(align_start), 0, - align_size); + error = track_pfn_remap(NULL, &pgprot, PHYS_PFN(res->start), 0, + resource_size(res)); if (error) goto err_pfn_remap; @@ -211,16 +200,16 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) * arch_add_memory(). */ if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - error = add_pages(nid, align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, NULL, false); + error = add_pages(nid, PHYS_PFN(res->start), + PHYS_PFN(resource_size(res)), NULL, false); } else { - error = kasan_add_zero_shadow(__va(align_start), align_size); + error = kasan_add_zero_shadow(__va(res->start), resource_size(res)); if (error) { mem_hotplug_done(); goto err_kasan; } - error = arch_add_memory(nid, align_start, align_size, altmap, + error = arch_add_memory(nid, res->start, resource_size(res), altmap, false); } @@ -228,8 +217,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) struct zone *zone; zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE]; - move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, altmap); + move_pfn_range_to_zone(zone, PHYS_PFN(res->start), + PHYS_PFN(resource_size(res)), altmap); } mem_hotplug_done(); @@ -241,8 +230,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) * to allow us to do the work while not holding the hotplug lock. 
*/ memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], - align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, pgmap); + PHYS_PFN(res->start), + PHYS_PFN(resource_size(res)), pgmap); percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap)); error = devm_add_action_or_reset(dev, devm_memremap_pages_release, @@ -253,9 +242,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) return __va(res->start); err_add_memory: - kasan_remove_zero_shadow(__va(align_start), align_size); + kasan_remove_zero_shadow(__va(res->start), resource_size(res)); err_kasan: - untrack_pfn(NULL, PHYS_PFN(align_start), align_size); + untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); err_pfn_remap: pgmap_array_delete(res); err_array: From patchwork Fri Mar 22 16:58:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 10866327 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3FA07922 for ; Fri, 22 Mar 2019 17:11:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D8E32A88F for ; Fri, 22 Mar 2019 17:11:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 11D212A8C0; Fri, 22 Mar 2019 17:11:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8D6172A88F for ; Fri, 22 Mar 2019 17:11:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7126D6B0266; Fri, 22 Mar 2019 13:11:24 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6E9736B0269; Fri, 22 Mar 2019 13:11:24 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5B4876B026A; Fri, 22 Mar 2019 13:11:24 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id 21A8E6B0266 for ; Fri, 22 Mar 2019 13:11:24 -0400 (EDT) Received: by mail-pf1-f199.google.com with SMTP id v16so2895302pfn.11 for ; Fri, 22 Mar 2019 10:11:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:subject:from :to:cc:date:message-id:in-reply-to:references:user-agent :mime-version:content-transfer-encoding; bh=Vg0Whlt5nWKS2UY2aDlAxifF3nfmLxQ9zBg1NLRGZtg=; b=i4DRHiHra+dyMhHCX3UBCId3nDT1qYIshrCGsksoGAj7LuOphUL1kiV4rbHa5+3zfb DvteW1oDE7lr4/u/sumM70W80ROG9uBHHh5IieSAIOpIQIVNmiS9cObtg6dAuxU0IG0R KVKuPIL8yo0lgyme4aTKZkcg5xsE3dfXHAiI39YPawObhKwa62hj/lGdVwGbLSXBTzZg WEJf6jxzZpzvCbX61axo0f9kAomW4x6zV3XP47J6Vit/CfjjaE75HmBH2rEw0dQFEGCj 9c8Znnf7q7UlO9G1w3wp6vp49Wiva63+0ThkpQxr6XgaRGh+VNHAJKm1bWUnuPBEv0Yu bOxg== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of dan.j.williams@intel.com designates 134.134.136.100 as permitted sender) 
Subject: [PATCH v5 09/10] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
From: Dan Williams
To: akpm@linux-foundation.org
Cc: stable@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Date: Fri, 22 Mar 2019 09:58:41 -0700
Message-ID: <155327392164.225273.1248065676074470935.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>

At namespace creation time there is the potential for the "expected to be zero" fields of a 'pfn' info-block to be filled with indeterminate data. While the kernel buffer is zeroed on allocation it is immediately overwritten by nd_pfn_validate() filling it with the current contents of the on-media info-block location. For fields like 'flags' and the 'padding', it potentially means that future implementations can not rely on those fields being zero. In preparation to stop using the 'start_pad' and 'end_trunc' fields for section alignment, arrange for fields that are not explicitly initialized to be guaranteed zero. Bump the minor version to indicate it is safe to assume the 'padding' and 'flags' are zero. Otherwise, this corruption is expected to be benign since all other critical fields are explicitly initialized. Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem") Cc: Signed-off-by: Dan Williams --- drivers/nvdimm/dax_devs.c | 2 +- drivers/nvdimm/pfn.h | 1 + drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++--- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c index 0453f49dc708..326f02ffca81 100644 --- a/drivers/nvdimm/dax_devs.c +++ b/drivers/nvdimm/dax_devs.c @@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns) nvdimm_bus_unlock(&ndns->dev); if (!dax_dev) return -ENOMEM; - pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); + pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL); nd_pfn->pfn_sb = pfn_sb; rc = nd_pfn_validate(nd_pfn, DAX_SIG); dev_dbg(dev, "dax: %s\n", rc == 0 ?
 drivers/nvdimm/dax_devs.c |  2 +-
 drivers/nvdimm/pfn.h      |  1 +
 drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 0453f49dc708..326f02ffca81 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!dax_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, DAX_SIG);
 	dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "");
diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..e901e3a3b04c 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
 	__le32 end_trunc;
 	/* minor-version-2 record the base alignment of the mapping */
 	__le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
 	u8 padding[4000];
 	__le64 checksum;
 };
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index d271bd731af7..f0e918186504 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(struct nd_pfn *nd_pfn)
 	return 0;
 }
 
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
 	u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
 	nvdimm_bus_unlock(&ndns->dev);
 	if (!pfn_dev)
 		return -ENOMEM;
-	pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
 	nd_pfn = to_nd_pfn(pfn_dev);
 	nd_pfn->pfn_sb = pfn_sb;
 	rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	u64 checksum;
 	int rc;
 
-	pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+	pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
 	if (!pfn_sb)
 		return -ENOMEM;
 
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		sig = DAX_SIG;
 	else
 		sig = PFN_SIG;
+
 	rc = nd_pfn_validate(nd_pfn, sig);
 	if (rc != -ENODEV)
 		return rc;
 
 	/* no info block, do init */;
+	memset(pfn_sb, 0, sizeof(*pfn_sb));
+
 	nd_region = to_nd_region(nd_pfn->dev.parent);
 	if (nd_region->ro) {
 		dev_info(&nd_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
-	pfn_sb->version_minor = cpu_to_le16(2);
+	pfn_sb->version_minor = cpu_to_le16(3);
 	pfn_sb->start_pad = cpu_to_le32(start_pad);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);

From patchwork Fri Mar 22 16:58:46 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10866331
Subject: [PATCH v5 10/10] libnvdimm/pfn: Stop padding pmem namespaces to section alignment
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Jeff Moyer , linux-mm@kvack.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Date: Fri, 22 Mar 2019 09:58:46 -0700
Message-ID: <155327392690.225273.7607286722861237500.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <155327387405.225273.9325594075351253804.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0

Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation time.
The kernel will still honor padding established by older kernels.

Reported-by: Jeff Moyer
Signed-off-by: Dan Williams
---
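[ For illustration only, not part of the patch: a self-contained sketch of
the sub-section alignment arithmetic this series relies on (see the
include/linux/mmzone.h hunk below). The 512-pages-per-sub-section value is
an assumption for the example; with 4K pages that corresponds to 2MiB
sub-sections, versus the 128MiB sections that the old start_pad/end_trunc
logic had to accommodate on x86-64. ]

#include <stdio.h>

#define PAGES_PER_SUB_SECTION 512UL	/* assumed: 2MiB / 4KiB pages */
#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION - 1))

#define SUB_SECTION_ALIGN_UP(pfn) \
	(((pfn) + PAGES_PER_SUB_SECTION - 1) & PAGE_SUB_SECTION_MASK)
#define SUB_SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUB_SECTION_MASK)

int main(void)
{
	unsigned long pfn = 0x12345;	/* an arbitrary, unaligned pfn */

	/* The altmap reservation now only reaches back to a sub-section
	 * boundary instead of a full section boundary. */
	printf("down: 0x%lx up: 0x%lx\n",
			SUB_SECTION_ALIGN_DOWN(pfn), SUB_SECTION_ALIGN_UP(pfn));
	return 0;
}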
 drivers/nvdimm/pfn.h      | 11 ++-----
 drivers/nvdimm/pfn_devs.c | 75 +++++++--------------------------------------
 include/linux/mmzone.h    |  4 ++
 3 files changed, 19 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index e901e3a3b04c..ae589cc528f2 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -41,18 +41,13 @@ struct nd_pfn_sb {
 	__le64 checksum;
 };
 
-#ifdef CONFIG_SPARSEMEM
-#define PFN_SECTION_ALIGN_DOWN(x) SECTION_ALIGN_DOWN(x)
-#define PFN_SECTION_ALIGN_UP(x) SECTION_ALIGN_UP(x)
-#else
 /*
  * In this case ZONE_DEVICE=n and we will disable 'pfn' device support,
  * but we still want pmem to compile.
  */
-#define PFN_SECTION_ALIGN_DOWN(x) (x)
-#define PFN_SECTION_ALIGN_UP(x) (x)
+#ifndef SUB_SECTION_ALIGN_DOWN
+#define SUB_SECTION_ALIGN_DOWN(x) (x)
+#define SUB_SECTION_ALIGN_UP(x) (x)
 #endif
 
-#define PHYS_SECTION_ALIGN_DOWN(x) PFN_PHYS(PFN_SECTION_ALIGN_DOWN(PHYS_PFN(x)))
-#define PHYS_SECTION_ALIGN_UP(x) PFN_PHYS(PFN_SECTION_ALIGN_UP(PHYS_PFN(x)))
 #endif /* __NVDIMM_PFN_H */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index f0e918186504..b7928bfc5691 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -595,14 +595,14 @@ static u32 info_block_reserve(void)
 }
 
 /*
- * We hotplug memory at section granularity, pad the reserved area from
- * the previous section base to the namespace base address.
+ * We hotplug memory at sub-section granularity, pad the reserved area
+ * from the previous section base to the namespace base address.
  */
 static unsigned long init_altmap_base(resource_size_t base)
 {
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	return PFN_SECTION_ALIGN_DOWN(base_pfn);
+	return SUB_SECTION_ALIGN_DOWN(base_pfn);
 }
 
 static unsigned long init_altmap_reserve(resource_size_t base)
@@ -610,7 +610,7 @@ static unsigned long init_altmap_reserve(resource_size_t base)
 	unsigned long reserve = info_block_reserve() >> PAGE_SHIFT;
 	unsigned long base_pfn = PHYS_PFN(base);
 
-	reserve += base_pfn - PFN_SECTION_ALIGN_DOWN(base_pfn);
+	reserve += base_pfn - SUB_SECTION_ALIGN_DOWN(base_pfn);
 	return reserve;
 }
 
@@ -641,8 +641,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
 		pgmap->altmap_valid = false;
 	} else if (nd_pfn->mode == PFN_MODE_PMEM) {
-		nd_pfn->npfns = PFN_SECTION_ALIGN_UP((resource_size(res)
-					- offset) / PAGE_SIZE);
+		nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset));
 		if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
 			dev_info(&nd_pfn->dev,
 					"number of pfns truncated from %lld to %ld\n",
@@ -658,50 +657,10 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 	return 0;
 }
 
-static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
-{
-	return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
-			ALIGN_DOWN(phys, nd_pfn->align));
-}
-
-/*
- * Check if pmem collides with 'System RAM', or other regions when
- * section aligned. Trim it accordingly.
- */
-static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trunc)
-{
-	struct nd_namespace_common *ndns = nd_pfn->ndns;
-	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	struct nd_region *nd_region = to_nd_region(nd_pfn->dev.parent);
-	const resource_size_t start = nsio->res.start;
-	const resource_size_t end = start + resource_size(&nsio->res);
-	resource_size_t adjust, size;
-
-	*start_pad = 0;
-	*end_trunc = 0;
-
-	adjust = start - PHYS_SECTION_ALIGN_DOWN(start);
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start - adjust, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| nd_region_conflict(nd_region, start - adjust, size))
-		*start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
-
-	/* Now check that end of the range does not collide. */
-	adjust = PHYS_SECTION_ALIGN_UP(end) - end;
-	size = resource_size(&nsio->res) + adjust;
-	if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
-				IORES_DESC_NONE) == REGION_MIXED
-			|| !IS_ALIGNED(end, nd_pfn->align)
-			|| nd_region_conflict(nd_region, start, size))
-		*end_trunc = end - phys_pmem_align_down(nd_pfn, end);
-}
-
 static int nd_pfn_init(struct nd_pfn *nd_pfn)
 {
 	struct nd_namespace_common *ndns = nd_pfn->ndns;
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
-	u32 start_pad, end_trunc, reserve = info_block_reserve();
 	resource_size_t start, size;
 	struct nd_region *nd_region;
 	struct nd_pfn_sb *pfn_sb;
@@ -736,43 +695,35 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	memset(pfn_sb, 0, sizeof(*pfn_sb));
-
-	trim_pfn_device(nd_pfn, &start_pad, &end_trunc);
-	if (start_pad + end_trunc)
-		dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
-				dev_name(&ndns->dev), start_pad + end_trunc);
-
 	/*
 	 * Note, we use 64 here for the standard size of struct page,
 	 * debugging options may cause it to be larger in which case the
 	 * implementation will limit the pfns advertised through
 	 * ->direct_access() to those that are included in the memmap.
 	 */
-	start = nsio->res.start + start_pad;
+	start = nsio->res.start;
 	size = resource_size(&nsio->res);
-	npfns = PFN_SECTION_ALIGN_UP((size - start_pad - end_trunc - reserve)
-			/ PAGE_SIZE);
+	npfns = PHYS_PFN(size - SZ_8K);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 */
-		offset = ALIGN(start + reserve + 64 * npfns,
-				max(nd_pfn->align, PMD_SIZE)) - start;
+		offset = ALIGN(start + SZ_8K + 64 * npfns,
+				max(nd_pfn->align, SECTION_ACTIVE_SIZE)) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
-		offset = ALIGN(start + reserve, nd_pfn->align) - start;
+		offset = ALIGN(start + SZ_8K, nd_pfn->align) - start;
 	else
 		return -ENXIO;
 
-	if (offset + start_pad + end_trunc >= size) {
+	if (offset >= size) {
 		dev_err(&nd_pfn->dev, "%s unable to satisfy requested alignment\n",
 				dev_name(&ndns->dev));
 		return -ENXIO;
 	}
 
-	npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+	npfns = PHYS_PFN(size - offset);
 	pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
 	pfn_sb->dataoff = cpu_to_le64(offset);
 	pfn_sb->npfns = cpu_to_le64(npfns);
@@ -781,8 +732,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
 	pfn_sb->version_major = cpu_to_le16(1);
 	pfn_sb->version_minor = cpu_to_le16(3);
-	pfn_sb->start_pad = cpu_to_le32(start_pad);
-	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 067ee217c692..3117aa9d0f33 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1114,6 +1114,10 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
 #define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))
 
+#define SUB_SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SUB_SECTION - 1) \
+	& PAGE_SUB_SECTION_MASK)
+#define SUB_SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUB_SECTION_MASK)
+
 struct mem_section_usage {
 	/*
 	 * SECTION_ACTIVE_SIZE portions of the section that are populated in