From patchwork Wed May 23 15:11:42 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: David Hildenbrand <david@redhat.com>
X-Patchwork-Id: 10421743
From: David Hildenbrand <david@redhat.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, David Hildenbrand, Greg Kroah-Hartman,
    Ingo Molnar, Andrew Morton, Pavel Tatashin, Philippe Ombredanne,
    Thomas Gleixner, Dan Williams, Michal Hocko, Jan Kara,
Shutemov" , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Matthew Wilcox , Souptick Joarder , Hugh Dickins , Huang Ying , Miles Chen , Vlastimil Babka , Reza Arbab , Mel Gorman , Tetsuo Handa Subject: [PATCH v1 01/10] mm: introduce and use PageOffline() Date: Wed, 23 May 2018 17:11:42 +0200 Message-Id: <20180523151151.6730-2-david@redhat.com> In-Reply-To: <20180523151151.6730-1-david@redhat.com> References: <20180523151151.6730-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Wed, 23 May 2018 15:12:09 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Wed, 23 May 2018 15:12:09 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'david@redhat.com' RCPT:'' X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP offline_pages() theoretically works on sub-section sizes. Problem is that we have no way to know which pages are actually offline. So right now, offline_pages() will always mark the whole section as offline. In addition, in virtualized environments, we might soon have pages that are logically offline and shall no longer be read or written - e.g. because we offline a subsection and told our hypervisor to remove it. We need a way (e.g. for kdump) to flag these pages (like PG_hwpoison), otherwise kdump will happily access all memory and crash the system when accessing memory that is not meant to be accessed. Marking pages as offline will also later to give kdump that information and to mark a section as offline once all pages are offline. It is save to use mapcount as all pages are logically removed from the system (offline_pages()). This e.g. allows us to add/remove memory to Linux in a VM in 4MB chunks Please note that we can't use PG_reserved for this. PG_reserved does not imply that - a page should not be dumped - a page is offline and we should mark the section offline Cc: Greg Kroah-Hartman Cc: Ingo Molnar Cc: Andrew Morton Cc: Pavel Tatashin Cc: Philippe Ombredanne Cc: Thomas Gleixner Cc: Dan Williams Cc: Michal Hocko Cc: Jan Kara Cc: "Kirill A. 
Shutemov" Cc: "Jérôme Glisse" Cc: Matthew Wilcox Cc: Souptick Joarder Cc: Hugh Dickins Cc: Huang Ying Cc: Miles Chen Cc: Vlastimil Babka Cc: Reza Arbab Cc: Mel Gorman Cc: Tetsuo Handa Signed-off-by: David Hildenbrand --- drivers/base/node.c | 1 - include/linux/memory.h | 1 - include/linux/mm.h | 10 ++++++++++ include/linux/page-flags.h | 9 +++++++++ mm/memory_hotplug.c | 32 +++++++++++++++++++++++--------- mm/page_alloc.c | 32 ++++++++++++++++++-------------- mm/sparse.c | 25 ++++++++++++++++++++++++- 7 files changed, 84 insertions(+), 26 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 7a3a580821e0..58a889b2b2f4 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -408,7 +408,6 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid, if (!mem_blk) return -EFAULT; - mem_blk->nid = nid; if (!node_online(nid)) return 0; diff --git a/include/linux/memory.h b/include/linux/memory.h index 31ca3e28b0eb..9f8cd856ca1e 100644 --- a/include/linux/memory.h +++ b/include/linux/memory.h @@ -33,7 +33,6 @@ struct memory_block { void *hw; /* optional pointer to fw/hw data */ int (*phys_callback)(struct memory_block *); struct device dev; - int nid; /* NID for this memory block */ }; int arch_get_memory_phys_device(unsigned long start_pfn); diff --git a/include/linux/mm.h b/include/linux/mm.h index c6fa9a255dbf..197c251be775 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1111,7 +1111,15 @@ static inline void set_page_address(struct page *page, void *address) { page->virtual = address; } +static void set_page_virtual(struct page *page, and enum zone_type zone) +{ + /* The shift won't overflow because ZONE_NORMAL is below 4G. */ + if (!is_highmem_idx(zone)) + set_page_address(page, __va(pfn << PAGE_SHIFT)); +} #define page_address_init() do { } while(0) +#else +#define set_page_virtual(page, zone) do { } while(0) #endif #if defined(HASHED_PAGE_VIRTUAL) @@ -2063,6 +2071,8 @@ extern unsigned long find_min_pfn_with_active_regions(void); extern void free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn); extern void sparse_memory_present_with_active_regions(int nid); +extern void __meminit init_single_page(struct page *page, unsigned long pfn, + unsigned long zone, int nid); #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e34a27727b9a..07ec6e48073b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -686,6 +686,15 @@ PAGE_MAPCOUNT_OPS(Balloon, BALLOON) #define PAGE_KMEMCG_MAPCOUNT_VALUE (-512) PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG) +/* + * PageOffline() indicates that a page is offline (either never online via + * online_pages() or offlined via offline_pages()). Nobody in the system + * should have a reference to these pages. In virtual environments, + * the backing storage might already have been removed. Don't touch! 
+ */
+#define PAGE_OFFLINE_MAPCOUNT_VALUE	(-1024)
+PAGE_MAPCOUNT_OPS(Offline, OFFLINE)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f74826cdceea..7f7bd2acb55b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,6 +250,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		struct vmem_altmap *altmap, bool want_memblock)
 {
 	int ret;
+	int i;
 
 	if (pfn_valid(phys_start_pfn))
 		return -EEXIST;
@@ -258,6 +259,25 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (ret < 0)
 		return ret;
 
+	/*
+	 * Mark all the pages in the section as offline before creating the
+	 * memblock and onlining any sub-sections (and therefore marking the
+	 * whole section as online). Mark them reserved so nobody will stumble
+	 * over a half initialized state.
+	 */
+	for (i = 0; i < PAGES_PER_SECTION; i++) {
+		unsigned long pfn = phys_start_pfn + i;
+		struct page *page;
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+
+		/* dummy zone, the actual one will be set when onlining pages */
+		init_single_page(page, pfn, ZONE_NORMAL, nid);
+		SetPageReserved(page);
+		__SetPageOffline(page);
+	}
+
 	if (!want_memblock)
 		return 0;
 
@@ -651,6 +671,7 @@ EXPORT_SYMBOL_GPL(__online_page_increment_counters);
 
 void __online_page_free(struct page *page)
 {
+	__ClearPageOffline(page);
 	__free_reserved_page(page);
 }
 EXPORT_SYMBOL_GPL(__online_page_free);
 
@@ -891,15 +912,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
 	int nid;
 	int ret;
 	struct memory_notify arg;
-	struct memory_block *mem;
-
-	/*
-	 * We can't use pfn_to_nid() because nid might be stored in struct page
-	 * which is not yet initialized. Instead, we find nid from memory block.
-	 */
-	mem = find_memory_block(__pfn_to_section(pfn));
-	nid = mem->nid;
+	nid = pfn_to_nid(pfn);
 
 	/* associate pfn range with the zone */
 	zone = move_pfn_range(online_type, nid, pfn, nr_pages);
@@ -1426,7 +1440,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 /*
- * remove from free_area[] and mark all as Reserved.
+ * remove from free_area[] and mark all as Reserved and Offline.
 */
 static int
 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 905db9d7962f..cd75686ff1d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1171,7 +1171,7 @@ static void free_one_page(struct zone *zone,
 	spin_unlock(&zone->lock);
 }
 
-static void __meminit __init_single_page(struct page *page, unsigned long pfn,
+void __meminit init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
 	mm_zero_struct_page(page);
@@ -1181,11 +1181,7 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	page_cpupid_reset_last(page);
 
 	INIT_LIST_HEAD(&page->lru);
-#ifdef WANT_PAGE_VIRTUAL
-	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-	if (!is_highmem_idx(zone))
-		set_page_address(page, __va(pfn << PAGE_SHIFT));
-#endif
+	set_page_virtual(page, zone);
 }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -1206,7 +1202,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
 		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
 			break;
 	}
-	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
+	init_single_page(pfn_to_page(pfn), pfn, zid, nid);
 }
 #else
 static inline void init_reserved_page(unsigned long pfn)
@@ -1523,7 +1519,7 @@ static unsigned long __init deferred_init_pages(int nid, int zid,
 		} else {
 			page++;
 		}
-		__init_single_page(page, pfn, zid, nid);
+		init_single_page(page, pfn, zid, nid);
 		nr_pages++;
 	}
 	return (nr_pages);
@@ -5514,9 +5510,13 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 not_early:
 		page = pfn_to_page(pfn);
-		__init_single_page(page, pfn, zone, nid);
-		if (context == MEMMAP_HOTPLUG)
-			SetPageReserved(page);
+		if (context == MEMMAP_HOTPLUG) {
+			/* everything but the zone was initialized */
+			set_page_zone(page, zone);
+			set_page_virtual(page, zone);
+		}
+		else
+			init_single_page(page, pfn, zone, nid);
 
 		/*
 		 * Mark the block movable so that blocks are reserved for
@@ -6404,7 +6404,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 #ifdef CONFIG_HAVE_MEMBLOCK
 	/*
 	 * Only struct pages that are backed by physical memory are zeroed and
-	 * initialized by going through __init_single_page(). But, there are some
+	 * initialized by going through init_single_page(). But, there are some
 	 * struct pages which are reserved in memblock allocator and their fields
 	 * may be accessed (for example page_to_pfn() on some configuration accesses
 	 * flags). We must explicitly zero those struct pages.
@@ -8005,7 +8005,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			break;
 	if (pfn == end_pfn)
 		return;
-	offline_mem_sections(pfn, end_pfn);
 	zone = page_zone(pfn_to_page(pfn));
 	spin_lock_irqsave(&zone->lock, flags);
 	pfn = start_pfn;
@@ -8022,11 +8021,13 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
 			pfn++;
 			SetPageReserved(page);
+			__SetPageOffline(page);
 			continue;
 		}
 
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
+		BUG_ON(PageOffline(page));
 		order = page_order(page);
#ifdef CONFIG_DEBUG_VM
 		pr_info("remove from free list %lx %d %lx\n",
@@ -8035,11 +8036,14 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		list_del(&page->lru);
 		rmv_page_order(page);
 		zone->free_area[order].nr_free--;
-		for (i = 0; i < (1 << order); i++)
+		for (i = 0; i < (1 << order); i++) {
 			SetPageReserved((page+i));
+			__SetPageOffline(page + i);
+		}
 		pfn += (1 << order);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
+	offline_mem_sections(start_pfn, end_pfn);
 }
 #endif
 
diff --git a/mm/sparse.c b/mm/sparse.c
index 73dc2fcc0eab..645d8a4e41a0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -623,7 +623,24 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-/* Mark all memory sections within the pfn range as online */
+static bool all_pages_in_section_offline(unsigned long section_nr)
+{
+	unsigned long pfn = section_nr_to_pfn(section_nr);
+	struct page *page;
+	int i;
+
+	for (i = 0; i < PAGES_PER_SECTION; i++, pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+		if (!PageOffline(page))
+			return false;
+	}
+	return true;
+}
+
+/* Try to mark all memory sections within the pfn range as offline */
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
@@ -639,6 +656,12 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 		if (WARN_ON(!valid_section_nr(section_nr)))
 			continue;
 
+		/* if we don't cover whole sections, check all pages */
+		if ((section_nr_to_pfn(section_nr) != start_pfn ||
+		     start_pfn + PAGES_PER_SECTION >= end_pfn) &&
+		    !all_pages_in_section_offline(section_nr))
+			continue;
+
 		ms = __nr_to_section(section_nr);
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
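
[Editor's note] Background on the mechanism used above: PAGE_MAPCOUNT_OPS()
encodes a page "type" as a special negative value in the otherwise unused
page->_mapcount field (e.g. -512 for Kmemcg and, with this patch, -1024 for
Offline), and the generated PageOffline()/__SetPageOffline()/
__ClearPageOffline() helpers simply compare or store that value. The
following is a minimal, self-contained userspace sketch of that encoding;
struct fake_page and the fake_* helpers are invented for illustration and
are not kernel APIs.

/*
 * Illustrative sketch only -- models the mapcount-based page typing that
 * PAGE_MAPCOUNT_OPS(Offline, OFFLINE) generates. Names are made up for
 * this example and do not exist in the kernel.
 */
#include <assert.h>
#include <stdbool.h>

#define PAGE_OFFLINE_MAPCOUNT_VALUE	(-1024)

struct fake_page {
	int _mapcount;		/* -1 means "no mappings, no special type" */
};

static bool fake_PageOffline(const struct fake_page *page)
{
	return page->_mapcount == PAGE_OFFLINE_MAPCOUNT_VALUE;
}

static void fake_SetPageOffline(struct fake_page *page)
{
	/* only pages without any mappings may carry a type marker */
	assert(page->_mapcount == -1);
	page->_mapcount = PAGE_OFFLINE_MAPCOUNT_VALUE;
}

static void fake_ClearPageOffline(struct fake_page *page)
{
	assert(fake_PageOffline(page));
	page->_mapcount = -1;
}

int main(void)
{
	struct fake_page page = { ._mapcount = -1 };

	/* roughly what __add_section() does when a section is added ... */
	fake_SetPageOffline(&page);
	assert(fake_PageOffline(&page));

	/* ... and what __online_page_free() undoes when the page goes online */
	fake_ClearPageOffline(&page);
	assert(!fake_PageOffline(&page));
	return 0;
}

A dump tool can apply the same test to skip such pages, analogous to how
PG_hwpoison'ed pages are handled, which is the kdump use case motivating
the patch.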