From patchwork Tue Jan 22 10:37:05 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10775333
From: Oscar Salvador
To: linux-mm@kvack.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, Pavel.Tatashin@microsoft.com,
    david@redhat.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com,
    Oscar Salvador
Subject: [RFC PATCH v2 1/4] mm, memory_hotplug: cleanup memory offline path
Date: Tue, 22 Jan 2019 11:37:05 +0100
Message-Id: <20190122103708.11043-2-osalvador@suse.de>
In-Reply-To: <20190122103708.11043-1-osalvador@suse.de>
References: <20190122103708.11043-1-osalvador@suse.de>

From: Michal Hocko

check_pages_isolated_cb currently accounts the whole pfn range as being
offlined if test_pages_isolated succeeds on the range. This is based on
the assumption that all pages in the range are freed, which currently
holds in most cases, but it will not with later changes. I haven't
double checked, but if the range contains invalid pfns we could
theoretically over-account and underflow the zone's managed pages.

Move the offlined pages counting to offline_isolated_pages_cb and rely
on __offline_isolated_pages to return the correct value.
check_pages_isolated_cb will still do its primary job and check the pfn
range.

While we are at it, remove check_pages_isolated and
offline_isolated_pages and use walk_system_ram_range directly, as is
done in online_pages.
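For illustration, the walk_system_ram_range() contract that both new
call sites rely on looks roughly like this (a minimal sketch with
made-up helper names, not part of the patch): the callback runs once per
contiguous block of System RAM pfns, receives the opaque pointer
unchanged, and a non-zero return aborts the walk.

static int count_ram_pages_cb(unsigned long start_pfn,
			      unsigned long nr_pages, void *data)
{
	/* Fold this System RAM block into the caller-provided counter. */
	*(unsigned long *)data += nr_pages;
	return 0;	/* a non-zero return value stops the walk */
}

static unsigned long count_ram_pages(unsigned long start_pfn,
				     unsigned long end_pfn)
{
	unsigned long nr = 0;

	/* Only ranges recorded as System RAM resources are visited. */
	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &nr,
			      count_ram_pages_cb);
	return nr;
}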
Signed-off-by: Michal Hocko
Signed-off-by: Oscar Salvador
---
 include/linux/memory_hotplug.h |  2 +-
 mm/memory_hotplug.c            | 45 +++++++++++-------------------------------
 mm/page_alloc.c                | 11 +++++++++--
 3 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d56bfbacf7d6..1a230dde6027 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -85,7 +85,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern int online_pages(unsigned long, unsigned long, int);
 extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
	unsigned long *valid_start, unsigned long *valid_end);
-extern void __offline_isolated_pages(unsigned long, unsigned long);
+extern unsigned long __offline_isolated_pages(unsigned long, unsigned long);

 typedef int (*online_page_callback_t)(struct page *page, unsigned int order);

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ec22c86d9f89..6efa44087b37 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1451,17 +1451,12 @@ static int
 offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
			void *data)
 {
-	__offline_isolated_pages(start, start + nr_pages);
+	unsigned long offlined_pages;
+
+	offlined_pages = __offline_isolated_pages(start, start + nr_pages);
+	*(unsigned long *)data += offlined_pages;
	return 0;
 }

-static void
-offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
-{
-	walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
-				offline_isolated_pages_cb);
-}
-
 /*
  * Check all pages in range, recorded as memory resource, are isolated.
  */
@@ -1469,26 +1464,7 @@ static int
 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
			void *data)
 {
-	int ret;
-	long offlined = *(long *)data;
-
-	ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
-	offlined = nr_pages;
-	if (!ret)
-		*(long *)data += offlined;
-	return ret;
-}
-
-static long
-check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
-{
-	long offlined = 0;
-	int ret;
-
-	ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined,
-			check_pages_isolated_cb);
-	if (ret < 0)
-		offlined = (long)ret;
-	return offlined;
+	return test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
 }

 static int __init cmdline_parse_movable_node(char *p)
@@ -1573,7 +1549,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
		unsigned long end_pfn)
 {
	unsigned long pfn, nr_pages;
-	long offlined_pages;
+	unsigned long offlined_pages = 0;
	int ret, node;
	unsigned long flags;
	unsigned long valid_start, valid_end;
@@ -1650,13 +1626,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
			goto failed_removal_isolated;
		}
		/* check again */
-		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	} while (offlined_pages < 0);
+		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
+					check_pages_isolated_cb);
+	} while (ret);

-	pr_info("Offlined Pages %ld\n", offlined_pages);
	/* Ok, all of our target is isolated.
	   We cannot do rollback at this point.
	 */
-	offline_isolated_pages(start_pfn, end_pfn);
+	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages,
+				offline_isolated_pages_cb);
+
+	pr_info("Offlined Pages %ld\n", offlined_pages);
	/* reset pagetype flags and makes migrate type to be MOVABLE */
	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
	/* removal success */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7a521971a05..cad7468a0f20 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8479,7 +8479,7 @@ void zone_pcp_reset(struct zone *zone)
  * All pages in the range must be in a single zone and isolated
  * before calling this.
  */
-void
+unsigned long
 __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
	struct page *page;
@@ -8487,12 +8487,15 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
	unsigned int order, i;
	unsigned long pfn;
	unsigned long flags;
+	unsigned long offlined_pages = 0;
+
	/* find the first valid pfn */
	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (pfn_valid(pfn))
			break;
	if (pfn == end_pfn)
-		return;
+		return offlined_pages;
+
	offline_mem_sections(pfn, end_pfn);
	zone = page_zone(pfn_to_page(pfn));
	spin_lock_irqsave(&zone->lock, flags);
@@ -8510,12 +8513,14 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
			pfn++;
			SetPageReserved(page);
+			offlined_pages++;
			continue;
		}

		BUG_ON(page_count(page));
		BUG_ON(!PageBuddy(page));
		order = page_order(page);
+		offlined_pages += 1 << order;
#ifdef CONFIG_DEBUG_VM
		pr_info("remove from free list %lx %d %lx\n",
			pfn, 1 << order, end_pfn);
@@ -8528,6 +8533,8 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
		pfn += (1 << order);
	}
	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return offlined_pages;
 }
 #endif

From patchwork Tue Jan 22 10:37:06 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10775335
From: Oscar Salvador
To: linux-mm@kvack.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, Pavel.Tatashin@microsoft.com,
    david@redhat.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com,
    Oscar Salvador
Subject: [RFC PATCH v2 2/4] mm, memory_hotplug: provide a more generic restrictions for memory hotplug
Date: Tue, 22 Jan 2019 11:37:06 +0100
Message-Id: <20190122103708.11043-3-osalvador@suse.de>
In-Reply-To: <20190122103708.11043-1-osalvador@suse.de>
References: <20190122103708.11043-1-osalvador@suse.de>

From: Michal Hocko

arch_add_memory and __add_pages take a want_memblock parameter which
controls whether the newly added memory should get the sysfs memblock
user API (e.g. ZONE_DEVICE users do not want/need this interface). Some
callers also want to control where the memmap is allocated from, by
configuring an altmap.

Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags, which carry additional features
to be enabled by the memory hotplug (currently only MHP_MEMBLOCK_API),
and an altmap for an alternative memmap allocator.

Please note that the complete altmap propagation down to the vmemmap
code is still not done in this patch. It will be done in a follow-up to
reduce the churn here.

This patch shouldn't introduce any functional change.
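To illustrate the intended calling convention, here is a sketch
distilled from the call sites changed below (not additional patch
content; nid/start/size/altmap are assumed to come from the caller):

	/* Regular hotplug wants the sysfs memblock user API: */
	struct mhp_restrictions restrictions = {};

	restrictions.flags = MHP_MEMBLOCK_API;
	ret = arch_add_memory(nid, start, size, &restrictions);

	/*
	 * ZONE_DEVICE-style users pass no flags and, optionally, an
	 * alternative memmap allocator:
	 */
	struct mhp_restrictions dev_restrictions = {};

	dev_restrictions.altmap = altmap;
	error = add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
			  &dev_restrictions);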
Signed-off-by: Michal Hocko
Signed-off-by: Oscar Salvador
---
 arch/arm64/mm/mmu.c            |  5 ++---
 arch/ia64/mm/init.c            |  5 ++---
 arch/powerpc/mm/mem.c          |  6 +++---
 arch/s390/mm/init.c            |  6 +++---
 arch/sh/mm/init.c              |  6 +++---
 arch/x86/mm/init_32.c          |  6 +++---
 arch/x86/mm/init_64.c          | 10 +++++-----
 include/linux/memory_hotplug.h | 25 +++++++++++++++++++------
 kernel/memremap.c              |  9 ++++++---
 mm/memory_hotplug.c            | 10 ++++++----
 10 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b6f5aa52ac67..3926969f9187 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1049,8 +1049,7 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
 }

 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		    bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions)
 {
	int flags = 0;

@@ -1061,6 +1060,6 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);

	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
-			   altmap, want_memblock);
+			   restrictions);
 }
 #endif
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 29d841525ca1..f7bacfde1b7c 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -644,14 +644,13 @@ mem_init (void)
 }

 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		    bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
	int ret;

-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
	if (ret)
		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
		       __func__, ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 33cc6f676fa6..30a2a9b668d7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -117,8 +117,8 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
	return -ENODEV;
 }

-int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+int __meminit arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
	}
	flush_inval_dcache_range(start, start + size);

-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }

 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 3e82f66d5c61..9ae71a82e9e1 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -224,8 +224,8 @@ device_initcall(s390_cma_mem_init);

 #endif /* CONFIG_CMA */

-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = PFN_DOWN(start);
	unsigned long size_pages = PFN_DOWN(size);
@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
	if (rc)
		return rc;

-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, restrictions);
	if (rc)
		vmem_remove_mapping(start, size);
	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index a0fa4de03dd5..000232933934 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -410,15 +410,15 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 #endif

 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = PFN_DOWN(start);
	unsigned long nr_pages = size >> PAGE_SHIFT;
	int ret;

	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
	if (unlikely(ret))
		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 85c94f9a87f8..755dbed85531 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -850,13 +850,13 @@ void __init mem_init(void)
 }

 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;

-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }

 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bccff68e3267..db42c11b48fb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -777,11 +777,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 }

 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-	      struct vmem_altmap *altmap, bool want_memblock)
+	      struct mhp_restrictions *restrictions)
 {
	int ret;

-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
	WARN_ON_ONCE(ret);

	/* update max_pfn, max_low_pfn and high_memory */
@@ -791,15 +791,15 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
	return ret;
 }

-int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+int arch_add_memory(int nid, u64 start, u64 size,
+		struct mhp_restrictions *restrictions)
 {
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;

	init_memory_mapping(start, start + size);

-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, restrictions);
 }

 #define PAGE_INUSE 0xFD
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1a230dde6027..4e0d75b17715 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -113,20 +113,33 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
	unsigned long nr_pages, struct vmem_altmap *altmap);
 #endif /* CONFIG_MEMORY_HOTREMOVE */

+/*
+ * Do we want sysfs memblock files created. This will allow userspace to online
+ * and offline memory explicitly. Lack of this bit means that the caller has to
+ * call move_pfn_range_to_zone to finish the initialization.
+ */
+
+#define MHP_MEMBLOCK_API 1<<0
+
+/* Restrictions for the memory hotplug */
+struct mhp_restrictions {
+	unsigned long flags;		/* MHP_ flags */
+	struct vmem_altmap *altmap;	/* use this alternative allocator for memmaps */
+};
+
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct mhp_restrictions *restrictions);

 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		unsigned long nr_pages, struct mhp_restrictions *restrictions)
 {
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, restrictions);
 }
 #else /* ARCH_HAS_ADD_PAGES */
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-	      struct vmem_altmap *altmap, bool want_memblock);
+	      struct mhp_restrictions *restrictions);
 #endif /* ARCH_HAS_ADD_PAGES */

 #ifdef CONFIG_NUMA
@@ -328,7 +341,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern int arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct mhp_restrictions *restrictions);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..d42f11673979 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -149,6 +149,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
	struct resource *res = &pgmap->res;
	struct dev_pagemap *conflict_pgmap;
	pgprot_t pgprot = PAGE_KERNEL;
+	struct mhp_restrictions restrictions = {};
	int error, nid, is_ram;

	if (!pgmap->ref || !pgmap->kill)
@@ -199,6 +200,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
	if (error)
		goto err_pfn_remap;

+	/* We do not want any optional features, only our own memmap */
+	restrictions.altmap = altmap;
+
	mem_hotplug_begin();

	/*
@@ -214,7 +218,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
	 */
	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
+				align_size >> PAGE_SHIFT, &restrictions);
	} else {
		error = kasan_add_zero_shadow(__va(align_start), align_size);
		if (error) {
			goto err_kasan;
		}

-		error = arch_add_memory(nid, align_start, align_size, altmap,
-				false);
+		error = arch_add_memory(nid, align_start, align_size, &restrictions);
	}

	if (!error) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6efa44087b37..8313279136ff 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -271,12 +271,12 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * add the new pages.
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		unsigned long nr_pages, struct mhp_restrictions *restrictions)
 {
	unsigned long i;
	int err = 0;
	int start_sec, end_sec;
+	struct vmem_altmap *altmap = restrictions->altmap;

	/* during initialize mem_map, align hot-added range to section */
	start_sec = pfn_to_section_nr(phys_start_pfn);
@@ -297,7 +297,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,

	for (i = start_sec; i <= end_sec; i++) {
		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				want_memblock);
+				restrictions->flags & MHP_MEMBLOCK_API);

		/*
		 * EEXIST is finally dealt with by ioresource collision
@@ -1108,6 +1108,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
	u64 start, size;
	bool new_node = false;
	int ret;
+	struct mhp_restrictions restrictions = {};

	start = res->start;
	size = resource_size(res);
@@ -1132,7 +1133,8 @@ int __ref add_memory_resource(int nid, struct resource *res)
	new_node = ret;

	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, true);
+	restrictions.flags = MHP_MEMBLOCK_API;
+	ret = arch_add_memory(nid, start, size, &restrictions);
	if (ret < 0)
		goto error;

From patchwork Tue Jan 22 10:37:07 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10775339
From: Oscar Salvador
To: linux-mm@kvack.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, Pavel.Tatashin@microsoft.com,
    david@redhat.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com,
    Oscar Salvador
Subject: [RFC PATCH v2 3/4] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
Date: Tue, 22 Jan 2019 11:37:07 +0100
Message-Id: <20190122103708.11043-4-osalvador@suse.de>
In-Reply-To: <20190122103708.11043-1-osalvador@suse.de>
References: <20190122103708.11043-1-osalvador@suse.de>

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations. This has some disadvantages:

a) existing memory is consumed for that purpose (~2MB per 128MB memory
   section on x86_64)
b) if the whole node is movable, then we have off-node struct pages,
   which has performance drawbacks.

a) has turned out to be a problem for memory-hotplug-based ballooning,
because userspace might not react in time to online memory while the
memory consumed during the physical hotadd is enough to push the system
to OOM. 31bc3858ea3e ("memory-hotplug: add automatic onlining policy for
the newly added memory") has been added to work around that problem.

I have also seen hot-add operations failing on powerpc due to the fact
that we try to use order-8 pages. If the base page size is 64KB, this
gives us 16MB, and if we run out of those, we simply fail. One could
argue that we can fall back to basepages as we do on x86_64, but we can
do much better when CONFIG_SPARSEMEM_VMEMMAP=y, because vmemmap page
tables can map arbitrary memory. That means that we can simply use the
beginning of each memory section and map struct pages there. struct
pages which back the allocated space then just need to be treated
carefully.

Add {_Set,_Clear}PageVmemmap helpers to distinguish those pages in pfn
walkers. We do not have any spare page flag for this purpose, so use the
combination of the PageReserved bit, which already tells that the page
should be ignored by the core mm code, and store VMEMMAP_PAGE (which
sets all bits but PAGE_MAPPING_FLAGS) into page->mapping.

On the memory hotplug front, add a new MHP_MEMMAP_FROM_RANGE restriction
flag. The user is supposed to set the flag if the memmap should be
allocated from the hotadded range. Please note that this is just a hint
and architecture code can veto it if this cannot be supported.
E.g. s390 cannot support this currently, because there the physical
memory range is made accessible only during memory online.

Implementation wise, we reuse the vmem_altmap infrastructure to override
the default allocator used by __vmemmap_populate. Once the memmap is
allocated, we need a way to mark altmap pfns used for the allocation.
For this, we define an init() and a constructor() callback in the
mhp_restrictions structure. init() now points to init_altmap_memmap(),
and constructor() to mark_vmemmap_pages(). init_altmap_memmap() takes
care of checking the flags and initializes the vmem_altmap structure
with the required fields. mark_vmemmap_pages() takes care of marking
the pages as Vmemmap and initializes some fields we need.

The current layout of the Vmemmap pages is:

- The head Vmemmap page (the first one) has the following fields set:
  * page->_refcount: number of sections that used this altmap
  * page->private: total number of vmemmap pages
- The remaining vmemmap pages have:
  * page->freelist: pointer to the head vmemmap page

This is done to ease the computations we need in some places. So, let
us say we hot-add 9GB on x86_64 (9GB is 72 sections of 128MB, and each
section needs 2MB of memmap, i.e. 512 pages):

head->_refcount = 72 sections
head->private = 36864 vmemmap pages
tail's page->freelist = head

We keep a _refcount of the used sections to know how long we have to
defer the call to vmemmap_free(). The thing is that the first pages of
the hot-added range are used to create the memmap mapping, so we cannot
remove those first, otherwise we would blow up. Since sections are
removed sequentially when we hot-remove a memory range, we wait until
we hit the last section, and then we free the whole range backwards via
vmemmap_free. We know that it is the last section because in every pass
we decrease head->_refcount, and when it reaches 0, we got our last
section.

We also have to be careful about those pages during online and offline
operations. They are simply skipped now, so online will keep them
reserved, and so unusable for any other purpose, and offline ignores
them, so they do not block the offline operation.

Please note that only memory hotplug is currently using this allocation
scheme. The boot time memmap allocation could use the same trick as
well, but this is not done yet.
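To make the layout above concrete, this is roughly how a pfn walker can
recover the number of vmemmap pages left to skip from any page inside
the region; it mirrors the check_nr_vmemmap_pages()/vmemmap_get_head()
helpers added below (a sketch for illustration, not extra patch
content):

	static unsigned long vmemmap_pages_left(struct page *page)
	{
		/* Every vmemmap page points back at the head via page->freelist. */
		struct page *head = vmemmap_get_head(page);
		/* The head page stores the total number of vmemmap pages. */
		unsigned long total = page_private(head);

		/* Pages remaining from 'page' up to the end of the vmemmap region. */
		return total - (page - head);
	}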
Signed-off-by: Oscar Salvador
---
 arch/arm64/mm/mmu.c            |   5 +-
 arch/powerpc/mm/init_64.c      |   7 +++
 arch/s390/mm/init.c            |   6 ++
 arch/x86/mm/init_64.c          |  10 ++++
 drivers/hv/hv_balloon.c        |   1 +
 drivers/xen/balloon.c          |   1 +
 include/linux/memory_hotplug.h |  23 +++++--
 include/linux/memremap.h       |   2 +-
 include/linux/page-flags.h     |  23 +++++++
 mm/compaction.c                |   8 +++
 mm/memory_hotplug.c            | 133 ++++++++++++++++++++++++++++++++++++-----
 mm/page_alloc.c                |  36 ++++++++++-
 mm/page_isolation.c            |  13 ++++
 mm/sparse.c                    | 108 +++++++++++++++++++++++++++++++++
 mm/util.c                      |   2 +
 15 files changed, 354 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 3926969f9187..c4eb6d96d088 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -749,7 +749,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
		if (pmd_none(READ_ONCE(*pmdp))) {
			void *p = NULL;

-			p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+			if (altmap)
+				p = altmap_alloc_block_buf(PMD_SIZE, altmap);
+			else
+				p = vmemmap_alloc_block_buf(PMD_SIZE, node);
			if (!p)
				return -ENOMEM;

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a5091c034747..d8b487a6f019 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -296,6 +296,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,

		if (base_pfn >= alt_start && base_pfn < alt_end) {
			vmem_altmap_free(altmap, nr_pages);
+		} else if (PageVmemmap(page)) {
+			/*
+			 * runtime vmemmap pages are residing inside the memory
+			 * section so they do not have to be freed anywhere.
+			 */
+			while (PageVmemmap(page))
+				__ClearPageVmemmap(page++);
		} else if (PageReserved(page)) {
			/* allocated from bootmem */
			if (page_size < PAGE_SIZE) {
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 9ae71a82e9e1..75e96860a9ac 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -231,6 +231,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
	unsigned long size_pages = PFN_DOWN(size);
	int rc;

+	/*
+	 * Physical memory is added only later during the memory online so we
+	 * cannot use the added range at this stage unfortunately.
+	 */
+	restrictions->flags &= ~MHP_MEMMAP_FROM_RANGE;
+
	rc = vmem_add_mapping(start, size);
	if (rc)
		return rc;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index db42c11b48fb..2e40c9e637b9 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -809,6 +809,16 @@ static void __meminit free_pagetable(struct page *page, int order)
	unsigned long magic;
	unsigned int nr_pages = 1 << order;

+	/*
+	 * runtime vmemmap pages are residing inside the memory section so
+	 * they do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		while (nr_pages--)
+			__ClearPageVmemmap(page++);
+		return;
+	}
+
	/* bootmem page has reserved flag */
	if (PageReserved(page)) {
		__ClearPageReserved(page);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b32036cbb7a4..582d6e8c734d 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1585,6 +1585,7 @@ static int balloon_probe(struct hv_device *dev,

 #ifdef CONFIG_MEMORY_HOTPLUG
	do_hot_add = hot_add;
+	hotplug_vmemmap_enabled = false;
 #else
	do_hot_add = false;
 #endif
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3ff8f91b1fea..678e835718cf 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -715,6 +715,7 @@ static int __init balloon_init(void)
	set_online_page_callback(&xen_online_pages);
	register_memory_notifier(&xen_memory_nb);
	register_sysctl_table(xen_root);
+	hotplug_vmemmap_enabled = false;
 #endif

 #ifdef CONFIG_XEN_PV
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4e0d75b17715..89317ef50a61 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -118,13 +118,27 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
  * and offline memory explicitly. Lack of this bit means that the caller has to
  * call move_pfn_range_to_zone to finish the initialization.
  */
-
 #define MHP_MEMBLOCK_API 1<<0

-/* Restrictions for the memory hotplug */
+/*
+ * Do we want memmap (struct page array) allocated from the hotadded range.
+ * Please note that only SPARSE_VMEMMAP implements this feature and some
+ * architectures might not support it even for that memory model (e.g. s390)
+ */
+#define MHP_MEMMAP_FROM_RANGE 1<<1
+
+/* Restrictions for the memory hotplug
+ * flags: MHP_ flags
+ * altmap: use this alternative allocator for memmaps
+ * init: callback to be called before we add this memory
+ * constructor: callback to be called once the memory has been added
+ */
 struct mhp_restrictions {
-	unsigned long flags;	/* MHP_ flags */
-	struct vmem_altmap *altmap; /* use this alternative allocator for memmaps */
+	unsigned long flags;
+	struct vmem_altmap *altmap;
+	void (*init)(unsigned long, unsigned long, struct vmem_altmap *,
+		     struct mhp_restrictions *);
+	void (*constructor)(struct vmem_altmap *, struct mhp_restrictions *);
 };

 /* reasonably generic interface to expand the physical pages */
@@ -345,6 +359,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
+extern void mark_vmemmap_pages(struct vmem_altmap *self, struct mhp_restrictions *r);
 extern int sparse_add_one_section(int nid, unsigned long start_pfn,
				  struct vmem_altmap *altmap);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index f0628660d541..cfde1c1febb7 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -16,7 +16,7 @@ struct device;
  * @alloc: track pages consumed, private to vmemmap_populate()
  */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
	const unsigned long reserve;
	unsigned long free;
	unsigned long align;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 808b4183e30d..2483fcbe8ed6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -437,6 +437,29 @@ static __always_inline int __PageMovable(struct page *page)
		PAGE_MAPPING_MOVABLE;
 }

+#define VMEMMAP_PAGE ~PAGE_MAPPING_FLAGS
+static __always_inline int PageVmemmap(struct page *page)
+{
+	return PageReserved(page) && (unsigned long)page->mapping == VMEMMAP_PAGE;
+}
+
+static __always_inline void __ClearPageVmemmap(struct page *page)
+{
+	__ClearPageReserved(page);
+	page->mapping = NULL;
+}
+
+static __always_inline void __SetPageVmemmap(struct page *page)
+{
+	__SetPageReserved(page);
+	page->mapping = (void *)VMEMMAP_PAGE;
+}
+
+static __always_inline struct page *vmemmap_get_head(struct page *page)
+{
+	return (struct page *)page->freelist;
+}
+
 #ifdef CONFIG_KSM
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"
diff --git a/mm/compaction.c b/mm/compaction.c
index 9830f81cd27f..8bf59eaed204 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -852,6 +852,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
		page = pfn_to_page(low_pfn);

		/*
+		 * Vmemmap pages are pages that are used for creating the memmap
+		 * array mapping, and they reside in their hot-added memory range.
+		 * Therefore, we cannot migrate them.
+		 */
+		if (PageVmemmap(page))
+			goto isolate_fail;
+
+		/*
		 * Check if the pageblock has already been marked skipped.
		 * Only the aligned PFN is checked as the caller isolates
		 * COMPACT_CLUSTER_MAX at a time so the second call must
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8313279136ff..3c9eb3b82b34 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -73,6 +73,12 @@ bool memhp_auto_online = true;
 #endif
 EXPORT_SYMBOL_GPL(memhp_auto_online);

+/*
+ * Do we want to allocate the memmap array from the
+ * hot-added range?
+ */
+bool hotplug_vmemmap_enabled = true;
+
 static int __init setup_memhp_default_state(char *str)
 {
	if (!strcmp(str, "online"))
@@ -264,6 +270,18 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
 }

+static void init_altmap_memmap(unsigned long pfn, unsigned long nr_pages,
+			       struct vmem_altmap *altmap,
+			       struct mhp_restrictions *r)
+{
+	if (!(r->flags & MHP_MEMMAP_FROM_RANGE))
+		return;
+
+	altmap->base_pfn = pfn;
+	altmap->free = nr_pages;
+	r->altmap = altmap;
+}
+
 /*
  * Reasonably generic function for adding memory.
  * It is expected that archs that support memory hotplug will
@@ -276,12 +294,18 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
	unsigned long i;
	int err = 0;
	int start_sec, end_sec;
-	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap *altmap;
+	struct vmem_altmap __memblk_altmap = {};

	/* during initialize mem_map, align hot-added range to section */
	start_sec = pfn_to_section_nr(phys_start_pfn);
	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);

+	if (restrictions->init)
+		restrictions->init(phys_start_pfn, nr_pages, &__memblk_altmap,
+				   restrictions);
+
+	altmap = restrictions->altmap;
	if (altmap) {
		/*
		 * Validate altmap is within bounds of the total request
@@ -310,6 +334,12 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
		cond_resched();
	}
	vmemmap_populate_print_last();
+
+	/*
+	 * Check if we have a constructor
+	 */
+	if (restrictions->constructor)
+		restrictions->constructor(altmap, restrictions);
 out:
	return err;
 }
@@ -694,17 +724,48 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
	return onlined_pages;
 }

+static unsigned long check_nr_vmemmap_pages(struct page *page)
+{
+	if (PageVmemmap(page)) {
+		struct page *head = vmemmap_get_head(page);
+		unsigned long vmemmap_pages = page_private(head);
+
+		return vmemmap_pages - (page - head);
+	}
+
+	return 0;
+}
+
 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
			void *arg)
 {
	unsigned long onlined_pages = *(unsigned long *)arg;
+	unsigned long pfn = start_pfn;
+	unsigned long skip_pages = 0;
+
+	if (PageVmemmap(pfn_to_page(pfn))) {
+		/*
+		 * We do not want to send vmemmap pages to __free_pages_core,
+		 * as we will have to populate that with checks to make sure
+		 * vmemmap pages preserve their state.
+		 * Skipping them here saves us some complexity, and has the
+		 * side effect of not accounting vmemmap pages as managed_pages.
+		 */
+		skip_pages = check_nr_vmemmap_pages(pfn_to_page(pfn));
+		skip_pages = min_t(unsigned long, skip_pages, nr_pages);
+		pfn += skip_pages;
+	}

-	if (PageReserved(pfn_to_page(start_pfn)))
-		onlined_pages = online_pages_blocks(start_pfn, nr_pages);
+	if ((nr_pages > skip_pages) && PageReserved(pfn_to_page(pfn)))
+		onlined_pages = online_pages_blocks(pfn, nr_pages - skip_pages);

	online_mem_sections(start_pfn, start_pfn + nr_pages);

-	*(unsigned long *)arg += onlined_pages;
+	/*
+	 * We do want to account vmemmap pages to present_pages, so
+	 * make sure to add it up.
+	 */
+	*(unsigned long *)arg += onlined_pages + skip_pages;

	return 0;
 }
@@ -1134,6 +1195,12 @@ int __ref add_memory_resource(int nid, struct resource *res)

	/* call arch's memory hotadd */
	restrictions.flags = MHP_MEMBLOCK_API;
+	if (hotplug_vmemmap_enabled) {
+		restrictions.flags |= MHP_MEMMAP_FROM_RANGE;
+		restrictions.init = init_altmap_memmap;
+		restrictions.constructor = mark_vmemmap_pages;
+	}
+
	ret = arch_add_memory(nid, start, size, &restrictions);
	if (ret < 0)
		goto error;
@@ -1547,8 +1614,7 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
		node_clear_state(node, N_MEMORY);
 }

-static int __ref __offline_pages(unsigned long start_pfn,
-		unsigned long end_pfn)
+static int __ref __offline_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
	unsigned long pfn, nr_pages;
	unsigned long offlined_pages = 0;
@@ -1558,14 +1624,30 @@ static int __ref __offline_pages(unsigned long start_pfn,
	struct zone *zone;
	struct memory_notify arg;
	char *reason;
+	unsigned long nr_vmemmap_pages = 0;
+	bool skip_migration = false;

	mem_hotplug_begin();

+	if (PageVmemmap(pfn_to_page(start_pfn))) {
+		nr_vmemmap_pages = check_nr_vmemmap_pages(pfn_to_page(start_pfn));
+		if (start_pfn + nr_vmemmap_pages >= end_pfn) {
+			/*
+			 * It can be that, depending on how large the
+			 * hot-added range is, an entire memblock only
+			 * contains vmemmap pages.
+			 * Should that be the case, there is no reason in
+			 * trying to isolate and migrate this range.
+			 */
+			nr_vmemmap_pages = end_pfn - start_pfn;
+			skip_migration = true;
+		}
+	}
+
	/* This makes hotplug much easier...and readable.
	   we assume this for now. .*/
	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start,
				  &valid_end)) {
-		mem_hotplug_done();
		ret = -EINVAL;
		reason = "multizone range";
		goto failed_removal;
@@ -1575,14 +1657,15 @@ static int __ref __offline_pages(unsigned long start_pfn,
	node = zone_to_nid(zone);
	nr_pages = end_pfn - start_pfn;

-	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
-				       MIGRATE_MOVABLE,
-				       SKIP_HWPOISON | REPORT_FAILURE);
-	if (ret) {
-		mem_hotplug_done();
-		reason = "failure to isolate range";
-		goto failed_removal;
+	if (!skip_migration) {
+		/* set above range as isolated */
+		ret = start_isolate_page_range(start_pfn, end_pfn,
+					       MIGRATE_MOVABLE,
+					       SKIP_HWPOISON | REPORT_FAILURE);
+		if (ret) {
+			reason = "failure to isolate range";
+			goto failed_removal;
+		}
	}

	arg.start_pfn = start_pfn;
@@ -1596,6 +1679,13 @@ static int __ref __offline_pages(unsigned long start_pfn,
		goto failed_removal_isolated;
	}

+	if (skip_migration)
+		/*
+		 * If the entire memblock is populated with vmemmap pages,
+		 * there is nothing we can migrate, so skip it.
+		 */
+		goto no_migration;
+
	do {
		for (pfn = start_pfn; pfn;) {
			if (signal_pending(current)) {
@@ -1634,14 +1724,25 @@ static int __ref __offline_pages(unsigned long start_pfn,
	/* Ok, all of our target is isolated.
	   We cannot do rollback at this point. */
+no_migration:
	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages,
				offline_isolated_pages_cb);
	pr_info("Offlined Pages %ld\n", offlined_pages);
	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	if (!skip_migration)
+		undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
	/* removal success */
	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+
+	/*
+	 * Vmemmap pages are not being accounted to managed_pages but to
+	 * present_pages.
+	 * We need to add them up to the already offlined pages to get
+	 * the accounting right.
+	 */
+	offlined_pages += nr_vmemmap_pages;
+
	zone->present_pages -= offlined_pages;

	pgdat_resize_lock(zone->zone_pgdat, &flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cad7468a0f20..05492cc95d74 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1257,14 +1257,19 @@ static void __meminit __init_struct_page_nolru(struct page *page,
					unsigned long zone, int nid,
					bool is_reserved)
 {
-	mm_zero_struct_page(page);
+	if (!PageVmemmap(page)) {
+		/*
+		 * Vmemmap pages need to preserve their state.
+		 */
+		mm_zero_struct_page(page);
+		init_page_count(page);
+	}

	/*
	 * We can use a non-atomic operation for setting the
	 * PG_reserved flag as we are still initializing the pages.
	 */
	set_page_links(page, zone, nid, pfn, is_reserved);
-	init_page_count(page);
	page_mapcount_reset(page);
	page_cpupid_reset_last(page);
	page_kasan_tag_reset(page);
@@ -8138,6 +8143,19 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,

		page = pfn_to_page(check);

+		/*
+		 * Vmemmap pages are marked as reserved, so skip them here,
+		 * otherwise the check below will drive us to a bad conclusion.
+		 */
+		if (PageVmemmap(page)) {
+			struct page *head = vmemmap_get_head(page);
+			unsigned int skip_pages;
+
+			skip_pages = page_private(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
		if (PageReserved(page))
			goto unmovable;

@@ -8506,6 +8524,20 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
			continue;
		}
		page = pfn_to_page(pfn);
+
+		/*
+		 * Vmemmap pages are self-hosted in the hot-added range,
+		 * we do not need to free them, so skip them.
+		 */
+		if (PageVmemmap(page)) {
+			struct page *head = vmemmap_get_head(page);
+			unsigned long skip_pages;
+
+			skip_pages = page_private(head) - (page - head);
+			pfn += skip_pages;
+			continue;
+		}
+
		/*
		 * The HWPoisoned page may be not in buddy system, and
		 * page_count() is not 0.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index ce323e56b34d..e29b378f39ae 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -155,6 +155,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
		page = pfn_to_online_page(pfn + i);
		if (!page)
			continue;
+		if (PageVmemmap(page))
+			continue;
		return page;
	}
	return NULL;
@@ -257,6 +259,17 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
			continue;
		}
		page = pfn_to_page(pfn);
+		if (PageVmemmap(page)) {
+			/*
+			 * Vmemmap pages are not isolated. Skip them.
+			 */
+			struct page *head = vmemmap_get_head(page);
+			unsigned long skip_pages;
+
+			skip_pages = page_private(head) - (page - head);
+			pfn += skip_pages;
+			continue;
+		}
		if (PageBuddy(page))
			/*
			 * If the page is on a free list, it has to be on
diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6c6b19..dd30468dc8f5 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -579,6 +579,103 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif

 #ifdef CONFIG_SPARSEMEM_VMEMMAP
+void mark_vmemmap_pages(struct vmem_altmap *self, struct mhp_restrictions *r)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
+	unsigned long i;
+	struct page *head;
+
+	if (!(r->flags & MHP_MEMMAP_FROM_RANGE) || !nr_pages)
+		return;
+
+	/*
+	 * All allocations for the memory hotplug are the same sized so align
+	 * should be 0.
+	 */
+	WARN_ON(self->align);
+
+	/*
+	 * Mark these pages as Vmemmap pages.
diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6c6b19..dd30468dc8f5 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -579,6 +579,103 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
+void mark_vmemmap_pages(struct vmem_altmap *self, struct mhp_restrictions *r)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
+	unsigned long i;
+	struct page *head;
+
+	if (!(r->flags & MHP_MEMMAP_FROM_RANGE) || !nr_pages)
+		return;
+
+	/*
+	 * All allocations for the memory hotplug are the same size, so align
+	 * should be 0.
+	 */
+	WARN_ON(self->align);
+
+	/*
+	 * Mark these pages as Vmemmap pages.
+	 * We keep track of the sections used by this altmap by means of a
+	 * refcount, so we know how long we have to defer the call to
+	 * vmemmap_free for this memory range.
+	 * This refcount is kept in the first vmemmap page (head).
+	 * For example:
+	 * We add 10GB: (ffffea0004000000 - ffffea000427ffc0)
+	 * ffffea0004000000 will have a refcount of 80.
+	 * To easily get the head of any vmemmap page, we keep a pointer to
+	 * it in page->freelist.
+	 * We also keep the total nr of pages used by this altmap in the
+	 * head page.
+	 * So, we have this picture:
+	 *
+	 * Head page:
+	 *  page->_refcount: nr of sections
+	 *  page->private:   nr of vmemmap pages
+	 * Tail page:
+	 *  page->freelist:  pointer to the head page
+	 */
+
+	/*
+	 * Head, first vmemmap page.
+	 */
+	head = pfn_to_page(pfn);
+
+	for (i = 0; i < nr_pages; i++, pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		mm_zero_struct_page(page);
+		__SetPageVmemmap(page);
+		page->freelist = head;
+		init_page_count(page);
+	}
+	set_page_count(head, (int)nr_sects);
+	set_page_private(head, nr_pages);
+}
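To sanity-check the example in the comment above: on x86_64 a section is 128MB, so a 10GB hot-add spans 80 sections, hence the head refcount of 80; each section's memmap needs 32768 struct pages * 64 bytes = 2MB, i.e. 512 vmemmap pages per section. A small stand-alone C sketch of that arithmetic (all constants are x86_64 assumptions, not taken from the patch):

	#include <stdio.h>

	int main(void)
	{
		unsigned long add      = 10UL << 30;   /* 10GB hot-added      */
		unsigned long section  = 128UL << 20;  /* x86_64 section size */
		unsigned long nr_sects = add / section;          /* 80       */
		unsigned long memmap   = (section / 4096) * 64;  /* 2MB      */

		printf("sections: %lu, vmemmap pages: %lu\n",
		       nr_sects, nr_sects * (memmap / 4096));
		return 0;
	}

This prints "sections: 80, vmemmap pages: 40960", matching the refcount in the example.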
+
+/*
+ * If the range we are trying to remove was hot-added with vmemmap pages,
+ * we need to keep track of it in order to know how long we have to defer
+ * the actual freeing.
+ * Since sections are removed sequentially in __remove_pages()->
+ * __remove_section(), we just wait until we hit the last section.
+ * Once that happens, we can trigger free_deferred_vmemmap_range to
+ * actually free the whole memory-range.
+ * This is done because we actually have to free the memory-range
+ * backwards.
+ * The reason is that the first pages of that memory are used for the
+ * pagetables in order to create the memmap mapping.
+ * If we removed those pages first, we would blow up, so the vmemmap
+ * pages have to be freed last.
+ * Since hot-add/hot-remove operations are serialized by the hotplug
+ * lock, we know that once we start a hot-remove operation, we will go
+ * all the way down until it is done, so we do not need any locking for
+ * these two variables.
+ */
+static struct page *head_vmemmap_page;
+static bool in_vmemmap_range;
+
+static inline bool vmemmap_dec_and_test(void)
+{
+	return page_ref_dec_and_test(head_vmemmap_page);
+}
+
+static void free_deferred_vmemmap_range(unsigned long start,
+					unsigned long end)
+{
+	unsigned long nr_pages = end - start;
+	unsigned long first_section = (unsigned long)head_vmemmap_page;
+
+	while (start >= first_section) {
+		pr_info("vmemmap_free: %lx - %lx\n", start, end);
+		vmemmap_free(start, end, NULL);
+		end = start;
+		start -= nr_pages;
+	}
+	head_vmemmap_page = NULL;
+	in_vmemmap_range = false;
+}
+
 static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
@@ -591,6 +688,17 @@ static void __kfree_section_memmap(struct page *memmap,
 	unsigned long start = (unsigned long)memmap;
 	unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION);
 
+	if (PageVmemmap(memmap) && !in_vmemmap_range) {
+		in_vmemmap_range = true;
+		head_vmemmap_page = memmap;
+	}
+
+	if (in_vmemmap_range) {
+		if (vmemmap_dec_and_test())
+			free_deferred_vmemmap_range(start, end);
+		return;
+	}
+
 	vmemmap_free(start, end, altmap);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/mm/util.c b/mm/util.c
index 1ea055138043..e0ac8712a392 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -517,6 +517,8 @@ struct address_space *page_mapping(struct page *page)
 	mapping = page->mapping;
 	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
 		return NULL;
+	if ((unsigned long)mapping == VMEMMAP_PAGE)
+		return NULL;
 	return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
 }
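To make the tear-down order in free_deferred_vmemmap_range() above concrete: each call to __kfree_section_memmap() drops the head's refcount, and only the final call walks the range backwards, last section first, because the head chunk backs the page tables mapping the rest. A stand-alone trace of that loop (the addresses and the 2MB per-section memmap size are illustrative assumptions, matching the 512-page example earlier):

	#include <stdio.h>

	int main(void)
	{
		unsigned long chunk = 0x200000;              /* per-section memmap */
		unsigned long head  = 0xffffea0004000000UL;  /* first (head) chunk */
		unsigned long start = head + 3 * chunk;      /* 4th, last section  */
		unsigned long end   = start + chunk;

		while (start >= head) {          /* free from the back to the head */
			printf("vmemmap_free: %lx - %lx\n", start, end);
			end = start;
			start -= chunk;
		}
		return 0;
	}

Run, this prints four ranges from ffffea0004600000 back down to ffffea0004000000, mirroring the order in which vmemmap_free() is invoked by the kernel loop.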
From patchwork Tue Jan 22 10:37:08 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10775337
From: Oscar Salvador
To: linux-mm@kvack.org
Cc: mhocko@suse.com, dan.j.williams@intel.com, Pavel.Tatashin@microsoft.com,
    david@redhat.com, linux-kernel@vger.kernel.org, dave.hansen@intel.com,
    Oscar Salvador
Subject: [RFC PATCH v2 4/4] mm, sparse: rename kmalloc_section_memmap,
 __kfree_section_memmap
Date: Tue, 22 Jan 2019 11:37:08 +0100
Message-Id: <20190122103708.11043-5-osalvador@suse.de>
In-Reply-To: <20190122103708.11043-1-osalvador@suse.de>
References: <20190122103708.11043-1-osalvador@suse.de>
From: Michal Hocko

The "kmalloc" prefix is misleading. Rename the helpers to
alloc_section_memmap/free_section_memmap, which better reflects their
functionality.

Signed-off-by: Michal Hocko
Signed-off-by: Oscar Salvador
---
 mm/sparse.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index dd30468dc8f5..27428b965d46 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -676,13 +676,13 @@ static void free_deferred_vmemmap_range(unsigned long start,
 	in_vmemmap_range = false;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static inline struct page *alloc_section_memmap(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid, altmap);
 }
 
-static void __kfree_section_memmap(struct page *memmap,
+static void free_section_memmap(struct page *memmap,
 		struct vmem_altmap *altmap)
 {
 	unsigned long start = (unsigned long)memmap;
@@ -732,13 +732,13 @@ static struct page *__kmalloc_section_memmap(void)
 	return ret;
 }
 
-static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
+static inline struct page *alloc_section_memmap(unsigned long pnum, int nid,
 		struct vmem_altmap *altmap)
 {
 	return __kmalloc_section_memmap();
 }
 
-static void __kfree_section_memmap(struct page *memmap,
+static void free_section_memmap(struct page *memmap,
 		struct vmem_altmap *altmap)
 {
 	if (is_vmalloc_addr(memmap))
@@ -803,12 +803,12 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 	if (ret < 0 && ret != -EEXIST)
 		return ret;
 	ret = 0;
-	memmap = kmalloc_section_memmap(section_nr, nid, altmap);
+	memmap = alloc_section_memmap(section_nr, nid, altmap);
 	if (!memmap)
 		return -ENOMEM;
 	usemap = __kmalloc_section_usemap();
 	if (!usemap) {
-		__kfree_section_memmap(memmap, altmap);
+		free_section_memmap(memmap, altmap);
 		return -ENOMEM;
 	}
@@ -830,7 +830,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
 out:
 	if (ret < 0) {
 		kfree(usemap);
-		__kfree_section_memmap(memmap, altmap);
+		free_section_memmap(memmap, altmap);
 	}
 	return ret;
 }
@@ -881,7 +881,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap,
 	if (PageSlab(usemap_page) || PageCompound(usemap_page)) {
 		kfree(usemap);
 		if (memmap)
-			__kfree_section_memmap(memmap, altmap);
+			free_section_memmap(memmap, altmap);
 		return;
 	}