From patchwork Thu Jun 14 06:34:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 10463423
From: Naoya Horiguchi
To: Oscar Salvador
CC: Michal Hocko, "linux-mm@kvack.org", Pavel Tatashin,
 "Steven Sistare", Daniel Jordan, Matthew Wilcox,
 "linux-kernel@vger.kernel.org", Andrew Morton, "mingo@kernel.org",
 "dan.j.williams@intel.com", Huang Ying
Subject: [PATCH v2] x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
Date: Thu, 14 Jun 2018 06:34:55 +0000
Message-ID: <20180614063454.GA32419@hori1.linux.bs1.fc.nec.co.jp>
References: <20180606090630.GA27065@hori1.linux.bs1.fc.nec.co.jp>
 <20180606092405.GA6562@hori1.linux.bs1.fc.nec.co.jp>
 <20180607062218.GB22554@hori1.linux.bs1.fc.nec.co.jp>
 <20180607065940.GA7334@techadventures.net>
 <20180607094921.GA8545@techadventures.net>
 <20180607100256.GA9129@hori1.linux.bs1.fc.nec.co.jp>
 <20180613054107.GA5329@hori1.linux.bs1.fc.nec.co.jp>
 <20180613090700.GG13364@dhcp22.suse.cz>
 <20180614051618.GB17860@hori1.linux.bs1.fc.nec.co.jp>
 <20180614053859.GA9863@techadventures.net>
In-Reply-To: <20180614053859.GA9863@techadventures.net>

On Thu, Jun 14, 2018 at 07:38:59AM +0200, Oscar Salvador wrote:
> On Thu, Jun 14, 2018 at 05:16:18AM +0000, Naoya Horiguchi
wrote:
...
> >
> > My concern is that there are a few E820 memory types other than
> > E820_TYPE_RAM and E820_TYPE_RESERVED, and I'm not sure that putting them
> > all into memblock.reserved is really acceptable.
>
> Hi Naoya,
>
> Maybe you could just add to memblock.reserved all unavailable ranges within
> E820_TYPE_RAM.
> Actually, in your original patch, you are walking memblock.memory, which should
> only contain E820_TYPE_RAM ranges (talking about x86).
>
> So I think the below would do the trick as well?
>
> @@ -1248,6 +1276,7 @@ void __init e820__memblock_setup(void)
>  {
>  	int i;
>  	u64 end;
> +	u64 next = 0;
>  
>  	/*
>  	 * The bootstrap memblock region count maximum is 128 entries
>
> @@ -1269,6 +1299,14 @@ void __init e820__memblock_setup(void)
>  
>  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
>  			continue;
>  
> +
> +		if (entry->type == E820_TYPE_RAM)
> +			if (next < entry->addr) {
> +				memblock_reserve (next, next + (entry->addr - next));
> +				next = end;
> +			}
>
> With the above patch, I can no longer see the issues either.

I double-checked and this change looks good to me.

>
> Although, there is a difference between this and your original patch.
> In your original patch, you are just zeroing the pages, while with this one (or with your second patch),
> we will zero the page in reserve_bootmem_region(), but that function also initializes
> some other fields of the struct page:
>
> mm_zero_struct_page(page);
> set_page_links(page, zone, nid, pfn);
> init_page_count(page);
> page_mapcount_reset(page);
> page_cpupid_reset_last(page);
>
> So I am not sure we want to bother doing that for pages that are really unreachable.

Considering that /proc/kpageflags can read them, I think having some data
(even if it's trivial) is better than just zeros.

Here's the updated patch.
Thanks for the suggestion and testing!

---
From: Naoya Horiguchi
Date: Thu, 14 Jun 2018 14:44:36 +0900
Subject: [PATCH] x86/e820: put !E820_TYPE_RAM regions into memblock.reserved

There is a kernel panic that is triggered when reading /proc/kpageflags
on a kernel booted with the kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffffffffffffffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
  RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
  R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
  R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
  FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100d8d9, which changed how struct pages are initialized.

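The register dump is consistent with a head-page lookup on a struct page
whose memory was never initialized: RBP holds ffffffffffffffff and the
faulting address is fffffffffffffffe. The user-space sketch below mimics
that lookup. It is only an illustration: struct fake_page and
fake_compound_head() are hypothetical stand-ins rather than kernel
definitions, and the all-ones fill is an assumption taken from the
register values above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the first two words of a struct page. */
struct fake_page {
	uint64_t flags;
	uint64_t compound_head;	/* bit 0 set: value is (head page address + 1) */
};

/*
 * Mimics a head-page lookup: if bit 0 of compound_head is set, the head
 * page is assumed to live at (compound_head - 1); otherwise the page is
 * its own head. Returns the address that would be dereferenced next.
 */
static uint64_t fake_compound_head(const struct fake_page *page)
{
	uint64_t head = page->compound_head;

	if (head & 1)
		return head - 1;
	return (uint64_t)(uintptr_t)page;
}

/* Address a reader would touch for a page filled with all-ones (assumed). */
static uint64_t poisoned_head_address(void)
{
	struct fake_page page = {
		.flags = UINT64_MAX,
		.compound_head = UINT64_MAX,
	};

	return fake_compound_head(&page);
}
```

For the all-ones page the lookup yields 0xfffffffffffffffe, the address
reported in the oops. For a zeroed page it returns the page itself, which
is why zeroing or initializing the struct pages in the gap avoids the
fault.
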
Memblock layout affects the pfn ranges covered by node/zone. Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x4
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
   memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:

  MEMBLOCK configuration:
   memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x3
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

This shrinks node 0's pfn range, because it is calculated from the
address range of memblock.memory. As a result, some of the struct pages
in the gap range are left uninitialized.

We have a function zero_resv_unavail() which zeroes the struct pages
within the reserved unavailable range (i.e. memblock.reserved &&
!memblock.memory). This patch utilizes it to cover all unavailable
ranges by putting them into memblock.reserved.

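To make the shrinking of node 0 concrete, the user-space sketch below
computes a node's pfn span as the lowest start and the highest end over
its memory regions, using the two layouts shown above. The types and the
helper node_span() are hypothetical, and the min/max rule is a
simplification of how the kernel derives node ranges from
memblock.memory.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4KB pages, as on x86 */

/* Hypothetical stand-in for one memblock.memory region (end is inclusive). */
struct mem_region {
	uint64_t start;
	uint64_t end;
	int nid;
};

struct pfn_span {
	uint64_t start_pfn;	/* first pfn of the node */
	uint64_t end_pfn;	/* one past the last pfn of the node */
};

/* Span of node nid: lowest start pfn and highest end pfn over its regions. */
static struct pfn_span node_span(const struct mem_region *r, size_t n, int nid)
{
	struct pfn_span span = { UINT64_MAX, 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t s = r[i].start >> PAGE_SHIFT;
		uint64_t e = (r[i].end + 1) >> PAGE_SHIFT;

		if (r[i].nid != nid)
			continue;
		if (s < span.start_pfn)
			span.start_pfn = s;
		if (e > span.end_pfn)
			span.end_pfn = e;
	}
	return span;
}

/* Default layout (no memmap= given), copied from the first dump. */
static const struct mem_region layout_default[] = {
	{ 0x0000000000001000ULL, 0x000000000009efffULL, 0 },
	{ 0x0000000000100000ULL, 0x00000000bffd6fffULL, 0 },
	{ 0x0000000100000000ULL, 0x000000013fffffffULL, 0 },
	{ 0x0000000140000000ULL, 0x000000023fffffffULL, 1 },
};

/* Layout with memmap=1G!4G, copied from the second dump. */
static const struct mem_region layout_memmap[] = {
	{ 0x0000000000001000ULL, 0x000000000009efffULL, 0 },
	{ 0x0000000000100000ULL, 0x00000000bffd6fffULL, 0 },
	{ 0x0000000140000000ULL, 0x000000023fffffffULL, 1 },
};
```

In this model node 0 ends at pfn 0x140000 with the default layout and at
pfn 0xbffd7 with memmap=1G!4G, while node 1 still starts at pfn 0x140000,
so the pfns in between fall outside every node's span.
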
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi
Suggested-by: Oscar Salvador
Tested-by: Oscar Salvador
---
 arch/x86/kernel/e820.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d1f25c831447..d15ef47ea354 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1248,6 +1248,7 @@ void __init e820__memblock_setup(void)
 {
 	int i;
 	u64 end;
+	u64 next = 0;
 
 	/*
 	 * The bootstrap memblock region count maximum is 128 entries
@@ -1270,6 +1271,17 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;
 
+		/*
+		 * Ranges unavailable in E820_TYPE_RAM are put into
+		 * memblock.reserved, to make sure that struct pages in such
+		 * regions are not left uninitialized after bootup.
+		 */
+		if (entry->type == E820_TYPE_RAM)
+			if (next < entry->addr) {
+				memblock_reserve (next, next + (entry->addr - next));
+				next = end;
+			}
+
 		memblock_add(entry->addr, entry->size);
 	}
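As an illustration of the idea behind the hunk above, the user-space
sketch below walks a sorted list of RAM ranges and records every hole in
front of a range as a (base, size) pair, which is the argument convention
of memblock_reserve(). It is not the kernel code: struct range,
collect_gaps() and the choice to advance the cursor after every entry are
assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (base, size) pair, mirroring memblock_reserve(base, size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/*
 * Walk RAM ranges sorted by base and store each hole that precedes a range
 * in gaps[]. Returns the number of holes found; at most max_gaps of them
 * are stored.
 */
static size_t collect_gaps(const struct range *ram, size_t n,
			   struct range *gaps, size_t max_gaps)
{
	uint64_t next = 0;	/* end of the previous RAM range */
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (next < ram[i].base) {
			if (found < max_gaps) {
				gaps[found].base = next;
				gaps[found].size = ram[i].base - next;
			}
			found++;
		}
		/* Advance past this range whether or not a hole preceded it. */
		next = ram[i].base + ram[i].size;
	}
	return found;
}
```

Fed with the three memory ranges of the memmap=1G!4G layout, this yields
three holes: [0x0-0xfff], [0x9f000-0xfffff] and [0xbffd7000-0x13fffffff].
The last one is the range whose struct pages were left uninitialized.
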