From patchwork Mon Jan 7 08:04:59 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pingfan Liu
X-Patchwork-Id: 10750097
From: Pingfan Liu
To: linux-mm@kvack.org, kexec@lists.infradead.org
Cc: Pingfan Liu, Tang Chen, "Rafael J. Wysocki", Len Brown,
 Andrew Morton, Mike Rapoport, Michal Hocko, Jonathan Corbet,
 Yaowei Bai, Pavel Tatashin, Nicholas Piggin, Naoya Horiguchi,
 Daniel Vacek, Mathieu Malaterre, Stefan Agner, Dave Young, Baoquan He,
 yinghai@kernel.org, vgoyal@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCHv5] x86/kdump: bugfix, make the behavior of crashkernel=X
 consistent with kaslr
Date: Mon, 7 Jan 2019 16:04:59 +0800
Message-Id: <1546848299-23628-1-git-send-email-kernelfans@gmail.com>
X-Mailer: git-send-email 2.7.4

A customer reported a bug on a high-end server with many PCIe devices,
booted with crashkernel=384M and KASLR enabled. Even though plenty of
memory was still free under 896 MB, the search for a crashkernel region
failed intermittently. The reason is that, without ',high', we can
currently only search the region under 896 MB. KASLR randomly breaks
that 896 MB into several parts, and the crashkernel reservation needs to
be aligned to 128 MB, so a suitable area is sometimes not found. This
confuses the end user, because crashkernel=X sometimes works and
sometimes fails.

To make the reservation succeed, the customer can change the kernel
option to "crashkernel=384M,high". But this leaves "crashkernel=xx@yy"
only a very limited space to work in, even though its grammar looks more
generic.
And we cannot confidently answer the questions raised by the customer:
1) why it fails to reserve memory under 896 MB;
2) what is wrong with the memory region under 4G;
3) why I have to add ',high' when I only require 384 MB, not 3840 MB.

This patch simplifies the method suggested in the mail [1]. It simply
searches bottom-up for a candidate region for the crashkernel. The
bottom-up search is more compatible with the old reservation style,
i.e. it still tries to get memory under 896 MB first, then in
[896 MB, 4G], and finally above 4G.

There is one trivial issue regarding compatibility with old kexec-tools:
if the reserved region is above 896M, the old tool will fail to load the
bzImage. But without this patch the old tool fails as well, since no
memory below 896M can be reserved for the crashkernel.

[1]: http://lists.infradead.org/pipermail/kexec/2017-October/019571.html

Signed-off-by: Pingfan Liu
Cc: Tang Chen
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Jonathan Corbet
Cc: Yaowei Bai
Cc: Pavel Tatashin
Cc: Nicholas Piggin
Cc: Naoya Horiguchi
Cc: Daniel Vacek
Cc: Mathieu Malaterre
Cc: Stefan Agner
Cc: Dave Young
Cc: Baoquan He
Cc: yinghai@kernel.org
Cc: vgoyal@redhat.com
Cc: linux-kernel@vger.kernel.org
---
v4 -> v5:
  add a wrapper for the bottom-up allocation function
v3 -> v4:
  instead of exporting the stage of parsing mem hotplug info, just use
  the bottom-up allocation function directly

 arch/x86/kernel/setup.c  |  8 ++++----
 include/linux/memblock.h |  3 +++
 mm/memblock.c            | 29 +++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d494b9b..80e7923 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -546,10 +546,10 @@ static void __init reserve_crashkernel(void)
 		 * as old kexec-tools loads bzImage below that, unless
 		 * "crashkernel=size[KMG],high" is specified.
 		 */
-		crash_base = memblock_find_in_range(CRASH_ALIGN,
-						    high ? CRASH_ADDR_HIGH_MAX
-							 : CRASH_ADDR_LOW_MAX,
-						    crash_size, CRASH_ALIGN);
+		crash_base = memblock_find_range_bottom_up(CRASH_ALIGN,
+			(max_pfn * PAGE_SIZE), crash_size, CRASH_ALIGN,
+			NUMA_NO_NODE);
+
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index aee299a..a35ae17 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -116,6 +116,9 @@ phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
 					int nid, enum memblock_flags flags);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
 				   phys_addr_t size, phys_addr_t align);
+phys_addr_t __init_memblock
+memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
+	phys_addr_t size, phys_addr_t align, int nid);
 void memblock_allow_resize(void);
 int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_add(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 81ae63c..f68287e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -192,6 +192,35 @@ __memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
 	return 0;
 }
 
+phys_addr_t __init_memblock
+memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
+	phys_addr_t size, phys_addr_t align, int nid)
+{
+	phys_addr_t ret;
+	enum memblock_flags flags = choose_memblock_flags();
+
+	/* pump up @end */
+	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
+		end = memblock.current_limit;
+
+	/* avoid allocating the first page */
+	start = max_t(phys_addr_t, start, PAGE_SIZE);
+	end = max(start, end);
+
+again:
+	ret = __memblock_find_range_bottom_up(start, end, size, align,
+					    nid, flags);
+
+	if (!ret && (flags & MEMBLOCK_MIRROR)) {
+		pr_warn("Could not allocate %pap bytes of mirrored memory\n",
+			&size);
+		flags &= ~MEMBLOCK_MIRROR;
+		goto again;
+	}
+
+	return ret;
+}
+
 /**
  * __memblock_find_range_top_down - find free area utility, in top-down
  * @start: start of candidate range
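For illustration only, not part of the patch: a minimal user-space sketch
of why an aligned search can fail under 896 MB even with plenty of free
memory there, and why a bottom-up search up to the end of memory still
finds the lowest usable region. The type struct free_range, the helpers
round_up_pow2() and find_bottom_up(), and any memory map fed to them are
hypothetical and do not mirror memblock internals.

```c
#include <stddef.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* One free physical range, [start, end). Hypothetical type for this sketch. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up first fit: walk the free ranges in ascending order and return
 * the lowest aligned base for which [base, base + size) fits below both
 * the end of the range and @limit. Returns 0 when nothing fits, which is
 * the same "not found" convention the !crash_base check relies on.
 */
static uint64_t find_bottom_up(const struct free_range *r, size_t n,
			       uint64_t limit, uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].end < limit ? r[i].end : limit;
		uint64_t base = round_up_pow2(r[i].start, align);

		if (base < end && end - base >= size)
			return base;
	}
	return 0;
}
```

As a made-up example, take free ranges [16 MB, 380 MB) and
[520 MB, 3072 MB), i.e. something (say, a randomized kernel image) sits
at [380 MB, 520 MB). With size = 384 MB and align = 128 MB, a search
limited to 896 MB returns 0 although about 740 MB are free below that
limit, while the same search limited to 3072 MB returns 640 MB.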