From patchwork Fri Jan 13 08:16:18 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: AKASHI Takahiro X-Patchwork-Id: 9514807 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A82E2607D4 for ; Fri, 13 Jan 2017 08:15:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C151284C3 for ; Fri, 13 Jan 2017 08:15:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9077028658; Fri, 13 Jan 2017 08:15:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D774B2863D for ; Fri, 13 Jan 2017 08:15:46 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.87 #1 (Red Hat Linux)) id 1cRx1V-00009G-FM; Fri, 13 Jan 2017 08:15:45 +0000 Received: from mail-pf0-x229.google.com ([2607:f8b0:400e:c00::229]) by bombadil.infradead.org with esmtps (Exim 4.87 #1 (Red Hat Linux)) id 1cRx1M-0008Pb-1C for linux-arm-kernel@lists.infradead.org; Fri, 13 Jan 2017 08:15:39 +0000 Received: by mail-pf0-x229.google.com with SMTP id f144so27816827pfa.2 for ; Fri, 13 Jan 2017 00:15:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:reply-to:mail-followup-to :references:mime-version:content-disposition:in-reply-to:user-agent; bh=gzyjjD8jn4cKAJko1KV63gXI4beLbA0jDBeHGON4Qf0=; b=UCNKwCVR0PKT6GjBIJMzfK0U58TNtjuD4g26rZVvUPJTtp/YnMVnGXdHtNFAWjARvC FnfHzPEXtOWIJcvcmEN9001xaaIRoWeOTxWHwRZNUiFO+E4U+x71J1ocmPn7fhyyJIpM 7rVVbG7+XiAGIekLEPn7kLrJxdjgi3LCyQLFc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:reply-to :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=gzyjjD8jn4cKAJko1KV63gXI4beLbA0jDBeHGON4Qf0=; b=kxFGKQzi+Sv5GHXYj8wRpHVHX67FbV0qdmU3waOjBaRCE+mv6wKHF4w5P193Pz0yO2 WszW6vMv/BPwFz0D0ylIYFwvo8DiHlzkKPEDCdJ5Jytn81V6N35O2J+vx9GdKYlUI8j1 njHECb+KNc0xsaLnYPanV6izQmiSeeANgVFaYX3+pDXw5+gM6Sy08v6iaNGR2r0bX2Gf sQdcJF7aRV4oOl2VfhLpeiavPtePTNeWyxY2MApnvy8S4D2vgT6SjWSpWuSuOmX7rMys KwFsehkdLQeYpg3jz4h/ZvyFu2FJuMgjD7t+SRUudutvqnmIWJBJ5XOvYfdgrQdrYf9M jwkA== X-Gm-Message-State: AIkVDXIP6CAOJPyCbV6fuVf+Gqq3mBYkWu7lm9wiv7z8Xowz0ggUo+qrXMJG8l0/FTHw2ssT X-Received: by 10.98.44.10 with SMTP id s10mr21341027pfs.161.1484295313978; Fri, 13 Jan 2017 00:15:13 -0800 (PST) Received: from linaro.org ([121.95.100.191]) by smtp.googlemail.com with ESMTPSA id f5sm27103980pgg.5.2017.01.13.00.15.10 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 13 Jan 2017 00:15:13 -0800 (PST) Date: Fri, 13 Jan 2017 17:16:18 +0900 From: AKASHI Takahiro To: Mark Rutland Subject: Re: [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel Message-ID: <20170113081617.GI20972@linaro.org> Mail-Followup-To: takahiro.akashi@linaro.org, Mark Rutland , catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com, geoff@infradead.org, bauerman@linux.vnet.ibm.com, dyoung@redhat.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org, Mark Salter , Pratyush Anand References: <20161228043347.27358-1-takahiro.akashi@linaro.org> <20161228043605.27470-2-takahiro.akashi@linaro.org> <20170112150926.GA12249@leverpostej> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20170112150926.GA12249@leverpostej> User-Agent: Mutt/1.5.24 (2015-08-30) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20170113_001536_137368_68BF95BC X-CRM114-Status: GOOD ( 40.02 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: takahiro.akashi@linaro.org Cc: Pratyush Anand , geoff@infradead.org, catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com, Mark Salter , bauerman@linux.vnet.ibm.com, dyoung@redhat.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Mark, On Thu, Jan 12, 2017 at 03:09:26PM +0000, Mark Rutland wrote: > Hi, > > As a general note, I must apologise for my minimial review of the series > until this point. Judging by the way the DT parts are organised. I'm > very concerned with the way the DT parts are organised, and clearly I > did not communicate my concerns and suggestions effectively in prior > rounds of review. > > On Wed, Dec 28, 2016 at 01:36:00PM +0900, AKASHI Takahiro wrote: > > "crashkernel=" kernel parameter specifies the size (and optionally > > the start address) of the system ram used by crash dump kernel. > > reserve_crashkernel() will allocate and reserve the memory at the startup > > of primary kernel. > > > > This memory range will be exported to userspace via: > > - an entry named "Crash kernel" in /proc/iomem, and > > - "linux,crashkernel-base" and "linux,crashkernel-size" under > > /sys/firmware/devicetree/base/chosen > > > +#ifdef CONFIG_KEXEC_CORE > > +static unsigned long long crash_size, crash_base; > > +static struct property crash_base_prop = { > > + .name = "linux,crashkernel-base", > > + .length = sizeof(u64), > > + .value = &crash_base > > +}; > > +static struct property crash_size_prop = { > > + .name = "linux,crashkernel-size", > > + .length = sizeof(u64), > > + .value = &crash_size, > > +}; > > + > > +static int __init export_crashkernel(void) > > +{ > > + struct device_node *node; > > + int ret; > > + > > + if (!crash_size) > > + return 0; > > + > > + /* Add /chosen/linux,crashkernel-* properties */ > > + node = of_find_node_by_path("/chosen"); > > + if (!node) > > + return -ENOENT; > > + > > + /* > > + * There might be existing crash kernel properties, but we can't > > + * be sure what's in them, so remove them. > > + */ > > + of_remove_property(node, of_find_property(node, > > + "linux,crashkernel-base", NULL)); > > + of_remove_property(node, of_find_property(node, > > + "linux,crashkernel-size", NULL)); > > + > > + ret = of_add_property(node, &crash_base_prop); > > + if (ret) > > + goto ret_err; > > + > > + ret = of_add_property(node, &crash_size_prop); > > + if (ret) > > + goto ret_err; > > + > > + return 0; > > + > > +ret_err: > > + pr_warn("Exporting crashkernel region to device tree failed\n"); > > + return ret; > > +} > > +late_initcall(export_crashkernel); > > I very much do not like this. > > I don't think we should be modifying the DT exposed to userspace in this > manner, in the usual boot path, especially given that the kernel itself > does not appear to be a consumer of this property. I do not think that > it is right to use the DT exposed to userspace as a communication > channel solely between the kernel and userspace. As you mentioned in your comments against my patch#9, this property originates from PPC implementation. I added it solely from the sympathy for dt-based architectures. > So I think we should drop the above, and for arm64 have userspace > consistently use /proc/iomem (or perhaps a new kexec-specific file) to > determine the region reserved for the crash kernel, if it needs to know > this. As a matter of fact, my port of kexec-tools doesn't check this property and dropping it won't cause any problem. > I'll have further comments on this front in the binding patch. > > > +/* > > + * reserve_crashkernel() - reserves memory for crash kernel > > + * > > + * This function reserves memory area given in "crashkernel=" kernel command > > + * line parameter. The memory reserved is used by dump capture kernel when > > + * primary kernel is crashing. > > + */ > > +static void __init reserve_crashkernel(void) > > +{ > > + int ret; > > + > > + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > > + &crash_size, &crash_base); > > + /* no crashkernel= or invalid value specified */ > > + if (ret || !crash_size) > > + return; > > + > > + if (crash_base == 0) { > > + /* Current arm64 boot protocol requires 2MB alignment */ > > + crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT, > > + crash_size, SZ_2M); > > + if (crash_base == 0) { > > + pr_warn("Unable to allocate crashkernel (size:%llx)\n", > > + crash_size); > > + return; > > + } > > + } else { > > + /* User specifies base address explicitly. */ > > + if (!memblock_is_region_memory(crash_base, crash_size) || > > + memblock_is_region_reserved(crash_base, crash_size)) { > > + pr_warn("crashkernel has wrong address or size\n"); > > + return; > > + } > > + > > + if (!IS_ALIGNED(crash_base, SZ_2M)) { > > + pr_warn("crashkernel base address is not 2MB aligned\n"); > > + return; > > + } > > + } > > + memblock_reserve(crash_base, crash_size); > > This will mean that the crash kernel will have a permanent alias in the linear > map which is vulnerable to being clobbered. There could also be issues > with mismatched attributes in future. Good point, I've never thought of that except making the memblock region "reserved." > We're probably ok for now, but in future we'll likely want to fix this > up to remove the region (or mark it nomap), and only map it temporarily > when loading things into the region. Well, I found that the following commit is already in: commit 9b492cf58077 Author: Xunlei Pang Date: Mon May 23 16:24:10 2016 -0700 kexec: introduce a protection mechanism for the crashkernel reserved memory To make best use of this framework, I'd like to re-use set_memory_ro/rx() instead of removing the region from linear mapping. But to do so, we need to * make memblock_isolate_range() global, * allow set_memory_ro/rx() to be applied to regions in linear mapping since set_memory_ro/rx() works only on page-level mappings. What do you think? (See my tentative solution below.) > > + > > + pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n", > > + crash_size >> 20, crash_base >> 20); > > + > > + crashk_res.start = crash_base; > > + crashk_res.end = crash_base + crash_size - 1; > > +} > > +#else > > +static void __init reserve_crashkernel(void) > > +{ > > + ; > > Nit: the ';' line can go. OK Thanks, -Takahiro AKASHI > > +} > > +#endif /* CONFIG_KEXEC_CORE */ > > + > > /* > > * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It > > * currently assumes that for memory starting above 4G, 32-bit devices will > > @@ -331,6 +438,9 @@ void __init arm64_memblock_init(void) > > arm64_dma_phys_limit = max_zone_dma_phys(); > > else > > arm64_dma_phys_limit = PHYS_MASK + 1; > > + > > + reserve_crashkernel(); > > + > > dma_contiguous_reserve(arm64_dma_phys_limit); > > > > memblock_allow_resize(); > > -- > > 2.11.0 > > Other than my comments regarding the DT usage above, this looks fine to > me. > > Thanks, > Mark. ===>8=== ===8<=== diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index c0fc3d458195..bb21c0473b8e 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -211,6 +211,44 @@ void machine_kexec(struct kimage *kimage) BUG(); /* Should never get here. */ } +static int kexec_mark_range(unsigned long start, unsigned long end, + bool protect) +{ + unsigned int nr_pages; + + if (!end || start >= end) + return 0; + + nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1; + + if (protect) + return set_memory_ro(__phys_to_virt(start), nr_pages); + else + return set_memory_rw(__phys_to_virt(start), nr_pages); +} + +static void kexec_mark_crashkres(bool protect) +{ + unsigned long control; + + /* Don't touch the control code page used in crash_kexec().*/ + control = page_to_phys(kexec_crash_image->control_code_page); + kexec_mark_range(crashk_res.start, control - 1, protect); + + control += KEXEC_CONTROL_PAGE_SIZE; + kexec_mark_range(control, crashk_res.end, protect); +} + +void arch_kexec_protect_crashkres(void) +{ + kexec_mark_crashkres(true); +} + +void arch_kexec_unprotect_crashkres(void) +{ + kexec_mark_crashkres(false); +} + static void machine_kexec_mask_interrupts(void) { unsigned int i; diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 569ec3325bc8..764ec89c4f76 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,6 +90,7 @@ early_param("initrd", early_initrd); static void __init reserve_crashkernel(void) { unsigned long long crash_size, crash_base; + int start_rgn, end_rgn; int ret; ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), @@ -121,6 +122,9 @@ static void __init reserve_crashkernel(void) } } memblock_reserve(crash_base, crash_size); + memblock_isolate_range(&memblock.memory, crash_base, crash_size, + &start_rgn, &end_rgn); + pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n", crash_size >> 20, crash_base >> 20); diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 17243e43184e..0f60f19c287b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -362,6 +363,17 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end unsigned long kernel_start = __pa(_text); unsigned long kernel_end = __pa(__init_begin); +#ifdef CONFIG_KEXEC_CORE + if (crashk_res.end && start >= crashk_res.start && + end <= (crashk_res.end + 1)) { + __create_pgd_mapping(pgd, start, __phys_to_virt(start), + end - start, PAGE_KERNEL, + early_pgtable_alloc, + true); + return; + } +#endif + /* * Take care not to create a writable alias for the * read-only text and rodata sections of the kernel image.