From patchwork Tue Jan 17 08:20:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: AKASHI Takahiro X-Patchwork-Id: 9520089 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id E2E246020B for ; Tue, 17 Jan 2017 08:20:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C99B827DA4 for ; Tue, 17 Jan 2017 08:20:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B983F284F8; Tue, 17 Jan 2017 08:20:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, T_DKIM_INVALID autolearn=no version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1D02427DA4 for ; Tue, 17 Jan 2017 08:20:03 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.87 #1 (Red Hat Linux)) id 1cTOzl-0000Sg-UL; Tue, 17 Jan 2017 08:19:57 +0000 Received: from mail-pf0-x22d.google.com ([2607:f8b0:400e:c00::22d]) by bombadil.infradead.org with esmtps (Exim 4.87 #1 (Red Hat Linux)) id 1cTOzg-0000Q3-PG for linux-arm-kernel@lists.infradead.org; Tue, 17 Jan 2017 08:19:55 +0000 Received: by mail-pf0-x22d.google.com with SMTP id 189so60476249pfu.3 for ; Tue, 17 Jan 2017 00:19:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-disposition:in-reply-to:user-agent; bh=wDWv65WppOy4EQgk0JB/vd7ahoQRjw7DD2t1bdNfqCk=; b=OeeycJUlJBQznjfG//VdAM6KMiwZh2Q1IzmLdZQUogm7uPG9Ka/bcF1NTvu7BqFKmH CgqlHVav7nOZPLGvoMkw6mmapj78b4kBzYJF6g9K0c5hex+DHzBc3VT+EHnD/3Kk2MHq JqtEYqIUySD6tg2o2XPLkERHdCrOwWK1xMEXw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to:user-agent; bh=wDWv65WppOy4EQgk0JB/vd7ahoQRjw7DD2t1bdNfqCk=; b=hAfNsFqWSyF0qry2NspYafldf6dRdVZ0zLApaY1o5fSjPraOU9ZVEjFYgcG1kxacbw tyTxnc6UKZ0RJ2UxatuTwn7Mq1w9jF2xaXQnwrXjWdN1CrCYVWuO0oI7FPZyv9HzOOX3 +fKRBcX2VREz1gWrAS7gNUtBR+6BjLyA+GOmtyxubNtmkliBAOJg2Udq7ldWisj9F1GT MWKIqLtTCk7orgOCwwKg7D5X9yUnPAV6y4Wk2eoUgtRvlYBa/gS2wbT7/jfNbBVck1vA 454GHQVVMj2HG2XlEAhhs6z8pK/WtEhSTAtholhO10CIgHrNyaIi191gN0p0ZmGUXdkX cqAA== X-Gm-Message-State: AIkVDXL5D3AjeXkJfbPy+tTZyClWJcUSNTb5kEStQPyFR4aiVPssiBMiXFRJAI939XWzUJzp X-Received: by 10.99.193.17 with SMTP id w17mr10691911pgf.124.1484641171801; Tue, 17 Jan 2017 00:19:31 -0800 (PST) Received: from linaro.org ([121.95.100.191]) by smtp.googlemail.com with ESMTPSA id o24sm53178643pfj.78.2017.01.17.00.19.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 17 Jan 2017 00:19:31 -0800 (PST) Date: Tue, 17 Jan 2017 17:20:44 +0900 From: AKASHI Takahiro To: Mark Rutland Subject: Re: [PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel Message-ID: <20170117082043.GM20972@linaro.org> Mail-Followup-To: AKASHI Takahiro , Mark Rutland , catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com, geoff@infradead.org, bauerman@linux.vnet.ibm.com, dyoung@redhat.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org, Mark Salter , Pratyush Anand References: <20161228043347.27358-1-takahiro.akashi@linaro.org> <20161228043605.27470-2-takahiro.akashi@linaro.org> <20170112150926.GA12249@leverpostej> <20170113081617.GI20972@linaro.org> <20170113113914.GB26804@leverpostej> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20170113113914.GB26804@leverpostej> User-Agent: Mutt/1.5.24 (2015-08-30) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20170117_001952_873698_305390C8 X-CRM114-Status: GOOD ( 45.68 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pratyush Anand , geoff@infradead.org, catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com, Mark Salter , bauerman@linux.vnet.ibm.com, dyoung@redhat.com, kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote: > On Fri, Jan 13, 2017 at 05:16:18PM +0900, AKASHI Takahiro wrote: > > On Thu, Jan 12, 2017 at 03:09:26PM +0000, Mark Rutland wrote: > > > > +static int __init export_crashkernel(void) > > > > > + /* Add /chosen/linux,crashkernel-* properties */ > > > > > + of_remove_property(node, of_find_property(node, > > > > + "linux,crashkernel-base", NULL)); > > > > + of_remove_property(node, of_find_property(node, > > > > + "linux,crashkernel-size", NULL)); > > > > + > > > > + ret = of_add_property(node, &crash_base_prop); > > > > + if (ret) > > > > + goto ret_err; > > > > + > > > > + ret = of_add_property(node, &crash_size_prop); > > > > + if (ret) > > > > + goto ret_err; > > > > I very much do not like this. > > > > > > I don't think we should be modifying the DT exposed to userspace in this > > > manner, in the usual boot path, especially given that the kernel itself > > > does not appear to be a consumer of this property. I do not think that > > > it is right to use the DT exposed to userspace as a communication > > > channel solely between the kernel and userspace. > > > > As you mentioned in your comments against my patch#9, this property > > originates from PPC implementation. > > I added it solely from the sympathy for dt-based architectures. > > > > > So I think we should drop the above, and for arm64 have userspace > > > consistently use /proc/iomem (or perhaps a new kexec-specific file) to > > > determine the region reserved for the crash kernel, if it needs to know > > > this. > > > > As a matter of fact, my port of kexec-tools doesn't check this property > > and dropping it won't cause any problem. > > Ok. It sounds like we're both happy for this to go, then. > > While it's unfortunate that architectures differ, I think we have > legitimate reasons to differ, and it's preferable to do so. We have a > different set of constraints (e.g. supporting EFI memory maps), and > following the PPC approach creates longer term issues for us, making it > harder to do the right thing consistently. > > > > > +/* > > > > + * reserve_crashkernel() - reserves memory for crash kernel > > > > + * > > > > + * This function reserves memory area given in "crashkernel=" kernel command > > > > + * line parameter. The memory reserved is used by dump capture kernel when > > > > + * primary kernel is crashing. > > > > + */ > > > > +static void __init reserve_crashkernel(void) > > > > > + memblock_reserve(crash_base, crash_size); > > > > > > This will mean that the crash kernel will have a permanent alias in the linear > > > map which is vulnerable to being clobbered. There could also be issues > > > with mismatched attributes in future. > > > > Good point, I've never thought of that except making the memblock > > region "reserved." > > > > > We're probably ok for now, but in future we'll likely want to fix this > > > up to remove the region (or mark it nomap), and only map it temporarily > > > when loading things into the region. > > > > Well, I found that the following commit is already in: > > commit 9b492cf58077 > > Author: Xunlei Pang > > Date: Mon May 23 16:24:10 2016 -0700 > > > > kexec: introduce a protection mechanism for the crashkernel > > reserved memory > > > > To make best use of this framework, I'd like to re-use set_memory_ro/rx() > > instead of removing the region from linear mapping. But to do so, > > we need to > > * make memblock_isolate_range() global, > > * allow set_memory_ro/rx() to be applied to regions in linear mapping > > since set_memory_ro/rx() works only on page-level mappings. > > > > What do you think? > > (See my tentative solution below.) > > Great! I think it would be better to follow the approach of > mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise, > it looks like it should work. I'm not quite sure what the approach of mark_rodata_ro() means, but I found that using create_mapping_late() may cause two problems: 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry. This can happen, say, if the memory range for crash dump kernel starts in the mid of _continuous_ pages. 2) The control code page, of one-page size, is still written out in machine_kexec() which is called at a crash, and this means that the range must be writable even after kexec_load(), but create_mapping_late() does not handle a case of changing attributes for a single page which is in _section_ mapping. We cannot make single-page mapping for the control page since the address of that page is not determined at the boot time. As for (1), we need to call memblock_isolate_range() to make the region an independent one. > Either way, this still leaves us with an RO alias on crashed cores (and > potential cache attribute mismatches in future). Do we need to read from > the region later, I believe not, but the region must be _writable_ as I mentioned in (2) above. To avoid this issue, we have to move copying the control code page to machine_kexec_prepare() which is called in kexec_load() and so the region is writable anyway then. I want Geoff to affirm that this change is safe. (See my second solution below.) > or could we unmap it entirely? given the change above, I think we can. Is there any code to re-use especially for unmapping? Thanks, -Takahiro AKASHI > Thanks, > Mark. ===>8=== ===8<=== diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index c0fc3d458195..80a52e9aaf73 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -26,8 +26,6 @@ extern const unsigned char arm64_relocate_new_kernel[]; extern const unsigned long arm64_relocate_new_kernel_size; -static unsigned long kimage_start; - /** * kexec_image_info - For debugging output. */ @@ -68,7 +66,7 @@ void machine_kexec_cleanup(struct kimage *kimage) */ int machine_kexec_prepare(struct kimage *kimage) { - kimage_start = kimage->start; + void *reboot_code_buffer; kexec_image_info(kimage); @@ -77,6 +75,21 @@ int machine_kexec_prepare(struct kimage *kimage) return -EBUSY; } + reboot_code_buffer = + phys_to_virt(page_to_phys(kimage->control_code_page)); + + /* + * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use + * after the kernel is shut down. + */ + memcpy(reboot_code_buffer, arm64_relocate_new_kernel, + arm64_relocate_new_kernel_size); + + /* Flush the reboot_code_buffer in preparation for its execution. */ + __flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size); + flush_icache_range((uintptr_t)reboot_code_buffer, + arm64_relocate_new_kernel_size); + return 0; } @@ -147,7 +160,6 @@ static void kexec_segment_flush(const struct kimage *kimage) void machine_kexec(struct kimage *kimage) { phys_addr_t reboot_code_buffer_phys; - void *reboot_code_buffer; /* * New cpus may have become stuck_in_kernel after we loaded the image. @@ -156,7 +168,6 @@ void machine_kexec(struct kimage *kimage) !WARN_ON(kimage == kexec_crash_image)); reboot_code_buffer_phys = page_to_phys(kimage->control_code_page); - reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys); kexec_image_info(kimage); @@ -164,26 +175,12 @@ void machine_kexec(struct kimage *kimage) kimage->control_code_page); pr_debug("%s:%d: reboot_code_buffer_phys: %pa\n", __func__, __LINE__, &reboot_code_buffer_phys); - pr_debug("%s:%d: reboot_code_buffer: %p\n", __func__, __LINE__, - reboot_code_buffer); pr_debug("%s:%d: relocate_new_kernel: %p\n", __func__, __LINE__, arm64_relocate_new_kernel); pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n", __func__, __LINE__, arm64_relocate_new_kernel_size, arm64_relocate_new_kernel_size); - /* - * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use - * after the kernel is shut down. - */ - memcpy(reboot_code_buffer, arm64_relocate_new_kernel, - arm64_relocate_new_kernel_size); - - /* Flush the reboot_code_buffer in preparation for its execution. */ - __flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size); - flush_icache_range((uintptr_t)reboot_code_buffer, - arm64_relocate_new_kernel_size); - /* Flush the kimage list and its buffers. */ kexec_list_flush(kimage); @@ -206,7 +203,7 @@ void machine_kexec(struct kimage *kimage) */ cpu_soft_restart(kimage != kexec_crash_image, - reboot_code_buffer_phys, kimage->head, kimage_start, 0); + reboot_code_buffer_phys, kimage->head, kimage->start, 0); BUG(); /* Should never get here. */ } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 569ec3325bc8..e4cc170edc0c 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,6 +90,7 @@ early_param("initrd", early_initrd); static void __init reserve_crashkernel(void) { unsigned long long crash_size, crash_base; + int start_rgn, end_rgn; int ret; ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), @@ -120,6 +121,8 @@ static void __init reserve_crashkernel(void) return; } } + memblock_isolate_range(&memblock.memory, crash_base, crash_size, + &start_rgn, &end_rgn); memblock_reserve(crash_base, crash_size); pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n", diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 17243e43184e..b7c75845407a 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include #include #include @@ -817,3 +819,27 @@ int pmd_clear_huge(pmd_t *pmd) pmd_clear(pmd); return 1; } + +#ifdef CONFIG_KEXEC_CORE +void arch_kexec_protect_crashkres(void) +{ + flush_tlb_all(); + + create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start), + resource_size(&crashk_res), PAGE_KERNEL_RO); + + /* flush the TLBs after updating live kernel mappings */ + flush_tlb_all(); +} + +void arch_kexec_unprotect_crashkres(void) +{ + flush_tlb_all(); + + create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start), + resource_size(&crashk_res), PAGE_KERNEL); + + /* flush the TLBs after updating live kernel mappings */ + flush_tlb_all(); +} +#endif