From patchwork Thu Mar 30 09:56:27 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: AKASHI Takahiro
X-Patchwork-Id: 9653443
Date: Thu, 30 Mar 2017 18:56:27 +0900
From: AKASHI Takahiro
To: Ard Biesheuvel
Cc: Mark Rutland, Pratyush Anand, Geoff Levand, Catalin Marinas,
 Will Deacon, James Morse, Thiago Jung Bauermann, Sameer Goel,
 Dave Young, David Woodhouse, kexec@lists.infradead.org,
 linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v34 06/14] arm64: kdump: protect crash dump kernel memory
Message-ID: <20170330095625.GC16309@linaro.org>
References: <20170328064831.15894-1-takahiro.akashi@linaro.org>
 <20170328065130.16019-4-takahiro.akashi@linaro.org>
 <20170328110733.GB16309@linaro.org>
User-Agent: Mutt/1.5.24 (2015-08-30)

Ard,

On Tue, Mar 28, 2017 at 03:05:58PM +0100, Ard Biesheuvel wrote:
> On 28 March 2017 at 12:07, AKASHI Takahiro wrote:
> > Ard,
> >
> > On Tue, Mar 28, 2017 at 11:07:05AM +0100, Ard Biesheuvel wrote:
> >> On 28 March 2017 at 07:51, AKASHI Takahiro wrote:
> >> > arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
> >> > are meant to be called by kexec_load() in order to protect the memory
> >> > allocated for crash dump kernel once the image is loaded.
> >> >
> >> > The protection is implemented by unmapping the relevant segments in crash
> >> > dump kernel memory, rather than making it read-only as other archs do,
> >> > to prevent any corruption due to potential cache alias (with different
> >> > attributes) problem.
> >> >
> >>
> >> I think it would be more accurate to replace 'corruption' with
> >> 'coherency issues', given that this patch does not solve the issue of
> >> writable aliases that may be used to modify the contents of the
> >> region, but it does prevent issues related to mismatched attributes
> >> (which are arguably a bigger concern)
> >
> > OK
> >
> >> > Page-level mappings are consistently used here so that we can change
> >> > the attributes of segments in page granularity as well as shrink the
> >> > region also in page granularity through /sys/kernel/kexec_crash_size,
> >> > putting the freed memory back to buddy system.
> >> >
> >> > Signed-off-by: AKASHI Takahiro
> >>
> >> As a heads-up, this patch is going to conflict heavily with patches
> >> that are queued up in arm64/for-next/core atm.
> >
> > I'll look into it later, but
> >
> >> Some questions below.
> >>
> >> > ---
> >> >  arch/arm64/kernel/machine_kexec.c | 32 +++++++++++---
> >> >  arch/arm64/mm/mmu.c               | 90 ++++++++++++++++++++-------------------
> >> >  2 files changed, 72 insertions(+), 50 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> >> > index bc96c8a7fc79..b63baa749609 100644
> >> > --- a/arch/arm64/kernel/machine_kexec.c
> >> > +++ b/arch/arm64/kernel/machine_kexec.c
> >> > @@ -14,7 +14,9 @@
> >> >
> >> >  #include
> >> >  #include
> >> > +#include
> >> >  #include
> >> > +#include
> >> >
> >> >  #include "cpu-reset.h"
> >> >
> >> > @@ -22,8 +24,6 @@
> >> >  extern const unsigned char arm64_relocate_new_kernel[];
> >> >  extern const unsigned long arm64_relocate_new_kernel_size;
> >> >
> >> > -static unsigned long kimage_start;
> >> > -
> >> >  /**
> >> >   * kexec_image_info - For debugging output.
> >> >   */
> >> > @@ -64,8 +64,6 @@ void machine_kexec_cleanup(struct kimage *kimage)
> >> >   */
> >> >  int machine_kexec_prepare(struct kimage *kimage)
> >> >  {
> >> > -	kimage_start = kimage->start;
> >> > -
> >> >  	kexec_image_info(kimage);
> >> >
> >> >  	if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
> >> > @@ -183,7 +181,7 @@ void machine_kexec(struct kimage *kimage)
> >> >  	kexec_list_flush(kimage);
> >> >
> >> >  	/* Flush the new image if already in place. */
> >> > -	if (kimage->head & IND_DONE)
> >> > +	if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
> >> >  		kexec_segment_flush(kimage);
> >> >
> >> >  	pr_info("Bye!\n");
> >> > @@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
> >> >  	 */
> >> >
> >> >  	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
> >> > -			 kimage_start, 0);
> >> > +			 kimage->start, 0);
> >> >
> >> >  	BUG(); /* Should never get here. */
> >> >  }
> >> > @@ -210,3 +208,25 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >> >  {
> >> >  	/* Empty routine needed to avoid build errors. */
> >> >  }
> >> > +
> >> > +void arch_kexec_protect_crashkres(void)
> >> > +{
> >> > +	int i;
> >> > +
> >> > +	kexec_segment_flush(kexec_crash_image);
> >> > +
> >> > +	for (i = 0; i < kexec_crash_image->nr_segments; i++)
> >> > +		set_memory_valid(
> >> > +			__phys_to_virt(kexec_crash_image->segment[i].mem),
> >> > +			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
> >> > +}
> >> > +
> >> > +void arch_kexec_unprotect_crashkres(void)
> >> > +{
> >> > +	int i;
> >> > +
> >> > +	for (i = 0; i < kexec_crash_image->nr_segments; i++)
> >> > +		set_memory_valid(
> >> > +			__phys_to_virt(kexec_crash_image->segment[i].mem),
> >> > +			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
> >> > +}
> >> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> > index d28dbcf596b6..f6a3c0e9d37f 100644
> >> > --- a/arch/arm64/mm/mmu.c
> >> > +++ b/arch/arm64/mm/mmu.c
> >> > @@ -22,6 +22,8 @@
> >> >  #include
> >> >  #include
> >> >  #include
> >> > +#include
> >> > +#include
> >> >  #include
> >> >  #include
> >> >  #include
> >> > @@ -332,56 +334,31 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
> >> >  				NULL, debug_pagealloc_enabled());
> >> >  }
> >> >
> >> > -static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
> >> > +static void __init __map_memblock(pgd_t *pgd, phys_addr_t start,
> >> > +				  phys_addr_t end, pgprot_t prot,
> >> > +				  bool page_mappings_only)
> >> > +{
> >> > +	__create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start,
> >> > +			     prot, early_pgtable_alloc,
> >> > +			     page_mappings_only);
> >> > +}
> >> > +
> >> > +static void __init map_mem(pgd_t *pgd)
> >> >  {
> >> >  	phys_addr_t kernel_start = __pa_symbol(_text);
> >> >  	phys_addr_t kernel_end = __pa_symbol(__init_begin);
> >> > +	struct memblock_region *reg;
> >> >
> >> >  	/*
> >> > -	 * Take care not to create a writable alias for the
> >> > -	 * read-only text and rodata sections of the kernel image.
> >> > +	 * Temporarily marked as NOMAP to skip mapping in the next for-loop
> >> >  	 */
> >> > +	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> >> >
> >>
> >> OK, so the trick is to mark a memblock region NOMAP temporarily, so
> >> that we can iterate over the regions more easily?
> >> Is that the sole reason for using NOMAP in this series?
> >
> > Yes. (I followed Mark's suggestion.)
> >
>
> OK. It is slightly hacky, but it should work without any problems afaict
>
> > So I assume that my change here will be essentially orthogonal
> > with the changes in for-next/core, at least, in its intent.
> >
>
> Yes. The changes should not conflict in fundamental ways, but the code
> has changed in ways that git will not be able to deal with.
> Unfortunately, that does mean another respin :-(

Can you please review the patch attached below?
Hunks look a bit complex, but the resulting code is good, I believe.
If you are happy with it, I will post the entire patchset as v35.

Thanks,
-Takahiro AKASHI

===8<===
From 5b546ab2a755cc2abf858c2abbd5887cdcbb31fd Mon Sep 17 00:00:00 2001
From: Takahiro Akashi
Date: Tue, 28 Mar 2017 15:51:24 +0900
Subject: [PATCH] arm64: kdump: protect crash dump kernel memory

arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres()
are meant to be called by kexec_load() in order to protect the memory
allocated for crash dump kernel once the image is loaded.

The protection is implemented by unmapping the relevant segments in crash
dump kernel memory, rather than making it read-only as other archs do,
to prevent coherency issues due to potential cache aliasing (with
mismatched attributes).

Page-level mappings are consistently used here so that we can change
the attributes of segments in page granularity as well as shrink the
region also in page granularity through /sys/kernel/kexec_crash_size,
putting the freed memory back to buddy system.
Signed-off-by: AKASHI Takahiro
---
 arch/arm64/kernel/machine_kexec.c |  32 +++++++++---
 arch/arm64/mm/mmu.c               | 103 ++++++++++++++++++++------------------
 2 files changed, 80 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index bc96c8a7fc79..b63baa749609 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -14,7 +14,9 @@

 #include
 #include
+#include
 #include
+#include

 #include "cpu-reset.h"

@@ -22,8 +24,6 @@
 extern const unsigned char arm64_relocate_new_kernel[];
 extern const unsigned long arm64_relocate_new_kernel_size;

-static unsigned long kimage_start;
-
 /**
  * kexec_image_info - For debugging output.
  */
@@ -64,8 +64,6 @@ void machine_kexec_cleanup(struct kimage *kimage)
  */
 int machine_kexec_prepare(struct kimage *kimage)
 {
-	kimage_start = kimage->start;
-
 	kexec_image_info(kimage);

 	if (kimage->type != KEXEC_TYPE_CRASH && cpus_are_stuck_in_kernel()) {
@@ -183,7 +181,7 @@ void machine_kexec(struct kimage *kimage)
 	kexec_list_flush(kimage);

 	/* Flush the new image if already in place. */
-	if (kimage->head & IND_DONE)
+	if ((kimage != kexec_crash_image) && (kimage->head & IND_DONE))
 		kexec_segment_flush(kimage);

 	pr_info("Bye!\n");
@@ -201,7 +199,7 @@ void machine_kexec(struct kimage *kimage)
 	 */

 	cpu_soft_restart(1, reboot_code_buffer_phys, kimage->head,
-			 kimage_start, 0);
+			 kimage->start, 0);

 	BUG(); /* Should never get here. */
 }
@@ -210,3 +208,25 @@ void machine_crash_shutdown(struct pt_regs *regs)
 {
 	/* Empty routine needed to avoid build errors. */
 }
+
+void arch_kexec_protect_crashkres(void)
+{
+	int i;
+
+	kexec_segment_flush(kexec_crash_image);
+
+	for (i = 0; i < kexec_crash_image->nr_segments; i++)
+		set_memory_valid(
+			__phys_to_virt(kexec_crash_image->segment[i].mem),
+			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 0);
+}
+
+void arch_kexec_unprotect_crashkres(void)
+{
+	int i;
+
+	for (i = 0; i < kexec_crash_image->nr_segments; i++)
+		set_memory_valid(
+			__phys_to_virt(kexec_crash_image->segment[i].mem),
+			kexec_crash_image->segment[i].memsz >> PAGE_SHIFT, 1);
+}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 91502e36e6d9..3cde5f2d30ec 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -393,10 +395,28 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
 	flush_tlb_kernel_range(virt, virt + size);
 }

-static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
+static void __init __map_memblock(pgd_t *pgd, phys_addr_t start,
+				  phys_addr_t end, pgprot_t prot, int flags)
+{
+	__create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start,
+			     prot, early_pgtable_alloc, flags);
+}
+
+void __init mark_linear_text_alias_ro(void)
+{
+	/*
+	 * Remove the write permissions from the linear alias of .text/.rodata
+	 */
+	update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+			    (unsigned long)__init_begin - (unsigned long)_text,
+			    PAGE_KERNEL_RO);
+}
+
+static void __init map_mem(pgd_t *pgd)
 {
 	phys_addr_t kernel_start = __pa_symbol(_text);
 	phys_addr_t kernel_end = __pa_symbol(__init_begin);
+	struct memblock_region *reg;
 	int flags = 0;

 	if (debug_pagealloc_enabled())
@@ -405,30 +425,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	/*
 	 * Take care not to create a writable alias for the
 	 * read-only text and rodata sections of the kernel image.
+	 * So temporarily mark them as NOMAP to skip mappings in
+	 * the following for-loop
 	 */
+	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
+#ifdef CONFIG_KEXEC_CORE
+	if (crashk_res.end)
+		memblock_mark_nomap(crashk_res.start,
+				    resource_size(&crashk_res));
+#endif

-	/* No overlap with the kernel text/rodata */
-	if (end < kernel_start || start >= kernel_end) {
-		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
-				     end - start, PAGE_KERNEL,
-				     early_pgtable_alloc, flags);
-		return;
-	}
+	/* map all the memory banks */
+	for_each_memblock(memory, reg) {
+		phys_addr_t start = reg->base;
+		phys_addr_t end = start + reg->size;

-	/*
-	 * This block overlaps the kernel text/rodata mappings.
-	 * Map the portion(s) which don't overlap.
-	 */
-	if (start < kernel_start)
-		__create_pgd_mapping(pgd, start,
-				     __phys_to_virt(start),
-				     kernel_start - start, PAGE_KERNEL,
-				     early_pgtable_alloc, flags);
-	if (kernel_end < end)
-		__create_pgd_mapping(pgd, kernel_end,
-				     __phys_to_virt(kernel_end),
-				     end - kernel_end, PAGE_KERNEL,
-				     early_pgtable_alloc, flags);
+		if (start >= end)
+			break;
+		if (memblock_is_nomap(reg))
+			continue;
+
+		__map_memblock(pgd, start, end, PAGE_KERNEL, flags);
+	}

 	/*
 	 * Map the linear alias of the [_text, __init_begin) interval
@@ -440,37 +458,24 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	 * Note that contiguous mappings cannot be remapped in this way,
 	 * so we should avoid them here.
 	 */
-	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
-			     kernel_end - kernel_start, PAGE_KERNEL,
-			     early_pgtable_alloc, NO_CONT_MAPPINGS);
-}
+	__map_memblock(pgd, kernel_start, kernel_end,
+		       PAGE_KERNEL_RO, NO_CONT_MAPPINGS);
+	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);

-void __init mark_linear_text_alias_ro(void)
-{
+#ifdef CONFIG_KEXEC_CORE
 	/*
-	 * Remove the write permissions from the linear alias of .text/.rodata
+	 * Use page-level mappings here so that we can shrink the region
+	 * in page granularity and put back unused memory to buddy system
+	 * through /sys/kernel/kexec_crash_size interface.
 	 */
-	update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
-			    (unsigned long)__init_begin - (unsigned long)_text,
-			    PAGE_KERNEL_RO);
-}
-
-static void __init map_mem(pgd_t *pgd)
-{
-	struct memblock_region *reg;
-
-	/* map all the memory banks */
-	for_each_memblock(memory, reg) {
-		phys_addr_t start = reg->base;
-		phys_addr_t end = start + reg->size;
-
-		if (start >= end)
-			break;
-		if (memblock_is_nomap(reg))
-			continue;
-
-		__map_memblock(pgd, start, end);
+	if (crashk_res.end) {
+		__map_memblock(pgd, crashk_res.start, crashk_res.end + 1,
+			       PAGE_KERNEL,
+			       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
+		memblock_clear_nomap(crashk_res.start,
+				     resource_size(&crashk_res));
 	}
+#endif
 }

 void mark_rodata_ro(void)