Message ID | 20190507035058.63992-3-chenzhou10@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | support reserving crashkernel above 4G on arm64 kdump |
Hello,

On 07/05/2019 04:50, Chen Zhou wrote:
> When crashkernel is reserved above 4G in memory, kernel should
> reserve some amount of low memory for swiotlb and some DMA buffers.

> Meanwhile, support crashkernel=X,[high,low] in arm64. When use
> crashkernel=X parameter, try low memory first and fall back to high
> memory unless "crashkernel=X,high" is specified.

What is the 'unless crashkernel=...,high' for? I think it would be simpler to relax the
ARCH_LOW_ADDRESS_LIMIT if reserve_crashkernel_low() allocated something.

This way "crashkernel=1G" tries to allocate 1G below 4G, but fails if there isn't enough
memory. "crashkernel=1G crashkernel=16M,low" allocates 16M below 4G, which is more likely
to succeed; if it does, it can then place the 1G block anywhere.

> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 413d566..82cd9a0 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -243,6 +243,9 @@ static void __init request_standard_resources(void)
>  			request_resource(res, &kernel_data);
>  #ifdef CONFIG_KEXEC_CORE
>  		/* Userspace will find "Crash kernel" region in /proc/iomem. */
> +		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> +		    crashk_low_res.end <= res->end)
> +			request_resource(res, &crashk_low_res);
>  		if (crashk_res.end && crashk_res.start >= res->start &&
>  		    crashk_res.end <= res->end)
>  			request_resource(res, &crashk_res);

With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
"Crash kernel". Because it's sorted by address, and kexec-tools stops searching when it
finds "Crash kernel", you are always going to get the kernel placed in the lower portion.

I suspect this isn't what you want. Can we rename crashk_low_res for arm64 so that
existing kexec-tools doesn't use it?

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d2adffb..3fcd739 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -74,20 +74,37 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_base, crash_size;
> +	bool high = false;
>  	int ret;
>
>  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>  				&crash_size, &crash_base);
>  	/* no crashkernel= or invalid value specified */
> -	if (ret || !crash_size)
> -		return;
> +	if (ret || !crash_size) {
> +		/* crashkernel=X,high */
> +		ret = parse_crashkernel_high(boot_command_line,
> +				memblock_phys_mem_size(),
> +				&crash_size, &crash_base);
> +		if (ret || !crash_size)
> +			return;
> +		high = true;
> +	}
>
>  	crash_size = PAGE_ALIGN(crash_size);
>
>  	if (crash_base == 0) {
> -		/* Current arm64 boot protocol requires 2MB alignment */
> -		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
> -				crash_size, SZ_2M);
> +		/*
> +		 * Try low memory first and fall back to high memory
> +		 * unless "crashkernel=size[KMG],high" is specified.
> +		 */
> +		if (!high)
> +			crash_base = memblock_find_in_range(0,
> +					ARCH_LOW_ADDRESS_LIMIT,
> +					crash_size, CRASH_ALIGN);
> +		if (!crash_base)
> +			crash_base = memblock_find_in_range(0,
> +					memblock_end_of_DRAM(),
> +					crash_size, CRASH_ALIGN);
>  		if (crash_base == 0) {
>  			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>  				crash_size);
> @@ -105,13 +122,18 @@ static void __init reserve_crashkernel(void)
>  			return;
>  		}
>
> -		if (!IS_ALIGNED(crash_base, SZ_2M)) {
> +		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
>  			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
>  			return;
>  		}
>  	}
>  	memblock_reserve(crash_base, crash_size);
>
> +	if (crash_base >= SZ_4G && reserve_crashkernel_low()) {
> +		memblock_free(crash_base, crash_size);
> +		return;

This is going to be annoying on platforms that don't have, and don't need, memory below 4G.
A "crashkernel=...,low" on these systems will break crashdump. I don't think we should
expect users to know the memory layout. (I'm assuming distros are going to add a low
reservation everywhere, just in case.)

I think the 'low' region should be a small optional/best-effort extra, that kexec-tools
can't touch.


I'm afraid you've missed the ugly bit of the crashkernel reservation...

arch/arm64/mm/mmu.c::map_mem() marks the crashkernel as 'nomap' during the first pass of
page-table generation. This means it isn't mapped in the linear map. It then maps it with
page-size mappings, and removes the nomap flag.

This is done so that arch_kexec_protect_crashkres() and
arch_kexec_unprotect_crashkres() can remove the valid bits of the crashkernel mapping.
This way the old-kernel can't accidentally overwrite the crashkernel. It also saves us if
the old-kernel and the crashkernel use different memory attributes for the mapping.

As your low-memory reservation is intended to be used for devices, having it mapped by the
old-kernel as cacheable memory is going to cause problems if those CPUs aren't taken
offline and go corrupting this memory. (We did crash for a reason, after all.)


I think the simplest thing to do is mark the low region as 'nomap' in
reserve_crashkernel() and always leave it unmapped. We can then describe it via a
different string in /proc/iomem, something like "Crash kernel (low)". Older kexec-tools
shouldn't use it (I assume it's not using strncmp() in a way that would do this by
accident), and newer kexec-tools can know to describe it in the DT, but it can't write to it.

Thanks,

James
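
The /proc/iomem naming concern in the review above can be illustrated with a minimal sketch in C. Everything here is an assumption made for illustration: `struct iomem_entry`, `name_matches_exact()`, `name_matches_prefix()` and `first_crash_region()` are hypothetical names and are not taken from kexec-tools. The sketch only shows that a parser stopping at the first "Crash kernel" entry selects the lower region when both regions share the name, and that an exact strcmp() match skips a region renamed "Crash kernel (low)" while a prefix strncmp() match does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One parsed /proc/iomem line: "start-end : name" (hypothetical type). */
struct iomem_entry {
    unsigned long long start;
    unsigned long long end;
    const char *name;
};

/* Exact match: "Crash kernel (low)" is not treated as "Crash kernel". */
static bool name_matches_exact(const char *name)
{
    return strcmp(name, "Crash kernel") == 0;
}

/* Prefix match: also accepts "Crash kernel (low)", which is the accident
 * the review worries about. */
static bool name_matches_prefix(const char *name)
{
    return strncmp(name, "Crash kernel", strlen("Crash kernel")) == 0;
}

/*
 * Return the index of the first entry accepted by match(), or -1 if none.
 * Entries are assumed to be sorted by address, as /proc/iomem is.
 */
static int first_crash_region(const struct iomem_entry *e, size_t n,
                              bool (*match)(const char *))
{
    for (size_t i = 0; i < n; i++)
        if (match(e[i].name))
            return (int)i;
    return -1;
}
```

With two entries both named "Crash kernel", the first (lowest-address) one is returned whichever matcher is used. Renaming the low entry only helps a tool that compares the full string.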
Hi James,

On 2019/6/6 0:29, James Morse wrote:
> Hello,
>
> On 07/05/2019 04:50, Chen Zhou wrote:
>> When crashkernel is reserved above 4G in memory, kernel should
>> reserve some amount of low memory for swiotlb and some DMA buffers.
>
>> Meanwhile, support crashkernel=X,[high,low] in arm64. When use
>> crashkernel=X parameter, try low memory first and fall back to high
>> memory unless "crashkernel=X,high" is specified.
>
> What is the 'unless crashkernel=...,high' for? I think it would be simpler to relax the
> ARCH_LOW_ADDRESS_LIMIT if reserve_crashkernel_low() allocated something.
>
> This way "crashkernel=1G" tries to allocate 1G below 4G, but fails if there isn't enough
> memory. "crashkernel=1G crashkernel=16M,low" allocates 16M below 4G, which is more likely
> to succeed, if it does it can then place the 1G block anywhere.
>
Yeah, this is much simpler.
>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 413d566..82cd9a0 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -243,6 +243,9 @@ static void __init request_standard_resources(void)
>>  			request_resource(res, &kernel_data);
>>  #ifdef CONFIG_KEXEC_CORE
>>  		/* Userspace will find "Crash kernel" region in /proc/iomem. */
>> +		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> +		    crashk_low_res.end <= res->end)
>> +			request_resource(res, &crashk_low_res);
>>  		if (crashk_res.end && crashk_res.start >= res->start &&
>>  		    crashk_res.end <= res->end)
>>  			request_resource(res, &crashk_res);
>
> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>
> I suspect this isn't what you want, can we rename crashk_low_res for arm64 so that
> existing kexec-tools doesn't use it?
>
In my patchset, in addition to the kernel patches, I also modify kexec-tools:
"arm64: support more than one crash kernel regions"
(http://lists.infradead.org/pipermail/kexec/2019-April/022792.html).
In the kexec-tools patch, we read all the "Crash kernel" entries and load the crash kernel high.
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index d2adffb..3fcd739 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -74,20 +74,37 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  static void __init reserve_crashkernel(void)
>>  {
>>  	unsigned long long crash_base, crash_size;
>> +	bool high = false;
>>  	int ret;
>>
>>  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>>  				&crash_size, &crash_base);
>>  	/* no crashkernel= or invalid value specified */
>> -	if (ret || !crash_size)
>> -		return;
>> +	if (ret || !crash_size) {
>> +		/* crashkernel=X,high */
>> +		ret = parse_crashkernel_high(boot_command_line,
>> +				memblock_phys_mem_size(),
>> +				&crash_size, &crash_base);
>> +		if (ret || !crash_size)
>> +			return;
>> +		high = true;
>> +	}
>>
>>  	crash_size = PAGE_ALIGN(crash_size);
>>
>>  	if (crash_base == 0) {
>> -		/* Current arm64 boot protocol requires 2MB alignment */
>> -		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
>> -				crash_size, SZ_2M);
>> +		/*
>> +		 * Try low memory first and fall back to high memory
>> +		 * unless "crashkernel=size[KMG],high" is specified.
>> +		 */
>> +		if (!high)
>> +			crash_base = memblock_find_in_range(0,
>> +					ARCH_LOW_ADDRESS_LIMIT,
>> +					crash_size, CRASH_ALIGN);
>> +		if (!crash_base)
>> +			crash_base = memblock_find_in_range(0,
>> +					memblock_end_of_DRAM(),
>> +					crash_size, CRASH_ALIGN);
>>  		if (crash_base == 0) {
>>  			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>  				crash_size);
>> @@ -105,13 +122,18 @@ static void __init reserve_crashkernel(void)
>>  			return;
>>  		}
>>
>> -		if (!IS_ALIGNED(crash_base, SZ_2M)) {
>> +		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
>>  			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
>>  			return;
>>  		}
>>  	}
>>  	memblock_reserve(crash_base, crash_size);
>>
>> +	if (crash_base >= SZ_4G && reserve_crashkernel_low()) {
>> +		memblock_free(crash_base, crash_size);
>> +		return;
>
> This is going to be annoying on platforms that don't have, and don't need memory below 4G.
> A "crashkernel=...,low" on these system will break crashdump. I don't think we should
> expect users to know the memory layout. (I'm assuming distro's are going to add a low
> reservation everywhere, just in case)
>
> I think the 'low' region should be a small optional/best-effort extra, that kexec-tools
> can't touch.
>
>
> I'm afraid you've missed the ugly bit of the crashkernel reservation...
>
> arch/arm64/mm/mmu.c::map_mem() marks the crashkernel as 'nomap' during the first pass of
> page-table generation. This means it isn't mapped in the linear map. It then maps it with
> page-size mappings, and removes the nomap flag.
>
> This is done so that arch_kexec_protect_crashkres() and
> arch_kexec_unprotect_crashkres() can remove the valid bits of the crashkernel mapping.
> This way the old-kernel can't accidentally overwrite the crashkernel. It also saves us if
> the old-kernel and the crashkernel use different memory attributes for the mapping.
>
> As your low-memory reservation is intended to be used for devices, having it mapped by the
> old-kernel as cacheable memory is going to cause problems if those CPUs aren't taken
> offline and go corrupting this memory. (we did crash for a reason after all)
>
>
> I think the simplest thing to do is mark the low region as 'nomap' in
> reserve_crashkernel() and always leave it unmapped. We can then describe it via a
> different string in /proc/iomem, something like "Crash kernel (low)". Older kexec-tools
> shouldn't use it, (I assume its not using strncmp() in a way that would do this by
> accident), and newer kexec-tools can know to describe it in the DT, but it can't write to it.
>
I did miss that bit of the crashkernel reservation.
I will fix this in the next version.
>
> Thanks,
>
> James
>
> .
>

Thanks,
Chen Zhou
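
The simpler policy discussed in this exchange (search below 4G by default, and relax the limit only once a low region has been reserved) can be sketched in a few lines of C. This is an illustration under stated assumptions, not kernel code: `crash_search_limit()` is a hypothetical helper, and the low limit is simplified to a fixed 4G instead of arm64's ARCH_LOW_ADDRESS_LIMIT.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)

/*
 * Upper bound of the address range searched for the main crashkernel
 * region (hypothetical helper).
 *
 * low_reserved: true if a separate "crashkernel=...,low" region was
 *               successfully reserved below 4G.
 * end_of_dram:  one past the highest RAM address.
 */
static uint64_t crash_search_limit(bool low_reserved, uint64_t end_of_dram)
{
    /* A low region exists for DMA, so the main block may go anywhere. */
    if (low_reserved)
        return end_of_dram;

    /* Otherwise stay below 4G (or below the end of RAM, if smaller). */
    return end_of_dram < SZ_4G ? end_of_dram : SZ_4G;
}
```

On a machine with 16G of RAM this returns 4G for "crashkernel=1G" alone, and 16G once "crashkernel=16M,low" has succeeded, which matches the behaviour described in the review.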
Hi Chen Zhou,

On 13/06/2019 12:27, Chen Zhou wrote:
> On 2019/6/6 0:29, James Morse wrote:
>> On 07/05/2019 04:50, Chen Zhou wrote:
>>> When crashkernel is reserved above 4G in memory, kernel should
>>> reserve some amount of low memory for swiotlb and some DMA buffers.
>>
>>> Meanwhile, support crashkernel=X,[high,low] in arm64. When use
>>> crashkernel=X parameter, try low memory first and fall back to high
>>> memory unless "crashkernel=X,high" is specified.
>>
>> What is the 'unless crashkernel=...,high' for? I think it would be simpler to relax the
>> ARCH_LOW_ADDRESS_LIMIT if reserve_crashkernel_low() allocated something.
>>
>> This way "crashkernel=1G" tries to allocate 1G below 4G, but fails if there isn't enough
>> memory. "crashkernel=1G crashkernel=16M,low" allocates 16M below 4G, which is more likely
>> to succeed, if it does it can then place the 1G block anywhere.
>>
> Yeah, this is much simpler.

>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index 413d566..82cd9a0 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -243,6 +243,9 @@ static void __init request_standard_resources(void)
>>>  			request_resource(res, &kernel_data);
>>>  #ifdef CONFIG_KEXEC_CORE
>>>  		/* Userspace will find "Crash kernel" region in /proc/iomem. */
>>> +		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>> +		    crashk_low_res.end <= res->end)
>>> +			request_resource(res, &crashk_low_res);
>>>  		if (crashk_res.end && crashk_res.start >= res->start &&
>>>  		    crashk_res.end <= res->end)
>>>  			request_resource(res, &crashk_res);
>>
>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
>> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
>>
>> I suspect this isn't what you want, can we rename crashk_low_res for arm64 so that
>> existing kexec-tools doesn't use it?

> In my patchset, in addition to the kernel patches, i also modify the kexec-tools.
> arm64: support more than one crash kernel regions(http://lists.infradead.org/pipermail/kexec/2019-April/022792.html).
> In kexec-tools patch, we read all the "Crash kernel" entry and load crash kernel high.

But we can't rely on people updating user-space when they update the kernel!

[...]

>> I'm afraid you've missed the ugly bit of the crashkernel reservation...
>>
>> arch/arm64/mm/mmu.c::map_mem() marks the crashkernel as 'nomap' during the first pass of
>> page-table generation. This means it isn't mapped in the linear map. It then maps it with
>> page-size mappings, and removes the nomap flag.
>>
>> This is done so that arch_kexec_protect_crashkres() and
>> arch_kexec_unprotect_crashkres() can remove the valid bits of the crashkernel mapping.
>> This way the old-kernel can't accidentally overwrite the crashkernel. It also saves us if
>> the old-kernel and the crashkernel use different memory attributes for the mapping.
>>
>> As your low-memory reservation is intended to be used for devices, having it mapped by the
>> old-kernel as cacheable memory is going to cause problems if those CPUs aren't taken
>> offline and go corrupting this memory. (we did crash for a reason after all)
>>
>>
>> I think the simplest thing to do is mark the low region as 'nomap' in
>> reserve_crashkernel() and always leave it unmapped. We can then describe it via a
>> different string in /proc/iomem, something like "Crash kernel (low)". Older kexec-tools
>> shouldn't use it, (I assume its not using strncmp() in a way that would do this by
>> accident), and newer kexec-tools can know to describe it in the DT, but it can't write to it.

> I did miss the bit of the crashkernel reservation.
> I will fix this in next version.

I think all that is needed is to make the low region nomap, and describe it via /proc/iomem
with a name where nothing will try to use it by accident.


Thanks,

James
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 67e4cb7..32949bf 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -28,6 +28,9 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN	SZ_2M
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 413d566..82cd9a0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -243,6 +243,9 @@ static void __init request_standard_resources(void)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
 		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+		    crashk_low_res.end <= res->end)
+			request_resource(res, &crashk_low_res);
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d2adffb..3fcd739 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -74,20 +74,37 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
+	bool high = false;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base);
 	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
+	if (ret || !crash_size) {
+		/* crashkernel=X,high */
+		ret = parse_crashkernel_high(boot_command_line,
+				memblock_phys_mem_size(),
+				&crash_size, &crash_base);
+		if (ret || !crash_size)
+			return;
+		high = true;
+	}
 
 	crash_size = PAGE_ALIGN(crash_size);
 
 	if (crash_base == 0) {
-		/* Current arm64 boot protocol requires 2MB alignment */
-		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
-				crash_size, SZ_2M);
+		/*
+		 * Try low memory first and fall back to high memory
+		 * unless "crashkernel=size[KMG],high" is specified.
+		 */
+		if (!high)
+			crash_base = memblock_find_in_range(0,
+					ARCH_LOW_ADDRESS_LIMIT,
+					crash_size, CRASH_ALIGN);
+		if (!crash_base)
+			crash_base = memblock_find_in_range(0,
+					memblock_end_of_DRAM(),
+					crash_size, CRASH_ALIGN);
 		if (crash_base == 0) {
 			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 				crash_size);
@@ -105,13 +122,18 @@ static void __init reserve_crashkernel(void)
 			return;
 		}
 
-		if (!IS_ALIGNED(crash_base, SZ_2M)) {
+		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
 			pr_warn("cannot reserve crashkernel: base address is not 2MB aligned\n");
 			return;
 		}
 	}
 	memblock_reserve(crash_base, crash_size);
 
+	if (crash_base >= SZ_4G && reserve_crashkernel_low()) {
+		memblock_free(crash_base, crash_size);
+		return;
+	}
+
 	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
 		crash_base, crash_base + crash_size, crash_size >> 20);
When crashkernel is reserved above 4G in memory, the kernel should
reserve some amount of low memory for swiotlb and some DMA buffers.

Meanwhile, support crashkernel=X,[high,low] in arm64. When the
crashkernel=X parameter is used, try low memory first and fall back to
high memory unless "crashkernel=X,high" is specified.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 arch/arm64/include/asm/kexec.h |  3 +++
 arch/arm64/kernel/setup.c      |  3 +++
 arch/arm64/mm/init.c           | 34 ++++++++++++++++++++++++++++------
 3 files changed, 34 insertions(+), 6 deletions(-)
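
For readers unfamiliar with the crashkernel=X,[high,low] syntax named in the commit message, here is a rough userspace sketch in C of how one option value could be classified. It is an assumption-laden illustration only: `parse_crash_arg()` and `enum crash_kind` are hypothetical, the kernel's own parse_crashkernel() family is not reproduced, and the `size@offset` and range forms are deliberately not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Which variant of the option was given (hypothetical enum). */
enum crash_kind { CRASH_DEFAULT, CRASH_HIGH, CRASH_LOW, CRASH_INVALID };

/*
 * Classify the value of one "crashkernel=" option, for example "1G",
 * "1G,high" or "16M,low", and store the size in bytes in *size.
 * Only a plain size with an optional K/M/G suffix is understood.
 */
static enum crash_kind parse_crash_arg(const char *val,
                                       unsigned long long *size)
{
    char *end;
    unsigned long long n = strtoull(val, &end, 0);

    if (end == val)
        return CRASH_INVALID;   /* no leading number */

    switch (*end) {
    case 'G': case 'g':
        n <<= 10;
        /* fall through */
    case 'M': case 'm':
        n <<= 10;
        /* fall through */
    case 'K': case 'k':
        n <<= 10;
        end++;                  /* consume the suffix character */
        break;
    default:
        break;
    }
    *size = n;

    if (*end == '\0')
        return CRASH_DEFAULT;
    if (strcmp(end, ",high") == 0)
        return CRASH_HIGH;
    if (strcmp(end, ",low") == 0)
        return CRASH_LOW;
    return CRASH_INVALID;
}
```

A caller would run this once per crashkernel= option on the command line, so that "crashkernel=1G,high crashkernel=16M,low" yields one CRASH_HIGH and one CRASH_LOW request.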