Message ID | 20241014144622.876731-2-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtio-mem: s390 support | expand |
On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote: > s390 currently always results in is_kdump_kernel() == false, because > it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to > deactivate the elfcorehdr= kernel parameter. > > Let's follow the powerpc example and implement our own logic. > > This is required for virtio-mem to reliably identify a kdump > environment to not try hotplugging memory. > > Tested-by: Mario Casquero <mcasquer@redhat.com> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > arch/s390/include/asm/kexec.h | 4 ++++ > arch/s390/kernel/crash_dump.c | 6 ++++++ > 2 files changed, 10 insertions(+) Looks like this could work. But the comment in smp.c above dump_available() needs to be updated. Are you willing to do that, or should I provide an addon patch?
On 14.10.24 20:20, Heiko Carstens wrote: > On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote: >> s390 currently always results in is_kdump_kernel() == false, because >> it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to >> deactivate the elfcorehdr= kernel parameter. >> >> Let's follow the powerpc example and implement our own logic. >> >> This is required for virtio-mem to reliably identify a kdump >> environment to not try hotplugging memory. >> >> Tested-by: Mario Casquero <mcasquer@redhat.com> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> arch/s390/include/asm/kexec.h | 4 ++++ >> arch/s390/kernel/crash_dump.c | 6 ++++++ >> 2 files changed, 10 insertions(+) > > Looks like this could work. But the comment in smp.c above > dump_available() needs to be updated. A right, I remember that there was some outdated documentation. > > Are you willing to do that, or should I provide an addon patch? > I can squash the following: diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 4df56fdb2488..a4f538876462 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -587,16 +587,16 @@ int smp_store_status(int cpu) * with sigp stop-and-store-status. The firmware or the boot-loader * stored the registers of the boot CPU in the absolute lowcore in the * memory of the old system. - * 3) kdump and the old kernel did not store the CPU state, - * or stand-alone kdump for DASD - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() + * 3) kdump or stand-alone kdump for DASD + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false * The state for all CPUs except the boot CPU needs to be collected * with sigp stop-and-store-status. The kexec code or the boot-loader * stored the registers of the boot CPU in the memory of the old system. - * 4) kdump and the old kernel stored the CPU state - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() - * This case does not exist for s390 anymore, setup_arch explicitly - * deactivates the elfcorehdr= kernel parameter + * + * Note that the old Kdump mode where the old kernel stored the CPU state + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent + * of the elfcorehdr= parameter. */ static bool dump_available(void) { Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?
On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: > On 14.10.24 20:20, Heiko Carstens wrote: > > Looks like this could work. But the comment in smp.c above > > dump_available() needs to be updated. > > A right, I remember that there was some outdated documentation. > > > > > Are you willing to do that, or should I provide an addon patch? > > > > I can squash the following: > > diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c > index 4df56fdb2488..a4f538876462 100644 > --- a/arch/s390/kernel/smp.c > +++ b/arch/s390/kernel/smp.c > @@ -587,16 +587,16 @@ int smp_store_status(int cpu) > * with sigp stop-and-store-status. The firmware or the boot-loader > * stored the registers of the boot CPU in the absolute lowcore in the > * memory of the old system. > - * 3) kdump and the old kernel did not store the CPU state, > - * or stand-alone kdump for DASD > - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() > + * 3) kdump or stand-alone kdump for DASD > + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false > * The state for all CPUs except the boot CPU needs to be collected > * with sigp stop-and-store-status. The kexec code or the boot-loader > * stored the registers of the boot CPU in the memory of the old system. > - * 4) kdump and the old kernel stored the CPU state > - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() > - * This case does not exist for s390 anymore, setup_arch explicitly > - * deactivates the elfcorehdr= kernel parameter > + * > + * Note that the old Kdump mode where the old kernel stored the CPU state To be consistent with the rest of the comment, please write kdump in all lower case characters, please. > + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= > + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent Typo: kudmp. > Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for > SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ? Yes, it is some sort of kdump, even though a bit odd. But the comment as it is doesn't need to be changed. Only at the very top, please also change: "There are four cases" into "There are three cases". Then it all looks good. Thanks a lot!
On 15.10.24 10:30, Heiko Carstens wrote: > On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: >> On 14.10.24 20:20, Heiko Carstens wrote: >>> Looks like this could work. But the comment in smp.c above >>> dump_available() needs to be updated. >> >> A right, I remember that there was some outdated documentation. >> >>> >>> Are you willing to do that, or should I provide an addon patch? >>> >> >> I can squash the following: >> >> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c >> index 4df56fdb2488..a4f538876462 100644 >> --- a/arch/s390/kernel/smp.c >> +++ b/arch/s390/kernel/smp.c >> @@ -587,16 +587,16 @@ int smp_store_status(int cpu) >> * with sigp stop-and-store-status. The firmware or the boot-loader >> * stored the registers of the boot CPU in the absolute lowcore in the >> * memory of the old system. >> - * 3) kdump and the old kernel did not store the CPU state, >> - * or stand-alone kdump for DASD >> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() >> + * 3) kdump or stand-alone kdump for DASD >> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false >> * The state for all CPUs except the boot CPU needs to be collected >> * with sigp stop-and-store-status. The kexec code or the boot-loader >> * stored the registers of the boot CPU in the memory of the old system. >> - * 4) kdump and the old kernel stored the CPU state >> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() >> - * This case does not exist for s390 anymore, setup_arch explicitly >> - * deactivates the elfcorehdr= kernel parameter >> + * >> + * Note that the old Kdump mode where the old kernel stored the CPU state > > To be consistent with the rest of the comment, please write kdump in > all lower case characters, please. It obviously was too late in the evening for me :) Thanks! > >> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= >> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent > > Typo: kudmp. > >> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for >> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ? > > Yes, it is some sort of kdump, even though a bit odd. My concern is that we'll now have bool is_kdump_kernel(void) { return oldmem_data.start && !is_ipl_type_dump(); } Which matches 3), but if 2) is also called "kdump", then should it actually be bool is_kdump_kernel(void) { return oldmem_data.start; } ? When I wrote that code I was rather convinced that the variant in this patch is the right thing to do.
On 15.10.24 10:41, David Hildenbrand wrote: > On 15.10.24 10:30, Heiko Carstens wrote: >> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: >>> On 14.10.24 20:20, Heiko Carstens wrote: >>>> Looks like this could work. But the comment in smp.c above >>>> dump_available() needs to be updated. >>> >>> A right, I remember that there was some outdated documentation. >>> >>>> >>>> Are you willing to do that, or should I provide an addon patch? >>>> >>> >>> I can squash the following: >>> >>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c >>> index 4df56fdb2488..a4f538876462 100644 >>> --- a/arch/s390/kernel/smp.c >>> +++ b/arch/s390/kernel/smp.c >>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu) >>> * with sigp stop-and-store-status. The firmware or the boot-loader >>> * stored the registers of the boot CPU in the absolute lowcore in the >>> * memory of the old system. >>> - * 3) kdump and the old kernel did not store the CPU state, >>> - * or stand-alone kdump for DASD >>> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() >>> + * 3) kdump or stand-alone kdump for DASD >>> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false >>> * The state for all CPUs except the boot CPU needs to be collected >>> * with sigp stop-and-store-status. The kexec code or the boot-loader >>> * stored the registers of the boot CPU in the memory of the old system. >>> - * 4) kdump and the old kernel stored the CPU state >>> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() >>> - * This case does not exist for s390 anymore, setup_arch explicitly >>> - * deactivates the elfcorehdr= kernel parameter >>> + * >>> + * Note that the old Kdump mode where the old kernel stored the CPU state >> >> To be consistent with the rest of the comment, please write kdump in >> all lower case characters, please. > > It obviously was too late in the evening for me :) Thanks! > >> >>> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= >>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent >> >> Typo: kudmp. >> >>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for >>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ? >> >> Yes, it is some sort of kdump, even though a bit odd. > > My concern is that we'll now have > > bool is_kdump_kernel(void) > { > return oldmem_data.start && !is_ipl_type_dump(); > } > > Which matches 3), but if 2) is also called "kdump", then should it > actually be > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > ? > > When I wrote that code I was rather convinced that the variant in this > patch is the right thing to do. > I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct: diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index cca1827d3d2e..fbc5de66d03b 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size) u64 hdr_off; /* If we are not in kdump or zfcp/nvme dump mode return */ - if (!oldmem_data.start && !is_ipl_type_dump()) + if (!dump_available()) return 0; /* If we cannot get HSA size for zfcp/nvme dump return error */ if (is_ipl_type_dump() && !sclp.hsa_size) diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c index b695f980bbde..09578f400ef7 100644 --- a/arch/s390/kernel/os_info.c +++ b/arch/s390/kernel/os_info.c @@ -148,7 +148,7 @@ static void os_info_old_init(void) if (os_info_init) return; - if (!oldmem_data.start && !is_ipl_type_dump()) + if (!dump_available()) goto fail; if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr))) goto fail; diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c index 33cebb91b933..6a194b4f6ba5 100644 --- a/drivers/s390/char/zcore.c +++ b/drivers/s390/char/zcore.c @@ -300,9 +300,7 @@ static int __init zcore_init(void) unsigned char arch; int rc; - if (!is_ipl_type_dump()) - return -ENODATA; - if (oldmem_data.start) + if (is_kdump_kernel()) return -ENODATA; zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long));
On 15.10.24 10:53, David Hildenbrand wrote: > On 15.10.24 10:41, David Hildenbrand wrote: >> On 15.10.24 10:30, Heiko Carstens wrote: >>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: >>>> On 14.10.24 20:20, Heiko Carstens wrote: >>>>> Looks like this could work. But the comment in smp.c above >>>>> dump_available() needs to be updated. >>>> >>>> A right, I remember that there was some outdated documentation. >>>> >>>>> >>>>> Are you willing to do that, or should I provide an addon patch? >>>>> >>>> >>>> I can squash the following: >>>> >>>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c >>>> index 4df56fdb2488..a4f538876462 100644 >>>> --- a/arch/s390/kernel/smp.c >>>> +++ b/arch/s390/kernel/smp.c >>>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu) >>>> * with sigp stop-and-store-status. The firmware or the boot-loader >>>> * stored the registers of the boot CPU in the absolute lowcore in the >>>> * memory of the old system. >>>> - * 3) kdump and the old kernel did not store the CPU state, >>>> - * or stand-alone kdump for DASD >>>> - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() >>>> + * 3) kdump or stand-alone kdump for DASD >>>> + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false >>>> * The state for all CPUs except the boot CPU needs to be collected >>>> * with sigp stop-and-store-status. The kexec code or the boot-loader >>>> * stored the registers of the boot CPU in the memory of the old system. >>>> - * 4) kdump and the old kernel stored the CPU state >>>> - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() >>>> - * This case does not exist for s390 anymore, setup_arch explicitly >>>> - * deactivates the elfcorehdr= kernel parameter >>>> + * >>>> + * Note that the old Kdump mode where the old kernel stored the CPU state >>> >>> To be consistent with the rest of the comment, please write kdump in >>> all lower case characters, please. >> >> It obviously was too late in the evening for me :) Thanks! >> >>> >>>> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= >>>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent >>> >>> Typo: kudmp. >>> >>>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for >>>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ? >>> >>> Yes, it is some sort of kdump, even though a bit odd. >> >> My concern is that we'll now have >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start && !is_ipl_type_dump(); >> } >> >> Which matches 3), but if 2) is also called "kdump", then should it >> actually be >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start; >> } >> >> ? >> >> When I wrote that code I was rather convinced that the variant in this >> patch is the right thing to do. >> > > I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct: > > diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c > index cca1827d3d2e..fbc5de66d03b 100644 > --- a/arch/s390/kernel/crash_dump.c > +++ b/arch/s390/kernel/crash_dump.c > @@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size) > u64 hdr_off; > > /* If we are not in kdump or zfcp/nvme dump mode return */ > - if (!oldmem_data.start && !is_ipl_type_dump()) > + if (!dump_available()) > return 0; > /* If we cannot get HSA size for zfcp/nvme dump return error */ > if (is_ipl_type_dump() && !sclp.hsa_size) > diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c > index b695f980bbde..09578f400ef7 100644 > --- a/arch/s390/kernel/os_info.c > +++ b/arch/s390/kernel/os_info.c > @@ -148,7 +148,7 @@ static void os_info_old_init(void) > > if (os_info_init) > return; > - if (!oldmem_data.start && !is_ipl_type_dump()) > + if (!dump_available()) > goto fail; > if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr))) > goto fail; > diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c > index 33cebb91b933..6a194b4f6ba5 100644 > --- a/drivers/s390/char/zcore.c > +++ b/drivers/s390/char/zcore.c > @@ -300,9 +300,7 @@ static int __init zcore_init(void) > unsigned char arch; > int rc; > > - if (!is_ipl_type_dump()) > - return -ENODATA; > - if (oldmem_data.start) > + if (is_kdump_kernel()) > return -ENODATA; > > zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long)); > > Ugh, ignore the last one, I'm just confused about dumping options on s390x at this point :)
On Tue, Oct 15, 2024 at 10:41:21AM +0200, David Hildenbrand wrote: > On 15.10.24 10:30, Heiko Carstens wrote: > > On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: > > > On 14.10.24 20:20, Heiko Carstens wrote: > > > > Looks like this could work. But the comment in smp.c above > > > > dump_available() needs to be updated. > > > > > > A right, I remember that there was some outdated documentation. ... > My concern is that we'll now have > > bool is_kdump_kernel(void) > { > return oldmem_data.start && !is_ipl_type_dump(); > } > > Which matches 3), but if 2) is also called "kdump", then should it actually > be > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > ? > > When I wrote that code I was rather convinced that the variant in this patch > is the right thing to do. Oh well, we simply of too many dump options. I'll let Alexander and Mikhail better comment on this :) Alexander, Mikhail, the discussion starts here, and we need a sane is_kdump_kernel() for s390: https://lore.kernel.org/all/20241014144622.876731-1-david@redhat.com/
On 15.10.24 12:08, Heiko Carstens wrote: > On Tue, Oct 15, 2024 at 10:41:21AM +0200, David Hildenbrand wrote: >> On 15.10.24 10:30, Heiko Carstens wrote: >>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote: >>>> On 14.10.24 20:20, Heiko Carstens wrote: >>>>> Looks like this could work. But the comment in smp.c above >>>>> dump_available() needs to be updated. >>>> >>>> A right, I remember that there was some outdated documentation. > > ... > >> My concern is that we'll now have >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start && !is_ipl_type_dump(); >> } >> >> Which matches 3), but if 2) is also called "kdump", then should it actually >> be >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start; >> } >> >> ? >> >> When I wrote that code I was rather convinced that the variant in this patch >> is the right thing to do. > > Oh well, we simply of too many dump options. I'll let Alexander and > Mikhail better comment on this :) Thanks! > > Alexander, Mikhail, the discussion starts here, and we need a sane > is_kdump_kernel() for s390: > https://lore.kernel.org/all/20241014144622.876731-1-david@redhat.com/ > With the following cleanup in mind: From 9fbbff43f725a8482ce9e85857316ab8484ff3c8 Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david@redhat.com> Date: Tue, 15 Oct 2024 11:07:43 +0200 Subject: [PATCH] s390: use !dump_available() instead of "!oldmem_data.start && !is_ipl_type_dump()" Let's make the code more readable: !dump_available() -> !(oldmem_data.start || is_ipl_type_dump()) -> !oldmem_data.start && !is_ipl_type_dump() Signed-off-by: David Hildenbrand <david@redhat.com> --- arch/s390/kernel/crash_dump.c | 2 +- arch/s390/kernel/os_info.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index cca1827d3d2e..fbc5de66d03b 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size) u64 hdr_off; /* If we are not in kdump or zfcp/nvme dump mode return */ - if (!oldmem_data.start && !is_ipl_type_dump()) + if (!dump_available()) return 0; /* If we cannot get HSA size for zfcp/nvme dump return error */ if (is_ipl_type_dump() && !sclp.hsa_size) diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c index b695f980bbde..09578f400ef7 100644 --- a/arch/s390/kernel/os_info.c +++ b/arch/s390/kernel/os_info.c @@ -148,7 +148,7 @@ static void os_info_old_init(void) if (os_info_init) return; - if (!oldmem_data.start && !is_ipl_type_dump()) + if (!dump_available()) goto fail; if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr))) goto fail;
Hi David, > My concern is that we'll now have > > bool is_kdump_kernel(void) > { > return oldmem_data.start && !is_ipl_type_dump(); > } > > Which matches 3), but if 2) is also called "kdump", then should it actually > be > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > ? > > When I wrote that code I was rather convinced that the variant in this patch > is the right thing to do. A short explanation about what a stand-alone kdump is. * First, it's not really a _regular_ kdump activated with kexec-tools and executed by Linux itself but a regular stand-alone dump (SCSI) from the FW's perspective (one has to use HMC or dumpconf to execute it and not with kexec-tools like for the _regular_ kdump). * One has to reserve crashkernel memory region in the old crashed kernel even if it remains unused until the dump starts. * zipl uses regular kdump kernel and initramfs to create stand-alone dumper images and to write them to a dump disk which is used for IPLIng the stand-alone dumper. * The zipl bootloader takes care of transferring the old kernel memory saved in HSA by the FW to the crashkernel memory region reserved by the old crashed kernel before it enters the dumper. The HSA memory is released by the zipl bootloader _before_ the dumper image is entered, therefore, we cannot use HSA to read old kernel memory, and instead use memory from crashkernel region, just like the regular kdump. * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL the dumper like a regular stand-alone dump (e.g. zfcpdump). * Summarized, zipl bootloader prepares an environment which is expected by the regular kdump for a stand-alone kdump dumper before it is entered. In my opinion, the correct version of is_kdump_kernel() would be bool is_kdump_kernel(void) { return oldmem_data.start; } because Linux kernel doesn't differentiate between both the regular and the stand-alone kdump where it matters while performing dumper operations (e.g. reading saved old kernel memory from crashkernel memory region). Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel() is to tell us whether Linux kernel runs in a kdump like environment and not whether the current mode is identical to the proper and true kdump, right ? And if stand-alone kdump swims like a duck, quacks like one, then it is one, regardless how it was started, by kexecing or IPLing from a disk. The stand-alone kdump has a very special use case which most users will never encounter. And usually, one just takes zfcpdump instead which is more robust and much smaller considering how big kdump initrd can get. stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine. Regards Alex
>> >> When I wrote that code I was rather convinced that the variant in this patch >> is the right thing to do. > > A short explanation about what a stand-alone kdump is. > > * First, it's not really a _regular_ kdump activated with kexec-tools and > executed by Linux itself but a regular stand-alone dump (SCSI) from the > FW's perspective (one has to use HMC or dumpconf to execute it and not > with kexec-tools like for the _regular_ kdump). Ah, that makes sense. > * One has to reserve crashkernel memory region in the old crashed kernel > even if it remains unused until the dump starts. > * zipl uses regular kdump kernel and initramfs to create stand-alone > dumper images and to write them to a dump disk which is used for > IPLIng the stand-alone dumper. > * The zipl bootloader takes care of transferring the old kernel memory > saved in HSA by the FW to the crashkernel memory region reserved by the old > crashed kernel before it enters the dumper. The HSA memory is released > by the zipl bootloader _before_ the dumper image is entered, > therefore, we cannot use HSA to read old kernel memory, and instead > use memory from crashkernel region, just like the regular kdump. > * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL > the dumper like a regular stand-alone dump (e.g. zfcpdump). > * Summarized, zipl bootloader prepares an environment which is expected by > the regular kdump for a stand-alone kdump dumper before it is entered. Thanks for the details! > > In my opinion, the correct version of is_kdump_kernel() would be > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > because Linux kernel doesn't differentiate between both the regular > and the stand-alone kdump where it matters while performing dumper > operations (e.g. reading saved old kernel memory from crashkernel memory region). > Right, but if we consider "/proc/vmcore is available", a better version would IMHO be: bool is_kdump_kernel(void) { return dump_available(); } Because that is mostly (not completely) how is_kdump_kernel() would have worked right now *after* we had the elfcorehdr_alloc() during the fs_init call. > Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel() > is to tell us whether Linux kernel runs in a kdump like environment and not > whether the current mode is identical to the proper and true kdump, > right ? And if stand-alone kdump swims like a duck, quacks like one, then it > is one, regardless how it was started, by kexecing or IPLing > from a disk. Same thinking here. > > The stand-alone kdump has a very special use case which most users will > never encounter. And usually, one just takes zfcpdump instead which is > more robust and much smaller considering how big kdump initrd can get. > stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine. Makes sense, so it boils down to either bool is_kdump_kernel(void) { return oldmem_data.start; } Which means is_kdump_kernel() can be "false" even though /proc/vmcore is available or bool is_kdump_kernel(void) { return dump_available(); } Which means is_kdump_kernel() can never be "false" if /proc/vmcore is available. There is the chance of is_kdump_kernel() being "true" if "elfcorehdr_alloc()" fails with -ENODEV. You're call :) Thanks!
On 16.10.24 17:47, David Hildenbrand wrote: >>> >>> When I wrote that code I was rather convinced that the variant in this patch >>> is the right thing to do. >> >> A short explanation about what a stand-alone kdump is. >> >> * First, it's not really a _regular_ kdump activated with kexec-tools and >> executed by Linux itself but a regular stand-alone dump (SCSI) from the >> FW's perspective (one has to use HMC or dumpconf to execute it and not >> with kexec-tools like for the _regular_ kdump). > > Ah, that makes sense. > >> * One has to reserve crashkernel memory region in the old crashed kernel >> even if it remains unused until the dump starts. >> * zipl uses regular kdump kernel and initramfs to create stand-alone >> dumper images and to write them to a dump disk which is used for >> IPLIng the stand-alone dumper. >> * The zipl bootloader takes care of transferring the old kernel memory >> saved in HSA by the FW to the crashkernel memory region reserved by the old >> crashed kernel before it enters the dumper. The HSA memory is released >> by the zipl bootloader _before_ the dumper image is entered, >> therefore, we cannot use HSA to read old kernel memory, and instead >> use memory from crashkernel region, just like the regular kdump. >> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL >> the dumper like a regular stand-alone dump (e.g. zfcpdump). >> * Summarized, zipl bootloader prepares an environment which is expected by >> the regular kdump for a stand-alone kdump dumper before it is entered. > > Thanks for the details! > >> >> In my opinion, the correct version of is_kdump_kernel() would be >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start; >> } >> >> because Linux kernel doesn't differentiate between both the regular >> and the stand-alone kdump where it matters while performing dumper >> operations (e.g. reading saved old kernel memory from crashkernel memory region). >> > > Right, but if we consider "/proc/vmcore is available", a better version > would IMHO be: > > bool is_kdump_kernel(void) > { > return dump_available(); > } > > Because that is mostly (not completely) how is_kdump_kernel() would have > worked right now *after* we had the elfcorehdr_alloc() during the > fs_init call. > > >> Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel() >> is to tell us whether Linux kernel runs in a kdump like environment and not >> whether the current mode is identical to the proper and true kdump, >> right ? And if stand-alone kdump swims like a duck, quacks like one, then it >> is one, regardless how it was started, by kexecing or IPLing >> from a disk. > > Same thinking here. > >> >> The stand-alone kdump has a very special use case which most users will >> never encounter. And usually, one just takes zfcpdump instead which is >> more robust and much smaller considering how big kdump initrd can get. >> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine. > > Makes sense, so it boils down to either > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > Which means is_kdump_kernel() can be "false" even though /proc/vmcore is > available or > > bool is_kdump_kernel(void) > { > return dump_available(); > } > > Which means is_kdump_kernel() can never be "false" if /proc/vmcore is > available. There is the chance of is_kdump_kernel() being "true" if > "elfcorehdr_alloc()" fails with -ENODEV. > > > You're call :) Thanks! > What I think we should do is the following (improved comment + patch description), but I'll do whatever you think is better: From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david@redhat.com> Date: Wed, 4 Sep 2024 14:57:10 +0200 Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel() s390 currently always results in is_kdump_kernel() == false until vmcore_init()->elfcorehdr_alloc() ran, because it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate any elfcorehdr= kernel parameter. Let's follow the powerpc example and implement our own logic. Let's use "dump_available()", because this is mostly (with one exception when elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore and when is_kdump_kernel() would have returned "true" after vmcore_init(). This is required for virtio-mem to reliably identify a kdump environment before vmcore_init() was called to not try hotplugging memory. Update the documentation above dump_available(). Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> --- arch/s390/include/asm/kexec.h | 4 ++++ arch/s390/kernel/crash_dump.c | 6 ++++++ arch/s390/kernel/smp.c | 16 ++++++++-------- 3 files changed, 18 insertions(+), 8 deletions(-) diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h index 1bd08eb56d5f..bd20543515f5 100644 --- a/arch/s390/include/asm/kexec.h +++ b/arch/s390/include/asm/kexec.h @@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void); void arch_kexec_unprotect_crashkres(void); #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres + +bool is_kdump_kernel(void); +#define is_kdump_kernel is_kdump_kernel #endif #ifdef CONFIG_KEXEC_FILE @@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi, int arch_kimage_file_post_load_cleanup(struct kimage *image); #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup #endif + #endif /*_S390_KEXEC_H */ diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index 51313ed7e617..43bbaf534dd2 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from, prot); } +bool is_kdump_kernel(void) +{ + return dump_available(); +} +EXPORT_SYMBOL_GPL(is_kdump_kernel); + static const char *nt_name(Elf64_Word type) { const char *name = "LINUX"; diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 4df56fdb2488..bd41e35a27a0 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -574,7 +574,7 @@ int smp_store_status(int cpu) /* * Collect CPU state of the previous, crashed system. - * There are four cases: + * There are three cases: * 1) standard zfcp/nvme dump * condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true * The state for all CPUs except the boot CPU needs to be collected @@ -587,16 +587,16 @@ int smp_store_status(int cpu) * with sigp stop-and-store-status. The firmware or the boot-loader * stored the registers of the boot CPU in the absolute lowcore in the * memory of the old system. - * 3) kdump and the old kernel did not store the CPU state, - * or stand-alone kdump for DASD - * condition: OLDMEM_BASE != NULL && !is_kdump_kernel() + * 3) kdump or stand-alone kdump for DASD + * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false * The state for all CPUs except the boot CPU needs to be collected * with sigp stop-and-store-status. The kexec code or the boot-loader * stored the registers of the boot CPU in the memory of the old system. - * 4) kdump and the old kernel stored the CPU state - * condition: OLDMEM_BASE != NULL && is_kdump_kernel() - * This case does not exist for s390 anymore, setup_arch explicitly - * deactivates the elfcorehdr= kernel parameter + * + * Note that the old kdump mode where the old kernel stored the CPU state + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr= + * kernel parameter. The is_kdump_kernel() implementation on s390 is independent + * of the elfcorehdr= parameter, and is purely based on dump_available(). */ static bool dump_available(void) {
Hi David, David Hildenbrand <david@redhat.com> writes: > Makes sense, so it boils down to either > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > Which means is_kdump_kernel() can be "false" even though /proc/vmcore is > available or > > bool is_kdump_kernel(void) > { > return dump_available(); > } > > Which means is_kdump_kernel() can never be "false" if /proc/vmcore is > available. There is the chance of is_kdump_kernel() being "true" if > "elfcorehdr_alloc()" fails with -ENODEV. Do you consider is_kdump_kernel() returning "true" in case of zfcpdump or nvme/eckd+ldipl dump (also called NGDump) okay ? Because dump_available() would return "true" in such cases too. If yes then please explain why, i might have missed a previous explanation from you. I'm afraid everyone will make wrong assumptions while reading the name of is_kdump_kernel() and assuming that it only applies to kdump or kdump-alike dumps (like stand-alone kdump), and, therefore, introduce bugs because the name of the function conveys the wrong idea to code readers. I consider dump_available() as a superset of is_kdump_kernel() and, therefore, to me they are not equivalent. I have the feeling you consider is_kdump_kernel() equivalent to "/proc/vmcore" being present and not really saying anything about whether kdump is active ? Regards Alex
Am 21.10.24 um 14:46 schrieb Alexander Egorenkov: > Hi David, > > David Hildenbrand <david@redhat.com> writes: > >> Makes sense, so it boils down to either >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start; >> } >> >> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is >> available or >> >> bool is_kdump_kernel(void) >> { >> return dump_available(); >> } >> >> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is >> available. There is the chance of is_kdump_kernel() being "true" if >> "elfcorehdr_alloc()" fails with -ENODEV. Thanks for having another look! > > Do you consider is_kdump_kernel() returning "true" in case of zfcpdump or > nvme/eckd+ldipl dump (also called NGDump) okay ? Because > dump_available() would return "true" in such cases too. > If yes then please explain why, i might have missed a previous > explanation from you. I consider it okay because this is the current behavior after elfcorehdr_alloc() succeeded and set elfcorehdr_addr. Not sure if it is the right think to do, though :) Whatever we do, we should achieve on s390 that the result of is_kdump_kernel() is consistent throughout the booting stages, just like on all other architectures. Right now it goes from false->true when /proc/vmcore gets initialized (and only if it gets initialized properly). > > I'm afraid everyone will make wrong assumptions while reading the name > of is_kdump_kernel() and assuming that it only applies to kdump or > kdump-alike dumps (like stand-alone kdump), and, therefore, introduce > bugs because the name of the function conveys the wrong idea to code > readers. I consider dump_available() as a superset of is_kdump_kernel() > and, therefore, to me they are not equivalent. > > I have the feeling you consider is_kdump_kernel() equivalent to > "/proc/vmcore" being present and not really saying anything about > whether kdump is active ? Yes, but primarily because this is the existing handling. Staring at the powerpc implementation: /* * Return true only when kexec based kernel dump capturing method is used. * This ensures all restritions applied for kdump case are not automatically * applied for fadump case. */ bool is_kdump_kernel(void) { return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX; } EXPORT_SYMBOL_GPL(is_kdump_kernel); Which was added by commit b098f1c32365304633077d73e4ae21c72d4241b3 Author: Hari Bathini <hbathini@linux.ibm.com> Date: Tue Sep 12 13:59:50 2023 +0530 powerpc/fadump: make is_kdump_kernel() return false when fadump is active Currently, is_kdump_kernel() returns true in crash dump capture kernel for both kdump and fadump crash dump capturing methods, as both these methods set elfcorehdr_addr. Some restrictions enforced for crash dump capture kernel, based on is_kdump_kernel(), are specifically meant for kdump case and not desirable for fadump - eg. IO queues restriction in device drivers. So, define is_kdump_kernel() to return false when f/w assisted dump is active. For my purpose (virtio-mem), it's sufficient to only support "kexec triggered kdump" either way, so I don't care. So for me it's good enough to have bool is_kdump_kernel(void) { return oldmem_data.start; } And trying to document the situation in a comment like powerpc does :)
On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote: > For my purpose (virtio-mem), it's sufficient to only support "kexec > triggered kdump" either way, so I don't care. > > So for me it's good enough to have > > bool is_kdump_kernel(void) > { > return oldmem_data.start; > } > > And trying to document the situation in a comment like powerpc does :) Then let's go forward with this, since as Alexander wrote, this is returning what is actually happening. If this is not sufficient or something breaks we can still address this.
On 23.10.24 09:42, Heiko Carstens wrote: > On Mon, Oct 21, 2024 at 04:45:59PM +0200, David Hildenbrand wrote: >> For my purpose (virtio-mem), it's sufficient to only support "kexec >> triggered kdump" either way, so I don't care. >> >> So for me it's good enough to have >> >> bool is_kdump_kernel(void) >> { >> return oldmem_data.start; >> } >> >> And trying to document the situation in a comment like powerpc does :) > > Then let's go forward with this, since as Alexander wrote, this is returning > what is actually happening. If this is not sufficient or something breaks we > can still address this. > Yes, I'll send this change separately from the other virtio-mem stuff out today.
Hi David, David Hildenbrand <david@redhat.com> writes: > Staring at the powerpc implementation: > > /* > * Return true only when kexec based kernel dump capturing method is used. > * This ensures all restritions applied for kdump case are not automatically > * applied for fadump case. > */ > bool is_kdump_kernel(void) > { > return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX; > } > EXPORT_SYMBOL_GPL(is_kdump_kernel); Thanks for the pointer. I would say power's version is semantically equivalent to what i have in mind for s390 :) If a dump kernel is running, but not a stand-alone one (apart from sa kdump), then it's a kdump kernel. Regards Alex
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h index 1bd08eb56d5f..bd20543515f5 100644 --- a/arch/s390/include/asm/kexec.h +++ b/arch/s390/include/asm/kexec.h @@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void); void arch_kexec_unprotect_crashkres(void); #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres + +bool is_kdump_kernel(void); +#define is_kdump_kernel is_kdump_kernel #endif #ifdef CONFIG_KEXEC_FILE @@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi, int arch_kimage_file_post_load_cleanup(struct kimage *image); #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup #endif + #endif /*_S390_KEXEC_H */ diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index edae13416196..cca1827d3d2e 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from, prot); } +bool is_kdump_kernel(void) +{ + return oldmem_data.start && !is_ipl_type_dump(); +} +EXPORT_SYMBOL_GPL(is_kdump_kernel); + static const char *nt_name(Elf64_Word type) { const char *name = "LINUX";