
[v1,00/11] fs/proc/vmcore: kdump support for virtio-mem on s390

Message ID 20241025151134.1275575-1-david@redhat.com (mailing list archive)

Message

David Hildenbrand Oct. 25, 2024, 3:11 p.m. UTC
This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
virtio-mem support on s390.

The only "different than everything else" thing about virtio-mem on s390
is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
crash kernel must detect memory ranges of the crashed/panicked kernel to
include via PT_LOAD in the vmcore.

On other architectures, all RAM regions (boot + hotplugged) can easily be
observed in the old (1st) kernel (e.g., using /proc/iomem) to create
the elfcore hdr.

On s390, information about "ordinary" memory (heh, "storage") can be
obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
that information is stored early during boot in the "physmem" memblock
data structure.

But virtio-mem memory is always detected by a device driver, which is
usually built as a module. So in the kdump (2nd) kernel, this memory can
only be properly detected once the virtio-mem driver has started up.

The virtio-mem driver already supports the "kdump mode", where it won't
hotplug any memory but instead queries the device to implement the
pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
the vmcore.
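
For context, a minimal sketch of that existing mechanism (the vmcore_cb
interface from <linux/crash_dump.h> is real; the device query is a
hypothetical stand-in for what virtio-mem actually does):

#include <linux/crash_dump.h>

/* Hypothetical device query; virtio-mem actually asks the device whether
 * the memory backing this pfn is currently plugged. */
static bool my_dev_pfn_is_plugged(unsigned long pfn)
{
	return true; /* stub */
}

static bool my_vmcore_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
{
	/* Only plugged device memory may be read while dumping. */
	return my_dev_pfn_is_plugged(pfn);
}

static struct vmcore_cb my_vmcore_cb = {
	.pfn_is_ram = my_vmcore_pfn_is_ram,
};

/* Called from device probe when running in kdump mode. */
static void my_probe_kdump_mode(void)
{
	register_vmcore_cb(&my_vmcore_cb);
}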

With this series, if the virtio-mem driver is included in the kdump
initrd -- which dracut already takes care of under Fedora/RHEL -- it will
now detect the device RAM ranges on s390 once it probes the devices, to add
them to the vmcore using the same callback mechanism we already have for
pfn_is_ram().
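
Extending the sketch above with the reporting hook this series adds in
patch #7; the callback and helper names (get_device_ram(),
vmcore_alloc_add_range()) are taken from the patches, but treat the exact
signatures as illustrative:

/* dev_addr/dev_usable_size are hypothetical driver state describing the
 * device-managed physical range. */
static int my_vmcore_get_device_ram(struct vmcore_cb *cb,
				    struct list_head *list)
{
	return vmcore_alloc_add_range(list, dev_addr, dev_usable_size);
}

static struct vmcore_cb my_vmcore_cb = {
	.pfn_is_ram	= my_vmcore_pfn_is_ram,
	.get_device_ram	= my_vmcore_get_device_ram,
};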

To add these device RAM ranges to the vmcore ("patch the vmcore"), we
add new PT_LOAD entries that describe these memory ranges, and update
all offsets and the vmcore size so everything stays consistent.
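
Conceptually, each added range becomes one more program header; a minimal
sketch of filling such an entry (standard ELF types; the vaddr choice is
arch-specific and left illustrative here):

#include <linux/elf.h>
#include <linux/types.h>

/* Describe one device RAM range [paddr, paddr + size) in the vmcore.
 * @vmcore_off is the current end offset of the vmcore data; after
 * appending this entry, the total vmcore size and all subsequent
 * offsets must be shifted by @size to stay consistent. */
static void fill_device_ram_phdr(Elf64_Phdr *phdr, u64 paddr, u64 size,
				 u64 vmcore_off)
{
	phdr->p_type   = PT_LOAD;
	phdr->p_offset = vmcore_off;	/* position of the data in /proc/vmcore */
	phdr->p_paddr  = paddr;		/* physical address in the crashed kernel */
	phdr->p_vaddr  = 0;		/* virtual address, arch-specific */
	phdr->p_filesz = size;
	phdr->p_memsz  = size;
}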

Note that makedumpfile is shaky with v6.12-rcX; I made the "obvious" things
(e.g., free page detection) work again while testing, as documented in [2].

Creating dumps using makedumpfile seems to work fine, and the dump
regions (PT_LOAD) are as expected. I still have to check in more detail
whether the created dumps are good (IOW, whether the right memory was
dumped), but makedumpfile appears to read the right memory when
interpreting the kernel data structures, which is promising.

Patches #1 -- #6 are vmcore preparations and cleanups
Patch #7 adds the infrastructure for drivers to report device RAM
Patch #8 + #9 are virtio-mem preparations
Patch #10 implements virtio-mem support to report device RAM
Patch #11 activates it for s390, implementing a new function to fill
          PT_LOAD entry for device RAM

[1] https://lkml.kernel.org/r/20241025141453.1210600-1-david@redhat.com
[2] https://github.com/makedumpfile/makedumpfile/issues/16

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

David Hildenbrand (11):
  fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
  fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
  fs/proc/vmcore: disallow vmcore modifications after the vmcore was
    opened
  fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
  fs/proc/vmcore: factor out allocating a vmcore memory node
  fs/proc/vmcore: factor out freeing a list of vmcore ranges
  fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM
    ranges in 2nd kernel
  virtio-mem: mark device ready before registering callbacks in kdump
    mode
  virtio-mem: remember usable region size
  virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
  s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)

 arch/s390/Kconfig             |   1 +
 arch/s390/kernel/crash_dump.c |  39 +++--
 drivers/virtio/Kconfig        |   1 +
 drivers/virtio/virtio_mem.c   | 103 +++++++++++++-
 fs/proc/Kconfig               |  25 ++++
 fs/proc/vmcore.c              | 258 +++++++++++++++++++++++++---------
 include/linux/crash_dump.h    |  47 +++++++
 include/linux/kcore.h         |  13 --
 8 files changed, 396 insertions(+), 91 deletions(-)

Comments

Baoquan He Nov. 4, 2024, 6:21 a.m. UTC | #1
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> virtio-mem support on s390.
> 
> The only "different than everything else" thing about virtio-mem on s390
> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> crash kernel must detect memory ranges of the crashed/panicked kernel to
> include via PT_LOAD in the vmcore.
> 
> On other architectures, all RAM regions (boot + hotplugged) can easily be
> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> the elfcore hdr.
> 
> On s390, information about "ordinary" memory (heh, "storage") can be
> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> that information is stored early during boot in the "physmem" memblock
> data structure.
> 
> But virtio-mem memory is always detected by as device driver, which is
> usually build as a module. So in the crash kernel, this memory can only be
> properly detected once the virtio-mem driver started up.
> 
> The virtio-mem driver already supports the "kdump mode", where it won't
> hotplug any memory but instead queries the device to implement the
> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> the vmcore.
> 
> With this series, if the virtio-mem driver is included in the kdump
> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> now detect the device RAM ranges on s390 once it probes the devices, to add
> them to the vmcore using the same callback mechanism we already have for
> pfn_is_ram().
> 
> To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
> add new PT_LOAD entries that describe these memory ranges, and update
> all offsets vmcore size so it is all consistent.
> 
> Note that makedumfile is shaky with v6.12-rcX, I made the "obvious" things
> (e.g., free page detection) work again while testing as documented in [2].
> 
> Creating the dumps using makedumpfile seems to work fine, and the
> dump regions (PT_LOAD) are as expected. I yet have to check in more detail
> if the created dumps are good (IOW, the right memory was dumped, but it
> looks like makedumpfile reads the right memory when interpreting the
> kernel data structures, which is promising).
> 
> Patch #1 -- #6 are vmcore preparations and cleanups

Thanks for CC-ing me, I will review patches 1-6, the vmcore part, next
week.
Baoquan He Nov. 15, 2024, 8:46 a.m. UTC | #2
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> virtio-mem support on s390.
> 
> The only "different than everything else" thing about virtio-mem on s390
> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> crash kernel must detect memory ranges of the crashed/panicked kernel to
> include via PT_LOAD in the vmcore.
> 
> On other architectures, all RAM regions (boot + hotplugged) can easily be
> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> the elfcore hdr.
> 
> On s390, information about "ordinary" memory (heh, "storage") can be
> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> that information is stored early during boot in the "physmem" memblock
> data structure.
> 
> But virtio-mem memory is always detected by as device driver, which is
> usually build as a module. So in the crash kernel, this memory can only be
                                       ~~~~~~~~~~~
                                       Is it 1st kernel or 2nd kernel?
Usually we call the 1st kernel the panicked/crashed kernel, and the
2nd kernel the kdump kernel.
> properly detected once the virtio-mem driver started up.
> 
> The virtio-mem driver already supports the "kdump mode", where it won't
> hotplug any memory but instead queries the device to implement the
> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> the vmcore.
> 
> With this series, if the virtio-mem driver is included in the kdump
> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> now detect the device RAM ranges on s390 once it probes the devices, to add
> them to the vmcore using the same callback mechanism we already have for
> pfn_is_ram().

Do you mean that on s390 the virtio-mem memory regions will be detected
and added to the vmcore in the kdump kernel when the virtio-mem driver
is initialized? Not sure if I understand it correctly.

> 
> To add these device RAM ranges to the vmcore ("patch the vmcore"), we will
> add new PT_LOAD entries that describe these memory ranges, and update
> all offsets vmcore size so it is all consistent.
> 
> Note that makedumfile is shaky with v6.12-rcX, I made the "obvious" things
> (e.g., free page detection) work again while testing as documented in [2].
> 
> Creating the dumps using makedumpfile seems to work fine, and the
> dump regions (PT_LOAD) are as expected. I yet have to check in more detail
> if the created dumps are good (IOW, the right memory was dumped, but it
> looks like makedumpfile reads the right memory when interpreting the
> kernel data structures, which is promising).
> 
> Patch #1 -- #6 are vmcore preparations and cleanups
> Patch #7 adds the infrastructure for drivers to report device RAM
> Patch #8 + #9 are virtio-mem preparations
> Patch #10 implements virtio-mem support to report device RAM
> Patch #11 activates it for s390, implementing a new function to fill
>           PT_LOAD entry for device RAM
> 
> [1] https://lkml.kernel.org/r/20241025141453.1210600-1-david@redhat.com
> [2] https://github.com/makedumpfile/makedumpfile/issues/16
> 
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Cc: "Eugenio Pérez" <eperezma@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: Eric Farman <farman@linux.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> 
> David Hildenbrand (11):
>   fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
>   fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
>   fs/proc/vmcore: disallow vmcore modifications after the vmcore was
>     opened
>   fs/proc/vmcore: move vmcore definitions from kcore.h to crash_dump.h
>   fs/proc/vmcore: factor out allocating a vmcore memory node
>   fs/proc/vmcore: factor out freeing a list of vmcore ranges
>   fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM
>     ranges in 2nd kernel
>   virtio-mem: mark device ready before registering callbacks in kdump
>     mode
>   virtio-mem: remember usable region size
>   virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
>   s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
> 
>  arch/s390/Kconfig             |   1 +
>  arch/s390/kernel/crash_dump.c |  39 +++--
>  drivers/virtio/Kconfig        |   1 +
>  drivers/virtio/virtio_mem.c   | 103 +++++++++++++-
>  fs/proc/Kconfig               |  25 ++++
>  fs/proc/vmcore.c              | 258 +++++++++++++++++++++++++---------
>  include/linux/crash_dump.h    |  47 +++++++
>  include/linux/kcore.h         |  13 --
>  8 files changed, 396 insertions(+), 91 deletions(-)
> 
> -- 
> 2.46.1
>
David Hildenbrand Nov. 15, 2024, 8:55 a.m. UTC | #3
On 15.11.24 09:46, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>> This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
>> virtio-mem support on s390.
>>
>> The only "different than everything else" thing about virtio-mem on s390
>> is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
>> during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
>> crash kernel must detect memory ranges of the crashed/panicked kernel to
>> include via PT_LOAD in the vmcore.
>>
>> On other architectures, all RAM regions (boot + hotplugged) can easily be
>> observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
>> the elfcore hdr.
>>
>> On s390, information about "ordinary" memory (heh, "storage") can be
>> obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
>> that information is stored early during boot in the "physmem" memblock
>> data structure.
>>
>> But virtio-mem memory is always detected by as device driver, which is
>> usually build as a module. So in the crash kernel, this memory can only be
>                                         ~~~~~~~~~~~
>                                         Is it 1st kernel or 2nd kernel?
> Usually we call the 1st kernel as panicked kernel, crashed kernel, the
> 2nd kernel as kdump kernel.

It should have been called "kdump (2nd) kernel" here indeed.

>> properly detected once the virtio-mem driver started up.
>>
>> The virtio-mem driver already supports the "kdump mode", where it won't
>> hotplug any memory but instead queries the device to implement the
>> pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
>> the vmcore.
>>
>> With this series, if the virtio-mem driver is included in the kdump
>> initrd -- which dracut already takes care of under Fedora/RHEL -- it will
>> now detect the device RAM ranges on s390 once it probes the devices, to add
>> them to the vmcore using the same callback mechanism we already have for
>> pfn_is_ram().
> 
> Do you mean on s390 virtio-mem memory region will be detected and added
> to vmcore in kdump kernel when virtio-mem driver is initialized? Not
> sure if I understand it correctly.

Yes exactly. In the kdump kernel, the driver gets probed and registers 
the vmcore callbacks. From there, we detect and add the device regions.

Thanks!
Baoquan He Nov. 15, 2024, 9:48 a.m. UTC | #4
On 11/15/24 at 09:55am, David Hildenbrand wrote:
> On 15.11.24 09:46, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > This is based on "[PATCH v3 0/7] virtio-mem: s390 support" [1], which adds
> > > virtio-mem support on s390.
> > > 
> > > The only "different than everything else" thing about virtio-mem on s390
> > > is kdump: The crash (2nd) kernel allocates+prepares the elfcore hdr
> > > during fs_init()->vmcore_init()->elfcorehdr_alloc(). Consequently, the
> > > crash kernel must detect memory ranges of the crashed/panicked kernel to
> > > include via PT_LOAD in the vmcore.
> > > 
> > > On other architectures, all RAM regions (boot + hotplugged) can easily be
> > > observed on the old (to crash) kernel (e.g., using /proc/iomem) to create
> > > the elfcore hdr.
> > > 
> > > On s390, information about "ordinary" memory (heh, "storage") can be
> > > obtained by querying the hypervisor/ultravisor via SCLP/diag260, and
> > > that information is stored early during boot in the "physmem" memblock
> > > data structure.
> > > 
> > > But virtio-mem memory is always detected by as device driver, which is
> > > usually build as a module. So in the crash kernel, this memory can only be
> >                                         ~~~~~~~~~~~
> >                                         Is it 1st kernel or 2nd kernel?
> > Usually we call the 1st kernel as panicked kernel, crashed kernel, the
> > 2nd kernel as kdump kernel.
> 
> It should have been called "kdump (2nd) kernel" here indeed.
> 
> > > properly detected once the virtio-mem driver started up.
> > > 
> > > The virtio-mem driver already supports the "kdump mode", where it won't
> > > hotplug any memory but instead queries the device to implement the
> > > pfn_is_ram() callback, to avoid reading unplugged memory holes when reading
> > > the vmcore.
> > > 
> > > With this series, if the virtio-mem driver is included in the kdump
> > > initrd -- which dracut already takes care of under Fedora/RHEL -- it will
> > > now detect the device RAM ranges on s390 once it probes the devices, to add
> > > them to the vmcore using the same callback mechanism we already have for
> > > pfn_is_ram().
> > 
> > Do you mean on s390 virtio-mem memory region will be detected and added
> > to vmcore in kdump kernel when virtio-mem driver is initialized? Not
> > sure if I understand it correctly.
> 
> Yes exactly. In the kdump kernel, the driver gets probed and registers the
> vmcore callbacks. From there, we detect and add the device regions.

I see now, thanks for your confirmation.