Message ID | 20210322160200.19633-2-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree |

On Mon, Mar 22, 2021 at 9:02 AM David Hildenbrand <david@redhat.com> wrote:
>
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>    locate_mem_hole_callback(), so even after this change, we won't be
>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>    change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>    not adding relevant ranges to the crash elf info, resulting in them
>    not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: x86@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> -	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> +	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>  				     arg, func);

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

On 22.03.21 17:01, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>    locate_mem_hole_callback(), so even after this change, we won't be
>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>    change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>    not adding relevant ranges to the crash elf info, resulting in them
>    not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Signed-off-by: David Hildenbrand <david@redhat.com>

^ My copy-paste action when creating the cc list slipped in a duplicate SO
in all 3 patches. I can resend if desired.

On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>    locate_mem_hole_callback(), so even after this change, we won't be
>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>    change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>    not adding relevant ranges to the crash elf info, resulting in them
>    not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including

"...fixes..." effectively means to me that Fixes tag should be provided.

> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: x86@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> -	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> +	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>  				     arg, func);
>  }
>
> --
> 2.29.2
>

On Tue, Mar 23, 2021 at 10:40:33AM +0100, David Hildenbrand wrote:
> On 22.03.21 17:01, David Hildenbrand wrote:
> > It used to be true that we can have busy system RAM only on the first level
> > in the resourc tree. However, this is no longer holds for driver-managed
> > system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> > lower levels.
> >
> > We have two users of walk_system_ram_res(), which currently only
> > consideres the first level:
> > a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
> >    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
> >    locate_mem_hole_callback(), so even after this change, we won't be
> >    placing kexec images onto dax/kmem and virtio-mem added memory. No
> >    change.
> > b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
> >    not adding relevant ranges to the crash elf info, resulting in them
> >    not getting dumped via kdump.
> >
> > This change fixes loading a crashkernel via kexec_file_load() and including
> > dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> > that e.g,, arm64 relies on memblock data and, therefore, always considers
> > all added System RAM already.
> >
> > Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> > behave like walk_system_ram_range().
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
>
> ^ My copy-paste action when creating the cc list slipped in a duplicate SO
> in all 3 patches. I can resend if desired.

I think to address my comments you will need to resend anyway (as v2).

On 23.03.21 12:06, Andy Shevchenko wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>>
>> We have two users of walk_system_ram_res(), which currently only
>> consideres the first level:
>> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>>    locate_mem_hole_callback(), so even after this change, we won't be
>>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>>    change.
>> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>>    not adding relevant ranges to the crash elf info, resulting in them
>>    not getting dumped via kdump.
>>
>> This change fixes loading a crashkernel via kexec_file_load() and including
>
> "...fixes..." effectively means to me that Fixes tag should be provided.

We can certainly add, although it doesn't really affect the running
kernel, but only crashdumps taken in the kdump kernel:

Fixes: ebf71552bb0e ("virtio-mem: Add parent resource for all added "System RAM"")
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")

Thanks

On 03/22/21 at 05:01pm, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.
>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>    locate_mem_hole_callback(), so even after this change, we won't be
>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>    change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>    not adding relevant ranges to the crash elf info, resulting in them
>    not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: x86@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> -	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> +	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>  				     arg, func);

Thanks, David, this is a good fix.

Acked-by: Baoquan He <bhe@redhat.com>

>  }
>
> --
> 2.29.2
>

On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
> It used to be true that we can have busy system RAM only on the first level
> in the resourc tree. However, this is no longer holds for driver-managed
> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
> lower levels.

Let me ask some rookie questions:

What does "busy" term stand for here?
Why resources coming from virtio-mem are added at a lower levels?

>
> We have two users of walk_system_ram_res(), which currently only
> consideres the first level:
> a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
>    IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
>    locate_mem_hole_callback(), so even after this change, we won't be
>    placing kexec images onto dax/kmem and virtio-mem added memory. No
>    change.
> b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
>    not adding relevant ranges to the crash elf info, resulting in them
>    not getting dumped via kdump.
>
> This change fixes loading a crashkernel via kexec_file_load() and including
> dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
> that e.g,, arm64 relies on memblock data and, therefore, always considers
> all added System RAM already.
>
> Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
> behave like walk_system_ram_range().
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: x86@kernel.org
> Cc: kexec@lists.infradead.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  kernel/resource.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 627e61b0c124..4efd6e912279 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  {
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> -	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
> +	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
>  				     arg, func);
>  }
>
> --
> 2.29.2
>
>

On 24.03.21 12:18, Oscar Salvador wrote:
> On Mon, Mar 22, 2021 at 05:01:58PM +0100, David Hildenbrand wrote:
>> It used to be true that we can have busy system RAM only on the first level
>> in the resourc tree. However, this is no longer holds for driver-managed
>> system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
>> lower levels.
>
> Let me ask some rookie questions:
>
> What does "busy" term stand for here?

IORESOURCE_BUSY - here: actually added, not just some reserved range /
container.

> Why resources coming from virtio-mem are added at a lower levels?

Some information can be had from ebf71552bb0e690cad523ad175e8c4c89a33c333

commit ebf71552bb0e690cad523ad175e8c4c89a33c333
Author: David Hildenbrand <david@redhat.com>
Date:   Thu May 7 16:01:35 2020 +0200

    virtio-mem: Add parent resource for all added "System RAM"

    Let's add a parent resource, named after the virtio device (inspired by
    drivers/dax/kmem.c). This allows user space to identify which memory
    belongs to which virtio-mem device.

    With this change and two virtio-mem devices:
            :/# cat /proc/iomem
            00000000-00000fff : Reserved
            00001000-0009fbff : System RAM
            [...]
            140000000-333ffffff : virtio0
              140000000-147ffffff : System RAM
              148000000-14fffffff : System RAM
              150000000-157ffffff : System RAM
            [...]
            334000000-3033ffffff : virtio1
              338000000-33fffffff : System RAM
              340000000-347ffffff : System RAM
              348000000-34fffffff : System RAM
            [...]

For dax/kmem it comes naturally due to the "Persistent Memory" and
device parent resources like:

140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00

Thanks

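The nesting described above can be checked from user space. The following is only a rough sketch, not part of the patch: it assumes /proc/iomem-style text that indents two spaces per tree level, and the helper names (`iomem_line_depth`, `count_system_ram_at_depth`) are made up for illustration. It counts "System RAM" entries that sit at or below a given depth, which are exactly the entries a first-level-only walk would not see when the depth is 1 or more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nesting depth of one line, assuming two spaces of indentation per level. */
static size_t iomem_line_depth(const char *line)
{
	return strspn(line, " ") / 2;
}

/*
 * Count lines whose resource name starts with "System RAM" and whose
 * nesting depth is at least min_depth. Hypothetical helper; lines longer
 * than the local buffer are truncated before being inspected.
 */
size_t count_system_ram_at_depth(const char *text, size_t min_depth)
{
	size_t count = 0;

	assert(text != NULL);
	while (*text) {
		char line[256];
		size_t len = strcspn(text, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		const char *sep;

		/* Copy the current line so it can be searched as a C string. */
		memcpy(line, text, n);
		line[n] = '\0';

		/* The name follows the " : " separator after the address range. */
		sep = strstr(line, " : ");
		if (sep && strncmp(sep + 3, "System RAM", 10) == 0 &&
		    iomem_line_depth(line) >= min_depth)
			count++;

		/* Advance past this line and its newline, if any. */
		text += len;
		if (*text == '\n')
			text++;
	}
	return count;
}
```

Called with `min_depth` 0 it counts every System RAM entry; with `min_depth` 1 it counts only the driver-managed style entries that hang below a parent such as `virtio0` or `dax0.0`.
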
diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..4efd6e912279 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -457,7 +457,7 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 {
 	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
-	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, true,
+	return __walk_iomem_res_desc(start, end, flags, IORES_DESC_NONE, false,
 				     arg, func);
 }

It used to be true that we can have busy system RAM only on the first level
in the resource tree. However, this no longer holds for driver-managed
system RAM (i.e., added via dax/kmem and virtio-mem), which gets added on
lower levels.

We have two users of walk_system_ram_res(), which currently only
considers the first level:
a) kernel/kexec_file.c:kexec_walk_resources() -- We properly skip
   IORESOURCE_SYSRAM_DRIVER_MANAGED resources via
   locate_mem_hole_callback(), so even after this change, we won't be
   placing kexec images onto dax/kmem and virtio-mem added memory. No
   change.
b) arch/x86/kernel/crash.c:fill_up_crash_elf_data() -- we're currently
   not adding relevant ranges to the crash elf info, resulting in them
   not getting dumped via kdump.

This change fixes loading a crashkernel via kexec_file_load() and including
dax/kmem and virtio-mem added System RAM in the crashdump on x86-64. Note
that, e.g., arm64 relies on memblock data and, therefore, always considers
all added System RAM already.

Let's find all busy IORESOURCE_SYSTEM_RAM resources, making the function
behave like walk_system_ram_range().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: x86@kernel.org
Cc: kexec@lists.infradead.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
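
To make the effect of flipping that one boolean concrete, here is a minimal userspace sketch of a resource-tree walk with a "first level only" switch. It is an illustration under stated assumptions, not the kernel implementation: `struct toy_res`, `toy_walk()`, `toy_count_busy_ram()` and the `TOY_*` flag values are all hypothetical stand-ins, the [start, end] range filtering is left out, and a matching node is reported without descending into its children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flag values; hypothetical, not the kernel's bit assignments. */
#define TOY_SYSTEM_RAM 0x1u
#define TOY_BUSY       0x2u

/* Toy stand-in for a resource node with child and sibling links. */
struct toy_res {
	uint64_t start, end;
	unsigned int flags;
	struct toy_res *child, *sibling;
};

typedef int (*toy_walk_fn)(const struct toy_res *res, void *arg);

/*
 * Call func() for each node below 'parent' whose flags contain all bits in
 * 'want'. With first_lvl set, only direct children of 'parent' are examined;
 * otherwise non-matching nodes are searched recursively. A nonzero return
 * value from func() stops the walk and is passed back to the caller.
 */
static int toy_walk(const struct toy_res *parent, unsigned int want,
		    bool first_lvl, toy_walk_fn func, void *arg)
{
	const struct toy_res *r;
	int ret;

	assert(parent != NULL && func != NULL);
	for (r = parent->child; r; r = r->sibling) {
		if ((r->flags & want) == want)
			ret = func(r, arg);	/* match: report, do not descend */
		else if (!first_lvl && r->child)
			ret = toy_walk(r, want, false, func, arg);
		else
			ret = 0;
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback that just counts how many resources were reported. */
static int toy_count(const struct toy_res *res, void *arg)
{
	(void)res;
	++*(size_t *)arg;
	return 0;
}

/*
 * Build a tiny tree shaped like the virtio-mem case: boot RAM on the first
 * level, plus a non-busy container ("virtio0") holding two busy RAM blocks.
 * Returns how many busy System RAM nodes the walk reports.
 */
size_t toy_count_busy_ram(bool first_lvl)
{
	struct toy_res blk1 = { 0x148000000, 0x14fffffff, TOY_SYSTEM_RAM | TOY_BUSY, NULL, NULL };
	struct toy_res blk0 = { 0x140000000, 0x147ffffff, TOY_SYSTEM_RAM | TOY_BUSY, NULL, &blk1 };
	struct toy_res virtio0 = { 0x140000000, 0x333ffffff, 0, &blk0, NULL };
	struct toy_res boot = { 0x1000, 0x9fbff, TOY_SYSTEM_RAM | TOY_BUSY, NULL, &virtio0 };
	struct toy_res root = { 0, UINT64_MAX, 0, &boot, NULL };
	size_t n = 0;

	toy_walk(&root, TOY_SYSTEM_RAM | TOY_BUSY, first_lvl, toy_count, &n);
	return n;
}
```

In this toy tree, a first-level-only walk reports just the boot RAM entry, while the whole-tree walk also reports the two blocks below the container, which mirrors why the crash ELF data was missing driver-managed ranges before the change.
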