diff mbox series

[v2,3/3] device-dax: Add system ram (add_memory()) with MHP_NO_FIRMWARE_MEMMAP

Message ID 20200430102908.10107-4-david@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show
Series mm/memory_hotplug: Allow to not create firmware memmap entries | expand

Commit Message

David Hildenbrand April 30, 2020, 10:29 a.m. UTC
Currently, when adding memory, we create entries in /sys/firmware/memmap/
as "System RAM". This does not reflect the reality and will lead to
kexec-tools to add that memory to the fixed-up initial memmap for a
kexec kernel (loaded via kexec_load()). The memory will be considered
initial System RAM by the kexec kernel.

We should let the kexec kernel decide how to use that memory - just as
we do during an ordinary reboot.

Before configuring the namespace:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-33fffffff : namespace0.0
	3280000000-32ffffffff : PCI Bus 0000:00

After configuring the namespace:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  150000000-33fffffff : dax0.0
	    150000000-33fffffff : System RAM
	3280000000-32ffffffff : PCI Bus 0000:00

After a proper reboot:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel before this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  150000000-33fffffff : System RAM
	3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel after this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

/sys/firmware/memmap/ before this change:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)
	0000000150000000-0000000340000000 (System RAM)

/sys/firmware/memmap/ after a proper reboot:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)

/sys/firmware/memmap/ after this change:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)

kexec-tools already seem to basically ignore any System RAM that's not
on top level when searching for areas to place kexec images - but also
for determining crash areas to dump via kdump. This behavior is not
changed by this patch. kexec-tools probably has to be fixed to also
include this memory in system dumps.

Note: kexec_file_load() does the right thing already within the kernel.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/dax/kmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Dave Hansen April 30, 2020, 11:23 a.m. UTC | #1
On 4/30/20 3:29 AM, David Hildenbrand wrote:
> Currently, when adding memory, we create entries in /sys/firmware/memmap/
> as "System RAM". This does not reflect the reality and will lead to
> kexec-tools to add that memory to the fixed-up initial memmap for a
> kexec kernel (loaded via kexec_load()). The memory will be considered
> initial System RAM by the kexec kernel.
> 
> We should let the kexec kernel decide how to use that memory - just as
> we do during an ordinary reboot.
...
> -	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
> +	rc = add_memory(numa_node, new_res->start, resource_size(new_res),
> +			MHP_NO_FIRMWARE_MEMMAP);

Looks fine.  But, if you send another revision, could you add a comment
about the actual goal of MHP_NO_FIRMWARE_MEMMAP?  Maybe:

	/*
	 * MHP_NO_FIRMWARE_MEMMAP ensures that future
	 * kexec'd kernels will not treat this as RAM.
	 */

Not a biggie, though.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
David Hildenbrand April 30, 2020, 3:28 p.m. UTC | #2
On 30.04.20 13:23, Dave Hansen wrote:
> On 4/30/20 3:29 AM, David Hildenbrand wrote:
>> Currently, when adding memory, we create entries in /sys/firmware/memmap/
>> as "System RAM". This does not reflect the reality and will lead to
>> kexec-tools to add that memory to the fixed-up initial memmap for a
>> kexec kernel (loaded via kexec_load()). The memory will be considered
>> initial System RAM by the kexec kernel.
>>
>> We should let the kexec kernel decide how to use that memory - just as
>> we do during an ordinary reboot.
> ...
>> -	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
>> +	rc = add_memory(numa_node, new_res->start, resource_size(new_res),
>> +			MHP_NO_FIRMWARE_MEMMAP);
> 
> Looks fine.  But, if you send another revision, could you add a comment
> about the actual goal of MHP_NO_FIRMWARE_MEMMAP?  Maybe:
> 
> 	/*
> 	 * MHP_NO_FIRMWARE_MEMMAP ensures that future
> 	 * kexec'd kernels will not treat this as RAM.
> 	 */
> 
> Not a biggie, though.

Sure, maybe Andrew can fixup when applying (if no resend is necessary).

Thanks Dave!

> 
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>
diff mbox series

Patch

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index e159184e0ba0..929823a79816 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,8 @@  int dev_dax_kmem_probe(struct device *dev)
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 	new_res->name = dev_name(dev);
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
+	rc = add_memory(numa_node, new_res->start, resource_size(new_res),
+			MHP_NO_FIRMWARE_MEMMAP);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);