
[RFC,v2,01/21] e820: cheat PMEM as DRAM

Message ID: 20181226133351.106676005@intel.com (mailing list archive)
State: New, archived
Series: PMEM NUMA node and hotness accounting/migration

Commit Message

Fengguang Wu Dec. 26, 2018, 1:14 p.m. UTC
From: Fan Du <fan.du@intel.com>

This is a hack to enumerate PMEM as NUMA nodes.
It is necessary for current BIOSes that do not yet fill in the ACPI HMAT table.

WARNING: back up your data first. This hack is mutually exclusive with the
libnvdimm subsystem and can destroy ndctl-managed namespaces.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 arch/x86/kernel/e820.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Dec. 27, 2018, 3:41 a.m. UTC | #1
On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> From: Fan Du <fan.du@intel.com>
> 
> This is a hack to enumerate PMEM as NUMA nodes.
> It's necessary for current BIOS that don't yet fill ACPI HMAT table.
> 
> WARNING: take care to backup. It is mutual exclusive with libnvdimm
> subsystem and can destroy ndctl managed namespaces.

Why depend on firmware to present this "correctly"?  It seems to me like
less effort all around to have ndctl label some namespaces as being for
this kind of use.

Fengguang Wu Dec. 27, 2018, 4:11 a.m. UTC | #2
On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
>On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
>> From: Fan Du <fan.du@intel.com>
>>
>> This is a hack to enumerate PMEM as NUMA nodes.
>> It's necessary for current BIOS that don't yet fill ACPI HMAT table.
>>
>> WARNING: take care to backup. It is mutual exclusive with libnvdimm
>> subsystem and can destroy ndctl managed namespaces.
>
>Why depend on firmware to present this "correctly"?  It seems to me like
>less effort all around to have ndctl label some namespaces as being for
>this kind of use.

Dave Hansen may be better placed to answer your question. He posted
patches to make PMEM NUMA nodes coexist with libnvdimm and ndctl:

[PATCH 0/9] Allow persistent memory to be used like normal RAM
https://lkml.org/lkml/2018/10/23/9

That depends on a future BIOS, so we wrote this quick hack to test
PMEM NUMA nodes on existing BIOSes.

Thanks,
Fengguang

Dan Williams Dec. 27, 2018, 5:13 a.m. UTC | #3
On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <fengguang.wu@intel.com> wrote:
>
> On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
> >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> >> From: Fan Du <fan.du@intel.com>
> >>
> >> This is a hack to enumerate PMEM as NUMA nodes.
> >> It's necessary for current BIOS that don't yet fill ACPI HMAT table.
> >>
> >> WARNING: take care to backup. It is mutual exclusive with libnvdimm
> >> subsystem and can destroy ndctl managed namespaces.
> >
> >Why depend on firmware to present this "correctly"?  It seems to me like
> >less effort all around to have ndctl label some namespaces as being for
> >this kind of use.
>
> Dave Hansen may be more suitable to answer your question. He posted
> patches to make PMEM NUMA node coexist with libnvdimm and ndctl:
>
> [PATCH 0/9] Allow persistent memory to be used like normal RAM
> https://lkml.org/lkml/2018/10/23/9
>
> That depends on future BIOS. So we did this quick hack to test out
> PMEM NUMA node for the existing BIOS.

No, it does not depend on a future BIOS.

Willy, have a look here [1], here [2], and here [3] for the
work-in-progress ndctl takeover approach (actually 'daxctl' in this
case).

[1]: https://lkml.org/lkml/2018/10/23/9
[2]: https://lkml.org/lkml/2018/10/31/243
[3]: https://lists.01.org/pipermail/linux-nvdimm/2018-November/018677.html

Yang Shi Dec. 27, 2018, 7:32 p.m. UTC | #4
On Wed, Dec 26, 2018 at 9:13 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <fengguang.wu@intel.com> wrote:
> >
> > On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
> > >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
> > >> From: Fan Du <fan.du@intel.com>
> > >>
> > >> This is a hack to enumerate PMEM as NUMA nodes.
> > >> It's necessary for current BIOS that don't yet fill ACPI HMAT table.
> > >>
> > >> WARNING: take care to backup. It is mutual exclusive with libnvdimm
> > >> subsystem and can destroy ndctl managed namespaces.
> > >
> > >Why depend on firmware to present this "correctly"?  It seems to me like
> > >less effort all around to have ndctl label some namespaces as being for
> > >this kind of use.
> >
> > Dave Hansen may be more suitable to answer your question. He posted
> > patches to make PMEM NUMA node coexist with libnvdimm and ndctl:
> >
> > [PATCH 0/9] Allow persistent memory to be used like normal RAM
> > https://lkml.org/lkml/2018/10/23/9
> >
> > That depends on future BIOS. So we did this quick hack to test out
> > PMEM NUMA node for the existing BIOS.
>
> No, it does not depend on a future BIOS.

That is correct. We already have Dave's patches plus Dan's patch (which
adds the target_node field) working on our machine, which has SRAT.

Thanks,
Yang

>
> Willy, have a look here [1], here [2], and here [3] for the
> work-in-progress ndctl takeover approach (actually 'daxctl' in this
> case).
>
> [1]: https://lkml.org/lkml/2018/10/23/9
> [2]: https://lkml.org/lkml/2018/10/31/243
> [3]: https://lists.01.org/pipermail/linux-nvdimm/2018-November/018677.html
>

Fengguang Wu Dec. 28, 2018, 3:27 a.m. UTC | #5
On Thu, Dec 27, 2018 at 11:32:06AM -0800, Yang Shi wrote:
>On Wed, Dec 26, 2018 at 9:13 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Wed, Dec 26, 2018 at 8:11 PM Fengguang Wu <fengguang.wu@intel.com> wrote:
>> >
>> > On Wed, Dec 26, 2018 at 07:41:41PM -0800, Matthew Wilcox wrote:
>> > >On Wed, Dec 26, 2018 at 09:14:47PM +0800, Fengguang Wu wrote:
>> > >> From: Fan Du <fan.du@intel.com>
>> > >>
>> > >> This is a hack to enumerate PMEM as NUMA nodes.
>> > >> It's necessary for current BIOS that don't yet fill ACPI HMAT table.
>> > >>
>> > >> WARNING: take care to backup. It is mutual exclusive with libnvdimm
>> > >> subsystem and can destroy ndctl managed namespaces.
>> > >
>> > >Why depend on firmware to present this "correctly"?  It seems to me like
>> > >less effort all around to have ndctl label some namespaces as being for
>> > >this kind of use.
>> >
>> > Dave Hansen may be more suitable to answer your question. He posted
>> > patches to make PMEM NUMA node coexist with libnvdimm and ndctl:
>> >
>> > [PATCH 0/9] Allow persistent memory to be used like normal RAM
>> > https://lkml.org/lkml/2018/10/23/9
>> >
>> > That depends on future BIOS. So we did this quick hack to test out
>> > PMEM NUMA node for the existing BIOS.
>>
>> No, it does not depend on a future BIOS.
>
>It is correct. We already have Dave's patches + Dan's patch (added
>target_node field) work on our machine which has SRAT.

Thanks for the correction. It looks like my understanding was out of date.
So we can follow Dave's and Dan's patches to create the PMEM NUMA nodes.

Thanks,
Fengguang

>>
>> Willy, have a look here [1], here [2], and here [3] for the
>> work-in-progress ndctl takeover approach (actually 'daxctl' in this
>> case).
>>
>> [1]: https://lkml.org/lkml/2018/10/23/9
>> [2]: https://lkml.org/lkml/2018/10/31/243
>> [3]: https://lists.01.org/pipermail/linux-nvdimm/2018-November/018677.html
>>
>

Patch

--- linux.orig/arch/x86/kernel/e820.c	2018-12-23 19:20:34.587078783 +0800
+++ linux/arch/x86/kernel/e820.c	2018-12-23 19:20:34.587078783 +0800
@@ -403,7 +403,8 @@  static int __init __append_e820_table(st
 		/* Ignore the entry on 64-bit overflow: */
 		if (start > end && likely(size))
 			return -1;
-
+		if (type == E820_TYPE_PMEM)
+			type = E820_TYPE_RAM;
 		e820__range_add(start, size, type);
 
 		entry++;
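
For readers who want to see the effect of the hunk outside the kernel, here is a minimal user-space sketch of the same loop. It is not the kernel code: the names sketch_type, sketch_entry and sketch_append_table are hypothetical stand-ins for enum e820_type, the firmware table entry and __append_e820_table(), the numeric type values are illustrative, and copying into an output array stands in for e820__range_add().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's e820 types; the numeric
 * values are illustrative and are not taken from the kernel headers. */
enum sketch_type {
	SKETCH_TYPE_RAM = 1,
	SKETCH_TYPE_RESERVED = 2,
	SKETCH_TYPE_PMEM = 7,
};

/* One firmware-reported memory range (assumed layout). */
struct sketch_entry {
	uint64_t addr;
	uint64_t size;
	enum sketch_type type;
};

/*
 * Append nr entries from in[] to out[], starting at index *out_nr.
 * The caller must provide room for *out_nr + nr entries in out[].
 * Returns 0 on success, -1 if an entry overflows 64 bits.
 */
int sketch_append_table(const struct sketch_entry *in, size_t nr,
			struct sketch_entry *out, size_t *out_nr)
{
	assert(in != NULL && out != NULL && out_nr != NULL);

	for (size_t i = 0; i < nr; i++) {
		uint64_t start = in[i].addr;
		uint64_t size = in[i].size;
		uint64_t end = start + size - 1;	/* wraps on overflow */
		enum sketch_type type = in[i].type;

		/* Same 64-bit overflow check as the hunk's context lines. */
		if (start > end && size)
			return -1;

		/* The two added lines: report PMEM as ordinary RAM. */
		if (type == SKETCH_TYPE_PMEM)
			type = SKETCH_TYPE_RAM;

		/* Stand-in for e820__range_add(start, size, type). */
		out[*out_nr] = (struct sketch_entry){ start, size, type };
		(*out_nr)++;
	}
	return 0;
}
```

Once an entry has been retyped this way, nothing later in the sketch can tell the range apart from DRAM. That is consistent with the warning in the commit message that the hack is mutually exclusive with libnvdimm.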