diff mbox

grub causing NVDIMMs to be treated as normal memory

Message ID 5655FFC3.9030603@gmail.com (mailing list archive)
State Superseded
Headers show

Commit Message

Andrei Borzenkov Nov. 25, 2015, 6:36 p.m. UTC
25.11.2015 02:52, Elliott, Robert (Persistent Memory) ?????:
> We've noticed that some combinations of grub and old linux kernels
> 
> end up interpreting the UEFI memory map EfiPersistentMemory type 14
> 
> (formerly a reserved value) as regular memory in the linux e820
> 
> table, causing silent data corruption on the NVDIMMs.  That occurs
> 
> even though grub prints this message suggesting everything is safe:
> 
>     Unknown memory type 14, considering reserved
> 
> 
> 
> In broken versions of grub, the code parsing the UEFI memory map
> 
> has a "default" case that falls through to the
> 
> GRUB_EFI_BOOT_SERVICES_DATA case, which marks the memory range
> 
> as GRUB_MEMORY_AVAILABLE and ends up in e820 as regular memory.
> 

Could you test if attached patch works for you (compile tested)?

Comments

Elliott, Robert (Servers) Nov. 26, 2015, 12:12 a.m. UTC | #1
> -----Original Message-----

> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On

> Behalf Of Andrei Borzenkov

> Sent: Wednesday, November 25, 2015 12:37 PM

> To: The development of GNU GRUB <grub-devel@gnu.org>;

> dan.j.williams@intel.com; linux-nvdimm@lists.01.org

> Subject: Re: grub causing NVDIMMs to be treated as normal memory

> 

> 25.11.2015 02:52, Elliott, Robert (Persistent Memory) ?????:

> > We've noticed that some combinations of grub and old linux kernels

> >

> > end up interpreting the UEFI memory map EfiPersistentMemory type 14

> > (formerly a reserved value) as regular memory in the linux e820

> > table, causing silent data corruption on the NVDIMMs.  That occurs

> > even though grub prints this message suggesting everything is safe:

> >

> >     Unknown memory type 14, considering reserved

> >

> >

> >

> > In broken versions of grub, the code parsing the UEFI memory map

> > has a "default" case that falls through to the

> >

> > GRUB_EFI_BOOT_SERVICES_DATA case, which marks the memory range

> > as GRUB_MEMORY_AVAILABLE and ends up in e820 as regular memory.

> 

> Could you test if attached patch works for you (compile tested)?


Thanks.

I think I finally got that to compile with
    configure --with-platform=efi
    make

but have no clue how to install it and try it out.  I'm using a
fedora22 system, which has its own /sbin/grub2-install.  I 
don't understand how that differs from the grub-install in the
build directory or how to get either of them to work.

Anyway, we should create another patch that does:
* #define GRUB_EFI_PERSISTENT_MEMORY 14	per UEFI 2.5
* #define GRUB_E820_PERSISTENT_MEMORY 7	per ACPI 6.0
* add a GRUB_MEMORY_PMEM enum
* map GRUB_EFI_PERSISTENT_MEMORY -> GRUM_PMEMORY_MEM 
  -> GRUB_E820_PERSISTENT_MEMORY per ACPI 6.0

to explicitly handle the new types (in addition to handling
unknown values correctly).


---
Robert Elliott, HPE Persistent Memory
Andrei Borzenkov Nov. 26, 2015, 3:30 a.m. UTC | #2
26.11.2015 03:12, Elliott, Robert (Persistent Memory) ?????:
> 
> 
>> -----Original Message-----
>> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On
>> Behalf Of Andrei Borzenkov
>> Sent: Wednesday, November 25, 2015 12:37 PM
>> To: The development of GNU GRUB <grub-devel@gnu.org>;
>> dan.j.williams@intel.com; linux-nvdimm@lists.01.org
>> Subject: Re: grub causing NVDIMMs to be treated as normal memory
>>
>> 25.11.2015 02:52, Elliott, Robert (Persistent Memory) ?????:
>>> We've noticed that some combinations of grub and old linux kernels
>>>
>>> end up interpreting the UEFI memory map EfiPersistentMemory type 14
>>> (formerly a reserved value) as regular memory in the linux e820
>>> table, causing silent data corruption on the NVDIMMs.  That occurs
>>> even though grub prints this message suggesting everything is safe:
>>>
>>>     Unknown memory type 14, considering reserved
>>>
>>>
>>>
>>> In broken versions of grub, the code parsing the UEFI memory map
>>> has a "default" case that falls through to the
>>>
>>> GRUB_EFI_BOOT_SERVICES_DATA case, which marks the memory range
>>> as GRUB_MEMORY_AVAILABLE and ends up in e820 as regular memory.
>>
>> Could you test if attached patch works for you (compile tested)?
> 
> Thanks.
> 
> I think I finally got that to compile with
>     configure --with-platform=efi
>     make
> 
> but have no clue how to install it and try it out.  I'm using a
> fedora22 system, which has its own /sbin/grub2-install.  I 
> don't understand how that differs from the grub-install in the
> build directory or how to get either of them to work.
> 

From the build directory

pkgdatadir=$PWD ./grub-install --bootloader-id testgrub -d grub-core

This should install grub in \EFI\testgrub on ESP and add EFI menu for
it. You can add --no-nvram to skip EFI menu update and load it manually
then. It will install modules in /boot/grub (instead of /boot/grub2), so
you should probably copy /boot/grub2/grub.cfg there.

Of course you can simply add patch to Fedora package and rebuild this
package.

> Anyway, we should create another patch that does:
> * #define GRUB_EFI_PERSISTENT_MEMORY 14	per UEFI 2.5
> * #define GRUB_E820_PERSISTENT_MEMORY 7	per ACPI 6.0
> * add a GRUB_MEMORY_PMEM enum
> * map GRUB_EFI_PERSISTENT_MEMORY -> GRUM_PMEMORY_MEM 
>   -> GRUB_E820_PERSISTENT_MEMORY per ACPI 6.0
> 
> to explicitly handle the new types (in addition to handling
> unknown values correctly).
> 

That is much more involved than obvious bug fix. It can be done later.
Elliott, Robert (Servers) Nov. 26, 2015, 6:15 a.m. UTC | #3
> -----Original Message-----
> From: Andrei Borzenkov [mailto:arvidjaar@gmail.com]
...
> From the build directory
> 
> pkgdatadir=$PWD ./grub-install --bootloader-id testgrub -d grub-core
> 
> This should install grub in \EFI\testgrub on ESP and add EFI menu for
> it. You can add --no-nvram to skip EFI menu update and load it
> manually then. It will install modules in /boot/grub (instead of
> /boot/grub2), so you should probably copy /boot/grub2/grub.cfg there.

grub-install reported one complaint, which I ignored:
	Installing for x86_64-efi platform.
	./grub-install: warning: cannot open directory `/usr/local/share/locale': No such file or directory.
	Installation finished. No error reported.

I ran this to create a grub.cfg there like the one I had been using:
	grub2-mkconfig /boot/grub 

The system rebooted and ran the code, giving me a menu on the serial
port:
	GNU GRUB  version 2.02~beta2 menu 

Booting failed.  I had to change these with 'e':
* linuxefi to linux
* initrdefi to initrd

Hitting Ctrl-x to boot after that resulted in a successful boot with
these prints:
  Booting a command list

Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
Unknown memory type 14, considering reserved
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-rc2+ (orange@s17) (gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC) ) #87 SMP Mon Nov 23 16:12:10 CST 2015
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-rc2+ root=/dev/mapper/fedora_pmem03-root ro rd.lvm.lv=fedora_pmem03/swap rd.lvm.lv=fedora_pmem03/root rhgb quiet efi=debug libnvdimm.dyndbg nfit.dyndbg nd_pmem.dyndbg nd_btt.dyndbg ignore_loglevel console=tty0 console=ttyS0,115200
...


Next, I ran grub-mkconfig from the build directory:
	grub-mkconfig -o /boot/grub/grub.cfg
and it created a grub.cfg that used linux and initrd, but 
lacked all the linux command line options from /etc/default/grub.

Based on comments in the top of grub.cfg file, I copied
/etc/default/grub to /usr/local/etc/default/grub
and reran.  This preserved the linux command line options.

I also added this to /boot/grub/grub.cfg:
	set debug=mmap,linux

That results in a different looking boot menu with two top
level entries:
	*Fedora GNU/Linux
	 Advanced options for Fedora GNU/Linux

I let the timeout expire and it booted with these prints:
  Booting `Fedora GNU/Linux'

Loading Linux 4.4.0-rc2+ ...
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
...[skipping rest of the entries]...
mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved
loader/i386/linux.c:253: prot_mode_mem = 0x2000000, prot_mode_target = 2000000,
prot_size = 12e9000
loader/i386/linux.c:877: bzImage, setup=0x4200, size=0x12e9000
Loading initial ramdisk ...
loader/i386/linux.c:1123: Initrd, addr=0x35dfd000, size=0x21f1ab1
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
...
mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
...
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved
loader/i386/linux.c:569: real_size = 6000, mmap_size = 2000
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
loader/i386/linux.c:423: addr = 10000, size = 80000, need_size = a000
mmap/efi/mmap.c:66: EFI memory region 0x93000-0x94000: 0
...
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved
loader/i386/linux.c:581: real_mode_target = 86000, real_size = 6000,
efi_mmap_size = 4000
loader/i386/linux.c:598: real_mode_mem = 0x77926000
loader/i386/linux.c:608: code32_start = 2000000
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
...
mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x0-0x93000: 7
...
mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
Unknown memory type 14, considering reserved
mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
Unknown memory type 14, considering reserved



This is booting a new kernel with the EFI boot stub, so I can't
confirm that it fixes the issue of exposing the address range as
normal, but based on the print it's probably working.  I could
add prints in grub-core/loader/i386/linux.c grub_e820_add_region
to confirm the e820 contents at exit.


---
Robert Elliott, HPE Persistent Memory
Andrei Borzenkov Nov. 26, 2015, 4:55 p.m. UTC | #4
26.11.2015 09:15, Elliott, Robert (Persistent Memory) ?????:
...

> ...
> mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
> Unknown memory type 14, considering reserved
> mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000: 14
> Unknown memory type 14, considering reserved
> 
> 
> 
> This is booting a new kernel with the EFI boot stub, so I can't
> confirm that it fixes the issue of exposing the address range as
> normal, but based on the print it's probably working.  I could
> add prints in grub-core/loader/i386/linux.c grub_e820_add_region
> to confirm the e820 contents at exit.
> 

Thanks. If it results in wrong e820 type we have much larger problem so
I do not think it necessary.

I pushed it; as I understand this should serve as stopgap. If I get
around to implement persistent memory type would you test it? But as I
cannot tell when it happens, feel free to submit patch in the meantime :)
Elliott, Robert (Servers) Nov. 26, 2015, 11:24 p.m. UTC | #5
> -----Original Message-----

> From: Andrei Borzenkov [mailto:arvidjaar@gmail.com]

> Sent: Thursday, November 26, 2015 10:55 AM

> To: Elliott, Robert (Persistent Memory) <elliott@hpe.com>; The

> development of GNU GRUB <grub-devel@gnu.org>;

> dan.j.williams@intel.com; linux-nvdimm@lists.01.org

> Subject: Re: grub causing NVDIMMs to be treated as normal memory

> 

> 26.11.2015 09:15, Elliott, Robert (Persistent Memory) ?????:

> ...

> 

> > ...

> > mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14

> > Unknown memory type 14, considering reserved

> > mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000:

> 14

> > Unknown memory type 14, considering reserved

> >

> >

> >

> > This is booting a new kernel with the EFI boot stub, so I can't

> > confirm that it fixes the issue of exposing the address range as

> > normal, but based on the print it's probably working.  I could

> > add prints in grub-core/loader/i386/linux.c grub_e820_add_region

> > to confirm the e820 contents at exit.

> >

> 

> Thanks. If it results in wrong e820 type we have much larger problem

> so I do not think it necessary.

> 

> I pushed it; as I understand this should serve as stopgap. If I get

> around to implement persistent memory type would you test it? But as

> I cannot tell when it happens, feel free to submit patch in the

> meantime :)


As an experiment, I:
* compiled a linux 4.4rc2 kernel with CONFIG_EFI_STUB disabled
* added some prints in the linux kernel setup_memory_map()
  to print the incoming bios_params e820 entries
* modified the grub patch to set the range to AVAILABLE rather 
  than RESERVED (simulating the bug, to see its impacts on
  the kernel)

Before (CONFIG_EFI_STUB=y, grub mapping to RESERVED):
bp: [mem 0x0000000000000000-0x0000000000092fff] usable
bp: [mem 0x0000000000093000-0x0000000000093fff] reserved
bp: [mem 0x0000000000094000-0x000000000009ffff] usable
bp: [mem 0x0000000000100000-0x000000006affbfff] usable
bp: [mem 0x000000006affc000-0x000000006b5fbfff] reserved
bp: [mem 0x000000006b5fc000-0x000000006b5fcfff] usable
bp: [mem 0x000000006b5fd000-0x000000006b67dfff] reserved
bp: [mem 0x000000006b67e000-0x00000000784fefff] usable
bp: [mem 0x00000000784ff000-0x00000000791fefff] reserved <--b
bp: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
bp: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
bp: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
bp: [mem 0x0000000100000000-0x000000087fffffff] usable <--a
bp: [mem 0x0000000c80000000-0x000000147fffffff] usable <--a
bp: [mem 0x0000000080000000-0x000000008fffffff] reserved
bp: [mem 0x0000000880000000-0x0000000c7fffffff] reserved <--a
bp: [mem 0x0000001480000000-0x0000001a7fffffff] reserved <--a

After (CONFIG_EFI_STUB=n, grub mapping to AVAILABLE):
bp: [mem 0x0000000000000000-0x0000000000092fff] usable
bp: [mem 0x0000000000093000-0x0000000000093fff] reserved
bp: [mem 0x0000000000094000-0x000000000009ffff] usable
bp: [mem 0x0000000000100000-0x000000006affbfff] usable
bp: [mem 0x000000006affc000-0x000000006b5fbfff] reserved
bp: [mem 0x000000006b5fc000-0x000000006b5fcfff] usable
bp: [mem 0x000000006b5fd000-0x000000006b67dfff] reserved
bp: [mem 0x000000006b67e000-0x00000000784fefff] usable
bp: [mem 0x00000000784ff000-0x00000000788fefff] reserved <--b
bp: [mem 0x00000000788ff000-0x00000000790fefff] type 20  <--b
bp: [mem 0x00000000790ff000-0x00000000791fefff] reserved <--b
bp: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
bp: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
bp: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
bp: [mem 0x0000000080000000-0x000000008fffffff] reserved
bp: [mem 0x0000000100000000-0x0000001a7fffffff] usable <--a


Note (a): The 088 and 148 ranges were merged into the 
010 and 0c8 usable ranges, as expected for this experiment.

The NVDIMM drivers in the kernel don't load (nmem devices load,
but namespaces fail and no pmem devices show up):
[   27.835319] nd_pmem namespace0.0: could not reserve region [0x0x0000000880000000:0x400000000]
[   27.835323]  ndbus1: nd_pmem.probe(namespace0.0) = -16
[   27.835355] nd_pmem namespace2.0: could not reserve region [0x0x0000001880000000:0x200000000]
[   27.835356]  ndbus1: nd_pmem.probe(namespace2.0) = -16
[   27.835373] nd_pmem namespace1.0: could not reserve region [0x0x0000001480000000:0x400000000]
[   27.835374]  ndbus1: nd_pmem.probe(namespace1.0) = -16

That shows up as normal memory in /proc/iomem and elsewhere:
100000000-1a7fffffff : System RAM

$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 104614 MB
node 0 free: 103303 MB
node distances:
node   0
  0:  10



Note (b): The internal GRUB_MEMORY_CODE (20) value is 
leaking through to the E820 table.

That appears to be from this patch on 2013-10-14:
	6de9ee86 Pass-through unknown E820 types
which deleted a switch statement mapping the internal codes 
to E820 values and included this:
-    case GRUB_MEMORY_CODE:
-    case GRUB_MEMORY_RESERVED:
-      ctx->cur.type = GRUB_E820_RESERVED;

based on this hope:
+/* GRUB types conveniently match E820 types.  */

That's only true for types 1 through 5:
typedef enum grub_memory_type
  {
    GRUB_MEMORY_AVAILABLE = 1,
    GRUB_MEMORY_RESERVED = 2,
    GRUB_MEMORY_ACPI = 3,
    GRUB_MEMORY_NVS = 4,
    GRUB_MEMORY_BADRAM = 5,
    GRUB_MEMORY_COREBOOT_TABLES = 16,
    GRUB_MEMORY_CODE = 20,
    /* This one is special: it's used internally but is never reported
       by firmware. Don't use -1 as it's used internally for other purposes. */
    GRUB_MEMORY_HOLE = -2,
    GRUB_MEMORY_MAX = 0x10000
  } grub_memory_type_t;


---
Robert Elliott, HPE Persistent Memory
Andrei Borzenkov Nov. 27, 2015, 3:58 a.m. UTC | #6
27.11.2015 02:24, Elliott, Robert (Persistent Memory) ?????:
> 
>> -----Original Message-----
>> From: Andrei Borzenkov [mailto:arvidjaar@gmail.com]
>> Sent: Thursday, November 26, 2015 10:55 AM
>> To: Elliott, Robert (Persistent Memory) <elliott@hpe.com>; The
>> development of GNU GRUB <grub-devel@gnu.org>;
>> dan.j.williams@intel.com; linux-nvdimm@lists.01.org
>> Subject: Re: grub causing NVDIMMs to be treated as normal memory
>>
>> 26.11.2015 09:15, Elliott, Robert (Persistent Memory) ?????:
>> ...
>>
>>> ...
>>> mmap/efi/mmap.c:66: EFI memory region 0x880000000-0xc80000000: 14
>>> Unknown memory type 14, considering reserved
>>> mmap/efi/mmap.c:66: EFI memory region 0x1480000000-0x1a80000000:
>> 14
>>> Unknown memory type 14, considering reserved
>>>
>>>
>>>
>>> This is booting a new kernel with the EFI boot stub, so I can't
>>> confirm that it fixes the issue of exposing the address range as
>>> normal, but based on the print it's probably working.  I could
>>> add prints in grub-core/loader/i386/linux.c grub_e820_add_region
>>> to confirm the e820 contents at exit.
>>>
>>
>> Thanks. If it results in wrong e820 type we have much larger problem
>> so I do not think it necessary.
>>
>> I pushed it; as I understand this should serve as stopgap. If I get
>> around to implement persistent memory type would you test it? But as
>> I cannot tell when it happens, feel free to submit patch in the
>> meantime :)
> 
> As an experiment, I:
> * compiled a linux 4.4rc2 kernel with CONFIG_EFI_STUB disabled
> * added some prints in the linux kernel setup_memory_map()
>   to print the incoming bios_params e820 entries
> * modified the grub patch to set the range to AVAILABLE rather 
>   than RESERVED (simulating the bug, to see its impacts on
>   the kernel)
> 
> Before (CONFIG_EFI_STUB=y, grub mapping to RESERVED):
> bp: [mem 0x0000000000000000-0x0000000000092fff] usable
> bp: [mem 0x0000000000093000-0x0000000000093fff] reserved
> bp: [mem 0x0000000000094000-0x000000000009ffff] usable
> bp: [mem 0x0000000000100000-0x000000006affbfff] usable
> bp: [mem 0x000000006affc000-0x000000006b5fbfff] reserved
> bp: [mem 0x000000006b5fc000-0x000000006b5fcfff] usable
> bp: [mem 0x000000006b5fd000-0x000000006b67dfff] reserved
> bp: [mem 0x000000006b67e000-0x00000000784fefff] usable
> bp: [mem 0x00000000784ff000-0x00000000791fefff] reserved <--b
> bp: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
> bp: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
> bp: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
> bp: [mem 0x0000000100000000-0x000000087fffffff] usable <--a
> bp: [mem 0x0000000c80000000-0x000000147fffffff] usable <--a
> bp: [mem 0x0000000080000000-0x000000008fffffff] reserved
> bp: [mem 0x0000000880000000-0x0000000c7fffffff] reserved <--a
> bp: [mem 0x0000001480000000-0x0000001a7fffffff] reserved <--a
> 
> After (CONFIG_EFI_STUB=n, grub mapping to AVAILABLE):
> bp: [mem 0x0000000000000000-0x0000000000092fff] usable
> bp: [mem 0x0000000000093000-0x0000000000093fff] reserved
> bp: [mem 0x0000000000094000-0x000000000009ffff] usable
> bp: [mem 0x0000000000100000-0x000000006affbfff] usable
> bp: [mem 0x000000006affc000-0x000000006b5fbfff] reserved
> bp: [mem 0x000000006b5fc000-0x000000006b5fcfff] usable
> bp: [mem 0x000000006b5fd000-0x000000006b67dfff] reserved
> bp: [mem 0x000000006b67e000-0x00000000784fefff] usable
> bp: [mem 0x00000000784ff000-0x00000000788fefff] reserved <--b
> bp: [mem 0x00000000788ff000-0x00000000790fefff] type 20  <--b
> bp: [mem 0x00000000790ff000-0x00000000791fefff] reserved <--b
> bp: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
> bp: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
> bp: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
> bp: [mem 0x0000000080000000-0x000000008fffffff] reserved
> bp: [mem 0x0000000100000000-0x0000001a7fffffff] usable <--a
> 
> 
> Note (a): The 088 and 148 ranges were merged into the 
> 010 and 0c8 usable ranges, as expected for this experiment.
> 
> The NVDIMM drivers in the kernel don't load (nmem devices load,
> but namespaces fail and no pmem devices show up):
> [   27.835319] nd_pmem namespace0.0: could not reserve region [0x0x0000000880000000:0x400000000]
> [   27.835323]  ndbus1: nd_pmem.probe(namespace0.0) = -16
> [   27.835355] nd_pmem namespace2.0: could not reserve region [0x0x0000001880000000:0x200000000]
> [   27.835356]  ndbus1: nd_pmem.probe(namespace2.0) = -16
> [   27.835373] nd_pmem namespace1.0: could not reserve region [0x0x0000001480000000:0x400000000]
> [   27.835374]  ndbus1: nd_pmem.probe(namespace1.0) = -16
> 
> That shows up as normal memory in /proc/iomem and elsewhere:
> 100000000-1a7fffffff : System RAM
> 

May be it is too early in the morning, but could you explain whether the
test outcome confirms the patch worked or not? :)

> $ numactl --hardware
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 0 size: 104614 MB
> node 0 free: 103303 MB
> node distances:
> node   0
>   0:  10
> 
> 
> 
> Note (b): The internal GRUB_MEMORY_CODE (20) value is 
> leaking through to the E820 table.
> 
> That appears to be from this patch on 2013-10-14:
> 	6de9ee86 Pass-through unknown E820 types

If we are discussing ACPI 6.0 systems here, it explicitly says that
values above 12 should be treated as reserved. Does it cause problems?

> which deleted a switch statement mapping the internal codes 
> to E820 values and included this:
> -    case GRUB_MEMORY_CODE:
> -    case GRUB_MEMORY_RESERVED:
> -      ctx->cur.type = GRUB_E820_RESERVED;
> 
> based on this hope:
> +/* GRUB types conveniently match E820 types.  */
> 
> That's only true for types 1 through 5:
> typedef enum grub_memory_type
>   {
>     GRUB_MEMORY_AVAILABLE = 1,
>     GRUB_MEMORY_RESERVED = 2,
>     GRUB_MEMORY_ACPI = 3,
>     GRUB_MEMORY_NVS = 4,
>     GRUB_MEMORY_BADRAM = 5,
>     GRUB_MEMORY_COREBOOT_TABLES = 16,
>     GRUB_MEMORY_CODE = 20,
>     /* This one is special: it's used internally but is never reported
>        by firmware. Don't use -1 as it's used internally for other purposes. */
>     GRUB_MEMORY_HOLE = -2,
>     GRUB_MEMORY_MAX = 0x10000
>   } grub_memory_type_t;
> 
> 
> ---
> Robert Elliott, HPE Persistent Memory
> 
> 
> 
>
Elliott, Robert (Servers) Nov. 27, 2015, 6:22 a.m. UTC | #7
> -----Original Message-----

> From: Andrei Borzenkov [mailto:arvidjaar@gmail.com]

> Sent: Thursday, November 26, 2015 9:58 PM

> To: Elliott, Robert (Persistent Memory) <elliott@hpe.com>; The

> development of GNU GRUB <grub-devel@gnu.org>;

> dan.j.williams@intel.com; linux-nvdimm@lists.01.org

> Subject: Re: grub causing NVDIMMs to be treated as normal memory

> 

> 27.11.2015 02:24, Elliott, Robert (Persistent Memory) ?????:

> >

...
> > Note (a): The 088 and 148 ranges were merged into the

> > 010 and 0c8 usable ranges, as expected for this experiment.

> >

> > The NVDIMM drivers in the kernel don't load (nmem devices load,

> > but namespaces fail and no pmem devices show up):

...
> > That shows up as normal memory in /proc/iomem and elsewhere:

> > 100000000-1a7fffffff : System RAM

> >

> 

> May be it is too early in the morning, but could you explain whether

> the test outcome confirms the patch worked or not? :)


Yes, that all worked as expected.  This description was more
for the linux-nvdimm folks (trying to understand the impact of
the issue).

> > Note (b): The internal GRUB_MEMORY_CODE (20) value is

> > leaking through to the E820 table.

> >

> > That appears to be from this patch on 2013-10-14:

> > 	6de9ee86 Pass-through unknown E820 types

> 

> If we are discussing ACPI 6.0 systems here, it explicitly says that

> values above 12 should be treated as reserved. Does it cause

> problems?


All undefined values are reserved for future standardization;
the meaning they might have in the future is unpredictable.

Software compatible with ACPI 6.0 is supposed to treat them as
reserved, but software compatible with a future version of ACPI
might interpret them as having some different meaning that isn't
compatible with GRUB_MEMORY_CODE.

Some companies used e820 type 12 to mean persistent memory without
getting that assigned by the ACPI WG, so that value was
contaminated.  We should probably mark 20 as contaminated too, 
given this issue.
Elliott, Robert (Servers) Nov. 28, 2015, 6:41 a.m. UTC | #8
> -----Original Message-----
> From: grub-devel-bounces+elliott=hp.com@gnu.org [mailto:grub-devel-
> bounces+elliott=hp.com@gnu.org] On Behalf Of Vladimir 'f-
> coder/phcoder' Serbinenko
> Sent: Friday, November 27, 2015 5:08 AM
> To: The development of GNU GRUB <grub-devel@gnu.org>
> Subject: Re: grub causing NVDIMMs to be treated as normal memory
> 
> What about this patch for the passing of pram?
...
> --- a/include/grub/memory.h
> +++ b/include/grub/memory.h
> @@ -30,6 +30,7 @@ typedef enum grub_memory_type
>      GRUB_MEMORY_ACPI = 3,
>      GRUB_MEMORY_NVS = 4,
>      GRUB_MEMORY_BADRAM = 5,
> +    GRUB_MEMORY_PRAM = 7,

linux chose to use these abbreviations:
* PMEM for the ACPI standard type 7 
* PRAM for the non-standard type 12

You might want to use the same.

>      GRUB_MEMORY_COREBOOT_TABLES = 16,
>      GRUB_MEMORY_CODE = 20,
>      /* This one is special: it's used internally but is never
> reported


The patch works; UEFI type 14 ends up mapped into E820 type 7. With 
linux kernel 4.4rc2 with CONFIG_EFI_STUB disabled:
* UEFI memory map:
  * efi: mem51: [Persistent Memory  |   |  |NV|  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c7fffffff] (34 GiB for 16 GiB)
  * efi: mem52: [Persistent Memory  |   |  |NV|  |  |  |  |   |WB|WT|WC|UC] range=[0x0000001480000000-0x0000001a7fffffff] (82 GiB for 24 GiB)
* E820 entries in bios_params at startup:
  * bp: [mem 0x0000000880000000-0x0000000c7fffffff] persistent (type 7)
  * bp: [mem 0x0000001480000000-0x0000001a7fffffff] persistent (type 7)


A few more places that might need fixes:

grub-core/commands/lsmmap.c 
* names[] array - as Andre mentioned
* you might want to decode e820 type 12 there too, providing more
  visibility in case it's a legacy BIOS system that still reports
  the pre-standard e820 type 12.

grub-core/efiemu/mm.c:
* efiemu_alloc_requests reqorder[] - add GRUB_EFI_PERSISTENT_MEMORY
  (although we don't want to encourage ever allocating these
   ranges, UNUSABLE_MEMORY is already there)
* fill_hook - map GRUB_MEMORY_PMEM to GRUB_EFI_PERSISTENT_MEMORY
* grub_efiemu_mmap_iterate - handle GRUB_EFI_PERSISTENT_MEMORY.
  Also, this function includes MAX_MEMORY_TYPE in the case statement
  and doesn't have a default case... it should treat all the reserved
  types the same, not handle that one differently
* grub_efiemu_mmap_sort_and_uniq priority[] - add
  GRUB_EFI_PERSISTENT_MEMORY
* I would not bother translating e820 type 12 here 

grub-core/mmap/efi/mmap.c:
* make_efi_memtype - map GRUB_MEMORY_PMEM to GRUB_EFI_PERSISTENT_MEMORY


> >>> Note (b): The internal GRUB_MEMORY_CODE (20) value is
> >>> leaking through to the E820 table.
> >>>
> >>> That appears to be from this patch on 2013-10-14:
> >>> 	6de9ee86 Pass-through unknown E820 types
> >>
> >> If we are discussing ACPI 6.0 systems here, it explicitly says
> that
> >> values above 12 should be treated as reserved. Does it cause
> >> problems?
> >
> > All undefined values are reserved for future standardization;
> > the meaning they might have in the future is unpredictable.
> >
> > Software compatible with ACPI 6.0 is supposed to treat them as
> > reserved, but software compatible with a future version of ACPI
> > might interpret them as having some different meaning that isn't
> > compatible with GRUB_MEMORY_CODE.
> >
> > Some companies used e820 type 12 to mean persistent memory without
> > getting that assigned by the ACPI WG, so that value was
> > contaminated.  We should probably mark 20 as contaminated too,
> > given this issue.
> >
> I see now that we have leaked 16 (coreboot tables) as well. Could we
> mark 16 as contaminated as well?

Yes, if it is possible for that to make it to e820.  If there is
some reason that cannot happen, then we don't need to worry about
that value.

I will prepare an ACPI WG proposal to mark those values as "OEM
defined", since the problem has been there for several years.

> For memory code: should we just pass reserved in linux e820 or is it
> better to keep doing this bug given possible reliance on it by other
> software?

If you can define a standard meaning for 16 and 20, that'd be more
useful than marking them as OEM defined.  There will always be a mix
of software that interprets it as unusable vs. follows this new
advice.

---
Robert Elliott, HPE Persistent Memory
Elliott, Robert (Servers) Dec. 1, 2015, 12:25 a.m. UTC | #9
> grub-core/efiemu/mm.c:
> * efiemu_alloc_requests reqorder[] - add GRUB_EFI_PERSISTENT_MEMORY
>   (although we don't want to encourage ever allocating these
>    ranges, UNUSABLE_MEMORY is already there)

Reviewing that code, since this array has one entry per 
EFI memory type:
static grub_size_t requested_memory[GRUB_EFI_MAX_MEMORY_TYPE];

but grub_efiemu_request_memalign() is the only function
that sets the entries to non-zero values, and that function
filters out GRUB_EFI_LOADER_CODE:
	if (type >= GRUB_EFI_MAX_MEMORY_TYPE || type <= GRUB_EFI_LOADER_CODE)
	  return -2;

then the efiemu_alloc_requests() reqorder[] array entry for
GRUB_EFI_LOADER_CODE is not used.

Should the <= really be <, or should that entry be removed from
the reqorder[] array?  Those lines were all created together in 
patch 5caf964d.

Regardless, I think it's best to add GRUB_EFI_PERSISTENT_MEMORY
to that filter and keep it out of the reqorder[] array, so it's
handled like GRUB_EFI_RESERVED_MEMORY_TYPE (which is the only
value < GRUB_EFI_LOADER_CODE):

  if (type <= GRUB_EFI_LOADER_CODE || type == GRUB_EFI_PERSISTENT_MEMORY ||
        type >= GRUB_EFI_MAX_MEMORY_SIZE)


---
Robert Elliott, HPE Persistent Memory
Elliott, Robert (Servers) Dec. 3, 2015, 5:50 p.m. UTC | #10
> > -----Original Message-----
> > From: grub-devel-bounces+elliott=hp.com@gnu.org [mailto:grub-devel-
> > bounces+elliott=hp.com@gnu.org] On Behalf Of Vladimir 'f-
> > coder/phcoder' Serbinenko
> > Sent: Friday, November 27, 2015 5:08 AM
> > To: The development of GNU GRUB <grub-devel@gnu.org>
> > Subject: Re: grub causing NVDIMMs to be treated as normal memory
> >
> > What about this patch for the passing of pram?
> ...
> > --- a/include/grub/memory.h
> > +++ b/include/grub/memory.h
> > @@ -30,6 +30,7 @@ typedef enum grub_memory_type
> >      GRUB_MEMORY_ACPI = 3,
> >      GRUB_MEMORY_NVS = 4,
> >      GRUB_MEMORY_BADRAM = 5,
> > +    GRUB_MEMORY_PRAM = 7,
...

Here is a proposed patch series expanding upon that.  I compiled
the efiemu changes but am not sure how to test them.  I did test
that lsnvimm properly displays the new type name.

I suggest bumping the version number so we have a way to
distinguish grubs that destroy NVDIMM data from ones that
don't.  It looks like grub has been stuck on "2.02~beta2" for
about two years, which is very confusing.  An update to NEWS
should be included with that patch.


Robert Elliott (4):
  Translate UEFI persistent memory type
  Add persistent memory type to efiemu
  efiemu: Handle all reserved UEFI memory map types the same way
  Bump version to 2.03

 configure.ac                |  2 +-
 grub-core/commands/lsmmap.c |  2 ++
 grub-core/efiemu/mm.c       | 25 ++++++++++++++++++++++---
 grub-core/mmap/efi/mmap.c   | 11 +++++++++++
 include/grub/efi/api.h      |  1 +
 include/grub/memory.h       |  2 ++
 6 files changed, 39 insertions(+), 4 deletions(-)

---
Robert Elliott, HPE Persistent Memory
diff mbox

Patch

From: Andrei Borzenkov <arvidjaar@gmail.com>
Subject: [PATCH] efi: really mark memory of unknown type as reserved

9be4c45dbe3c877d1f4856e99ee15133c6cd2261 added switch case between
fall through cases, causing all memory regions of unknown type to be
marked as available.

Move default case into its own block and add explicit FALLTHROUGH
annotation.

Reported by Elliott, Robert (Persistent Memory) <elliott@hpe.com>

---
 grub-core/mmap/efi/mmap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/grub-core/mmap/efi/mmap.c b/grub-core/mmap/efi/mmap.c
index a77efe8..6b5f5d8 100644
--- a/grub-core/mmap/efi/mmap.c
+++ b/grub-core/mmap/efi/mmap.c
@@ -73,6 +73,7 @@  grub_efi_mmap_iterate (grub_memory_hook_t hook, void *hook_data,
 		    GRUB_MEMORY_AVAILABLE, hook_data);
 	      break;
 	    }
+	  /* FALLTHROUGH */
 	case GRUB_EFI_RUNTIME_SERVICES_CODE:
 	  hook (desc->physical_start, desc->num_pages * 4096,
 		GRUB_MEMORY_CODE, hook_data);
@@ -83,10 +84,6 @@  grub_efi_mmap_iterate (grub_memory_hook_t hook, void *hook_data,
 		GRUB_MEMORY_BADRAM, hook_data);
 	  break;
 
-	default:
-	  grub_printf ("Unknown memory type %d, considering reserved\n",
-		       desc->type);
-
 	case GRUB_EFI_BOOT_SERVICES_DATA:
 	  if (!avoid_efi_boot_services)
 	    {
@@ -94,6 +91,7 @@  grub_efi_mmap_iterate (grub_memory_hook_t hook, void *hook_data,
 		    GRUB_MEMORY_AVAILABLE, hook_data);
 	      break;
 	    }
+	  /* FALLTHROUGH */
 	case GRUB_EFI_RESERVED_MEMORY_TYPE:
 	case GRUB_EFI_RUNTIME_SERVICES_DATA:
 	case GRUB_EFI_MEMORY_MAPPED_IO:
@@ -119,6 +117,13 @@  grub_efi_mmap_iterate (grub_memory_hook_t hook, void *hook_data,
 	  hook (desc->physical_start, desc->num_pages * 4096,
 		GRUB_MEMORY_NVS, hook_data);
 	  break;
+	default:
+	  grub_printf ("Unknown memory type %d, considering reserved\n",
+		       desc->type);
+	  hook (desc->physical_start, desc->num_pages * 4096,
+		GRUB_MEMORY_RESERVED, hook_data);
+	  break;
+
 	}
     }
 
-- 
tg: (f9d1b44..) u/efi-unknown-memory-type (depends on: master)