
[v12,1/5] efi: ARM/arm64: ignore DT memory nodes instead of removing them

Message ID 1456192703-2274-2-git-send-email-ddaney.cavm@gmail.com
State New, archived

Commit Message

David Daney Feb. 23, 2016, 1:58 a.m. UTC
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

There are two problems with the UEFI stub DT memory node removal
routine:
- it deletes nodes as it traverses the tree, which happens to work
  but is not supported, as deletion invalidates the node iterator;
- deleting memory nodes entirely may discard annotations in the form
  of additional properties on the nodes.
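The first problem, iterator invalidation, can be shown with a small self-contained model. This is a sketch, not libfdt: the array `nodes[]` stands in for the FDT blob, and the hypothetical `delete_node()` shifts every later entry down, much as `fdt_del_node()` moves later data in the blob, so a saved offset past the deletion point no longer means what it did. The hypothetical helpers `find_next_memory()` and `delete_all_memory_nodes()` illustrate one conservative pattern: never reuse an offset across a modification, and restart the search from the top instead.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical stand-in for an FDT node: only "device_type" matters here. */
struct node {
	const char *device_type;	/* NULL if the property is absent */
};

/* Index of the first "memory" node after 'start' (-1 = from the top), or -1. */
static int find_next_memory(const struct node *nodes, int n, int start)
{
	for (int i = start + 1; i < n; i++)
		if (nodes[i].device_type &&
		    strcmp(nodes[i].device_type, "memory") == 0)
			return i;
	return -1;
}

/* Remove nodes[idx]; all later entries move down, so later indices go stale. */
static void delete_node(struct node *nodes, int *n, int idx)
{
	assert(idx >= 0 && idx < *n);
	memmove(&nodes[idx], &nodes[idx + 1],
		(size_t)(*n - idx - 1) * sizeof(nodes[0]));
	(*n)--;
}

/* Conservative pattern: after every deletion, search again from the top. */
static int delete_all_memory_nodes(struct node *nodes, int *n)
{
	int deleted = 0, idx;

	while ((idx = find_next_memory(nodes, *n, -1)) >= 0) {
		delete_node(nodes, n, idx);
		deleted++;
	}
	return deleted;
}
```

For a five-node tree such as root, memory, memory, cpu, memory, the call deletes three nodes and leaves root and cpu. Restarting is quadratic in the worst case, but it never relies on an offset that a deletion may have invalidated.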

Since the discovery of DT memory nodes occurs strictly before the
UEFI init sequence, we can simply clear the memblock memory table
before parsing the UEFI memory map. This way, it is no longer
necessary to remove the nodes, so we can remove that logic from the
stub as well.
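The ordering argument can be made concrete with a toy region table. This is a hedged sketch, not the kernel's memblock code: `struct region_table`, `table_add()`, `table_remove()` and `adopt_uefi_map()` are hypothetical. Because the DT scan runs strictly before UEFI init, every entry present when `adopt_uefi_map()` starts came from a DT memory node, so one full removal discards exactly those. A removal of size `UINT64_MAX` starting at 0 would overflow `base + size` if used naively, so the sketch caps the size first. My understanding is that memblock applies a similar cap internally, but treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 32

/* Hypothetical model of a memory-region table; NOT the kernel memblock API. */
struct region {
	uint64_t base, size;
};

struct region_table {
	struct region r[MAX_REGIONS];
	int cnt;
};

static void table_add(struct region_table *t, uint64_t base, uint64_t size)
{
	assert(t->cnt < MAX_REGIONS);
	t->r[t->cnt++] = (struct region){ base, size };
}

/* Remove [base, base + size), capping size so that base + size cannot wrap. */
static void table_remove(struct region_table *t, uint64_t base, uint64_t size)
{
	struct region out[MAX_REGIONS];
	int n = 0;

	if (size > UINT64_MAX - base)
		size = UINT64_MAX - base;
	uint64_t end = base + size;

	for (int i = 0; i < t->cnt; i++) {
		uint64_t rb = t->r[i].base, re = rb + t->r[i].size;

		if (re <= base || rb >= end) {	/* no overlap: keep unchanged */
			assert(n < MAX_REGIONS);
			out[n++] = t->r[i];
			continue;
		}
		if (rb < base) {		/* keep the part below the hole */
			assert(n < MAX_REGIONS);
			out[n++] = (struct region){ rb, base - rb };
		}
		if (re > end) {			/* keep the part above the hole */
			assert(n < MAX_REGIONS);
			out[n++] = (struct region){ end, re - end };
		}
	}
	for (int i = 0; i < n; i++)
		t->r[i] = out[i];
	t->cnt = n;
}

/* Mirrors the patched ordering: drop everything found so far, then use UEFI. */
static void adopt_uefi_map(struct region_table *t,
			   const struct region *uefi, int nuefi)
{
	table_remove(t, 0, UINT64_MAX);	/* cf. memblock_remove(0, ULLONG_MAX) */
	for (int i = 0; i < nuefi; i++)
		table_add(t, uefi[i].base, uefi[i].size);
}
```

After `adopt_uefi_map()`, the table holds only the UEFI-provided ranges, regardless of what the DT memory nodes described.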

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David Daney <david.daney@cavium.com>
---
 drivers/firmware/efi/arm-init.c    |  8 ++++++++
 drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
 2 files changed, 9 insertions(+), 23 deletions(-)

Comments

Mark Rutland Feb. 23, 2016, 11:58 a.m. UTC | #1
Hi,

On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> There are two problems with the UEFI stub DT memory node removal
> routine:
> - it deletes nodes as it traverses the tree, which happens to work
>   but is not supported, as deletion invalidates the node iterator;
> - deleting memory nodes entirely may discard annotations in the form
>   of additional properties on the nodes.
> 
> Since the discovery of DT memory nodes occurs strictly before the
> UEFI init sequence, we can simply clear the memblock memory table
> before parsing the UEFI memory map. This way, it is no longer
> necessary to remove the nodes, so we can remove that logic from the
> stub as well.

This is a little bit scary, but I guess this works.

My only concern is that when we get kexec, a subsequent kernel must also
have EFI memory map support, or things go bad for the next EFI-aware
kernel after that (as things like the runtime services may have been
corrupted by the kernel in the middle). It's difficult to fix the
general case later.

A different option would be to support status="disabled" for the memory
nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
cannot use memory without first having parsed the EFI memory map, and we
can still get NUMA info from the disabled nodes.

You'd still need a new kernel to take into account status, but at least
we'd know all kernels would avoid using RAM that potentially needs to be
preserved.
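A minimal sketch of the status="disabled" proposal, assuming a simplified node representation: `struct dt_node`, `node_is_available()` and `scan_usable_memory()` are hypothetical and are not the kernel's `early_init_dt_scan_memory()`. The availability rule follows the usual Linux convention that a node is available when "status" is absent, "okay" or "ok". A disabled memory node then contributes no RAM to the early scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a DT node; only the properties used here. */
struct dt_node {
	const char *device_type;	/* NULL if absent */
	const char *status;		/* NULL if absent */
	unsigned long long base, size;	/* first "reg" entry */
};

/* Linux convention: available if "status" is absent, "okay" or "ok". */
static bool node_is_available(const struct dt_node *np)
{
	return np->status == NULL || strcmp(np->status, "okay") == 0 ||
	       strcmp(np->status, "ok") == 0;
}

/* Sketch of the proposed early scan: disabled memory nodes add no RAM. */
static unsigned long long scan_usable_memory(const struct dt_node *nodes,
					     size_t n)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].device_type ||
		    strcmp(nodes[i].device_type, "memory") != 0)
			continue;
		if (!node_is_available(&nodes[i]))
			continue;	/* RAM only via the UEFI memory map */
		total += nodes[i].size;
	}
	return total;
}
```

With this rule, a kernel that cannot parse the UEFI memory map finds no usable RAM in disabled nodes, rather than silently using memory that may need to be preserved.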

Ard, Rob, thoughts?

Mark.

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
>  drivers/firmware/efi/arm-init.c    |  8 ++++++++
>  drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
>  2 files changed, 9 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index 9e15d57..40c9d85 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
>  	if (efi_enabled(EFI_DBG))
>  		pr_info("Processing EFI memory map:\n");
>  
> +	/*
> +	 * Discard memblocks discovered so far: if there are any at this
> +	 * point, they originate from memory nodes in the DT, and UEFI
> +	 * uses its own memory map instead.
> +	 */
> +	memblock_dump_all();
> +	memblock_remove(0, ULLONG_MAX);
> +
>  	for_each_efi_memory_desc(&memmap, md) {
>  		paddr = md->phys_addr;
>  		npages = md->num_pages;
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index cf7b7d4..9df1560 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>  			unsigned long map_size, unsigned long desc_size,
>  			u32 desc_ver)
>  {
> -	int node, prev, num_rsv;
> +	int node, num_rsv;
>  	int status;
>  	u32 fdt_val32;
>  	u64 fdt_val64;
> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>  		goto fdt_set_fail;
>  
>  	/*
> -	 * Delete any memory nodes present. We must delete nodes which
> -	 * early_init_dt_scan_memory may try to use.
> -	 */
> -	prev = 0;
> -	for (;;) {
> -		const char *type;
> -		int len;
> -
> -		node = fdt_next_node(fdt, prev, NULL);
> -		if (node < 0)
> -			break;
> -
> -		type = fdt_getprop(fdt, node, "device_type", &len);
> -		if (type && strncmp(type, "memory", len) == 0) {
> -			fdt_del_node(fdt, node);
> -			continue;
> -		}
> -
> -		prev = node;
> -	}
> -
> -	/*
>  	 * Delete all memory reserve map entries. When booting via UEFI,
>  	 * kernel will use the UEFI memory map to find reserved regions.
>  	 */
> -- 
> 1.8.3.1
>

Will Deacon Feb. 23, 2016, 12:16 p.m. UTC | #2
On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > 
> > There are two problems with the UEFI stub DT memory node removal
> > routine:
> > - it deletes nodes as it traverses the tree, which happens to work
> >   but is not supported, as deletion invalidates the node iterator;
> > - deleting memory nodes entirely may discard annotations in the form
> >   of additional properties on the nodes.
> > 
> > Since the discovery of DT memory nodes occurs strictly before the
> > UEFI init sequence, we can simply clear the memblock memory table
> > before parsing the UEFI memory map. This way, it is no longer
> > necessary to remove the nodes, so we can remove that logic from the
> > stub as well.
> 
> This is a little bit scary, but I guess this works.
> 
> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
> 
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

So in that case, the middle, non-EFI kernel would fail to boot?
Realistically, once you've kexec'd a non-EFI payload, I don't think you
can rely on the EFI state remaining intact for future EFI applications.

Is this really something we should be trying to police in the kernel?

Will

Ard Biesheuvel Feb. 23, 2016, 12:20 p.m. UTC | #3
On 23 February 2016 at 13:16, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
>> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >
>> > There are two problems with the UEFI stub DT memory node removal
>> > routine:
>> > - it deletes nodes as it traverses the tree, which happens to work
>> >   but is not supported, as deletion invalidates the node iterator;
>> > - deleting memory nodes entirely may discard annotations in the form
>> >   of additional properties on the nodes.
>> >
>> > Since the discovery of DT memory nodes occurs strictly before the
>> > UEFI init sequence, we can simply clear the memblock memory table
>> > before parsing the UEFI memory map. This way, it is no longer
>> > necessary to remove the nodes, so we can remove that logic from the
>> > stub as well.
>>
>> This is a little bit scary, but I guess this works.
>>
>> My only concern is that when we get kexec, a subsequent kernel must also
>> have EFI memory map support, or things go bad for the next EFI-aware
>> kernel after that (as things like the runtime services may have been
>> corrupted by the kernel in the middle). It's difficult to fix the
>> general case later.
>>
>> A different option would be to support status="disabled" for the memory
>> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
>> cannot use memory without first having parsed the EFI memory map, and we
>> can still get NUMA info from the disabled nodes.
>
> So in that case, the middle, non-EFI kernel would fail to boot?
> Realistically, once you've kexec'd a non-EFI payload, I don't think you
> can rely on the EFI state remaining intact for future EFI applications.
>
> Is this really something we should be trying to police in the kernel?
>

Well, we could add entries to /reserved-memory in the stub for all the
regions UEFI cares about, that would probably be sufficient to fix
this case.
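As a rough illustration of Ard's suggestion, the stub would first need to decide which regions to describe under /reserved-memory. The message does not say which regions; this sketch assumes "regions UEFI cares about" means descriptors carrying the `EFI_MEMORY_RUNTIME` attribute (bit 63 in the UEFI specification). `struct efi_md`, `struct resv_entry` and `collect_runtime_regions()` are hypothetical; the field names follow the `md->phys_addr` and `md->num_pages` usage in the patch. Writing the actual FDT nodes is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12		/* UEFI pages are 4 KiB */
#define EFI_MEMORY_RUNTIME (1ULL << 63)	/* attribute bit per the UEFI spec */

/* Subset of a UEFI memory descriptor (hypothetical local definition). */
struct efi_md {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Hypothetical output: one child node to add under /reserved-memory. */
struct resv_entry {
	uint64_t base, size;
};

/*
 * Collect regions marked as needed at runtime, merging a descriptor into the
 * previous entry when they are physically contiguous. Returns entries written.
 */
static size_t collect_runtime_regions(const struct efi_md *map, size_t n,
				      struct resv_entry *out, size_t max)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(map[i].attribute & EFI_MEMORY_RUNTIME))
			continue;

		uint64_t base = map[i].phys_addr;
		uint64_t size = map[i].num_pages << EFI_PAGE_SHIFT;

		if (cnt > 0 && out[cnt - 1].base + out[cnt - 1].size == base) {
			out[cnt - 1].size += size;	/* extend previous entry */
			continue;
		}
		if (cnt == max)
			break;				/* output array is full */
		out[cnt].base = base;
		out[cnt].size = size;
		cnt++;
	}
	return cnt;
}
```

Merging only neighbours in map order is always safe; if the map is not sorted by address, the result is merely less compact.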

Rob Herring Feb. 23, 2016, 10:12 p.m. UTC | #4
On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> Hi,
> 
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > 
> > There are two problems with the UEFI stub DT memory node removal
> > routine:
> > - it deletes nodes as it traverses the tree, which happens to work
> >   but is not supported, as deletion invalidates the node iterator;
> > - deleting memory nodes entirely may discard annotations in the form
> >   of additional properties on the nodes.
> > 
> > Since the discovery of DT memory nodes occurs strictly before the
> > UEFI init sequence, we can simply clear the memblock memory table
> > before parsing the UEFI memory map. This way, it is no longer
> > necessary to remove the nodes, so we can remove that logic from the
> > stub as well.
> 
> This is a little bit scary, but I guess this works.

The way it is worded/implemented is, I agree. But if we simply say both 
can be present and the kernel will default to UEFI memory map, that 
seems sufficient to me.
 
> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
> 
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

It would be a bit strange for the node to be disabled but still used.

What if the DT and UEFI tables are out of sync somehow? For example,
RAM may be mapped at multiple addresses, with each table picking a
different one.

> You'd still need a new kernel to take into account status, but at least
> we'd know all kernels would avoid using RAM that potentially needs to be
> preserved.
> 
> Ard, Rob, thoughts?

Frank Rowand Feb. 24, 2016, 7:03 p.m. UTC | #5
On 2/23/2016 3:58 AM, Mark Rutland wrote:
> Hi,
> 
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> There are two problems with the UEFI stub DT memory node removal
>> routine:
>> - it deletes nodes as it traverses the tree, which happens to work
>>   but is not supported, as deletion invalidates the node iterator;
>> - deleting memory nodes entirely may discard annotations in the form
>>   of additional properties on the nodes.
>>
>> Since the discovery of DT memory nodes occurs strictly before the
>> UEFI init sequence, we can simply clear the memblock memory table
>> before parsing the UEFI memory map. This way, it is no longer
>> necessary to remove the nodes, so we can remove that logic from the
>> stub as well.
> 
> This is a little bit scary, but I guess this works.
> 
> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
> 
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

Please do not play games of treating nodes with status="disabled" as
valid nodes.  The mindset should be if it is disabled, it does not exist.

There have been two bugs reported in the last week where code should
have been ignoring disabled nodes and failed to.  An audit of code
scanning all nodes instead of all enabled nodes is now on my todo list.

< snip >

-Frank

Rob Herring Feb. 24, 2016, 7:30 p.m. UTC | #6
On Wed, Feb 24, 2016 at 1:03 PM, Frank Rowand <frowand.list@gmail.com> wrote:
> On 2/23/2016 3:58 AM, Mark Rutland wrote:
>> Hi,
>>
>> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>>> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>
>>> There are two problems with the UEFI stub DT memory node removal
>>> routine:
>>> - it deletes nodes as it traverses the tree, which happens to work
>>>   but is not supported, as deletion invalidates the node iterator;
>>> - deleting memory nodes entirely may discard annotations in the form
>>>   of additional properties on the nodes.
>>>
>>> Since the discovery of DT memory nodes occurs strictly before the
>>> UEFI init sequence, we can simply clear the memblock memory table
>>> before parsing the UEFI memory map. This way, it is no longer
>>> necessary to remove the nodes, so we can remove that logic from the
>>> stub as well.
>>
>> This is a little bit scary, but I guess this works.
>>
>> My only concern is that when we get kexec, a subsequent kernel must also
>> have EFI memory map support, or things go bad for the next EFI-aware
>> kernel after that (as things like the runtime services may have been
>> corrupted by the kernel in the middle). It's difficult to fix the
>> general case later.
>>
>> A different option would be to support status="disabled" for the memory
>> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
>> cannot use memory without first having parsed the EFI memory map, and we
>> can still get NUMA info from the disabled nodes.
>
> Please do not play games of treating nodes with status="disabled" as
> valid nodes.  The mindset should be if it is disabled, it does not exist.
>
> There have been two bugs reported in the last week where code should
> have been ignoring disabled nodes and failed to.  An audit of code
> scanning all nodes instead of all enabled nodes is now on my todo list.

Perhaps we should merge the default/available variants of iterators
into one. I suspect there are some valid uses. Otherwise, we could
also just not even populate those nodes in the live tree. There are
some cases where the kernel changes the status.

Rob

Mark Rutland Feb. 24, 2016, 7:33 p.m. UTC | #7
On Wed, Feb 24, 2016 at 11:03:08AM -0800, Frank Rowand wrote:
> On 2/23/2016 3:58 AM, Mark Rutland wrote:
> > Hi,
> > 
> > On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> >> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>
> >> There are two problems with the UEFI stub DT memory node removal
> >> routine:
> >> - it deletes nodes as it traverses the tree, which happens to work
> >>   but is not supported, as deletion invalidates the node iterator;
> >> - deleting memory nodes entirely may discard annotations in the form
> >>   of additional properties on the nodes.
> >>
> >> Since the discovery of DT memory nodes occurs strictly before the
> >> UEFI init sequence, we can simply clear the memblock memory table
> >> before parsing the UEFI memory map. This way, it is no longer
> >> necessary to remove the nodes, so we can remove that logic from the
> >> stub as well.
> > 
> > This is a little bit scary, but I guess this works.
> > 
> > My only concern is that when we get kexec, a subsequent kernel must also
> > have EFI memory map support, or things go bad for the next EFI-aware
> > kernel after that (as things like the runtime services may have been
> > corrupted by the kernel in the middle). It's difficult to fix the
> > general case later.
> > 
> > A different option would be to support status="disabled" for the memory
> > nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> > cannot use memory without first having parsed the EFI memory map, and we
> > can still get NUMA info from the disabled nodes.
> 
> Please do not play games of treating nodes with status="disabled" as
> valid nodes.  The mindset should be if it is disabled, it does not exist.

I completely agree with this generally.

The only possible wiggle room is that ePAPR describes the precise
meaning of the status property as binding-specific (and there may be
some way to later "enable" the node or otherwise make use of it). As
above, we'd only be extracting some information in the presence of
a UEFI memory map.

I agree that this is not a great pattern, and we don't necessarily want
that even for "safe" cases like NUMA.

> There have been two bugs reported in the last week where code should
> have been ignoring disabled nodes and failed to.  An audit of code
> scanning all nodes instead of all enabled nodes is now on my todo list.

That would be great!

Mark.

Mark Rutland Feb. 24, 2016, 7:38 p.m. UTC | #8
On Tue, Feb 23, 2016 at 04:12:02PM -0600, Rob Herring wrote:
> On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> > Hi,
> > 
> > On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > 
> > > There are two problems with the UEFI stub DT memory node removal
> > > routine:
> > > - it deletes nodes as it traverses the tree, which happens to work
> > >   but is not supported, as deletion invalidates the node iterator;
> > > - deleting memory nodes entirely may discard annotations in the form
> > >   of additional properties on the nodes.
> > > 
> > > Since the discovery of DT memory nodes occurs strictly before the
> > > UEFI init sequence, we can simply clear the memblock memory table
> > > before parsing the UEFI memory map. This way, it is no longer
> > > necessary to remove the nodes, so we can remove that logic from the
> > > stub as well.
> > 
> > This is a little bit scary, but I guess this works.
> 
> The way it is worded/implemented is, I agree. But if we simply say both 
> can be present and the kernel will default to UEFI memory map, that 
> seems sufficient to me.
>  
> > My only concern is that when we get kexec, a subsequent kernel must also
> > have EFI memory map support, or things go bad for the next EFI-aware
> > kernel after that (as things like the runtime services may have been
> > corrupted by the kernel in the middle). It's difficult to fix the
> > general case later.
> > 
> > A different option would be to support status="disabled" for the memory
> > nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> > cannot use memory without first having parsed the EFI memory map, and we
> > can still get NUMA info from the disabled nodes.
> 
> That would be a bit strange that the node is disabled, but still used. 

I agree this would be strange, and not necessarily a precedent we'd want
to see copied elsewhere.

Per ePAPR, a "disabled" node can be enabled in a binding-specific
manner, so having the presence of a UEFI memory map "enable" the NUMA
information would appear to be permitted.

> What if DT and UEFI tables are out of sync somehow? RAM is multiple 
> mapped and different addresses were picked for example.

That applies regardless of the status of the memory nodes.

My suggestion was only that we acquire the NUMA node information and
add this node information (and not any additional extent of memory) to
the UEFI memory map.

This is precisely what we do with Ard's code, with the exception that in
the absence of a UEFI memory map the kernel would know it was not
permitted to access memory.
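The suggestion of taking only NUMA information from the DT, and never any extent of memory, can be sketched as follows. This is a hedged model under stated assumptions: `struct dt_mem`, `struct uefi_ram` and `annotate_numa()` are hypothetical, `NUMA_NO_NODE` is defined locally with the kernel's value of -1, and a UEFI region that is not fully contained in a single DT memory node is simply left without a node id.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)	/* same value as the kernel's NUMA_NO_NODE */

/* Hypothetical DT memory node: an address range plus its numa-node-id. */
struct dt_mem {
	uint64_t base, size;
	int nid;
};

/* Hypothetical UEFI RAM region to be annotated with a node id. */
struct uefi_ram {
	uint64_t base, size;
	int nid;
};

/*
 * Give each UEFI region the numa-node-id of the DT memory node that fully
 * contains it. DT ranges are consulted only for node ids: no memory is ever
 * added from them, so usable RAM still comes solely from the UEFI map.
 */
static void annotate_numa(struct uefi_ram *ram, size_t nram,
			  const struct dt_mem *dt, size_t ndt)
{
	for (size_t i = 0; i < nram; i++) {
		ram[i].nid = NUMA_NO_NODE;
		for (size_t j = 0; j < ndt; j++) {
			if (ram[i].base >= dt[j].base &&
			    ram[i].base + ram[i].size <= dt[j].base + dt[j].size) {
				ram[i].nid = dt[j].nid;
				break;
			}
		}
	}
}
```

Out-of-sync tables, as raised above, show up here as UEFI regions left at `NUMA_NO_NODE`, never as extra memory.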

Mark.

Patch

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 9e15d57..40c9d85 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -143,6 +143,14 @@  static __init void reserve_regions(void)
 	if (efi_enabled(EFI_DBG))
 		pr_info("Processing EFI memory map:\n");
 
+	/*
+	 * Discard memblocks discovered so far: if there are any at this
+	 * point, they originate from memory nodes in the DT, and UEFI
+	 * uses its own memory map instead.
+	 */
+	memblock_dump_all();
+	memblock_remove(0, ULLONG_MAX);
+
 	for_each_efi_memory_desc(&memmap, md) {
 		paddr = md->phys_addr;
 		npages = md->num_pages;
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index cf7b7d4..9df1560 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -24,7 +24,7 @@  efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
 			unsigned long map_size, unsigned long desc_size,
 			u32 desc_ver)
 {
-	int node, prev, num_rsv;
+	int node, num_rsv;
 	int status;
 	u32 fdt_val32;
 	u64 fdt_val64;
@@ -54,28 +54,6 @@  efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
 		goto fdt_set_fail;
 
 	/*
-	 * Delete any memory nodes present. We must delete nodes which
-	 * early_init_dt_scan_memory may try to use.
-	 */
-	prev = 0;
-	for (;;) {
-		const char *type;
-		int len;
-
-		node = fdt_next_node(fdt, prev, NULL);
-		if (node < 0)
-			break;
-
-		type = fdt_getprop(fdt, node, "device_type", &len);
-		if (type && strncmp(type, "memory", len) == 0) {
-			fdt_del_node(fdt, node);
-			continue;
-		}
-
-		prev = node;
-	}
-
-	/*
 	 * Delete all memory reserve map entries. When booting via UEFI,
 	 * kernel will use the UEFI memory map to find reserved regions.
 	 */