diff mbox series

xen/arm: bootfdt: Check return code of device_tree_for_each_node()

Message ID 20231207101432.37732-1-michal.orzel@amd.com (mailing list archive)
State New, archived
Headers show
Series xen/arm: bootfdt: Check return code of device_tree_for_each_node() | expand

Commit Message

Michal Orzel Dec. 7, 2023, 10:14 a.m. UTC
As a result of not checking the return code of device_tree_for_each_node()
in boot_fdt_info(), any error occured during early FDT parsing does not
stop Xen from booting. This can result in an unwanted behavior in later
boot stages. Fix it by checking the return code and panicing on an error.

Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
---
I've lost count how many times I had to fix missing rc check. However, I have
not yet found any checker for this (-Wunused-result is pretty useless).
---
 xen/arch/arm/bootfdt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Julien Grall Dec. 7, 2023, 12:20 p.m. UTC | #1
Hi Michal,

On 07/12/2023 10:14, Michal Orzel wrote:
> As a result of not checking the return code of device_tree_for_each_node()
> in boot_fdt_info(), any error occured during early FDT parsing does not
> stop Xen from booting. This can result in an unwanted behavior in later
> boot stages. Fix it by checking the return code and panicing on an error.
> 
> Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
> Signed-off-by: Michal Orzel <michal.orzel@amd.com>

With one remark below:

Acked-by: Julien Grall <jgrall@amazon.com>

> ---
> I've lost count how many times I had to fix missing rc check. However, I have
> not yet found any checker for this (-Wunused-result is pretty useless).
> ---
>   xen/arch/arm/bootfdt.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index b1f03eb2fcdd..f496a8cf9494 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -541,7 +541,9 @@ size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
>   
>       add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
>   
> -    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
> +    ret = device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
> +    if ( ret )
> +        panic("Early FDT parsing failed (%d)\n", ret);

AFAIU, the parsing is done before the console is setup. This means a 
user would not be able to get some logs unless they are re-compiling Xen 
with earlyprintk.

I understand this is not a new issue, but I am getting concerned of the 
amount of check we add before the console is up and running.

What is the impact if we don't check the return here? Is it missing regions?

I wonder whether we can either enable the console earlier, or make 
earlyprintk more dynamic (similar to what Linux is doing with 
earlyprintk=...).

Cheers,
Michal Orzel Dec. 7, 2023, 12:39 p.m. UTC | #2
Hi Julien,

On 07/12/2023 13:20, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 07/12/2023 10:14, Michal Orzel wrote:
>> As a result of not checking the return code of device_tree_for_each_node()
>> in boot_fdt_info(), any error occured during early FDT parsing does not
>> stop Xen from booting. This can result in an unwanted behavior in later
>> boot stages. Fix it by checking the return code and panicing on an error.
>>
>> Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> 
> With one remark below:
> 
> Acked-by: Julien Grall <jgrall@amazon.com>
> 
>> ---
>> I've lost count how many times I had to fix missing rc check. However, I have
>> not yet found any checker for this (-Wunused-result is pretty useless).
>> ---
>>   xen/arch/arm/bootfdt.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>> index b1f03eb2fcdd..f496a8cf9494 100644
>> --- a/xen/arch/arm/bootfdt.c
>> +++ b/xen/arch/arm/bootfdt.c
>> @@ -541,7 +541,9 @@ size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
>>
>>       add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
>>
>> -    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>> +    ret = device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>> +    if ( ret )
>> +        panic("Early FDT parsing failed (%d)\n", ret);
> 
> AFAIU, the parsing is done before the console is setup. This means a
> user would not be able to get some logs unless they are re-compiling Xen
> with earlyprintk.
> 
> I understand this is not a new issue, but I am getting concerned of the
> amount of check we add before the console is up and running.
> 
> What is the impact if we don't check the return here? Is it missing regions?
There are many things that can go wrong.
Quite recently, I faced an issue where I specified 2 dom0less domUs in configuration
and due to the error while parsing the last node of domU1, domU2 node was skipped and
Xen booted only domU1 without giving any errors.

Issues with shared memory led to either Xen continue to run with improper configuration,
silencing overlap conditions, errors at later boot stages that were impossible to deduct
from the logs.

All in all, early boot code parsing assume the error to result in a failure and the parsing
for domain creation makes this assumption as well (checks are more relaxed to avoid duplication).

For now, we can't do anything better than panicing early if we want to avoid unwanted behavior.

> 
> I wonder whether we can either enable the console earlier, or make
> earlyprintk more dynamic (similar to what Linux is doing with
> earlyprintk=...).
The most imporatant part is early fdt parsing. The main console init cannot be moved that early.
We need to add earlycon support just like in Linux. User could then select either earlyprintk or earlycon.
(earlyprintk would also print from assembly + from everything before boot_fdt_info). I think adding support
for it is doable. But as always, no one has time for that.

~Michal
Julien Grall Dec. 7, 2023, 12:54 p.m. UTC | #3
Hi Michal,

On 07/12/2023 12:39, Michal Orzel wrote:
> On 07/12/2023 13:20, Julien Grall wrote:
>>
>>
>> Hi Michal,
>>
>> On 07/12/2023 10:14, Michal Orzel wrote:
>>> As a result of not checking the return code of device_tree_for_each_node()
>>> in boot_fdt_info(), any error occured during early FDT parsing does not
>>> stop Xen from booting. This can result in an unwanted behavior in later
>>> boot stages. Fix it by checking the return code and panicing on an error.
>>>
>>> Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
>>
>> With one remark below:
>>
>> Acked-by: Julien Grall <jgrall@amazon.com>
>>
>>> ---
>>> I've lost count how many times I had to fix missing rc check. However, I have
>>> not yet found any checker for this (-Wunused-result is pretty useless).
>>> ---
>>>    xen/arch/arm/bootfdt.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>>> index b1f03eb2fcdd..f496a8cf9494 100644
>>> --- a/xen/arch/arm/bootfdt.c
>>> +++ b/xen/arch/arm/bootfdt.c
>>> @@ -541,7 +541,9 @@ size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
>>>
>>>        add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
>>>
>>> -    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>>> +    ret = device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>>> +    if ( ret )
>>> +        panic("Early FDT parsing failed (%d)\n", ret);
>>
>> AFAIU, the parsing is done before the console is setup. This means a
>> user would not be able to get some logs unless they are re-compiling Xen
>> with earlyprintk.
>>
>> I understand this is not a new issue, but I am getting concerned of the
>> amount of check we add before the console is up and running.
>>
>> What is the impact if we don't check the return here? Is it missing regions?
> There are many things that can go wrong.
> Quite recently, I faced an issue where I specified 2 dom0less domUs in configuration
> and due to the error while parsing the last node of domU1, domU2 node was skipped and
> Xen booted only domU1 without giving any errors.
> 
> Issues with shared memory led to either Xen continue to run with improper configuration,
> silencing overlap conditions, errors at later boot stages that were impossible to deduct
> from the logs.
> 
> All in all, early boot code parsing assume the error to result in a failure and the parsing
> for domain creation makes this assumption as well (checks are more relaxed to avoid duplication).
> 
> For now, we can't do anything better than panicing early if we want to avoid unwanted behavior.
> 
>>
>> I wonder whether we can either enable the console earlier, or make
>> earlyprintk more dynamic (similar to what Linux is doing with
>> earlyprintk=...).
> The most imporatant part is early fdt parsing. The main console init cannot be moved that early.

I think we need to understand a bit more why because on x86 
consoel_init_preirq() is called very early. So we ought to be able to do 
the same on Arm.

I will add this in my list after I get the earlyprintk working while 
switching the MMU.

Cheers,
Julien Grall Dec. 7, 2023, 12:58 p.m. UTC | #4
On 07/12/2023 12:20, Julien Grall wrote:
> Hi Michal,
> 
> On 07/12/2023 10:14, Michal Orzel wrote:
>> As a result of not checking the return code of 
>> device_tree_for_each_node()
>> in boot_fdt_info(), any error occured during early FDT parsing does not
>> stop Xen from booting. This can result in an unwanted behavior in later
>> boot stages. Fix it by checking the return code and panicing on an error.
>>
>> Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> 
> With one remark below:
> 
> Acked-by: Julien Grall <jgrall@amazon.com>

It is now committed.

Cheers,
Michal Orzel Dec. 7, 2023, 2:34 p.m. UTC | #5
On 07/12/2023 13:54, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 07/12/2023 12:39, Michal Orzel wrote:
>> On 07/12/2023 13:20, Julien Grall wrote:
>>>
>>>
>>> Hi Michal,
>>>
>>> On 07/12/2023 10:14, Michal Orzel wrote:
>>>> As a result of not checking the return code of device_tree_for_each_node()
>>>> in boot_fdt_info(), any error occured during early FDT parsing does not
>>>> stop Xen from booting. This can result in an unwanted behavior in later
>>>> boot stages. Fix it by checking the return code and panicing on an error.
>>>>
>>>> Fixes: 9cf4a9a46717 ("device tree: add device_tree_for_each_node()")
>>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
>>>
>>> With one remark below:
>>>
>>> Acked-by: Julien Grall <jgrall@amazon.com>
>>>
>>>> ---
>>>> I've lost count how many times I had to fix missing rc check. However, I have
>>>> not yet found any checker for this (-Wunused-result is pretty useless).
>>>> ---
>>>>    xen/arch/arm/bootfdt.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>>>> index b1f03eb2fcdd..f496a8cf9494 100644
>>>> --- a/xen/arch/arm/bootfdt.c
>>>> +++ b/xen/arch/arm/bootfdt.c
>>>> @@ -541,7 +541,9 @@ size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
>>>>
>>>>        add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
>>>>
>>>> -    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>>>> +    ret = device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
>>>> +    if ( ret )
>>>> +        panic("Early FDT parsing failed (%d)\n", ret);
>>>
>>> AFAIU, the parsing is done before the console is setup. This means a
>>> user would not be able to get some logs unless they are re-compiling Xen
>>> with earlyprintk.
>>>
>>> I understand this is not a new issue, but I am getting concerned of the
>>> amount of check we add before the console is up and running.
>>>
>>> What is the impact if we don't check the return here? Is it missing regions?
>> There are many things that can go wrong.
>> Quite recently, I faced an issue where I specified 2 dom0less domUs in configuration
>> and due to the error while parsing the last node of domU1, domU2 node was skipped and
>> Xen booted only domU1 without giving any errors.
>>
>> Issues with shared memory led to either Xen continue to run with improper configuration,
>> silencing overlap conditions, errors at later boot stages that were impossible to deduct
>> from the logs.
>>
>> All in all, early boot code parsing assume the error to result in a failure and the parsing
>> for domain creation makes this assumption as well (checks are more relaxed to avoid duplication).
>>
>> For now, we can't do anything better than panicing early if we want to avoid unwanted behavior.
>>
>>>
>>> I wonder whether we can either enable the console earlier, or make
>>> earlyprintk more dynamic (similar to what Linux is doing with
>>> earlyprintk=...).
>> The most imporatant part is early fdt parsing. The main console init cannot be moved that early.
> 
> I think we need to understand a bit more why because on x86
> consoel_init_preirq() is called very early. So we ought to be able to do
> the same on Arm.
But this won't cover early fdt parsing. I don't think we can get away without adding earlycon
and earlycon helpers in all the serial drivers. arm_uart_init depends on unflattening fdt which
depends on relocating fdt which is done after parsing FDT. We want to be able to print messages
after early_fdt_map. Once FDT is mapped, the only thing we need is to retrieve the bootargs to parse
for earlycon= , add the region to fixmap and register the handlers.

~Michal
diff mbox series

Patch

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index b1f03eb2fcdd..f496a8cf9494 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -541,7 +541,9 @@  size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
 
     add_boot_module(BOOTMOD_FDT, paddr, fdt_totalsize(fdt), false);
 
-    device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
+    ret = device_tree_for_each_node((void *)fdt, 0, early_scan_node, NULL);
+    if ( ret )
+        panic("Early FDT parsing failed (%d)\n", ret);
 
     /*
      * On Arm64 setup_directmap_mappings() expects to be called with the lowest