diff mbox series

xen/arm: Skip memory nodes if not enabled

Message ID 20230926053333.180811-1-leo.yan@linaro.org (mailing list archive)
State Superseded
Headers show
Series xen/arm: Skip memory nodes if not enabled | expand

Commit Message

Leo Yan Sept. 26, 2023, 5:33 a.m. UTC
During the Linux kernel booting, an error is reported by the Xen
hypervisor:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

The kernel attempts to use an invalid memory frame number, which can be
converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.

The invalid memory frame falls into a reserved memory node, which is
defined in the device tree as below:

  reserved-memory {
          #address-cells = <0x02>;
          #size-cells = <0x02>;
          ranges;

	  ...

          ethosn_reserved {
                  compatible = "shared-dma-pool";
                  reg = <0x01 0xa0000000 0x00 0x20000000>;
                  status = "disabled";
                  no-map;
          };

	  ...
  };

Xen excludes all reserved memory regions from the frame management
through the dt_unreserved_regions() function. On the other hand, the
reserved memory nodes are passed to the Linux kernel. However, the Linux
kernel ignores the 'ethosn_reserved' node since its status is
"disabled". This leads to the Linux kernel to allocate pages from the
reserved memory range.

As a result, when the kernel passes the data structure residing in the
frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
it misses to manage the frame and reports the error.

Essentially, this issue is caused by the Xen hypervisor which misses to
handle the status for the memory nodes (for both the normal memory nodes
and the reserved memory nodes) and transfers them to the Linux kernel.

This patch introduces a function memory_node_is_available(). If it
detects a memory node is not enabled, the hypervisor will not add the
memory region into the memory lists. In the end, this avoids to generate
the memory nodes from the memory lists sent to the kernel and the kernel
cannot use the disabled memory nodes any longer.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Michal Orzel Sept. 26, 2023, 8:36 a.m. UTC | #1
Hello,

On 26/09/2023 07:33, Leo Yan wrote:
> 
> 
> During the Linux kernel booting, an error is reported by the Xen
> hypervisor:
> 
>   (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
> 
> The kernel attempts to use an invalid memory frame number, which can be
> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
> 
> The invalid memory frame falls into a reserved memory node, which is
> defined in the device tree as below:
> 
>   reserved-memory {
>           #address-cells = <0x02>;
>           #size-cells = <0x02>;
>           ranges;
> 
>           ...
> 
>           ethosn_reserved {
>                   compatible = "shared-dma-pool";
>                   reg = <0x01 0xa0000000 0x00 0x20000000>;
>                   status = "disabled";
>                   no-map;
>           };
> 
>           ...
>   };
> 
> Xen excludes all reserved memory regions from the frame management
> through the dt_unreserved_regions() function. On the other hand, the
> reserved memory nodes are passed to the Linux kernel. However, the Linux
> kernel ignores the 'ethosn_reserved' node since its status is
> "disabled". This leads to the Linux kernel to allocate pages from the
> reserved memory range.
> 
> As a result, when the kernel passes the data structure residing in the
> frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
> it misses to manage the frame and reports the error.
> 
> Essentially, this issue is caused by the Xen hypervisor which misses to
> handle the status for the memory nodes (for both the normal memory nodes
> and the reserved memory nodes) and transfers them to the Linux kernel.
> 
> This patch introduces a function memory_node_is_available(). If it
> detects a memory node is not enabled, the hypervisor will not add the
> memory region into the memory lists. In the end, this avoids to generate
> the memory nodes from the memory lists sent to the kernel and the kernel
> cannot use the disabled memory nodes any longer.

Interesting. So FWICS, we have 2 issues that have a common ground:
1) If the reserved memory node has a status "disabled", it implies that this region
is no longer reserved and can be used which is not handled today by Xen and leads
to the above mentioned problem.

2) If the memory node has a status "disabled" it implies that it should not be used
which is not the case in current Xen. This means that at the moment, Xen would try
to use such a memory region which is incorrect.

I think the commit msg should be more generic and focus on these two issues.
Summary:
Xen does not handle the status property of memory nodes and ends up using them.
Xen does not handle the status property of reserved memory nodes. If status is disabled
it means the reserved region shall no longer be treated as reserved. Xen passes the reserved
memory nodes untouched to dom0 fdt and creates a memory node to cover reserved regions.
Disabled reserved memory nodes are ignored by the guest which leaves us with the exposed
memory nodes. Guest can access such a region but it is excluded from the page management in Xen
which results in Xen being unable to obtain the corresponding MFN.

> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index 2673ad17a1..b46dde06a9 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, int node,
>      return 0;
>  }
> 
> +static bool __init memory_node_is_available(const void *fdt, unsigned long node)
This function is not strictly related to memory node so it would be better to call it e.g. device_tree_node_is_available.
This way it can be used in the future for other nodes if needed. Also, I would move it somewhere near the top of the file
next to other helpers.
Also, node should be of type 'int'


> +{
> +    const char *status = fdt_getprop(fdt, node, "status", NULL);
> +
> +    if (!status)
white spaces between brackets and condition
if ( !status )

> +        return true;
> +
> +    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
white spaces between brackets and condition
if ( !strcmp(status, "ok") || !strcmp(status, "okay") )

> +        return true;
> +
> +    return false;
> +}
> +
>  static int __init process_memory_node(const void *fdt, int node,
>                                        const char *name, int depth,
>                                        u32 address_cells, u32 size_cells,
>                                        void *data)
>  {
> +    if (!memory_node_is_available(fdt, node))
> +        return 0;
I would move this check to device_tree_get_meminfo()
> +
>      return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
>                                     data, MEMBANK_DEFAULT);
>  }
> --
> 2.39.2
> 
> 

Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in boot_fdt_info().
Otherwise the panic from Xen when there is no memory bank:
The frametable cannot cover the physical region ...
is not really meaningful for normal users.

This is just my suggestion (@Julien ?)

~Michal
Luca Fancellu Sept. 26, 2023, 9:52 a.m. UTC | #2
> On 26 Sep 2023, at 06:33, Leo Yan <leo.yan@linaro.org> wrote:
> 
> During the Linux kernel booting, an error is reported by the Xen
> hypervisor:
> 
>  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
> 
> The kernel attempts to use an invalid memory frame number, which can be
> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
> 
> The invalid memory frame falls into a reserved memory node, which is
> defined in the device tree as below:
> 
>  reserved-memory {
>          #address-cells = <0x02>;
>          #size-cells = <0x02>;
>          ranges;
> 
>  ...
> 
>          ethosn_reserved {
>                  compatible = "shared-dma-pool";
>                  reg = <0x01 0xa0000000 0x00 0x20000000>;
>                  status = "disabled";
>                  no-map;
>          };
> 
>  ...
>  };
> 
> Xen excludes all reserved memory regions from the frame management
> through the dt_unreserved_regions() function. On the other hand, the
> reserved memory nodes are passed to the Linux kernel. However, the Linux
> kernel ignores the 'ethosn_reserved' node since its status is
> "disabled". This leads to the Linux kernel to allocate pages from the
> reserved memory range.

I might be wrong, but reading the specifications seems that “status” is not a property
of the child nodes of /reserved-memory, so I’m not sure Xen should do something about it.

If it is the case, it’s possible that the device tree is wrong in this case.
Michal Orzel Sept. 26, 2023, 10:04 a.m. UTC | #3
Hi Luca,

On 26/09/2023 11:52, Luca Fancellu wrote:
> 
> 
>> On 26 Sep 2023, at 06:33, Leo Yan <leo.yan@linaro.org> wrote:
>>
>> During the Linux kernel booting, an error is reported by the Xen
>> hypervisor:
>>
>>  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>
>> The kernel attempts to use an invalid memory frame number, which can be
>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>
>> The invalid memory frame falls into a reserved memory node, which is
>> defined in the device tree as below:
>>
>>  reserved-memory {
>>          #address-cells = <0x02>;
>>          #size-cells = <0x02>;
>>          ranges;
>>
>>  ...
>>
>>          ethosn_reserved {
>>                  compatible = "shared-dma-pool";
>>                  reg = <0x01 0xa0000000 0x00 0x20000000>;
>>                  status = "disabled";
>>                  no-map;
>>          };
>>
>>  ...
>>  };
>>
>> Xen excludes all reserved memory regions from the frame management
>> through the dt_unreserved_regions() function. On the other hand, the
>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>> kernel ignores the 'ethosn_reserved' node since its status is
>> "disabled". This leads to the Linux kernel to allocate pages from the
>> reserved memory range.
> 
> I might be wrong, but reading the specifications seems that “status” is not a property
> of the child nodes of /reserved-memory, so I’m not sure Xen should do something about it.
Please take a look at dt documentation (v0.4) for /memory and /reserved-memory.
Under the tables listing possible properties, there is a statement:
Note: All other standard properties (Section 2.3) are allowed but are optional.

"status" is part of standard properties so it is perfectly fine for /memory or /reserved-memory
nodes to have it defined.

~Michal
Luca Fancellu Sept. 26, 2023, 10:10 a.m. UTC | #4
> On 26 Sep 2023, at 11:04, Michal Orzel <michal.orzel@amd.com> wrote:
> 
> Hi Luca,
> 
> On 26/09/2023 11:52, Luca Fancellu wrote:
>> 
>> 
>>> On 26 Sep 2023, at 06:33, Leo Yan <leo.yan@linaro.org> wrote:
>>> 
>>> During the Linux kernel booting, an error is reported by the Xen
>>> hypervisor:
>>> 
>>> (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>> 
>>> The kernel attempts to use an invalid memory frame number, which can be
>>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>> 
>>> The invalid memory frame falls into a reserved memory node, which is
>>> defined in the device tree as below:
>>> 
>>> reserved-memory {
>>>         #address-cells = <0x02>;
>>>         #size-cells = <0x02>;
>>>         ranges;
>>> 
>>> ...
>>> 
>>>         ethosn_reserved {
>>>                 compatible = "shared-dma-pool";
>>>                 reg = <0x01 0xa0000000 0x00 0x20000000>;
>>>                 status = "disabled";
>>>                 no-map;
>>>         };
>>> 
>>> ...
>>> };
>>> 
>>> Xen excludes all reserved memory regions from the frame management
>>> through the dt_unreserved_regions() function. On the other hand, the
>>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>>> kernel ignores the 'ethosn_reserved' node since its status is
>>> "disabled". This leads to the Linux kernel to allocate pages from the
>>> reserved memory range.
>> 
>> I might be wrong, but reading the specifications seems that “status” is not a property
>> of the child nodes of /reserved-memory, so I’m not sure Xen should do something about it.
> Please take a look at dt documentation (v0.4) for /memory and /reserved-memory.
> Under the tables listing possible properties, there is a statement:
> Note: All other standard properties (Section 2.3) are allowed but are optional.

Thanks for pointing that out, I missed that bit.

> 
> "status" is part of standard properties so it is perfectly fine for /memory or /reserved-memory
> nodes to have it defined.
> 
> ~Michal
Leo Yan Sept. 26, 2023, 11:13 a.m. UTC | #5
Hi Michal,

On Tue, Sep 26, 2023 at 10:36:04AM +0200, Michal Orzel wrote:

[...]

> > Essentially, this issue is caused by the Xen hypervisor which misses to
> > handle the status for the memory nodes (for both the normal memory nodes
> > and the reserved memory nodes) and transfers them to the Linux kernel.
> > 
> > This patch introduces a function memory_node_is_available(). If it
> > detects a memory node is not enabled, the hypervisor will not add the
> > memory region into the memory lists. In the end, this avoids to generate
> > the memory nodes from the memory lists sent to the kernel and the kernel
> > cannot use the disabled memory nodes any longer.
> 
> Interesting. So FWICS, we have 2 issues that have a common ground:
>
> 1) If the reserved memory node has a status "disabled", it implies that this region
> is no longer reserved and can be used which is not handled today by Xen and leads
> to the above mentioned problem.

Correct.

> 2) If the memory node has a status "disabled" it implies that it should not be used
> which is not the case in current Xen. This means that at the moment, Xen would try
> to use such a memory region which is incorrect.

Exactly.

> I think the commit msg should be more generic and focus on these two issues.
> Summary:
> Xen does not handle the status property of memory nodes and ends up using them.
> Xen does not handle the status property of reserved memory nodes. If status is disabled
> it means the reserved region shall no longer be treated as reserved. Xen passes the reserved
> memory nodes untouched to dom0 fdt and creates a memory node to cover reserved regions.
> Disabled reserved memory nodes are ignored by the guest which leaves us with the exposed
> memory nodes. Guest can access such a region but it is excluded from the page management in Xen
> which results in Xen being unable to obtain the corresponding MFN.

Yes, thanks a lot for good summary!

In theory, a good practice should use two separate patches to fix two
issues respectively. Given we can use simple change to fix these two
issues in one patch, I will amend the patch's commit log with your
summary for better recording these issues.

> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 2673ad17a1..b46dde06a9 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, int node,
> >      return 0;
> >  }
> > 
> > +static bool __init memory_node_is_available(const void *fdt, unsigned long node)
>
> This function is not strictly related to memory node so it would be better to call it e.g. device_tree_node_is_available.
> This way it can be used in the future for other nodes if needed. Also, I would move it somewhere near the top of the file
> next to other helpers.

Will do.

> Also, node should be of type 'int'

Will fix.

> > +{
> > +    const char *status = fdt_getprop(fdt, node, "status", NULL);
> > +
> > +    if (!status)
> white spaces between brackets and condition
> if ( !status )

Will fix.

> > +        return true;
> > +
> > +    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
> white spaces between brackets and condition
> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )

Will fix.

> > +        return true;
> > +
> > +    return false;
> > +}
> > +
> >  static int __init process_memory_node(const void *fdt, int node,
> >                                        const char *name, int depth,
> >                                        u32 address_cells, u32 size_cells,
> >                                        void *data)
> >  {
> > +    if (!memory_node_is_available(fdt, node))
> > +        return 0;
>
> I would move this check to device_tree_get_meminfo()

Okay, I will do it.

> > +
> >      return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
> >                                     data, MEMBANK_DEFAULT);
> >  }
> > --
> > 2.39.2
> > 
> > 
> 
> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in boot_fdt_info().
> Otherwise the panic from Xen when there is no memory bank:
> The frametable cannot cover the physical region ...
> is not really meaningful for normal users.

I'd like to use a separate patch to validate the memory banks.
> 
> This is just my suggestion (@Julien ?)

Thank you a lot for review.

Leo
Leo Yan Sept. 26, 2023, 11:23 a.m. UTC | #6
Hi Luca,

On Tue, Sep 26, 2023 at 10:10:57AM +0000, Luca Fancellu wrote:

[...]

> >> I might be wrong, but reading the specifications seems that “status” is not a property
> >> of the child nodes of /reserved-memory, so I’m not sure Xen should do something about it.
> >
> > Please take a look at dt documentation (v0.4) for /memory and /reserved-memory.
> > Under the tables listing possible properties, there is a statement:
> > Note: All other standard properties (Section 2.3) are allowed but are optional.
> 
> Thanks for pointing that out, I missed that bit.

Though ePAPR [1] doesn't explicitly say "status" property can be
included by memory nodes and reserved memory nodes, "status" property
is a standard property.

The Linux kernel checks "status" property for memory nodes and reserved
memory nodes with the of_fdt_device_is_available() function. Actually,
I just reuse this function and rename it in current patch :)

Thanks,
Leo

[1] https://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf
Julien Grall Sept. 27, 2023, 10:48 a.m. UTC | #7
Hi Leo,

Adding some comments on top of what already said.

On 26/09/2023 06:33, Leo Yan wrote:
> +static bool __init memory_node_is_available(const void *fdt, unsigned long node)
> +{
> +    const char *status = fdt_getprop(fdt, node, "status", NULL);
> +
> +    if (!status)
> +        return true;
> +

We have a similar implement based on the unflatten DT (see 
dt_device_is_available()). While trying to consolidate them is probably 
not worth it, I think the behavior should match.

In this case, you want to check the len is greater than 0 before doing 
the comparison.

Cheers,
Julien Grall Sept. 27, 2023, 11:01 a.m. UTC | #8
Hi Michal,

On 26/09/2023 09:36, Michal Orzel wrote:
> On 26/09/2023 07:33, Leo Yan wrote:
>>
>>
>> During the Linux kernel booting, an error is reported by the Xen
>> hypervisor:
>>
>>    (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>
>> The kernel attempts to use an invalid memory frame number, which can be
>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>
>> The invalid memory frame falls into a reserved memory node, which is
>> defined in the device tree as below:
>>
>>    reserved-memory {
>>            #address-cells = <0x02>;
>>            #size-cells = <0x02>;
>>            ranges;
>>
>>            ...
>>
>>            ethosn_reserved {
>>                    compatible = "shared-dma-pool";
>>                    reg = <0x01 0xa0000000 0x00 0x20000000>;
>>                    status = "disabled";
>>                    no-map;
>>            };
>>
>>            ...
>>    };
>>
>> Xen excludes all reserved memory regions from the frame management
>> through the dt_unreserved_regions() function. On the other hand, the
>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>> kernel ignores the 'ethosn_reserved' node since its status is
>> "disabled". This leads to the Linux kernel to allocate pages from the
>> reserved memory range.
>>
>> As a result, when the kernel passes the data structure residing in the
>> frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
>> it misses to manage the frame and reports the error.
>>
>> Essentially, this issue is caused by the Xen hypervisor which misses to
>> handle the status for the memory nodes (for both the normal memory nodes
>> and the reserved memory nodes) and transfers them to the Linux kernel.
>>
>> This patch introduces a function memory_node_is_available(). If it
>> detects a memory node is not enabled, the hypervisor will not add the
>> memory region into the memory lists. In the end, this avoids to generate
>> the memory nodes from the memory lists sent to the kernel and the kernel
>> cannot use the disabled memory nodes any longer.
> 
> Interesting. So FWICS, we have 2 issues that have a common ground:
> 1) If the reserved memory node has a status "disabled", it implies that this region
> is no longer reserved and can be used which is not handled today by Xen and leads
> to the above mentioned problem.
> 
> 2) If the memory node has a status "disabled" it implies that it should not be used
> which is not the case in current Xen. This means that at the moment, Xen would try
> to use such a memory region which is incorrect.
> 
> I think the commit msg should be more generic and focus on these two issues.
> Summary:
> Xen does not handle the status property of memory nodes and ends up using them.
> Xen does not handle the status property of reserved memory nodes. If status is disabled
> it means the reserved region shall no longer be treated as reserved. Xen passes the reserved
> memory nodes untouched to dom0 fdt and creates a memory node to cover reserved regions.
> Disabled reserved memory nodes are ignored by the guest which leaves us with the exposed
> memory nodes. Guest can access such a region but it is excluded from the page management in Xen
> which results in Xen being unable to obtain the corresponding MFN.
> 
>>
>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>> ---
>>   xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>> index 2673ad17a1..b46dde06a9 100644
>> --- a/xen/arch/arm/bootfdt.c
>> +++ b/xen/arch/arm/bootfdt.c
>> @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, int node,
>>       return 0;
>>   }
>>
>> +static bool __init memory_node_is_available(const void *fdt, unsigned long node)
> This function is not strictly related to memory node so it would be better to call it e.g. device_tree_node_is_available.

+1.

>> +{
>> +    const char *status = fdt_getprop(fdt, node, "status", NULL);
>> +
>> +    if (!status)
> white spaces between brackets and condition
> if ( !status )
> 
>> +        return true;
>> +
>> +    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
> white spaces between brackets and condition
> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
> 
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>>   static int __init process_memory_node(const void *fdt, int node,
>>                                         const char *name, int depth,
>>                                         u32 address_cells, u32 size_cells,
>>                                         void *data)
>>   {
>> +    if (!memory_node_is_available(fdt, node))
>> +        return 0;
> I would move this check to device_tree_get_meminfo()

I am ok with that. But the commit message would need to gain a paragraph 
explaining that we will now support "status" for static memory/heap.

>> +
>>       return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
>>                                      data, MEMBANK_DEFAULT);
>>   }
>> --
>> 2.39.2
>>
>>
> 
> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in boot_fdt_info().
> Otherwise the panic from Xen when there is no memory bank:
> The frametable cannot cover the physical region ...
> is not really meaningful for normal users.
> 
> This is just my suggestion (@Julien ?)

I think a check for the number of banks makes sense. But I would prefer 
if the check also happens in production. So, something like:

if ( !bootinfo.mem.nr_banks )
   panic(...);

We already have one in the setup_mm() for arm32. So we need another one 
for the arm64 version. The other solution is to consolidate it in one 
place you suggested.

I have a slightly preference for having it in setup_mm() even if this is 
duplicated.

Cheers,
Michal Orzel Sept. 27, 2023, 12:49 p.m. UTC | #9
Hi Julien,

On 27/09/2023 13:01, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 26/09/2023 09:36, Michal Orzel wrote:
>> On 26/09/2023 07:33, Leo Yan wrote:
>>>
>>>
>>> During the Linux kernel booting, an error is reported by the Xen
>>> hypervisor:
>>>
>>>    (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>>
>>> The kernel attempts to use an invalid memory frame number, which can be
>>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>>
>>> The invalid memory frame falls into a reserved memory node, which is
>>> defined in the device tree as below:
>>>
>>>    reserved-memory {
>>>            #address-cells = <0x02>;
>>>            #size-cells = <0x02>;
>>>            ranges;
>>>
>>>            ...
>>>
>>>            ethosn_reserved {
>>>                    compatible = "shared-dma-pool";
>>>                    reg = <0x01 0xa0000000 0x00 0x20000000>;
>>>                    status = "disabled";
>>>                    no-map;
>>>            };
>>>
>>>            ...
>>>    };
>>>
>>> Xen excludes all reserved memory regions from the frame management
>>> through the dt_unreserved_regions() function. On the other hand, the
>>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>>> kernel ignores the 'ethosn_reserved' node since its status is
>>> "disabled". This leads to the Linux kernel to allocate pages from the
>>> reserved memory range.
>>>
>>> As a result, when the kernel passes the data structure residing in the
>>> frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
>>> it misses to manage the frame and reports the error.
>>>
>>> Essentially, this issue is caused by the Xen hypervisor which misses to
>>> handle the status for the memory nodes (for both the normal memory nodes
>>> and the reserved memory nodes) and transfers them to the Linux kernel.
>>>
>>> This patch introduces a function memory_node_is_available(). If it
>>> detects a memory node is not enabled, the hypervisor will not add the
>>> memory region into the memory lists. In the end, this avoids to generate
>>> the memory nodes from the memory lists sent to the kernel and the kernel
>>> cannot use the disabled memory nodes any longer.
>>
>> Interesting. So FWICS, we have 2 issues that have a common ground:
>> 1) If the reserved memory node has a status "disabled", it implies that this region
>> is no longer reserved and can be used which is not handled today by Xen and leads
>> to the above mentioned problem.
>>
>> 2) If the memory node has a status "disabled" it implies that it should not be used
>> which is not the case in current Xen. This means that at the moment, Xen would try
>> to use such a memory region which is incorrect.
>>
>> I think the commit msg should be more generic and focus on these two issues.
>> Summary:
>> Xen does not handle the status property of memory nodes and ends up using them.
>> Xen does not handle the status property of reserved memory nodes. If status is disabled
>> it means the reserved region shall no longer be treated as reserved. Xen passes the reserved
>> memory nodes untouched to dom0 fdt and creates a memory node to cover reserved regions.
>> Disabled reserved memory nodes are ignored by the guest which leaves us with the exposed
>> memory nodes. Guest can access such a region but it is excluded from the page management in Xen
>> which results in Xen being unable to obtain the corresponding MFN.
>>
>>>
>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>>> ---
>>>   xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
>>>   1 file changed, 16 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>>> index 2673ad17a1..b46dde06a9 100644
>>> --- a/xen/arch/arm/bootfdt.c
>>> +++ b/xen/arch/arm/bootfdt.c
>>> @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, int node,
>>>       return 0;
>>>   }
>>>
>>> +static bool __init memory_node_is_available(const void *fdt, unsigned long node)
>> This function is not strictly related to memory node so it would be better to call it e.g. device_tree_node_is_available.
> 
> +1.
> 
>>> +{
>>> +    const char *status = fdt_getprop(fdt, node, "status", NULL);
>>> +
>>> +    if (!status)
>> white spaces between brackets and condition
>> if ( !status )
>>
>>> +        return true;
>>> +
>>> +    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
>> white spaces between brackets and condition
>> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
>>
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>>   static int __init process_memory_node(const void *fdt, int node,
>>>                                         const char *name, int depth,
>>>                                         u32 address_cells, u32 size_cells,
>>>                                         void *data)
>>>   {
>>> +    if (!memory_node_is_available(fdt, node))
>>> +        return 0;
>> I would move this check to device_tree_get_meminfo()
> 
> I am ok with that. But the commit message would need to gain a paragraph
> explaining that we will now support "status" for static memory/heap.
> 
>>> +
>>>       return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
>>>                                      data, MEMBANK_DEFAULT);
>>>   }
>>> --
>>> 2.39.2
>>>
>>>
>>
>> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in boot_fdt_info().
>> Otherwise the panic from Xen when there is no memory bank:
>> The frametable cannot cover the physical region ...
>> is not really meaningful for normal users.
>>
>> This is just my suggestion (@Julien ?)
> 
> I think a check for the number of banks makes sense. But I would prefer
> if the check also happens in production. So, something like:
> 
> if ( !bootinfo.mem.nr_banks )
>    panic(...);
> 
> We already have one in the setup_mm() for arm32. So we need another one
> for the arm64 version. The other solution is to consolidate it in one
> place you suggested.
> 
> I have a slightly preference for having it in setup_mm() even if this is
> duplicated.
Either way is fine. The advantage of placing the check in boot_fdt_info() is
that we can have a single check instead of duplicated and we do the check before
the "first" use which happens to be the banks sorting. The advantage of setup_mm()
is that it is the one dealing with memory banks and is called after early_print_info()
so user can see some additional info.

@Leo, will you send a patch for that? Don't feel obliged as it is not strictly related
to your patch in which case I can handle it.

~Michal
Leo Yan Sept. 28, 2023, 2:37 p.m. UTC | #10
Hi Michal, Julien,

On Wed, Sep 27, 2023 at 02:49:23PM +0200, Michal Orzel wrote:

[...]

> Either way is fine. The advantage of placing the check in boot_fdt_info() is
> that we can have a single check instead of duplicated and we do the check before
> the "first" use which happens to be the banks sorting. The advantage of setup_mm()
> is that it is the one dealing with memory banks and is called after early_print_info()
> so user can see some additional info.
> 
> @Leo, will you send a patch for that? Don't feel obliged as it is not strictly related
> to your patch in which case I can handle it.

@Michal, since you have much better sense than me for adding check for
memory banks, I'd like to leave it to you.

I just refined the patch v2 according to your comments and sent out
for review.

Thanks,
Leo
Michal Orzel Sept. 29, 2023, 8:13 a.m. UTC | #11
Hi Leo,

On 28/09/2023 16:37, Leo Yan wrote:
> 
> 
> Hi Michal, Julien,
> 
> On Wed, Sep 27, 2023 at 02:49:23PM +0200, Michal Orzel wrote:
> 
> [...]
> 
>> Either way is fine. The advantage of placing the check in boot_fdt_info() is
>> that we can have a single check instead of duplicated and we do the check before
>> the "first" use which happens to be the banks sorting. The advantage of setup_mm()
>> is that it is the one dealing with memory banks and is called after early_print_info()
>> so user can see some additional info.
>>
>> @Leo, will you send a patch for that? Don't feel obliged as it is not strictly related
>> to your patch in which case I can handle it.
> 
> @Michal, since you have much better sense than me for adding check for
> memory banks, I'd like to leave it to you.
Ok, no problem.

~Michal
diff mbox series

Patch

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 2673ad17a1..b46dde06a9 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -206,11 +206,27 @@  int __init device_tree_for_each_node(const void *fdt, int node,
     return 0;
 }
 
+static bool __init memory_node_is_available(const void *fdt, unsigned long node)
+{
+    const char *status = fdt_getprop(fdt, node, "status", NULL);
+
+    if (!status)
+        return true;
+
+    if (!strcmp(status, "ok") || !strcmp(status, "okay"))
+        return true;
+
+    return false;
+}
+
 static int __init process_memory_node(const void *fdt, int node,
                                       const char *name, int depth,
                                       u32 address_cells, u32 size_cells,
                                       void *data)
 {
+    if (!memory_node_is_available(fdt, node))
+        return 0;
+
     return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
                                    data, MEMBANK_DEFAULT);
 }