diff mbox series

[v4,09/17] xen/arm: introduce a helper to parse device tree NUMA distance map

Message ID 20230425075655.4037980-10-Henry.Wang@arm.com (mailing list archive)
State Superseded
Headers show
Series Device tree based NUMA support for Arm - Part#3 | expand

Commit Message

Henry Wang April 25, 2023, 7:56 a.m. UTC
From: Wei Chen <wei.chen@arm.com>

A NUMA aware device tree will provide a "distance-map" node to
describe distance between any two nodes. This patch introduce a
new helper to parse this distance map.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
---
v3 -> v4:
1. The distance map default value is now NUMA_NO_DISTANCE, update
   the logic accordingly and add in-code comment as a note.
v2 -> v3:
1. No change.
v1 -> v2:
1. Get rid of useless braces.
2. Use new NUMA status helper.
3. Use PRIu32 to replace u in print messages.
4. Fix opposite = __node_distance(to, from).
5. disable dtb numa info table when we find an invalid data
   in dtb.
---
 xen/arch/arm/numa_device_tree.c | 108 ++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

Comments

Jan Beulich April 25, 2023, 8:34 a.m. UTC | #1
On 25.04.2023 09:56, Henry Wang wrote:
> From: Wei Chen <wei.chen@arm.com>
> 
> A NUMA aware device tree will provide a "distance-map" node to
> describe distance between any two nodes. This patch introduce a
> new helper to parse this distance map.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Henry Wang <Henry.Wang@arm.com>

While trying to hunt down the caller(s) of numa_set_distance() in the
context to replying to patch 3, I came across this one:

> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -151,3 +151,111 @@ invalid_data:
>      numa_fw_bad();
>      return -EINVAL;
>  }
> +
> +/* Parse NUMA distance map v1 */
> +static int __init fdt_parse_numa_distance_map_v1(const void *fdt, int node)
> +{
> +    const struct fdt_property *prop;
> +    const __be32 *matrix;
> +    unsigned int i, entry_count;
> +    int len;
> +
> +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> +
> +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> +    if ( !prop )
> +    {
> +        printk(XENLOG_WARNING
> +               "NUMA: No distance-matrix property in distance-map\n");
> +        goto invalid_data;
> +    }
> +
> +    if ( len % sizeof(__be32) != 0 )
> +    {
> +        printk(XENLOG_WARNING
> +               "distance-matrix in node is not a multiple of u32\n");
> +        goto invalid_data;
> +    }
> +
> +    entry_count = len / sizeof(__be32);
> +    if ( entry_count == 0 )
> +    {
> +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> +        goto invalid_data;
> +    }
> +
> +    matrix = (const __be32 *)prop->data;
> +    for ( i = 0; i + 2 < entry_count; i += 3 )
> +    {
> +        unsigned int from, to, distance, opposite;

With these ...

> +        from = dt_next_cell(1, &matrix);
> +        to = dt_next_cell(1, &matrix);
> +        distance = dt_next_cell(1, &matrix);
> +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> +             (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "NUMA: Invalid distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",

... you don't mean PRIu32 here and ...

> +                   from, to, distance);
> +            goto invalid_data;
> +        }
> +
> +        printk(XENLOG_INFO "NUMA: distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",

... here and yet further down anymore. That'll at the same time shorten
all these lines quite a bit.

> +               from, to, distance);
> +
> +        /* Get opposite way distance */
> +        opposite = __node_distance(to, from);
> +        /* The default value in node_distance_map is NUMA_NO_DISTANCE */
> +        if ( opposite == NUMA_NO_DISTANCE )

And the matrix you're reading from can't hold NUMA_NO_DISTANCE entries?
I ask because you don't check this above; you only check against
NUMA_LOCAL_DISTANCE.

> +        {
> +            /* Bi-directions are not set, set both */
> +            numa_set_distance(from, to, distance);
> +            numa_set_distance(to, from, distance);
> +        }
> +        else
> +        {
> +            /*
> +             * Opposite way distance has been set to a different value.
> +             * It may be a firmware device tree bug?
> +             */
> +            if ( opposite != distance )
> +            {
> +                /*
> +                 * In device tree NUMA distance-matrix binding:
> +                 * https://www.kernel.org/doc/Documentation/devicetree/bindings/numa.txt
> +                 * There is a notes mentions:
> +                 * "Each entry represents distance from first node to
> +                 *  second node. The distances are equal in either
> +                 *  direction."
> +                 *
> +                 * That means device tree doesn't permit this case.
> +                 * But in ACPI spec, it cares to specifically permit this
> +                 * case:
> +                 * "Except for the relative distance from a System Locality
> +                 *  to itself, each relative distance is stored twice in the
> +                 *  matrix. This provides the capability to describe the
> +                 *  scenario where the relative distances for the two
> +                 *  directions between System Localities is different."
> +                 *
> +                 * That means a real machine allows such NUMA configuration.
> +                 * So, place a WARNING here to notice system administrators,
> +                 * is it the specail case that they hijack the device tree
> +                 * to support their rare machines?
> +                 */
> +                printk(XENLOG_WARNING
> +                       "Un-matched bi-direction! NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32", NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
> +                       from, to, distance, to, from, opposite);
> +            }
> +
> +            /* Opposite way distance has been set, just set this way */
> +            numa_set_distance(from, to, distance);

It took me a while to understand what the comment is to tell me,
because in this iteration the opposite entry wasn't set. May I
suggest to make more explicit that you refer to an earlier iteration,
e.g. by "... was set before, ..."?

> +        }
> +    }
> +
> +    return 0;
> +
> +invalid_data:

Nit: Style (labels to be indented by [at least] one blank).

Jan
Jan Beulich April 25, 2023, 8:39 a.m. UTC | #2
On 25.04.2023 09:56, Henry Wang wrote:
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -151,3 +151,111 @@ invalid_data:
>      numa_fw_bad();
>      return -EINVAL;
>  }
> +
> +/* Parse NUMA distance map v1 */
> +static int __init fdt_parse_numa_distance_map_v1(const void *fdt, int node)
> +{
> +    const struct fdt_property *prop;
> +    const __be32 *matrix;
> +    unsigned int i, entry_count;
> +    int len;
> +
> +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> +
> +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> +    if ( !prop )
> +    {
> +        printk(XENLOG_WARNING
> +               "NUMA: No distance-matrix property in distance-map\n");
> +        goto invalid_data;
> +    }
> +
> +    if ( len % sizeof(__be32) != 0 )
> +    {
> +        printk(XENLOG_WARNING
> +               "distance-matrix in node is not a multiple of u32\n");
> +        goto invalid_data;
> +    }
> +
> +    entry_count = len / sizeof(__be32);
> +    if ( entry_count == 0 )
> +    {
> +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> +        goto invalid_data;
> +    }
> +
> +    matrix = (const __be32 *)prop->data;
> +    for ( i = 0; i + 2 < entry_count; i += 3 )
> +    {
> +        unsigned int from, to, distance, opposite;
> +
> +        from = dt_next_cell(1, &matrix);
> +        to = dt_next_cell(1, &matrix);
> +        distance = dt_next_cell(1, &matrix);

Upon second thought I checked what dt_next_cell() returns: You're silently
truncating here and then ...

> +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> +             (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "NUMA: Invalid distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
> +                   from, to, distance);
> +            goto invalid_data;
> +        }
> +
> +        printk(XENLOG_INFO "NUMA: distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
> +               from, to, distance);
> +
> +        /* Get opposite way distance */
> +        opposite = __node_distance(to, from);
> +        /* The default value in node_distance_map is NUMA_NO_DISTANCE */
> +        if ( opposite == NUMA_NO_DISTANCE )
> +        {
> +            /* Bi-directions are not set, set both */
> +            numa_set_distance(from, to, distance);
> +            numa_set_distance(to, from, distance);

... here again. Is that really the intention?

Jan
Henry Wang April 26, 2023, 5:33 a.m. UTC | #3
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> > +        unsigned int from, to, distance, opposite;
> 
> With these ...
> 
> > +        from = dt_next_cell(1, &matrix);
> > +        to = dt_next_cell(1, &matrix);
> > +        distance = dt_next_cell(1, &matrix);
> > +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > +             (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > +        {
> > +            printk(XENLOG_WARNING
> > +                   "NUMA: Invalid distance: NODE#%"PRIu32"-
> >NODE#%"PRIu32":%"PRIu32"\n",
> 
> ... you don't mean PRIu32 here and ...
> 
> > +                   from, to, distance);
> > +            goto invalid_data;
> > +        }
> > +
> > +        printk(XENLOG_INFO "NUMA: distance: NODE#%"PRIu32"-
> >NODE#%"PRIu32":%"PRIu32"\n",
> 
> ... here and yet further down anymore. That'll at the same time shorten
> all these lines quite a bit.

I did a little bit archeology to check the discussion back to 2 years ago when
Wei originally sent the v1 to the mailing list. Changing %u to %"PRIu32" was
one of the comment at that time [1] when these variables were defined as
uint32_t. But now with these variables being unsigned int I think now %u is
a more proper way here. I will do the according changes in v5.

> 
> > +               from, to, distance);
> > +
> > +        /* Get opposite way distance */
> > +        opposite = __node_distance(to, from);
> > +        /* The default value in node_distance_map is NUMA_NO_DISTANCE
> */
> > +        if ( opposite == NUMA_NO_DISTANCE )
> 
> And the matrix you're reading from can't hold NUMA_NO_DISTANCE entries?
> I ask because you don't check this above; you only check against
> NUMA_LOCAL_DISTANCE.

My understanding for the purpose of this part of code is to check if the opposite
way distance has already been set, so we need to compare the opposite way
distance with the default value NUMA_NO_DISTANCE here.

Back to your question, I can see your point of the question. However I don't think
NUMA_NO_DISTANCE is a valid value to describe the node distance in the device
tree. This is because I hunted down the previous discussions and found [2] about
we should try to keep consistent between the value used in device tree and ACPI
tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means unreachable.
I think this is also the reason why NUMA_NO_DISTANCE can be used as the default
value of the distance map, otherwise we won't have any value to use.

> 
> > +        {
> > +            /* Bi-directions are not set, set both */
> > +            numa_set_distance(from, to, distance);
> > +            numa_set_distance(to, from, distance);
> > +        }
> > +        else
> > +        {
> > +            /*
> > +             * Opposite way distance has been set to a different value.
> > +             * It may be a firmware device tree bug?
> > +             */
> > +            if ( opposite != distance )
> > +            {
> > +                /*
> > +                 * In device tree NUMA distance-matrix binding:
> > +                 *
> https://www.kernel.org/doc/Documentation/devicetree/bindings/numa.txt
> > +                 * There is a notes mentions:
> > +                 * "Each entry represents distance from first node to
> > +                 *  second node. The distances are equal in either
> > +                 *  direction."
> > +                 *
> > +                 * That means device tree doesn't permit this case.
> > +                 * But in ACPI spec, it cares to specifically permit this
> > +                 * case:
> > +                 * "Except for the relative distance from a System Locality
> > +                 *  to itself, each relative distance is stored twice in the
> > +                 *  matrix. This provides the capability to describe the
> > +                 *  scenario where the relative distances for the two
> > +                 *  directions between System Localities is different."
> > +                 *
> > +                 * That means a real machine allows such NUMA configuration.
> > +                 * So, place a WARNING here to notice system administrators,
> > +                 * is it the specail case that they hijack the device tree
> > +                 * to support their rare machines?
> > +                 */
> > +                printk(XENLOG_WARNING
> > +                       "Un-matched bi-direction! NODE#%"PRIu32"-
> >NODE#%"PRIu32":%"PRIu32", NODE#%"PRIu32"-
> >NODE#%"PRIu32":%"PRIu32"\n",
> > +                       from, to, distance, to, from, opposite);
> > +            }
> > +
> > +            /* Opposite way distance has been set, just set this way */
> > +            numa_set_distance(from, to, distance);
> 
> It took me a while to understand what the comment is to tell me,
> because in this iteration the opposite entry wasn't set. May I
> suggest to make more explicit that you refer to an earlier iteration,
> e.g. by "... was set before, ..."?

Yes, thanks for pointing this out. I will make the comment more explicit
as you suggested.

> 
> > +        }
> > +    }
> > +
> > +    return 0;
> > +
> > +invalid_data:
> 
> Nit: Style (labels to be indented by [at least] one blank).

Sure. Will add a space before.

[1] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg02066.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-01/msg00690.html

Kind regards,
Henry

> 
> Jan
Henry Wang April 26, 2023, 6:29 a.m. UTC | #4
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> > +        distance = dt_next_cell(1, &matrix);
> 
> Upon second thought I checked what dt_next_cell() returns: You're silently
> truncating here and then ...
> 
> > +            /* Bi-directions are not set, set both */
> > +            numa_set_distance(from, to, distance);
> > +            numa_set_distance(to, from, distance);
> 
> ... here again. Is that really the intention?

By hunting down the historical discussions I found that using dt_next_cell() is
what Julien suggested 2 years ago in the RFC series [1]. Given the truncation
here is for node id (from/to) and distance which I am pretty sure will not
exceed 32-bit range, I think the silent truncation is safe.

However I understand your point here, the silent truncation is not ideal, so
I wonder if you have any suggestions to improve, do you think I should change
these variables to u64 or maybe I need to do the explicit type cast or any
better suggestions from you? Thanks!

[1] https://lists.xenproject.org/archives/html/xen-devel/2021-08/msg01175.html

Kind regards,
Henry

> 
> Jan
Jan Beulich April 26, 2023, 6:56 a.m. UTC | #5
On 26.04.2023 07:33, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>>> +        /* Get opposite way distance */
>>> +        opposite = __node_distance(to, from);
>>> +        /* The default value in node_distance_map is NUMA_NO_DISTANCE
>> */
>>> +        if ( opposite == NUMA_NO_DISTANCE )
>>
>> And the matrix you're reading from can't hold NUMA_NO_DISTANCE entries?
>> I ask because you don't check this above; you only check against
>> NUMA_LOCAL_DISTANCE.
> 
> My understanding for the purpose of this part of code is to check if the opposite
> way distance has already been set, so we need to compare the opposite way
> distance with the default value NUMA_NO_DISTANCE here.
> 
> Back to your question, I can see your point of the question. However I don't think
> NUMA_NO_DISTANCE is a valid value to describe the node distance in the device
> tree. This is because I hunted down the previous discussions and found [2] about
> we should try to keep consistent between the value used in device tree and ACPI
> tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means unreachable.
> I think this is also the reason why NUMA_NO_DISTANCE can be used as the default
> value of the distance map, otherwise we won't have any value to use.

The [2] link you provided discusses NUMA_LOCAL_DISTANCE. Looking at
Linux'es Documentation/devicetree/numa.txt, there's no mention of an
upper bound on the distance values. It only says that on the diagonal
entries should be 10 (i.e. matching ACPI, without really saying so).

Jan
Jan Beulich April 26, 2023, 7:07 a.m. UTC | #6
On 26.04.2023 08:29, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>>> +        distance = dt_next_cell(1, &matrix);
>>
>> Upon second thought I checked what dt_next_cell() returns: You're silently
>> truncating here and then ...
>>
>>> +            /* Bi-directions are not set, set both */
>>> +            numa_set_distance(from, to, distance);
>>> +            numa_set_distance(to, from, distance);
>>
>> ... here again. Is that really the intention?
> 
> By hunting down the historical discussions I found that using dt_next_cell() is
> what Julien suggested 2 years ago in the RFC series [1]. Given the truncation
> here is for node id (from/to) and distance which I am pretty sure will not
> exceed 32-bit range, I think the silent truncation is safe.

That discussion is orthogonal; the previously used dt_read_number() is no
different in the regard I'm referring to.

> However I understand your point here, the silent truncation is not ideal, so
> I wonder if you have any suggestions to improve, do you think I should change
> these variables to u64 or maybe I need to do the explicit type cast or any
> better suggestions from you? Thanks!

So one thing I overlooked is that by passing 1 as the first argument, you
only request a 32-bit value. Hence there's no (silent) truncation then on
the dt_next_cell() uses. But the numa_set_distance() calls still truncate
to 8 bits. Adding explicit type casts won't help at all - truncation will
remain as silent as it was before. However, numa_set_distance()'s first
two arguments could easily become "unsigned int", resulting in its node
related bounds checking to take care of all truncation issues. The
"distance" parameter already is unsigned int, and is already being bounds
checked against NUMA_NO_DISTANCE.

Jan

> [1] https://lists.xenproject.org/archives/html/xen-devel/2021-08/msg01175.html
> 
> Kind regards,
> Henry
Henry Wang April 26, 2023, 7:08 a.m. UTC | #7
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> On 26.04.2023 07:33, Henry Wang wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >>
> >>> +        /* Get opposite way distance */
> >>> +        opposite = __node_distance(to, from);
> >>> +        /* The default value in node_distance_map is
> NUMA_NO_DISTANCE
> >> */
> >>> +        if ( opposite == NUMA_NO_DISTANCE )
> >>
> >> And the matrix you're reading from can't hold NUMA_NO_DISTANCE
> entries?
> >> I ask because you don't check this above; you only check against
> >> NUMA_LOCAL_DISTANCE.
> >
> > My understanding for the purpose of this part of code is to check if the
> opposite
> > way distance has already been set, so we need to compare the opposite
> way
> > distance with the default value NUMA_NO_DISTANCE here.
> >
> > Back to your question, I can see your point of the question. However I don't
> think
> > NUMA_NO_DISTANCE is a valid value to describe the node distance in the
> device
> > tree. This is because I hunted down the previous discussions and found [2]
> about
> > we should try to keep consistent between the value used in device tree and
> ACPI
> > tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means
> unreachable.
> > I think this is also the reason why NUMA_NO_DISTANCE can be used as the
> default
> > value of the distance map, otherwise we won't have any value to use.
> 
> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.

I inferred the discussion as "we should try to keep consistent between the value
used in device tree and ACPI tables". Maybe my inference is wrong.

> Looking at
> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
> upper bound on the distance values. It only says that on the diagonal
> entries should be 10 (i.e. matching ACPI, without really saying so).

I agree that the NUMA device tree binding is a little bit vague. So I cannot
say the case that you provided is not valid. I would like to ask Arm maintainers
(putting them into To:) opinion on this as I think I am not the one to decide the
expected behavior on Arm.

Bertrand/Julien/Stefano: Would you please kindly share your opinion on which
value should be used as the default value of the node distance map? Do you
think reusing the "unreachable" distance, i.e. 0xFF, as the default node distance
is acceptable here? Thanks!

Kind regards,
Henry

> 
> Jan
Jan Beulich April 26, 2023, 7:12 a.m. UTC | #8
On 26.04.2023 09:08, Henry Wang wrote:
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
>> tree NUMA distance map
>>
>> On 26.04.2023 07:33, Henry Wang wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>
>>>>> +        /* Get opposite way distance */
>>>>> +        opposite = __node_distance(to, from);
>>>>> +        /* The default value in node_distance_map is
>> NUMA_NO_DISTANCE
>>>> */
>>>>> +        if ( opposite == NUMA_NO_DISTANCE )
>>>>
>>>> And the matrix you're reading from can't hold NUMA_NO_DISTANCE
>> entries?
>>>> I ask because you don't check this above; you only check against
>>>> NUMA_LOCAL_DISTANCE.
>>>
>>> My understanding for the purpose of this part of code is to check if the
>> opposite
>>> way distance has already been set, so we need to compare the opposite
>> way
>>> distance with the default value NUMA_NO_DISTANCE here.
>>>
>>> Back to your question, I can see your point of the question. However I don't
>> think
>>> NUMA_NO_DISTANCE is a valid value to describe the node distance in the
>> device
>>> tree. This is because I hunted down the previous discussions and found [2]
>> about
>>> we should try to keep consistent between the value used in device tree and
>> ACPI
>>> tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means
>> unreachable.
>>> I think this is also the reason why NUMA_NO_DISTANCE can be used as the
>> default
>>> value of the distance map, otherwise we won't have any value to use.
>>
>> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.
> 
> I inferred the discussion as "we should try to keep consistent between the value
> used in device tree and ACPI tables". Maybe my inference is wrong.
> 
>> Looking at
>> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
>> upper bound on the distance values. It only says that on the diagonal
>> entries should be 10 (i.e. matching ACPI, without really saying so).
> 
> I agree that the NUMA device tree binding is a little bit vague. So I cannot
> say the case that you provided is not valid. I would like to ask Arm maintainers
> (putting them into To:) opinion on this as I think I am not the one to decide the
> expected behavior on Arm.
> 
> Bertrand/Julien/Stefano: Would you please kindly share your opinion on which
> value should be used as the default value of the node distance map? Do you
> think reusing the "unreachable" distance, i.e. 0xFF, as the default node distance
> is acceptable here? Thanks!

My suggestion would be to, rather than rejecting values >= 0xff, to saturate
at 0xfe, while keeping 0xff for NUMA_NO_DISTANCE (and overall keeping things
consistent with ACPI).

Jan
Henry Wang April 26, 2023, 7:36 a.m. UTC | #9
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> On 26.04.2023 08:29, Henry Wang wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >>
> >>> +        distance = dt_next_cell(1, &matrix);
> >>
> >> Upon second thought I checked what dt_next_cell() returns: You're silently
> >> truncating here and then ...
> >>
> >>> +            /* Bi-directions are not set, set both */
> >>> +            numa_set_distance(from, to, distance);
> >>> +            numa_set_distance(to, from, distance);
> >>
> >> ... here again. Is that really the intention?
> >
> > By hunting down the historical discussions I found that using dt_next_cell()
> is
> > what Julien suggested 2 years ago in the RFC series [1]. Given the truncation
> > here is for node id (from/to) and distance which I am pretty sure will not
> > exceed 32-bit range, I think the silent truncation is safe.
> 
> That discussion is orthogonal; the previously used dt_read_number() is no
> different in the regard I'm referring to.
> 
> > However I understand your point here, the silent truncation is not ideal, so
> > I wonder if you have any suggestions to improve, do you think I should
> change
> > these variables to u64 or maybe I need to do the explicit type cast or any
> > better suggestions from you? Thanks!
> 
> So one thing I overlooked is that by passing 1 as the first argument, you
> only request a 32-bit value. Hence there's no (silent) truncation then on
> the dt_next_cell() uses. But the numa_set_distance() calls still truncate
> to 8 bits. Adding explicit type casts won't help at all - truncation will
> remain as silent as it was before. However, numa_set_distance()'s first
> two arguments could easily become "unsigned int", resulting in its node
> related bounds checking to take care of all truncation issues. The
> "distance" parameter already is unsigned int, and is already being bounds
> checked against NUMA_NO_DISTANCE.

Great points! Thanks for pointing the 8-bit truncation out. You are correct.
Somehow my impression of numa_set_distance()'s first two arguments are
already "unsigned int" so I missed this part...Sorry.

In that case, I think I will add a check between "from, to" and MAX_NUMNODES
as soon as the values of "from" and "to" are populated by dt_next_cell().
Hopefully this will address your concern.

Kind regards,
Henry

> 
> Jan
> 
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2021-
> 08/msg01175.html
> >
> > Kind regards,
> > Henry
Jan Beulich April 26, 2023, 8:15 a.m. UTC | #10
On 26.04.2023 09:36, Henry Wang wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> On 26.04.2023 08:29, Henry Wang wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>
>>>>> +        distance = dt_next_cell(1, &matrix);
>>>>
>>>> Upon second thought I checked what dt_next_cell() returns: You're silently
>>>> truncating here and then ...
>>>>
>>>>> +            /* Bi-directions are not set, set both */
>>>>> +            numa_set_distance(from, to, distance);
>>>>> +            numa_set_distance(to, from, distance);
>>>>
>>>> ... here again. Is that really the intention?
>>>
>>> By hunting down the historical discussions I found that using dt_next_cell()
>> is
>>> what Julien suggested 2 years ago in the RFC series [1]. Given the truncation
>>> here is for node id (from/to) and distance which I am pretty sure will not
>>> exceed 32-bit range, I think the silent truncation is safe.
>>
>> That discussion is orthogonal; the previously used dt_read_number() is no
>> different in the regard I'm referring to.
>>
>>> However I understand your point here, the silent truncation is not ideal, so
>>> I wonder if you have any suggestions to improve, do you think I should
>> change
>>> these variables to u64 or maybe I need to do the explicit type cast or any
>>> better suggestions from you? Thanks!
>>
>> So one thing I overlooked is that by passing 1 as the first argument, you
>> only request a 32-bit value. Hence there's no (silent) truncation then on
>> the dt_next_cell() uses. But the numa_set_distance() calls still truncate
>> to 8 bits. Adding explicit type casts won't help at all - truncation will
>> remain as silent as it was before. However, numa_set_distance()'s first
>> two arguments could easily become "unsigned int", resulting in its node
>> related bounds checking to take care of all truncation issues. The
>> "distance" parameter already is unsigned int, and is already being bounds
>> checked against NUMA_NO_DISTANCE.
> 
> Great points! Thanks for pointing the 8-bit truncation out. You are correct.
> Somehow my impression of numa_set_distance()'s first two arguments are
> already "unsigned int" so I missed this part...Sorry.
> 
> In that case, I think I will add a check between "from, to" and MAX_NUMNODES
> as soon as the values of "from" and "to" are populated by dt_next_cell().
> Hopefully this will address your concern.

While this would address by concern, I don't see why you want to repeat
the checking that numa_set_distance() already does.

Jan
Henry Wang April 26, 2023, 8:29 a.m. UTC | #11
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> > Great points! Thanks for pointing the 8-bit truncation out. You are correct.
> > Somehow my impression of numa_set_distance()'s first two arguments are
> > already "unsigned int" so I missed this part...Sorry.
> >
> > In that case, I think I will add a check between "from, to" and
> MAX_NUMNODES
> > as soon as the values of "from" and "to" are populated by dt_next_cell().
> > Hopefully this will address your concern.
> 
> While this would address by concern, I don't see why you want to repeat
> the checking that numa_set_distance() already does.

Correct, I think I would better to move the check in numa_set_distance() to
the caller fdt_parse_numa_distance_map_v1() as I believe if the truncation
really happens it is too late to check in numa_set_distance().

Kind regards,
Henry

> 
> Jan
Henry Wang April 26, 2023, 8:56 a.m. UTC | #12
Hi Jan

> -----Original Message-----
> From: Henry Wang
> Subject: RE: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> Hi Jan,
> 
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@suse.com>
> > Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> > tree NUMA distance map
> >
> > > Great points! Thanks for pointing the 8-bit truncation out. You are correct.
> > > Somehow my impression of numa_set_distance()'s first two arguments
> are
> > > already "unsigned int" so I missed this part...Sorry.
> > >
> > > In that case, I think I will add a check between "from, to" and
> > MAX_NUMNODES
> > > as soon as the values of "from" and "to" are populated by dt_next_cell().
> > > Hopefully this will address your concern.
> >
> > While this would address by concern, I don't see why you want to repeat
> > the checking that numa_set_distance() already does.
> 
> Correct, I think I would better to move the check in numa_set_distance() to
> the caller fdt_parse_numa_distance_map_v1() as I believe if the truncation
> really happens it is too late to check in numa_set_distance().

On second thought, maybe even remove the same check in __node_distance()
if we do the check in the caller, as I believe they will suffer the same problem...

Kind regards,
Henry

> 
> Kind regards,
> Henry
> 
> >
> > Jan
Henry Wang May 23, 2023, 6:31 a.m. UTC | #13
Hi Jan,

> -----Original Message-----
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> >>>>> +        /* The default value in node_distance_map is
> >> NUMA_NO_DISTANCE
> >>>> */
> >>>>> +        if ( opposite == NUMA_NO_DISTANCE )
> >>>>
> >>>> And the matrix you're reading from can't hold NUMA_NO_DISTANCE
> >> entries?
> >>>> I ask because you don't check this above; you only check against
> >>>> NUMA_LOCAL_DISTANCE.
> >>>
> >>> My understanding for the purpose of this part of code is to check if the
> >> opposite
> >>> way distance has already been set, so we need to compare the opposite
> >> way
> >>> distance with the default value NUMA_NO_DISTANCE here.
> >>>
> >>> Back to your question, I can see your point of the question. However I
> don't
> >> think
> >>> NUMA_NO_DISTANCE is a valid value to describe the node distance in the
> >> device
> >>> tree. This is because I hunted down the previous discussions and found
> [2]
> >> about
> >>> we should try to keep consistent between the value used in device tree
> and
> >> ACPI
> >>> tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means
> >> unreachable.
> >>> I think this is also the reason why NUMA_NO_DISTANCE can be used as
> the
> >> default
> >>> value of the distance map, otherwise we won't have any value to use.
> >>
> >> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.
> >
> > I inferred the discussion as "we should try to keep consistent between the
> value
> > used in device tree and ACPI tables". Maybe my inference is wrong.
> >
> >> Looking at
> >> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
> >> upper bound on the distance values. It only says that on the diagonal
> >> entries should be 10 (i.e. matching ACPI, without really saying so).
> >
> > I agree that the NUMA device tree binding is a little bit vague. So I cannot
> > say the case that you provided is not valid. I would like to ask Arm
> maintainers
> > (putting them into To:) opinion on this as I think I am not the one to decide
> the
> > expected behavior on Arm.
> >
> > Bertrand/Julien/Stefano: Would you please kindly share your opinion on
> which
> > value should be used as the default value of the node distance map? Do
> you
> > think reusing the "unreachable" distance, i.e. 0xFF, as the default node
> distance
> > is acceptable here? Thanks!
> 
> My suggestion would be to, rather than rejecting values >= 0xff, to saturate
> at 0xfe, while keeping 0xff for NUMA_NO_DISTANCE (and overall keeping
> things
> consistent with ACPI).

Since it has been a while and there were no feedback from Arm maintainers,
I would like to follow your suggestions for v5. However while I was writing the
code for the "saturation", i.e., adding below snippet in numa_set_distance():
```
if ( distance > NUMA_NO_DISTANCE )
        distance = NUMA_MAX_DISTANCE;
```
I noticed an issue which needs your clarification:
Currently, the default value of the distance map is NUMA_NO_DISTANCE,
which indicates the nodes are not reachable. IMHO, if in device tree, we do
saturations like above for ACPI invalid distances (distances >= 0xff), by saturating
the distance to 0xfe, we are making the unreachable nodes to reachable. I am
not sure if this will lead to problems. Do you have any more thoughts? Thanks!

Kind regards,
Henry

> 
> Jan
Jan Beulich May 23, 2023, 6:37 a.m. UTC | #14
On 23.05.2023 08:31, Henry Wang wrote:
>> -----Original Message-----
>> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
>> tree NUMA distance map
>>
>>>>>>> +        /* The default value in node_distance_map is
>>>> NUMA_NO_DISTANCE
>>>>>> */
>>>>>>> +        if ( opposite == NUMA_NO_DISTANCE )
>>>>>>
>>>>>> And the matrix you're reading from can't hold NUMA_NO_DISTANCE
>>>> entries?
>>>>>> I ask because you don't check this above; you only check against
>>>>>> NUMA_LOCAL_DISTANCE.
>>>>>
>>>>> My understanding for the purpose of this part of code is to check if the
>>>> opposite
>>>>> way distance has already been set, so we need to compare the opposite
>>>> way
>>>>> distance with the default value NUMA_NO_DISTANCE here.
>>>>>
>>>>> Back to your question, I can see your point of the question. However I
>> don't
>>>> think
>>>>> NUMA_NO_DISTANCE is a valid value to describe the node distance in the
>>>> device
>>>>> tree. This is because I hunted down the previous discussions and found
>> [2]
>>>> about
>>>>> we should try to keep consistent between the value used in device tree
>> and
>>>> ACPI
>>>>> tables. From the ACPI spec, 0xFF, i.e. NUMA_NO_DISTANCE means
>>>> unreachable.
>>>>> I think this is also the reason why NUMA_NO_DISTANCE can be used as
>> the
>>>> default
>>>>> value of the distance map, otherwise we won't have any value to use.
>>>>
>>>> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.
>>>
>>> I inferred the discussion as "we should try to keep consistent between the
>> value
>>> used in device tree and ACPI tables". Maybe my inference is wrong.
>>>
>>>> Looking at
>>>> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
>>>> upper bound on the distance values. It only says that on the diagonal
>>>> entries should be 10 (i.e. matching ACPI, without really saying so).
>>>
>>> I agree that the NUMA device tree binding is a little bit vague. So I cannot
>>> say the case that you provided is not valid. I would like to ask Arm
>> maintainers
>>> (putting them into To:) opinion on this as I think I am not the one to decide
>> the
>>> expected behavior on Arm.
>>>
>>> Bertrand/Julien/Stefano: Would you please kindly share your opinion on
>> which
>>> value should be used as the default value of the node distance map? Do
>> you
>>> think reusing the "unreachable" distance, i.e. 0xFF, as the default node
>> distance
>>> is acceptable here? Thanks!
>>
>> My suggestion would be to, rather than rejecting values >= 0xff, to saturate
>> at 0xfe, while keeping 0xff for NUMA_NO_DISTANCE (and overall keeping
>> things
>> consistent with ACPI).
> 
> Since it has been a while and there were no feedback from Arm maintainers,
> I would like to follow your suggestions for v5. However while I was writing the
> code for the "saturation", i.e., adding below snippet in numa_set_distance():
> ```
> if ( distance > NUMA_NO_DISTANCE )
>         distance = NUMA_MAX_DISTANCE;
> ```
> I noticed an issue which needs your clarification:
> Currently, the default value of the distance map is NUMA_NO_DISTANCE,
> which indicates the nodes are not reachable. IMHO, if in device tree, we do
> saturations like above for ACPI invalid distances (distances >= 0xff), by saturating
> the distance to 0xfe, we are making the unreachable nodes to reachable. I am
> not sure if this will lead to problems. Do you have any more thoughts? Thanks!

I can only answer this with a question back: How is "unreachable" represented
in DT? Or is "unreachable" simply expressed by the absence of any data?

Jan
Henry Wang May 23, 2023, 7:32 a.m. UTC | #15
Hi Jan,

> -----Original Message-----
> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
> tree NUMA distance map
> 
> >>>> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.
> >>>
> >>> I inferred the discussion as "we should try to keep consistent between the
> >> value
> >>> used in device tree and ACPI tables". Maybe my inference is wrong.
> >>>
> >>>> Looking at
> >>>> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
> >>>> upper bound on the distance values. It only says that on the diagonal
> >>>> entries should be 10 (i.e. matching ACPI, without really saying so).
> >>>
> >>> I agree that the NUMA device tree binding is a little bit vague. So I cannot
> >>> say the case that you provided is not valid. I would like to ask Arm
> >> maintainers
> >>> (putting them into To:) opinion on this as I think I am not the one to
> decide
> >> the
> >>> expected behavior on Arm.
> >>>
> >>> Bertrand/Julien/Stefano: Would you please kindly share your opinion on
> >> which
> >>> value should be used as the default value of the node distance map? Do
> >> you
> >>> think reusing the "unreachable" distance, i.e. 0xFF, as the default node
> >> distance
> >>> is acceptable here? Thanks!
> >>
> >> My suggestion would be to, rather than rejecting values >= 0xff, to saturate
> >> at 0xfe, while keeping 0xff for NUMA_NO_DISTANCE (and overall keeping
> >> things
> >> consistent with ACPI).
> >
> > Since it has been a while and there were no feedback from Arm
> maintainers,
> > I would like to follow your suggestions for v5. However while I was writing
> the
> > code for the "saturation", i.e., adding below snippet in
> numa_set_distance():
> > ```
> > if ( distance > NUMA_NO_DISTANCE )
> >         distance = NUMA_MAX_DISTANCE;
> > ```
> > I noticed an issue which needs your clarification:
> > Currently, the default value of the distance map is NUMA_NO_DISTANCE,
> > which indicates the nodes are not reachable. IMHO, if in device tree, we do
> > saturations like above for ACPI invalid distances (distances >= 0xff), by
> saturating
> > the distance to 0xfe, we are making the unreachable nodes to reachable. I
> am
> > not sure if this will lead to problems. Do you have any more thoughts?
> Thanks!
> 
> I can only answer this with a question back: How is "unreachable"
> represented
> in DT? 

Exactly, that is also what we I am trying to find but failed. My understanding
is that, the spec of NUMA is defined in the ACPI, and the DT NUMA binding
only specifies how users can use DT to represent the same set of ACPI data,
instead of define another standard.

By looking into the existing implementation of NUMA for DT,
in Linux, from drivers/base/arch_numa.c: numa_set_distance(), there is a
"if ((u8)distance != distance)" then return. So I think at least in Linux the
distance value is consistent with the ACPI spec. 

> Or is "unreachable" simply expressed by the absence of any data?

Maybe I am wrong but I don't think so, as in the Linux implementation,
drivers/of/of_numa.c: of_numa_parse_distance_map_v1(), the for loop
"for (i = 0; i + 2 < entry_count; i += 3)" basically implies no fields should
be omitted in the distance map entry.

Kind regards,
Henry

> 
> Jan
Jan Beulich May 23, 2023, 10:05 a.m. UTC | #16
On 23.05.2023 09:32, Henry Wang wrote:
>> -----Original Message-----
>> Subject: Re: [PATCH v4 09/17] xen/arm: introduce a helper to parse device
>> tree NUMA distance map
>>
>>>>>> The [2] link you provided discusses NUMA_LOCAL_DISTANCE.
>>>>>
>>>>> I inferred the discussion as "we should try to keep consistent between the
>>>> value
>>>>> used in device tree and ACPI tables". Maybe my inference is wrong.
>>>>>
>>>>>> Looking at
>>>>>> Linux'es Documentation/devicetree/numa.txt, there's no mention of an
>>>>>> upper bound on the distance values. It only says that on the diagonal
>>>>>> entries should be 10 (i.e. matching ACPI, without really saying so).
>>>>>
>>>>> I agree that the NUMA device tree binding is a little bit vague. So I cannot
>>>>> say the case that you provided is not valid. I would like to ask Arm
>>>> maintainers
>>>>> (putting them into To:) opinion on this as I think I am not the one to
>> decide
>>>> the
>>>>> expected behavior on Arm.
>>>>>
>>>>> Bertrand/Julien/Stefano: Would you please kindly share your opinion on
>>>> which
>>>>> value should be used as the default value of the node distance map? Do
>>>> you
>>>>> think reusing the "unreachable" distance, i.e. 0xFF, as the default node
>>>> distance
>>>>> is acceptable here? Thanks!
>>>>
>>>> My suggestion would be to, rather than rejecting values >= 0xff, to saturate
>>>> at 0xfe, while keeping 0xff for NUMA_NO_DISTANCE (and overall keeping
>>>> things
>>>> consistent with ACPI).
>>>
>>> Since it has been a while and there were no feedback from Arm
>> maintainers,
>>> I would like to follow your suggestions for v5. However while I was writing
>> the
>>> code for the "saturation", i.e., adding below snippet in
>> numa_set_distance():
>>> ```
>>> if ( distance > NUMA_NO_DISTANCE )
>>>         distance = NUMA_MAX_DISTANCE;
>>> ```
>>> I noticed an issue which needs your clarification:
>>> Currently, the default value of the distance map is NUMA_NO_DISTANCE,
>>> which indicates the nodes are not reachable. IMHO, if in device tree, we do
>>> saturations like above for ACPI invalid distances (distances >= 0xff), by
>> saturating
>>> the distance to 0xfe, we are making the unreachable nodes to reachable. I
>> am
>>> not sure if this will lead to problems. Do you have any more thoughts?
>> Thanks!
>>
>> I can only answer this with a question back: How is "unreachable"
>> represented
>> in DT? 
> 
> Exactly, that is also what we I am trying to find but failed. My understanding
> is that, the spec of NUMA is defined in the ACPI, and the DT NUMA binding
> only specifies how users can use DT to represent the same set of ACPI data,
> instead of define another standard.
> 
> By looking into the existing implementation of NUMA for DT,
> in Linux, from drivers/base/arch_numa.c: numa_set_distance(), there is a
> "if ((u8)distance != distance)" then return. So I think at least in Linux the
> distance value is consistent with the ACPI spec. 

Can we simply gain a similar check (suitably commented), eliminating the
need for saturation?

Jan

>> Or is "unreachable" simply expressed by the absence of any data?
> 
> Maybe I am wrong but I don't think so, as in the Linux implementation,
> drivers/of/of_numa.c: of_numa_parse_distance_map_v1(), the for loop
> "for (i = 0; i + 2 < entry_count; i += 3)" basically implies no fields should
> be omitted in the distance map entry.
> 
> Kind regards,
> Henry
Henry Wang May 23, 2023, 10:35 a.m. UTC | #17
Hi Jan,

> -----Original Message-----
> On 23.05.2023 09:32, Henry Wang wrote:
> > By looking into the existing implementation of NUMA for DT,
> > in Linux, from drivers/base/arch_numa.c: numa_set_distance(), there is a
> > "if ((u8)distance != distance)" then return. So I think at least in Linux the
> > distance value is consistent with the ACPI spec.
> 
> Can we simply gain a similar check (suitably commented), eliminating the
> need for saturation?

Yes, I will use the similar check and add the related comments in v5. Thanks
for your confirmation!

Kind regards,
Henry

> 
> Jan
diff mbox series

Patch

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 38f5e93d1d..9d9d09273e 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -151,3 +151,111 @@  invalid_data:
     numa_fw_bad();
     return -EINVAL;
 }
+
+/* Parse NUMA distance map v1 */
+static int __init fdt_parse_numa_distance_map_v1(const void *fdt, int node)
+{
+    const struct fdt_property *prop;
+    const __be32 *matrix;
+    unsigned int i, entry_count;
+    int len;
+
+    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
+
+    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
+    if ( !prop )
+    {
+        printk(XENLOG_WARNING
+               "NUMA: No distance-matrix property in distance-map\n");
+        goto invalid_data;
+    }
+
+    if ( len % sizeof(__be32) != 0 )
+    {
+        printk(XENLOG_WARNING
+               "distance-matrix in node is not a multiple of u32\n");
+        goto invalid_data;
+    }
+
+    entry_count = len / sizeof(__be32);
+    if ( entry_count == 0 )
+    {
+        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
+        goto invalid_data;
+    }
+
+    matrix = (const __be32 *)prop->data;
+    for ( i = 0; i + 2 < entry_count; i += 3 )
+    {
+        unsigned int from, to, distance, opposite;
+
+        from = dt_next_cell(1, &matrix);
+        to = dt_next_cell(1, &matrix);
+        distance = dt_next_cell(1, &matrix);
+        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
+             (from != to && distance <= NUMA_LOCAL_DISTANCE) )
+        {
+            printk(XENLOG_WARNING
+                   "NUMA: Invalid distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
+                   from, to, distance);
+            goto invalid_data;
+        }
+
+        printk(XENLOG_INFO "NUMA: distance: NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
+               from, to, distance);
+
+        /* Get opposite way distance */
+        opposite = __node_distance(to, from);
+        /* The default value in node_distance_map is NUMA_NO_DISTANCE */
+        if ( opposite == NUMA_NO_DISTANCE )
+        {
+            /* Bi-directions are not set, set both */
+            numa_set_distance(from, to, distance);
+            numa_set_distance(to, from, distance);
+        }
+        else
+        {
+            /*
+             * Opposite way distance has been set to a different value.
+             * It may be a firmware device tree bug?
+             */
+            if ( opposite != distance )
+            {
+                /*
+                 * In device tree NUMA distance-matrix binding:
+                 * https://www.kernel.org/doc/Documentation/devicetree/bindings/numa.txt
+                 * There is a notes mentions:
+                 * "Each entry represents distance from first node to
+                 *  second node. The distances are equal in either
+                 *  direction."
+                 *
+                 * That means device tree doesn't permit this case.
+                 * But in ACPI spec, it cares to specifically permit this
+                 * case:
+                 * "Except for the relative distance from a System Locality
+                 *  to itself, each relative distance is stored twice in the
+                 *  matrix. This provides the capability to describe the
+                 *  scenario where the relative distances for the two
+                 *  directions between System Localities is different."
+                 *
+                 * That means a real machine allows such NUMA configuration.
+                 * So, place a WARNING here to notice system administrators,
+                 * is it the specail case that they hijack the device tree
+                 * to support their rare machines?
+                 */
+                printk(XENLOG_WARNING
+                       "Un-matched bi-direction! NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32", NODE#%"PRIu32"->NODE#%"PRIu32":%"PRIu32"\n",
+                       from, to, distance, to, from, opposite);
+            }
+
+            /* Opposite way distance has been set, just set this way */
+            numa_set_distance(from, to, distance);
+        }
+    }
+
+    return 0;
+
+invalid_data:
+    numa_fw_bad();
+    return -EINVAL;
+}