[v4] NUMA: Introduce NODE_DATA->node_present_pages(RAM pages)

Message ID 20241027144305.1839348-1-bernhardkaindl7@gmail.com (mailing list archive)
State Superseded

Commit Message

Bernhard Kaindl Oct. 27, 2024, 2:43 p.m. UTC
From: Bernhard Kaindl <bernhard.kaindl@cloud.com>

At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
But the PFN span sometimes includes large MMIO holes, so these values
might not be an exact representation of the total usable RAM of nodes.

Xen does not need it, but the size of the NUMA node's memory can be
helpful for management tools and HW information tools like hwloc/lstopo
with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/

First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
determine the sum of usable PFNs at boot and update them on memory_add().

(The Linux kernel handles NODE_DATA->node_present_pages likewise)

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
---
Changes in v3:
- Use PFN_UP/DOWN, refactored further to simplify the code while leaving
  compiler-level optimisations to the compiler's optimisation passes.
Changes in v4:
- Refactored code and doxygen documentation according to the review.
---
 xen/arch/x86/numa.c      | 13 +++++++++++++
 xen/arch/x86/x86_64/mm.c |  3 +++
 xen/common/numa.c        | 36 +++++++++++++++++++++++++++++++++---
 xen/include/xen/numa.h   | 21 +++++++++++++++++++++
 4 files changed, 70 insertions(+), 3 deletions(-)
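
For readers following the thread, the page accounting described in the commit message can be sketched outside of Xen as below. This is a minimal standalone illustration, not the patch itself: overlap_pages() is a hypothetical helper, PAGE_SHIFT is assumed to be 12, and PFN_UP/PFN_DOWN are local stand-ins for Xen's macros of the same name. The guard against an overlap smaller than one page is an addition of this sketch and is not part of the posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Local stand-ins for Xen's PFN_DOWN()/PFN_UP(): round down/up to a PFN. */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

typedef uint64_t paddr_t;

/*
 * Hypothetical helper: number of whole pages of the RAM range
 * [ram_start, ram_end) that lie inside the node span [start, end).
 * Both end addresses are exclusive.
 */
static uint64_t overlap_pages(paddr_t start, paddr_t end,
                              paddr_t ram_start, paddr_t ram_end)
{
    paddr_t lo = ram_start > start ? ram_start : start; /* max of the starts */
    paddr_t hi = ram_end < end ? ram_end : end;         /* min of the ends */
    uint64_t lo_pfn, hi_pfn;

    assert(start <= end && ram_start <= ram_end);

    lo_pfn = PFN_UP(lo);    /* first fully covered page */
    hi_pfn = PFN_DOWN(hi);  /* one past the last fully covered page */

    /* Disjoint ranges and sub-page overlaps contribute nothing. */
    return hi_pfn > lo_pfn ? hi_pfn - lo_pfn : 0;
}
```

Rounding the start up and the end down means that only pages fully contained in both ranges are counted, which is why a partially covered page at either edge does not inflate the total.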

Comments

Alejandro Vallejo Oct. 28, 2024, 12:05 p.m. UTC | #1
Hi,

On Sun Oct 27, 2024 at 2:43 PM GMT, Bernhard Kaindl wrote:
> From: Bernhard Kaindl <bernhard.kaindl@cloud.com>
>
> At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
> But the PFN span sometimes includes large MMIO holes, so these values
> might not be an exact representation of the total usable RAM of nodes.
>
> Xen does not need it, but the size of the NUMA node's memory can be
> helpful for management tools and HW information tools like hwloc/lstopo
> with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/
>
> First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
> determine the sum of usable PFNs at boot and update them on memory_add().
>
> (The Linux kernel handles NODE_DATA->node_present_pages likewise)
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> ---
> Changes in v3:
> - Use PFN_UP/DOWN, refactored further to simplify the code while leaving
>   compiler-level optimisations to the compiler's optimisation passes.
> Changes in v4:
> - Refactored code and doxygen documentation according to the review.
> ---
>  xen/arch/x86/numa.c      | 13 +++++++++++++
>  xen/arch/x86/x86_64/mm.c |  3 +++
>  xen/common/numa.c        | 36 +++++++++++++++++++++++++++++++++---
>  xen/include/xen/numa.h   | 21 +++++++++++++++++++++
>  4 files changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 4b0b297c7e..3c0574f773 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -100,6 +100,19 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   + PAGE_SHIFT, 32);
>  }
>  
> +/**
> + * @brief Retrieves the RAM range for a given index from the e820 memory map.
> + *
> + * This function fetches the start and end address (exclusive) of a RAM range
> + * specified by the given index idx from the e820 memory map.
> + *
> + * @param idx The index of the RAM range in the e820 memory map to retrieve.
> + * @param start Pointer to store the start address of the RAM range.
> + * @param end Pointer to store the end address of the RAM range.

Same as setup_node_bootmem(), we probably want this to explicitly state
"exclusive" to indicate it's not the last address, but the address after.

> + *
> + * @return 0 on success, -ENOENT if the index is out of bounds,
> + *         or -ENODATA if the memory map at index idx is not of type E820_RAM.
> + */
>  int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
>  {
>      if ( idx >= e820.nr_map )
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index b2a280fba3..66b9bed057 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1334,6 +1334,9 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>      share_hotadd_m2p_table(&info);
>      transfer_pages_to_heap(&info);
>  
> +    /* Update the node's present pages (like the total_pages of the system) */
> +    NODE_DATA(node)->node_present_pages += epfn - spfn;
> +
>      return 0;
>  
>  destroy_m2p:
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 209c546a3b..9a8b805dd7 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -4,6 +4,7 @@
>   * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
>   */
>  
> +#include "xen/pfn.h"
>  #include <xen/init.h>
>  #include <xen/keyhandler.h>
>  #include <xen/mm.h>
> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
>      return shift;
>  }
>  
> -/* Initialize NODE_DATA given nodeid and start/end */
> +/**
> + * @brief Initialize a NUMA node's node_data structure at boot.
> + *
> + * It is given the NUMA node's index in the node_data array as well
> + * as the start and exclusive end address of the node's memory span
> + * as arguments and initializes the node_data entry with this information.
> + *
> + * It then initializes the total number of usable memory pages within
> + * the NUMA node's memory span using the arch_get_ram_range() function.
> + *
> + * @param nodeid The index into the node_data array for the node.
> + * @param start The starting physical address of the node's memory range.
> + * @param end The exclusive ending physical address of the node's memory range.
> + */
>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>  {
>      unsigned long start_pfn = paddr_to_pfn(start);
>      unsigned long end_pfn = paddr_to_pfn(end);
> +    struct node_data *numa_node = NODE_DATA(nodeid);
> +    paddr_t start_ram, end_ram;

With the loop in place and arch_get_ram_range() being called inside, these two
can further reduce scope by being moved inside as well.

> +    unsigned int idx = 0;
> +    unsigned long *pages = &numa_node->node_present_pages;
>  
> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> +    numa_node->node_start_pfn = start_pfn;
> +    numa_node->node_spanned_pages = end_pfn - start_pfn;
> +
> +    /* Calculate the number of present RAM pages within the node: */

nit: that last ":" feels a bit out of place

> +    *pages = 0;
> +    do {
> +        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
> +
> +        if (err == -ENOENT)

Missing spaces between condition and the parenthesis of the conditional. But...

> +            break;
> +        if ( err || start_ram >= end || end_ram <= start )
> +            continue;  /* range is outside of the node, or not usable RAM */
>  
> +        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
> +    } while (1);

... testing for validity rather than invalidity would allow the loop to be
checked for termination on the termination condition rather than the ad-hoc
check inside. That is...

    (untested)

    do {
        paddr_t start_ram, end_ram;
        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);

        if ( !err && start_ram < end && end_ram > start )
            *pages += PFN_DOWN(min(end_ram, end)) -
                      PFN_UP(max(start_ram, start));
    } while (err != ENOENT);

>      node_set_online(nodeid);
>  }
>  
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index fd1511a6fb..6e82dfd2a8 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -68,9 +68,28 @@ extern unsigned int memnode_shift;
>  extern unsigned long memnodemapsize;
>  extern nodeid_t *memnodemap;
>  
> +/**
> + * @struct numa_node
> + * @brief Represents the memory information of a NUMA node.
> + *
> + * @var numa_node::node_start_pfn
> + * The starting page frame number (lowest pfn) of the NUMA node.
> + *
> + * @var numa_node::node_spanned_pages
> + * The number of pages spanned by the NUMA node, including memory holes.
> + * Used to get the end of the node memory when scrubbing unallocated memory.
> + *
> + * @var numa_node::node_present_pages
> + * The total number of usable memory pages that are available in this NUMA node.
> + * The value of total_pages would be the sum of all node's node_present_pages.
> + *
> + * The Xen Hypervisor does not use this field internally, but it is useful
> + * for reporting the memory information of NUMA nodes to management tools.
> + */

I like the content, but we don't actually use Doxygen in that fashion (or any
fashion for that matter, AFAIK). In Xen style, the comments for each field tend
to be written on top of each respective field rather than stashed on top of the
containing struct.

>  struct node_data {
>      unsigned long node_start_pfn;
>      unsigned long node_spanned_pages;
> +    unsigned long node_present_pages;
>  };
>  
>  extern struct node_data node_data[];
> @@ -91,6 +110,7 @@ static inline nodeid_t mfn_to_nid(mfn_t mfn)
>  
>  #define node_start_pfn(nid)     (NODE_DATA(nid)->node_start_pfn)
>  #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
> +#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
>  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
>                                   NODE_DATA(nid)->node_spanned_pages)
>  
> @@ -123,6 +143,7 @@ extern void numa_set_processor_nodes_parsed(nodeid_t node);
>  extern mfn_t first_valid_mfn;
>  
>  #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
> +#define node_present_pages(nid) total_pages
>  #define node_start_pfn(nid) mfn_x(first_valid_mfn)
>  #define __node_distance(a, b) 20
>  

That said, take all of this with a pinch of salt. I'm not a maintainer here,
after all, and you might want to wait for Andrew, Jan or Roger to chip in.

Cheers,
Alejandro
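
As an illustration of the per-field comment placement suggested above, the structure could look roughly like the following. This is only a sketch of the layout: the comment wording is condensed from the patch's Doxygen block, and the final form is up to the author and the maintainers.

```c
struct node_data {
    /* The starting page frame number (lowest PFN) of the NUMA node. */
    unsigned long node_start_pfn;

    /*
     * The number of pages spanned by the NUMA node, including memory holes.
     * Used to get the end of the node memory when scrubbing unallocated memory.
     */
    unsigned long node_spanned_pages;

    /*
     * The number of usable RAM pages in this NUMA node. Not used by Xen
     * internally, but useful for reporting to management tools.
     */
    unsigned long node_present_pages;
};
```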

Jan Beulich Oct. 28, 2024, 12:53 p.m. UTC | #2
On 28.10.2024 13:05, Alejandro Vallejo wrote:
> On Sun Oct 27, 2024 at 2:43 PM GMT, Bernhard Kaindl wrote:
>> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
>>      return shift;
>>  }
>>  
>> -/* Initialize NODE_DATA given nodeid and start/end */
>> +/**
>> + * @brief Initialize a NUMA node's node_data structure at boot.
>> + *
>> + * It is given the NUMA node's index in the node_data array as well
>> + * as the start and exclusive end address of the node's memory span
>> + * as arguments and initializes the node_data entry with this information.
>> + *
>> + * It then initializes the total number of usable memory pages within
>> + * the NUMA node's memory span using the arch_get_ram_range() function.
>> + *
>> + * @param nodeid The index into the node_data array for the node.
>> + * @param start The starting physical address of the node's memory range.
>> + * @param end The exclusive ending physical address of the node's memory range.
>> + */
>>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>>  {
>>      unsigned long start_pfn = paddr_to_pfn(start);
>>      unsigned long end_pfn = paddr_to_pfn(end);
>> +    struct node_data *numa_node = NODE_DATA(nodeid);
>> +    paddr_t start_ram, end_ram;
> 
> With the loop in place and arch_get_ram_range() being called inside, these two
> can further reduce scope by being moved inside as well.
> 
>> +    unsigned int idx = 0;
>> +    unsigned long *pages = &numa_node->node_present_pages;
>>  
>> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>> +    numa_node->node_start_pfn = start_pfn;
>> +    numa_node->node_spanned_pages = end_pfn - start_pfn;
>> +
>> +    /* Calculate the number of present RAM pages within the node: */
> 
> nit: that last ":" feels a bit out of place
> 
>> +    *pages = 0;
>> +    do {
>> +        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
>> +
>> +        if (err == -ENOENT)
> 
> Missing spaces between condition and the parenthesis of the conditional. But...
> 
>> +            break;
>> +        if ( err || start_ram >= end || end_ram <= start )
>> +            continue;  /* range is outside of the node, or not usable RAM */
>>  
>> +        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
>> +    } while (1);
> 
> ... testing for validity rather than invalidity would allow the loop to be
> checked for termination on the termination condition rather than the ad-hoc
> check inside. That is...
> 
>     (untested)
> 
>     do {
>         paddr_t start_ram, end_ram;
>         int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
> 
>         if ( !err && start_ram < end && end_ram > start )
>             *pages += PFN_DOWN(min(end_ram, end)) -
>                       PFN_UP(max(start_ram, start));
>     } while (err != ENOENT);

     } while ( err != -ENOENT );

> That said, take all of this with a pinch of salt. I'm not a maintainer here,
> after all, and you might want to wait for Andrew, Jan or Roger to chip in.

Apart from the small remark above I agree with the comments made, fwiw.

Jan
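
Putting the two remarks together, the restructured loop can be tried standalone as sketched below. Note that err has to be declared outside the do block so that it is still in scope in the while condition, and that the comparison is against the negative value, -ENOENT. Everything prefixed mock_ is hypothetical and only stands in for the e820 map and arch_get_ram_range(); count_present_pages() is likewise an illustrative name, and PAGE_SHIFT is assumed to be 12.

```c
#include <errno.h>
#include <stdint.h>

#ifndef ENODATA
#define ENODATA 61 /* fallback for platforms whose errno.h lacks it */
#endif

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + (UINT64_C(1) << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define MIN(a, b)   ((a) < (b) ? (a) : (b))
#define MAX(a, b)   ((a) > (b) ? (a) : (b))

typedef uint64_t paddr_t;

/* Hypothetical memory map standing in for the e820 table. */
struct mock_range { paddr_t start, end; int is_ram; };

static const struct mock_range mock_map[] = {
    { 0x00000000, 0x0009f000, 1 }, /* RAM */
    { 0x0009f000, 0x00100000, 0 }, /* reserved */
    { 0x00100000, 0x00400000, 1 }, /* RAM */
    { 0x00400000, 0x00800000, 0 }, /* MMIO hole */
    { 0x00800000, 0x00c00000, 1 }, /* RAM */
};

/* Mimics the return convention documented for arch_get_ram_range(). */
static int mock_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
{
    if ( idx >= sizeof(mock_map) / sizeof(mock_map[0]) )
        return -ENOENT;   /* index out of bounds: end of the map */
    if ( !mock_map[idx].is_ram )
        return -ENODATA;  /* entry exists but is not RAM */

    *start = mock_map[idx].start;
    *end = mock_map[idx].end;

    return 0;
}

/* Count RAM pages inside the node span [start, end); end is exclusive. */
static uint64_t count_present_pages(paddr_t start, paddr_t end)
{
    uint64_t pages = 0;
    unsigned int idx = 0;
    int err; /* declared here so the loop condition can see it */

    do {
        paddr_t start_ram, end_ram;

        err = mock_get_ram_range(idx++, &start_ram, &end_ram);

        if ( !err && start_ram < end && end_ram > start )
            pages += PFN_DOWN(MIN(end_ram, end)) -
                     PFN_UP(MAX(start_ram, start));
    } while ( err != -ENOENT );

    return pages;
}
```

With this mock map, a node spanning the whole map counts only the three RAM entries, while a node spanning 0x00200000 to 0x00a00000 counts just the clipped parts of the second and third RAM entries.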

Jan Beulich Oct. 29, 2024, 3:53 p.m. UTC | #3
On 27.10.2024 15:43, Bernhard Kaindl wrote:
> From: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> 
> At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
> But the PFN span sometimes includes large MMIO holes, so these values
> might not be an exact representation of the total usable RAM of nodes.
> 
> Xen does not need it, but the size of the NUMA node's memory can be
> helpful for management tools and HW information tools like hwloc/lstopo
> with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/
> 
> First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
> determine the sum of usable PFNs at boot and update them on memory_add().
> 
> (The Linux kernel handles NODE_DATA->node_present_pages likewise)
> 
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> ---
> Changes in v3:
> - Use PFN_UP/DOWN, refactored further to simplify the code while leaving
>   compiler-level optimisations to the compiler's optimisation passes.
> Changes in v4:
> - Refactored code and doxygen documentation according to the review.
> ---
>  xen/arch/x86/numa.c      | 13 +++++++++++++
>  xen/arch/x86/x86_64/mm.c |  3 +++
>  xen/common/numa.c        | 36 +++++++++++++++++++++++++++++++++---
>  xen/include/xen/numa.h   | 21 +++++++++++++++++++++
>  4 files changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 4b0b297c7e..3c0574f773 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -100,6 +100,19 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   + PAGE_SHIFT, 32);
>  }
>  
> +/**
> + * @brief Retrieves the RAM range for a given index from the e820 memory map.
> + *
> + * This function fetches the start and end address (exclusive) of a RAM range
> + * specified by the given index idx from the e820 memory map.

I think the use of (exclusive) here leaves room for ambiguity (as it may,
unusually, apply to start as well then). Imo it would better be put ...

> + * @param idx The index of the RAM range in the e820 memory map to retrieve.
> + * @param start Pointer to store the start address of the RAM range.
> + * @param end Pointer to store the end address of the RAM range.

... here, just like you have it ...

> + * @return 0 on success, -ENOENT if the index is out of bounds,
> + *         or -ENODATA if the memory map at index idx is not of type E820_RAM.
> + */
>  int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
>  {
>      if ( idx >= e820.nr_map )
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -4,6 +4,7 @@
>   * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
>   */
>  
> +#include "xen/pfn.h"
>  #include <xen/init.h>
>  #include <xen/keyhandler.h>
>  #include <xen/mm.h>
> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
>      return shift;
>  }
>  
> -/* Initialize NODE_DATA given nodeid and start/end */
> +/**
> + * @brief Initialize a NUMA node's node_data structure at boot.
> + *
> + * It is given the NUMA node's index in the node_data array as well
> + * as the start and exclusive end address of the node's memory span
> + * as arguments and initializes the node_data entry with this information.
> + *
> + * It then initializes the total number of usable memory pages within
> + * the NUMA node's memory span using the arch_get_ram_range() function.
> + *
> + * @param nodeid The index into the node_data array for the node.
> + * @param start The starting physical address of the node's memory range.
> + * @param end The exclusive ending physical address of the node's memory range.

... here.

> + */
>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>  {
>      unsigned long start_pfn = paddr_to_pfn(start);
>      unsigned long end_pfn = paddr_to_pfn(end);
> +    struct node_data *numa_node = NODE_DATA(nodeid);
> +    paddr_t start_ram, end_ram;
> +    unsigned int idx = 0;
> +    unsigned long *pages = &numa_node->node_present_pages;
>  
> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> +    numa_node->node_start_pfn = start_pfn;
> +    numa_node->node_spanned_pages = end_pfn - start_pfn;
> +
> +    /* Calculate the number of present RAM pages within the node: */
> +    *pages = 0;
> +    do {
> +        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
> +
> +        if (err == -ENOENT)
> +            break;
> +        if ( err || start_ram >= end || end_ram <= start )
> +            continue;  /* range is outside of the node, or not usable RAM */
>  
> +        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
> +    } while (1);

Nit: While we have ample bad examples, I think even in such while() uses style
ought to be followed (i.e. "while ( 1 )"). Personally, since this looks a little
odd to me, I generally prefer "for ( ; ; )" in such cases.

With respective adjustments (which I'm happy to make while committing, so long
as you agree):
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

Jan Beulich Oct. 31, 2024, 11:33 a.m. UTC | #4
On 29.10.2024 16:53, Jan Beulich wrote:
> On 27.10.2024 15:43, Bernhard Kaindl wrote:
>> From: Bernhard Kaindl <bernhard.kaindl@cloud.com>
>>
>> At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
>> But the PFN span sometimes includes large MMIO holes, so these values
>> might not be an exact representation of the total usable RAM of nodes.
>>
>> Xen does not need it, but the size of the NUMA node's memory can be
>> helpful for management tools and HW information tools like hwloc/lstopo
>> with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/
>>
>> First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
>> determine the sum of usable PFNs at boot and update them on memory_add().
>>
>> (The Linux kernel handles NODE_DATA->node_present_pages likewise)
>>
>> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
>> ---
>> Changes in v3:
>> - Use PFN_UP/DOWN, refactored further to simplify the code while leaving
>>   compiler-level optimisations to the compiler's optimisation passes.
>> Changes in v4:
>> - Refactored code and doxygen documentation according to the review.
>> ---
>>  xen/arch/x86/numa.c      | 13 +++++++++++++
>>  xen/arch/x86/x86_64/mm.c |  3 +++
>>  xen/common/numa.c        | 36 +++++++++++++++++++++++++++++++++---
>>  xen/include/xen/numa.h   | 21 +++++++++++++++++++++
>>  4 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
>> index 4b0b297c7e..3c0574f773 100644
>> --- a/xen/arch/x86/numa.c
>> +++ b/xen/arch/x86/numa.c
>> @@ -100,6 +100,19 @@ unsigned int __init arch_get_dma_bitsize(void)
>>                   + PAGE_SHIFT, 32);
>>  }
>>  
>> +/**
>> + * @brief Retrieves the RAM range for a given index from the e820 memory map.
>> + *
>> + * This function fetches the start and end address (exclusive) of a RAM range
>> + * specified by the given index idx from the e820 memory map.
> 
> I think the use of (exclusive) here leaves room for ambiguity (as it may,
> unusually, apply to start as well then). Imo it would better be put ...
> 
>> + * @param idx The index of the RAM range in the e820 memory map to retrieve.
>> + * @param start Pointer to store the start address of the RAM range.
>> + * @param end Pointer to store the end address of the RAM range.
> 
> ... here, just like you have it ...
> 
>> + * @return 0 on success, -ENOENT if the index is out of bounds,
>> + *         or -ENODATA if the memory map at index idx is not of type E820_RAM.
>> + */
>>  int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
>>  {
>>      if ( idx >= e820.nr_map )
>> --- a/xen/common/numa.c
>> +++ b/xen/common/numa.c
>> @@ -4,6 +4,7 @@
>>   * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
>>   */
>>  
>> +#include "xen/pfn.h"
>>  #include <xen/init.h>
>>  #include <xen/keyhandler.h>
>>  #include <xen/mm.h>
>> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
>>      return shift;
>>  }
>>  
>> -/* Initialize NODE_DATA given nodeid and start/end */
>> +/**
>> + * @brief Initialize a NUMA node's node_data structure at boot.
>> + *
>> + * It is given the NUMA node's index in the node_data array as well
>> + * as the start and exclusive end address of the node's memory span
>> + * as arguments and initializes the node_data entry with this information.
>> + *
>> + * It then initializes the total number of usable memory pages within
>> + * the NUMA node's memory span using the arch_get_ram_range() function.
>> + *
>> + * @param nodeid The index into the node_data array for the node.
>> + * @param start The starting physical address of the node's memory range.
>> + * @param end The exclusive ending physical address of the node's memory range.
> 
> ... here.
> 
>> + */
>>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>>  {
>>      unsigned long start_pfn = paddr_to_pfn(start);
>>      unsigned long end_pfn = paddr_to_pfn(end);
>> +    struct node_data *numa_node = NODE_DATA(nodeid);
>> +    paddr_t start_ram, end_ram;
>> +    unsigned int idx = 0;
>> +    unsigned long *pages = &numa_node->node_present_pages;
>>  
>> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>> +    numa_node->node_start_pfn = start_pfn;
>> +    numa_node->node_spanned_pages = end_pfn - start_pfn;
>> +
>> +    /* Calculate the number of present RAM pages within the node: */
>> +    *pages = 0;
>> +    do {
>> +        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
>> +
>> +        if (err == -ENOENT)
>> +            break;
>> +        if ( err || start_ram >= end || end_ram <= start )
>> +            continue;  /* range is outside of the node, or not usable RAM */
>>  
>> +        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
>> +    } while (1);
> 
> Nit: While we have ample bad examples, I think even in such while() uses style
> ought to be followed (i.e. "while ( 1 )"). Personally, since this looks a little
> odd to me, I generally prefer "for ( ; ; )" in such cases.
> 
> With respective adjustments (which I'm happy to make while committing, so long
> as you agree):

Ah, no, I take that back. Alejandro's comments also want addressing, one way or
another.

Jan

Patch

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 4b0b297c7e..3c0574f773 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -100,6 +100,19 @@  unsigned int __init arch_get_dma_bitsize(void)
                  + PAGE_SHIFT, 32);
 }
 
+/**
+ * @brief Retrieves the RAM range for a given index from the e820 memory map.
+ *
+ * This function fetches the start and end address (exclusive) of a RAM range
+ * specified by the given index idx from the e820 memory map.
+ *
+ * @param idx The index of the RAM range in the e820 memory map to retrieve.
+ * @param start Pointer to store the start address of the RAM range.
+ * @param end Pointer to store the end address of the RAM range.
+ *
+ * @return 0 on success, -ENOENT if the index is out of bounds,
+ *         or -ENODATA if the memory map at index idx is not of type E820_RAM.
+ */
 int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
 {
     if ( idx >= e820.nr_map )
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index b2a280fba3..66b9bed057 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1334,6 +1334,9 @@  int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     share_hotadd_m2p_table(&info);
     transfer_pages_to_heap(&info);
 
+    /* Update the node's present pages (like the total_pages of the system) */
+    NODE_DATA(node)->node_present_pages += epfn - spfn;
+
     return 0;
 
 destroy_m2p:
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 209c546a3b..9a8b805dd7 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -4,6 +4,7 @@ 
  * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
  */
 
+#include "xen/pfn.h"
 #include <xen/init.h>
 #include <xen/keyhandler.h>
 #include <xen/mm.h>
@@ -499,15 +500,44 @@  int __init compute_hash_shift(const struct node *nodes,
     return shift;
 }
 
-/* Initialize NODE_DATA given nodeid and start/end */
+/**
+ * @brief Initialize a NUMA node's node_data structure at boot.
+ *
+ * It is given the NUMA node's index in the node_data array as well
+ * as the start and exclusive end address of the node's memory span
+ * as arguments and initializes the node_data entry with this information.
+ *
+ * It then initializes the total number of usable memory pages within
+ * the NUMA node's memory span using the arch_get_ram_range() function.
+ *
+ * @param nodeid The index into the node_data array for the node.
+ * @param start The starting physical address of the node's memory range.
+ * @param end The exclusive ending physical address of the node's memory range.
+ */
 void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
 {
     unsigned long start_pfn = paddr_to_pfn(start);
     unsigned long end_pfn = paddr_to_pfn(end);
+    struct node_data *numa_node = NODE_DATA(nodeid);
+    paddr_t start_ram, end_ram;
+    unsigned int idx = 0;
+    unsigned long *pages = &numa_node->node_present_pages;
 
-    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
-    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
+    numa_node->node_start_pfn = start_pfn;
+    numa_node->node_spanned_pages = end_pfn - start_pfn;
+
+    /* Calculate the number of present RAM pages within the node: */
+    *pages = 0;
+    do {
+        int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
+
+        if (err == -ENOENT)
+            break;
+        if ( err || start_ram >= end || end_ram <= start )
+            continue;  /* range is outside of the node, or not usable RAM */
 
+        *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram, start));
+    } while (1);
     node_set_online(nodeid);
 }
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index fd1511a6fb..6e82dfd2a8 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -68,9 +68,28 @@  extern unsigned int memnode_shift;
 extern unsigned long memnodemapsize;
 extern nodeid_t *memnodemap;
 
+/**
+ * @struct numa_node
+ * @brief Represents the memory information of a NUMA node.
+ *
+ * @var numa_node::node_start_pfn
+ * The starting page frame number (lowest pfn) of the NUMA node.
+ *
+ * @var numa_node::node_spanned_pages
+ * The number of pages spanned by the NUMA node, including memory holes.
+ * Used to get the end of the node memory when scrubbing unallocated memory.
+ *
+ * @var numa_node::node_present_pages
+ * The total number of usable memory pages that are available in this NUMA node.
+ * The value of total_pages would be the sum of all node's node_present_pages.
+ *
+ * The Xen Hypervisor does not use this field internally, but it is useful
+ * for reporting the memory information of NUMA nodes to management tools.
+ */
 struct node_data {
     unsigned long node_start_pfn;
     unsigned long node_spanned_pages;
+    unsigned long node_present_pages;
 };
 
 extern struct node_data node_data[];
@@ -91,6 +110,7 @@  static inline nodeid_t mfn_to_nid(mfn_t mfn)
 
 #define node_start_pfn(nid)     (NODE_DATA(nid)->node_start_pfn)
 #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
                                  NODE_DATA(nid)->node_spanned_pages)
 
@@ -123,6 +143,7 @@  extern void numa_set_processor_nodes_parsed(nodeid_t node);
 extern mfn_t first_valid_mfn;
 
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
+#define node_present_pages(nid) total_pages
 #define node_start_pfn(nid) mfn_x(first_valid_mfn)
 #define __node_distance(a, b) 20