Message ID | d10d1a9f11e9f8752c7ec5ff5bb262b3f6c6bb85.1744175097.git.donettom@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Series | [1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block |
On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> set, we iterate over all PFNs in the memory block and use
> early_pfn_to_nid to find the NID until a match is found.
>
> In this patch, we use curr_node_memblock_intersect_memory_block() to
> check if the current node's memblock intersects with the memory block
> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
> is found, the memory block is added to the current node.
>
> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
> for finding the NID will continue to be used.

I don't think we really need different mechanisms for different settings of
CONFIG_DEFERRED_STRUCT_PAGE_INIT.

node_dev_init() runs after all struct pages are already initialized and can
always use pfn_to_nid().

kernel_init_freeable() ->
  page_alloc_init_late(); /* completes initialization of deferred pages */
  ...
  do_basic_setup() ->
    driver_init() ->
      node_dev_init();

The next step could be refactoring register_mem_block_under_node_early() to
loop over memblock regions rather than over pfns.
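To make the two nid lookup paths in this exchange concrete, here is a small userspace model; it is a sketch, not kernel code. As far as I can tell, get_nid_for_pfn() in drivers/base/node.c used early_pfn_to_nid() whenever CONFIG_DEFERRED_STRUCT_PAGE_INIT was set and system_state < SYSTEM_RUNNING, and that condition still holds when node_dev_init() runs from kernel_init_freeable(); otherwise it used pfn_to_nid(). Everything in the block is hypothetical: struct mem_region stands in for sorted memblock.memory entries, region_search_nid() models the binary search that Donet describes for the early path, and the page_nid[] table filled by fill_page_nid() models reading the nid from an already-initialized struct page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of memblock.memory: sorted, non-overlapping
 * [start_pfn, end_pfn) ranges, each tagged with its node id. */
struct mem_region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* Models the early path: binary search over the regions, O(log nr) per pfn.
 * Returns -1 when the pfn falls into a hole. */
int region_search_nid(const struct mem_region *r, size_t nr, unsigned long pfn)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < r[mid].start_pfn)
			hi = mid;
		else if (pfn >= r[mid].end_pfn)
			lo = mid + 1;
		else
			return r[mid].nid;
	}
	return -1;
}

/* Models memmap initialization: record each page's nid once (holes get -1),
 * so that a later pfn_to_nid()-style lookup is a single array read. */
void fill_page_nid(const struct mem_region *r, size_t nr,
		   int *page_nid, unsigned long nr_pfns)
{
	for (unsigned long pfn = 0; pfn < nr_pfns; pfn++)
		page_nid[pfn] = -1;
	for (size_t i = 0; i < nr; i++) {
		assert(r[i].start_pfn <= r[i].end_pfn);
		for (unsigned long pfn = r[i].start_pfn;
		     pfn < r[i].end_pfn && pfn < nr_pfns; pfn++)
			page_nid[pfn] = r[i].nid;
	}
}
```

For example, with regions {0, 10, 0}, {10, 20, 1} and {30, 40, 0}, both lookups report node 1 for pfn 15 and -1 for the hole at pfn 25. The answers are identical; what the thread is about is the cost per call once it is repeated for every pfn of a memory block.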
On 4/10/25 1:37 PM, Mike Rapoport wrote:
> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> set, we iterate over all PFNs in the memory block and use
>> early_pfn_to_nid to find the NID until a match is found.
>>
>> In this patch, we use curr_node_memblock_intersect_memory_block() to
>> check if the current node's memblock intersects with the memory block
>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>> is found, the memory block is added to the current node.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>> for finding the NID will continue to be used.
> I don't think we really need different mechanisms for different settings of
> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>
> node_dev_init() runs after all struct pages are already initialized and can
> always use pfn_to_nid().

In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is
enabled, we perform a binary search in the memblock regions to determine
the pfn's nid. Otherwise, we use pfn_to_nid() to obtain the pfn's nid.

Your point is that we could unify this logic and always use pfn_to_nid()
to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that correct?

>
> kernel_init_freeable() ->
>   page_alloc_init_late(); /* completes initialization of deferred pages */
>   ...
>   do_basic_setup() ->
>     driver_init() ->
>       node_dev_init();
>
> The next step could be refactoring register_mem_block_under_node_early() to
> loop over memblock regions rather than over pfns.

So in the current implementation:

node_dev_init()
  register_one_node
    register_memory_blocks_under_node
      walk_memory_blocks()
        register_mem_block_under_node_early
          get_nid_for_pfn

We get each node's start and end PFN from the pg_data. Using these
values, we determine the memory block's start and end within the current
node. To identify the node to which these memory blocks belong, we
iterate over each PFN in the range.

The problem I am facing is this:

On my system, node4 has memory blocks ranging from memory30351 to
memory38524, plus memory128433. The memory blocks between memory38524
and memory128433 do not belong to this node.

In walk_memory_blocks(), we iterate over all memory blocks from
memory38524 to memory128433. In register_mem_block_under_node_early(),
up to memory38524, the first pfn correctly returns the corresponding nid
and the function returns from there. But after memory38524 and until
memory128433, the loop iterates through each pfn and checks the nid.
Since the nid does not match the required nid, the loop continues. This
causes the soft lockups.

This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
as a binary search is used to determine the PFN's nid. When this
configuration is disabled, pfn_to_nid() is faster (because the nid is
obtained from the page), and the issue is not seen.

To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, I
added this function that iterates over all memblock regions for each
memory block to determine its nid.

Regarding "loop over memblock regions instead of iterating over PFNs",
my question is: in register_one_node, do you mean that we should iterate
over all memblock regions, identify the regions belonging to the current
node, and then retrieve the corresponding memory blocks to register them
under that node?
Thanks
Donet
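A rough way to see why the layout described above leads to soft lockups is to count nid lookups in the per-PFN loop. The sketch below is a userspace model under stated assumptions: every block in node4's span (memory30351 to memory128433) is present, a block owned by the node is recognized at its first pfn (as described in the mail), and a block owned by another node costs one lookup per pfn. node4_owns() and per_pfn_lookups() are hypothetical helpers; pfns_per_block is a free parameter, and the value 4096 used below is only an assumption (for example, 256 MiB blocks with 64 KiB pages), not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout from the mail: node4 owns memory30351..memory38524
 * and memory128433; the blocks in between belong to other nodes. */
static bool node4_owns(unsigned long block_id)
{
	return (block_id >= 30351 && block_id <= 38524) || block_id == 128433;
}

/*
 * Count nid lookups done by a per-PFN loop walking blocks
 * first_block..last_block. Assumes every block is present: an owned block
 * matches on its first pfn, a foreign block is scanned to its end.
 */
unsigned long long per_pfn_lookups(unsigned long first_block,
				   unsigned long last_block,
				   unsigned long pfns_per_block,
				   bool (*owns)(unsigned long block_id))
{
	unsigned long long lookups = 0;

	assert(first_block <= last_block && pfns_per_block > 0);
	for (unsigned long b = first_block; b <= last_block; b++)
		lookups += owns(b) ? 1 : pfns_per_block;
	return lookups;
}
```

Under these assumptions, per_pfn_lookups(30351, 128433, 4096, node4_owns) returns 368271343: 8175 owned blocks at one lookup each, plus 89908 foreign blocks at 4096 lookups each. A per-block check would instead do on the order of 98083 checks for the same walk.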
On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
>
> On 4/10/25 1:37 PM, Mike Rapoport wrote:
> > On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
> > > In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> > > set, we iterate over all PFNs in the memory block and use
> > > early_pfn_to_nid to find the NID until a match is found.
> > >
> > > In this patch, we use curr_node_memblock_intersect_memory_block() to
> > > check if the current node's memblock intersects with the memory block
> > > passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
> > > is found, the memory block is added to the current node.
> > >
> > > If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
> > > for finding the NID will continue to be used.
> > I don't think we really need different mechanisms for different settings of
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT.
> >
> > node_dev_init() runs after all struct pages are already initialized and can
> > always use pfn_to_nid().
>
>
> In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> enabled, we perform a binary search in the memblock regions to determine
> the pfn's nid. Otherwise, we use pfn_to_nid() to obtain the pfn's nid.
>
> Your point is that we could unify this logic and always use pfn_to_nid()
> to determine the pfn's nid, regardless of whether
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that correct?

Yes, struct pages should be ready by the time node_dev_init() is called
even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.

> >
> > kernel_init_freeable() ->
> >   page_alloc_init_late(); /* completes initialization of deferred pages */
> >   ...
> >   do_basic_setup() ->
> >     driver_init() ->
> >       node_dev_init();
> >
> > The next step could be refactoring register_mem_block_under_node_early() to
> > loop over memblock regions rather than over pfns.
>
> So in the current implementation:
>
> node_dev_init()
>   register_one_node
>     register_memory_blocks_under_node
>       walk_memory_blocks()
>         register_mem_block_under_node_early
>           get_nid_for_pfn
>
> We get each node's start and end PFN from the pg_data. Using these
> values, we determine the memory block's start and end within the current
> node. To identify the node to which these memory blocks belong, we
> iterate over each PFN in the range.
>
> The problem I am facing is this:
>
> On my system, node4 has memory blocks ranging from memory30351 to
> memory38524, plus memory128433. The memory blocks between memory38524
> and memory128433 do not belong to this node.
>
> In walk_memory_blocks(), we iterate over all memory blocks from
> memory38524 to memory128433. In register_mem_block_under_node_early(),
> up to memory38524, the first pfn correctly returns the corresponding nid
> and the function returns from there. But after memory38524 and until
> memory128433, the loop iterates through each pfn and checks the nid.
> Since the nid does not match the required nid, the loop continues. This
> causes the soft lockups.
>
> This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
> as a binary search is used to determine the PFN's nid. When this
> configuration is disabled, pfn_to_nid() is faster (because the nid is
> obtained from the page), and the issue is not seen.
>
> To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, I
> added this function that iterates over all memblock regions for each
> memory block to determine its nid.
>
> Regarding "loop over memblock regions instead of iterating over PFNs",
> my question is: in register_one_node, do you mean that we should iterate
> over all memblock regions, identify the regions belonging to the current
> node, and then retrieve the corresponding memory blocks to register them
> under that node?
I looked more closely at register_mem_block_under_node_early() and
iteration over memblock regions won't make sense there.

It might make sense to use for_each_mem_range() as top level loop in
node_dev_init(), but that's a separate topic.

> Thanks
> Donet
>
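The idea of driving registration from memblock regions, rather than from each node's pfn span, can be modeled in userspace as follows. This is a sketch under stated assumptions, not the eventual kernel change: struct mem_region stands in for memblock.memory entries (in the kernel they would come from a memblock iterator such as for_each_mem_range(), with the node id taken from each region; the exact mechanism is left open here), the block id is taken to be pfn / pfns_per_block, and register_blocks_by_region() and the registered[][] table are hypothetical. Each region visits only the blocks it actually spans, so blocks owned by other nodes are never scanned pfn by pfn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64
#define MAX_NODES 8

/* Hypothetical stand-in for a memblock.memory entry: [start_pfn, end_pfn). */
struct mem_region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/* registered[block][nid] is set once a block is linked under a node. */
static bool registered[MAX_BLOCKS][MAX_NODES];

/*
 * Region-driven registration: for each region, register every memory block
 * it overlaps under the region's node. A block straddling two nodes is
 * registered under both. Returns the number of (region, block) visits.
 */
size_t register_blocks_by_region(const struct mem_region *regions, size_t nr,
				 unsigned long pfns_per_block)
{
	size_t visits = 0;

	assert(pfns_per_block > 0);
	for (size_t i = 0; i < nr; i++) {
		const struct mem_region *r = &regions[i];

		if (r->end_pfn <= r->start_pfn)
			continue;	/* empty region */
		for (unsigned long b = r->start_pfn / pfns_per_block;
		     b <= (r->end_pfn - 1) / pfns_per_block; b++) {
			assert(b < MAX_BLOCKS && r->nid >= 0 && r->nid < MAX_NODES);
			registered[b][r->nid] = true;
			visits++;
		}
	}
	return visits;
}
```

With regions {0, 20, 0}, {20, 40, 1}, {40, 48, 0} and 8 pfns per block, the function makes 7 visits and registers block 2 under both node 0 and node 1, because pfns 16..23 straddle the node boundary. The work is proportional to the number of blocks the regions cover, regardless of how far apart one node's regions lie.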
On 4/11/25 4:29 PM, Mike Rapoport wrote:
> On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
>> On 4/10/25 1:37 PM, Mike Rapoport wrote:
>>> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>>>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>>>> set, we iterate over all PFNs in the memory block and use
>>>> early_pfn_to_nid to find the NID until a match is found.
>>>>
>>>> In this patch, we use curr_node_memblock_intersect_memory_block() to
>>>> check if the current node's memblock intersects with the memory block
>>>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>>>> is found, the memory block is added to the current node.
>>>>
>>>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>>>> for finding the NID will continue to be used.
>>> I don't think we really need different mechanisms for different settings of
>>> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>>>
>>> node_dev_init() runs after all struct pages are already initialized and can
>>> always use pfn_to_nid().
>>
>> In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> enabled, we perform a binary search in the memblock regions to determine
>> the pfn's nid. Otherwise, we use pfn_to_nid() to obtain the pfn's nid.
>>
>> Your point is that we could unify this logic and always use pfn_to_nid()
>> to determine the pfn's nid, regardless of whether
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that correct?
> Yes, struct pages should be ready by the time node_dev_init() is called
> even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.

OK. Thanks, Mike.

>
>>> kernel_init_freeable() ->
>>>   page_alloc_init_late(); /* completes initialization of deferred pages */
>>>   ...
>>>   do_basic_setup() ->
>>>     driver_init() ->
>>>       node_dev_init();
>>>
>>> The next step could be refactoring register_mem_block_under_node_early() to
>>> loop over memblock regions rather than over pfns.
> I looked more closely at register_mem_block_under_node_early() and
> iteration over memblock regions won't make sense there.
>
> It might make sense to use for_each_mem_range() as top level loop in
> node_dev_init(), but that's a separate topic.

Yes, this makes sense to me as well. So in your opinion, instead of adding
a new memblock search function like the one I added, it's better to use
for_each_mem_range() in node_dev_init(), which would work for all cases,
regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT is set or not.
Right?

>
>> Thanks
>> Donet
>>
On Fri, Apr 11, 2025 at 05:06:55PM +0530, Donet Tom wrote:
> On 4/11/25 4:29 PM, Mike Rapoport wrote:
> >
> > It might make sense to use for_each_mem_range() as top level loop in
> > node_dev_init(), but that's a separate topic.
>
> Yes, this makes sense to me as well. So in your opinion, instead of adding
> a new memblock search function like the one I added, it's better to use
> for_each_mem_range() in node_dev_init(), which would work for all cases,
> regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT is set or not.
> Right?

Yes
On 4/15/25 3:16 PM, Mike Rapoport wrote:
> On Fri, Apr 11, 2025 at 05:06:55PM +0530, Donet Tom wrote:
>> On 4/11/25 4:29 PM, Mike Rapoport wrote:
>>> It might make sense to use for_each_mem_range() as top level loop in
>>> node_dev_init(), but that's a separate topic.
>> Yes, this makes sense to me as well. So in your opinion, instead of adding
>> a new memblock search function like the one I added, it's better to use
>> for_each_mem_range() in node_dev_init(), which would work for all cases,
>> regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT is set or not.
>> Right?
>
> Yes

Thank you so much. I will implement it.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index cd13ef287011..5c5dd02b8bdd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,8 @@
 #include <linux/pm_runtime.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
+#include <linux/memblock.h>
+
 
 static const struct bus_type node_subsys = {
 	.name = "node",
@@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
 			    ret);
 }
 
-/* register memory section under specified node if it spans that node */
-static int register_mem_block_under_node_early(struct memory_block *mem_blk,
-					       void *arg)
+static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
+				unsigned long start_pfn, unsigned long end_pfn, int nid)
 {
-	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
-	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
-	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
-	int nid = *(int *)arg;
-	unsigned long pfn;
 
+	if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
+		do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
+	return 0;
+}
+
+static int register_mem_block_early__normal(struct memory_block *mem_blk,
+				unsigned long start_pfn, unsigned long end_pfn, int nid)
+{
+	unsigned long pfn;
 	for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
 		int page_nid;
 
@@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
 	/* mem section does not span the specified node */
 	return 0;
 }
+/* register memory section under specified node if it spans that node */
+static int register_mem_block_under_node_early(struct memory_block *mem_blk,
+					       void *arg)
+{
+	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
+	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
+	int nid = *(int *)arg;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+	if (system_state < SYSTEM_RUNNING)
+		return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
+#endif
+	return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
+
+}
 
 /*
  * During hotplug we know that all pages in the memory block belong to the same
In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
set, we iterate over all PFNs in the memory block and use
early_pfn_to_nid to find the NID until a match is found.

In this patch, we use curr_node_memblock_intersect_memory_block() to
check if the current node's memblock intersects with the memory block
passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
is found, the memory block is added to the current node.

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
for finding the NID will continue to be used.

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)
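The commit message relies on curr_node_memblock_intersect_memory_block(), which is added in patch 1/2 of the series and is not shown on this page. Assuming it answers "does any memblock region of node nid overlap the inclusive pfn range [start_pfn, end_pfn]?", a userspace model of that test could look like the following. The name region_intersects_block() and struct mem_region are hypothetical, and the real helper may search the regions differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock.memory entry: [start_pfn, end_pfn). */
struct mem_region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

/*
 * Does any region of node @nid overlap the memory block spanning
 * [start_pfn, end_pfn]? end_pfn is inclusive, matching how the patch
 * computes it (start_pfn + memory_block_pfns - 1).
 */
bool region_intersects_block(const struct mem_region *regions, size_t nr,
			     unsigned long start_pfn, unsigned long end_pfn,
			     int nid)
{
	assert(start_pfn <= end_pfn);
	for (size_t i = 0; i < nr; i++) {
		const struct mem_region *r = &regions[i];

		/* Skip other nodes' regions and empty regions. */
		if (r->nid != nid || r->end_pfn <= r->start_pfn)
			continue;
		/* Half-open [r->start_pfn, r->end_pfn) vs closed [start_pfn, end_pfn]. */
		if (r->start_pfn <= end_pfn && start_pfn < r->end_pfn)
			return true;
	}
	return false;
}
```

With regions {0, 100, 0}, {100, 250, 1} and {300, 400, 0}, the block [96, 111] intersects both node 0 and node 1, while the block [250, 299] intersects neither. The cost per block is proportional to the number of regions rather than the number of pfns, which is what makes foreign blocks cheap to reject; a linear scan is used here only for clarity.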