Message ID | 9f1dbe7ee1301c7163b2770e32954ff5e3ecf2c4.1697711415.git.zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | handle memoryless nodes more appropriately | expand |
Qi Zheng <zhengqi.arch@bytedance.com> writes: > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback list of other nodes. For fallback list, do you mean pgdat->node_zonelists[]? If so, in build_all_zonelists __build_all_zonelists build_zonelists build_zonelists_in_node_order build_zonerefs_node populated_zone() will be checked before adding zone into zonelist. So, IIUC, we will not try to allocate from the memory less node. -- Best Regards, Huang, Ying > This will incur > some runtime overhead. > > To drop memoryless node from fallback lists in this case, just > call node_states_clear_node() before calling build_all_zonelists(). > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > Acked-by: David Hildenbrand <david@redhat.com> [snip] -- Best Regards, Huang, Ying
Hi Ying, On 2023/10/20 15:05, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback list of other nodes. > > For fallback list, do you mean pgdat->node_zonelists[]? If so, in > > build_all_zonelists > __build_all_zonelists > build_zonelists > build_zonelists_in_node_order > build_zonerefs_node > > populated_zone() will be checked before adding zone into zonelist. > > So, IIUC, we will not try to allocate from the memory less node. Normally yes, but if it is the weird topology mentioned in [1], it's possible to allocate memory from it, it is a memoryless node, but it also has memory. In addition to the above case, I think it's reasonable to remove memory less node from node_order[] in advance. In this way it will not to be traversed in build_zonelists_in_node_order(). [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ Thanks, Qi > > -- > Best Regards, > Huang, Ying > > >> This will incur >> some runtime overhead. >> >> To drop memoryless node from fallback lists in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> Acked-by: David Hildenbrand <david@redhat.com> > > [snip] > > -- > Best Regards, > Huang, Ying
* Qi Zheng <zhengqi.arch@bytedance.com> wrote: > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback list of other nodes. This will incur > some runtime overhead. > > To drop memoryless node from fallback lists in this case, just > call node_states_clear_node() before calling build_all_zonelists(). s/memoryless node /memoryless nodes > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > Acked-by: David Hildenbrand <david@redhat.com> > --- > mm/memory_hotplug.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index d4a364fdaf8f..f019f7d6272c 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, > /* reinitialise watermarks and update pcp limits */ > init_per_zone_wmark_min(); > > + /* > + * Make sure to mark the node as memory-less before rebuilding the zone > + * list. Otherwise this node would still appear in the fallback lists. > + */ > + node_states_clear_node(node, &arg); Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo
Hi Ingo, On 2023/10/20 16:32, Ingo Molnar wrote: > > * Qi Zheng <zhengqi.arch@bytedance.com> wrote: > >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback list of other nodes. This will incur >> some runtime overhead. >> >> To drop memoryless node from fallback lists in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). > > s/memoryless node > /memoryless nodes Will do. > >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> Acked-by: David Hildenbrand <david@redhat.com> >> --- >> mm/memory_hotplug.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index d4a364fdaf8f..f019f7d6272c 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, >> /* reinitialise watermarks and update pcp limits */ >> init_per_zone_wmark_min(); >> >> + /* >> + * Make sure to mark the node as memory-less before rebuilding the zone >> + * list. Otherwise this node would still appear in the fallback lists. >> + */ >> + node_states_clear_node(node, &arg); > > Acked-by: Ingo Molnar <mingo@kernel.org> Thanks. > > Thanks, > > Ingo
Qi Zheng <zhengqi.arch@bytedance.com> writes: > Hi Ying, > > On 2023/10/20 15:05, Huang, Ying wrote: >> Qi Zheng <zhengqi.arch@bytedance.com> writes: >> >>> In offline_pages(), if a node becomes memoryless, we >>> will clear its N_MEMORY state by calling node_states_clear_node(). >>> But we do this after rebuilding the zonelists by calling >>> build_all_zonelists(), which will cause this memoryless node to >>> still be in the fallback list of other nodes. >> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >> build_all_zonelists >> __build_all_zonelists >> build_zonelists >> build_zonelists_in_node_order >> build_zonerefs_node >> populated_zone() will be checked before adding zone into zonelist. >> So, IIUC, we will not try to allocate from the memory less node. > > Normally yes, but if it is the weird topology mentioned in [1], it's > possible to allocate memory from it, it is a memoryless node, but it > also has memory. > > In addition to the above case, I think it's reasonable to remove > memory less node from node_order[] in advance. In this way it will > not to be traversed in build_zonelists_in_node_order(). > > [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ Got it! Thank you for information. I think that it may be good to include this in the patch description to avoid potential confusing in the future. -- Best Regards, Huang, Ying > Thanks, > Qi > > >> -- >> Best Regards, >> Huang, Ying >> >>> This will incur >>> some runtime overhead. >>> >>> To drop memoryless node from fallback lists in this case, just >>> call node_states_clear_node() before calling build_all_zonelists(). >>> >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> Acked-by: David Hildenbrand <david@redhat.com> >> [snip] >> -- >> Best Regards, >> Huang, Ying
Hi Ying, On 2023/10/23 09:18, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> Hi Ying, >> >> On 2023/10/20 15:05, Huang, Ying wrote: >>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>> >>>> In offline_pages(), if a node becomes memoryless, we >>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>> But we do this after rebuilding the zonelists by calling >>>> build_all_zonelists(), which will cause this memoryless node to >>>> still be in the fallback list of other nodes. >>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>> build_all_zonelists >>> __build_all_zonelists >>> build_zonelists >>> build_zonelists_in_node_order >>> build_zonerefs_node >>> populated_zone() will be checked before adding zone into zonelist. >>> So, IIUC, we will not try to allocate from the memory less node. >> >> Normally yes, but if it is the weird topology mentioned in [1], it's >> possible to allocate memory from it, it is a memoryless node, but it >> also has memory. >> >> In addition to the above case, I think it's reasonable to remove >> memory less node from node_order[] in advance. In this way it will >> not to be traversed in build_zonelists_in_node_order(). >> >> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ > > Got it! Thank you for information. I think that it may be good to > include this in the patch description to avoid potential confusing in > the future. OK, maybe the commit message can be changed to the following: ``` In offline_pages(), if a node becomes memoryless, we will clear its N_MEMORY state by calling node_states_clear_node(). But we do this after rebuilding the zonelists by calling build_all_zonelists(), which will cause this memoryless node to still be in the fallback nodes (node_order[]) of other nodes. To drop memoryless nodes from fallback nodes in this case, just call node_states_clear_node() before calling build_all_zonelists(). In this way, we will not try to allocate pages from memoryless node0, then the panic mentioned in [1] will also be fixed. Even though this problem has been solved by dropping the NODE_MIN_SIZE constrain in x86 [2], it would be better to fix it in the core MM as well. [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ ``` Thanks, Qi > > -- > Best Regards, > Huang, Ying > >> Thanks, >> Qi >> >> >>> -- >>> Best Regards, >>> Huang, Ying >>> >>>> This will incur >>>> some runtime overhead. >>>> >>>> To drop memoryless node from fallback lists in this case, just >>>> call node_states_clear_node() before calling build_all_zonelists(). >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>> [snip] >>> -- >>> Best Regards, >>> Huang, Ying
Qi Zheng <zhengqi.arch@bytedance.com> writes: > Hi Ying, > > On 2023/10/23 09:18, Huang, Ying wrote: >> Qi Zheng <zhengqi.arch@bytedance.com> writes: >> >>> Hi Ying, >>> >>> On 2023/10/20 15:05, Huang, Ying wrote: >>>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>>> >>>>> In offline_pages(), if a node becomes memoryless, we >>>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>>> But we do this after rebuilding the zonelists by calling >>>>> build_all_zonelists(), which will cause this memoryless node to >>>>> still be in the fallback list of other nodes. >>>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>>> build_all_zonelists >>>> __build_all_zonelists >>>> build_zonelists >>>> build_zonelists_in_node_order >>>> build_zonerefs_node >>>> populated_zone() will be checked before adding zone into zonelist. >>>> So, IIUC, we will not try to allocate from the memory less node. >>> >>> Normally yes, but if it is the weird topology mentioned in [1], it's >>> possible to allocate memory from it, it is a memoryless node, but it >>> also has memory. >>> >>> In addition to the above case, I think it's reasonable to remove >>> memory less node from node_order[] in advance. In this way it will >>> not to be traversed in build_zonelists_in_node_order(). >>> >>> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >> Got it! Thank you for information. I think that it may be good to >> include this in the patch description to avoid potential confusing in >> the future. > > OK, maybe the commit message can be changed to the following: > > ``` > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback nodes (node_order[]) of other nodes. > > To drop memoryless nodes from fallback nodes in this case, just > call node_states_clear_node() before calling build_all_zonelists(). > > In this way, we will not try to allocate pages from memoryless > node0, then the panic mentioned in [1] will also be fixed. Even though > this problem has been solved by dropping the NODE_MIN_SIZE constrain > in x86 [2], it would be better to fix it in the core MM as well. > > [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ > [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ > > ``` This is helpful. Thanks! -- Best Regards, Huang, Ying > Thanks, > Qi > >> -- >> Best Regards, >> Huang, Ying >> >>> Thanks, >>> Qi >>> >>> >>>> -- >>>> Best Regards, >>>> Huang, Ying >>>> >>>>> This will incur >>>>> some runtime overhead. >>>>> >>>>> To drop memoryless node from fallback lists in this case, just >>>>> call node_states_clear_node() before calling build_all_zonelists(). >>>>> >>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>> [snip] >>>> -- >>>> Best Regards, >>>> Huang, Ying
On 2023/10/23 11:10, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> Hi Ying, >> >> On 2023/10/23 09:18, Huang, Ying wrote: >>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>> >>>> Hi Ying, >>>> >>>> On 2023/10/20 15:05, Huang, Ying wrote: >>>>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>>>> >>>>>> In offline_pages(), if a node becomes memoryless, we >>>>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>>>> But we do this after rebuilding the zonelists by calling >>>>>> build_all_zonelists(), which will cause this memoryless node to >>>>>> still be in the fallback list of other nodes. >>>>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>>>> build_all_zonelists >>>>> __build_all_zonelists >>>>> build_zonelists >>>>> build_zonelists_in_node_order >>>>> build_zonerefs_node >>>>> populated_zone() will be checked before adding zone into zonelist. >>>>> So, IIUC, we will not try to allocate from the memory less node. >>>> >>>> Normally yes, but if it is the weird topology mentioned in [1], it's >>>> possible to allocate memory from it, it is a memoryless node, but it >>>> also has memory. >>>> >>>> In addition to the above case, I think it's reasonable to remove >>>> memory less node from node_order[] in advance. In this way it will >>>> not to be traversed in build_zonelists_in_node_order(). >>>> >>>> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >>> Got it! Thank you for information. I think that it may be good to >>> include this in the patch description to avoid potential confusing in >>> the future. >> >> OK, maybe the commit message can be changed to the following: >> >> ``` >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback nodes (node_order[]) of other nodes. >> >> To drop memoryless nodes from fallback nodes in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). >> >> In this way, we will not try to allocate pages from memoryless >> node0, then the panic mentioned in [1] will also be fixed. Even though >> this problem has been solved by dropping the NODE_MIN_SIZE constrain >> in x86 [2], it would be better to fix it in the core MM as well. >> >> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >> [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ >> >> ``` Hi Andrew, can you help modify the commit message to this? :) Thanks, Qi > > This is helpful. Thanks! > > -- > Best Regards, > Huang, Ying > >> Thanks, >> Qi >> >>> -- >>> Best Regards, >>> Huang, Ying >>> >>>> Thanks, >>>> Qi >>>> >>>> >>>>> -- >>>>> Best Regards, >>>>> Huang, Ying >>>>> >>>>>> This will incur >>>>>> some runtime overhead. >>>>>> >>>>>> To drop memoryless node from fallback lists in this case, just >>>>>> call node_states_clear_node() before calling build_all_zonelists(). >>>>>> >>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>>> [snip] >>>>> -- >>>>> Best Regards, >>>>> Huang, Ying
On Mon, 23 Oct 2023 11:17:03 +0800 Qi Zheng <zhengqi.arch@bytedance.com> wrote:
> Hi Andrew, can you help modify the commit message to this? :)
Done, thanks.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d4a364fdaf8f..f019f7d6272c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min(); + /* + * Make sure to mark the node as memory-less before rebuilding the zone + * list. Otherwise this node would still appear in the fallback lists. + */ + node_states_clear_node(node, &arg); if (!populated_zone(zone)) { zone_pcp_reset(zone); build_all_zonelists(NULL); } - node_states_clear_node(node, &arg); if (arg.status_change_nid >= 0) { kcompactd_stop(node); kswapd_stop(node);