Message ID | 20220203020022.3044-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm/page_alloc: add zone to zonelist if populated |
On 03.02.22 03:00, Wei Yang wrote:
> During memory hotplug, when online/offline a zone, we need to rebuild
> the zonelist for all nodes. Current behavior would lose a valid zone in
> zonelist since only pick up managed_zone.
>
> There are two cases for a zone with memory but still !managed.
>
> * all pages were allocated via memblock
> * all pages were taken by ballooning / virtio-mem
>
> This state maybe temporary, since both of them may release some memory.
> Then it end up with a managed zone not in zonelist.
>
> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
> and reclaim from zones with pages managed by the buddy allocator")'.
> This patch restore the behavior.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Mel Gorman <mgorman@techsingularity.net>
> CC: David Hildenbrand <david@redhat.com>
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")

That commit mentions that there used to be some ppc64 cases with fadump
where it might have been a real problem. Unfortunately, that commit
doesn't really tell what the performance implications are.

We'd have to know how many "permanent memblock" allocations we have
that can never get freed.

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index de15021a2887..b433a57ee76f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}

The comment above the function also says "Add all populated zones of a
node to the zonelist.", so one way or the other, that should be made
consistent.
On Thu 03-02-22 02:00:22, Wei Yang wrote:
> During memory hotplug, when online/offline a zone, we need to rebuild
> the zonelist for all nodes. Current behavior would lose a valid zone in
> zonelist since only pick up managed_zone.
>
> There are two cases for a zone with memory but still !managed.
>
> * all pages were allocated via memblock
> * all pages were taken by ballooning / virtio-mem
>
> This state maybe temporary, since both of them may release some memory.
> Then it end up with a managed zone not in zonelist.
>
> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
> and reclaim from zones with pages managed by the buddy allocator")'.
> This patch restore the behavior.

That commit was introduced to fix a problem described in its changelog
(a FADUMP configuration making kswapd hog a CPU). You are not
explaining why the original issue is not possible after this change.

I also think that this is more of a theoretical issue than anything that
is a real-life concern. It would be good to state that in the changelog
as well.

That being said, I am not against the change, but the changelog needs
more explanation before I can ack it.

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Mel Gorman <mgorman@techsingularity.net>
> CC: David Hildenbrand <david@redhat.com>
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")

The Fixes tag should really be used only if the referenced commit breaks
something. I do not really see this to be the case here.

Thanks!

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index de15021a2887..b433a57ee76f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}
> --
> 2.33.1
On Thu, Feb 03, 2022 at 10:25:51AM +0100, David Hildenbrand wrote:
>On 03.02.22 03:00, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>>
>> There are two cases for a zone with memory but still !managed.
>>
>> * all pages were allocated via memblock
>> * all pages were taken by ballooning / virtio-mem
>>
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>>
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>That commit mentions that there used to be some ppc64 cases with fadump
>where it might have been a real problem. Unfortunately, that commit
>doesn't really tell what the performance implications are.
>

It mentions 100% CPU usage caused by commit 1d82de618ddd. So far I have
not found which part introduced this or how it was fixed.

>We'd have to know how many "permanent memblock" allocations we have,
>that can never get freed.
>

For the case in that commit, the memory is reserved for the crash
kernel. I am afraid it never gets freed. Whether that holds for all
cases, I am not sure.
On Thu, Feb 03, 2022 at 10:27:11AM +0100, Michal Hocko wrote:
>On Thu 03-02-22 02:00:22, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>>
>> There are two cases for a zone with memory but still !managed.
>>
>> * all pages were allocated via memblock
>> * all pages were taken by ballooning / virtio-mem
>>
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>>
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>
>It has been introduced to fix a problem described in the the changelog
>(FADUMP configuration making kswapd hogging a cpu). You are not
>explaining why the original issue is not possible after this change.
>

At first sight, kswapd works on pgdat->node_zones, which is not affected
by pgdat->node_zonelists. I have not figured out the exact details yet
and will need some time to look into it.

For that commit, I only found this link:

http://lkml.kernel.org/r/20160831195104.GB8119@techsingularity.net

If there are other discussions, pointers to them would be helpful.

>I also think that this is more of theoretical issue than anything that
>is a real life concern. It is good to state that in the changelog as
>well.
>
>That being said I am not against the change but the changelog needs more
>explanation before I can ack it.
>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>Fixes tag should be really used only if the referenced commit breaks
>something. I do not really see this to be the case here.
>

Got it.

>Thanks!
>
>> ---
>>  mm/page_alloc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index de15021a2887..b433a57ee76f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>>  	do {
>>  		zone_type--;
>>  		zone = pgdat->node_zones + zone_type;
>> -		if (managed_zone(zone)) {
>> +		if (populated_zone(zone)) {
>>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>>  			check_highest_zone(zone_type);
>>  		}
>> --
>> 2.33.1
>
>--
>Michal Hocko
>SUSE Labs
On Thu, Feb 03, 2022 at 10:27:11AM +0100, Michal Hocko wrote:
>On Thu 03-02-22 02:00:22, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>>
>> There are two cases for a zone with memory but still !managed.
>>
>> * all pages were allocated via memblock
>> * all pages were taken by ballooning / virtio-mem
>>
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>>
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>
>It has been introduced to fix a problem described in the the changelog
>(FADUMP configuration making kswapd hogging a cpu). You are not
>explaining why the original issue is not possible after this change.
>

After some reading, here is what I found.

To keep this problem from coming back, we need to make sure reclaim only
applies to managed zones. After going through the code, there are only
two places where this is not guaranteed when iterating over zones:

  1. skip_throttle_noprogress()
  2. throttle_direct_reclaim()

Once we make sure vmscan only reclaims from managed zones, the original
problem cannot occur after this change.

BTW, there are two other places that use
for_each_zone_zonelist_nodemask(). It is OK for them not to check
managed_zone, since they are actually doing a node-based iteration.

If this looks good to you, I will adjust the changelog and send two
patches to fix the two places above.

>I also think that this is more of theoretical issue than anything that
>is a real life concern. It is good to state that in the changelog as
>well.
>
>That being said I am not against the change but the changelog needs more
>explanation before I can ack it.
>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>Fixes tag should be really used only if the referenced commit breaks
>something. I do not really see this to be the case here.
>
>Thanks!
>
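The guard proposed in this reply (reclaim-side zonelist walkers should skip zones that have nothing in the buddy allocator) might be sketched roughly as below. This is only a userspace model, not kernel code: `struct zone` is reduced to two counters, the zonelist is a plain array of pointers, and `first_reclaim_zone()` is a hypothetical helper name that does not exist in mm/vmscan.c. It only illustrates that once the zonelist is built from populated zones, a walker needs its own managed_zone() check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct zone (assumption). */
struct zone {
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Models managed_zone(): true if the buddy allocator manages any page. */
static inline bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Hypothetical helper: return the first zone in the zonelist that reclaim
 * may consider, or NULL if every zone in the list is unmanaged.
 */
const struct zone *first_reclaim_zone(const struct zone *const *zonelist,
				      size_t nr_zones)
{
	for (size_t i = 0; i < nr_zones; i++) {
		/* Populated but unmanaged zones have nothing to reclaim. */
		if (!managed_zone(zonelist[i]))
			continue;
		return zonelist[i];
	}
	return NULL;
}
```

With a zonelist holding one populated-but-unmanaged zone followed by a managed one, the helper returns the second zone; with only unmanaged zones it returns NULL, so the caller has nothing to wait on.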
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de15021a2887..b433a57ee76f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
-		if (managed_zone(zone)) {
+		if (populated_zone(zone)) {
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}
During memory hotplug, when onlining/offlining a zone, we need to rebuild
the zonelist for all nodes. The current behavior can lose a valid zone
from the zonelist, since it only picks up zones that pass managed_zone().

There are two cases in which a zone has memory but is still !managed:

* all pages were allocated via memblock
* all pages were taken by ballooning / virtio-mem

This state may be temporary, since both of them may release some memory.
The result is then a managed zone that is not in the zonelist.

This behavior was introduced by commit 6aa303defb74 ("mm, vmscan: only
allocate and reclaim from zones with pages managed by the buddy
allocator"). This patch restores the previous behavior.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mel Gorman <mgorman@techsingularity.net>
CC: David Hildenbrand <david@redhat.com>
Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
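To make the difference between the two predicates concrete, here is a small userspace model of the zonelist build loop. It is a sketch under stated assumptions: `struct zone` is cut down to a name and two counters, `build_zonerefs()` is a simplified stand-in for build_zonerefs_node() that takes the predicate as a parameter, and the page counts are made up. The two helpers mirror the kernel's semantics (populated means present_pages is non-zero, managed means the buddy allocator holds pages) but are not the kernel definitions.

```c
#include <stdbool.h>

/* Simplified userspace stand-in for the kernel's struct zone (assumption). */
struct zone {
	const char *name;
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Models populated_zone(): the zone spans some physical memory. */
static inline bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* Models managed_zone(): the buddy allocator manages at least one page. */
static inline bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Walk the zones from highest to lowest, as build_zonerefs_node() does,
 * and record every zone accepted by the given predicate.
 * Requires nr_zones >= 1. Returns the number of recorded zones.
 */
int build_zonerefs(const struct zone *zones, int nr_zones,
		   bool (*keep)(const struct zone *),
		   const struct zone **zonerefs)
{
	int nr_refs = 0;
	int zone_type = nr_zones;

	do {
		zone_type--;
		if (keep(&zones[zone_type]))
			zonerefs[nr_refs++] = &zones[zone_type];
	} while (zone_type);

	return nr_refs;
}
```

For a node whose higher zone has 4096 present pages but 0 managed pages (for example, all taken by a balloon at rebuild time), building with `managed_zone` records only the lower zone, while building with `populated_zone` records both, so the higher zone is already in the list when the balloon later releases memory.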