Message ID | 20181127162005.15833-6-osalvador@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Do not touch pages in hot-remove path | expand |
On Tue 27-11-18 17:20:05, Oscar Salvador wrote: > From: Oscar Salvador <osalvador@suse.com> > > shrink_zone_span and shrink_pgdat_span look a bit weird. > > They both have a loop at the end to check if the zone > or pgdat contains only holes in case the section to be removed > was not either the first one or the last one. > > Both code loops look quite similar, so we can simplify it a bit. > We do that by creating a function (has_only_holes), that basically > calls find_smallest_section_pfn() with the full range. > In case nothing has to be found, we do not have any more sections > there. > > To be honest, I am not really sure we even need to go through this > check in case we are removing a middle section, because from what I can see, > we will always have a first/last section. > > Taking the chance, we could also simplify both find_smallest_section_pfn() > and find_biggest_section_pfn() functions and move the common code > to a helper. I didn't get to read through this whole series but one thing that is on my todo list for a long time is to remove all this stuff. I do not think we really want to simplify it when there shouldn't be any real reason to have it around at all. Why do we need to shrink zone/node at all? Now that we can override and assign memory to both normal na movable zones I think we should be good to remove shrinking.
On Wed, 2018-11-28 at 07:50 +0100, Michal Hocko wrote: > > I didn't get to read through this whole series but one thing that is > on > my todo list for a long time is to remove all this stuff. I do not > think > we really want to simplify it when there shouldn't be any real reason > to > have it around at all. Why do we need to shrink zone/node at all? > > Now that we can override and assign memory to both normal na movable > zones I think we should be good to remove shrinking. I feel like I am missing a piece of obvious information here. Right now, we shrink zone/node to decrease spanned pages. I thought this was done for consistency, and in case of the node, in try_offline_node we use the spanned pages to go through all sections to check whether the node can be removed or not. From your comment, I understand that we do not really care about spanned pages. Why? Could you please expand on that? And if we remove it, would not this give to a user "bad"/confusing information when looking at /proc/zoneinfo? Thanks
On 28.11.18 08:07, Oscar Salvador wrote: > On Wed, 2018-11-28 at 07:50 +0100, Michal Hocko wrote: >> >> I didn't get to read through this whole series but one thing that is >> on >> my todo list for a long time is to remove all this stuff. I do not >> think >> we really want to simplify it when there shouldn't be any real reason >> to >> have it around at all. Why do we need to shrink zone/node at all? >> >> Now that we can override and assign memory to both normal na movable >> zones I think we should be good to remove shrinking. > > I feel like I am missing a piece of obvious information here. > Right now, we shrink zone/node to decrease spanned pages. > I thought this was done for consistency, and in case of the node, in > try_offline_node we use the spanned pages to go through all sections > to check whether the node can be removed or not. > I am also not sure if that can be done. Anyhow, simplifying first and getting rid later is in my opinion also good enough. One step at a time :) > From your comment, I understand that we do not really care about > spanned pages. Why? > Could you please expand on that? > > And if we remove it, would not this give to a user "bad"/confusing > information when looking at /proc/zoneinfo? > > > Thanks >
On Wed 28-11-18 08:07:46, Oscar Salvador wrote: > On Wed, 2018-11-28 at 07:50 +0100, Michal Hocko wrote: > > > > I didn't get to read through this whole series but one thing that is > > on > > my todo list for a long time is to remove all this stuff. I do not > > think > > we really want to simplify it when there shouldn't be any real reason > > to > > have it around at all. Why do we need to shrink zone/node at all? > > > > Now that we can override and assign memory to both normal na movable > > zones I think we should be good to remove shrinking. > > I feel like I am missing a piece of obvious information here. > Right now, we shrink zone/node to decrease spanned pages. > I thought this was done for consistency, and in case of the node, in > try_offline_node we use the spanned pages to go through all sections > to check whether the node can be removed or not. > > >From your comment, I understand that we do not really care about > spanned pages. Why? > Could you please expand on that? OK, so what is the difference between memory hotremoving a range withing a zone and on the zone boundary? There should be none, yet spanned pages do get updated only when we do the later, IIRC? So spanned pages is not really all that valuable information. It just tells the zone_end-zone_start. Also not what is the semantic of spanned_pages for interleaving zones. > And if we remove it, would not this give to a user "bad"/confusing > information when looking at /proc/zoneinfo? Who does use spanned pages for anything really important? It is managed pages that people do care about. Maybe there is something that makes this harder than I anticipate but I have a strong feeling that this all complication should simply go.
> OK, so what is the difference between memory hotremoving a range > withing > a zone and on the zone boundary? There should be none, yet spanned > pages > do get updated only when we do the later, IIRC? So spanned pages is not > really all that valuable information. It just tells the > zone_end-zone_start. Also not what is the semantic of > spanned_pages for interleaving zones. Ok, I think I start getting your point. Yes, spanned_pages are only touched in case we remove the first or the last section of memory range. So your point is to get rid of shrink_zone_span() and shrink_node_span(), and do not touch spanned_pages at all? (only when the zone is gone or the node goes offline?) The only thing I am worried about is that by doing that, the system will account spanned_pages incorrectly. So, if we remove pages on zone-boundary, neither zone_start_pfn nor spanned_pages will change. I did not check yet, but could it be that somewhere we use zone/node's spanned_pages information to compute something? I mean, do not get me wrong, getting rid of all shrink stuff would be great, it will remove a __lot__ of code and some complexity, but I am not sure if it is totally safe. >> And if we remove it, would not this give to a user "bad"/confusing >> information when looking at /proc/zoneinfo? > > Who does use spanned pages for anything really important? It is managed > pages that people do care about. Fair enough.
On Wed 28-11-18 12:00:35, osalvador@suse.de wrote: > > > OK, so what is the difference between memory hotremoving a range withing > > a zone and on the zone boundary? There should be none, yet spanned pages > > do get updated only when we do the later, IIRC? So spanned pages is not > > really all that valuable information. It just tells the > > zone_end-zone_start. Also not what is the semantic of > > spanned_pages for interleaving zones. > > Ok, I think I start getting your point. > Yes, spanned_pages are only touched in case we remove the first or the last > section of memory range. > > So your point is to get rid of shrink_zone_span() and shrink_node_span(), > and do not touch spanned_pages at all? (only when the zone is gone or the > node > goes offline?) yep. Or when we extend a zone/node via hotplug. > The only thing I am worried about is that by doing that, the system > will account spanned_pages incorrectly. As long as end_pfn - start_pfn matches then I do not see what would be incorrect. > So, if we remove pages on zone-boundary, neither zone_start_pfn nor > spanned_pages will change. > I did not check yet, but could it be that somewhere we use zone/node's > spanned_pages > information to compute something? That is an obvious homework to do when posting such a patch ;) > I mean, do not get me wrong, getting rid of all shrink stuff would be great, > it will remove a __lot__ of code and some complexity, but I am not sure if > it is totally safe. Yes it is a lot of code and I do not see any strong justification for it. In the past the zone boundary was really important becuase it defined the memory zone for the new memory to hotplug. For quite some time we have a much more flexible semantic and you can online memory to normal/movable zones as you like. So I _believe_ the need for shrink code is gone. Maybe I am missing something of course. All I want to say is that it would be _so_ great to get rid of it rather than waste a lip stick on a pig...
> yep. Or when we extend a zone/node via hotplug. > >> The only thing I am worried about is that by doing that, the system >> will account spanned_pages incorrectly. > > As long as end_pfn - start_pfn matches then I do not see what would be > incorrect. If by end_pfn - start_pfn you mean zone_end_pfn - zone_start_pfn, then we would still need to change zone_start_pfn when removing the first section, and adjust spanned_pages in case we remove the last section, would not we? Let us say we have a zone with 3 sections: zone_start_pfn = 0 zone_end_pfn = 98304 If we hot-remove the last section, zone_end_pfn should be adjusted to 65536. Otherwise zone_end_pfn - zone_start_pfn will give us more. The same goes when we hot-remove the first section. Of course, we should not care when removing a section which is not either the first one or the last one. Having said that, I will check the uses we have for spanned_pages.
On Wed 28-11-18 13:51:42, osalvador@suse.de wrote: > > yep. Or when we extend a zone/node via hotplug. > > > > > The only thing I am worried about is that by doing that, the system > > > will account spanned_pages incorrectly. > > > > As long as end_pfn - start_pfn matches then I do not see what would be > > incorrect. > > If by end_pfn - start_pfn you mean zone_end_pfn - zone_start_pfn, > then we would still need to change zone_start_pfn when removing > the first section, and adjust spanned_pages in case we remove the last > section, > would not we? Why? Again, how is removing the last/first section of the zone any different from any other section?
On 2018-11-28 13:51, osalvador@suse.de wrote: >> yep. Or when we extend a zone/node via hotplug. >> >>> The only thing I am worried about is that by doing that, the system >>> will account spanned_pages incorrectly. >> >> As long as end_pfn - start_pfn matches then I do not see what would be >> incorrect. Or unless I misunderstood you, and you would like to instead of having this shrink code, re-use resize_zone/pgdat_range to adjust end_pfn and start_pfn when offlining first or last sections.
On 2018-11-28 14:08, Michal Hocko wrote: > On Wed 28-11-18 13:51:42, osalvador@suse.de wrote: >> > yep. Or when we extend a zone/node via hotplug. >> > >> > > The only thing I am worried about is that by doing that, the system >> > > will account spanned_pages incorrectly. >> > >> > As long as end_pfn - start_pfn matches then I do not see what would be >> > incorrect. >> >> If by end_pfn - start_pfn you mean zone_end_pfn - zone_start_pfn, >> then we would still need to change zone_start_pfn when removing >> the first section, and adjust spanned_pages in case we remove the last >> section, >> would not we? > > Why? Again, how is removing the last/first section of the zone any > different from any other section? Because removing last/first section changes the zone's boundary. A zone that you removed the first section, will no longer start at zone_start_pfn. A quick glance points that, for example, compact_zone() relies on zone_start_pfn to get where the zone starts. Now, if you remove the first section and zone_start_pfn does not get adjusted, you will get a wrong start. Maybe that is fine, I am not sure. Sorry for looping here, but it is being difficult for me to grasp it.
On Wed 28-11-18 14:18:43, osalvador@suse.de wrote: > On 2018-11-28 14:08, Michal Hocko wrote: > > On Wed 28-11-18 13:51:42, osalvador@suse.de wrote: > > > > yep. Or when we extend a zone/node via hotplug. > > > > > > > > > The only thing I am worried about is that by doing that, the system > > > > > will account spanned_pages incorrectly. > > > > > > > > As long as end_pfn - start_pfn matches then I do not see what would be > > > > incorrect. > > > > > > If by end_pfn - start_pfn you mean zone_end_pfn - zone_start_pfn, > > > then we would still need to change zone_start_pfn when removing > > > the first section, and adjust spanned_pages in case we remove the last > > > section, > > > would not we? > > > > Why? Again, how is removing the last/first section of the zone any > > different from any other section? > > Because removing last/first section changes the zone's boundary. > A zone that you removed the first section, will no longer start > at zone_start_pfn. > > A quick glance points that, for example, compact_zone() relies on > zone_start_pfn > to get where the zone starts. > Now, if you remove the first section and zone_start_pfn does not get > adjusted, you > will get a wrong start. > > Maybe that is fine, I am not sure. > Sorry for looping here, but it is being difficult for me to grasp it. OK, so let me try again. What is the difference for a pfn walker to start at an offline pfn start from any other offlined section withing a zone boundary? I believe there is none because the pfn walker needs to skip over offline pfns anyway whether they start at a zone boundary or not.
>> Maybe that is fine, I am not sure. >> Sorry for looping here, but it is being difficult for me to grasp it. > > OK, so let me try again. What is the difference for a pfn walker to > start at an offline pfn start from any other offlined section withing a > zone boundary? I believe there is none because the pfn walker needs to > skip over offline pfns anyway whether they start at a zone boundary or > not. If the pfn walker in question skips over "invalid" (not online) pfn, then we are fine I guess. But we need to make sure that this is the case, and that we do not have someone relaying on zone_start_pfn and trusting it blindly. I will go through the code and check all cases to be sure that this is not the case. If that is the case, then I am fine with getting rid of the shrink code. Thanks for explanations ;-)
> OK, so let me try again. What is the difference for a pfn walker to > start at an offline pfn start from any other offlined section withing a > zone boundary? I believe there is none because the pfn walker needs to > skip over offline pfns anyway whether they start at a zone boundary or > not. I checked most of the users of zone_start_pnf: * split_huge_pages_set: - It uses pfn_valid(). I guess this is fine as it will check if the section still has a map. * __reset_isolation_suitable(): - Safe as it uses pfn_to_online_page(). * isolate_freepages_range(): * isolate_migratepages_range(): * isolate_migratepages(): - They use pageblock_pfn_to_page(). If !zone->contiguos, it will use __pageblock_pfn_to_page()->pfn_to_online_page() If zone->contiguos is true, it will use pageblock_pfn_to_page()->pfn_to_page(), which is bad because it will not skip over offlined pfns. * count_highmem_pages: * count_data_pages: * copy_data_pages: - page_is_saveable()->pfn_valid(). I guess this is fine as it will check if the section still has a map. So, leaving out isolate_* functions, it seems that we should be safe. isolate_* functions would depend on !zone->contiguos to call __pageblock_pfn_to_page()->pfn_to_online_page(). So whenever we remove a section in a zone, we should clear zone->contiguos. But this really calls for some deep check that we will not shoot in the foot. What I can do for now is to drop this patch from the patchset and re-send v3 without it.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 49b91907e19e..0e3f89423153 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -325,28 +325,29 @@ static bool is_section_ok(struct mem_section *ms, bool zone) else return valid_section(ms); } + +static bool is_valid_section(struct zone *zone, int nid, unsigned long pfn) +{ + struct mem_section *ms = __pfn_to_section(pfn); + + if (unlikely(!is_section_ok(ms, !!zone))) + return false; + if (unlikely(pfn_to_nid(pfn) != nid)) + return false; + if (zone && zone != page_zone(pfn_to_page(pfn))) + return false; + + return true; +} + /* find the smallest valid pfn in the range [start_pfn, end_pfn) */ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; - - for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(start_pfn); - - if (!is_section_ok(ms, !!zone)) - continue; - - if (unlikely(pfn_to_nid(start_pfn) != nid)) - continue; - - if (zone && zone != page_zone(pfn_to_page(start_pfn))) - continue; - - return start_pfn; - } - + for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) + if (is_valid_section(zone, nid, start_pfn)) + return start_pfn; return 0; } @@ -355,29 +356,26 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { - struct mem_section *ms; unsigned long pfn; /* pfn is the end pfn of a memory section. */ pfn = end_pfn - 1; - for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (!is_section_ok(ms, !!zone)) - continue; - - if (unlikely(pfn_to_nid(pfn) != nid)) - continue; - - if (zone && zone != page_zone(pfn_to_page(pfn))) - continue; - - return pfn; - } - + for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) + if (is_valid_section(zone, nid, pfn)) + return pfn; return 0; } +static bool has_only_holes(int nid, struct zone *zone, + unsigned long start_pfn, + unsigned long end_pfn) +{ + /* + * Let us check if the range has only holes + */ + return !find_smallest_section_pfn(nid, zone, start_pfn, end_pfn); +} + static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { @@ -385,7 +383,6 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */ unsigned long zone_end_pfn = z; unsigned long pfn; - struct mem_section *ms; int nid = zone_to_nid(zone); zone_span_writelock(zone); @@ -397,11 +394,12 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, * for shrinking zone. */ pfn = find_smallest_section_pfn(nid, zone, end_pfn, - zone_end_pfn); - if (pfn) { - zone->zone_start_pfn = pfn; - zone->spanned_pages = zone_end_pfn - pfn; - } + zone_end_pfn); + if (!pfn) + goto no_sections; + + zone->zone_start_pfn = pfn; + zone->spanned_pages = zone_end_pfn - pfn; } else if (zone_end_pfn == end_pfn) { /* * If the section is biggest section in the zone, it need @@ -410,39 +408,23 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, * shrinking zone. */ pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn, - start_pfn); - if (pfn) - zone->spanned_pages = pfn - zone_start_pfn + 1; - } - - /* - * The section is not biggest or smallest mem_section in the zone, it - * only creates a hole in the zone. So in this case, we need not - * change the zone. But perhaps, the zone has only hole data. Thus - * it check the zone has only hole or not. - */ - pfn = zone_start_pfn; - for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!online_section(ms))) - continue; + start_pfn); + if (!pfn) + goto no_sections; - if (page_zone(pfn_to_page(pfn)) != zone) - continue; - - /* If the section is current section, it continues the loop */ - if (start_pfn == pfn) - continue; - - /* If we find valid section, we have nothing to do */ - zone_span_writeunlock(zone); - return; + zone->spanned_pages = pfn - zone_start_pfn + 1; + } else { + if (has_only_holes(nid, zone, zone_start_pfn, zone_end_pfn)) + goto no_sections; } + goto out; + +no_sections: /* The zone has no valid section */ zone->zone_start_pfn = 0; zone->spanned_pages = 0; +out: zone_span_writeunlock(zone); } @@ -453,7 +435,6 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, unsigned long p = pgdat_end_pfn(pgdat); /* pgdat_end_pfn namespace clash */ unsigned long pgdat_end_pfn = p; unsigned long pfn; - struct mem_section *ms; int nid = pgdat->node_id; if (pgdat_start_pfn == start_pfn) { @@ -465,10 +446,11 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, */ pfn = find_smallest_section_pfn(nid, NULL, end_pfn, pgdat_end_pfn); - if (pfn) { - pgdat->node_start_pfn = pfn; - pgdat->node_spanned_pages = pgdat_end_pfn - pfn; - } + if (!pfn) + goto no_sections; + + pgdat->node_start_pfn = pfn; + pgdat->node_spanned_pages = pgdat_end_pfn - pfn; } else if (pgdat_end_pfn == end_pfn) { /* * If the section is biggest section in the pgdat, it need @@ -478,35 +460,18 @@ static void shrink_pgdat_span(struct pglist_data *pgdat, */ pfn = find_biggest_section_pfn(nid, NULL, pgdat_start_pfn, start_pfn); - if (pfn) - pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1; - } - - /* - * If the section is not biggest or smallest mem_section in the pgdat, - * it only creates a hole in the pgdat. So in this case, we need not - * change the pgdat. - * But perhaps, the pgdat has only hole data. Thus it check the pgdat - * has only hole or not. - */ - pfn = pgdat_start_pfn; - for (; pfn < pgdat_end_pfn; pfn += PAGES_PER_SECTION) { - ms = __pfn_to_section(pfn); - - if (unlikely(!valid_section(ms))) - continue; - - if (pfn_to_nid(pfn) != nid) - continue; + if (!pfn) + goto no_sections; - /* If the section is current section, it continues the loop */ - if (start_pfn == pfn) - continue; - - /* If we find valid section, we have nothing to do */ - return; + pgdat->node_spanned_pages = pfn - pgdat_start_pfn + 1; + } else { + if (has_only_holes(nid, NULL, pgdat_start_pfn, pgdat_end_pfn)) + goto no_sections; } + return; + +no_sections: /* The pgdat has no valid section */ pgdat->node_start_pfn = 0; pgdat->node_spanned_pages = 0; @@ -540,6 +505,7 @@ static int __remove_section(int nid, struct mem_section *ms, unsigned long map_offset, struct vmem_altmap *altmap) { int ret = -EINVAL; + int sect_nr = __section_nr(ms); if (!valid_section(ms)) return ret; @@ -548,9 +514,8 @@ static int __remove_section(int nid, struct mem_section *ms, if (ret) return ret; - shrink_pgdat(nid, __section_nr(ms)); - sparse_remove_one_section(nid, ms, map_offset, altmap); + shrink_pgdat(nid, (unsigned long)sect_nr); return 0; }