Message ID: 20220119190623.1029355-4-zi.yan@sent.com (mailing list archive)
State:      New
Series:     Use pageblock_order for cma and alloc_contig_range alignment.
On 2022-01-19 20:06, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes the unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.

Hi Zi Yan,

I had to re-read this several times as I found it a bit misleading.
I was mainly confused by the fact that memory_hotplug does isolation on
PAGES_PER_SECTION granularity, and reading the above seems to indicate
that isolation is done at either MAX_ORDER_NR_PAGES or pageblock_nr_pages
granularity.

It is true that start_isolate_page_range() expects the range to be
pageblock aligned and works in pageblock_nr_pages chunks, but I do not
think that is what you meant to say here.

Now, to the change itself, below:

> @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>  		return page;
>  	}
>
> -	for (; iter < pageblock_nr_pages - offset; iter++) {
> -		page = pfn_to_page(pfn + iter);
> +	for (pfn = first_pfn; pfn < last_pfn; pfn++) {

You already did pfn = first_pfn before.

> /**
>  * start_isolate_page_range() - make page-allocation-type of range of pages to
>  *				be MIGRATE_ISOLATE.
> - * @start_pfn:		The lower PFN of the range to be isolated.
> - * @end_pfn:		The upper PFN of the range to be isolated.
> + * @start_pfn:		The lower PFN of the range to be checked for
> + *			possibility of isolation.
> + * @end_pfn:		The upper PFN of the range to be checked for
> + *			possibility of isolation.
> + * @isolate_start:	The lower PFN of the range to be isolated.
> + * @isolate_end:	The upper PFN of the range to be isolated.

So, what does "possibility" mean here? I think this needs to be clarified
a bit better.

From what you pointed out in the commit message, I think what you are
doing is:

- alloc_contig_range() gets a range to be isolated.
- then you pass two ranges to start_isolate_page_range():
  [start_pfn, end_pfn): the unaligned range you got in alloc_contig_range()
  [isolate_start, isolate_end): that range aligned to, let's say,
  MAX_ORDER_NR_PAGES

Now, most likely, [start_pfn, end_pfn) only covers a sub-range of
[isolate_start, isolate_end), and that sub-range is what you really want
to isolate (so [start_pfn, end_pfn))?

If so, should not the names be reversed?
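To make the two ranges Oscar describes concrete, here is a minimal
standalone sketch with made-up numbers (x86_64-style constants, where
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) is 1024 pages; this is an
illustration, not code from the patch):

#include <stdio.h>

/* Illustration only, not kernel code. ALIGNMENT stands in for
 * max(MAX_ORDER_NR_PAGES, pageblock_nr_pages), i.e. 1024 on x86_64. */
#define ALIGNMENT 1024UL

int main(void)
{
	unsigned long start_pfn = 514, end_pfn = 1024;	/* caller's exact range */

	/* what pfn_max_align_down()/pfn_max_align_up() would produce */
	unsigned long isolate_start = start_pfn & ~(ALIGNMENT - 1);
	unsigned long isolate_end = (end_pfn + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

	/* [start_pfn, end_pfn) is a sub-range of [isolate_start, isolate_end) */
	printf("checked for unmovable pages: [%lu, %lu)\n", start_pfn, end_pfn);   /* [514, 1024) */
	printf("marked MIGRATE_ISOLATE:      [%lu, %lu)\n", isolate_start, isolate_end); /* [0, 1024) */
	return 0;
}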
On 24 Jan 2022, at 4:55, Oscar Salvador wrote:

> On 2022-01-19 20:06, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes the unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>
> Hi Zi Yan,
>
> I had to re-read this several times as I found it a bit misleading.
> I was mainly confused by the fact that memory_hotplug does isolation on
> PAGES_PER_SECTION granularity, and reading the above seems to indicate
> that isolation is done at either MAX_ORDER_NR_PAGES or
> pageblock_nr_pages granularity.

You are right. Sorry for the confusion. I think it should be
"Page isolation is done at least at max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) granularity."

memory_hotplug uses PAGES_PER_SECTION, which is greater than that.

> It is true that start_isolate_page_range() expects the range to be
> pageblock aligned and works in pageblock_nr_pages chunks, but I do not
> think that is what you meant to say here.

Actually, start_isolate_page_range() should expect
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment instead of
pageblock alignment. It seems to be an uncovered bug in the current
code, since all callers use at least max(MAX_ORDER_NR_PAGES,
pageblock_nr_pages) alignment.

The reason is that if start_isolate_page_range() is only pageblock
aligned and a caller wants to isolate one pageblock from a MAX_ORDER-1
(2 pageblocks on x86_64 systems) free page, this will lead to a
MIGRATE_ISOLATE accounting error. To avoid it, start_isolate_page_range()
needs to isolate the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned
range.

> Now, to the change itself, below:
>
>> @@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  		return page;
>>  	}
>>
>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>> -		page = pfn_to_page(pfn + iter);
>> +	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
>
> You already did pfn = first_pfn before.

Got it. Will remove the redundant code.

>> /**
>>  * start_isolate_page_range() - make page-allocation-type of range of pages to
>>  *				be MIGRATE_ISOLATE.
>> - * @start_pfn:		The lower PFN of the range to be isolated.
>> - * @end_pfn:		The upper PFN of the range to be isolated.
>> + * @start_pfn:		The lower PFN of the range to be checked for
>> + *			possibility of isolation.
>> + * @end_pfn:		The upper PFN of the range to be checked for
>> + *			possibility of isolation.
>> + * @isolate_start:	The lower PFN of the range to be isolated.
>> + * @isolate_end:	The upper PFN of the range to be isolated.
>
> So, what does "possibility" mean here? I think this needs to be
> clarified a bit better.

start_isolate_page_range() needs to check if unmovable pages exist in
the range [start_pfn, end_pfn) but mark all pageblocks within
[isolate_start, isolate_end) MIGRATE_ISOLATE (isolate_* need to be
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned). But now I realize
"possibility" here is very confusing, since both ranges decide whether
the isolation can succeed.

> From what you pointed out in the commit message, I think what you are
> doing is:
>
> - alloc_contig_range() gets a range to be isolated.
> - then you pass two ranges to start_isolate_page_range():
>   [start_pfn, end_pfn): the unaligned range you got in alloc_contig_range()
>   [isolate_start, isolate_end): that range aligned to, let's say,
>   MAX_ORDER_NR_PAGES
>
> Now, most likely, [start_pfn, end_pfn) only covers a sub-range of
> [isolate_start, isolate_end), and that sub-range is what you really
> want to isolate (so [start_pfn, end_pfn))?

Correct.

I agree that isolate_start and isolate_end are pretty confusing here.
They are implementation details of start_isolate_page_range() and should
not be exposed. I will remove them from the parameter list and produce
them inside start_isolate_page_range(). They are pfn_max_align_down()
and pfn_max_align_up() of start_pfn and end_pfn, respectively.

In alloc_contig_range(), the code is still needed to save and restore
migratetypes for [isolate_start, start_pfn) and [end_pfn, isolate_end),
because [start_pfn, end_pfn) is not required to be
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned. Like I said in the
patch, the code will go away once MIGRATE_ISOLATE becomes a standalone
bit without overwriting existing migratetypes during page isolation. And
then isolate_start and isolate_end here will be completely transparent
to callers of start_isolate_page_range().

Thanks for your review and comment.

--
Best Regards,
Yan, Zi
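For reference, the two helpers Zi Yan mentions already live in
mm/page_alloc.c at this point; a sketch of their shape (paraphrased, so
treat the exact expressions as approximate — the intent is rounding a
pfn down/up to max(MAX_ORDER_NR_PAGES, pageblock_nr_pages)):

/* Sketch of the existing mm/page_alloc.c helpers referenced above;
 * the exact kernel code may differ slightly. */
static unsigned long pfn_max_align_down(unsigned long pfn)
{
	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
			     pageblock_nr_pages) - 1);
}

static unsigned long pfn_max_align_up(unsigned long pfn)
{
	return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
				pageblock_nr_pages));
}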
On Mon, Jan 24, 2022 at 12:17:23PM -0500, Zi Yan wrote:
> You are right. Sorry for the confusion. I think it should be
> "Page isolation is done at least at max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) granularity."
>
> memory_hotplug uses PAGES_PER_SECTION, which is greater than that.

Or just specify that the max(MAX_ORDER_NR_PAGES, pageblock_nr_pages)
granularity only comes from alloc_contig_range at the moment. Other
callers might want to work in other granularity (e.g. memory-hotplug),
although ultimately the range has to be aligned to something.

>> It is true that start_isolate_page_range() expects the range to be
>> pageblock aligned and works in pageblock_nr_pages chunks, but I do not
>> think that is what you meant to say here.
>
> Actually, start_isolate_page_range() should expect
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment instead of
> pageblock alignment. It seems to be an uncovered bug in the current
> code, since all callers use at least max(MAX_ORDER_NR_PAGES,
> pageblock_nr_pages) alignment.
>
> The reason is that if start_isolate_page_range() is only pageblock
> aligned and a caller wants to isolate one pageblock from a MAX_ORDER-1
> (2 pageblocks on x86_64 systems) free page, this will lead to a
> MIGRATE_ISOLATE accounting error. To avoid it,
> start_isolate_page_range() needs to isolate the
> max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) aligned range.

So, let me see if I get this straight:

You are saying that, currently, alloc_contig_range() works on the
biggest alignment, otherwise we might have this scenario:

[       MAX_ORDER-1      ]
[pageblock#0][pageblock#1]

We only want to isolate pageblock#1, so we pass a pageblock-aligned
range to start_isolate_page_range(), but the page belonging to
pageblock#1 spans pageblock#0 and pageblock#1 because it is a
MAX_ORDER-1 page.

So when we call set_migratetype_isolate()->set_pageblock_migratetype(),
this will mark either pageblock#0 or pageblock#1 as isolated, but the
whole page will be put in the MIGRATE_ISOLATE freelist by
move_freepages_block()->move_freepages().

Meaning, we will effectively have two pageblocks isolated, but only one
marked as such?

Did I get it right or did I miss something?

I know that this has been discussed previously, and the cover-letter
already mentions it, but I think it would be great to have some sort of
information about the problem in the commit message as well, so people
do not have to go and find it somewhere else.
On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote:
> I know that this has been discussed previously, and the cover-letter
> already mentions it, but I think it would be great to have some sort of
> information about the problem in the commit message as well, so people
> do not have to go and find it somewhere else.

Sorry, the commit already points it out, but I meant to elaborate some
more.
On 25 Jan 2022, at 8:21, Oscar Salvador wrote:

> On Tue, Jan 25, 2022 at 02:19:46PM +0100, Oscar Salvador wrote:
>> I know that this has been discussed previously, and the cover-letter
>> already mentions it, but I think it would be great to have some sort
>> of information about the problem in the commit message as well, so
>> people do not have to go and find it somewhere else.
>
> Sorry, the commit already points it out, but I meant to elaborate some
> more.

You got it right about the issue. And I will add more text in the
commit message and function comments to clarify the situation. Thanks
for your suggestions.

--
Best Regards,
Yan, Zi
On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes the unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.

Another thing that came to my mind.

Prior to this patch, has_unmovable_pages() was checking on pageblock
granularity, starting at pfn#0 of the pageblock.
With this patch, you no longer check on pageblock granularity, which
means you might isolate a pageblock, but some pages that sneaked in
might actually be unmovable.

E.g.:

Let's say you have a pageblock that spans (pfn#512, pfn#1024),
and you pass alloc_contig_range() (pfn#514, pfn#1024).
has_unmovable_pages() will start checking the pageblock at pfn#514,
and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
pfns turn out to be actually unmovable?
On 02.02.22 13:18, Oscar Salvador wrote:
> On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes the unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>
> Another thing that came to my mind.
> Prior to this patch, has_unmovable_pages() was checking on pageblock
> granularity, starting at pfn#0 of the pageblock.
> With this patch, you no longer check on pageblock granularity, which
> means you might isolate a pageblock, but some pages that sneaked in
> might actually be unmovable.
>
> E.g.:
>
> Let's say you have a pageblock that spans (pfn#512, pfn#1024),
> and you pass alloc_contig_range() (pfn#514, pfn#1024).
> has_unmovable_pages() will start checking the pageblock at pfn#514,
> and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
> pfns turn out to be actually unmovable?

That's the whole idea for being able to allocate parts of an unmovable
pageblock that are movable.

If the first part is unmovable but the second part is movable, nothing
should stop us from trying to allocate the second part.

Of course, we want to remember the original migratetype of the
pageblock, to restore that after isolation -- otherwise we'll end up
converting all such pageblocks to MIGRATE_MOVABLE. The next patch does
that IIRC.

However, the devil is in the detail, and I still have to review those
parts of this series.

Note that there are no current users of alloc_contig_range() that
allocate < MAX_ORDER - 1 -- except CMA, but for CMA we immediately exit
has_unmovable_pages() either way.
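In outline, the remember-and-restore David describes could look like
this (a sketch using the existing get/set_pageblock_migratetype()
helpers; the actual patch in this series may structure it differently):

	/* remember the pageblock's original migratetype ... */
	int mt = get_pageblock_migratetype(page);

	set_pageblock_migratetype(page, MIGRATE_ISOLATE);

	/* ... migrate pages away and allocate the movable sub-range ... */

	/* ... then restore it instead of defaulting to MIGRATE_MOVABLE */
	set_pageblock_migratetype(page, mt);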
On 2 Feb 2022, at 7:25, David Hildenbrand wrote:

> On 02.02.22 13:18, Oscar Salvador wrote:
>> On Wed, Jan 19, 2022 at 02:06:19PM -0500, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Enable set_migratetype_isolate() to check specified sub-range for
>>> unmovable pages during isolation. Page isolation is done
>>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>>> pages within that granularity are intended to be isolated. For example,
>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>> alignment. This commit makes the unmovable page check only look for
>>> interesting pages, so that page isolation can succeed for any
>>> non-overlapping ranges.
>>
>> Another thing that came to my mind.
>> Prior to this patch, has_unmovable_pages() was checking on pageblock
>> granularity, starting at pfn#0 of the pageblock.
>> With this patch, you no longer check on pageblock granularity, which
>> means you might isolate a pageblock, but some pages that sneaked in
>> might actually be unmovable.
>>
>> E.g.:
>>
>> Let's say you have a pageblock that spans (pfn#512, pfn#1024),
>> and you pass alloc_contig_range() (pfn#514, pfn#1024).
>> has_unmovable_pages() will start checking the pageblock at pfn#514,
>> and so it will miss pfn#512 and pfn#513. Isn't that a problem, if those
>> pfns turn out to be actually unmovable?
>
> That's the whole idea for being able to allocate parts of an unmovable
> pageblock that are movable.
>
> If the first part is unmovable but the second part is movable, nothing
> should stop us from trying to allocate the second part.
>
> Of course, we want to remember the original migratetype of the
> pageblock, to restore that after isolation -- otherwise we'll end up
> converting all such pageblocks to MIGRATE_MOVABLE. The next patch does
> that IIRC.

Yes. A desirable optimization is to make MIGRATE_ISOLATE a standalone
bit, so isolating a pageblock will not remove its original migratetype.
It is on my todo list.

> However, the devil is in the detail, and I still have to review those
> parts of this series.

Thanks. You can wait for my next version. I am planning to make
start_isolate_page_range() accept any address range and move migratetype
save and restore into it, so that the caller does not need to worry
about alignment.

Basically, start_isolate_page_range() will need to migrate compound
pages at the beginning and/or the end of the given range
[start_pfn, end_pfn) if start_pfn and/or end_pfn - 1 is in the middle of
a compound page. If start_pfn and/or end_pfn - 1 is in the middle of a
free page, the free page will need to be split and put into separate
migratetype lists.

> Note that there are no current users of alloc_contig_range() that
> allocate < MAX_ORDER - 1 -- except CMA, but for CMA we immediately exit
> has_unmovable_pages() either way.
>
> --
> Thanks,
>
> David / dhildenb

--
Best Regards,
Yan, Zi
On Wed, Feb 02, 2022 at 01:25:28PM +0100, David Hildenbrand wrote:
> That's the whole idea for being able to allocate parts of an unmovable
> pageblock that are movable.
>
> If the first part is unmovable but the second part is movable, nothing
> should stop us from trying to allocate the second part.

Yeah, I see, I was a bit slow there, but I see the point now.

Thanks David
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..a4d2687ed4e6 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -42,6 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			 unsigned long isolate_start, unsigned long isolate_end,
 			 unsigned migratetype, int flags);
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0139b77c51d5..5db84c3fa882 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1901,8 +1901,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	zone_pcp_disable(zone);
 	lru_cache_disable();
 
-	/* set above range as isolated */
+	/*
+	 * set above range as isolated
+	 *
+	 * start_pfn and end_pfn are the same as isolate_start and isolate_end,
+	 * because start_pfn and end_pfn are already PAGES_PER_SECTION
+	 * (>= MAX_ORDER_NR_PAGES) aligned; if start_pfn is
+	 * pageblock_nr_pages aligned in memmap_on_memory case, there is no
+	 * need to isolate pages before start_pfn, since they are used by
+	 * memmap thus not user visible.
+	 */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
+				       start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
 				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1d812268c2a9..812cf557b20f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9016,7 +9016,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
+	ret = start_isolate_page_range(start, end, pfn_max_align_down(start),
 				       pfn_max_align_up(end), migratetype, 0);
 	if (ret)
 		return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6c841274bf46..d17ad9a7d4bf 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,8 @@
 #include <trace/events/page_isolation.h>
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether pageblock within [start_pfn, end_pfn) includes
+ * unmovable pages or not.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -29,11 +30,14 @@
  *
  */
 static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
+				 int migratetype, int flags,
+				 unsigned long start_pfn, unsigned long end_pfn)
 {
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
+	unsigned long first_pfn = max(page_to_pfn(page), start_pfn);
+	unsigned long pfn = first_pfn;
+	unsigned long last_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages), end_pfn);
+
+	page = pfn_to_page(pfn);
 
 	if (is_migrate_cma_page(page)) {
 		/*
@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}
 
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
+	for (pfn = first_pfn; pfn < last_pfn; pfn++) {
+		page = pfn_to_page(pfn);
 
 		/*
 		 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +89,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 			}
 
 			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
+			pfn += skip_pages - 1;
 			continue;
 		}
 
@@ -97,7 +101,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 */
 		if (!page_ref_count(page)) {
 			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
+				pfn += (1 << buddy_order(page)) - 1;
 			continue;
 		}
 
@@ -134,7 +138,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	return NULL;
 }
 
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function set pageblock migratetype to isolate if no unmovable page is
+ * present in [start_pfn, end_pfn). The pageblock must be within
+ * [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+			unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct page *unmovable;
@@ -156,7 +166,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags, start_pfn, end_pfn);
 	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
@@ -265,8 +275,12 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  *				be MIGRATE_ISOLATE.
- * @start_pfn:		The lower PFN of the range to be isolated.
- * @end_pfn:		The upper PFN of the range to be isolated.
+ * @start_pfn:		The lower PFN of the range to be checked for
+ *			possibility of isolation.
+ * @end_pfn:		The upper PFN of the range to be checked for
+ *			possibility of isolation.
+ * @isolate_start:	The lower PFN of the range to be isolated.
+ * @isolate_end:	The upper PFN of the range to be isolated.
  *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
@@ -304,20 +318,19 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned long isolate_start, unsigned long isolate_end,
 			     unsigned migratetype, int flags)
 {
 	unsigned long pfn;
 	struct page *page;
 
-	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
-	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
-
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
+	for (pfn = isolate_start;
+	     pfn < isolate_end;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page && set_migratetype_isolate(page, migratetype, flags)) {
-			undo_isolate_page_range(start_pfn, pfn, migratetype);
+		if (page && set_migratetype_isolate(page, migratetype, flags,
+					start_pfn, end_pfn)) {
+			undo_isolate_page_range(isolate_start, pfn, migratetype);
 			return -EBUSY;
 		}
 	}
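To tie the hunks above back to Oscar's pfn#514 example: with the new
parameters, has_unmovable_pages() clamps its scan as follows (the
max()/min() lines are from the hunk above; the numeric values are an
illustration for a pageblock [512, 1024) and a caller range [514, 1024)):

	unsigned long first_pfn = max(page_to_pfn(page), start_pfn);
	/* -> 514: pfns 512 and 513 are intentionally skipped; see
	 * David's reply above for why that is the point of the patch */
	unsigned long pfn = first_pfn;
	unsigned long last_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages),
				     end_pfn);	/* -> 1024 */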