Message ID | 20220524194756.1698351-1-zi.yan@sent.com (mailing list archive)
---|---
State | New
Series | mm: fix a potential infinite loop in start_isolate_page_range()
On Tue, 24 May 2022 15:47:56 -0400 Zi Yan <zi.yan@sent.com> wrote:

> From: Zi Yan <ziy@nvidia.com>
>
> In isolate_single_pageblock() called by start_isolate_page_range(),
> there are some pageblock isolation issues causing a potential
> infinite loop when isolating a page range. This was reported by Qian Cai.
>
> 1. the pageblock was isolated by just changing the pageblock migratetype
>    without checking for unmovable pages. Call set_migratetype_isolate()
>    to isolate the pageblock properly.
> 2. an off-by-one error caused pages to be migrated unnecessarily when
>    the page does not actually cross the pageblock boundary.
> 3. migrating a compound page across the pageblock boundary and then
>    splitting the free page later has a small race window in which the
>    free page might be allocated again, making the code retry and causing
>    a potential infinite loop. Temporarily set the to-be-migrated page's
>    pageblock to MIGRATE_ISOLATE to prevent that, and bail out early if
>    no free page is found after page migration.
>
> An additional fix to split_free_page() aims to avoid crashing in
> __free_one_page(). When the free page is split at the specified
> split_pfn_offset, free_page_order should check both the lowest set bit
> of free_page_pfn and the highest set bit of split_pfn_offset and use
> the smaller of the two. For example, if free_page_pfn=0x10000 and
> split_pfn_offset=0xc000, the split sizes should be 0x8000 then 0x4000,
> instead of the 0x4000 then 0x8000 that the original algorithm produced.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
>  	unsigned long flags;
>  	int free_page_order;
>  
> +	if (split_pfn_offset == 0)
> +		return;
> +
>  	spin_lock_irqsave(&zone->lock, flags);
>  	del_page_from_free_list(free_page, zone, order);
>  	for (pfn = free_page_pfn;
>  	     pfn < free_page_pfn + (1UL << order);) {
>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>  
> -		free_page_order = ffs(split_pfn_offset) - 1;
> +		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));

Why is it testing the zeroness of `pfn' here? Can pfn==0 even happen?
If so, it's a legitimate value, so why does it get special-cased?
On 24 May 2022, at 16:23, Andrew Morton wrote:

> On Tue, 24 May 2022 15:47:56 -0400 Zi Yan <zi.yan@sent.com> wrote:
>
>> From: Zi Yan <ziy@nvidia.com>
>>
>> In isolate_single_pageblock() called by start_isolate_page_range(),
>> there are some pageblock isolation issues causing a potential
>> infinite loop when isolating a page range. This was reported by Qian Cai.
>>
>> 1. the pageblock was isolated by just changing the pageblock migratetype
>>    without checking for unmovable pages. Call set_migratetype_isolate()
>>    to isolate the pageblock properly.
>> 2. an off-by-one error caused pages to be migrated unnecessarily when
>>    the page does not actually cross the pageblock boundary.
>> 3. migrating a compound page across the pageblock boundary and then
>>    splitting the free page later has a small race window in which the
>>    free page might be allocated again, making the code retry and causing
>>    a potential infinite loop. Temporarily set the to-be-migrated page's
>>    pageblock to MIGRATE_ISOLATE to prevent that, and bail out early if
>>    no free page is found after page migration.
>>
>> An additional fix to split_free_page() aims to avoid crashing in
>> __free_one_page(). When the free page is split at the specified
>> split_pfn_offset, free_page_order should check both the lowest set bit
>> of free_page_pfn and the highest set bit of split_pfn_offset and use
>> the smaller of the two. For example, if free_page_pfn=0x10000 and
>> split_pfn_offset=0xc000, the split sizes should be 0x8000 then 0x4000,
>> instead of the 0x4000 then 0x8000 that the original algorithm produced.
>>
>> ...
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
>>  	unsigned long flags;
>>  	int free_page_order;
>>  
>> +	if (split_pfn_offset == 0)
>> +		return;
>> +
>>  	spin_lock_irqsave(&zone->lock, flags);
>>  	del_page_from_free_list(free_page, zone, order);
>>  	for (pfn = free_page_pfn;
>>  	     pfn < free_page_pfn + (1UL << order);) {
>>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>>  
>> -		free_page_order = ffs(split_pfn_offset) - 1;
>> +		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));
>
> Why is it testing the zeroness of `pfn' here? Can pfn==0 even happen?
> If so, it's a legitimate value, so why does it get special-cased?

__ffs() and __fls() are undefined if no bit is set, according to their
comments. I check both pfn and split_pfn_offset against 0 just in case,
even though pfn is most likely never going to be 0.

--
Best Regards,
Yan, Zi
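To make the bit arithmetic concrete, here is a minimal userspace sketch of the corrected split loop. The kernel's __ffs()/__fls() are mocked with GCC/Clang builtins, and the 64-bit `unsigned long` and the order value 16 are assumptions for illustration. With free_page_pfn=0x10000 and split_pfn_offset=0xc000 it prints chunk sizes 0x8000, 0x4000, then 0x4000 for the remainder, i.e. the 0x8000-before-0x4000 order the commit message calls for:

```c
#include <stdio.h>

/* Userspace stand-ins for the kernel's __ffs()/__fls(). Like the kernel
 * helpers, both are undefined for 0 -- hence the pfn and split_pfn_offset
 * guards in split_free_page(). Assumes 64-bit unsigned long. */
static unsigned long my_ffs(unsigned long x) { return __builtin_ctzl(x); }      /* lowest set bit  */
static unsigned long my_fls(unsigned long x) { return 63 - __builtin_clzl(x); } /* highest set bit */

int main(void)
{
	unsigned long free_page_pfn = 0x10000, split_pfn_offset = 0xc000;
	unsigned long order = 16; /* hypothetical: the free page spans 1UL << 16 pfns */
	unsigned long pfn;

	for (pfn = free_page_pfn; pfn < free_page_pfn + (1UL << order);) {
		/* largest buddy chunk that keeps pfn aligned without
		 * overshooting the split point */
		unsigned long a = pfn ? my_ffs(pfn) : order;
		unsigned long b = my_fls(split_pfn_offset);
		unsigned long free_page_order = a < b ? a : b;

		printf("chunk at pfn 0x%lx, size 0x%lx\n", pfn, 1UL << free_page_order);
		pfn += 1UL << free_page_order;
		split_pfn_offset -= 1UL << free_page_order;
		if (split_pfn_offset == 0) /* past the split point: free the rest */
			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
	}
	return 0;
}
```

The `pfn ? ... : order` special-casing is exactly the guard Andrew asked about: __ffs(0) would be undefined, so a pfn of 0 (only possible for a zone starting at pfn 0) falls back to the full order.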
On 24.05.2022 21:47, Zi Yan wrote:

> From: Zi Yan <ziy@nvidia.com>
>
> In isolate_single_pageblock() called by start_isolate_page_range(),
> there are some pageblock isolation issues causing a potential
> infinite loop when isolating a page range. This was reported by Qian Cai.
>
> 1. the pageblock was isolated by just changing the pageblock migratetype
>    without checking for unmovable pages. Call set_migratetype_isolate()
>    to isolate the pageblock properly.
> 2. an off-by-one error caused pages to be migrated unnecessarily when
>    the page does not actually cross the pageblock boundary.
> 3. migrating a compound page across the pageblock boundary and then
>    splitting the free page later has a small race window in which the
>    free page might be allocated again, making the code retry and causing
>    a potential infinite loop. Temporarily set the to-be-migrated page's
>    pageblock to MIGRATE_ISOLATE to prevent that, and bail out early if
>    no free page is found after page migration.
>
> An additional fix to split_free_page() aims to avoid crashing in
> __free_one_page(). When the free page is split at the specified
> split_pfn_offset, free_page_order should check both the lowest set bit
> of free_page_pfn and the highest set bit of split_pfn_offset and use
> the smaller of the two. For example, if free_page_pfn=0x10000 and
> split_pfn_offset=0xc000, the split sizes should be 0x8000 then 0x4000,
> instead of the 0x4000 then 0x8000 that the original algorithm produced.
>
> Fixes: b2c9e2fbba ("mm: make alloc_contig_range work at pageblock granularity")
> Reported-by: Qian Cai <quic_qiancai@quicinc.com>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

This patch landed in linux next-20220525 as commit 29a8af92b874 ("mm:
fix a potential infinite loop in start_isolate_page_range()").
Unfortunately it breaks all CMA allocations done by the DMA-mapping
framework. I've observed this on ARM 32bit and 64bit. In the logs I only
see messages like:

cma: cma_alloc: linux,cma: alloc failed, req-size: 128 pages, ret: -16

I will try to analyze it a bit more tomorrow, but it looks like
isolation always fails.

> ---
>  mm/page_alloc.c     |  5 ++++-
>  mm/page_isolation.c | 52 ++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 44 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 267599dd9706..6eec0211e0be 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
>  	unsigned long flags;
>  	int free_page_order;
>  
> +	if (split_pfn_offset == 0)
> +		return;
> +
>  	spin_lock_irqsave(&zone->lock, flags);
>  	del_page_from_free_list(free_page, zone, order);
>  	for (pfn = free_page_pfn;
>  	     pfn < free_page_pfn + (1UL << order);) {
>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>  
> -		free_page_order = ffs(split_pfn_offset) - 1;
> +		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));
>  		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
>  				mt, FPI_NONE);
>  		pfn += 1UL << free_page_order;
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b3f074d1682e..c643c8420809 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -283,6 +283,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * isolate_single_pageblock() -- tries to isolate a pageblock that might be
>   * within a free or in-use page.
>   * @boundary_pfn: pageblock-aligned pfn that a page might cross
> + * @flags: isolation flags
>   * @gfp_flags: GFP flags used for migrating pages
>   * @isolate_before: isolate the pageblock before the boundary_pfn
>   *
> @@ -298,14 +299,15 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * either. The function handles this by splitting the free page or migrating
>   * the in-use page then splitting the free page.
>   */
> -static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
> -			bool isolate_before)
> +static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> +			gfp_t gfp_flags, bool isolate_before)
>  {
>  	unsigned char saved_mt;
>  	unsigned long start_pfn;
>  	unsigned long isolate_pageblock;
>  	unsigned long pfn;
>  	struct zone *zone;
> +	int ret;
>  
>  	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
>  
> @@ -325,7 +327,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  			zone->zone_start_pfn);
>  
>  	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
> -	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
> +	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
> +			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
> +
> +	if (ret)
> +		return ret;
>  
>  	/*
>  	 * Bail out early when the to-be-isolated pageblock does not form
> @@ -374,7 +380,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  			struct page *head = compound_head(page);
>  			unsigned long head_pfn = page_to_pfn(head);
>  
> -			if (head_pfn + nr_pages < boundary_pfn) {
> +			if (head_pfn + nr_pages <= boundary_pfn) {
>  				pfn = head_pfn + nr_pages;
>  				continue;
>  			}
> @@ -386,7 +392,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
>  				int order;
>  				unsigned long outer_pfn;
> -				int ret;
> +				int page_mt = get_pageblock_migratetype(page);
> +				bool isolate_page = !is_migrate_isolate_page(page);
>  				struct compact_control cc = {
>  					.nr_migratepages = 0,
>  					.order = -1,
> @@ -399,9 +406,31 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  				};
>  				INIT_LIST_HEAD(&cc.migratepages);
>  
> +				/*
> +				 * XXX: mark the page as MIGRATE_ISOLATE so that
> +				 * no one else can grab the freed page after migration.
> +				 * Ideally, the page should be freed as two separate
> +				 * pages to be added into separate migratetype free
> +				 * lists.
> +				 */
> +				if (isolate_page) {
> +					ret = set_migratetype_isolate(page, page_mt,
> +						flags, head_pfn, head_pfn + nr_pages);
> +					if (ret)
> +						goto failed;
> +				}
> +
>  				ret = __alloc_contig_migrate_range(&cc, head_pfn,
>  							head_pfn + nr_pages);
>  
> +				/*
> +				 * restore the page's migratetype so that it can
> +				 * be split into separate migratetype free lists
> +				 * later.
> +				 */
> +				if (isolate_page)
> +					unset_migratetype_isolate(page, page_mt);
> +
>  				if (ret)
>  					goto failed;
>  				/*
> @@ -417,10 +446,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  				order = 0;
>  				outer_pfn = pfn;
>  				while (!PageBuddy(pfn_to_page(outer_pfn))) {
> -					if (++order >= MAX_ORDER) {
> -						outer_pfn = pfn;
> -						break;
> -					}
> +					/* stop if we cannot find the free page */
> +					if (++order >= MAX_ORDER)
> +						goto failed;
>  					outer_pfn &= ~0UL << order;
>  				}
>  				pfn = outer_pfn;
> @@ -435,7 +463,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  	return 0;
>  failed:
>  	/* restore the original migratetype */
> -	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
> +	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
>  	return -EBUSY;
>  }
>  
> @@ -496,12 +524,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	int ret;
>  
>  	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
> -	ret = isolate_single_pageblock(isolate_start, gfp_flags, false);
> +	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
>  	if (ret)
>  		return ret;
>  
>  	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> -	ret = isolate_single_pageblock(isolate_end, gfp_flags, true);
> +	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
>  	if (ret) {
>  		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
>  		return ret;

Best regards
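As a side note on the `<` to `<=` change in the patch above (fix 2 in the commit message): a compound page whose end lands exactly on boundary_pfn does not cross the pageblock boundary and needs no migration. A tiny sketch with hypothetical pfn values shows the off-by-one:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical values: a 512-page compound page ending exactly on a
 * pageblock boundary. With `<` it would be migrated needlessly;
 * with `<=` it is correctly skipped. */
int main(void)
{
	unsigned long boundary_pfn = 0x80000;
	unsigned long head_pfn = 0x7fe00; /* compound head */
	unsigned long nr_pages = 512;     /* compound page size in pages */

	bool crosses_old = !(head_pfn + nr_pages <  boundary_pfn); /* true: false positive */
	bool crosses_new = !(head_pfn + nr_pages <= boundary_pfn); /* false: correct */

	printf("old check says crosses=%d, fixed check says crosses=%d\n",
	       crosses_old, crosses_new);
	return 0;
}
```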
On 25 May 2022, at 17:48, Marek Szyprowski wrote:

> On 24.05.2022 21:47, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> In isolate_single_pageblock() called by start_isolate_page_range(),
>> there are some pageblock isolation issues causing a potential
>> infinite loop when isolating a page range. This was reported by Qian Cai.
>>
>> 1. the pageblock was isolated by just changing the pageblock migratetype
>>    without checking for unmovable pages. Call set_migratetype_isolate()
>>    to isolate the pageblock properly.
>> 2. an off-by-one error caused pages to be migrated unnecessarily when
>>    the page does not actually cross the pageblock boundary.
>> 3. migrating a compound page across the pageblock boundary and then
>>    splitting the free page later has a small race window in which the
>>    free page might be allocated again, making the code retry and causing
>>    a potential infinite loop. Temporarily set the to-be-migrated page's
>>    pageblock to MIGRATE_ISOLATE to prevent that, and bail out early if
>>    no free page is found after page migration.
>>
>> An additional fix to split_free_page() aims to avoid crashing in
>> __free_one_page(). When the free page is split at the specified
>> split_pfn_offset, free_page_order should check both the lowest set bit
>> of free_page_pfn and the highest set bit of split_pfn_offset and use
>> the smaller of the two. For example, if free_page_pfn=0x10000 and
>> split_pfn_offset=0xc000, the split sizes should be 0x8000 then 0x4000,
>> instead of the 0x4000 then 0x8000 that the original algorithm produced.
>>
>> Fixes: b2c9e2fbba ("mm: make alloc_contig_range work at pageblock granularity")
>> Reported-by: Qian Cai <quic_qiancai@quicinc.com>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> This patch landed in linux next-20220525 as commit 29a8af92b874 ("mm:
> fix a potential infinite loop in start_isolate_page_range()").
> Unfortunately it breaks all CMA allocations done by the DMA-mapping
> framework. I've observed this on ARM 32bit and 64bit. In the logs I only
> see messages like:
>
> cma: cma_alloc: linux,cma: alloc failed, req-size: 128 pages, ret: -16
>
> I will try to analyze it a bit more tomorrow, but it looks like
> isolation always fails.
>

Hi Marek,

Thanks for reporting the issue.

Can you try the patch below to see if it fixes the issue?

Basically, the bug introduced by this commit is that it does not consider
the situation when a range smaller than a pageblock is to be isolated:
the set_migratetype_isolate() in the second isolate_single_pageblock()
called by start_isolate_page_range() then returns with a failure.
Skipping the isolation of the pageblock that has already been isolated
by the first isolate_single_pageblock() solves the issue.

The patch below also includes a fix for the free memory accounting issue.

diff --git a/mm/internal.h b/mm/internal.h
index 64e61b032dac..c0f8fbe0445b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -374,8 +374,8 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
 			phys_addr_t min_addr,
 			int nid, bool exact_nid);
 
-void split_free_page(struct page *free_page,
-			int order, unsigned long split_pfn_offset);
+int split_free_page(struct page *free_page,
+			unsigned int order, unsigned long split_pfn_offset);
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc93a82e51e6..6f6e4649ac21 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1100,30 +1100,44 @@ static inline void __free_one_page(struct page *page,
  * @order:		the order of the page
  * @split_pfn_offset:	split offset within the page
  *
+ * Return -ENOENT if the free page is changed, otherwise 0
+ *
  * It is used when the free page crosses two pageblocks with different migratetypes
  * at split_pfn_offset within the page. The split free page will be put into
  * separate migratetype lists afterwards. Otherwise, the function achieves
  * nothing.
  */
-void split_free_page(struct page *free_page,
-			int order, unsigned long split_pfn_offset)
+int split_free_page(struct page *free_page,
+			unsigned int order, unsigned long split_pfn_offset)
 {
 	struct zone *zone = page_zone(free_page);
 	unsigned long free_page_pfn = page_to_pfn(free_page);
 	unsigned long pfn;
 	unsigned long flags;
 	int free_page_order;
+	int mt;
+	int ret = 0;
 
 	if (split_pfn_offset == 0)
-		return;
+		return ret;
 
 	spin_lock_irqsave(&zone->lock, flags);
+
+	if (!PageBuddy(free_page) || buddy_order(free_page) != order) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	mt = get_pageblock_migratetype(free_page);
+	if (likely(!is_migrate_isolate(mt)))
+		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+
 	del_page_from_free_list(free_page, zone, order);
 	for (pfn = free_page_pfn;
 	     pfn < free_page_pfn + (1UL << order);) {
 		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
 
-		free_page_order = min_t(int,
+		free_page_order = min_t(unsigned int,
					pfn ? __ffs(pfn) : order,
					__fls(split_pfn_offset));
 		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
@@ -1134,7 +1148,9 @@ void split_free_page(struct page *free_page,
 		if (split_pfn_offset == 0)
 			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
 	}
+out:
 	spin_unlock_irqrestore(&zone->lock, flags);
+	return ret;
 }
 /*
  * A bad page could be due to a number of fields. Instead of multiple branches,
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c643c8420809..f539ccf7fb44 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -300,7 +300,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * the in-use page then splitting the free page.
  */
 static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
-			gfp_t gfp_flags, bool isolate_before)
+			gfp_t gfp_flags, bool isolate_before, bool skip_isolation)
 {
 	unsigned char saved_mt;
 	unsigned long start_pfn;
@@ -327,11 +327,16 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			zone->zone_start_pfn);
 
 	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
-	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
-			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
 
-	if (ret)
-		return ret;
+	if (skip_isolation)
+		VM_BUG_ON(!is_migrate_isolate(saved_mt));
+	else {
+		ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
+				isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
+
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Bail out early when the to-be-isolated pageblock does not form
@@ -367,8 +372,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			int order = buddy_order(page);
 
 			if (pfn + (1UL << order) > boundary_pfn)
-				split_free_page(page, order, boundary_pfn - pfn);
-			pfn += (1UL << order);
+				/* free page changed before split, check it again */
+				if (split_free_page(page, order, boundary_pfn - pfn))
+					continue;
+
+			pfn += 1UL << order;
 			continue;
 		}
 		/*
@@ -463,7 +471,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 	return 0;
 failed:
 	/* restore the original migratetype */
-	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
+	if (!skip_isolation)
+		unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
 	return -EBUSY;
 }
 
@@ -522,14 +531,18 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
 	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
 	int ret;
+	bool skip_isolation = false;
 
 	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
-	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
+	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false, skip_isolation);
 	if (ret)
 		return ret;
 
+	if (isolate_start == isolate_end - pageblock_nr_pages)
+		skip_isolation = true;
+
 	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
+	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true, skip_isolation);
 	if (ret) {
 		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
 		return ret;

--
Best Regards,
Yan, Zi
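To illustrate the case this fix targets: when the caller asks to isolate less than one pageblock, both boundary calls of start_isolate_page_range() resolve to the same pageblock, and the second set_migratetype_isolate() then sees an already-isolated block and fails. A minimal userspace sketch of the new condition, with a hypothetical pageblock size of 512 pages and hypothetical pfns:

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL /* hypothetical; depends on pageblock_order in the kernel */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
#define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	/* hypothetical sub-pageblock request, e.g. a small CMA allocation */
	unsigned long start_pfn = 0x80040, end_pfn = 0x800c0;

	unsigned long isolate_start = ALIGN_DOWN(start_pfn, PAGEBLOCK_NR_PAGES);
	unsigned long isolate_end = ALIGN(end_pfn, PAGEBLOCK_NR_PAGES);

	/* Both boundary calls land on the same pageblock, so the second call
	 * must skip set_migratetype_isolate(), which would otherwise see the
	 * already-isolated block and fail. */
	bool skip_isolation = (isolate_start == isolate_end - PAGEBLOCK_NR_PAGES);

	printf("isolate range [0x%lx, 0x%lx), skip second isolation: %d\n",
	       isolate_start, isolate_end, skip_isolation);
	return 0;
}
```

This mirrors the `isolate_start == isolate_end - pageblock_nr_pages` test added in the patch; CMA requests smaller than a pageblock, like the 128-page one in Marek's log, hit exactly this path.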
Hi,

On 26.05.2022 19:32, Zi Yan wrote:
> On 25 May 2022, at 17:48, Marek Szyprowski wrote:
>> On 24.05.2022 21:47, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> In isolate_single_pageblock() called by start_isolate_page_range(),
>>> there are some pageblock isolation issues causing a potential
>>> infinite loop when isolating a page range. This was reported by Qian Cai.
>>>
>>> 1. the pageblock was isolated by just changing the pageblock migratetype
>>>    without checking for unmovable pages. Call set_migratetype_isolate()
>>>    to isolate the pageblock properly.
>>> 2. an off-by-one error caused pages to be migrated unnecessarily when
>>>    the page does not actually cross the pageblock boundary.
>>> 3. migrating a compound page across the pageblock boundary and then
>>>    splitting the free page later has a small race window in which the
>>>    free page might be allocated again, making the code retry and causing
>>>    a potential infinite loop. Temporarily set the to-be-migrated page's
>>>    pageblock to MIGRATE_ISOLATE to prevent that, and bail out early if
>>>    no free page is found after page migration.
>>>
>>> An additional fix to split_free_page() aims to avoid crashing in
>>> __free_one_page(). When the free page is split at the specified
>>> split_pfn_offset, free_page_order should check both the lowest set bit
>>> of free_page_pfn and the highest set bit of split_pfn_offset and use
>>> the smaller of the two. For example, if free_page_pfn=0x10000 and
>>> split_pfn_offset=0xc000, the split sizes should be 0x8000 then 0x4000,
>>> instead of the 0x4000 then 0x8000 that the original algorithm produced.
>>>
>>> Fixes: b2c9e2fbba ("mm: make alloc_contig_range work at pageblock granularity")
>>> Reported-by: Qian Cai <quic_qiancai@quicinc.com>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>
>> This patch landed in linux next-20220525 as commit 29a8af92b874 ("mm:
>> fix a potential infinite loop in start_isolate_page_range()").
>> Unfortunately it breaks all CMA allocations done by the DMA-mapping
>> framework. I've observed this on ARM 32bit and 64bit. In the logs I only
>> see messages like:
>>
>> cma: cma_alloc: linux,cma: alloc failed, req-size: 128 pages, ret: -16
>>
>> I will try to analyze it a bit more tomorrow, but it looks like
>> isolation always fails.
>>
> Hi Marek,
>
> Thanks for reporting the issue.
>
> Can you try the patch below to see if it fixes the issue?
>
> Basically, the bug introduced by this commit is that it does not consider
> the situation when a range smaller than a pageblock is to be isolated:
> the set_migratetype_isolate() in the second isolate_single_pageblock()
> called by start_isolate_page_range() then returns with a failure.
> Skipping the isolation of the pageblock that has already been isolated
> by the first isolate_single_pageblock() solves the issue.
>
> The patch below also includes a fix for the free memory accounting issue.

This patch fixed the issue, thanks!

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> diff --git a/mm/internal.h b/mm/internal.h
> index 64e61b032dac..c0f8fbe0445b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -374,8 +374,8 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
>  			phys_addr_t min_addr,
>  			int nid, bool exact_nid);
>  
> -void split_free_page(struct page *free_page,
> -			int order, unsigned long split_pfn_offset);
> +int split_free_page(struct page *free_page,
> +			unsigned int order, unsigned long split_pfn_offset);
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bc93a82e51e6..6f6e4649ac21 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1100,30 +1100,44 @@ static inline void __free_one_page(struct page *page,
>   * @order:		the order of the page
>   * @split_pfn_offset:	split offset within the page
>   *
> + * Return -ENOENT if the free page is changed, otherwise 0
> + *
>   * It is used when the free page crosses two pageblocks with different migratetypes
>   * at split_pfn_offset within the page. The split free page will be put into
>   * separate migratetype lists afterwards. Otherwise, the function achieves
>   * nothing.
>   */
> -void split_free_page(struct page *free_page,
> -			int order, unsigned long split_pfn_offset)
> +int split_free_page(struct page *free_page,
> +			unsigned int order, unsigned long split_pfn_offset)
>  {
>  	struct zone *zone = page_zone(free_page);
>  	unsigned long free_page_pfn = page_to_pfn(free_page);
>  	unsigned long pfn;
>  	unsigned long flags;
>  	int free_page_order;
> +	int mt;
> +	int ret = 0;
>  
>  	if (split_pfn_offset == 0)
> -		return;
> +		return ret;
>  
>  	spin_lock_irqsave(&zone->lock, flags);
> +
> +	if (!PageBuddy(free_page) || buddy_order(free_page) != order) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	mt = get_pageblock_migratetype(free_page);
> +	if (likely(!is_migrate_isolate(mt)))
> +		__mod_zone_freepage_state(zone, -(1UL << order), mt);
> +
>  	del_page_from_free_list(free_page, zone, order);
>  	for (pfn = free_page_pfn;
>  	     pfn < free_page_pfn + (1UL << order);) {
>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>  
> -		free_page_order = min_t(int,
> +		free_page_order = min_t(unsigned int,
>					pfn ? __ffs(pfn) : order,
>					__fls(split_pfn_offset));
>  		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
> @@ -1134,7 +1148,9 @@ void split_free_page(struct page *free_page,
>  		if (split_pfn_offset == 0)
>  			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
>  	}
> +out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
> +	return ret;
>  }
>  /*
>   * A bad page could be due to a number of fields. Instead of multiple branches,
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index c643c8420809..f539ccf7fb44 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -300,7 +300,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * the in-use page then splitting the free page.
>   */
>  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> -			gfp_t gfp_flags, bool isolate_before)
> +			gfp_t gfp_flags, bool isolate_before, bool skip_isolation)
>  {
>  	unsigned char saved_mt;
>  	unsigned long start_pfn;
> @@ -327,11 +327,16 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  			zone->zone_start_pfn);
>  
>  	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
> -	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
> -			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
>  
> -	if (ret)
> -		return ret;
> +	if (skip_isolation)
> +		VM_BUG_ON(!is_migrate_isolate(saved_mt));
> +	else {
> +		ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
> +				isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
> +
> +		if (ret)
> +			return ret;
> +	}
>  
>  	/*
>  	 * Bail out early when the to-be-isolated pageblock does not form
> @@ -367,8 +372,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  			int order = buddy_order(page);
>  
>  			if (pfn + (1UL << order) > boundary_pfn)
> -				split_free_page(page, order, boundary_pfn - pfn);
> -			pfn += (1UL << order);
> +				/* free page changed before split, check it again */
> +				if (split_free_page(page, order, boundary_pfn - pfn))
> +					continue;
> +
> +			pfn += 1UL << order;
>  			continue;
>  		}
>  		/*
> @@ -463,7 +471,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  	return 0;
>  failed:
>  	/* restore the original migratetype */
> -	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
> +	if (!skip_isolation)
> +		unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
>  	return -EBUSY;
>  }
>  
> @@ -522,14 +531,18 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
>  	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
>  	int ret;
> +	bool skip_isolation = false;
>  
>  	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
> -	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
> +	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false, skip_isolation);
>  	if (ret)
>  		return ret;
>  
> +	if (isolate_start == isolate_end - pageblock_nr_pages)
> +		skip_isolation = true;
> +
>  	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> -	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
> +	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true, skip_isolation);
>  	if (ret) {
>  		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
>  		return ret;
>
> --
> Best Regards,
> Yan, Zi

Best regards
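The other half of the fix, returning -ENOENT from split_free_page() when the page changed before the zone lock was taken, turns the old silent no-op into a retry: the caller's `continue` re-examines the same pfn instead of advancing over stale state. A userspace mock of that control flow (the function name, the `stolen` race flag, and all values are illustrative, not kernel API):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Mock of the split_free_page() retry: the scan only advances when the
 * split succeeds; if the page changed under us (simulated by 'stolen'),
 * the caller re-checks the same pfn. */
static bool stolen = true; /* first attempt races with an allocation */

static int mock_split_free_page(unsigned long pfn)
{
	(void)pfn;
	if (stolen) {          /* kernel: !PageBuddy() || buddy_order() != order */
		stolen = false;
		return -ENOENT; /* free page changed before split */
	}
	return 0;
}

int main(void)
{
	unsigned long pfn = 0x80000, end = 0x80200, chunk_pages = 0x200;

	while (pfn < end) {
		if (mock_split_free_page(pfn)) {
			printf("pfn 0x%lx changed, re-checking\n", pfn);
			continue; /* re-examine the same pfn */
		}
		pfn += chunk_pages;
	}
	printf("done at pfn 0x%lx\n", pfn);
	return 0;
}
```

In the real code the re-check does more than retry the split: the page may no longer be a buddy page at all, in which case a different branch of the scan loop handles it.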
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 267599dd9706..6eec0211e0be 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
 	unsigned long flags;
 	int free_page_order;
 
+	if (split_pfn_offset == 0)
+		return;
+
 	spin_lock_irqsave(&zone->lock, flags);
 	del_page_from_free_list(free_page, zone, order);
 	for (pfn = free_page_pfn;
 	     pfn < free_page_pfn + (1UL << order);) {
 		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
 
-		free_page_order = ffs(split_pfn_offset) - 1;
+		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));
 		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
 				mt, FPI_NONE);
 		pfn += 1UL << free_page_order;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b3f074d1682e..c643c8420809 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -283,6 +283,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * isolate_single_pageblock() -- tries to isolate a pageblock that might be
  * within a free or in-use page.
  * @boundary_pfn: pageblock-aligned pfn that a page might cross
+ * @flags: isolation flags
  * @gfp_flags: GFP flags used for migrating pages
  * @isolate_before: isolate the pageblock before the boundary_pfn
  *
@@ -298,14 +299,15 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * either. The function handles this by splitting the free page or migrating
  * the in-use page then splitting the free page.
  */
-static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
-			bool isolate_before)
+static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
+			gfp_t gfp_flags, bool isolate_before)
 {
 	unsigned char saved_mt;
 	unsigned long start_pfn;
 	unsigned long isolate_pageblock;
 	unsigned long pfn;
 	struct zone *zone;
+	int ret;
 
 	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
 
@@ -325,7 +327,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			zone->zone_start_pfn);
 
 	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
-	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
+	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
+			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
+
+	if (ret)
+		return ret;
 
 	/*
 	 * Bail out early when the to-be-isolated pageblock does not form
@@ -374,7 +380,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);
 
-			if (head_pfn + nr_pages < boundary_pfn) {
+			if (head_pfn + nr_pages <= boundary_pfn) {
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
@@ -386,7 +392,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
 				int order;
 				unsigned long outer_pfn;
-				int ret;
+				int page_mt = get_pageblock_migratetype(page);
+				bool isolate_page = !is_migrate_isolate_page(page);
 				struct compact_control cc = {
 					.nr_migratepages = 0,
 					.order = -1,
@@ -399,9 +406,31 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				};
 				INIT_LIST_HEAD(&cc.migratepages);
 
+				/*
+				 * XXX: mark the page as MIGRATE_ISOLATE so that
+				 * no one else can grab the freed page after migration.
+				 * Ideally, the page should be freed as two separate
+				 * pages to be added into separate migratetype free
+				 * lists.
+				 */
+				if (isolate_page) {
+					ret = set_migratetype_isolate(page, page_mt,
+						flags, head_pfn, head_pfn + nr_pages);
+					if (ret)
+						goto failed;
+				}
+
 				ret = __alloc_contig_migrate_range(&cc, head_pfn,
 							head_pfn + nr_pages);
 
+				/*
+				 * restore the page's migratetype so that it can
+				 * be split into separate migratetype free lists
+				 * later.
+				 */
+				if (isolate_page)
+					unset_migratetype_isolate(page, page_mt);
+
 				if (ret)
 					goto failed;
 				/*
@@ -417,10 +446,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				order = 0;
 				outer_pfn = pfn;
 				while (!PageBuddy(pfn_to_page(outer_pfn))) {
-					if (++order >= MAX_ORDER) {
-						outer_pfn = pfn;
-						break;
-					}
+					/* stop if we cannot find the free page */
+					if (++order >= MAX_ORDER)
+						goto failed;
 					outer_pfn &= ~0UL << order;
 				}
 				pfn = outer_pfn;
@@ -435,7 +463,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 	return 0;
 failed:
 	/* restore the original migratetype */
-	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
+	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
 	return -EBUSY;
 }
 
@@ -496,12 +524,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
-	ret = isolate_single_pageblock(isolate_start, gfp_flags, false);
+	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
 	if (ret)
 		return ret;
 
 	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, gfp_flags, true);
+	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
 	if (ret) {
 		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
 		return ret;