Message ID | 20221124095523.31061-1-gshan@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm: migrate: Fix THP's mapcount on isolation | expand |
On 24.11.22 10:55, Gavin Shan wrote: > The issue is reported when removing memory through virtio_mem device. > The transparent huge page, experienced copy-on-write fault, is wrongly > regarded as pinned. The transparent huge page is escaped from being > isolated in isolate_migratepages_block(). The transparent huge page > can't be migrated and the corresponding memory block can't be put > into offline state. > > Fix it by replacing page_mapcount() with total_mapcount(). With this, > the transparent huge page can be isolated and migrated, and the memory > block can be put into offline state. Besides, The page's refcount is > increased a bit earlier to avoid the page is released when the check > is executed. Did you look into handling pages that are in the swapcache case as well? See is_refcount_suitable() in mm/khugepaged.c. Should be easy to reproduce, let me know if you need inspiration. > > Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") > Cc: stable@vger.kernel.org # v5.7+ > Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> > --- > v2: Corrected fix tag and increase page's refcount before the check > --- > mm/compaction.c | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index c51f7f545afe..1f6da31dd9a5 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, > goto isolate_fail; > } > > + /* > + * Be careful not to clear PageLRU until after we're > + * sure the page is not being freed elsewhere -- the > + * page release code relies on it. > + */ > + if (unlikely(!get_page_unless_zero(page))) > + goto isolate_fail; > + > /* > * Migration will fail if an anonymous page is pinned in memory, > * so avoid taking lru_lock and isolating it unnecessarily in an > * admittedly racy check. > */ > mapping = page_mapping(page); > - if (!mapping && page_count(page) > page_mapcount(page)) > - goto isolate_fail; > + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) > + goto isolate_fail_put; > > /* > * Only allow to migrate anonymous pages in GFP_NOFS context > * because those do not depend on fs locks. > */ > if (!(cc->gfp_mask & __GFP_FS) && mapping) > - goto isolate_fail; > - > - /* > - * Be careful not to clear PageLRU until after we're > - * sure the page is not being freed elsewhere -- the > - * page release code relies on it. > - */ > - if (unlikely(!get_page_unless_zero(page))) > - goto isolate_fail; > + goto isolate_fail_put; > > /* Only take pages on LRU: a check now makes later tests safe */ > if (!PageLRU(page))
On 11/24/22 6:09 PM, David Hildenbrand wrote: > On 24.11.22 10:55, Gavin Shan wrote: >> The issue is reported when removing memory through virtio_mem device. >> The transparent huge page, experienced copy-on-write fault, is wrongly >> regarded as pinned. The transparent huge page is escaped from being >> isolated in isolate_migratepages_block(). The transparent huge page >> can't be migrated and the corresponding memory block can't be put >> into offline state. >> >> Fix it by replacing page_mapcount() with total_mapcount(). With this, >> the transparent huge page can be isolated and migrated, and the memory >> block can be put into offline state. Besides, The page's refcount is >> increased a bit earlier to avoid the page is released when the check >> is executed. > > Did you look into handling pages that are in the swapcache case as well? > > See is_refcount_suitable() in mm/khugepaged.c. > > Should be easy to reproduce, let me know if you need inspiration. > Nope, I didn't look into the case. Please elaborate the details so that I can reproduce it firstly. >> >> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") >> Cc: stable@vger.kernel.org # v5.7+ >> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> >> Suggested-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> --- >> v2: Corrected fix tag and increase page's refcount before the check >> --- >> mm/compaction.c | 22 +++++++++++----------- >> 1 file changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index c51f7f545afe..1f6da31dd9a5 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, >> goto isolate_fail; >> } >> + /* >> + * Be careful not to clear PageLRU until after we're >> + * sure the page is not being freed elsewhere -- the >> + * page release code relies on it. >> + */ >> + if (unlikely(!get_page_unless_zero(page))) >> + goto isolate_fail; >> + >> /* >> * Migration will fail if an anonymous page is pinned in memory, >> * so avoid taking lru_lock and isolating it unnecessarily in an >> * admittedly racy check. >> */ >> mapping = page_mapping(page); >> - if (!mapping && page_count(page) > page_mapcount(page)) >> - goto isolate_fail; >> + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) >> + goto isolate_fail_put; >> /* >> * Only allow to migrate anonymous pages in GFP_NOFS context >> * because those do not depend on fs locks. >> */ >> if (!(cc->gfp_mask & __GFP_FS) && mapping) >> - goto isolate_fail; >> - >> - /* >> - * Be careful not to clear PageLRU until after we're >> - * sure the page is not being freed elsewhere -- the >> - * page release code relies on it. >> - */ >> - if (unlikely(!get_page_unless_zero(page))) >> - goto isolate_fail; >> + goto isolate_fail_put; >> /* Only take pages on LRU: a check now makes later tests safe */ >> if (!PageLRU(page)) > Thanks, Gavin
On 24.11.22 11:21, Gavin Shan wrote: > On 11/24/22 6:09 PM, David Hildenbrand wrote: >> On 24.11.22 10:55, Gavin Shan wrote: >>> The issue is reported when removing memory through virtio_mem device. >>> The transparent huge page, experienced copy-on-write fault, is wrongly >>> regarded as pinned. The transparent huge page is escaped from being >>> isolated in isolate_migratepages_block(). The transparent huge page >>> can't be migrated and the corresponding memory block can't be put >>> into offline state. >>> >>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>> the transparent huge page can be isolated and migrated, and the memory >>> block can be put into offline state. Besides, The page's refcount is >>> increased a bit earlier to avoid the page is released when the check >>> is executed. >> >> Did you look into handling pages that are in the swapcache case as well? >> >> See is_refcount_suitable() in mm/khugepaged.c. >> >> Should be easy to reproduce, let me know if you need inspiration. >> > > Nope, I didn't look into the case. Please elaborate the details so that > I can reproduce it firstly. A simple reproducer would be (on a system with ordinary swap (not zram)) 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP 2) Enable THP for that region (MADV_HUGEPAGE) 3) Populate a THP (e.g., write access) 4) PTE-map the THP, for example, using MADV_FREE on the last subpage 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT 6) Read-access to some subpages to fault them in from the swapcache Now you'd have a THP, which 1) Is partially PTE-mapped into the page table 2) Is in the swapcache (each subpage should have one reference from the swapache) Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
On 24 Nov 2022, at 5:43, David Hildenbrand wrote: > On 24.11.22 11:21, Gavin Shan wrote: >> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>> On 24.11.22 10:55, Gavin Shan wrote: >>>> The issue is reported when removing memory through virtio_mem device. >>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>> regarded as pinned. The transparent huge page is escaped from being >>>> isolated in isolate_migratepages_block(). The transparent huge page >>>> can't be migrated and the corresponding memory block can't be put >>>> into offline state. >>>> >>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>> the transparent huge page can be isolated and migrated, and the memory >>>> block can be put into offline state. Besides, The page's refcount is >>>> increased a bit earlier to avoid the page is released when the check >>>> is executed. >>> >>> Did you look into handling pages that are in the swapcache case as well? >>> >>> See is_refcount_suitable() in mm/khugepaged.c. >>> >>> Should be easy to reproduce, let me know if you need inspiration. >>> >> >> Nope, I didn't look into the case. Please elaborate the details so that >> I can reproduce it firstly. > > > A simple reproducer would be (on a system with ordinary swap (not zram)) > > 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > > 2) Enable THP for that region (MADV_HUGEPAGE) > > 3) Populate a THP (e.g., write access) > > 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > > 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT Added the original THP swapout code author, Ying. At this step, the THP will be split, right? https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786 Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap() then swapped out. But I cannot find that split code now. > > 6) Read-access to some subpages to fault them in from the swapcache > > > Now you'd have a THP, which > > 1) Is partially PTE-mapped into the page table > 2) Is in the swapcache (each subpage should have one reference from the swapache) > > > Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > > -- > Thanks, > > David / dhildenb -- Best Regards, Yan Zi
On 11/24/22 6:43 PM, David Hildenbrand wrote: > On 24.11.22 11:21, Gavin Shan wrote: >> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>> On 24.11.22 10:55, Gavin Shan wrote: >>>> The issue is reported when removing memory through virtio_mem device. >>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>> regarded as pinned. The transparent huge page is escaped from being >>>> isolated in isolate_migratepages_block(). The transparent huge page >>>> can't be migrated and the corresponding memory block can't be put >>>> into offline state. >>>> >>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>> the transparent huge page can be isolated and migrated, and the memory >>>> block can be put into offline state. Besides, The page's refcount is >>>> increased a bit earlier to avoid the page is released when the check >>>> is executed. >>> >>> Did you look into handling pages that are in the swapcache case as well? >>> >>> See is_refcount_suitable() in mm/khugepaged.c. >>> >>> Should be easy to reproduce, let me know if you need inspiration. >>> >> >> Nope, I didn't look into the case. Please elaborate the details so that >> I can reproduce it firstly. > > > A simple reproducer would be (on a system with ordinary swap (not zram)) > > 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > > 2) Enable THP for that region (MADV_HUGEPAGE) > > 3) Populate a THP (e.g., write access) > > 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > > 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > > 6) Read-access to some subpages to fault them in from the swapcache > > > Now you'd have a THP, which > > 1) Is partially PTE-mapped into the page table > 2) Is in the swapcache (each subpage should have one reference from the swapache) > > > Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > Thanks for the details. Step (4) and (5) can be actually combined. To swap part of the THP (e.g. one sub-page) will force the THP to be split. I followed your steps in the attached program, there is no issue to do memory hot-remove through virtio-mem with or without this patch. # numactl -p 1 testsuite mm swap -k Any key to split THP Any key to swap sub-pages Any key to read the swapped sub-pages Page[000]: 0xffffffffffffffff Page[001]: 0xffffffffffffffff : Page[255]: 0xffffffffffffffff Any key to exit // hold here and the program doesn't exit (qemu) qom-set vm1 requested-size 0 [ 356.005396] virtio_mem virtio1: plugged size: 0x40000000 [ 356.005996] virtio_mem virtio1: requested size: 0x0 [ 356.350299] Fallback order for Node 0: 0 1 [ 356.350810] Fallback order for Node 1: 1 0 [ 356.351260] Built 2 zonelists, mobility grouping on. Total pages: 491343 [ 356.351998] Policy zone: DMA Thanks, Gavin
On 24.11.22 13:38, Zi Yan wrote: > > On 24 Nov 2022, at 5:43, David Hildenbrand wrote: > >> On 24.11.22 11:21, Gavin Shan wrote: >>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>> The issue is reported when removing memory through virtio_mem device. >>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>> regarded as pinned. The transparent huge page is escaped from being >>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>> can't be migrated and the corresponding memory block can't be put >>>>> into offline state. >>>>> >>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>> the transparent huge page can be isolated and migrated, and the memory >>>>> block can be put into offline state. Besides, The page's refcount is >>>>> increased a bit earlier to avoid the page is released when the check >>>>> is executed. >>>> >>>> Did you look into handling pages that are in the swapcache case as well? >>>> >>>> See is_refcount_suitable() in mm/khugepaged.c. >>>> >>>> Should be easy to reproduce, let me know if you need inspiration. >>>> >>> >>> Nope, I didn't look into the case. Please elaborate the details so that >>> I can reproduce it firstly. >> >> >> A simple reproducer would be (on a system with ordinary swap (not zram)) >> >> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >> >> 2) Enable THP for that region (MADV_HUGEPAGE) >> >> 3) Populate a THP (e.g., write access) >> >> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >> >> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > > Added the original THP swapout code author, Ying. > > At this step, the THP will be split, right? > > https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786 > > Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap() > then swapped out. But I cannot find that split code now. I recall there was some sequence to achieve it. Maybe it was swapping out the PMD first and not triggering a PTE-mapping first. mm/vmscan.c:shrink_folio_list() if (folio_test_large(folio)) { /* cannot split folio, skip it */ if (!can_split_folio(folio, NULL)) goto activate_locked; /* * Split folios without a PMD map right * away. Chances are some or all of the * tail pages can be freed without IO. */ if (!folio_entire_mapcount(folio) && split_folio_to_list(folio, folio_list)) goto activate_locked; } } So the sequence might have to be 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP 2) Enable THP for that region (MADV_HUGEPAGE) 3) Populate a THP (e.g., write access) 4) Trigger swapout of the THP, for example, using MADV_PAGEOUT 5) Access some subpage As we don't have PMD swap entries, we will PTE-map the THP during try_to_unmap() IIRC. Independent of that, the check we have here also doesn't consider ordinary order-0 pages that might be in the swapache.
On 24.11.22 13:55, Gavin Shan wrote: > On 11/24/22 6:43 PM, David Hildenbrand wrote: >> On 24.11.22 11:21, Gavin Shan wrote: >>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>> The issue is reported when removing memory through virtio_mem device. >>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>> regarded as pinned. The transparent huge page is escaped from being >>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>> can't be migrated and the corresponding memory block can't be put >>>>> into offline state. >>>>> >>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>> the transparent huge page can be isolated and migrated, and the memory >>>>> block can be put into offline state. Besides, The page's refcount is >>>>> increased a bit earlier to avoid the page is released when the check >>>>> is executed. >>>> >>>> Did you look into handling pages that are in the swapcache case as well? >>>> >>>> See is_refcount_suitable() in mm/khugepaged.c. >>>> >>>> Should be easy to reproduce, let me know if you need inspiration. >>>> >>> >>> Nope, I didn't look into the case. Please elaborate the details so that >>> I can reproduce it firstly. >> >> >> A simple reproducer would be (on a system with ordinary swap (not zram)) >> >> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >> >> 2) Enable THP for that region (MADV_HUGEPAGE) >> >> 3) Populate a THP (e.g., write access) >> >> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >> >> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT >> >> 6) Read-access to some subpages to fault them in from the swapcache >> >> >> Now you'd have a THP, which >> >> 1) Is partially PTE-mapped into the page table >> 2) Is in the swapcache (each subpage should have one reference from the swapache) >> >> >> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). >> > > Thanks for the details. Step (4) and (5) can be actually combined. To swap part of > the THP (e.g. one sub-page) will force the THP to be split. > > I followed your steps in the attached program, there is no issue to do memory hot-remove > through virtio-mem with or without this patch. Interesting. But I don't really see how we could pass this check with a page that's in the swapcache, maybe I'm missing something else. I'll try to see if I can reproduce it.
On 24.11.22 14:22, David Hildenbrand wrote: > On 24.11.22 13:55, Gavin Shan wrote: >> On 11/24/22 6:43 PM, David Hildenbrand wrote: >>> On 24.11.22 11:21, Gavin Shan wrote: >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote: >>>>> On 24.11.22 10:55, Gavin Shan wrote: >>>>>> The issue is reported when removing memory through virtio_mem device. >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly >>>>>> regarded as pinned. The transparent huge page is escaped from being >>>>>> isolated in isolate_migratepages_block(). The transparent huge page >>>>>> can't be migrated and the corresponding memory block can't be put >>>>>> into offline state. >>>>>> >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, >>>>>> the transparent huge page can be isolated and migrated, and the memory >>>>>> block can be put into offline state. Besides, The page's refcount is >>>>>> increased a bit earlier to avoid the page is released when the check >>>>>> is executed. >>>>> >>>>> Did you look into handling pages that are in the swapcache case as well? >>>>> >>>>> See is_refcount_suitable() in mm/khugepaged.c. >>>>> >>>>> Should be easy to reproduce, let me know if you need inspiration. >>>>> >>>> >>>> Nope, I didn't look into the case. Please elaborate the details so that >>>> I can reproduce it firstly. >>> >>> >>> A simple reproducer would be (on a system with ordinary swap (not zram)) >>> >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP >>> >>> 2) Enable THP for that region (MADV_HUGEPAGE) >>> >>> 3) Populate a THP (e.g., write access) >>> >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage >>> >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT >>> >>> 6) Read-access to some subpages to fault them in from the swapcache >>> >>> >>> Now you'd have a THP, which >>> >>> 1) Is partially PTE-mapped into the page table >>> 2) Is in the swapcache (each subpage should have one reference from the swapache) >>> >>> >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). >>> >> >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of >> the THP (e.g. one sub-page) will force the THP to be split. >> >> I followed your steps in the attached program, there is no issue to do memory hot-remove >> through virtio-mem with or without this patch. > > Interesting. But I don't really see how we could pass this check with a > page that's in the swapcache, maybe I'm missing something else. > > I'll try to see if I can reproduce it. > After some unsuccessful attempts and many head-scratches, I realized that it's quite simple why we don't have to worry about swapcache pages here: page_mapping() is != NULL for pages in the swapcache: folio_mapping() makes this rather obvious: if (unlikely(folio_test_swapcache(folio)) return swap_address_space(folio_swap_entry(folio)); I think the get_page_unless_zero() might also be a fix for the page_mapping() call, smells like something could blow up on concurrent page freeing. (what about concurrent removal from the swapcache? nobody knows :) ) Thanks Gavin! Acked-by: David Hildenbrand <david@redhat.com>
With the patch applied, I'm unable to hit memory hot-remove failure in the environment where the issue was initially found. Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> On Thu, Nov 24, 2022 at 10:09 PM David Hildenbrand <david@redhat.com> wrote: > > On 24.11.22 14:22, David Hildenbrand wrote: > > On 24.11.22 13:55, Gavin Shan wrote: > >> On 11/24/22 6:43 PM, David Hildenbrand wrote: > >>> On 24.11.22 11:21, Gavin Shan wrote: > >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote: > >>>>> On 24.11.22 10:55, Gavin Shan wrote: > >>>>>> The issue is reported when removing memory through virtio_mem device. > >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly > >>>>>> regarded as pinned. The transparent huge page is escaped from being > >>>>>> isolated in isolate_migratepages_block(). The transparent huge page > >>>>>> can't be migrated and the corresponding memory block can't be put > >>>>>> into offline state. > >>>>>> > >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this, > >>>>>> the transparent huge page can be isolated and migrated, and the memory > >>>>>> block can be put into offline state. Besides, The page's refcount is > >>>>>> increased a bit earlier to avoid the page is released when the check > >>>>>> is executed. > >>>>> > >>>>> Did you look into handling pages that are in the swapcache case as well? > >>>>> > >>>>> See is_refcount_suitable() in mm/khugepaged.c. > >>>>> > >>>>> Should be easy to reproduce, let me know if you need inspiration. > >>>>> > >>>> > >>>> Nope, I didn't look into the case. Please elaborate the details so that > >>>> I can reproduce it firstly. > >>> > >>> > >>> A simple reproducer would be (on a system with ordinary swap (not zram)) > >>> > >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP > >>> > >>> 2) Enable THP for that region (MADV_HUGEPAGE) > >>> > >>> 3) Populate a THP (e.g., write access) > >>> > >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage > >>> > >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT > >>> > >>> 6) Read-access to some subpages to fault them in from the swapcache > >>> > >>> > >>> Now you'd have a THP, which > >>> > >>> 1) Is partially PTE-mapped into the page table > >>> 2) Is in the swapcache (each subpage should have one reference from the swapache) > >>> > >>> > >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem). > >>> > >> > >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of > >> the THP (e.g. one sub-page) will force the THP to be split. > >> > >> I followed your steps in the attached program, there is no issue to do memory hot-remove > >> through virtio-mem with or without this patch. > > > > Interesting. But I don't really see how we could pass this check with a > > page that's in the swapcache, maybe I'm missing something else. > > > > I'll try to see if I can reproduce it. > > > > After some unsuccessful attempts and many head-scratches, I realized > that it's quite simple why we don't have to worry about swapcache pages > here: > > page_mapping() is != NULL for pages in the swapcache: folio_mapping() > makes this rather obvious: > > if (unlikely(folio_test_swapcache(folio)) > return swap_address_space(folio_swap_entry(folio)); > > > I think the get_page_unless_zero() might also be a fix for the > page_mapping() call, smells like something could blow up on concurrent > page freeing. (what about concurrent removal from the swapcache? nobody > knows :) ) > > > Thanks Gavin! > > Acked-by: David Hildenbrand <david@redhat.com> > > > -- > Thanks, > > David / dhildenb >
diff --git a/mm/compaction.c b/mm/compaction.c index c51f7f545afe..1f6da31dd9a5 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, goto isolate_fail; } + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto isolate_fail; + /* * Migration will fail if an anonymous page is pinned in memory, * so avoid taking lru_lock and isolating it unnecessarily in an * admittedly racy check. */ mapping = page_mapping(page); - if (!mapping && page_count(page) > page_mapcount(page)) - goto isolate_fail; + if (!mapping && (page_count(page) - 1) > total_mapcount(page)) + goto isolate_fail_put; /* * Only allow to migrate anonymous pages in GFP_NOFS context * because those do not depend on fs locks. */ if (!(cc->gfp_mask & __GFP_FS) && mapping) - goto isolate_fail; - - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (unlikely(!get_page_unless_zero(page))) - goto isolate_fail; + goto isolate_fail_put; /* Only take pages on LRU: a check now makes later tests safe */ if (!PageLRU(page))
The issue is reported when removing memory through virtio_mem device. The transparent huge page, experienced copy-on-write fault, is wrongly regarded as pinned. The transparent huge page is escaped from being isolated in isolate_migratepages_block(). The transparent huge page can't be migrated and the corresponding memory block can't be put into offline state. Fix it by replacing page_mapcount() with total_mapcount(). With this, the transparent huge page can be isolated and migrated, and the memory block can be put into offline state. Besides, The page's refcount is increased a bit earlier to avoid the page is released when the check is executed. Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> --- v2: Corrected fix tag and increase page's refcount before the check --- mm/compaction.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)