diff mbox series

[hotfix] mm/migrate: fix kernel BUG at mm/compaction.c:2761!

Message ID 46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com (mailing list archive)
State New
Headers show
Series [hotfix] mm/migrate: fix kernel BUG at mm/compaction.c:2761! | expand

Commit Message

Hugh Dickins June 12, 2024, 5:06 a.m. UTC
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
and if DEBUG_VM were off, then pages would be lost on a local list.

Our convention is that if migrate_pages() reports complete success (0),
then the migratepages list will be empty; but if it reports an error or
some pages remaining, then its caller must putback_movable_pages().

There's a new case in which migrate_pages() has been reporting complete
success, but returning with pages left on the migratepages list: when
migrate_pages_batch() successfully split a folio on the deferred list,
but then the "Failure isn't counted" call does not dispose of them all.

Since that block is expecting the large folio to have been counted as 1
failure already, and since the return code is later adjusted to success
whenever the returned list is found empty, the simple way to fix this
safely is to count splitting the deferred folio as "a failure".

Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
A hotfix to 6.10-rc, not needed for stable.

 mm/migrate.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Huang, Ying June 12, 2024, 6:19 a.m. UTC | #1
Hugh Dickins <hughd@google.com> writes:

> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
> and if DEBUG_VM were off, then pages would be lost on a local list.
>
> Our convention is that if migrate_pages() reports complete success (0),
> then the migratepages list will be empty; but if it reports an error or
> some pages remaining, then its caller must putback_movable_pages().
>
> There's a new case in which migrate_pages() has been reporting complete
> success, but returning with pages left on the migratepages list: when
> migrate_pages_batch() successfully split a folio on the deferred list,
> but then the "Failure isn't counted" call does not dispose of them all.
>
> Since that block is expecting the large folio to have been counted as 1
> failure already, and since the return code is later adjusted to success
> whenever the returned list is found empty, the simple way to fix this
> safely is to count splitting the deferred folio as "a failure".
>
> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> A hotfix to 6.10-rc, not needed for stable.
>
>  mm/migrate.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
>  
>  			/*
>  			 * The rare folio on the deferred split list should
> -			 * be split now. It should not count as a failure.
> +			 * be split now. It should not count as a failure:
> +			 * but increment nr_failed because, without doing so,
> +			 * migrate_pages() may report success with (split but
> +			 * unmigrated) pages still on its fromlist; whereas it
> +			 * always reports success when its fromlist is empty.
> +			 *
>  			 * Only check it without removing it from the list.
>  			 * Since the folio can be on deferred_split_scan()
>  			 * local list and removing it can cause the local list
> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
>  			if (nr_pages > 2 &&
>  			   !list_empty(&folio->_deferred_list)) {
>  				if (try_split_folio(folio, split_folios) == 0) {
> +					nr_failed++;

It appears better to add

        stats->nr_thp_failed++;

too.  Otherwise, if migrate_pages_batch() is called via migrate_pages(,
MIGRATE_ASYNC, ), nr_thp_failed will not increase.  But if
migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ),
nr_thp_failed will increase in migrate_pages_sync() via

        stats->nr_thp_failed += astats.nr_thp_split;

That is, they are not consistent.  The issue exists since commit
7262f208ca68 ("mm/migrate: split source folio if it is on deferred split
list").

Otherwise, this looks good to me.  Thanks!

>  					stats->nr_thp_split += is_thp;
>  					stats->nr_split++;
>  					continue;

--
Best Regards,
Huang, Ying
Hugh Dickins June 12, 2024, 7:11 a.m. UTC | #2
On Wed, 12 Jun 2024, Huang, Ying wrote:
> Hugh Dickins <hughd@google.com> writes:
> 
> > I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
> > and if DEBUG_VM were off, then pages would be lost on a local list.
> >
> > Our convention is that if migrate_pages() reports complete success (0),
> > then the migratepages list will be empty; but if it reports an error or
> > some pages remaining, then its caller must putback_movable_pages().
> >
> > There's a new case in which migrate_pages() has been reporting complete
> > success, but returning with pages left on the migratepages list: when
> > migrate_pages_batch() successfully split a folio on the deferred list,
> > but then the "Failure isn't counted" call does not dispose of them all.
> >
> > Since that block is expecting the large folio to have been counted as 1
> > failure already, and since the return code is later adjusted to success
> > whenever the returned list is found empty, the simple way to fix this
> > safely is to count splitting the deferred folio as "a failure".
> >
> > Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > A hotfix to 6.10-rc, not needed for stable.
> >
> >  mm/migrate.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
> >  
> >  			/*
> >  			 * The rare folio on the deferred split list should
> > -			 * be split now. It should not count as a failure.
> > +			 * be split now. It should not count as a failure:
> > +			 * but increment nr_failed because, without doing so,
> > +			 * migrate_pages() may report success with (split but
> > +			 * unmigrated) pages still on its fromlist; whereas it
> > +			 * always reports success when its fromlist is empty.
> > +			 *
> >  			 * Only check it without removing it from the list.
> >  			 * Since the folio can be on deferred_split_scan()
> >  			 * local list and removing it can cause the local list
> > @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
> >  			if (nr_pages > 2 &&
> >  			   !list_empty(&folio->_deferred_list)) {
> >  				if (try_split_folio(folio, split_folios) == 0) {
> > +					nr_failed++;
> 
> It appears better to add
> 
>         stats->nr_thp_failed++;
> 
> too.  Otherwise, if migrate_pages_batch() is called via migrate_pages(,
> MIGRATE_ASYNC, ), nr_thp_failed will not increase.  But if
> migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ),
> nr_thp_failed will increase in migrate_pages_sync() via
> 
>         stats->nr_thp_failed += astats.nr_thp_split;
> 
> That is, they are not consistent.  The issue exists since commit
> 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split
> list").

Sorry, I'll have to let you take over and send your own patch instead
of mine - thanks. Those stats, and any attempt at consistency there,
is way beyond me! I thought consistency was impossible, unless all
the numbers were changed to order-0 counts.

Hugh

> 
> Otherwise, this looks good to me.  Thanks!
> 
> >  					stats->nr_thp_split += is_thp;
> >  					stats->nr_split++;
> >  					continue;
> 
> --
> Best Regards,
> Huang, Ying
Andrew Morton June 12, 2024, 4 p.m. UTC | #3
On Wed, 12 Jun 2024 00:11:44 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:

> > It appears better to add
> > 
> >         stats->nr_thp_failed++;
> > 
> > too.  Otherwise, if migrate_pages_batch() is called via migrate_pages(,
> > MIGRATE_ASYNC, ), nr_thp_failed will not increase.  But if
> > migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ),
> > nr_thp_failed will increase in migrate_pages_sync() via
> > 
> >         stats->nr_thp_failed += astats.nr_thp_split;
> > 
> > That is, they are not consistent.  The issue exists since commit
> > 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split
> > list").
> 
> Sorry, I'll have to let you take over and send your own patch instead
> of mine - thanks. Those stats, and any attempt at consistency there,
> is way beyond me! I thought consistency was impossible, unless all
> the numbers were changed to order-0 counts.

I'll add this patch as-is for now.
Peter Xu July 8, 2024, 3:52 p.m. UTC | #4
On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote:
> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
> and if DEBUG_VM were off, then pages would be lost on a local list.
> 
> Our convention is that if migrate_pages() reports complete success (0),
> then the migratepages list will be empty; but if it reports an error or
> some pages remaining, then its caller must putback_movable_pages().
> 
> There's a new case in which migrate_pages() has been reporting complete
> success, but returning with pages left on the migratepages list: when
> migrate_pages_batch() successfully split a folio on the deferred list,
> but then the "Failure isn't counted" call does not dispose of them all.
> 
> Since that block is expecting the large folio to have been counted as 1
> failure already, and since the return code is later adjusted to success
> whenever the returned list is found empty, the simple way to fix this
> safely is to count splitting the deferred folio as "a failure".
> 
> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> A hotfix to 6.10-rc, not needed for stable.
> 
>  mm/migrate.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
>  
>  			/*
>  			 * The rare folio on the deferred split list should
> -			 * be split now. It should not count as a failure.
> +			 * be split now. It should not count as a failure:
> +			 * but increment nr_failed because, without doing so,
> +			 * migrate_pages() may report success with (split but
> +			 * unmigrated) pages still on its fromlist; whereas it
> +			 * always reports success when its fromlist is empty.
> +			 *
>  			 * Only check it without removing it from the list.
>  			 * Since the folio can be on deferred_split_scan()
>  			 * local list and removing it can cause the local list
> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
>  			if (nr_pages > 2 &&
>  			   !list_empty(&folio->_deferred_list)) {
>  				if (try_split_folio(folio, split_folios) == 0) {
> +					nr_failed++;
>  					stats->nr_thp_split += is_thp;
>  					stats->nr_split++;
>  					continue;
> -- 
> 2.35.3
> 
> 

We probably hit the same issue in our testbeds, but in the other
migrate_misplaced_folio() path, which contains the BUG_ON() rather than
VM_BUG_ON().  Looks like this patch can also fix that.

When looking at that, I wonder whether we overlooked one more spot where we
mostly always use putback_movable_pages() for migrate failures, but didn't
in migrate_misplaced_folio().  I feel like it was overlooked but want to
check with all of you here, as I do think the folio can already be split
when reaching here too. So I wonder whether below would make sense as a fix
from that POV.

===8<===
diff --git a/mm/migrate.c b/mm/migrate.c
index e10d2445fbd8..20da2595527a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
        nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
                                     NULL, node, MIGRATE_ASYNC,
                                     MR_NUMA_MISPLACED, &nr_succeeded);
-       if (nr_remaining) {
-               if (!list_empty(&migratepages)) {
-                       list_del(&folio->lru);
-                       node_stat_mod_folio(folio, NR_ISOLATED_ANON +
-                                       folio_is_file_lru(folio), -nr_pages);
-                       folio_putback_lru(folio);
-               }
-       }
+       if (nr_remaining && !list_empty(&migratepages))
+               putback_movable_pages(&migratepages);
        if (nr_succeeded) {
                count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
                if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
===8<===

Thanks,
Zi Yan July 8, 2024, 4:39 p.m. UTC | #5
On 8 Jul 2024, at 11:52, Peter Xu wrote:

> On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote:
>> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
>> and if DEBUG_VM were off, then pages would be lost on a local list.
>>
>> Our convention is that if migrate_pages() reports complete success (0),
>> then the migratepages list will be empty; but if it reports an error or
>> some pages remaining, then its caller must putback_movable_pages().
>>
>> There's a new case in which migrate_pages() has been reporting complete
>> success, but returning with pages left on the migratepages list: when
>> migrate_pages_batch() successfully split a folio on the deferred list,
>> but then the "Failure isn't counted" call does not dispose of them all.
>>
>> Since that block is expecting the large folio to have been counted as 1
>> failure already, and since the return code is later adjusted to success
>> whenever the returned list is found empty, the simple way to fix this
>> safely is to count splitting the deferred folio as "a failure".
>>
>> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>> ---
>> A hotfix to 6.10-rc, not needed for stable.
>>
>>  mm/migrate.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
>>
>>  			/*
>>  			 * The rare folio on the deferred split list should
>> -			 * be split now. It should not count as a failure.
>> +			 * be split now. It should not count as a failure:
>> +			 * but increment nr_failed because, without doing so,
>> +			 * migrate_pages() may report success with (split but
>> +			 * unmigrated) pages still on its fromlist; whereas it
>> +			 * always reports success when its fromlist is empty.
>> +			 *
>>  			 * Only check it without removing it from the list.
>>  			 * Since the folio can be on deferred_split_scan()
>>  			 * local list and removing it can cause the local list
>> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
>>  			if (nr_pages > 2 &&
>>  			   !list_empty(&folio->_deferred_list)) {
>>  				if (try_split_folio(folio, split_folios) == 0) {
>> +					nr_failed++;
>>  					stats->nr_thp_split += is_thp;
>>  					stats->nr_split++;
>>  					continue;
>> -- 
>> 2.35.3
>>
>>
>
> We probably hit the same issue in our testbeds, but in the other
> migrate_misplaced_folio() path, which contains the BUG_ON() rather than
> VM_BUG_ON().  Looks like this patch can also fix that.
>
> When looking at that, I wonder whether we overlooked one more spot where we
> mostly always use putback_movable_pages() for migrate failures, but didn't
> in migrate_misplaced_folio().  I feel like it was overlooked but want to
> check with all of you here, as I do think the folio can already be split
> when reaching here too. So I wonder whether below would make sense as a fix
> from that POV.
>
> ===8<===
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e10d2445fbd8..20da2595527a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>         nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
>                                      NULL, node, MIGRATE_ASYNC,
>                                      MR_NUMA_MISPLACED, &nr_succeeded);
> -       if (nr_remaining) {
> -               if (!list_empty(&migratepages)) {
> -                       list_del(&folio->lru);
> -                       node_stat_mod_folio(folio, NR_ISOLATED_ANON +
> -                                       folio_is_file_lru(folio), -nr_pages);
> -                       folio_putback_lru(folio);
> -               }
> -       }
> +       if (nr_remaining && !list_empty(&migratepages))
> +               putback_movable_pages(&migratepages);
>         if (nr_succeeded) {
>                 count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>                 if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> ===8<===

If the original folio is large and split without migrating all subpages,
not migrated sub pages will be left on migratepages list. list_del(&folio->lru)
can remove the first subpage from a wrong list, if it is migrated, and loses
the rest. It is not a problem before, since MR_NUMA_MISPLACED prevents the
folio from being split.

The fix looks good to me.

--
Best Regards,
Yan, Zi
Peter Xu July 8, 2024, 9:50 p.m. UTC | #6
On Mon, Jul 08, 2024 at 12:39:10PM -0400, Zi Yan wrote:
> On 8 Jul 2024, at 11:52, Peter Xu wrote:
> 
> > On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote:
> >> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone();
> >> and if DEBUG_VM were off, then pages would be lost on a local list.
> >>
> >> Our convention is that if migrate_pages() reports complete success (0),
> >> then the migratepages list will be empty; but if it reports an error or
> >> some pages remaining, then its caller must putback_movable_pages().
> >>
> >> There's a new case in which migrate_pages() has been reporting complete
> >> success, but returning with pages left on the migratepages list: when
> >> migrate_pages_batch() successfully split a folio on the deferred list,
> >> but then the "Failure isn't counted" call does not dispose of them all.
> >>
> >> Since that block is expecting the large folio to have been counted as 1
> >> failure already, and since the return code is later adjusted to success
> >> whenever the returned list is found empty, the simple way to fix this
> >> safely is to count splitting the deferred folio as "a failure".
> >>
> >> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
> >> Signed-off-by: Hugh Dickins <hughd@google.com>
> >> ---
> >> A hotfix to 6.10-rc, not needed for stable.
> >>
> >>  mm/migrate.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from,
> >>
> >>  			/*
> >>  			 * The rare folio on the deferred split list should
> >> -			 * be split now. It should not count as a failure.
> >> +			 * be split now. It should not count as a failure:
> >> +			 * but increment nr_failed because, without doing so,
> >> +			 * migrate_pages() may report success with (split but
> >> +			 * unmigrated) pages still on its fromlist; whereas it
> >> +			 * always reports success when its fromlist is empty.
> >> +			 *
> >>  			 * Only check it without removing it from the list.
> >>  			 * Since the folio can be on deferred_split_scan()
> >>  			 * local list and removing it can cause the local list
> >> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from,
> >>  			if (nr_pages > 2 &&
> >>  			   !list_empty(&folio->_deferred_list)) {
> >>  				if (try_split_folio(folio, split_folios) == 0) {
> >> +					nr_failed++;
> >>  					stats->nr_thp_split += is_thp;
> >>  					stats->nr_split++;
> >>  					continue;
> >> -- 
> >> 2.35.3
> >>
> >>
> >
> > We probably hit the same issue in our testbeds, but in the other
> > migrate_misplaced_folio() path, which contains the BUG_ON() rather than
> > VM_BUG_ON().  Looks like this patch can also fix that.
> >
> > When looking at that, I wonder whether we overlooked one more spot where we
> > mostly always use putback_movable_pages() for migrate failures, but didn't
> > in migrate_misplaced_folio().  I feel like it was overlooked but want to
> > check with all of you here, as I do think the folio can already be split
> > when reaching here too. So I wonder whether below would make sense as a fix
> > from that POV.
> >
> > ===8<===
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index e10d2445fbd8..20da2595527a 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> >         nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
> >                                      NULL, node, MIGRATE_ASYNC,
> >                                      MR_NUMA_MISPLACED, &nr_succeeded);
> > -       if (nr_remaining) {
> > -               if (!list_empty(&migratepages)) {
> > -                       list_del(&folio->lru);
> > -                       node_stat_mod_folio(folio, NR_ISOLATED_ANON +
> > -                                       folio_is_file_lru(folio), -nr_pages);
> > -                       folio_putback_lru(folio);
> > -               }
> > -       }
> > +       if (nr_remaining && !list_empty(&migratepages))
> > +               putback_movable_pages(&migratepages);
> >         if (nr_succeeded) {
> >                 count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> >                 if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> > ===8<===
> 
> If the original folio is large and split without migrating all subpages,
> not migrated sub pages will be left on migratepages list. list_del(&folio->lru)
> can remove the first subpage from a wrong list, if it is migrated, and loses
> the rest. It is not a problem before, since MR_NUMA_MISPLACED prevents the
> folio from being split.
> 
> The fix looks good to me.

Thanks, Zi.  Let me send a formal patch then.
diff mbox series

Patch

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1654,7 +1654,12 @@  static int migrate_pages_batch(struct list_head *from,
 
 			/*
 			 * The rare folio on the deferred split list should
-			 * be split now. It should not count as a failure.
+			 * be split now. It should not count as a failure:
+			 * but increment nr_failed because, without doing so,
+			 * migrate_pages() may report success with (split but
+			 * unmigrated) pages still on its fromlist; whereas it
+			 * always reports success when its fromlist is empty.
+			 *
 			 * Only check it without removing it from the list.
 			 * Since the folio can be on deferred_split_scan()
 			 * local list and removing it can cause the local list
@@ -1669,6 +1674,7 @@  static int migrate_pages_batch(struct list_head *from,
 			if (nr_pages > 2 &&
 			   !list_empty(&folio->_deferred_list)) {
 				if (try_split_folio(folio, split_folios) == 0) {
+					nr_failed++;
 					stats->nr_thp_split += is_thp;
 					stats->nr_split++;
 					continue;