Message ID | 46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [hotfix] mm/migrate: fix kernel BUG at mm/compaction.c:2761! | expand |
Hugh Dickins <hughd@google.com> writes: > I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); > and if DEBUG_VM were off, then pages would be lost on a local list. > > Our convention is that if migrate_pages() reports complete success (0), > then the migratepages list will be empty; but if it reports an error or > some pages remaining, then its caller must putback_movable_pages(). > > There's a new case in which migrate_pages() has been reporting complete > success, but returning with pages left on the migratepages list: when > migrate_pages_batch() successfully split a folio on the deferred list, > but then the "Failure isn't counted" call does not dispose of them all. > > Since that block is expecting the large folio to have been counted as 1 > failure already, and since the return code is later adjusted to success > whenever the returned list is found empty, the simple way to fix this > safely is to count splitting the deferred folio as "a failure". > > Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") > Signed-off-by: Hugh Dickins <hughd@google.com> > --- > A hotfix to 6.10-rc, not needed for stable. > > mm/migrate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, > > /* > * The rare folio on the deferred split list should > - * be split now. It should not count as a failure. > + * be split now. It should not count as a failure: > + * but increment nr_failed because, without doing so, > + * migrate_pages() may report success with (split but > + * unmigrated) pages still on its fromlist; whereas it > + * always reports success when its fromlist is empty. > + * > * Only check it without removing it from the list. > * Since the folio can be on deferred_split_scan() > * local list and removing it can cause the local list > @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, > if (nr_pages > 2 && > !list_empty(&folio->_deferred_list)) { > if (try_split_folio(folio, split_folios) == 0) { > + nr_failed++; It appears better to add stats->nr_thp_failed++; too. Otherwise, if migrate_pages_batch() is called via migrate_pages(, MIGRATE_ASYNC, ), nr_thp_failed will not increase. But if migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ), nr_thp_failed will increase in migrate_pages_sync() via stats->nr_thp_failed += astats.nr_thp_split; That is, they are not consistent. The issue exists since commit 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list"). Otherwise, this looks good to me. Thanks! > stats->nr_thp_split += is_thp; > stats->nr_split++; > continue; -- Best Regards, Huang, Ying
On Wed, 12 Jun 2024, Huang, Ying wrote: > Hugh Dickins <hughd@google.com> writes: > > > I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); > > and if DEBUG_VM were off, then pages would be lost on a local list. > > > > Our convention is that if migrate_pages() reports complete success (0), > > then the migratepages list will be empty; but if it reports an error or > > some pages remaining, then its caller must putback_movable_pages(). > > > > There's a new case in which migrate_pages() has been reporting complete > > success, but returning with pages left on the migratepages list: when > > migrate_pages_batch() successfully split a folio on the deferred list, > > but then the "Failure isn't counted" call does not dispose of them all. > > > > Since that block is expecting the large folio to have been counted as 1 > > failure already, and since the return code is later adjusted to success > > whenever the returned list is found empty, the simple way to fix this > > safely is to count splitting the deferred folio as "a failure". > > > > Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") > > Signed-off-by: Hugh Dickins <hughd@google.com> > > --- > > A hotfix to 6.10-rc, not needed for stable. > > > > mm/migrate.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, > > > > /* > > * The rare folio on the deferred split list should > > - * be split now. It should not count as a failure. > > + * be split now. It should not count as a failure: > > + * but increment nr_failed because, without doing so, > > + * migrate_pages() may report success with (split but > > + * unmigrated) pages still on its fromlist; whereas it > > + * always reports success when its fromlist is empty. > > + * > > * Only check it without removing it from the list. > > * Since the folio can be on deferred_split_scan() > > * local list and removing it can cause the local list > > @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, > > if (nr_pages > 2 && > > !list_empty(&folio->_deferred_list)) { > > if (try_split_folio(folio, split_folios) == 0) { > > + nr_failed++; > > It appears better to add > > stats->nr_thp_failed++; > > too. Otherwise, if migrate_pages_batch() is called via migrate_pages(, > MIGRATE_ASYNC, ), nr_thp_failed will not increase. But if > migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ), > nr_thp_failed will increase in migrate_pages_sync() via > > stats->nr_thp_failed += astats.nr_thp_split; > > That is, they are not consistent. The issue exists since commit > 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split > list"). Sorry, I'll have to let you take over and send your own patch instead of mine - thanks. Those stats, and any attempt at consistency there, is way beyond me! I thought consistency was impossible, unless all the numbers were changed to order-0 counts. Hugh > > Otherwise, this looks good to me. Thanks! > > > stats->nr_thp_split += is_thp; > > stats->nr_split++; > > continue; > > -- > Best Regards, > Huang, Ying
On Wed, 12 Jun 2024 00:11:44 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote: > > It appears better to add > > > > stats->nr_thp_failed++; > > > > too. Otherwise, if migrate_pages_batch() is called via migrate_pages(, > > MIGRATE_ASYNC, ), nr_thp_failed will not increase. But if > > migrate_pages_batch() is called via migrate_pages(, MIGRATE_SYNC*, ), > > nr_thp_failed will increase in migrate_pages_sync() via > > > > stats->nr_thp_failed += astats.nr_thp_split; > > > > That is, they are not consistent. The issue exists since commit > > 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split > > list"). > > Sorry, I'll have to let you take over and send your own patch instead > of mine - thanks. Those stats, and any attempt at consistency there, > is way beyond me! I thought consistency was impossible, unless all > the numbers were changed to order-0 counts. I'll add this patch as-is for now.
On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote: > I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); > and if DEBUG_VM were off, then pages would be lost on a local list. > > Our convention is that if migrate_pages() reports complete success (0), > then the migratepages list will be empty; but if it reports an error or > some pages remaining, then its caller must putback_movable_pages(). > > There's a new case in which migrate_pages() has been reporting complete > success, but returning with pages left on the migratepages list: when > migrate_pages_batch() successfully split a folio on the deferred list, > but then the "Failure isn't counted" call does not dispose of them all. > > Since that block is expecting the large folio to have been counted as 1 > failure already, and since the return code is later adjusted to success > whenever the returned list is found empty, the simple way to fix this > safely is to count splitting the deferred folio as "a failure". > > Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") > Signed-off-by: Hugh Dickins <hughd@google.com> > --- > A hotfix to 6.10-rc, not needed for stable. > > mm/migrate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, > > /* > * The rare folio on the deferred split list should > - * be split now. It should not count as a failure. > + * be split now. It should not count as a failure: > + * but increment nr_failed because, without doing so, > + * migrate_pages() may report success with (split but > + * unmigrated) pages still on its fromlist; whereas it > + * always reports success when its fromlist is empty. > + * > * Only check it without removing it from the list. > * Since the folio can be on deferred_split_scan() > * local list and removing it can cause the local list > @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, > if (nr_pages > 2 && > !list_empty(&folio->_deferred_list)) { > if (try_split_folio(folio, split_folios) == 0) { > + nr_failed++; > stats->nr_thp_split += is_thp; > stats->nr_split++; > continue; > -- > 2.35.3 > > We probably hit the same issue in our testbeds, but in the other migrate_misplaced_folio() path, which contains the BUG_ON() rather than VM_BUG_ON(). Looks like this patch can also fix that. When looking at that, I wonder whether we overlooked one more spot where we mostly always use putback_movable_pages() for migrate failures, but didn't in migrate_misplaced_folio(). I feel like it was overlooked but want to check with all of you here, as I do think the folio can already be split when reaching here too. So I wonder whether below would make sense as a fix from that POV. ===8<=== diff --git a/mm/migrate.c b/mm/migrate.c index e10d2445fbd8..20da2595527a 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma, nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio, NULL, node, MIGRATE_ASYNC, MR_NUMA_MISPLACED, &nr_succeeded); - if (nr_remaining) { - if (!list_empty(&migratepages)) { - list_del(&folio->lru); - node_stat_mod_folio(folio, NR_ISOLATED_ANON + - folio_is_file_lru(folio), -nr_pages); - folio_putback_lru(folio); - } - } + if (nr_remaining && !list_empty(&migratepages)) + putback_movable_pages(&migratepages); if (nr_succeeded) { count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node)) ===8<=== Thanks,
On 8 Jul 2024, at 11:52, Peter Xu wrote: > On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote: >> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); >> and if DEBUG_VM were off, then pages would be lost on a local list. >> >> Our convention is that if migrate_pages() reports complete success (0), >> then the migratepages list will be empty; but if it reports an error or >> some pages remaining, then its caller must putback_movable_pages(). >> >> There's a new case in which migrate_pages() has been reporting complete >> success, but returning with pages left on the migratepages list: when >> migrate_pages_batch() successfully split a folio on the deferred list, >> but then the "Failure isn't counted" call does not dispose of them all. >> >> Since that block is expecting the large folio to have been counted as 1 >> failure already, and since the return code is later adjusted to success >> whenever the returned list is found empty, the simple way to fix this >> safely is to count splitting the deferred folio as "a failure". >> >> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") >> Signed-off-by: Hugh Dickins <hughd@google.com> >> --- >> A hotfix to 6.10-rc, not needed for stable. >> >> mm/migrate.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, >> >> /* >> * The rare folio on the deferred split list should >> - * be split now. It should not count as a failure. >> + * be split now. It should not count as a failure: >> + * but increment nr_failed because, without doing so, >> + * migrate_pages() may report success with (split but >> + * unmigrated) pages still on its fromlist; whereas it >> + * always reports success when its fromlist is empty. >> + * >> * Only check it without removing it from the list. >> * Since the folio can be on deferred_split_scan() >> * local list and removing it can cause the local list >> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, >> if (nr_pages > 2 && >> !list_empty(&folio->_deferred_list)) { >> if (try_split_folio(folio, split_folios) == 0) { >> + nr_failed++; >> stats->nr_thp_split += is_thp; >> stats->nr_split++; >> continue; >> -- >> 2.35.3 >> >> > > We probably hit the same issue in our testbeds, but in the other > migrate_misplaced_folio() path, which contains the BUG_ON() rather than > VM_BUG_ON(). Looks like this patch can also fix that. > > When looking at that, I wonder whether we overlooked one more spot where we > mostly always use putback_movable_pages() for migrate failures, but didn't > in migrate_misplaced_folio(). I feel like it was overlooked but want to > check with all of you here, as I do think the folio can already be split > when reaching here too. So I wonder whether below would make sense as a fix > from that POV. > > ===8<=== > diff --git a/mm/migrate.c b/mm/migrate.c > index e10d2445fbd8..20da2595527a 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma, > nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio, > NULL, node, MIGRATE_ASYNC, > MR_NUMA_MISPLACED, &nr_succeeded); > - if (nr_remaining) { > - if (!list_empty(&migratepages)) { > - list_del(&folio->lru); > - node_stat_mod_folio(folio, NR_ISOLATED_ANON + > - folio_is_file_lru(folio), -nr_pages); > - folio_putback_lru(folio); > - } > - } > + if (nr_remaining && !list_empty(&migratepages)) > + putback_movable_pages(&migratepages); > if (nr_succeeded) { > count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); > if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node)) > ===8<=== If the original folio is large and split without migrating all subpages, not migrated sub pages will be left on migratepages list. list_del(&folio->lru) can remove the first subpage from a wrong list, if it is migrated, and loses the rest. It is not a problem before, since MR_NUMA_MISPLACED prevents the folio from being split. The fix looks good to me. -- Best Regards, Yan, Zi
On Mon, Jul 08, 2024 at 12:39:10PM -0400, Zi Yan wrote: > On 8 Jul 2024, at 11:52, Peter Xu wrote: > > > On Tue, Jun 11, 2024 at 10:06:20PM -0700, Hugh Dickins wrote: > >> I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); > >> and if DEBUG_VM were off, then pages would be lost on a local list. > >> > >> Our convention is that if migrate_pages() reports complete success (0), > >> then the migratepages list will be empty; but if it reports an error or > >> some pages remaining, then its caller must putback_movable_pages(). > >> > >> There's a new case in which migrate_pages() has been reporting complete > >> success, but returning with pages left on the migratepages list: when > >> migrate_pages_batch() successfully split a folio on the deferred list, > >> but then the "Failure isn't counted" call does not dispose of them all. > >> > >> Since that block is expecting the large folio to have been counted as 1 > >> failure already, and since the return code is later adjusted to success > >> whenever the returned list is found empty, the simple way to fix this > >> safely is to count splitting the deferred folio as "a failure". > >> > >> Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") > >> Signed-off-by: Hugh Dickins <hughd@google.com> > >> --- > >> A hotfix to 6.10-rc, not needed for stable. > >> > >> mm/migrate.c | 8 +++++++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> --- a/mm/migrate.c > >> +++ b/mm/migrate.c > >> @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, > >> > >> /* > >> * The rare folio on the deferred split list should > >> - * be split now. It should not count as a failure. > >> + * be split now. It should not count as a failure: > >> + * but increment nr_failed because, without doing so, > >> + * migrate_pages() may report success with (split but > >> + * unmigrated) pages still on its fromlist; whereas it > >> + * always reports success when its fromlist is empty. > >> + * > >> * Only check it without removing it from the list. > >> * Since the folio can be on deferred_split_scan() > >> * local list and removing it can cause the local list > >> @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, > >> if (nr_pages > 2 && > >> !list_empty(&folio->_deferred_list)) { > >> if (try_split_folio(folio, split_folios) == 0) { > >> + nr_failed++; > >> stats->nr_thp_split += is_thp; > >> stats->nr_split++; > >> continue; > >> -- > >> 2.35.3 > >> > >> > > > > We probably hit the same issue in our testbeds, but in the other > > migrate_misplaced_folio() path, which contains the BUG_ON() rather than > > VM_BUG_ON(). Looks like this patch can also fix that. > > > > When looking at that, I wonder whether we overlooked one more spot where we > > mostly always use putback_movable_pages() for migrate failures, but didn't > > in migrate_misplaced_folio(). I feel like it was overlooked but want to > > check with all of you here, as I do think the folio can already be split > > when reaching here too. So I wonder whether below would make sense as a fix > > from that POV. > > > > ===8<=== > > diff --git a/mm/migrate.c b/mm/migrate.c > > index e10d2445fbd8..20da2595527a 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -2615,14 +2615,8 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma, > > nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio, > > NULL, node, MIGRATE_ASYNC, > > MR_NUMA_MISPLACED, &nr_succeeded); > > - if (nr_remaining) { > > - if (!list_empty(&migratepages)) { > > - list_del(&folio->lru); > > - node_stat_mod_folio(folio, NR_ISOLATED_ANON + > > - folio_is_file_lru(folio), -nr_pages); > > - folio_putback_lru(folio); > > - } > > - } > > + if (nr_remaining && !list_empty(&migratepages)) > > + putback_movable_pages(&migratepages); > > if (nr_succeeded) { > > count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); > > if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node)) > > ===8<=== > > If the original folio is large and split without migrating all subpages, > not migrated sub pages will be left on migratepages list. list_del(&folio->lru) > can remove the first subpage from a wrong list, if it is migrated, and loses > the rest. It is not a problem before, since MR_NUMA_MISPLACED prevents the > folio from being split. > > The fix looks good to me. Thanks, Zi. Let me send a formal patch then.
--- a/mm/migrate.c +++ b/mm/migrate.c @@ -1654,7 +1654,12 @@ static int migrate_pages_batch(struct list_head *from, /* * The rare folio on the deferred split list should - * be split now. It should not count as a failure. + * be split now. It should not count as a failure: + * but increment nr_failed because, without doing so, + * migrate_pages() may report success with (split but + * unmigrated) pages still on its fromlist; whereas it + * always reports success when its fromlist is empty. + * * Only check it without removing it from the list. * Since the folio can be on deferred_split_scan() * local list and removing it can cause the local list @@ -1669,6 +1674,7 @@ static int migrate_pages_batch(struct list_head *from, if (nr_pages > 2 && !list_empty(&folio->_deferred_list)) { if (try_split_folio(folio, split_folios) == 0) { + nr_failed++; stats->nr_thp_split += is_thp; stats->nr_split++; continue;
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and if DEBUG_VM were off, then pages would be lost on a local list. Our convention is that if migrate_pages() reports complete success (0), then the migratepages list will be empty; but if it reports an error or some pages remaining, then its caller must putback_movable_pages(). There's a new case in which migrate_pages() has been reporting complete success, but returning with pages left on the migratepages list: when migrate_pages_batch() successfully split a folio on the deferred list, but then the "Failure isn't counted" call does not dispose of them all. Since that block is expecting the large folio to have been counted as 1 failure already, and since the return code is later adjusted to success whenever the returned list is found empty, the simple way to fix this safely is to count splitting the deferred folio as "a failure". Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") Signed-off-by: Hugh Dickins <hughd@google.com> --- A hotfix to 6.10-rc, not needed for stable. mm/migrate.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)