Message ID | 20231009203231.1715845-2-zi.yan@sent.com (mailing list archive) |
---|---|
State | New |
Series | Large folio migration fix and questions on migration stats |
Zi Yan <zi.yan@sent.com> writes:

> From: Zi Yan <ziy@nvidia.com>
>
> nr_failed was missing the rc value from migrate_pages_batch() and can
> cause a mismatch between migrate_pages() return value and the number of
> not migrated pages, i.e., when the return value of migrate_pages() is 0,
> there are still pages left in the from page list. It will happen when a
> non-PMD THP large folio fails to migrate due to -ENOMEM and is split
> successfully but not all the split pages are not migrated,
> migrate_pages_batch() would return non-zero, but astats.nr_thp_split = 0.
> nr_failed would be 0 and returned to the caller of migrate_pages(), but
> the not migrated pages are left in the from page list without being added
> back to LRU lists.
>
> Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c602bf6dec97..5348827bd958 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1834,7 +1834,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
>  		return rc;
>  	}
>  	stats->nr_thp_failed += astats.nr_thp_split;
> -	nr_failed += astats.nr_thp_split;
> +	nr_failed += rc + astats.nr_thp_split;
>  	/*
>  	 * Fall back to migrate all failed folios one by one synchronously. All
>  	 * failed folios except split THPs will be retried, so their failure

I don't think this is a correct fix. The failed folios will be retried
in the following synchronous migration below.

To fix the issue, we should track nr_split for all large folios (not
only THP), then use

	nr_failed += astats.nr_split;

--
Best Regards,
Huang, Ying
On 9 Oct 2023, at 23:36, Huang, Ying wrote:

> Zi Yan <zi.yan@sent.com> writes:
>
>> From: Zi Yan <ziy@nvidia.com>
>>
>> nr_failed was missing the rc value from migrate_pages_batch() and can
>> cause a mismatch between migrate_pages() return value and the number of
>> not migrated pages, i.e., when the return value of migrate_pages() is 0,
>> there are still pages left in the from page list. It will happen when a
>> non-PMD THP large folio fails to migrate due to -ENOMEM and is split
>> successfully but not all the split pages are not migrated,
>> migrate_pages_batch() would return non-zero, but astats.nr_thp_split = 0.
>> nr_failed would be 0 and returned to the caller of migrate_pages(), but
>> the not migrated pages are left in the from page list without being added
>> back to LRU lists.
>>
>> Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/migrate.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index c602bf6dec97..5348827bd958 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1834,7 +1834,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
>>  		return rc;
>>  	}
>>  	stats->nr_thp_failed += astats.nr_thp_split;
>> -	nr_failed += astats.nr_thp_split;
>> +	nr_failed += rc + astats.nr_thp_split;
>>  	/*
>>  	 * Fall back to migrate all failed folios one by one synchronously. All
>>  	 * failed folios except split THPs will be retried, so their failure
>
> I don't think this is a correct fix. The failed folios will be retried
> in the following synchronous migration below.
>
> To fix the issue, we should track nr_split for all large folios (not
> only THP), then use
>
> 	nr_failed += astats.nr_split;

You are suggesting a new stats "nr_split" in addition to nr_thp_split?
And nr_split includes nr_thp_split?

--
Best Regards,
Yan, Zi
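To make the accounting discussed in this thread concrete, here is a small user-space sketch in C. It is a toy model, not kernel code: `toy_stats`, `toy_scheme`, `toy_async_failed()` and `toy_migrate_pages_sync()` are hypothetical names invented for illustration, and `nr_split` is the counter proposed in the reply, assumed here to cover every split large folio (THP or not). The model also assumes that the asynchronous pass reports a count `async_rc` that can include folios which are retried afterwards, and that the synchronous retry pass adds its own failure count `sync_rc`.

```c
#include <assert.h>

/*
 * Toy model only. Field names follow the thread where possible;
 * nr_split is the proposed counter and is an assumption here.
 */
struct toy_stats {
	int nr_thp_split;	/* PMD-mappable THPs split in the async pass */
	int nr_split;		/* assumed: all large folios split, THP or not */
};

/* The three ways of seeding nr_failed after the async pass. */
enum toy_scheme {
	TOY_ORIGINAL,	/* nr_failed += astats.nr_thp_split */
	TOY_PATCH,	/* nr_failed += rc + astats.nr_thp_split */
	TOY_SUGGESTED,	/* nr_failed += astats.nr_split */
};

/* Failures attributed to the async pass under the chosen scheme. */
static int toy_async_failed(enum toy_scheme scheme, int async_rc,
			    const struct toy_stats *astats)
{
	switch (scheme) {
	case TOY_ORIGINAL:
		return astats->nr_thp_split;
	case TOY_PATCH:
		return async_rc + astats->nr_thp_split;
	case TOY_SUGGESTED:
		return astats->nr_split;
	}
	return -1;
}

/*
 * Total nr_failed in the toy model: the async contribution plus
 * whatever the synchronous retry pass reports as still failing.
 */
static int toy_migrate_pages_sync(enum toy_scheme scheme, int async_rc,
				  const struct toy_stats *astats, int sync_rc)
{
	int nr_failed = toy_async_failed(scheme, async_rc, astats);

	assert(sync_rc >= 0);	/* negative values are errors, not counts */
	nr_failed += sync_rc;
	return nr_failed;
}
```

As a usage example under these assumptions, take an async pass that reports `async_rc = 2` (one folio that will be retried plus one split non-PMD large folio), with `nr_thp_split = 0`, `nr_split = 1`, and a synchronous retry that succeeds (`sync_rc = 0`). The original accounting yields 0, which misses the split folio; adding `rc` yields 2, which also counts the folio that was retried successfully; using `nr_split` yields 1.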
diff --git a/mm/migrate.c b/mm/migrate.c
index c602bf6dec97..5348827bd958 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1834,7 +1834,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
 		return rc;
 	}
 	stats->nr_thp_failed += astats.nr_thp_split;
-	nr_failed += astats.nr_thp_split;
+	nr_failed += rc + astats.nr_thp_split;
 	/*
 	 * Fall back to migrate all failed folios one by one synchronously. All
 	 * failed folios except split THPs will be retried, so their failure