Message ID: 20250330064732.3781046-2-mcgrof@kernel.org
State:      New
Series:     mm: move migration work around to buffer-heads
On Sat, Mar 29, 2025 at 11:47:30PM -0700, Luis Chamberlain wrote:
> However tracing shows that folio_mc_copy() *isn't* being called
> as often as we'd expect from buffer_migrate_folio_norefs() path
> as we're likely bailing early now thanks to the check added by commit
> 060913999d7a ("mm: migrate: support poisoned recover from migrate
> folio").

Umm.  You're saying that most folios we try to migrate have extra refs?
That seems unexpected; does it indicate a bug in 060913999d7a?

> +++ b/mm/migrate.c
> @@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
>  {
>  	int rc, expected_count = folio_expected_refs(mapping, src);
>
> +	might_sleep();

We deliberately don't sleep when the folio is only a single page.
So this needs to be:

	might_sleep_if(folio_test_large(folio));

>  	/* Check whether src does not have extra refs before we do more work */
>  	if (folio_ref_count(src) != expected_count)
>  		return -EAGAIN;
> --
> 2.47.2
>
On Sun, Mar 30, 2025 at 01:04:02PM +0100, Matthew Wilcox wrote:
> On Sat, Mar 29, 2025 at 11:47:30PM -0700, Luis Chamberlain wrote:
> > However tracing shows that folio_mc_copy() *isn't* being called
> > as often as we'd expect from buffer_migrate_folio_norefs() path
> > as we're likely bailing early now thanks to the check added by commit
> > 060913999d7a ("mm: migrate: support poisoned recover from migrate
> > folio").
>
> Umm.  You're saying that most folios we try to migrate have extra refs?
> That seems unexpected; does it indicate a bug in 060913999d7a?

I've debugged this further: the migration does succeed, and I don't see
any failures due to the new refcount check added by 060913999d7a. I've
added stats in an out-of-tree patch [0] in case folks find this useful;
I could submit it.

But the point is that even if you use dd against a large block device,
you won't always end up trying to migrate large folios *right away*,
even if you trigger folio migration through compaction, especially if
you use a large bs on dd like bs=1M. Using a size closer to the logical
block size will trigger large folio migration much faster.

Example of the stats:

# cat /sys/kernel/debug/mm/migrate/bh/stats
[buffer_migrate_folio]
calls 9874
success 9854
fails 20

[buffer_migrate_folio_norefs]
calls 3694
success 1651
fails 2043
no-head-success 532
no-head-fails 0
invalid 2040
valid 1119
valid-success 1119
valid-fails 0

Success ratios:

buffer_migrate_folio:        99% success (9854/9874)
buffer_migrate_folio_norefs: 44% success (1651/3694)

> > +++ b/mm/migrate.c
> > @@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
> >  {
> >  	int rc, expected_count = folio_expected_refs(mapping, src);
> >
> > +	might_sleep();
>
> We deliberately don't sleep when the folio is only a single page.
> So this needs to be:
>
> 	might_sleep_if(folio_test_large(folio));

That does reduce the scope of our test coverage but, sure.
[0] https://lore.kernel.org/all/20250331061306.4073352-1-mcgrof@kernel.org/

  Luis
diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..712ddd11f3f0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 {
 	int rc, expected_count = folio_expected_refs(mapping, src);
 
+	might_sleep();
+
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
When we do page migration of large folios, folio_mc_copy() can
cond_resched() *iff* we are on a large folio. There's a hairy bug
reported by both 0-day [0] and syzbot [1] where it has been detected
that we can call folio_mc_copy() in atomic context. Technically
speaking, that should today only be possible from buffer-head
filesystems using buffer_migrate_folio_norefs() on page migration, and
the only buffer-head filesystem with large folios is the block device
cache -- and so only with block devices with large block sizes.

However, tracing shows that folio_mc_copy() *isn't* being called as
often as we'd expect from the buffer_migrate_folio_norefs() path, as
we're likely bailing early now thanks to the check added by commit
060913999d7a ("mm: migrate: support poisoned recover from migrate
folio"). *Most* folio_mc_copy() calls in turn end up *not* being in
atomic context, and so we won't hit a splat when using:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

But we *want* to proactively find callers of __migrate_folio() in
atomic context, so make might_sleep() explicit to help us root out
large folio atomic callers of migrate_folio().

Link: https://lkml.kernel.org/r/202503101536.27099c77-lkp@intel.com # [0]
Link: https://lkml.kernel.org/r/67e57c41.050a0220.2f068f.0033.GAE@google.com # [1]
Link: https://lkml.kernel.org/r/Z-c6BqCSmAnNxb57@bombadil.infradead.org # [2]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)