| Message ID | 20250415231635.83960-8-dave@stgolabs.net (mailing list archive) |
|---|---|
| State | New |
| Series | fs/buffer: split pagecache lookups into atomic or blocking |
On Tue 15-04-25 16:16:35, Davidlohr Bueso wrote:
> The large folio + buffer head noref migration scenarios are
> being naughty and blocking while holding a spinlock.
> 
> As a consequence of the pagecache lookup path taking the
> folio lock this serializes against migration paths, so
> they can wait for each other. For the private_lock
> atomic case, a new BH_Migrate flag is introduced which
> enables the lookup to bail.
> 
> This allows the critical region of the private_lock on
> the migration path to be reduced to the way it was before
> ebdf4de5642fb6 ("mm: migrate: fix reference check race
> between __find_get_block() and migration"), that is covering
> the count checks.
> 
> The scope is always noref migration.
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/oe-lkp/202503101536.27099c77-lkp@intel.com
> Fixes: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")
> Co-developed-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>

Looks good! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/buffer.c                 | 12 +++++++++++-
>  fs/ext4/ialloc.c            |  3 ++-
>  include/linux/buffer_head.h |  1 +
>  mm/migrate.c                |  8 +++++---
>  4 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index f8e63885604b..b8e1e6e325cd 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -207,6 +207,15 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
>  	head = folio_buffers(folio);
>  	if (!head)
>  		goto out_unlock;
> +	/*
> +	 * Upon a noref migration, the folio lock serializes here;
> +	 * otherwise bail.
> +	 */
> +	if (test_bit_acquire(BH_Migrate, &head->b_state)) {
> +		WARN_ON(!atomic);
> +		goto out_unlock;
> +	}
> +
>  	bh = head;
>  	do {
>  		if (!buffer_mapped(bh))
> @@ -1390,7 +1399,8 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
>  /*
>   * Perform a pagecache lookup for the matching buffer. If it's there, refresh
>   * it in the LRU and mark it as accessed. If it is not present then return
> - * NULL
> + * NULL. Atomic context callers may also return NULL if the buffer is being
> + * migrated; similarly the page is not marked accessed either.
>   */
>  static struct buffer_head *
>  find_get_block_common(struct block_device *bdev, sector_t block,
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 38bc8d74f4cc..e7ecc7c8a729 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -691,7 +691,8 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
>  	if (!bh || !buffer_uptodate(bh))
>  		/*
>  		 * If the block is not in the buffer cache, then it
> -		 * must have been written out.
> +		 * must have been written out, or, most unlikely, is
> +		 * being migrated - false failure should be OK here.
>  		 */
>  		goto out;
> 
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index c791aa9a08da..0029ff880e27 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -34,6 +34,7 @@ enum bh_state_bits {
>  	BH_Meta,	/* Buffer contains metadata */
>  	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
>  	BH_Defer_Completion, /* Defer AIO completion to workqueue */
> +	BH_Migrate,	/* Buffer is being migrated (norefs) */
> 
>  	BH_PrivateStart,/* not a state bit, but the first bit available
>  			 * for private allocation by other entities
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6e2488e5dbe4..c80591514e66 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -845,9 +845,11 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  		return -EAGAIN;
> 
>  	if (check_refs) {
> -		bool busy;
> +		bool busy, migrating;
>  		bool invalidated = false;
> 
> +		migrating = test_and_set_bit_lock(BH_Migrate, &head->b_state);
> +		VM_WARN_ON_ONCE(migrating);
>  recheck_buffers:
>  		busy = false;
>  		spin_lock(&mapping->i_private_lock);
> @@ -859,12 +861,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  			}
>  			bh = bh->b_this_page;
>  		} while (bh != head);
> +		spin_unlock(&mapping->i_private_lock);
>  		if (busy) {
>  			if (invalidated) {
>  				rc = -EAGAIN;
>  				goto unlock_buffers;
>  			}
> -			spin_unlock(&mapping->i_private_lock);
>  			invalidate_bh_lrus();
>  			invalidated = true;
>  			goto recheck_buffers;
> @@ -883,8 +885,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
> 
>  unlock_buffers:
>  	if (check_refs)
> -		spin_unlock(&mapping->i_private_lock);
> +		clear_bit_unlock(BH_Migrate, &head->b_state);
>  	bh = head;
>  	do {
>  		unlock_buffer(bh);
> -- 
> 2.39.5
> 
diff --git a/fs/buffer.c b/fs/buffer.c
index f8e63885604b..b8e1e6e325cd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -207,6 +207,15 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
+	/*
+	 * Upon a noref migration, the folio lock serializes here;
+	 * otherwise bail.
+	 */
+	if (test_bit_acquire(BH_Migrate, &head->b_state)) {
+		WARN_ON(!atomic);
+		goto out_unlock;
+	}
+
 	bh = head;
 	do {
 		if (!buffer_mapped(bh))
@@ -1390,7 +1399,8 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 /*
  * Perform a pagecache lookup for the matching buffer. If it's there, refresh
  * it in the LRU and mark it as accessed. If it is not present then return
- * NULL
+ * NULL. Atomic context callers may also return NULL if the buffer is being
+ * migrated; similarly the page is not marked accessed either.
  */
 static struct buffer_head *
 find_get_block_common(struct block_device *bdev, sector_t block,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 38bc8d74f4cc..e7ecc7c8a729 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -691,7 +691,8 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
 	if (!bh || !buffer_uptodate(bh))
 		/*
 		 * If the block is not in the buffer cache, then it
-		 * must have been written out.
+		 * must have been written out, or, most unlikely, is
+		 * being migrated - false failure should be OK here.
 		 */
 		goto out;
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c791aa9a08da..0029ff880e27 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -34,6 +34,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Migrate,	/* Buffer is being migrated (norefs) */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
diff --git a/mm/migrate.c b/mm/migrate.c
index 6e2488e5dbe4..c80591514e66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -845,9 +845,11 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		return -EAGAIN;
 
 	if (check_refs) {
-		bool busy;
+		bool busy, migrating;
 		bool invalidated = false;
 
+		migrating = test_and_set_bit_lock(BH_Migrate, &head->b_state);
+		VM_WARN_ON_ONCE(migrating);
 recheck_buffers:
 		busy = false;
 		spin_lock(&mapping->i_private_lock);
@@ -859,12 +861,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 			}
 			bh = bh->b_this_page;
 		} while (bh != head);
+		spin_unlock(&mapping->i_private_lock);
 		if (busy) {
 			if (invalidated) {
 				rc = -EAGAIN;
 				goto unlock_buffers;
 			}
-			spin_unlock(&mapping->i_private_lock);
 			invalidate_bh_lrus();
 			invalidated = true;
 			goto recheck_buffers;
@@ -883,8 +885,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 
 unlock_buffers:
 	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
+		clear_bit_unlock(BH_Migrate, &head->b_state);
 	bh = head;
 	do {
 		unlock_buffer(bh);
--
The large folio + buffer head noref migration scenarios are
being naughty and blocking while holding a spinlock.

As a consequence of the pagecache lookup path taking the
folio lock this serializes against migration paths, so
they can wait for each other. For the private_lock
atomic case, a new BH_Migrate flag is introduced which
enables the lookup to bail.

This allows the critical region of the private_lock on
the migration path to be reduced to the way it was before
ebdf4de5642fb6 ("mm: migrate: fix reference check race
between __find_get_block() and migration"), that is covering
the count checks.

The scope is always noref migration.

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: syzbot+f3c6fda1297c748a7076@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/oe-lkp/202503101536.27099c77-lkp@intel.com
Fixes: 3c20917120ce61 ("block/bdev: enable large folio support for large logical block sizes")
Co-developed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 fs/buffer.c                 | 12 +++++++++++-
 fs/ext4/ialloc.c            |  3 ++-
 include/linux/buffer_head.h |  1 +
 mm/migrate.c                |  8 +++++---
 4 files changed, 19 insertions(+), 5 deletions(-)

2.39.5
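
For readers who want to see the BH_Migrate handshake in isolation, below is a minimal userspace C model of the protocol the patch describes: the migration side publishes the bit with lock/acquire semantics before its reference-count checks and clears it with release semantics afterwards, while an atomic-context lookup tests the bit with acquire semantics and bails ("not found") instead of blocking on the folio lock. This is only an illustrative sketch; every identifier in it (bh_state, BH_MIGRATE, lookup_atomic, migrate_start, migrate_done) is a hypothetical stand-in modelled on, not taken from, the kernel's test_and_set_bit_lock()/test_bit_acquire()/clear_bit_unlock() usage in the diff above.

```c
/*
 * Userspace model of the BH_Migrate handshake -- not kernel code.
 * C11 atomics stand in for test_and_set_bit_lock()/test_bit_acquire()/
 * clear_bit_unlock(); all identifiers here are hypothetical.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define BH_MIGRATE	(1u << 0)		/* models the new BH_Migrate state bit */

static _Atomic unsigned int bh_state;		/* models head->b_state */

/* Atomic-context pagecache lookup: bail instead of blocking on migration. */
static bool lookup_atomic(void)
{
	/* Acquire pairs with the release in migrate_done(). */
	if (atomic_load_explicit(&bh_state, memory_order_acquire) & BH_MIGRATE)
		return false;			/* being migrated: report "not found" */
	return true;				/* safe to walk the buffer list */
}

/* Migration side: publish the marker before the reference-count checks. */
static void migrate_start(void)
{
	unsigned int old = atomic_fetch_or_explicit(&bh_state, BH_MIGRATE,
						    memory_order_acquire);
	if (old & BH_MIGRATE)			/* mirrors VM_WARN_ON_ONCE(migrating) */
		fprintf(stderr, "bit was already set - unexpected\n");
}

/* Migration done (or aborted): let lookups through again. */
static void migrate_done(void)
{
	atomic_fetch_and_explicit(&bh_state, ~BH_MIGRATE, memory_order_release);
}

int main(void)
{
	migrate_start();
	printf("lookup during migration: %s\n",
	       lookup_atomic() ? "walks buffers" : "bails (NULL)");
	migrate_done();
	printf("lookup after migration:  %s\n",
	       lookup_atomic() ? "walks buffers" : "bails (NULL)");
	return 0;
}
```

In the actual patch the migration path runs with the folio locked, so only one migrator can be marking a given buffer list at a time; that is why __buffer_migrate_folio() can VM_WARN_ON_ONCE() if it ever finds the bit already set.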