Message ID | 20240821114628.645455-1-sunjunchao2870@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix the race between umount and btrfs-cleaner | expand |
On Wed, Aug 21, 2024 at 07:46:28PM +0800, Julian Sun wrote: > There is a race condition generic_shutdown_super() and > __btrfs_run_defrag_inode(). > Consider the following scenario: > > umount thread: btrfs-cleaner thread: > btrfs_run_delayed_iputs() > ->run_delayed_iput_locked() > ->iput(inode) > // Here the inode (ie ino 261) will be cleared and freed > btrfs_kill_super() > ->generic_shutdown_super() > btrfs_run_defrag_inodes() > ->__btrfs_run_defrag_inode() > ->btrfs_iget(ino) > // The inode 261 was recreated with i_count=1 > // and added to the sb list > ->evict_inodes(sb) // After some work > // inode 261 was added ->iput(inode) > // to the dispose list ->iput_funal() > ->evict(inode) ->evict(inode) This is wrong though, evict_inodes() isn't supposed to isolate if i_count == 1, and iput_final sets I_FREEING, so we won't get the evict() from evict_inodes. Something else is happening here. Thanks, Josef
Hi, Josef. Thanks for your review and comments. Yeah, you are correct, sorry for the mistake. I'm thinking if is there any misunderstanding in the following scenario umount thread: btrfs-cleaner thread: btrfs_run_delayed_iputs() ->run_delayed_iput_locked() ->iput(inode) ->spin_lock(inode) // inode->i_count dec to 0 ->iput_final() ->__inode_add_lru() // mapping_shrinkable() returned false // so iput_final() returned with i_count=0 // note here: the inode still btrfs_kill_super() // in the sb list and I_FREEING ->generic_shutdown_super() // was not set ->evict_inodes() ->spin_unlock(inode) ->spin_lock(inode) // i_count is 0 and // continue running... // I_FREEING was not set ->__btrfs_run_defrag_inode() // so the inode was added to ->btrfs_iget() // dispost list // note the inode is still // in the sb list // and I_WILL_FREE|I_FREEING // was not set ->spin_unlock() ->find_inode() ->spin_lock(inode) ->__iget(); ->spin_unlock(); ->dispose_list() ->evict() ->iput() // note: i_lock was not ->spin_lock(inode) // hold here. ->iput_final() ->spin_unlock() ->evict() Now, we have two threads simultaneously evicting the same inode, which led to a bug. The key points here are: 1. btrfs-cleaner: iput_final() called __inode_add_lru() and then return because some reason with i_count=0, and I_FREEING was not set, and most important, i_lock was unlocked, which gives evict_inodes() a perfect chance to put the inode into dispose list. 2. umount thread: The inode (which i_count is 0 and I_FREEING was not set) was added to dispose list. 3. btrfs-cleaner: Before umount thread set I_FREEING flag, btrfs_iget() finded the inode and inc i_count 4. umount thread called evict() and btrfs-cleaner called iput(), which leads to bug. The above behavior can be confirmed by the debugging log based on commit d30d0e49da71: diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index 407ccec3e57e..cc50dd1e8beb 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -278,6 +278,8 @@ static int __btrfs_run_defrag_inode(struct btrfs_fs_info *fs_info, ret = btrfs_defrag_file(inode, NULL, &range, defrag->transid, BTRFS_DEFRAG_BATCH); sb_end_write(fs_info->sb); + if (inode->i_state & (1<<19)) + pr_info("bug! double evict! crash will happen\n"); iput(inode); if (ret < 0) diff --git a/fs/inode.c b/fs/inode.c index 3a41f83a4ba5..641a828bc2b7 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -461,8 +461,14 @@ static void __inode_add_lru(struct inode *inode, bool rotate) return; if (!(inode->i_sb->s_flags & SB_ACTIVE)) return; - if (!mapping_shrinkable(&inode->i_data)) + if (!mapping_shrinkable(&inode->i_data)) { + if (inode->i_ino == 261) { + pr_info("set inode 0x%px dbg_flag casue mapping_shrinkable\n", inode); + dump_stack(); + WRITE_ONCE(inode->i_state, inode- >i_state|(1<<19)); + } return; + } if (list_lru_add_obj(&inode->i_sb->s_inode_lru, &inode- >i_lru)) this_cpu_inc(nr_unused); @@ -623,7 +629,10 @@ void clear_inode(struct inode *inode) xa_unlock_irq(&inode->i_data.i_pages); BUG_ON(!list_empty(&inode->i_data.i_private_list)); BUG_ON(!(inode->i_state & I_FREEING)); - BUG_ON(inode->i_state & I_CLEAR); + if (inode->i_state & I_CLEAR) { + pr_info("inode 0x%px state 0x%lx triggered bug\n", inode, inode->i_state); + BUG(); + } BUG_ON(!list_empty(&inode->i_wb_list)); /* don't need i_lock here, no concurrent mods to i_state */ inode->i_state = I_FREEING | I_CLEAR; @@ -1754,7 +1763,10 @@ void iput(struct inode *inode) { if (!inode) return; - BUG_ON(inode->i_state & I_CLEAR); + if (inode->i_state & I_CLEAR) { + pr_info("inode 0x%px state 0x%lx triggered a bug\n", inode, inode->i_state); + BUG(); + } retry: if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) { if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) { @@ -1764,6 +1776,8 @@ void iput(struct inode *inode) mark_inode_dirty_sync(inode); goto retry; } + if (inode->i_ino == 261 && (inode->i_state &(1<<19))) + pr_info("state is 0x%lx, bug may be triggered\n", inode->i_state); iput_final(inode); } } The (1<<19) bit was used to debugging. Here is the log when bug was triggered: [ 33.028024][ T2451] set inode 0xffff88801fb87738 dbg_flag casue mapping_shrinkable [ 33.028478][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not tainted 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 [ 33.029072][ T2451] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 33.029675][ T2451] Call Trace: [ 33.029873][ T2451] <TASK> [ 33.030046][ T2451] dump_stack_lvl+0x16c/0x1f0 [ 33.030330][ T2451] __inode_add_lru.part.0+0x1ea/0x280 [ 33.030647][ T2451] iput+0x7c7/0x920 [ 33.030876][ T2451] run_delayed_iput_locked+0x136/0x1e0 [ 33.031199][ T2451] btrfs_run_delayed_iputs+0x8e/0x120 [ 33.031516][ T2451] cleaner_kthread+0x2d3/0x480 [ 33.031801][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 [ 33.032113][ T2451] kthread+0x2c1/0x3a0 [ 33.032354][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 [ 33.032659][ T2451] ? __pfx_kthread+0x10/0x10 [ 33.032929][ T2451] ret_from_fork+0x45/0x80 [ 33.033198][ T2451] ? __pfx_kthread+0x10/0x10 [ 33.033469][ T2451] ret_from_fork_asm+0x1a/0x30 [ 33.033756][ T2451] </TASK> [ 33.033978][ T2451] bug! double evict! crash will happen [ 33.034168][ T2332] BTRFS info (device loop0): last unmount of filesystem c9fe44da-de57-406a-8241-57ec7d4412cf [ 33.034295][ T2451] inode 0xffff88801fb87738 state 0x60 triggered a bug [ 33.034332][ T2451] ------------[ cut here ]------------ [ 33.035578][ T2451] kernel BUG at fs/inode.c:1768! [ 33.035873][ T2451] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 33.036286][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not tainted 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 [ 33.036870][ T2451] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 33.037464][ T2451] RIP: 0010:iput+0x896/0x920 // logs ... [ 33.042882][ T2451] PKRU: 55555554 [ 33.043092][ T2451] Call Trace: [ 33.043288][ T2451] <TASK> [ 33.044190][ T2451] ? iput+0x896/0x920 [ 33.047321][ T2451] ? btrfs_run_defrag_inodes+0xb49/0xe60 [ 33.047653][ T2451] btrfs_run_defrag_inodes+0xa40/0xe60 [ 33.047975][ T2451] ? __pfx_btrfs_run_defrag_inodes+0x10/0x10 [ 33.048327][ T2451] ? __pfx___mutex_unlock_slowpath+0x10/0x10 [ 33.048678][ T2451] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 33.048996][ T2451] ? _raw_spin_unlock+0x28/0x50 [ 33.049285][ T2451] ? btrfs_clean_one_deleted_snapshot+0x2b2/0x420 [ 33.049660][ T2451] cleaner_kthread+0x2ee/0x480 [ 33.049941][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 [ 33.050252][ T2451] kthread+0x2c1/0x3a0 [ 33.050492][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 [ 33.050796][ T2451] ? __pfx_kthread+0x10/0x10 [ 33.051067][ T2451] ret_from_fork+0x45/0x80 [ 33.051333][ T2451] ? __pfx_kthread+0x10/0x10 [ 33.051603][ T2451] ret_from_fork_asm+0x1a/0x30 [ 33.051887][ T2451] </TASK> [ 33.052066][ T2451] Modules linked in: [ 33.052316][ T2451] ---[ end trace 0000000000000000 ]--- [ 33.052634][ T2451] RIP: 0010:iput+0x896/0x920 Please note inode 0xffff88801fb87738, which was set debug flag by __inode_add_lru() in the run_delayed_iput_locked() code path. This indicates that the inode was not actually evicted. Then, please note the double evict log, which shows that this inode which was set debug flag and was not be evicted is still on the sb list and was successfully acquired by btrfs_iget(). Next, take note of the unmount log, indicating that there was a concurrent unmount thread which has already called evict_inodes() at that time. Finally, observe where the BUG was triggered: the state of inode 0xffff88801fb87738 had been set to 0x60 by the unmount thread calling inode_clear(), and at the same time, btrfs-cleaner called iput(), ultimately leading to the BUG at the beginning check of iput(). Although the point of trigger differs from the one reported by syzbot, there is no doubt that they are the same bug. You could apply the above diff and try to reproduce the bug using the reproducer[1]. If there is anything I'm missing or more info was needed, please let me know. [1]: https://syzkaller.appspot.com/x/repro.c?x=14c57f16980000 Thanks.
On Thu, 2024-08-22 at 19:41 +0800, Julian Sun wrote: > Hi, Josef. > Thanks for your review and comments. > Yeah, you are correct, sorry for the mistake. > > I'm thinking if is there any misunderstanding in the following > scenario > > umount thread: btrfs-cleaner thread: > btrfs_run_delayed_iputs() > ->run_delayed_iput_locked() > ->iput(inode) > ->spin_lock(inode) > // inode->i_count dec to 0 > ->iput_final() > ->__inode_add_lru() > // mapping_shrinkable() > returned false > // so iput_final() > returned with i_count=0 > // note here: the inode > still > btrfs_kill_super() // in the sb list and > I_FREEING > ->generic_shutdown_super() // was not set > ->evict_inodes() ->spin_unlock(inode) > ->spin_lock(inode) > // i_count is 0 and // continue running... > // I_FREEING was not set ->__btrfs_run_defrag_inode() > // so the inode was added to ->btrfs_iget() > // dispost list // note the inode is > still > // in the sb list > // and > I_WILL_FREE|I_FREEING > // was not set > ->spin_unlock() ->find_inode() > ->spin_lock(inode) > ->__iget(); > ->spin_unlock(); > ->dispose_list() > ->evict() ->iput() > // note: i_lock was not ->spin_lock(inode) > // hold here. ->iput_final() > ->spin_unlock() > ->evict() > The understanding here is not correct, I will try to analyze it more carefully. > Now, we have two threads simultaneously evicting > the same inode, which led to a bug. > > The key points here are: > 1. btrfs-cleaner: iput_final() called __inode_add_lru() and then > return because some reason with i_count=0, and I_FREEING was not set, > and most important, i_lock was unlocked, > which gives evict_inodes() a perfect chance to put the inode into > dispose list. > 2. umount thread: The inode (which i_count is 0 and I_FREEING was not > set) > was added to dispose list. > 3. btrfs-cleaner: Before umount thread set I_FREEING flag, > btrfs_iget() > finded the inode and inc i_count > 4. umount thread called evict() and btrfs-cleaner called iput(), > which leads > to bug. > > The above behavior can be confirmed by the debugging log > based on commit d30d0e49da71: > > diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > index 407ccec3e57e..cc50dd1e8beb 100644 > --- a/fs/btrfs/defrag.c > +++ b/fs/btrfs/defrag.c > @@ -278,6 +278,8 @@ static int __btrfs_run_defrag_inode(struct > btrfs_fs_info *fs_info, > ret = btrfs_defrag_file(inode, NULL, &range, defrag->transid, > BTRFS_DEFRAG_BATCH); > sb_end_write(fs_info->sb); > + if (inode->i_state & (1<<19)) > + pr_info("bug! double evict! crash will happen\n"); > iput(inode); > > if (ret < 0) > diff --git a/fs/inode.c b/fs/inode.c > index 3a41f83a4ba5..641a828bc2b7 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -461,8 +461,14 @@ static void __inode_add_lru(struct inode *inode, > bool rotate) > return; > if (!(inode->i_sb->s_flags & SB_ACTIVE)) > return; > - if (!mapping_shrinkable(&inode->i_data)) > + if (!mapping_shrinkable(&inode->i_data)) { > + if (inode->i_ino == 261) { > + pr_info("set inode 0x%px dbg_flag casue > mapping_shrinkable\n", inode); > + dump_stack(); > + WRITE_ONCE(inode->i_state, inode- > > i_state|(1<<19)); > + } > return; > + } > > if (list_lru_add_obj(&inode->i_sb->s_inode_lru, &inode- > > i_lru)) > this_cpu_inc(nr_unused); > @@ -623,7 +629,10 @@ void clear_inode(struct inode *inode) > xa_unlock_irq(&inode->i_data.i_pages); > BUG_ON(!list_empty(&inode->i_data.i_private_list)); > BUG_ON(!(inode->i_state & I_FREEING)); > - BUG_ON(inode->i_state & I_CLEAR); > + if (inode->i_state & I_CLEAR) { > + pr_info("inode 0x%px state 0x%lx triggered bug\n", > inode, inode->i_state); > + BUG(); > + } > BUG_ON(!list_empty(&inode->i_wb_list)); > /* don't need i_lock here, no concurrent mods to i_state */ > inode->i_state = I_FREEING | I_CLEAR; > @@ -1754,7 +1763,10 @@ void iput(struct inode *inode) > { > if (!inode) > return; > - BUG_ON(inode->i_state & I_CLEAR); > + if (inode->i_state & I_CLEAR) { > + pr_info("inode 0x%px state 0x%lx triggered a bug\n", > inode, inode->i_state); > + BUG(); > + } > retry: > if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) { > if (inode->i_nlink && (inode->i_state & > I_DIRTY_TIME)) > { > @@ -1764,6 +1776,8 @@ void iput(struct inode *inode) > mark_inode_dirty_sync(inode); > goto retry; > } > + if (inode->i_ino == 261 && (inode->i_state &(1<<19))) > + pr_info("state is 0x%lx, bug may be > triggered\n", inode->i_state); > iput_final(inode); > } > } > > The (1<<19) bit was used to debugging. > > Here is the log when bug was triggered: > > [ 33.028024][ T2451] set inode 0xffff88801fb87738 dbg_flag casue > mapping_shrinkable > [ 33.028478][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not > tainted > 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 > [ 33.029072][ T2451] Hardware name: QEMU Standard PC (i440FX + > PIIX, > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 33.029675][ T2451] Call Trace: > [ 33.029873][ T2451] <TASK> > [ 33.030046][ T2451] dump_stack_lvl+0x16c/0x1f0 > [ 33.030330][ T2451] __inode_add_lru.part.0+0x1ea/0x280 > [ 33.030647][ T2451] iput+0x7c7/0x920 > [ 33.030876][ T2451] run_delayed_iput_locked+0x136/0x1e0 > [ 33.031199][ T2451] btrfs_run_delayed_iputs+0x8e/0x120 > [ 33.031516][ T2451] cleaner_kthread+0x2d3/0x480 > [ 33.031801][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 > [ 33.032113][ T2451] kthread+0x2c1/0x3a0 > [ 33.032354][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 > [ 33.032659][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.032929][ T2451] ret_from_fork+0x45/0x80 > [ 33.033198][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.033469][ T2451] ret_from_fork_asm+0x1a/0x30 > [ 33.033756][ T2451] </TASK> > [ 33.033978][ T2451] bug! double evict! crash will happen > [ 33.034168][ T2332] BTRFS info (device loop0): last unmount of > filesystem c9fe44da-de57-406a-8241-57ec7d4412cf > [ 33.034295][ T2451] inode 0xffff88801fb87738 state 0x60 triggered > a > bug > [ 33.034332][ T2451] ------------[ cut here ]------------ > [ 33.035578][ T2451] kernel BUG at fs/inode.c:1768! > [ 33.035873][ T2451] Oops: invalid opcode: 0000 [#1] PREEMPT SMP > KASAN NOPTI > [ 33.036286][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not > tainted > 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 > [ 33.036870][ T2451] Hardware name: QEMU Standard PC (i440FX + > PIIX, > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 33.037464][ T2451] RIP: 0010:iput+0x896/0x920 > // logs ... > [ 33.042882][ T2451] PKRU: 55555554 > [ 33.043092][ T2451] Call Trace: > [ 33.043288][ T2451] <TASK> > [ 33.044190][ T2451] ? iput+0x896/0x920 > [ 33.047321][ T2451] ? btrfs_run_defrag_inodes+0xb49/0xe60 > [ 33.047653][ T2451] btrfs_run_defrag_inodes+0xa40/0xe60 > [ 33.047975][ T2451] ? __pfx_btrfs_run_defrag_inodes+0x10/0x10 > [ 33.048327][ T2451] ? __pfx___mutex_unlock_slowpath+0x10/0x10 > [ 33.048678][ T2451] ? __pfx_do_raw_spin_lock+0x10/0x10 > [ 33.048996][ T2451] ? _raw_spin_unlock+0x28/0x50 > [ 33.049285][ T2451] ? > btrfs_clean_one_deleted_snapshot+0x2b2/0x420 > [ 33.049660][ T2451] cleaner_kthread+0x2ee/0x480 > [ 33.049941][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 > [ 33.050252][ T2451] kthread+0x2c1/0x3a0 > [ 33.050492][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 > [ 33.050796][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.051067][ T2451] ret_from_fork+0x45/0x80 > [ 33.051333][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.051603][ T2451] ret_from_fork_asm+0x1a/0x30 > [ 33.051887][ T2451] </TASK> > [ 33.052066][ T2451] Modules linked in: > [ 33.052316][ T2451] ---[ end trace 0000000000000000 ]--- > [ 33.052634][ T2451] RIP: 0010:iput+0x896/0x920 > > Please note inode 0xffff88801fb87738, which was set debug flag by > __inode_add_lru() in the run_delayed_iput_locked() code path. This > indicates that the inode was not actually evicted. > Then, please note the double evict log, which shows that this > inode which was set debug flag and was not be evicted is > still on the sb list and was successfully acquired by btrfs_iget(). > Next, take note of the unmount log, indicating that there was > a concurrent unmount thread which has already called evict_inodes() > at that time. > Finally, observe where the BUG was triggered: the state of inode > 0xffff88801fb87738 had been set to 0x60 by the unmount thread calling > inode_clear(), and at the same time, btrfs-cleaner called iput(), > ultimately leading to the BUG at the beginning check of iput(). > Although the point of trigger differs from the one reported by > syzbot, there is no doubt that they are the same bug. > > You could apply the above diff and try to reproduce the bug using the > reproducer[1]. > > If there is anything I'm missing or more info was needed, please let > me know. > > [1]: https://syzkaller.appspot.com/x/repro.c?x=14c57f16980000 > > Thanks.
Hi, Josef I believe there is a bug in the following scenario, but I'm not sure if it is the same bug reported by syzbot. Do you have any idea? umount thread: btrfs-cleaner thread: btrfs_run_delayed_iputs() ->run_delayed_iput_locked() btrfs_kill_super() ->iput(inode) ->generic_shutdown_super() ->spin_lock(inode) ->evict_inodes() // inode->i_count dec to 0 ->spin_lock(inode) ->iput_final() // cause some reason, get into // __inode_add_lru // passed i_count==0 test ->__inode_add_lru() // and then schedule out // so iput_final() returned wich I_FREEING was not set // note here: the inode still in the sb list ->__btrfs_run_defrag_inode() ->btrfs_iget() ->find_inode() ->spin_lock(inode) ->__iget(); // i_count inc to 1 ->spin_unlock(inode); // schedule back spin_lock(inode) // I_FREEING was not set // so we continue set I_FREEING flag spin_unlock(inode) ->iput() // put the inode into dispose ->spin_lock(inode) // list // dec i_count to 0 ->dispose_list() ->iput_final() ->evict() ->spin_unlock() ->evict() Now, we have two threads simultaneously evicting the same inode, which led to a bug. I think this can be addressed by protecting the atomic_read(inode->i_count) in evict_inode() with inode->i_lock.
On Thu, Aug 22, 2024 at 08:45:24PM +0800, Julian Sun wrote: > Hi, Josef > > I believe there is a bug in the following scenario, but I'm not sure > if it is the same bug reported by syzbot. Do you have any idea? > > umount thread: btrfs-cleaner thread: > btrfs_run_delayed_iputs() > ->run_delayed_iput_locked() > btrfs_kill_super() ->iput(inode) > ->generic_shutdown_super() ->spin_lock(inode) > ->evict_inodes() // inode->i_count dec to 0 > ->spin_lock(inode) ->iput_final() > // cause some reason, get into > // __inode_add_lru > // passed i_count==0 test ->__inode_add_lru() > // and then schedule out // so iput_final() returned wich I_FREEING was not set > // note here: the inode still in the sb list > ->__btrfs_run_defrag_inode() > ->btrfs_iget() > ->find_inode() > ->spin_lock(inode) > ->__iget(); // i_count inc to 1 > ->spin_unlock(inode); > // schedule back > spin_lock(inode) > // I_FREEING was not set > // so we continue > set I_FREEING flag > spin_unlock(inode) ->iput() > // put the inode into dispose ->spin_lock(inode) > // list // dec i_count to 0 > ->dispose_list() ->iput_final() > ->evict() ->spin_unlock() > ->evict() > > Now, we have two threads simultaneously evicting > the same inode, which led to a bug. > I think this can be addressed by protecting the > atomic_read(inode->i_count) in evict_inode() with > inode->i_lock. For reference, this will be fixed in VFS https://lore.kernel.org/linux-btrfs/20240823130730.658881-1-sunjunchao2870@gmail.com/
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f05cce7c8b8d..f7e87fe583ab 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2093,6 +2093,7 @@ static int btrfs_get_tree(struct fs_context *fc) static void btrfs_kill_super(struct super_block *sb) { struct btrfs_fs_info *fs_info = btrfs_sb(sb); + set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags); kill_anon_super(sb); btrfs_free_fs_info(fs_info); }
There is a race condition generic_shutdown_super() and __btrfs_run_defrag_inode(). Consider the following scenario: umount thread: btrfs-cleaner thread: btrfs_run_delayed_iputs() ->run_delayed_iput_locked() ->iput(inode) // Here the inode (ie ino 261) will be cleared and freed btrfs_kill_super() ->generic_shutdown_super() btrfs_run_defrag_inodes() ->__btrfs_run_defrag_inode() ->btrfs_iget(ino) // The inode 261 was recreated with i_count=1 // and added to the sb list ->evict_inodes(sb) // After some work // inode 261 was added ->iput(inode) // to the dispose list ->iput_funal() ->evict(inode) ->evict(inode) Now, we have two threads simultaneously evicting the same inode, which led to a bug. The above behavior can be confirmed by the log I added for debugging and the log printed when BUG was triggered. Due to space limitations, I cannot paste the full diff and here is a brief describtion. First, within __btrfs_run_defrag_inode(), set inode->i_state |= (1<<19) just before calling iput(). Within the dispose_list(), check the flag, if the flag was set, then pr_info("bug! double evict! crash will happen! state is 0x%lx\n", inode->i_state); Here is the printed log when the BUG was triggered: [ 190.686726][ T2336] bug! double evict! crash will happen! state is 0x80020 [ 190.687647][ T2336] ------------[ cut here ]------------ [ 190.688294][ T2336] kernel BUG at fs/inode.c:626! [ 190.688939][ T2336] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 190.689792][ T2336] CPU: 1 PID: 2336 Comm: a.out Not tainted 6.10.0-rc2-00223-g0c529ab65ef8-dirty #109 [ 190.690894][ T2336] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 190.692111][ T2336] RIP: 0010:clear_inode+0x15b/0x190 // some logs... [ 190.704501][ T2336] btrfs_evict_inode+0x529/0xe80 [ 190.706966][ T2336] evict+0x2ed/0x6c0 [ 190.707209][ T2336] dispose_list+0x62/0x260 [ 190.707490][ T2336] evict_inodes+0x34e/0x450 To prevent this behavior, we need to set BTRFS_FS_CLOSING_START before kill_anon_super() to ensure that btrfs_run_defrag_inodes() doesn't continue working after unmount. Reported-and-tested-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad CC: stable@vger.kernel.org Fixes: c146afad2c7f ("Btrfs: mount ro and remount support") Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> --- fs/btrfs/super.c | 1 + 1 file changed, 1 insertion(+)