Message ID | 1417015735-8581-4-git-send-email-fdmanana@suse.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On 11/26/2014 10:28 AM, Filipe Manana wrote: > There's a race between adding a block group to the list of the unused > block groups and removing an unused block group (cleaner kthread) that > leads to freeing extents that are in use or a crash during transaction > commmit. Basically the cleaner kthread, when executing > btrfs_delete_unused_bgs(), might catch the newly added block group to > the list fs_info->unused_bgs and clear the range representing the whole > group from fs_info->freed_extents[] before the task that added the block > group to the list (running update_block_group()) marked the last freed > extent as dirty in fs_info->freed_extents (pinned_extents). > > That is: > > CPU 1 CPU 2 > > btrfs_delete_unused_bgs() > update_block_group() > add block group to > fs_info->unused_bgs > got block group from the list > clear_extent_bits for the whole > block group range in freed_extents[] > set_extent_dirty for the > range covering the freed > extent in freed_extents[] > (fs_info->pinned_extents) > > block group deleted, and a new block > group with the same logical address is > created > > reserve space from the new block group > for new data or metadata - the reserved > space overlaps the range specified by > CPU 1 for set_extent_dirty() > > commit transaction > find all ranges marked as dirty in > fs_info->pinned_extents, clear them > and add them to the free space cache > > Alternatively, if CPU 2 doesn't create a new block group with the same > logical address, we get a crash/BUG_ON at transaction commit when unpining > extent ranges because we can't find a block group for the range marked as > dirty by CPU 1. Sample trace: > > [ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC > [ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc > gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache > sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio > [ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1 > [ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 > [ 2163.428875] task: ffff88009f2c0650 ti: ffff8801356bc000 task.ti: ffff8801356bc000 > [ 2163.429157] RIP: 0010:[<ffffffffa037728e>] [<ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs] > [ 2163.429562] RSP: 0018:ffff8801356bfda8 EFLAGS: 00010246 > [ 2163.429802] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 2163.429990] RDX: 0000000041bfffff RSI: 0000000001c00000 RDI: ffff880024307080 > [ 2163.430042] RBP: ffff8801356bfde8 R08: 0000000000000068 R09: ffff88003734f118 > [ 2163.430042] R10: ffff8801356bfcb8 R11: fffffffffffffb69 R12: ffff8800243070d0 > [ 2163.430042] R13: 0000000083c04000 R14: ffff8800751b0f00 R15: ffff880024307000 > [ 2163.430042] FS: 0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000 > [ 2163.430042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 2163.430042] CR2: 00007ff10eb43fc0 CR3: 0000000004cb8000 CR4: 00000000000006f0 > [ 2163.430042] Stack: > [ 2163.430042] ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800 > [ 2163.430042] ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000 > [ 2163.430042] ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff > [ 2163.430042] Call Trace: > [ 2163.430042] [<ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs] > [ 2163.430042] [<ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs] > [ 2163.430042] [<ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs] > [ 2163.430042] [<ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs] > [ 2163.430042] [<ffffffff8105966b>] kthread+0xb7/0xbf > [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 > [ 2163.430042] [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0 > [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 > > So fix this by making update_block_group() first set the range as dirty > in pinned_extents before adding the block group to the unused_bgs list. > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/extent-tree.c | 21 ++++++++++----------- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index b7e40ef..92f61f2 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -5439,7 +5439,17 @@ static int update_block_group(struct btrfs_root *root, > spin_unlock(&cache->space_info->lock); > } else { > old_val -= num_bytes; > + btrfs_set_block_group_used(&cache->item, old_val); > + cache->pinned += num_bytes; > + cache->space_info->bytes_pinned += num_bytes; > + cache->space_info->bytes_used -= num_bytes; > + cache->space_info->disk_used -= num_bytes * factor; > + spin_unlock(&cache->lock); > + spin_unlock(&cache->space_info->lock); > > + set_extent_dirty(info->pinned_extents, > + bytenr, bytenr + num_bytes - 1, > + GFP_NOFS | __GFP_NOFAIL); > /* > * No longer have used bytes in this block group, queue > * it for deletion. > @@ -5453,17 +5463,6 @@ static int update_block_group(struct btrfs_root *root, > } > spin_unlock(&info->unused_bgs_lock); > } > - btrfs_set_block_group_used(&cache->item, old_val); > - cache->pinned += num_bytes; > - cache->space_info->bytes_pinned += num_bytes; > - cache->space_info->bytes_used -= num_bytes; > - cache->space_info->disk_used -= num_bytes * factor; > - spin_unlock(&cache->lock); > - spin_unlock(&cache->space_info->lock); > - > - set_extent_dirty(info->pinned_extents, > - bytenr, bytenr + num_bytes - 1, > - GFP_NOFS | __GFP_NOFAIL); > } > btrfs_put_block_group(cache); > total -= num_bytes; > Reviewed-by: Josef Bacik <jbacik@fb.com> Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index b7e40ef..92f61f2 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5439,7 +5439,17 @@ static int update_block_group(struct btrfs_root *root, spin_unlock(&cache->space_info->lock); } else { old_val -= num_bytes; + btrfs_set_block_group_used(&cache->item, old_val); + cache->pinned += num_bytes; + cache->space_info->bytes_pinned += num_bytes; + cache->space_info->bytes_used -= num_bytes; + cache->space_info->disk_used -= num_bytes * factor; + spin_unlock(&cache->lock); + spin_unlock(&cache->space_info->lock); + set_extent_dirty(info->pinned_extents, + bytenr, bytenr + num_bytes - 1, + GFP_NOFS | __GFP_NOFAIL); /* * No longer have used bytes in this block group, queue * it for deletion. @@ -5453,17 +5463,6 @@ static int update_block_group(struct btrfs_root *root, } spin_unlock(&info->unused_bgs_lock); } - btrfs_set_block_group_used(&cache->item, old_val); - cache->pinned += num_bytes; - cache->space_info->bytes_pinned += num_bytes; - cache->space_info->bytes_used -= num_bytes; - cache->space_info->disk_used -= num_bytes * factor; - spin_unlock(&cache->lock); - spin_unlock(&cache->space_info->lock); - - set_extent_dirty(info->pinned_extents, - bytenr, bytenr + num_bytes - 1, - GFP_NOFS | __GFP_NOFAIL); } btrfs_put_block_group(cache); total -= num_bytes;
There's a race between adding a block group to the list of the unused block groups and removing an unused block group (cleaner kthread) that leads to freeing extents that are in use or a crash during transaction commmit. Basically the cleaner kthread, when executing btrfs_delete_unused_bgs(), might catch the newly added block group to the list fs_info->unused_bgs and clear the range representing the whole group from fs_info->freed_extents[] before the task that added the block group to the list (running update_block_group()) marked the last freed extent as dirty in fs_info->freed_extents (pinned_extents). That is: CPU 1 CPU 2 btrfs_delete_unused_bgs() update_block_group() add block group to fs_info->unused_bgs got block group from the list clear_extent_bits for the whole block group range in freed_extents[] set_extent_dirty for the range covering the freed extent in freed_extents[] (fs_info->pinned_extents) block group deleted, and a new block group with the same logical address is created reserve space from the new block group for new data or metadata - the reserved space overlaps the range specified by CPU 1 for set_extent_dirty() commit transaction find all ranges marked as dirty in fs_info->pinned_extents, clear them and add them to the free space cache Alternatively, if CPU 2 doesn't create a new block group with the same logical address, we get a crash/BUG_ON at transaction commit when unpining extent ranges because we can't find a block group for the range marked as dirty by CPU 1. Sample trace: [ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio [ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1 [ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2163.428875] task: ffff88009f2c0650 ti: ffff8801356bc000 task.ti: ffff8801356bc000 [ 2163.429157] RIP: 0010:[<ffffffffa037728e>] [<ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs] [ 2163.429562] RSP: 0018:ffff8801356bfda8 EFLAGS: 00010246 [ 2163.429802] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 2163.429990] RDX: 0000000041bfffff RSI: 0000000001c00000 RDI: ffff880024307080 [ 2163.430042] RBP: ffff8801356bfde8 R08: 0000000000000068 R09: ffff88003734f118 [ 2163.430042] R10: ffff8801356bfcb8 R11: fffffffffffffb69 R12: ffff8800243070d0 [ 2163.430042] R13: 0000000083c04000 R14: ffff8800751b0f00 R15: ffff880024307000 [ 2163.430042] FS: 0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000 [ 2163.430042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2163.430042] CR2: 00007ff10eb43fc0 CR3: 0000000004cb8000 CR4: 00000000000006f0 [ 2163.430042] Stack: [ 2163.430042] ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800 [ 2163.430042] ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000 [ 2163.430042] ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff [ 2163.430042] Call Trace: [ 2163.430042] [<ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs] [ 2163.430042] [<ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs] [ 2163.430042] [<ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs] [ 2163.430042] [<ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs] [ 2163.430042] [<ffffffff8105966b>] kthread+0xb7/0xbf [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 [ 2163.430042] [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0 [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 So fix this by making update_block_group() first set the range as dirty in pinned_extents before adding the block group to the unused_bgs list. Signed-off-by: Filipe Manana <fdmanana@suse.com> --- fs/btrfs/extent-tree.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-)