diff mbox

[2/6] Btrfs: fix crash caused by block group removal

Message ID 1417015735-8581-3-git-send-email-fdmanana@suse.com (mailing list archive)
State Accepted
Headers show

Commit Message

Filipe Manana Nov. 26, 2014, 3:28 p.m. UTC
If we remove a block group (because it became empty), we might have left
a caching_ctl structure in fs_info->caching_block_groups that points to
the block group and is accessed at transaction commit time. This results
in accessing an invalid or incorrect block group. This issue became visible
after Josef's patch "Btrfs: remove empty block groups automatically".

So if the block group is removed make sure we don't leave a dangling
caching_ctl in caching_block_groups.

Sample crash trace:

[58380.439449] BUG: unable to handle kernel paging request at ffff8801446eaeb8
[58380.439707] IP: [<ffffffffa03f6d05>] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
[58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE 80000001446ea060
[58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring virtio [last unloaded: btrfs]
[58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
[58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti: ffff88013896c000
[58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
[58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
[58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX: 0000000000000000
[58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI: ffff8801446eaeb8
[58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09: ffff88013896fc60
[58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12: ffff880222cae000
[58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15: ffff8801446eae00
[58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000) knlGS:0000000000000000
[58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4: 00000000000006e0
[58380.443238] Stack:
[58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850 ffff880185e16800
[58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00 0000000000000000
[58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0 ffff88013ac82090
[58380.443238] Call Trace:
[58380.443238]  [<ffffffffa03fe2d5>] btrfs_prepare_extent_commit+0x5a/0xd7 [btrfs]
[58380.443238]  [<ffffffffa040fbcf>] btrfs_commit_transaction+0x45c/0x882 [btrfs]
[58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4 [btrfs]
[58380.443238]  [<ffffffffa040bf66>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
[58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
[58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
[58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
[58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

Comments

Josef Bacik Nov. 26, 2014, 3:57 p.m. UTC | #1
On 11/26/2014 10:28 AM, Filipe Manana wrote:
> If we remove a block group (because it became empty), we might have left
> a caching_ctl structure in fs_info->caching_block_groups that points to
> the block group and is accessed at transaction commit time. This results
> in accessing an invalid or incorrect block group. This issue became visible
> after Josef's patch "Btrfs: remove empty block groups automatically".
>
> So if the block group is removed make sure we don't leave a dangling
> caching_ctl in caching_block_groups.
>
> Sample crash trace:
>
> [58380.439449] BUG: unable to handle kernel paging request at ffff8801446eaeb8
> [58380.439707] IP: [<ffffffffa03f6d05>] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE 80000001446ea060
> [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring virtio [last unloaded: btrfs]
> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti: ffff88013896c000
> [58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
> [58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
> [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX: 0000000000000000
> [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI: ffff8801446eaeb8
> [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09: ffff88013896fc60
> [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12: ffff880222cae000
> [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15: ffff8801446eae00
> [58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000) knlGS:0000000000000000
> [58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4: 00000000000006e0
> [58380.443238] Stack:
> [58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850 ffff880185e16800
> [58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00 0000000000000000
> [58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0 ffff88013ac82090
> [58380.443238] Call Trace:
> [58380.443238]  [<ffffffffa03fe2d5>] btrfs_prepare_extent_commit+0x5a/0xd7 [btrfs]
> [58380.443238]  [<ffffffffa040fbcf>] btrfs_commit_transaction+0x45c/0x882 [btrfs]
> [58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4 [btrfs]
> [58380.443238]  [<ffffffffa040bf66>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
> [58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
> [58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ctree.h       |  1 +
>   fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
>   2 files changed, 28 insertions(+)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index d3ccd09..7f40a65 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>   	unsigned int ro:1;
>   	unsigned int dirty:1;
>   	unsigned int iref:1;
> +	unsigned int has_caching_ctl:1;
>

Don't do this, just unconditionally call get_caching_control in 
btrfs_remove_block_group, then if we get something we can do stuff, 
otherwise we can just continue.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Nov. 26, 2014, 4:09 p.m. UTC | #2
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>>
>> If we remove a block group (because it became empty), we might have left
>> a caching_ctl structure in fs_info->caching_block_groups that points to
>> the block group and is accessed at transaction commit time. This results
>> in accessing an invalid or incorrect block group. This issue became
>> visible
>> after Josef's patch "Btrfs: remove empty block groups automatically".
>>
>> So if the block group is removed make sure we don't leave a dangling
>> caching_ctl in caching_block_groups.
>>
>> Sample crash trace:
>>
>> [58380.439449] BUG: unable to handle kernel paging request at
>> ffff8801446eaeb8
>> [58380.439707] IP: [<ffffffffa03f6d05>]
>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE
>> 80000001446ea060
>> [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd
>> auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse
>> processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore
>> thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom
>> ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common
>> virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring
>> virtio [last unloaded: btrfs]
>> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G        W
>> 3.17.0-rc5-btrfs-next-1+ #1
>> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>> [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti:
>> ffff88013896c000
>> [58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>]
>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>> [58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
>> [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX:
>> 0000000000000000
>> [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI:
>> ffff8801446eaeb8
>> [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09:
>> ffff88013896fc60
>> [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12:
>> ffff880222cae000
>> [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15:
>> ffff8801446eae00
>> [58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000)
>> knlGS:0000000000000000
>> [58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4:
>> 00000000000006e0
>> [58380.443238] Stack:
>> [58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850
>> ffff880185e16800
>> [58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00
>> 0000000000000000
>> [58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0
>> ffff88013ac82090
>> [58380.443238] Call Trace:
>> [58380.443238]  [<ffffffffa03fe2d5>] btrfs_prepare_extent_commit+0x5a/0xd7
>> [btrfs]
>> [58380.443238]  [<ffffffffa040fbcf>] btrfs_commit_transaction+0x45c/0x882
>> [btrfs]
>> [58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4
>> [btrfs]
>> [58380.443238]  [<ffffffffa040bf66>] ?
>> btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
>> [58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>> [58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> ---
>>   fs/btrfs/ctree.h       |  1 +
>>   fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
>>   2 files changed, 28 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index d3ccd09..7f40a65 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>>         unsigned int ro:1;
>>         unsigned int dirty:1;
>>         unsigned int iref:1;
>> +       unsigned int has_caching_ctl:1;
>>
>
> Don't do this, just unconditionally call get_caching_control in
> btrfs_remove_block_group, then if we get something we can do stuff,
> otherwise we can just continue.  Thanks,

That's what I initially thought too. However, get_caching_control only
returns us the caching_ctl if block_group->cached ==
BTRFS_CACHE_STARTED, so it's not enough to use it exclusively. The
has_caching_ctl flag is just to avoid holding the semaphore and search
through the list (since block_group->caching_ctl can be NULL but a
caching_ctl point to the block group can be in the list).

>
> Josef
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Nov. 26, 2014, 4:24 p.m. UTC | #3
On 11/26/2014 11:09 AM, Filipe David Manana wrote:
> On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>>>
>>> If we remove a block group (because it became empty), we might have left
>>> a caching_ctl structure in fs_info->caching_block_groups that points to
>>> the block group and is accessed at transaction commit time. This results
>>> in accessing an invalid or incorrect block group. This issue became
>>> visible
>>> after Josef's patch "Btrfs: remove empty block groups automatically".
>>>
>>> So if the block group is removed make sure we don't leave a dangling
>>> caching_ctl in caching_block_groups.
>>>
>>> Sample crash trace:
>>>
>>> [58380.439449] BUG: unable to handle kernel paging request at
>>> ffff8801446eaeb8
>>> [58380.439707] IP: [<ffffffffa03f6d05>]
>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE
>>> 80000001446ea060
>>> [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>>> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd
>>> auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse
>>> processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore
>>> thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom
>>> ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common
>>> virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring
>>> virtio [last unloaded: btrfs]
>>> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G        W
>>> 3.17.0-rc5-btrfs-next-1+ #1
>>> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>>> [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti:
>>> ffff88013896c000
>>> [58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>]
>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>> [58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
>>> [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX:
>>> 0000000000000000
>>> [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI:
>>> ffff8801446eaeb8
>>> [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09:
>>> ffff88013896fc60
>>> [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12:
>>> ffff880222cae000
>>> [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15:
>>> ffff8801446eae00
>>> [58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000)
>>> knlGS:0000000000000000
>>> [58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4:
>>> 00000000000006e0
>>> [58380.443238] Stack:
>>> [58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850
>>> ffff880185e16800
>>> [58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00
>>> 0000000000000000
>>> [58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0
>>> ffff88013ac82090
>>> [58380.443238] Call Trace:
>>> [58380.443238]  [<ffffffffa03fe2d5>] btrfs_prepare_extent_commit+0x5a/0xd7
>>> [btrfs]
>>> [58380.443238]  [<ffffffffa040fbcf>] btrfs_commit_transaction+0x45c/0x882
>>> [btrfs]
>>> [58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4
>>> [btrfs]
>>> [58380.443238]  [<ffffffffa040bf66>] ?
>>> btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
>>> [58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>> [58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>>    fs/btrfs/ctree.h       |  1 +
>>>    fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
>>>    2 files changed, 28 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index d3ccd09..7f40a65 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>>>          unsigned int ro:1;
>>>          unsigned int dirty:1;
>>>          unsigned int iref:1;
>>> +       unsigned int has_caching_ctl:1;
>>>
>>
>> Don't do this, just unconditionally call get_caching_control in
>> btrfs_remove_block_group, then if we get something we can do stuff,
>> otherwise we can just continue.  Thanks,
>
> That's what I initially thought too. However, get_caching_control only
> returns us the caching_ctl if block_group->cached ==
> BTRFS_CACHE_STARTED, so it's not enough to use it exclusively. The
> has_caching_ctl flag is just to avoid holding the semaphore and search
> through the list (since block_group->caching_ctl can be NULL but a
> caching_ctl point to the block group can be in the list).
>

Oh God that's not good, we need to change get_caching_control to return 
if there is a caching control at all, since the other users want to wait 
for the fast caching to finish too.  So change that and then use it 
unconditionally.  I bet this has been causing us the random early ENOSPC 
problems.  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Nov. 26, 2014, 4:34 p.m. UTC | #4
On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 11/26/2014 11:09 AM, Filipe David Manana wrote:
>>
>> On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>
>>> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>>>>
>>>>
>>>> If we remove a block group (because it became empty), we might have left
>>>> a caching_ctl structure in fs_info->caching_block_groups that points to
>>>> the block group and is accessed at transaction commit time. This results
>>>> in accessing an invalid or incorrect block group. This issue became
>>>> visible
>>>> after Josef's patch "Btrfs: remove empty block groups automatically".
>>>>
>>>> So if the block group is removed make sure we don't leave a dangling
>>>> caching_ctl in caching_block_groups.
>>>>
>>>> Sample crash trace:
>>>>
>>>> [58380.439449] BUG: unable to handle kernel paging request at
>>>> ffff8801446eaeb8
>>>> [58380.439707] IP: [<ffffffffa03f6d05>]
>>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>>> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE
>>>> 80000001446ea060
>>>> [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd
>>>> auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse
>>>> processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore
>>>> thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom
>>>> ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common
>>>> virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring
>>>> virtio [last unloaded: btrfs]
>>>> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G
>>>> W
>>>> 3.17.0-rc5-btrfs-next-1+ #1
>>>> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>> BIOS
>>>> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>>>> [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti:
>>>> ffff88013896c000
>>>> [58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>]
>>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>>> [58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
>>>> [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX:
>>>> 0000000000000000
>>>> [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI:
>>>> ffff8801446eaeb8
>>>> [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09:
>>>> ffff88013896fc60
>>>> [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12:
>>>> ffff880222cae000
>>>> [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15:
>>>> ffff8801446eae00
>>>> [58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000)
>>>> knlGS:0000000000000000
>>>> [58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4:
>>>> 00000000000006e0
>>>> [58380.443238] Stack:
>>>> [58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850
>>>> ffff880185e16800
>>>> [58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00
>>>> 0000000000000000
>>>> [58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0
>>>> ffff88013ac82090
>>>> [58380.443238] Call Trace:
>>>> [58380.443238]  [<ffffffffa03fe2d5>]
>>>> btrfs_prepare_extent_commit+0x5a/0xd7
>>>> [btrfs]
>>>> [58380.443238]  [<ffffffffa040fbcf>]
>>>> btrfs_commit_transaction+0x45c/0x882
>>>> [btrfs]
>>>> [58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4
>>>> [btrfs]
>>>> [58380.443238]  [<ffffffffa040bf66>] ?
>>>> btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
>>>> [58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
>>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>>> [58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
>>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>>>
>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>>> ---
>>>>    fs/btrfs/ctree.h       |  1 +
>>>>    fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
>>>>    2 files changed, 28 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>> index d3ccd09..7f40a65 100644
>>>> --- a/fs/btrfs/ctree.h
>>>> +++ b/fs/btrfs/ctree.h
>>>> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>>>>          unsigned int ro:1;
>>>>          unsigned int dirty:1;
>>>>          unsigned int iref:1;
>>>> +       unsigned int has_caching_ctl:1;
>>>>
>>>
>>> Don't do this, just unconditionally call get_caching_control in
>>> btrfs_remove_block_group, then if we get something we can do stuff,
>>> otherwise we can just continue.  Thanks,
>>
>>
>> That's what I initially thought too. However, get_caching_control only
>> returns us the caching_ctl if block_group->cached ==
>> BTRFS_CACHE_STARTED, so it's not enough to use it exclusively. The
>> has_caching_ctl flag is just to avoid holding the semaphore and search
>> through the list (since block_group->caching_ctl can be NULL but a
>> caching_ctl point to the block group can be in the list).
>>
>
> Oh God that's not good, we need to change get_caching_control to return if
> there is a caching control at all, since the other users want to wait for
> the fast caching to finish too.  So change that and then use it
> unconditionally.  I bet this has been causing us the random early ENOSPC
> problems.  Thanks,

Right, I think that's a separate change different from what I'm trying to fix.

When caching_thread finishes, it sets the block_group->caching_ctl to
NULL, and caching_ctl remains in the list until transaction commit
time. So doing that change, to make get_caching_control() always
return the block group's ->caching_ctl wouldn't solve this issue - we
still need to search in the list if block_group->caching_ctl is NULL.


>
> Josef
>
Josef Bacik Nov. 26, 2014, 4:41 p.m. UTC | #5
On 11/26/2014 11:34 AM, Filipe David Manana wrote:
> On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 11/26/2014 11:09 AM, Filipe David Manana wrote:
>>>
>>> On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>>
>>>> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>>>>>
>>>>>
>>>>> If we remove a block group (because it became empty), we might have left
>>>>> a caching_ctl structure in fs_info->caching_block_groups that points to
>>>>> the block group and is accessed at transaction commit time. This results
>>>>> in accessing an invalid or incorrect block group. This issue became
>>>>> visible
>>>>> after Josef's patch "Btrfs: remove empty block groups automatically".
>>>>>
>>>>> So if the block group is removed make sure we don't leave a dangling
>>>>> caching_ctl in caching_block_groups.
>>>>>
>>>>> Sample crash trace:
>>>>>
>>>>> [58380.439449] BUG: unable to handle kernel paging request at
>>>>> ffff8801446eaeb8
>>>>> [58380.439707] IP: [<ffffffffa03f6d05>]
>>>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>>>> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE
>>>>> 80000001446ea060
>>>>> [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>>> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd
>>>>> auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse
>>>>> processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore
>>>>> thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom
>>>>> ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common
>>>>> virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring
>>>>> virtio [last unloaded: btrfs]
>>>>> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G
>>>>> W
>>>>> 3.17.0-rc5-btrfs-next-1+ #1
>>>>> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS
>>>>> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>>>>> [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti:
>>>>> ffff88013896c000
>>>>> [58380.443238] RIP: 0010:[<ffffffffa03f6d05>]  [<ffffffffa03f6d05>]
>>>>> block_group_cache_done.isra.21+0xc/0x1c [btrfs]
>>>>> [58380.443238] RSP: 0018:ffff88013896fdd8  EFLAGS: 00010283
>>>>> [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX:
>>>>> 0000000000000000
>>>>> [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI:
>>>>> ffff8801446eaeb8
>>>>> [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09:
>>>>> ffff88013896fc60
>>>>> [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12:
>>>>> ffff880222cae000
>>>>> [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15:
>>>>> ffff8801446eae00
>>>>> [58380.443238] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000)
>>>>> knlGS:0000000000000000
>>>>> [58380.443238] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4:
>>>>> 00000000000006e0
>>>>> [58380.443238] Stack:
>>>>> [58380.443238]  ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850
>>>>> ffff880185e16800
>>>>> [58380.443238]  ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00
>>>>> 0000000000000000
>>>>> [58380.443238]  ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0
>>>>> ffff88013ac82090
>>>>> [58380.443238] Call Trace:
>>>>> [58380.443238]  [<ffffffffa03fe2d5>]
>>>>> btrfs_prepare_extent_commit+0x5a/0xd7
>>>>> [btrfs]
>>>>> [58380.443238]  [<ffffffffa040fbcf>]
>>>>> btrfs_commit_transaction+0x45c/0x882
>>>>> [btrfs]
>>>>> [58380.443238]  [<ffffffffa040c058>] transaction_kthread+0xf2/0x1a4
>>>>> [btrfs]
>>>>> [58380.443238]  [<ffffffffa040bf66>] ?
>>>>> btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
>>>>> [58380.443238]  [<ffffffff8105966b>] kthread+0xb7/0xbf
>>>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>>>> [58380.443238]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
>>>>> [58380.443238]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>>>>>
>>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>>>> ---
>>>>>     fs/btrfs/ctree.h       |  1 +
>>>>>     fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++
>>>>>     2 files changed, 28 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>>> index d3ccd09..7f40a65 100644
>>>>> --- a/fs/btrfs/ctree.h
>>>>> +++ b/fs/btrfs/ctree.h
>>>>> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>>>>>           unsigned int ro:1;
>>>>>           unsigned int dirty:1;
>>>>>           unsigned int iref:1;
>>>>> +       unsigned int has_caching_ctl:1;
>>>>>
>>>>
>>>> Don't do this, just unconditionally call get_caching_control in
>>>> btrfs_remove_block_group, then if we get something we can do stuff,
>>>> otherwise we can just continue.  Thanks,
>>>
>>>
>>> That's what I initially thought too. However, get_caching_control only
>>> returns us the caching_ctl if block_group->cached ==
>>> BTRFS_CACHE_STARTED, so it's not enough to use it exclusively. The
>>> has_caching_ctl flag is just to avoid holding the semaphore and search
>>> through the list (since block_group->caching_ctl can be NULL but a
>>> caching_ctl point to the block group can be in the list).
>>>
>>
>> Oh God that's not good, we need to change get_caching_control to return if
>> there is a caching control at all, since the other users want to wait for
>> the fast caching to finish too.  So change that and then use it
>> unconditionally.  I bet this has been causing us the random early ENOSPC
>> problems.  Thanks,
>
> Right, I think that's a separate change different from what I'm trying to fix.
>
> When caching_thread finishes, it sets the block_group->caching_ctl to
> NULL, and caching_ctl remains in the list until transaction commit
> time. So doing that change, to make get_caching_control() always
> return the block group's ->caching_ctl wouldn't solve this issue - we
> still need to search in the list if block_group->caching_ctl is NULL.
>

/me bangs his head on the desk.  Ok we should probably remove the 
caching ctl from the list when we finish caching so we don't have this 
problem, but I'm fine doing that later.  I'll fix the 
get_caching_control thing here and send a patch.  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d3ccd09..7f40a65 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1277,6 +1277,7 @@  struct btrfs_block_group_cache {
 	unsigned int ro:1;
 	unsigned int dirty:1;
 	unsigned int iref:1;
+	unsigned int has_caching_ctl:1;
 
 	int disk_cache_state;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3ba65d9..b7e40ef 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -607,6 +607,7 @@  static int cache_block_group(struct btrfs_block_group_cache *cache,
 				cache->cached = BTRFS_CACHE_NO;
 			} else {
 				cache->cached = BTRFS_CACHE_STARTED;
+				cache->has_caching_ctl = 1;
 			}
 		}
 		spin_unlock(&cache->lock);
@@ -627,6 +628,7 @@  static int cache_block_group(struct btrfs_block_group_cache *cache,
 			cache->cached = BTRFS_CACHE_NO;
 		} else {
 			cache->cached = BTRFS_CACHE_STARTED;
+			cache->has_caching_ctl = 1;
 		}
 		spin_unlock(&cache->lock);
 		wake_up(&caching_ctl->wait);
@@ -9328,6 +9330,7 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	int ret;
 	int index;
 	int factor;
+	struct btrfs_caching_control *caching_ctl = NULL;
 
 	root = root->fs_info->extent_root;
 
@@ -9435,8 +9438,32 @@  int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		kobject_put(kobj);
 	}
 
+	if (block_group->has_caching_ctl)
+		caching_ctl = get_caching_control(block_group);
 	if (block_group->cached == BTRFS_CACHE_STARTED)
 		wait_block_group_cache_done(block_group);
+	if (block_group->has_caching_ctl) {
+		down_write(&root->fs_info->commit_root_sem);
+		if (!caching_ctl) {
+			struct btrfs_caching_control *ctl;
+
+			list_for_each_entry(ctl,
+				    &root->fs_info->caching_block_groups, list)
+				if (ctl->block_group == block_group) {
+					caching_ctl = ctl;
+					atomic_inc(&caching_ctl->count);
+					break;
+				}
+		}
+		if (caching_ctl)
+			list_del_init(&caching_ctl->list);
+		up_write(&root->fs_info->commit_root_sem);
+		if (caching_ctl) {
+			/* Once for the caching bgs list and once for us. */
+			put_caching_control(caching_ctl);
+			put_caching_control(caching_ctl);
+		}
+	}
 
 	btrfs_remove_free_space_cache(block_group);