[v2] btrfs: handle missing chunk mapping more gracefully

Message ID	9b53f585503429f5c81eeb222f1e2cb8014807f5.1677722020.git.wqu@suse.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@vger.kernel.org> From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org Subject: [PATCH v2] btrfs: handle missing chunk mapping more gracefully Date: Thu, 2 Mar 2023 09:54:12 +0800 Message-Id: <9b53f585503429f5c81eeb222f1e2cb8014807f5.1677722020.git.wqu@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] btrfs: handle missing chunk mapping more gracefully \| expand [v2] btrfs: handle missing chunk mapping more gracefully

Qu Wenruo March 2, 2023, 1:54 a.m. UTC

[BUG]
During my scrub rework, I did a stupid thing like this:

        bio->bi_iter.bi_sector = stripe->logical;
        btrfs_submit_bio(fs_info, bio, stripe->mirror_num);

Above bi_sector assignment is using logical address directly, which
lacks ">> SECTOR_SHIFT".

This results a read on a range which has no chunk mapping.

This results the following crash:

 BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
 assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
 ------------[ cut here ]------------

Sure this is all my fault, but this shows a possible problem in real
world, that some bitflip in file extents/tree block can point to
unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
is not configured, cause invalid pointer.

[PROBLEMS]
In above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().

[FIX]
The fix is pretty straightforward, just replace the ASSERT() with proper
error handling.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Rebased to latest misc-next
  The error path in bio.c is already fixed, thus only need to replace
  the ASSERT() in __btrfs_map_block().
---
 fs/btrfs/volumes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Anand Jain March 2, 2023, 6:12 a.m. UTC | #1

On 3/2/23 09:54, Qu Wenruo wrote:
> [BUG]
> During my scrub rework, I did a stupid thing like this:
> 
>          bio->bi_iter.bi_sector = stripe->logical;
>          btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
> 
> Above bi_sector assignment is using logical address directly, which
> lacks ">> SECTOR_SHIFT".
> 
> This results a read on a range which has no chunk mapping.
> 
> This results the following crash:
> 
>   BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>   assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>   ------------[ cut here ]------------
> 
> Sure this is all my fault, but this shows a possible problem in real
> world, that some bitflip in file extents/tree block can point to
> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
> is not configured, cause invalid pointer.
> 
> [PROBLEMS]
> In above call chain, we just don't handle the possible error from
> btrfs_get_chunk_map() inside __btrfs_map_block().
> 
> [FIX]
> The fix is pretty straightforward, just replace the ASSERT() with proper
> error handling.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Rebased to latest misc-next
>    The error path in bio.c is already fixed, thus only need to replace
>    the ASSERT() in __btrfs_map_block().
> ---
>   fs/btrfs/volumes.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 4d479ac233a4..93bc45001e68 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		return -EINVAL;
>   
>   	em = btrfs_get_chunk_map(fs_info, logical, *length);
> -	ASSERT(!IS_ERR(em));
> +	if (IS_ERR(em))
> +		return PTR_ERR(em);


Consider adding btrfs_err_rl() here.
Except for scrub_find_good_copy(), the other functions do not report
such errors.
Furthermore, scrub_find_good_copy() lack sufficient details for
effective debugging in the event of an issue.


>   
>   	map = em->map_lookup;
>   	data_stripes = nr_data_stripes(map);

Qu Wenruo March 2, 2023, 6:49 a.m. UTC | #2

On 2023/3/2 14:12, Anand Jain wrote:
> On 3/2/23 09:54, Qu Wenruo wrote:
>> [BUG]
>> During my scrub rework, I did a stupid thing like this:
>>
>>          bio->bi_iter.bi_sector = stripe->logical;
>>          btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>
>> Above bi_sector assignment is using logical address directly, which
>> lacks ">> SECTOR_SHIFT".
>>
>> This results a read on a range which has no chunk mapping.
>>
>> This results the following crash:
>>
>>   BTRFS critical (device dm-1): unable to find logical 11274289152 
>> length 65536
>>   assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>   ------------[ cut here ]------------
>>
>> Sure this is all my fault, but this shows a possible problem in real
>> world, that some bitflip in file extents/tree block can point to
>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>> is not configured, cause invalid pointer.
>>
>> [PROBLEMS]
>> In above call chain, we just don't handle the possible error from
>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>
>> [FIX]
>> The fix is pretty straightforward, just replace the ASSERT() with proper
>> error handling.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Rebased to latest misc-next
>>    The error path in bio.c is already fixed, thus only need to replace
>>    the ASSERT() in __btrfs_map_block().
>> ---
>>   fs/btrfs/volumes.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 4d479ac233a4..93bc45001e68 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info 
>> *fs_info, enum btrfs_map_op op,
>>           return -EINVAL;
>>       em = btrfs_get_chunk_map(fs_info, logical, *length);
>> -    ASSERT(!IS_ERR(em));
>> +    if (IS_ERR(em))
>> +        return PTR_ERR(em);
> 
> 
> Consider adding btrfs_err_rl() here.
> Except for scrub_find_good_copy(), the other functions do not report
> such errors.
> Furthermore, scrub_find_good_copy() lack sufficient details for
> effective debugging in the event of an issue.

Function btrfs_get_chunk_map() would already output an error message.

Thanks,
Qu
> 
> 
>>       map = em->map_lookup;
>>       data_stripes = nr_data_stripes(map);
> 
> 
> 
> 
> 
>

Anand Jain March 2, 2023, 7:56 a.m. UTC | #3

On 3/2/23 14:49, Qu Wenruo wrote:
> 
> 
> On 2023/3/2 14:12, Anand Jain wrote:
>> On 3/2/23 09:54, Qu Wenruo wrote:
>>> [BUG]
>>> During my scrub rework, I did a stupid thing like this:
>>>
>>>          bio->bi_iter.bi_sector = stripe->logical;
>>>          btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>>
>>> Above bi_sector assignment is using logical address directly, which
>>> lacks ">> SECTOR_SHIFT".
>>>
>>> This results a read on a range which has no chunk mapping.
>>>
>>> This results the following crash:
>>>
>>>   BTRFS critical (device dm-1): unable to find logical 11274289152 
>>> length 65536
>>>   assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>>   ------------[ cut here ]------------
>>>
>>> Sure this is all my fault, but this shows a possible problem in real
>>> world, that some bitflip in file extents/tree block can point to
>>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>>> is not configured, cause invalid pointer.
>>>
>>> [PROBLEMS]
>>> In above call chain, we just don't handle the possible error from
>>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>>
>>> [FIX]
>>> The fix is pretty straightforward, just replace the ASSERT() with proper
>>> error handling.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> Changelog:
>>> v2:
>>> - Rebased to latest misc-next
>>>    The error path in bio.c is already fixed, thus only need to replace
>>>    the ASSERT() in __btrfs_map_block().
>>> ---
>>>   fs/btrfs/volumes.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 4d479ac233a4..93bc45001e68 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info 
>>> *fs_info, enum btrfs_map_op op,
>>>           return -EINVAL;
>>>       em = btrfs_get_chunk_map(fs_info, logical, *length);
>>> -    ASSERT(!IS_ERR(em));
>>> +    if (IS_ERR(em))
>>> +        return PTR_ERR(em);
>>
>>
>> Consider adding btrfs_err_rl() here.
>> Except for scrub_find_good_copy(), the other functions do not report
>> such errors.
>> Furthermore, scrub_find_good_copy() lack sufficient details for
>> effective debugging in the event of an issue.
> 
> Function btrfs_get_chunk_map() would already output an error message.

Ah. Right..

Reviewed-by: Anand Jain <anand.jain@oracle.com>

David Sterba March 6, 2023, 7:04 p.m. UTC | #4

On Thu, Mar 02, 2023 at 09:54:12AM +0800, Qu Wenruo wrote:
> [BUG]
> During my scrub rework, I did a stupid thing like this:
> 
>         bio->bi_iter.bi_sector = stripe->logical;
>         btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
> 
> Above bi_sector assignment is using logical address directly, which
> lacks ">> SECTOR_SHIFT".
> 
> This results a read on a range which has no chunk mapping.
> 
> This results the following crash:
> 
>  BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>  assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>  ------------[ cut here ]------------
> 
> Sure this is all my fault, but this shows a possible problem in real
> world, that some bitflip in file extents/tree block can point to
> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
> is not configured, cause invalid pointer.
> 
> [PROBLEMS]
> In above call chain, we just don't handle the possible error from
> btrfs_get_chunk_map() inside __btrfs_map_block().
> 
> [FIX]
> The fix is pretty straightforward, just replace the ASSERT() with proper
> error handling.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

Celeste Liu Feb. 10, 2024, 2:34 p.m. UTC | #5

On 3/2/23 09:54, Qu Wenruo wrote:
> [BUG]
> During my scrub rework, I did a stupid thing like this:
> 
>          bio->bi_iter.bi_sector = stripe->logical;
>          btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
> 
> Above bi_sector assignment is using logical address directly, which
> lacks ">> SECTOR_SHIFT".
> 
> This results a read on a range which has no chunk mapping.
> 
> This results the following crash:
> 
>   BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>   assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>   ------------[ cut here ]------------
> 
> Sure this is all my fault, but this shows a possible problem in real
> world, that some bitflip in file extents/tree block can point to
> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
> is not configured, cause invalid pointer.
> 
> [PROBLEMS]
> In above call chain, we just don't handle the possible error from
> btrfs_get_chunk_map() inside __btrfs_map_block().
> 
> [FIX]
> The fix is pretty straightforward, just replace the ASSERT() with proper
> error handling.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Rebased to latest misc-next
>    The error path in bio.c is already fixed, thus only need to replace
>    the ASSERT() in __btrfs_map_block().
> ---
>   fs/btrfs/volumes.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 4d479ac233a4..93bc45001e68 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		return -EINVAL;
>   
>   	em = btrfs_get_chunk_map(fs_info, logical, *length);
> -	ASSERT(!IS_ERR(em));
> +	if (IS_ERR(em))
> +		return PTR_ERR(em);
>   
>   	map = em->map_lookup;
>   	data_stripes = nr_data_stripes(map);

This bug affects 6.1.y LTS branch but no backport commit of this fix in
6.1.y branch. Please include it. Thanks.

Qu Wenruo Feb. 10, 2024, 7:58 p.m. UTC | #6

On 2024/2/11 01:04, Celeste Liu wrote:
> On 3/2/23 09:54, Qu Wenruo wrote:
>> [BUG]
>> During my scrub rework, I did a stupid thing like this:
>>
>>           bio->bi_iter.bi_sector = stripe->logical;
>>           btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>
>> Above bi_sector assignment is using logical address directly, which
>> lacks ">> SECTOR_SHIFT".
>>
>> This results a read on a range which has no chunk mapping.
>>
>> This results the following crash:
>>
>>    BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>>    assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>    ------------[ cut here ]------------
>>
>> Sure this is all my fault, but this shows a possible problem in real
>> world, that some bitflip in file extents/tree block can point to
>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>> is not configured, cause invalid pointer.
>>
>> [PROBLEMS]
>> In above call chain, we just don't handle the possible error from
>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>
>> [FIX]
>> The fix is pretty straightforward, just replace the ASSERT() with proper
>> error handling.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Rebased to latest misc-next
>>     The error path in bio.c is already fixed, thus only need to replace
>>     the ASSERT() in __btrfs_map_block().
>> ---
>>    fs/btrfs/volumes.c | 3 ++-
>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 4d479ac233a4..93bc45001e68 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>>    		return -EINVAL;
>>
>>    	em = btrfs_get_chunk_map(fs_info, logical, *length);
>> -	ASSERT(!IS_ERR(em));
>> +	if (IS_ERR(em))
>> +		return PTR_ERR(em);
>>
>>    	map = em->map_lookup;
>>    	data_stripes = nr_data_stripes(map);
>
> This bug affects 6.1.y LTS branch but no backport commit of this fix in
> 6.1.y branch. Please include it. Thanks.
>
Just want to make sure, are you hitting the ASSERT()?

If so, please provide the calltrace and btrfs-check output to pin down
the cause.
Just backporting the patch is only to make it a little more graceful,
but won't solve the root problem.

Thanks,
Qu

Celeste Liu Feb. 11, 2024, 3:13 a.m. UTC | #7

On 2024-02-11 03:58, Qu Wenruo wrote:

> 
> 
> On 2024/2/11 01:04, Celeste Liu wrote:
>> On 3/2/23 09:54, Qu Wenruo wrote:
>>> [BUG]
>>> During my scrub rework, I did a stupid thing like this:
>>>
>>>           bio->bi_iter.bi_sector = stripe->logical;
>>>           btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>>
>>> Above bi_sector assignment is using logical address directly, which
>>> lacks ">> SECTOR_SHIFT".
>>>
>>> This results a read on a range which has no chunk mapping.
>>>
>>> This results the following crash:
>>>
>>>    BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>>>    assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>>    ------------[ cut here ]------------
>>>
>>> Sure this is all my fault, but this shows a possible problem in real
>>> world, that some bitflip in file extents/tree block can point to
>>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>>> is not configured, cause invalid pointer.
>>>
>>> [PROBLEMS]
>>> In above call chain, we just don't handle the possible error from
>>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>>
>>> [FIX]
>>> The fix is pretty straightforward, just replace the ASSERT() with proper
>>> error handling.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> Changelog:
>>> v2:
>>> - Rebased to latest misc-next
>>>     The error path in bio.c is already fixed, thus only need to replace
>>>     the ASSERT() in __btrfs_map_block().
>>> ---
>>>    fs/btrfs/volumes.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 4d479ac233a4..93bc45001e68 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>>>            return -EINVAL;
>>>
>>>        em = btrfs_get_chunk_map(fs_info, logical, *length);
>>> -    ASSERT(!IS_ERR(em));
>>> +    if (IS_ERR(em))
>>> +        return PTR_ERR(em);
>>>
>>>        map = em->map_lookup;
>>>        data_stripes = nr_data_stripes(map);
>>
>> This bug affects 6.1.y LTS branch but no backport commit of this fix in
>> 6.1.y branch. Please include it. Thanks.
>>
> Just want to make sure, are you hitting the ASSERT()?

No, in non-debug build, ASSERT() is noop. So em will be a pointer whose
value is errno and there is no any error handle for it. And then it
trigger kernel NULL Pointer Dereference exception in btrfs_get_io_geometry()
with em->map_lookup (Of course, it's not 0 but 0x5a
(-EINVAL + offsetof(map_lookup)). But it still is a illegal memory access),
this kernel thread was killed but the lock hold by this thread are not
released. After that, all fs operations was block by the lock.

> 
> If so, please provide the calltrace and btrfs-check output to pin down
> the cause.

It doesn't happen on my machine, I've notified the finder to send a
backtrace to the mailing list

> Just backporting the patch is only to make it a little more graceful,
> but won't solve the root problem.

No. In non debug build, ASSERT() is noop so it lead that there is no any
error handle code. There is no RAII/SBRM, unexpected exit will lead the
resources will not be released correctly. If it has unique holder (e.g. mutex),
it will break all other code that need this resource!

> 
> Thanks,
> Qu

Qu Wenruo Feb. 11, 2024, 4:22 a.m. UTC | #8

On 2024/2/11 13:43, Celeste Liu wrote:
> On 2024-02-11 03:58, Qu Wenruo wrote:
>
>>
>>
>> On 2024/2/11 01:04, Celeste Liu wrote:
>>> On 3/2/23 09:54, Qu Wenruo wrote:
>>>> [BUG]
>>>> During my scrub rework, I did a stupid thing like this:
>>>>
>>>>            bio->bi_iter.bi_sector = stripe->logical;
>>>>            btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>>>
>>>> Above bi_sector assignment is using logical address directly, which
>>>> lacks ">> SECTOR_SHIFT".
>>>>
>>>> This results a read on a range which has no chunk mapping.
>>>>
>>>> This results the following crash:
>>>>
>>>>     BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>>>>     assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>>>     ------------[ cut here ]------------
>>>>
>>>> Sure this is all my fault, but this shows a possible problem in real
>>>> world, that some bitflip in file extents/tree block can point to
>>>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>>>> is not configured, cause invalid pointer.
>>>>
>>>> [PROBLEMS]
>>>> In above call chain, we just don't handle the possible error from
>>>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>>>
>>>> [FIX]
>>>> The fix is pretty straightforward, just replace the ASSERT() with proper
>>>> error handling.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Changelog:
>>>> v2:
>>>> - Rebased to latest misc-next
>>>>      The error path in bio.c is already fixed, thus only need to replace
>>>>      the ASSERT() in __btrfs_map_block().
>>>> ---
>>>>     fs/btrfs/volumes.c | 3 ++-
>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 4d479ac233a4..93bc45001e68 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>>>>             return -EINVAL;
>>>>
>>>>         em = btrfs_get_chunk_map(fs_info, logical, *length);
>>>> -    ASSERT(!IS_ERR(em));
>>>> +    if (IS_ERR(em))
>>>> +        return PTR_ERR(em);
>>>>
>>>>         map = em->map_lookup;
>>>>         data_stripes = nr_data_stripes(map);
>>>
>>> This bug affects 6.1.y LTS branch but no backport commit of this fix in
>>> 6.1.y branch. Please include it. Thanks.
>>>
>> Just want to make sure, are you hitting the ASSERT()?
>
> No, in non-debug build, ASSERT() is noop. So em will be a pointer whose
> value is errno and there is no any error handle for it. And then it
> trigger kernel NULL Pointer Dereference exception in btrfs_get_io_geometry()
> with em->map_lookup (Of course, it's not 0 but 0x5a
> (-EINVAL + offsetof(map_lookup)). But it still is a illegal memory access),
> this kernel thread was killed but the lock hold by this thread are not
> released. After that, all fs operations was block by the lock.

I know, what I mean "hitting the ASSERT()" includes the cases where
CONFIG_BTRFS_ASSERT is not enabled.
As it would lead to an obvious invalid memory access immediately.

And lock/resources thing is the last thing you need to bother, as long
as the kernel fs module triggered invalid memory access inside kernel
space, it's busted already.

>
>>
>> If so, please provide the calltrace and btrfs-check output to pin down
>> the cause.
>
> It doesn't happen on my machine, I've notified the finder to send a
> backtrace to the mailing list

It's better to allow the reporter to send needed info directly to the ML.

Trust me, working with a man in the middle is only going to slow down
diagnosis and fix.

And don't forget "btrfs check --readonly", as on-disk corrupted is also
a very possible reason to lead IO out of chunk mapped ranges.

>
>> Just backporting the patch is only to make it a little more graceful,
>> but won't solve the root problem.
>
> No. In non debug build, ASSERT() is noop so it lead that there is no any
> error handle code. There is no RAII/SBRM, unexpected exit will lead the
> resources will not be released correctly. If it has unique holder (e.g. mutex),
> it will break all other code that need this resource!

Sure, but even with a backport, it would still cause (although graceful)
errors when such IO is triggered.

Hitting the ASSERT() is only a symptom, not the root cause.

I can definitely do a backport immediately, but I'm more interested in
the root cause.
What if it's caused by another hidden bug?

Thanks,
Qu
>
>>
>> Thanks,
>> Qu
>
>

[v2] btrfs: handle missing chunk mapping more gracefully

Commit Message

Comments

Patch