diff mbox

Btrfs: wait for ordered extents before removing extent maps

Message ID 1387032218-29998-1-git-send-email-fdmanana@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Filipe Manana Dec. 14, 2013, 2:43 p.m. UTC
Wang Shilong got into a case where during inode eviction we were
removing an extent map while it was pinned. This triggered a warning
in remove_extent_mapping() because the extent map had the pinned
flag set:

[ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
[ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
[ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
[ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
[ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
[ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
[ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]

Therefore wait for any pending ordered extents, if any, which will
trigger calls to unpin_extent_cache(), before removing the extent maps.

Wang's solution of simply clearing the pinned bit wasn't enough, as after
unpin_extent_cache() will be called and trigger another WARN_ON() because
the lookup for the extent map returned NULL.

Thanks Wang for finding out this.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
 fs/btrfs/inode.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Wang Shilong Dec. 14, 2013, 2:56 p.m. UTC | #1
Hello Filipe,

2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
> Wang Shilong got into a case where during inode eviction we were
> removing an extent map while it was pinned. This triggered a warning
> in remove_extent_mapping() because the extent map had the pinned
> flag set:
>
> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>
> Therefore wait for any pending ordered extents, if any, which will
> trigger calls to unpin_extent_cache(), before removing the extent maps.
>
> Wang's solution of simply clearing the pinned bit wasn't enough, as after
> unpin_extent_cache() will be called and trigger another WARN_ON() because
> the lookup for the extent map returned NULL.

Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
clear_extent_bit()?

Thanks,
Wang
>
> Thanks Wang for finding out this.
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
>  fs/btrfs/inode.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e889779..c2933fb 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>         ASSERT(inode->i_state & I_FREEING);
>         truncate_inode_pages(&inode->i_data, 0);
>
> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
> +
>         write_lock(&map_tree->lock);
>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>                 struct extent_map *em;
> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>                 btrfs_orphan_del(NULL, inode);
>                 goto no_delete;
>         }
> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>
>         if (root->fs_info->log_root_recovering) {
>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Dec. 14, 2013, 3:01 p.m. UTC | #2
On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
> Hello Filipe,
>
> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>> Wang Shilong got into a case where during inode eviction we were
>> removing an extent map while it was pinned. This triggered a warning
>> in remove_extent_mapping() because the extent map had the pinned
>> flag set:
>>
>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>
>> Therefore wait for any pending ordered extents, if any, which will
>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>
>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>> the lookup for the extent map returned NULL.
>
> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
> clear_extent_bit()?

So, if the pinned bit is set, it means some task will clear it later,
via unpin_extent_cache(). And if you look at that function, it has
this:

write_lock(&tree->lock);
em = lookup_extent_mapping(tree, start, len);

WARN_ON(!em || em->start != start);

And remove_extent_mapping() will remove the em from the rbtree,
regardless of its reference count value, therefore triggering that
warning above.

Does it makes sense?

thanks

>
> Thanks,
> Wang
>>
>> Thanks Wang for finding out this.
>>
>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>> ---
>>  fs/btrfs/inode.c |    5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index e889779..c2933fb 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>         ASSERT(inode->i_state & I_FREEING);
>>         truncate_inode_pages(&inode->i_data, 0);
>>
>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>> +
>>         write_lock(&map_tree->lock);
>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>                 struct extent_map *em;
>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>                 btrfs_orphan_del(NULL, inode);
>>                 goto no_delete;
>>         }
>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>
>>         if (root->fs_info->log_root_recovering) {
>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wang Shilong Dec. 14, 2013, 3:08 p.m. UTC | #3
2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>> Hello Filipe,
>>
>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>>> Wang Shilong got into a case where during inode eviction we were
>>> removing an extent map while it was pinned. This triggered a warning
>>> in remove_extent_mapping() because the extent map had the pinned
>>> flag set:
>>>
>>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>>
>>> Therefore wait for any pending ordered extents, if any, which will
>>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>>
>>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>>> the lookup for the extent map returned NULL.
>>
>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
>> clear_extent_bit()?
>
> So, if the pinned bit is set, it means some task will clear it later,
> via unpin_extent_cache(). And if you look at that function, it has
> this:
>
> write_lock(&tree->lock);
> em = lookup_extent_mapping(tree, start, len);
>
> WARN_ON(!em || em->start != start);
>
> And remove_extent_mapping() will remove the em from the rbtree,
> regardless of its reference count value, therefore triggering that
> warning above.

Here i mean,  in evict_inode_truncate_pages()
We change it to:

Step1:  unpin_extent_cache()
Step2:  remove it from extent_mapping

Dose this cause any problems? i am a little confused, correct me if i
am wrong some places^_^.


>
> Does it makes sense?
>
> thanks
>
>>
>> Thanks,
>> Wang
>>>
>>> Thanks Wang for finding out this.
>>>
>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>>> ---
>>>  fs/btrfs/inode.c |    5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index e889779..c2933fb 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>>         ASSERT(inode->i_state & I_FREEING);
>>>         truncate_inode_pages(&inode->i_data, 0);
>>>
>>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>> +
>>>         write_lock(&map_tree->lock);
>>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>>                 struct extent_map *em;
>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>>                 btrfs_orphan_del(NULL, inode);
>>>                 goto no_delete;
>>>         }
>>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>
>>>         if (root->fs_info->log_root_recovering) {
>>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>>> --
>>> 1.7.9.5
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Dec. 14, 2013, 3:13 p.m. UTC | #4
On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>> Hello Filipe,
>>>
>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>>>> Wang Shilong got into a case where during inode eviction we were
>>>> removing an extent map while it was pinned. This triggered a warning
>>>> in remove_extent_mapping() because the extent map had the pinned
>>>> flag set:
>>>>
>>>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>>>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>>>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>>>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>>>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>>>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>>>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>>>
>>>> Therefore wait for any pending ordered extents, if any, which will
>>>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>>>
>>>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>>>> the lookup for the extent map returned NULL.
>>>
>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
>>> clear_extent_bit()?
>>
>> So, if the pinned bit is set, it means some task will clear it later,
>> via unpin_extent_cache(). And if you look at that function, it has
>> this:
>>
>> write_lock(&tree->lock);
>> em = lookup_extent_mapping(tree, start, len);
>>
>> WARN_ON(!em || em->start != start);
>>
>> And remove_extent_mapping() will remove the em from the rbtree,
>> regardless of its reference count value, therefore triggering that
>> warning above.
>
> Here i mean,  in evict_inode_truncate_pages()
> We change it to:
>
> Step1:  unpin_extent_cache()
> Step2:  remove it from extent_mapping
>
> Dose this cause any problems? i am a little confused, correct me if i
> am wrong some places^_^.

It can still lead to the same WARN_ON I think. So when calling
unpin_extent_cache(), it can merge the em with its left neighbor,
therefore changing its ->start value. So later, if other task (the one
which set the pinned flag) calls remove_extent_mapping(), it will get
an em with a different ->start (because of the merge), therefore
triggering that WARN_ON().

What do you think?

thanks

>
>
>>
>> Does it makes sense?
>>
>> thanks
>>
>>>
>>> Thanks,
>>> Wang
>>>>
>>>> Thanks Wang for finding out this.
>>>>
>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>>>> ---
>>>>  fs/btrfs/inode.c |    5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index e889779..c2933fb 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>>>         ASSERT(inode->i_state & I_FREEING);
>>>>         truncate_inode_pages(&inode->i_data, 0);
>>>>
>>>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>> +
>>>>         write_lock(&map_tree->lock);
>>>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>>>                 struct extent_map *em;
>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>>>                 btrfs_orphan_del(NULL, inode);
>>>>                 goto no_delete;
>>>>         }
>>>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>
>>>>         if (root->fs_info->log_root_recovering) {
>>>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>>>> --
>>>> 1.7.9.5
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
Filipe Manana Dec. 14, 2013, 3:51 p.m. UTC | #5
On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>>> Hello Filipe,
>>>>
>>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>>>>> Wang Shilong got into a case where during inode eviction we were
>>>>> removing an extent map while it was pinned. This triggered a warning
>>>>> in remove_extent_mapping() because the extent map had the pinned
>>>>> flag set:
>>>>>
>>>>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>>>>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>>>>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>>>>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>>>>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>>>>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>>>>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>>>>
>>>>> Therefore wait for any pending ordered extents, if any, which will
>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>>>>
>>>>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>>>>> the lookup for the extent map returned NULL.
>>>>
>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
>>>> clear_extent_bit()?
>>>
>>> So, if the pinned bit is set, it means some task will clear it later,
>>> via unpin_extent_cache(). And if you look at that function, it has
>>> this:
>>>
>>> write_lock(&tree->lock);
>>> em = lookup_extent_mapping(tree, start, len);
>>>
>>> WARN_ON(!em || em->start != start);
>>>
>>> And remove_extent_mapping() will remove the em from the rbtree,
>>> regardless of its reference count value, therefore triggering that
>>> warning above.
>>
>> Here i mean,  in evict_inode_truncate_pages()
>> We change it to:
>>
>> Step1:  unpin_extent_cache()
>> Step2:  remove it from extent_mapping
>>
>> Dose this cause any problems? i am a little confused, correct me if i
>> am wrong some places^_^.
>
> It can still lead to the same WARN_ON I think. So when calling
> unpin_extent_cache(), it can merge the em with its left neighbor,
> therefore changing its ->start value. So later, if other task (the one
> which set the pinned flag) calls remove_extent_mapping(), it will get
> an em with a different ->start (because of the merge), therefore
> triggering that WARN_ON().

Or because it is not found the second time.

On the other hand, you didn't get such WARN_ON triggered, right?

So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage,
if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED
flag on it, and then it might call btrfs_finish_ordered_io() against
it, which not always unpins the extent when it has the truncated flag
set. So this might well be what you ran into.

I'm ok with your approach too.

thanks


>
> What do you think?
>
> thanks
>
>>
>>
>>>
>>> Does it makes sense?
>>>
>>> thanks
>>>
>>>>
>>>> Thanks,
>>>> Wang
>>>>>
>>>>> Thanks Wang for finding out this.
>>>>>
>>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>>>>> ---
>>>>>  fs/btrfs/inode.c |    5 +++--
>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index e889779..c2933fb 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>>>>         ASSERT(inode->i_state & I_FREEING);
>>>>>         truncate_inode_pages(&inode->i_data, 0);
>>>>>
>>>>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>> +
>>>>>         write_lock(&map_tree->lock);
>>>>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>>>>                 struct extent_map *em;
>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>>>>                 btrfs_orphan_del(NULL, inode);
>>>>>                 goto no_delete;
>>>>>         }
>>>>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>>
>>>>>         if (root->fs_info->log_root_recovering) {
>>>>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>>>>> --
>>>>> 1.7.9.5
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
Wang Shilong Dec. 15, 2013, 3:34 a.m. UTC | #6
2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
> On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
>> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
>>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>>>> Hello Filipe,
>>>>>
>>>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>>>>>> Wang Shilong got into a case where during inode eviction we were
>>>>>> removing an extent map while it was pinned. This triggered a warning
>>>>>> in remove_extent_mapping() because the extent map had the pinned
>>>>>> flag set:
>>>>>>
>>>>>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>>>>>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>>>>>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>>>>>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>>>>>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>>>>>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>>>>>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>>>>>
>>>>>> Therefore wait for any pending ordered extents, if any, which will
>>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>>>>>
>>>>>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>>>>>> the lookup for the extent map returned NULL.
>>>>>
>>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
>>>>> clear_extent_bit()?
>>>>
>>>> So, if the pinned bit is set, it means some task will clear it later,
>>>> via unpin_extent_cache(). And if you look at that function, it has
>>>> this:
>>>>
>>>> write_lock(&tree->lock);
>>>> em = lookup_extent_mapping(tree, start, len);
>>>>
>>>> WARN_ON(!em || em->start != start);
>>>>
>>>> And remove_extent_mapping() will remove the em from the rbtree,
>>>> regardless of its reference count value, therefore triggering that
>>>> warning above.
>>>
>>> Here i mean,  in evict_inode_truncate_pages()
>>> We change it to:
>>>
>>> Step1:  unpin_extent_cache()
>>> Step2:  remove it from extent_mapping
>>>
>>> Dose this cause any problems? i am a little confused, correct me if i
>>> am wrong some places^_^.
>>
>> It can still lead to the same WARN_ON I think. So when calling
>> unpin_extent_cache(), it can merge the em with its left neighbor,
>> therefore changing its ->start value. So later, if other task (the one
>> which set the pinned flag) calls remove_extent_mapping(), it will get
>> an em with a different ->start (because of the merge), therefore
>> triggering that WARN_ON().
>
> Or because it is not found the second time.
>
> On the other hand, you didn't get such WARN_ON triggered, right?

During my test, i did not hit another WARN_ON in fact.

>
> So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage,
> if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED
> flag on it, and then it might call btrfs_finish_ordered_io() against
> it, which not always unpins the extent when it has the truncated flag
> set. So this might well be what you ran into.

Let's dig more.

I take a deep look in btrfs_finish_ordered_io(), it won't unset PINNED flag
For an NOCOW extent.

2579         if (test_bit(BTRFS_ORDERED_NOCOW,
&ordered_extent->flags)) {
2580                 BUG_ON(!list_empty(&ordered_extent->list)); /*
Logic error */
2581                 btrfs_ordered_update_i_size(inode, 0, ordered_extent);
2582                 if (nolock)
2583                         trans = btrfs_join_transaction_nolock(root);
2584                 else
2585                         trans = btrfs_join_transaction(root);
2586                 if (IS_ERR(trans)) {
2587                         ret = PTR_ERR(trans);
2588                         trans = NULL;
2589                         goto out;
2590                 }
2591                 trans->block_rsv = &root->fs_info->delalloc_block_rsv;
2592                 ret = btrfs_update_inode_fallback(trans, root, inode);
2593                 if (ret) /* -ENOMEM or corruption */
2594                         btrfs_abort_transaction(trans, root, ret);
2595                 goto out;
------------------------->here we goto out directly,
unpin_extent_cache() won't be called.
2596         }
I don't know why we do something like this...


Previously, i unset PINNED flag directly is a lazy approach, that is
because i think it
dosen't hurt to do so, because we are going to evict that inode, and after
btrfs_invalidatepages() is called, nobody should can still access those pages.


Besides, i think we don't need call btrfs_wait_ordered_extents() in
evicting inode here..
(Expecially after your patch that we remove extent map cache)....


What do you think ^_^


>
> I'm ok with your approach too.
>
> thanks
>
>
>>
>> What do you think?
>>
>> thanks
>>
>>>
>>>
>>>>
>>>> Does it makes sense?
>>>>
>>>> thanks
>>>>
>>>>>
>>>>> Thanks,
>>>>> Wang
>>>>>>
>>>>>> Thanks Wang for finding out this.
>>>>>>
>>>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>>>>>> ---
>>>>>>  fs/btrfs/inode.c |    5 +++--
>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>>> index e889779..c2933fb 100644
>>>>>> --- a/fs/btrfs/inode.c
>>>>>> +++ b/fs/btrfs/inode.c
>>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>>>>>         ASSERT(inode->i_state & I_FREEING);
>>>>>>         truncate_inode_pages(&inode->i_data, 0);
>>>>>>
>>>>>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>>> +
>>>>>>         write_lock(&map_tree->lock);
>>>>>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>>>>>                 struct extent_map *em;
>>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>>>>>                 btrfs_orphan_del(NULL, inode);
>>>>>>                 goto no_delete;
>>>>>>         }
>>>>>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>>>
>>>>>>         if (root->fs_info->log_root_recovering) {
>>>>>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>>>>>> --
>>>>>> 1.7.9.5
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>>  Unreasonable men adapt the world to themselves.
>>>>  That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana Dec. 15, 2013, 12:41 p.m. UTC | #7
On Sun, Dec 15, 2013 at 3:34 AM, Shilong Wang <wangshilong1991@gmail.com> wrote:
> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
>> On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
>>> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>>> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>:
>>>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:
>>>>>> Hello Filipe,
>>>>>>
>>>>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:
>>>>>>> Wang Shilong got into a case where during inode eviction we were
>>>>>>> removing an extent map while it was pinned. This triggered a warning
>>>>>>> in remove_extent_mapping() because the extent map had the pinned
>>>>>>> flag set:
>>>>>>>
>>>>>>> [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
>>>>>>> [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
>>>>>>> [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
>>>>>>> [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
>>>>>>> [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
>>>>>>> [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
>>>>>>> [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
>>>>>>>
>>>>>>> Therefore wait for any pending ordered extents, if any, which will
>>>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps.
>>>>>>>
>>>>>>> Wang's solution of simply clearing the pinned bit wasn't enough, as after
>>>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because
>>>>>>> the lookup for the extent map returned NULL.
>>>>>>
>>>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after
>>>>>> clear_extent_bit()?
>>>>>
>>>>> So, if the pinned bit is set, it means some task will clear it later,
>>>>> via unpin_extent_cache(). And if you look at that function, it has
>>>>> this:
>>>>>
>>>>> write_lock(&tree->lock);
>>>>> em = lookup_extent_mapping(tree, start, len);
>>>>>
>>>>> WARN_ON(!em || em->start != start);
>>>>>
>>>>> And remove_extent_mapping() will remove the em from the rbtree,
>>>>> regardless of its reference count value, therefore triggering that
>>>>> warning above.
>>>>
>>>> Here i mean,  in evict_inode_truncate_pages()
>>>> We change it to:
>>>>
>>>> Step1:  unpin_extent_cache()
>>>> Step2:  remove it from extent_mapping
>>>>
>>>> Dose this cause any problems? i am a little confused, correct me if i
>>>> am wrong some places^_^.
>>>
>>> It can still lead to the same WARN_ON I think. So when calling
>>> unpin_extent_cache(), it can merge the em with its left neighbor,
>>> therefore changing its ->start value. So later, if other task (the one
>>> which set the pinned flag) calls remove_extent_mapping(), it will get
>>> an em with a different ->start (because of the merge), therefore
>>> triggering that WARN_ON().
>>
>> Or because it is not found the second time.
>>
>> On the other hand, you didn't get such WARN_ON triggered, right?
>
> During my test, i did not hit another WARN_ON in fact.
>
>>
>> So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage,
>> if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED
>> flag on it, and then it might call btrfs_finish_ordered_io() against
>> it, which not always unpins the extent when it has the truncated flag
>> set. So this might well be what you ran into.
>
> Let's dig more.
>
> I take a deep look in btrfs_finish_ordered_io(), it won't unset PINNED flag
> For an NOCOW extent.
>
> 2579         if (test_bit(BTRFS_ORDERED_NOCOW,
> &ordered_extent->flags)) {
> 2580                 BUG_ON(!list_empty(&ordered_extent->list)); /*
> Logic error */
> 2581                 btrfs_ordered_update_i_size(inode, 0, ordered_extent);
> 2582                 if (nolock)
> 2583                         trans = btrfs_join_transaction_nolock(root);
> 2584                 else
> 2585                         trans = btrfs_join_transaction(root);
> 2586                 if (IS_ERR(trans)) {
> 2587                         ret = PTR_ERR(trans);
> 2588                         trans = NULL;
> 2589                         goto out;
> 2590                 }
> 2591                 trans->block_rsv = &root->fs_info->delalloc_block_rsv;
> 2592                 ret = btrfs_update_inode_fallback(trans, root, inode);
> 2593                 if (ret) /* -ENOMEM or corruption */
> 2594                         btrfs_abort_transaction(trans, root, ret);
> 2595                 goto out;
> ------------------------->here we goto out directly,
> unpin_extent_cache() won't be called.
> 2596         }
> I don't know why we do something like this...
>
>
> Previously, i unset PINNED flag directly is a lazy approach, that is
> because i think it
> dosen't hurt to do so, because we are going to evict that inode, and after
> btrfs_invalidatepages() is called, nobody should can still access those pages.
>
>
> Besides, i think we don't need call btrfs_wait_ordered_extents() in
> evicting inode here..
> (Expecially after your patch that we remove extent map cache)....
>
>
> What do you think ^_^

I think it might be ok.

Thanks

>
>
>>
>> I'm ok with your approach too.
>>
>> thanks
>>
>>
>>>
>>> What do you think?
>>>
>>> thanks
>>>
>>>>
>>>>
>>>>>
>>>>> Does it makes sense?
>>>>>
>>>>> thanks
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Wang
>>>>>>>
>>>>>>> Thanks Wang for finding out this.
>>>>>>>
>>>>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>>>>>>> ---
>>>>>>>  fs/btrfs/inode.c |    5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>>>> index e889779..c2933fb 100644
>>>>>>> --- a/fs/btrfs/inode.c
>>>>>>> +++ b/fs/btrfs/inode.c
>>>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode)
>>>>>>>         ASSERT(inode->i_state & I_FREEING);
>>>>>>>         truncate_inode_pages(&inode->i_data, 0);
>>>>>>>
>>>>>>> +       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>>>> +       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>>>> +
>>>>>>>         write_lock(&map_tree->lock);
>>>>>>>         while (!RB_EMPTY_ROOT(&map_tree->map)) {
>>>>>>>                 struct extent_map *em;
>>>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode)
>>>>>>>                 btrfs_orphan_del(NULL, inode);
>>>>>>>                 goto no_delete;
>>>>>>>         }
>>>>>>> -       /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
>>>>>>> -       btrfs_wait_ordered_range(inode, 0, (u64)-1);
>>>>>>>
>>>>>>>         if (root->fs_info->log_root_recovering) {
>>>>>>>                 BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>>>>>>> --
>>>>>>> 1.7.9.5
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Filipe David Manana,
>>>>>
>>>>> "Reasonable men adapt themselves to the world.
>>>>>  Unreasonable men adapt the world to themselves.
>>>>>  That's why all progress depends on unreasonable men."
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
diff mbox

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e889779..c2933fb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4509,6 +4509,9 @@  static void evict_inode_truncate_pages(struct inode *inode)
 	ASSERT(inode->i_state & I_FREEING);
 	truncate_inode_pages(&inode->i_data, 0);
 
+	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
+	btrfs_wait_ordered_range(inode, 0, (u64)-1);
+
 	write_lock(&map_tree->lock);
 	while (!RB_EMPTY_ROOT(&map_tree->map)) {
 		struct extent_map *em;
@@ -4566,8 +4569,6 @@  void btrfs_evict_inode(struct inode *inode)
 		btrfs_orphan_del(NULL, inode);
 		goto no_delete;
 	}
-	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
-	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
 	if (root->fs_info->log_root_recovering) {
 		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,