btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs

Message ID 1472802392-10851-1-git-send-email-naohiro.aota@hgst.com (mailing list archive)
State Accepted

Commit Message

Naohiro Aota Sept. 2, 2016, 7:46 a.m. UTC
Currently, btrfs_relocate_chunk() removes the relocated BG by itself.
But the work can be done by btrfs_delete_unused_bgs() (which is better,
since it also trims the BG). Let's dedupe the code.

btrfs_delete_unused_bgs() already encounters the relocated BG, but it
skips the BG because the BG has the "ro" flag set (to keep a BG under
balance intact). On the other hand, btrfs cannot drop the "ro" flag
here, since the flag prevents additional writes. So this patch makes
use of the "removed" flag: btrfs_delete_unused_bgs() now checks the
flag to tell a read-only BG that is still being relocated from one
whose relocation has finished.

Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/volumes.c     | 24 ++++++++++--------------
 2 files changed, 11 insertions(+), 15 deletions(-)
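
The change to the skip check in btrfs_delete_unused_bgs() can be sketched
outside the kernel as below. This is only a simplified model: struct
bg_model and both function names are hypothetical stand-ins introduced for
illustration, and last_in_space_info stands in for the
list_is_singular(&block_group->list) test in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct btrfs_block_group_cache.
 * Only the fields consulted by the skip check are modelled. */
struct bg_model {
	unsigned long long reserved;
	unsigned long long used;
	bool ro;
	bool removed;
	bool last_in_space_info;	/* models list_is_singular(&bg->list) */
};

/* Skip check before the patch: every read-only BG is left alone. */
bool skip_before_patch(const struct bg_model *bg)
{
	return bg->reserved || bg->used || bg->ro ||
	       bg->last_in_space_info;
}

/* Skip check after the patch: a read-only BG is skipped only while it
 * has not been flagged "removed" by the relocation path. */
bool skip_after_patch(const struct bg_model *bg)
{
	return bg->reserved || bg->used || (bg->ro && !bg->removed) ||
	       bg->last_in_space_info;
}
```

In this model an empty BG with ro and removed both set is skipped by the
old check and picked up by the new one, while an empty BG with only ro set
is skipped by both.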

Comments

Josef Bacik Sept. 2, 2016, 1:35 p.m. UTC | #1
On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> Currently, btrfs_relocate_chunk() removes the relocated BG by itself.
> But the work can be done by btrfs_delete_unused_bgs() (which is better,
> since it also trims the BG). Let's dedupe the code.
>
> btrfs_delete_unused_bgs() already encounters the relocated BG, but it
> skips the BG because the BG has the "ro" flag set (to keep a BG under
> balance intact). On the other hand, btrfs cannot drop the "ro" flag
> here, since the flag prevents additional writes. So this patch makes
> use of the "removed" flag: btrfs_delete_unused_bgs() now checks the
> flag to tell a read-only BG that is still being relocated from one
> whose relocation has finished.
>

This seems racy to me.  We remove the last part of the block group, it ends
up on the unused_bgs_list, we process this list, see that removed isn't set
and skip it, then later we set removed, but it's too late.  I think the
right way is to actually do a transaction, set ->removed, manually add it to
the unused_bgs_list if it's not already there, then end the transaction.
This way we are guaranteed to have the bg on the list when it is ready to be
removed.  This is my analysis after looking at it for 10 seconds after being
awake for like 30 minutes, so if I'm missing something let me know.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Naohiro Aota Sept. 5, 2016, 4:32 a.m. UTC | #2
On Fri, 2016-09-02 at 09:35 -0400, Josef Bacik wrote:
> On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> >
> > Currently, btrfs_relocate_chunk() removes the relocated BG by itself.
> > But the work can be done by btrfs_delete_unused_bgs() (which is better,
> > since it also trims the BG). Let's dedupe the code.
> >
> > btrfs_delete_unused_bgs() already encounters the relocated BG, but it
> > skips the BG because the BG has the "ro" flag set (to keep a BG under
> > balance intact). On the other hand, btrfs cannot drop the "ro" flag
> > here, since the flag prevents additional writes. So this patch makes
> > use of the "removed" flag: btrfs_delete_unused_bgs() now checks the
> > flag to tell a read-only BG that is still being relocated from one
> > whose relocation has finished.
> >
>
> This seems racy to me.  We remove the last part of the block group, it
> ends up on the unused_bgs_list, we process this list, see that removed
> isn't set and skip it, then later we set removed, but it's too late.  I
> think the right way is to actually do a transaction, set ->removed,
> manually add it to the unused_bgs_list if it's not already there, then
> end the transaction.  This way we are guaranteed to have the bg on the
> list when it is ready to be removed.  This is my analysis after looking
> at it for 10 seconds after being awake for like 30 minutes, so if I'm
> missing something let me know.  Thanks,

I don't think a race will happen. Since we are holding
delete_unused_bgs_mutex here, btrfs_delete_unused_bgs() checks the
->removed flag only after we unlock the mutex, i.e. after we have set up
the flag properly. In the case where btrfs_delete_unused_bgs() checks the
BG before we take delete_unused_bgs_mutex, that BG is removed by it (if
it's empty) and btrfs_relocate_chunk() should never see it.

Regards,
Naohiro
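
The ordering argument above can be sketched as a small sequential model.
It assumes, as the reply states, that relocation and the cleaner's per-BG
check each run with delete_unused_bgs_mutex held, so each function below
stands for one critical section and the two can only run one after the
other. struct bg_state and both function names are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the BG state that matters for the ordering. */
struct bg_state {
	unsigned long long used;
	bool ro;
	bool removed;
	bool deleted;
};

/* Models btrfs_relocate_chunk() with the mutex held for its whole
 * duration: by the time the mutex is dropped, "removed" is already set,
 * so the state ro && !removed is never visible to the cleaner. */
void relocate_locked(struct bg_state *bg)
{
	if (bg->deleted)
		return;		/* the cleaner removed the BG first */
	bg->ro = true;
	bg->used = 0;		/* all extents moved elsewhere */
	bg->removed = true;
}

/* Models one cleaner check of a single BG under the same mutex.
 * Returns true if the BG was deleted by this pass. */
bool cleaner_pass_locked(struct bg_state *bg)
{
	if (bg->deleted || bg->used || (bg->ro && !bg->removed))
		return false;
	bg->deleted = true;
	return true;
}
```

Whichever critical section runs first, the model ends in a consistent
state: if relocation runs first, the cleaner sees ro and removed together
and deletes the BG; if the cleaner runs first on an empty BG, relocation
never sees it.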
Josef Bacik Sept. 6, 2016, 12:52 p.m. UTC | #3
On 09/05/2016 12:32 AM, Naohiro Aota wrote:
> On Fri, 2016-09-02 at 09:35 -0400, Josef Bacik wrote:
>> On 09/02/2016 03:46 AM, Naohiro Aota wrote:
>>>
>>> Currently, btrfs_relocate_chunk() removes the relocated BG by itself.
>>> But the work can be done by btrfs_delete_unused_bgs() (which is better,
>>> since it also trims the BG). Let's dedupe the code.
>>>
>>> btrfs_delete_unused_bgs() already encounters the relocated BG, but it
>>> skips the BG because the BG has the "ro" flag set (to keep a BG under
>>> balance intact). On the other hand, btrfs cannot drop the "ro" flag
>>> here, since the flag prevents additional writes. So this patch makes
>>> use of the "removed" flag: btrfs_delete_unused_bgs() now checks the
>>> flag to tell a read-only BG that is still being relocated from one
>>> whose relocation has finished.
>>>
>>
>> This seems racy to me.  We remove the last part of the block group, it
>> ends up on the unused_bgs_list, we process this list, see that removed
>> isn't set and skip it, then later we set removed, but it's too late.  I
>> think the right way is to actually do a transaction, set ->removed,
>> manually add it to the unused_bgs_list if it's not already there, then
>> end the transaction.  This way we are guaranteed to have the bg on the
>> list when it is ready to be removed.  This is my analysis after looking
>> at it for 10 seconds after being awake for like 30 minutes, so if I'm
>> missing something let me know.  Thanks,
>
> I don't think a race will happen. Since we are holding
> delete_unused_bgs_mutex here, btrfs_delete_unused_bgs() checks the
> ->removed flag only after we unlock the mutex, i.e. after we have set up
> the flag properly. In the case where btrfs_delete_unused_bgs() checks the
> BG before we take delete_unused_bgs_mutex, that BG is removed by it (if
> it's empty) and btrfs_relocate_chunk() should never see it.
>

Ok that's what I was missing, thanks

Reviewed-by: Josef Bacik <jbacik@fb.com>

Josef

Chris Mason Oct. 10, 2016, 9:04 p.m. UTC | #4
On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> Currently, btrfs_relocate_chunk() removes the relocated BG by itself.
> But the work can be done by btrfs_delete_unused_bgs() (which is better,
> since it also trims the BG). Let's dedupe the code.
>
> btrfs_delete_unused_bgs() already encounters the relocated BG, but it
> skips the BG because the BG has the "ro" flag set (to keep a BG under
> balance intact). On the other hand, btrfs cannot drop the "ro" flag
> here, since the flag prevents additional writes. So this patch makes
> use of the "removed" flag: btrfs_delete_unused_bgs() now checks the
> flag to tell a read-only BG that is still being relocated from one
> whose relocation has finished.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>

This runs into trouble with btrfs_rm_device(), I've been triggering 
crashes with btrfs/101 here.

The problem is that by the time we get around to running
btrfs_delete_unused_bgs(), btrfs_rm_device() has long since freed the
device.

I thought about calling btrfs_delete_unused_bgs() directly from 
btrfs_rm_device(), but it might bail out without returning an error for 
a number of reasons.

For now, I've reverted this patch from the pull, but we can bring it 
back once the device removal path is covered.

-chris
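
The ordering problem described above can be illustrated with a toy model.
It is only a sketch under loud assumptions: struct dev_model, its fields
and all three function names are hypothetical, the "freed" flag stands in
for actually freeing the device, and nothing here reflects how
btrfs_rm_device() is really structured. It contrasts synchronous chunk
removal (before the patch) with removal deferred to a later cleaner pass
(after the patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device and the BGs that still reference it. */
struct dev_model {
	bool freed;		/* stands in for freeing the device */
	int pending_bgs;	/* BGs flagged "removed" but not yet deleted */
};

/* Before the patch: each relocated BG is removed on the spot, so nothing
 * references the device once it is released. */
void rm_device_sync(struct dev_model *dev, int nr_bgs)
{
	(void)nr_bgs;		/* every BG is removed during relocation */
	dev->pending_bgs = 0;
	dev->freed = true;
}

/* After the patch: relocation only flags the BGs, and the device is
 * released while their removal is still outstanding. */
void rm_device_deferred(struct dev_model *dev, int nr_bgs)
{
	dev->pending_bgs = nr_bgs;
	dev->freed = true;
}

/* Number of deferred BG removals that would run against a device that
 * is already gone when the cleaner finally gets to them. */
int cleaner_stale_accesses(const struct dev_model *dev)
{
	return dev->freed ? dev->pending_bgs : 0;
}
```

In the model, the synchronous path leaves no stale accesses, while the
deferred path leaves one per flagged BG, which is the situation the
device removal path would have to cover before the patch can return.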

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 843ed27..d382735 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10971,7 +10971,7 @@  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_lock(&block_group->lock);
 		if (block_group->reserved ||
 		    btrfs_block_group_used(&block_group->item) ||
-		    block_group->ro ||
+		    (block_group->ro && !block_group->removed) ||
 		    list_is_singular(&block_group->list)) {
 			/*
 			 * We want to bail if we made new allocations or have
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7e6399f..1a6789d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2931,8 +2931,8 @@  out:
 static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 {
 	struct btrfs_root *extent_root;
-	struct btrfs_trans_handle *trans;
 	int ret;
+	struct btrfs_block_group_cache *block_group;
 
 	root = root->fs_info->chunk_root;
 	extent_root = root->fs_info->extent_root;
@@ -2962,21 +2962,17 @@  static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 	if (ret)
 		return ret;
 
-	trans = btrfs_start_trans_remove_block_group(root->fs_info,
-						     chunk_offset);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
-		btrfs_handle_fs_error(root->fs_info, ret, NULL);
-		return ret;
-	}
-
 	/*
-	 * step two, delete the device extents and the
-	 * chunk tree entries
+	 * step two, flag the chunk as removed and let
+	 * btrfs_delete_unused_bgs() remove it.
 	 */
-	ret = btrfs_remove_chunk(trans, root, chunk_offset);
-	btrfs_end_transaction(trans, extent_root);
-	return ret;
+	block_group = btrfs_lookup_block_group(root->fs_info, chunk_offset);
+	spin_lock(&block_group->lock);
+	block_group->removed = 1;
+	spin_unlock(&block_group->lock);
+	btrfs_put_block_group(block_group);
+
+	return 0;
 }
 
 static int btrfs_relocate_sys_chunks(struct btrfs_root *root)