Message ID | 1472802392-10851-1-git-send-email-naohiro.aota@hgst.com (mailing list archive) |
---|---|
State | Accepted |
On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> Currently, btrfs_relocate_chunk() removes the relocated BG by itself. But
> the work can be done by btrfs_delete_unused_bgs() (and that is better, since
> it also trims the BG). Let's dedupe the code.
>
> While btrfs_delete_unused_bgs() already visits the relocated BG, it
> skips the BG because the BG has the "ro" flag set (to keep a balancing BG
> intact). On the other hand, btrfs cannot drop the "ro" flag here, because
> the flag is what prevents additional writes. So this patch makes use of the
> "removed" flag: btrfs_delete_unused_bgs() now checks the flag to
> distinguish whether a read-only BG is still relocating or not.
>

This seems racy to me. We remove the last part of the block group, it ends up on the unused_bgs_list, we process this list, see that removed isn't set and we skip it, then later we set removed, but it's too late.

I think the right way is to actually do a transaction, set ->removed, manually add it to the unused_bgs_list if it's not already, then end the transaction. This way we are guaranteed to have the bg on the list when it is ready to be removed.

This is my analysis after looking at it for 10 seconds after being awake for like 30 minutes so if I'm missing something let me know. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-09-02 at 09:35 -0400, Josef Bacik wrote:
> This seems racy to me. We remove the last part of the block group,
> it ends up on the unused_bgs_list, we process this list, see that
> removed isn't set and we skip it, then later we set removed, but it's
> too late. I think the right way is to actually do a transaction, set
> ->removed, manually add it to the unused_bgs_list if it's not already,
> then end the transaction. This way we are guaranteed to have the bg on
> the list when it is ready to be removed. This is my analysis after
> looking at it for 10 seconds after being awake for like 30 minutes so
> if I'm missing something let me know. Thanks,

I don't think a race will happen. We are holding delete_unused_bgs_mutex here, so btrfs_delete_unused_bgs() checks the ->removed flag only after we unlock the mutex, i.e. after we have set up the flag properly. In the other case, where btrfs_delete_unused_bgs() checks the BG before we take delete_unused_bgs_mutex, that BG is removed by it (if it's empty) and btrfs_relocate_chunk() should never see it.

Regards,
Naohiro
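The ordering argument above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct model_bg`, `model_relocate_chunk()` and `model_delete_unused_bg()` are hypothetical names, a pthread mutex stands in for `delete_unused_bgs_mutex`, and the model takes the mutex inside the relocation function, whereas the mail only says the mutex is held at that point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a btrfs block group. */
struct model_bg {
	bool ro;      /* set while the BG is being relocated */
	bool removed; /* set once relocation has finished */
	bool empty;   /* no used or reserved bytes remain */
	bool deleted; /* the cleaner has already removed the BG */
};

/* Stand-in for fs_info->delete_unused_bgs_mutex. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models relocation: everything, including setting ->removed,
 * happens before the mutex is dropped.
 * Returns false if the cleaner got to the BG first.
 */
static bool model_relocate_chunk(struct model_bg *bg)
{
	bool relocated = false;

	pthread_mutex_lock(&model_mutex);
	if (!bg->deleted) {
		bg->ro = true;      /* block new writes */
		bg->empty = true;   /* extents were moved elsewhere */
		bg->removed = true; /* flag is set under the mutex */
		relocated = true;
	}
	pthread_mutex_unlock(&model_mutex);
	return relocated;
}

/*
 * Models one cleaner pass over a single BG.
 * Returns true if this pass deleted the BG.
 */
static bool model_delete_unused_bg(struct model_bg *bg)
{
	bool deleted_now = false;

	pthread_mutex_lock(&model_mutex);
	if (!bg->deleted && bg->empty && !(bg->ro && !bg->removed)) {
		bg->deleted = true;
		deleted_now = true;
	}
	pthread_mutex_unlock(&model_mutex);
	return deleted_now;
}
```

In this model the cleaner can only observe a BG either before relocation starts or after `removed` is set, which is the claim made in the reply; it says nothing about other code paths that touch the BG.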
On 09/05/2016 12:32 AM, Naohiro Aota wrote:
> I don't think a race will happen. Since we are holding
> delete_unused_bgs_mutex here, btrfs_delete_unused_bgs() checks the
> ->removed flag after we unlock the mutex, i.e. after we have set up the
> flag properly. In the case where btrfs_delete_unused_bgs() checks the BG
> before we take delete_unused_bgs_mutex, that BG is removed by it (if
> it's empty) and btrfs_relocate_chunk() should never see it.
>

Ok that's what I was missing, thanks

Reviewed-by: Josef Bacik <jbacik@fb.com>

Josef
On 09/02/2016 03:46 AM, Naohiro Aota wrote:
> Currently, btrfs_relocate_chunk() removes the relocated BG by itself. But
> the work can be done by btrfs_delete_unused_bgs() (and that is better, since
> it also trims the BG). Let's dedupe the code.
>
> While btrfs_delete_unused_bgs() already visits the relocated BG, it
> skips the BG because the BG has the "ro" flag set (to keep a balancing BG
> intact). On the other hand, btrfs cannot drop the "ro" flag here, because
> the flag is what prevents additional writes. So this patch makes use of the
> "removed" flag: btrfs_delete_unused_bgs() now checks the flag to
> distinguish whether a read-only BG is still relocating or not.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>

This runs into trouble with btrfs_rm_device(); I've been triggering crashes with btrfs/101 here. The problem is that by the time we get around to running btrfs_delete_unused_bgs(), btrfs_rm_device() has long since freed the device.

I thought about calling btrfs_delete_unused_bgs() directly from btrfs_rm_device(), but it might bail out without returning an error for a number of reasons.

For now, I've reverted this patch from the pull, but we can bring it back once the device removal path is covered.

-chris
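The ordering problem described above can be illustrated with a toy model. This is a sketch under loud assumptions: every type and function here (`model_device`, `model_pending_bg`, `model_rm_device()`, `model_cleaner_pass()`) is hypothetical, and a `freed` flag stands in for the real free so that the hazard can be reported instead of actually dereferencing freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real btrfs types. */
struct model_device {
	bool freed; /* true once the model has "freed" the device */
};

struct model_pending_bg {
	struct model_device *dev; /* device holding this BG's extents */
	bool removed;             /* relocation done, waiting for the cleaner */
	bool deleted;             /* the cleaner has removed the BG */
};

enum model_result { MODEL_OK, MODEL_USE_AFTER_FREE };

/* Models the end of device removal: the device structure goes away. */
static void model_rm_device(struct model_device *dev)
{
	dev->freed = true;
}

/*
 * Models the deferred cleaner pass. It reports MODEL_USE_AFTER_FREE
 * when deleting the BG would require touching a device that is gone.
 */
static enum model_result model_cleaner_pass(struct model_pending_bg *bg)
{
	if (!bg->removed || bg->deleted)
		return MODEL_OK; /* nothing to do for this BG */
	if (bg->dev->freed)
		return MODEL_USE_AFTER_FREE;
	bg->deleted = true;
	return MODEL_OK;
}
```

If the cleaner pass runs before `model_rm_device()`, the BG is deleted while the device still exists; if it runs afterwards, the model reports the hazard. The synchronous removal in the original `btrfs_relocate_chunk()` corresponds to the first ordering.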
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 843ed27..d382735 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10971,7 +10971,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 		spin_lock(&block_group->lock);
 		if (block_group->reserved ||
 		    btrfs_block_group_used(&block_group->item) ||
-		    block_group->ro ||
+		    (block_group->ro && !block_group->removed) ||
 		    list_is_singular(&block_group->list)) {
 			/*
 			 * We want to bail if we made new allocations or have
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7e6399f..1a6789d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2931,8 +2931,8 @@ out:
 static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 {
 	struct btrfs_root *extent_root;
-	struct btrfs_trans_handle *trans;
 	int ret;
+	struct btrfs_block_group_cache *block_group;

 	root = root->fs_info->chunk_root;
 	extent_root = root->fs_info->extent_root;
@@ -2962,21 +2962,17 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset)
 	if (ret)
 		return ret;

-	trans = btrfs_start_trans_remove_block_group(root->fs_info,
-						     chunk_offset);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
-		btrfs_handle_fs_error(root->fs_info, ret, NULL);
-		return ret;
-	}
-
 	/*
-	 * step two, delete the device extents and the
-	 * chunk tree entries
+	 * step two, flag the chunk as removed and let
+	 * btrfs_delete_unused_bgs() remove it.
 	 */
-	ret = btrfs_remove_chunk(trans, root, chunk_offset);
-	btrfs_end_transaction(trans, extent_root);
-	return ret;
+	block_group = btrfs_lookup_block_group(root->fs_info, chunk_offset);
+	spin_lock(&block_group->lock);
+	block_group->removed = 1;
+	spin_unlock(&block_group->lock);
+	btrfs_put_block_group(block_group);
+
+	return 0;
 }

 static int btrfs_relocate_sys_chunks(struct btrfs_root *root)
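The changed skip condition in the first hunk can be checked in isolation with a small userspace sketch. The struct and function names below are hypothetical simplifications: `model_bg_state` is not `struct btrfs_block_group_cache`, `used` stands in for `btrfs_block_group_used()`, and `last_of_its_list` stands in for `list_is_singular()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the fields the condition reads. */
struct model_bg_state {
	unsigned long long reserved; /* bytes reserved for pending allocations */
	unsigned long long used;     /* bytes currently in use */
	bool ro;                     /* read-only (balance or relocation) */
	bool removed;                /* relocation finished, safe to delete */
	bool last_of_its_list;       /* stands in for list_is_singular() */
};

/* Condition before the patch: any read-only BG is skipped. */
static bool model_skip_before_patch(const struct model_bg_state *bg)
{
	return bg->reserved || bg->used || bg->ro || bg->last_of_its_list;
}

/* Condition after the patch: a read-only BG is skipped unless it is removed. */
static bool model_skip_after_patch(const struct model_bg_state *bg)
{
	return bg->reserved || bg->used ||
	       (bg->ro && !bg->removed) ||
	       bg->last_of_its_list;
}
```

The only input on which the two predicates differ is an empty, read-only BG with `removed` set: the old condition skips it, the new one lets the cleaner delete it.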
Currently, btrfs_relocate_chunk() removes the relocated BG by itself. But
the work can be done by btrfs_delete_unused_bgs() (and that is better, since
it also trims the BG). Let's dedupe the code.

While btrfs_delete_unused_bgs() already visits the relocated BG, it
skips the BG because the BG has the "ro" flag set (to keep a balancing BG
intact). On the other hand, btrfs cannot drop the "ro" flag here, because
the flag is what prevents additional writes. So this patch makes use of the
"removed" flag: btrfs_delete_unused_bgs() now checks the flag to
distinguish whether a read-only BG is still relocating or not.

Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/volumes.c     | 24 ++++++++++--------------
 2 files changed, 11 insertions(+), 15 deletions(-)