Message ID | CAL3q7H6uShZiaj7wbVSopDWJ3gWDBypzcOuwGzM32oj8N2bFiA@mail.gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Hi, Filipe Manana > -----Original Message----- > From: Filipe Manana [mailto:fdmanana@gmail.com] > Sent: Wednesday, September 30, 2015 3:41 PM > To: Zhao Lei <zhaolei@cn.fujitsu.com> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg > > On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote: > > Hi, Filipe Manana > > > > Thanks for reviewing. > > > >> -----Original Message----- > >> From: Filipe Manana [mailto:fdmanana@gmail.com] > >> Sent: Tuesday, September 29, 2015 11:48 PM > >> To: Zhao Lei <zhaolei@cn.fujitsu.com> > >> Cc: linux-btrfs@vger.kernel.org > >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by > >> balance bg > >> > >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote: > >> > Reproduce: > >> > (In integration-4.3 branch) > >> > > >> > TEST_DEV=(/dev/vdg /dev/vdh) > >> > TEST_DIR=/mnt/tmp > >> > > >> > umount "$TEST_DEV" >/dev/null > >> > mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}" > >> > > >> > mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" > >> > btrfs balance start -dusage=0 $TEST_DIR btrfs filesystem usage > >> > $TEST_DIR > >> > > >> > dd if=/dev/zero of="$TEST_DIR"/file count=100 btrfs filesystem > >> > usage $TEST_DIR > >> > > >> > Result: > >> > We can see "no data chunk" in first "btrfs filesystem usage": > >> > # btrfs filesystem usage $TEST_DIR > >> > Overall: > >> > ... > >> > Metadata,single: Size:8.00MiB, Used:0.00B > >> > /dev/vdg 8.00MiB > >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB > >> > /dev/vdg 122.88MiB > >> > /dev/vdh 122.88MiB > >> > System,single: Size:4.00MiB, Used:0.00B > >> > /dev/vdg 4.00MiB > >> > System,RAID1: Size:8.00MiB, Used:16.00KiB > >> > /dev/vdg 8.00MiB > >> > /dev/vdh 8.00MiB > >> > Unallocated: > >> > /dev/vdg 1.06GiB > >> > /dev/vdh 1.07GiB > >> > > >> > And "data chunks changed from raid1 to single" in second "btrfs > >> > filesystem usage": > >> > # btrfs filesystem usage $TEST_DIR > >> > Overall: > >> > ... > >> > Data,single: Size:256.00MiB, Used:0.00B > >> > /dev/vdh 256.00MiB > >> > Metadata,single: Size:8.00MiB, Used:0.00B > >> > /dev/vdg 8.00MiB > >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB > >> > /dev/vdg 122.88MiB > >> > /dev/vdh 122.88MiB > >> > System,single: Size:4.00MiB, Used:0.00B > >> > /dev/vdg 4.00MiB > >> > System,RAID1: Size:8.00MiB, Used:16.00KiB > >> > /dev/vdg 8.00MiB > >> > /dev/vdh 8.00MiB > >> > Unallocated: > >> > /dev/vdg 1.06GiB > >> > /dev/vdh 841.92MiB > >> > > >> > Reason: > >> > btrfs balance delete last data chunk in case of no data in the > >> > filesystem, then we can see "no data chunk" by "fi usage" > >> > command. > >> > > >> > And when we do write operation to fs, the only available data > >> > profile is 0x0, result is all new chunks are allocated single type. > >> > > >> > Fix: > >> > Allocate a data chunk explicitly in balance operation, to reserve > >> > at least one data chunk in the filesystem. > >> > >> Allocate a data chunk explicitly to ensure we don't lose the raid profile for > data. > >> > > > > Thanks, will change in v2. > > > >> > > >> > Test: > >> > Test by above script, and confirmed the logic by debug output. > >> > > >> > Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> > >> > --- > >> > fs/btrfs/volumes.c | 19 +++++++++++++++++++ > >> > 1 file changed, 19 insertions(+) > >> > > >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index > >> > 6fc73586..3d5e41e 100644 > >> > --- a/fs/btrfs/volumes.c > >> > +++ b/fs/btrfs/volumes.c > >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct > >> > btrfs_fs_info > >> *fs_info) > >> > u64 limit_data = bctl->data.limit; > >> > u64 limit_meta = bctl->meta.limit; > >> > u64 limit_sys = bctl->sys.limit; > >> > + int chunk_reserved = 0; > >> > > >> > /* step one make some room on all the devices */ > >> > devices = &fs_info->fs_devices->devices; @@ -3387,6 > >> > +3388,24 @@ again: > >> > goto loop; > >> > } > >> > > >> > + if (!chunk_reserved) { > >> > + trans = btrfs_start_transaction(chunk_root, > 0); > >> > + if (IS_ERR(trans)) { > >> > + > >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > + ret = PTR_ERR(trans); > >> > + goto error; > >> > + } > >> > + > >> > + ret = btrfs_force_chunk_alloc(trans, > >> > + chunk_root, 1); > >> > >> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1? > >> > > Thanks, will change in v2. > > > > > >> > + if (ret < 0) { > >> > + > >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > + goto error; > >> > + } > >> > + > >> > + btrfs_end_transaction(trans, chunk_root); > >> > + chunk_reserved = 1; > >> > + } > >> > >> Can we do this logic only if the block group is a data one? I.e. no > >> point allocating a data block group if our balance only touches > >> metadata ones (due to some filter for e.g.). > >> > > Thanks for point out it, will change in v2. > > I find it somewhat awkward that we always allocate a new data block group no > matter what. Some balance operations that used to succeed in the past may > now fail with -ENOSPC for example... > > How about making this simpler and not so special for data block groups, like the > following (compile tested only): > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 644e070..067b1eb > 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -2774,6 +2774,8 @@ static int btrfs_relocate_chunk(struct btrfs_root > *root, u64 chunk_offset) > struct btrfs_root *extent_root; > struct btrfs_trans_handle *trans; > int ret; > + struct btrfs_block_group_cache *bg; > + bool remove = true; > > root = root->fs_info->chunk_root; > extent_root = root->fs_info->extent_root; @@ -2803,6 +2805,23 > @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) > if (ret) > return ret; > > + bg = btrfs_lookup_block_group(root->fs_info, chunk_offset); > + ASSERT(bg); > + down_read(&bg->space_info->groups_sem); > + /* > + * Do not remove the last block group of a kind to prevent losing > + * raid profile information for future chunk allocations. > + */ > + if (bg->list.next == bg->list.prev) > + remove = false; > + up_read(&bg->space_info->groups_sem); > + if (!remove) > + btrfs_dec_block_group_ro(extent_root, bg); > + btrfs_put_block_group(bg); > + > + if (!remove) > + return 0; > + > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > Thanks for your detailed review! Reason of creating new bg is: Balance operation may be used to change raid-profile of the filesystem. If we want to change filesystem form raid1 to raid5, we need: 1: delete all raid1 bgs 2: move all data to raid5 bg if data exist in filesystem 3: reserve only one blank raid5 bg if no data in filesystem If we use similar operation as patch 1/2(not delete lattest blockgroup), we'll have following problem: 1: the old raid1 blockgroups are not all-deleted 2: if no data in filesystem, the profile will not changed from raid1 to raid5. I understand your worry of "NO_SPACE" when create additional bg, and I also considered it in making patch, but I choose to use this way because: 1: balance operation had check free space before operation (In front of __btrfs_balance) 2: for filesystem with data, we have to create target-chunk in balance operation, this patch only make "creating-chunk" earlier, and the created chunk will be used to save data in balance operation, and reduce one chunk-allocate in balance operation. 3: for a blank filesystem, it is necessary to create a blank chunk with of target raid profile. the old code hadn't create it, and this patch created. It is right operation even if cause no-space. Actually, create a chunk in empty filesystem rarely cause no-space, plus we have free-space check in 1. 4: 1~3 ensure this patch rarely cause additional no-space problem in logic. And if it really caused additional no-space(if out of my consideration), We can change code of 1 to avoid it. Thanks Zhaolei > (also at https://friendpaste.com/5IeAIIzBv3oKhureKfvjwm) > > thanks > > > > > >> Also, can't this be ineffective when the data block group we allocate > >> ends up with a key in the chunk_root that is lower than the key we > >> found in the current iteration? I.e., key found in current iteration > >> has object id B, we allocate a new block group which gets a key with > >> object id A, where A < B and the next iteration of our loop sees key > >> A, deletes the respective block group A if it's empty. In the end we > >> have no data block groups, assuming that before A there are no other > non-empty data block groups. > >> Your example works because there's no free space before the offset > >> where the chunk starts in the device. > >> > > New chunk will always use increased logic address, even if there are > > "hole" before, so A's logic address will always >B. > > And current code of balance also use this feature to avoid balance > > new-created chunks(which was created by balance operation itself). > > > > Thanks > > Zhaolei > > > >> thanks > >> > >> > + > >> > ret = btrfs_relocate_chunk(chunk_root, > >> > found_key.offset); > >> > mutex_unlock(&fs_info->delete_unused_bgs_mutex); > >> > -- > >> > 1.8.5.1 > >> > > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > >> > in the body of a message to majordomo@vger.kernel.org More > >> > majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >> > >> > >> -- > >> Filipe David Manana, > >> > >> "Reasonable men adapt themselves to the world. > >> Unreasonable men adapt the world to themselves. > >> That's why all progress depends on unreasonable men." > > > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 30, 2015 at 9:26 AM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote: > Hi, Filipe Manana > >> -----Original Message----- >> From: Filipe Manana [mailto:fdmanana@gmail.com] >> Sent: Wednesday, September 30, 2015 3:41 PM >> To: Zhao Lei <zhaolei@cn.fujitsu.com> >> Cc: linux-btrfs@vger.kernel.org >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg >> >> On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote: >> > Hi, Filipe Manana >> > >> > Thanks for reviewing. >> > >> >> -----Original Message----- >> >> From: Filipe Manana [mailto:fdmanana@gmail.com] >> >> Sent: Tuesday, September 29, 2015 11:48 PM >> >> To: Zhao Lei <zhaolei@cn.fujitsu.com> >> >> Cc: linux-btrfs@vger.kernel.org >> >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by >> >> balance bg >> >> >> >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei <zhaolei@cn.fujitsu.com> wrote: >> >> > Reproduce: >> >> > (In integration-4.3 branch) >> >> > >> >> > TEST_DEV=(/dev/vdg /dev/vdh) >> >> > TEST_DIR=/mnt/tmp >> >> > >> >> > umount "$TEST_DEV" >/dev/null >> >> > mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}" >> >> > >> >> > mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" >> >> > btrfs balance start -dusage=0 $TEST_DIR btrfs filesystem usage >> >> > $TEST_DIR >> >> > >> >> > dd if=/dev/zero of="$TEST_DIR"/file count=100 btrfs filesystem >> >> > usage $TEST_DIR >> >> > >> >> > Result: >> >> > We can see "no data chunk" in first "btrfs filesystem usage": >> >> > # btrfs filesystem usage $TEST_DIR >> >> > Overall: >> >> > ... >> >> > Metadata,single: Size:8.00MiB, Used:0.00B >> >> > /dev/vdg 8.00MiB >> >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB >> >> > /dev/vdg 122.88MiB >> >> > /dev/vdh 122.88MiB >> >> > System,single: Size:4.00MiB, Used:0.00B >> >> > /dev/vdg 4.00MiB >> >> > System,RAID1: Size:8.00MiB, Used:16.00KiB >> >> > /dev/vdg 8.00MiB >> >> > /dev/vdh 8.00MiB >> >> > Unallocated: >> >> > /dev/vdg 1.06GiB >> >> > /dev/vdh 1.07GiB >> >> > >> >> > And "data chunks changed from raid1 to single" in second "btrfs >> >> > filesystem usage": >> >> > # btrfs filesystem usage $TEST_DIR >> >> > Overall: >> >> > ... >> >> > Data,single: Size:256.00MiB, Used:0.00B >> >> > /dev/vdh 256.00MiB >> >> > Metadata,single: Size:8.00MiB, Used:0.00B >> >> > /dev/vdg 8.00MiB >> >> > Metadata,RAID1: Size:122.88MiB, Used:112.00KiB >> >> > /dev/vdg 122.88MiB >> >> > /dev/vdh 122.88MiB >> >> > System,single: Size:4.00MiB, Used:0.00B >> >> > /dev/vdg 4.00MiB >> >> > System,RAID1: Size:8.00MiB, Used:16.00KiB >> >> > /dev/vdg 8.00MiB >> >> > /dev/vdh 8.00MiB >> >> > Unallocated: >> >> > /dev/vdg 1.06GiB >> >> > /dev/vdh 841.92MiB >> >> > >> >> > Reason: >> >> > btrfs balance delete last data chunk in case of no data in the >> >> > filesystem, then we can see "no data chunk" by "fi usage" >> >> > command. >> >> > >> >> > And when we do write operation to fs, the only available data >> >> > profile is 0x0, result is all new chunks are allocated single type. >> >> > >> >> > Fix: >> >> > Allocate a data chunk explicitly in balance operation, to reserve >> >> > at least one data chunk in the filesystem. >> >> >> >> Allocate a data chunk explicitly to ensure we don't lose the raid profile for >> data. >> >> >> > >> > Thanks, will change in v2. >> > >> >> > >> >> > Test: >> >> > Test by above script, and confirmed the logic by debug output. >> >> > >> >> > Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> >> >> > --- >> >> > fs/btrfs/volumes.c | 19 +++++++++++++++++++ >> >> > 1 file changed, 19 insertions(+) >> >> > >> >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index >> >> > 6fc73586..3d5e41e 100644 >> >> > --- a/fs/btrfs/volumes.c >> >> > +++ b/fs/btrfs/volumes.c >> >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct >> >> > btrfs_fs_info >> >> *fs_info) >> >> > u64 limit_data = bctl->data.limit; >> >> > u64 limit_meta = bctl->meta.limit; >> >> > u64 limit_sys = bctl->sys.limit; >> >> > + int chunk_reserved = 0; >> >> > >> >> > /* step one make some room on all the devices */ >> >> > devices = &fs_info->fs_devices->devices; @@ -3387,6 >> >> > +3388,24 @@ again: >> >> > goto loop; >> >> > } >> >> > >> >> > + if (!chunk_reserved) { >> >> > + trans = btrfs_start_transaction(chunk_root, >> 0); >> >> > + if (IS_ERR(trans)) { >> >> > + >> >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); >> >> > + ret = PTR_ERR(trans); >> >> > + goto error; >> >> > + } >> >> > + >> >> > + ret = btrfs_force_chunk_alloc(trans, >> >> > + chunk_root, 1); >> >> >> >> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1? >> >> >> > Thanks, will change in v2. >> > >> > >> >> > + if (ret < 0) { >> >> > + >> >> mutex_unlock(&fs_info->delete_unused_bgs_mutex); >> >> > + goto error; >> >> > + } >> >> > + >> >> > + btrfs_end_transaction(trans, chunk_root); >> >> > + chunk_reserved = 1; >> >> > + } >> >> >> >> Can we do this logic only if the block group is a data one? I.e. no >> >> point allocating a data block group if our balance only touches >> >> metadata ones (due to some filter for e.g.). >> >> >> > Thanks for point out it, will change in v2. >> >> I find it somewhat awkward that we always allocate a new data block group no >> matter what. Some balance operations that used to succeed in the past may >> now fail with -ENOSPC for example... >> >> How about making this simpler and not so special for data block groups, like the >> following (compile tested only): >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 644e070..067b1eb >> 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -2774,6 +2774,8 @@ static int btrfs_relocate_chunk(struct btrfs_root >> *root, u64 chunk_offset) >> struct btrfs_root *extent_root; >> struct btrfs_trans_handle *trans; >> int ret; >> + struct btrfs_block_group_cache *bg; >> + bool remove = true; >> >> root = root->fs_info->chunk_root; >> extent_root = root->fs_info->extent_root; @@ -2803,6 +2805,23 >> @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) >> if (ret) >> return ret; >> >> + bg = btrfs_lookup_block_group(root->fs_info, chunk_offset); >> + ASSERT(bg); >> + down_read(&bg->space_info->groups_sem); >> + /* >> + * Do not remove the last block group of a kind to prevent losing >> + * raid profile information for future chunk allocations. >> + */ >> + if (bg->list.next == bg->list.prev) >> + remove = false; >> + up_read(&bg->space_info->groups_sem); >> + if (!remove) >> + btrfs_dec_block_group_ro(extent_root, bg); >> + btrfs_put_block_group(bg); >> + >> + if (!remove) >> + return 0; >> + >> trans = btrfs_start_transaction(root, 0); >> if (IS_ERR(trans)) { >> ret = PTR_ERR(trans); >> > > Thanks for your detailed review! > > Reason of creating new bg is: > Balance operation may be used to change raid-profile of the filesystem. > If we want to change filesystem form raid1 to raid5, we need: > 1: delete all raid1 bgs > 2: move all data to raid5 bg if data exist in filesystem > 3: reserve only one blank raid5 bg if no data in filesystem I see. Thanks. > > If we use similar operation as patch 1/2(not delete lattest blockgroup), > we'll have following problem: > 1: the old raid1 blockgroups are not all-deleted > 2: if no data in filesystem, the profile will not changed from raid1 to raid5. > > I understand your worry of "NO_SPACE" when create additional bg, > and I also considered it in making patch, but I choose to use this way because: > 1: balance operation had check free space before operation > (In front of __btrfs_balance) > 2: for filesystem with data, we have to create target-chunk in balance operation, > this patch only make "creating-chunk" earlier, and the created chunk will be used > to save data in balance operation, and reduce one chunk-allocate in balance operation. > 3: for a blank filesystem, it is necessary to create a blank chunk with of target raid profile. > the old code hadn't create it, and this patch created. It is right operation even if > cause no-space. Actually, create a chunk in empty filesystem rarely cause no-space, > plus we have free-space check in 1. > 4: 1~3 ensure this patch rarely cause additional no-space problem in logic. > And if it really caused additional no-space(if out of my consideration), > We can change code of 1 to avoid it. > > Thanks > Zhaolei > >> (also at https://friendpaste.com/5IeAIIzBv3oKhureKfvjwm) >> >> thanks >> >> >> > >> >> Also, can't this be ineffective when the data block group we allocate >> >> ends up with a key in the chunk_root that is lower than the key we >> >> found in the current iteration? I.e., key found in current iteration >> >> has object id B, we allocate a new block group which gets a key with >> >> object id A, where A < B and the next iteration of our loop sees key >> >> A, deletes the respective block group A if it's empty. In the end we >> >> have no data block groups, assuming that before A there are no other >> non-empty data block groups. >> >> Your example works because there's no free space before the offset >> >> where the chunk starts in the device. >> >> >> > New chunk will always use increased logic address, even if there are >> > "hole" before, so A's logic address will always >B. >> > And current code of balance also use this feature to avoid balance >> > new-created chunks(which was created by balance operation itself). >> > >> > Thanks >> > Zhaolei >> > >> >> thanks >> >> >> >> > + >> >> > ret = btrfs_relocate_chunk(chunk_root, >> >> > found_key.offset); >> >> > mutex_unlock(&fs_info->delete_unused_bgs_mutex); >> >> > -- >> >> > 1.8.5.1 >> >> > >> >> > -- >> >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" >> >> > in the body of a message to majordomo@vger.kernel.org More >> >> > majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> >> >> >> >> -- >> >> Filipe David Manana, >> >> >> >> "Reasonable men adapt themselves to the world. >> >> Unreasonable men adapt the world to themselves. >> >> That's why all progress depends on unreasonable men." >> > >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men." >
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 644e070..067b1eb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2774,6 +2774,8 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) struct btrfs_root *extent_root; struct btrfs_trans_handle *trans; int ret; + struct btrfs_block_group_cache *bg; + bool remove = true; root = root->fs_info->chunk_root;