Message ID | 0803c4de21aac935169b8289de1cb71c695452be.1719326990.git.naohiro.aota@wdc.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: zoned: fix calc_available_free_space for zoned mode
On Tue, Jun 25, 2024 at 11:58:49PM +0900, Naohiro Aota wrote:
> calc_available_free_space() returns the total size of metadata (or system)
> block groups, which can be allocated from unallocated disk space. The logic
> is wrong on zoned mode in two places.
>
> First, the calculation of data_chunk_size is wrong. We always allocate one
> zone as one chunk, and no partial allocation of a zone. So, we should use
> zone_size (= data_sinfo->chunk_size) as it is.
>
> Second, the result "avail" may not be zone aligned. Since we always
> allocate one zone as one chunk on zoned mode, returning non-zone size
> aligned bytes will result in less pressure on the async metadata reclaim
> process.
>
> This is serious for the nearly full state with a large zone size
> device. Allowing too much over-commit will result in less async reclaim
> work and end up in ENOSPC. We can align down to the zone size to avoid
> that.

I sort of wish we could get rid of the "data_sinfo->chunk_size is wrong"
thing. If we never actually use that value, perhaps we can try to really
fix it? I didn't do it in my patches either, so there must have been
something that made me hesitate that I'm now forgetting.

Obviously not the focus here, so with that said, the fix looks good.

Reviewed-by: Boris Burkov <boris@bur.io>

> Fixes: cb6cbab79055 ("btrfs: adjust overcommit logic when very close to full")
> CC: stable@vger.kernel.org # 6.9
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> As I mentioned in [1], I based this patch on for-next but before
> Boris's "dynamic block_group reclaim threshold" series, because it would
> be easier to backport this patch. I'll resend this patch based on
> for-next, if you'd prefer that.

I looked at the merge conflict and it's the fact that I pulled the data
chunk logic into a helper function. Just fixing it is easy, but like you
said, it makes backporting this fix fussier.
Since the conflict is pretty trivial, I support re-ordering things so
that this goes under the reclaim stuff, whether we pull my
calc_effective_data_chunk_size refactor into this patch and put it
under, or just edit the refactor to bring along the zoned fix. Let me
know if I can help with that!

> [1] https://lore.kernel.org/linux-btrfs/avjakfevy3gtwcgxugnzwsfkld35tfgktd5ywpz3kac6gfraxh@uic6zl3jmgbl/
> ---
>  fs/btrfs/space-info.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 0283ee9bf813..85ff44a74223 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -373,11 +373,18 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
>       * "optimal" chunk size based on the fs size. However when we actually
>       * allocate the chunk we will strip this down further, making it no more
>       * than 10% of the disk or 1G, whichever is smaller.
> +     *
> +     * On zoned mode, we need to use zone_size (=
> +     * data_sinfo->chunk_size) as it is.
>       */
>      data_sinfo = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA);
> -    data_chunk_size = min(data_sinfo->chunk_size,
> -                          mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
> -    data_chunk_size = min_t(u64, data_chunk_size, SZ_1G);
> +    if (!btrfs_is_zoned(fs_info)) {
> +        data_chunk_size = min(data_sinfo->chunk_size,
> +                              mult_perc(fs_info->fs_devices->total_rw_bytes, 10));
> +        data_chunk_size = min_t(u64, data_chunk_size, SZ_1G);
> +    } else {
> +        data_chunk_size = data_sinfo->chunk_size;
> +    }
>
>      /*
>       * Since data allocations immediately use block groups as part of the
> @@ -405,6 +412,17 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
>          avail >>= 3;
>      else
>          avail >>= 1;
> +
> +    /*
> +     * On zoned mode, we always allocate one zone as one chunk.
> +     * Returning non-zone size aligned bytes here will result in
> +     * less pressure for the async metadata reclaim process, and it
> +     * will over-commit too much leading to ENOSPC. Align down to the
> +     * zone size to avoid that.
> +     */
> +    if (btrfs_is_zoned(fs_info))
> +        avail = ALIGN_DOWN(avail, fs_info->zone_size);
> +
>      return avail;
>  }
>
> --
> 2.45.2
>
On Tue, Jun 25, 2024 at 09:17:14AM GMT, Boris Burkov wrote:
> On Tue, Jun 25, 2024 at 11:58:49PM +0900, Naohiro Aota wrote:
> > calc_available_free_space() returns the total size of metadata (or system)
> > block groups, which can be allocated from unallocated disk space. The logic
> > is wrong on zoned mode in two places.
> >
> > First, the calculation of data_chunk_size is wrong. We always allocate one
> > zone as one chunk, and no partial allocation of a zone. So, we should use
> > zone_size (= data_sinfo->chunk_size) as it is.
> >
> > Second, the result "avail" may not be zone aligned. Since we always
> > allocate one zone as one chunk on zoned mode, returning non-zone size
> > aligned bytes will result in less pressure on the async metadata reclaim
> > process.
> >
> > This is serious for the nearly full state with a large zone size
> > device. Allowing too much over-commit will result in less async reclaim
> > work and end up in ENOSPC. We can align down to the zone size to avoid
> > that.
>
> I sort of wish we could get rid of the "data_sinfo->chunk_size is wrong"
> thing. If we never actually use that value, perhaps we can try to really
> fix it? I didn't do it in my patches either, so there must have been
> something that made me hesitate that I'm now forgetting.

The "data_sinfo->chunk_size" itself is correct. It contains a preferable
default chunk size, and it is always used on zoned mode. OTOH, on regular
mode, we need to tweak the size depending on the situation (unallocated
disk space). So, I don't see anything that needs to be fixed.

> Obviously not the focus here, so with that said, the fix looks
> good.
>
> Reviewed-by: Boris Burkov <boris@bur.io>
>
> > Fixes: cb6cbab79055 ("btrfs: adjust overcommit logic when very close to full")
> > CC: stable@vger.kernel.org # 6.9
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > As I mentioned in [1], I based this patch on for-next but before
> > Boris's "dynamic block_group reclaim threshold" series, because it would
> > be easier to backport this patch. I'll resend this patch based on
> > for-next, if you'd prefer that.
>
> I looked at the merge conflict and it's the fact that I pulled the data
> chunk logic into a helper function. Just fixing it is easy, but like you
> said, it makes backporting this fix fussier.
>
> Since the conflict is pretty trivial, I support re-ordering things so
> that this goes under the reclaim stuff, whether we pull my
> calc_effective_data_chunk_size refactor into this patch and put it
> under, or just edit the refactor to bring along the zoned fix. Let me
> know if I can help with that!

Thank you. I placed my fix patch before your series and modified your
patch accordingly.