Message ID | cover.1691424260.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
Series | btrfs: zoned: write-time activation of metadata block group |
On Tue, Aug 08, 2023 at 01:12:30AM +0900, Naohiro Aota wrote:
> In the current implementation, block groups are activated at
> reservation time to ensure that all reserved bytes can be written to
> an active metadata block group. However, this approach has proven to
> be less efficient, as it activates block groups more frequently than
> necessary, putting pressure on the active zone resource and leading to
> potential issues such as early ENOSPC or hung_task.
>
> Another drawback of the current method is that it hampers metadata
> over-commit, and necessitates additional flush operations and block
> group allocations, resulting in decreased overall performance.
>
> Actually, we don't need so many active metadata block groups because
> there is only one sequential metadata write stream.
>
> So, this series introduces a write-time activation of metadata and
> system block group. This involves reserving at least one active block
> group specifically for a metadata and system block group. When the
> write goes into a new block group, it should have allocated all the
> regions in the current active block group. So, we can wait for IOs to
> fill the space, and then switch to a new block group.
>
> Switching to the write-time activation solves the above issue and will
> lead to better performance.
>
> * Performance
>
> There is a significant difference with a workload (buffered write without
> sync) because we re-enable metadata over-commit.
>
> before the patch:  741.00 MB/sec
> after the patch:  1430.27 MB/sec (+ 93%)
>
> * Organization
>
> Patches 1-5 are preparation patches involving the meta_write_pointer check.
>
> Patches 6 and 7 are the main part of this series, implementing the
> write-time activation.
>
> Patches 8-10 address code for reserve time activation: counting fresh
> block group as zone_unusable, activating a block group on allocation,
> and disabling metadata over-commit.
>
> * Changes
>
> - v3
>   - Rework the reservation patch to fix the over-reservation problem
>     https://lore.kernel.org/all/xpb5wdmxx5wops26ihulo73oluc64dt4zpxqc7cirp2wvxl3qy@hv7lsvma5hxf/
>   - Rename btrfs_eb_write_context's block_group to zoned_bg.

Added to misc-next, thanks. We need it in order to enable zoned tests in
the CI so this goes in now; any fixups or more review tags will be done
in the commits.

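As an illustration of the cover letter's argument, here is a toy Python model. It is not the btrfs code: the function names, the node-granular accounting, and the numbers are assumptions made for this sketch. It contrasts how many block groups must be active when every reserved node needs an active block group up front with the case where a single sequential write stream activates a block group only once the previous one has been filled.

```python
import math


def active_bgs_reservation_time(reserved_nodes: int, nodes_per_bg: int) -> int:
    """Reservation-time activation (toy model).

    Every reserved node must be backed by an already active block group,
    so the number of active block groups follows the reservation size,
    not the amount of metadata actually written.
    """
    return math.ceil(reserved_nodes / nodes_per_bg)


def active_bgs_write_time(written_nodes: int, nodes_per_bg: int) -> tuple[int, int]:
    """Write-time activation (toy model).

    Metadata is one sequential write stream, so a new block group is
    activated only when a write first lands in it. By then the previous
    block group is completely written and can be finished.

    Returns (total_activations, peak_simultaneously_active).
    """
    activations = 0
    active = 0  # block groups active right now
    peak = 0
    free = 0    # unwritten nodes left in the current block group
    for _ in range(written_nodes):
        if free == 0:
            if active:
                active -= 1  # previous block group is full: finish it
            active += 1      # activate the next block group
            activations += 1
            free = nodes_per_bg
            peak = max(peak, active)
        free -= 1
    return activations, peak
```

With an over-committed reservation of 1000 nodes and 256 nodes per block group, the reservation-time model needs four active block groups, while writing 300 nodes in the write-time model never has more than one active at a time.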
On Tue, Aug 08, 2023 at 01:12:30AM +0900, Naohiro Aota wrote:
> In the current implementation, block groups are activated at
> reservation time to ensure that all reserved bytes can be written to
> an active metadata block group. However, this approach has proven to
> be less efficient, as it activates block groups more frequently than
> necessary, putting pressure on the active zone resource and leading to
> potential issues such as early ENOSPC or hung_task.
>
> Another drawback of the current method is that it hampers metadata
> over-commit, and necessitates additional flush operations and block
> group allocations, resulting in decreased overall performance.
>
> Actually, we don't need so many active metadata block groups because
> there is only one sequential metadata write stream.
>
> So, this series introduces a write-time activation of metadata and
> system block group. This involves reserving at least one active block
> group specifically for a metadata and system block group. When the
> write goes into a new block group, it should have allocated all the
> regions in the current active block group. So, we can wait for IOs to
> fill the space, and then switch to a new block group.
>
> Switching to the write-time activation solves the above issue and will
> lead to better performance.
>
> * Performance
>
> There is a significant difference with a workload (buffered write without
> sync) because we re-enable metadata over-commit.
>
> before the patch:  741.00 MB/sec
> after the patch:  1430.27 MB/sec (+ 93%)
>
> * Organization
>
> Patches 1-5 are preparation patches involving the meta_write_pointer check.
>
> Patches 6 and 7 are the main part of this series, implementing the
> write-time activation.
>
> Patches 8-10 address code for reserve time activation: counting fresh
> block group as zone_unusable, activating a block group on allocation,
> and disabling metadata over-commit.
>

Hey Naohiro,

This enabled me to turn on the zoned vm for the GitHub CI, we're only failing 7
tests now, so great job!

However all the !zoned vms panic immediately

https://paste.centos.org/view/54d11384

Can you fix that up? Also you can submit a PR against the 'ci' branch of our
linux repo in the btrfs GitHub project to run through the CI yourself to make
sure you didn't mess anything up. Thanks,

Josef

On Tue, Aug 08, 2023 at 01:12:30AM +0900, Naohiro Aota wrote:
> In the current implementation, block groups are activated at
> reservation time to ensure that all reserved bytes can be written to
> an active metadata block group. However, this approach has proven to
> be less efficient, as it activates block groups more frequently than
> necessary, putting pressure on the active zone resource and leading to
> potential issues such as early ENOSPC or hung_task.
>
> Another drawback of the current method is that it hampers metadata
> over-commit, and necessitates additional flush operations and block
> group allocations, resulting in decreased overall performance.
>
> Actually, we don't need so many active metadata block groups because
> there is only one sequential metadata write stream.
>
> So, this series introduces a write-time activation of metadata and
> system block group. This involves reserving at least one active block
> group specifically for a metadata and system block group. When the
> write goes into a new block group, it should have allocated all the
> regions in the current active block group. So, we can wait for IOs to
> fill the space, and then switch to a new block group.
>
> Switching to the write-time activation solves the above issue and will
> lead to better performance.
>
> * Performance
>
> There is a significant difference with a workload (buffered write without
> sync) because we re-enable metadata over-commit.
>
> before the patch:  741.00 MB/sec
> after the patch:  1430.27 MB/sec (+ 93%)
>
> * Organization
>
> Patches 1-5 are preparation patches involving the meta_write_pointer check.
>
> Patches 6 and 7 are the main part of this series, implementing the
> write-time activation.
>
> Patches 8-10 address code for reserve time activation: counting fresh
> block group as zone_unusable, activating a block group on allocation,
> and disabling metadata over-commit.
>
> * Changes

Additionally you had these failures in the CI setup

btrfs/220 btrfs/237 btrfs/239 btrfs/273 btrfs/295 generic/551 generic/574

I've excluded them so we can catch regressions, but everything except btrfs/220
seems like legitimate failures. btrfs/220 needs to be updated since zoned
doesn't do discard=async, but you can do that whenever, I'm less worried about
that. The rest should be investigated at some point, though not as a
prerequisite for merging this series. Thanks,

Josef

On Thu, Aug 10, 2023 at 08:59:37AM -0400, Josef Bacik wrote:
> On Tue, Aug 08, 2023 at 01:12:30AM +0900, Naohiro Aota wrote:
> > In the current implementation, block groups are activated at
> > reservation time to ensure that all reserved bytes can be written to
> > an active metadata block group. However, this approach has proven to
> > be less efficient, as it activates block groups more frequently than
> > necessary, putting pressure on the active zone resource and leading to
> > potential issues such as early ENOSPC or hung_task.
> >
> > Another drawback of the current method is that it hampers metadata
> > over-commit, and necessitates additional flush operations and block
> > group allocations, resulting in decreased overall performance.
> >
> > Actually, we don't need so many active metadata block groups because
> > there is only one sequential metadata write stream.
> >
> > So, this series introduces a write-time activation of metadata and
> > system block group. This involves reserving at least one active block
> > group specifically for a metadata and system block group. When the
> > write goes into a new block group, it should have allocated all the
> > regions in the current active block group. So, we can wait for IOs to
> > fill the space, and then switch to a new block group.
> >
> > Switching to the write-time activation solves the above issue and will
> > lead to better performance.
> >
> > * Performance
> >
> > There is a significant difference with a workload (buffered write without
> > sync) because we re-enable metadata over-commit.
> >
> > before the patch:  741.00 MB/sec
> > after the patch:  1430.27 MB/sec (+ 93%)
> >
> > * Organization
> >
> > Patches 1-5 are preparation patches involving the meta_write_pointer check.
> >
> > Patches 6 and 7 are the main part of this series, implementing the
> > write-time activation.
> >
> > Patches 8-10 address code for reserve time activation: counting fresh
> > block group as zone_unusable, activating a block group on allocation,
> > and disabling metadata over-commit.
> >
>
> Hey Naohiro,
>
> This enabled me to turn on the zoned vm for the GitHub CI, we're only failing 7
> tests now, so great job!

Thanks! The GitHub CI setup is really interesting. I tried to figure out how
it sets up the zoned devices. Are they QEMU-emulated ZNS devices?

> However all the !zoned vms panic immediately
>
> https://paste.centos.org/view/54d11384
>
> Can you fix that up? Also you can submit a PR against the 'ci' branch of our
> linux repo in the btrfs GitHub project to run through the CI yourself to make
> sure you didn't mess anything up. Thanks,

I sent a candidate fix as a PR. I hope it works well.

>
> Josef

On Thu, Aug 10, 2023 at 09:34:58AM -0400, Josef Bacik wrote:
> On Tue, Aug 08, 2023 at 01:12:30AM +0900, Naohiro Aota wrote:
> > In the current implementation, block groups are activated at
> > reservation time to ensure that all reserved bytes can be written to
> > an active metadata block group. However, this approach has proven to
> > be less efficient, as it activates block groups more frequently than
> > necessary, putting pressure on the active zone resource and leading to
> > potential issues such as early ENOSPC or hung_task.
> >
> > Another drawback of the current method is that it hampers metadata
> > over-commit, and necessitates additional flush operations and block
> > group allocations, resulting in decreased overall performance.
> >
> > Actually, we don't need so many active metadata block groups because
> > there is only one sequential metadata write stream.
> >
> > So, this series introduces a write-time activation of metadata and
> > system block group. This involves reserving at least one active block
> > group specifically for a metadata and system block group. When the
> > write goes into a new block group, it should have allocated all the
> > regions in the current active block group. So, we can wait for IOs to
> > fill the space, and then switch to a new block group.
> >
> > Switching to the write-time activation solves the above issue and will
> > lead to better performance.
> >
> > * Performance
> >
> > There is a significant difference with a workload (buffered write without
> > sync) because we re-enable metadata over-commit.
> >
> > before the patch:  741.00 MB/sec
> > after the patch:  1430.27 MB/sec (+ 93%)
> >
> > * Organization
> >
> > Patches 1-5 are preparation patches involving the meta_write_pointer check.
> >
> > Patches 6 and 7 are the main part of this series, implementing the
> > write-time activation.
> >
> > Patches 8-10 address code for reserve time activation: counting fresh
> > block group as zone_unusable, activating a block group on allocation,
> > and disabling metadata over-commit.
> >
> > * Changes
>
> Additionally you had these failures in the CI setup
>
> btrfs/220 btrfs/237 btrfs/239 btrfs/273 btrfs/295 generic/551 generic/574
>
> I've excluded them so we can catch regressions, but everything except btrfs/220
> seems like legitimate failures. btrfs/220 needs to be updated since zoned
> doesn't do discard=async, but you can do that whenever, I'm less worried about
> that. The rest should be investigated at some point, though not as a
> prerequisite for merging this series. Thanks,

I checked the CI log. Yes, btrfs/220 is due to discard=async.

* known to fail
  - btrfs/237: we need to tweak the test for ZNS (zone capacity != zone size)
  - btrfs/239: somehow, tree-log is behaving differently on zoned mode... I
    have no idea why it fails. But, I think it is still a valid status...

* need to modify test?
  - btrfs/295: overwriting a zoned device won't work. So, this test should
    be skipped.
  - generic/574: not sure fsverity works with zoned mode. Need to check.

So, btrfs/273 and generic/551 are suspicious. btrfs/273 prints some WARN in
dmesg and generic/551 killed an AIO_TEST program... Are there details
available?

>
> Josef

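The triage above can be captured as a small table, for example to keep the excluded tests and the reasons for excluding them next to each other. This is only a sketch: the dictionary layout and helper names are made up here, the CI's real exclusion mechanism is not shown in this thread, and the reasons are paraphrased from the two messages above.

```python
# Status of each excluded test, paraphrased from the discussion above.
# The "category: detail" string format is an assumption of this sketch.
TRIAGE = {
    "btrfs/220": "update test: zoned does not do discard=async",
    "btrfs/237": "known to fail: test needs a tweak for ZNS (zone capacity != zone size)",
    "btrfs/239": "known to fail: tree-log behaves differently on zoned mode",
    "btrfs/273": "suspicious: prints WARN in dmesg",
    "btrfs/295": "skip test: overwriting a zoned device won't work",
    "generic/551": "suspicious: AIO_TEST program was killed",
    "generic/574": "needs checking: unclear whether fsverity works with zoned mode",
}


def by_category(triage: dict[str, str], category: str) -> list[str]:
    """Return the sorted test names whose reason starts with `category`."""
    return sorted(name for name, reason in triage.items()
                  if reason.split(":", 1)[0] == category)


def exclude_lines(triage: dict[str, str]) -> list[str]:
    """Render one 'test  # reason' line per excluded test, sorted by name."""
    return [f"{name}  # {reason}" for name, reason in sorted(triage.items())]
```

Calling `by_category(TRIAGE, "suspicious")` lists the two tests that still need a closer look, and `exclude_lines(TRIAGE)` gives an annotated list that a human can review alongside the CI configuration.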
On Thu, Aug 10, 2023 at 02:34:11PM +0000, Naohiro Aota wrote:
> > seems like legitimate failures. btrfs/220 needs to be updated since zoned
> > doesn't do discard=async, but you can do that whenever, I'm less worried about
> > that. The rest should be investigated at some point, though not as a
> > prerequisite for merging this series. Thanks,
>
> I checked the CI log. Yes, btrfs/220 is due to discard=async.
>
> * known to fail
>   - btrfs/237: we need to tweak the test for ZNS (zone capacity != zone size)
>   - btrfs/239: somehow, tree-log is behaving differently on zoned mode... I
>     have no idea why it fails. But, I think it is still a valid status...
>
> * need to modify test?
>   - generic/574: not sure fsverity works with zoned mode. Need to check.

The compatibility matrix at
https://btrfs.readthedocs.io/en/latest/Status.html#zoned-mode
does not mention fsverity, so somebody has to test it and add the entry.