mbox series

[0/5] file-posix: Clean up and fix zoned checks

Message ID 20230824155345.109765-1-hreitz@redhat.com (mailing list archive)
Headers show
Series file-posix: Clean up and fix zoned checks | expand

Message

Hanna Czenczek Aug. 24, 2023, 3:53 p.m. UTC
Hi,

As presented in [1] there is a bug in the zone code in raw_co_prw(),
specifically we don’t check whether there actually is zone information
before running code that assumes there is (and thus we run into a
division by zero).  This has now also been reported in [2].

I believe the solution [1] is incomplete, though, which is why I’m
sending this separate series: I don’t think checking bs->wps and/or
bs->bl.zone_size to determine whether there is zone information is
right; for example, we do not have raw_refresh_zoned_limits() clear
those values if on a refresh, zone information were to disappear.

It is also weird that we separate checking bs->wps and bs->bl.zone_size
at all; raw_refresh_zoned_limits() seems to intend to ensure that either
we have information with non-NULL bs->wps and non-zero bs->bl.zone_size,
or we don’t.

I think we should have a single flag that tells whether we have valid
information or not, and it looks to me like bs->bl.zoned != BLK_Z_NONE
is the condition that fits best.

Patch 1 ensures that raw_refresh_zoned_limits() will set bs->bl.zoned to
BLK_Z_NONE on error, so that we can actually be sure that this condition
tells us whether we have valid information or not.

Patch 2 unifies all conditional checks for zone information to use
bs->bl.zoned != BLK_Z_NONE.

Patch 3 is the I/O error path fix, which is not really different from
[1].

Patch 4 does a bit of clean-up.

Patch 5 adds a regression test.


[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg01742.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2234374


Hanna Czenczek (5):
  file-posix: Clear bs->bl.zoned on error
  file-posix: Check bs->bl.zoned for zone info
  file-posix: Fix zone update in I/O error path
  file-posix: Simplify raw_co_prw's 'out' zone code
  tests/file-io-error: New test

 block/file-posix.c                         | 42 +++++----
 tests/qemu-iotests/tests/file-io-error     | 99 ++++++++++++++++++++++
 tests/qemu-iotests/tests/file-io-error.out | 31 +++++++
 3 files changed, 150 insertions(+), 22 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/file-io-error
 create mode 100644 tests/qemu-iotests/tests/file-io-error.out

Comments

Sam Li Aug. 24, 2023, 4:53 p.m. UTC | #1
Hi Hanna,

Hanna Czenczek <hreitz@redhat.com> 于2023年8月24日周四 23:53写道:
>
> Hi,
>
> As presented in [1] there is a bug in the zone code in raw_co_prw(),
> specifically we don’t check whether there actually is zone information
> before running code that assumes there is (and thus we run into a
> division by zero).  This has now also been reported in [2].

Thanks for catching the bugs and your work.

>
> I believe the solution [1] is incomplete, though, which is why I’m
> sending this separate series: I don’t think checking bs->wps and/or
> bs->bl.zone_size to determine whether there is zone information is
> right; for example, we do not have raw_refresh_zoned_limits() clear
> those values if on a refresh, zone information were to disappear.
>
> It is also weird that we separate checking bs->wps and bs->bl.zone_size
> at all; raw_refresh_zoned_limits() seems to intend to ensure that either
> we have information with non-NULL bs->wps and non-zero bs->bl.zone_size,
> or we don’t.
>
> I think we should have a single flag that tells whether we have valid
> information or not, and it looks to me like bs->bl.zoned != BLK_Z_NONE
> is the condition that fits best.

The former way only checks zone information when it is being used to
avoid divide-by-zero or nullptr errors. Putting the error path with
non-zoned model implies a zoned device must have non-zero zone size
and allocated write pointers. Given that no other parts are changing
the zone_size to 0 and free wps, It does simplify the code path.

Thanks,
Sam
Michael Tokarev Sept. 21, 2023, 6:21 p.m. UTC | #2
24.08.2023 18:53, Hanna Czenczek wrote:
> Hi,
> 
> As presented in [1] there is a bug in the zone code in raw_co_prw(),
> specifically we don’t check whether there actually is zone information
> before running code that assumes there is (and thus we run into a
> division by zero).  This has now also been reported in [2].
> 
> I believe the solution [1] is incomplete, though, which is why I’m
> sending this separate series: I don’t think checking bs->wps and/or
> bs->bl.zone_size to determine whether there is zone information is
> right; for example, we do not have raw_refresh_zoned_limits() clear
> those values if on a refresh, zone information were to disappear.
> 
> It is also weird that we separate checking bs->wps and bs->bl.zone_size
> at all; raw_refresh_zoned_limits() seems to intend to ensure that either
> we have information with non-NULL bs->wps and non-zero bs->bl.zone_size,
> or we don’t.
> 
> I think we should have a single flag that tells whether we have valid
> information or not, and it looks to me like bs->bl.zoned != BLK_Z_NONE
> is the condition that fits best.
> 
> Patch 1 ensures that raw_refresh_zoned_limits() will set bs->bl.zoned to
> BLK_Z_NONE on error, so that we can actually be sure that this condition
> tells us whether we have valid information or not.
> 
> Patch 2 unifies all conditional checks for zone information to use
> bs->bl.zoned != BLK_Z_NONE.
> 
> Patch 3 is the I/O error path fix, which is not really different from
> [1].
> 
> Patch 4 does a bit of clean-up.
> 
> Patch 5 adds a regression test.
> 
> 
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg01742.html
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2234374

Is this stable-worthy (at least 1-3)?  From the bug description it smells
like it should be in 8.1.x, or maybe whole series.

Thanks,

/mjt
Michael Tokarev Sept. 21, 2023, 6:52 p.m. UTC | #3
21.09.2023 21:21, Michael Tokarev wrote:
..
> Is this stable-worthy (at least 1-3)?  From the bug description it smells
> like it should be in 8.1.x, or maybe whole series.

N/M, this whole patchset has been Cc'd qemu-stable already.

Thanks,

/mjt