Message ID | 20211116101431.105252-1-hreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,for-6.2] file-posix: Fix alignment after reopen changing O_DIRECT | expand |
On Tue, Nov 16, 2021 at 11:14:31AM +0100, Hanna Reitz wrote: > From: Kevin Wolf <kwolf@redhat.com> > > At the end of a reopen, we already call bdrv_refresh_limits(), which > should update bs->request_alignment according to the new file > descriptor. However, raw_probe_alignment() relies on s->needs_alignment > and just uses 1 if it isn't set. We neglected to update this field, so > starting with cache=writeback and then reopening with cache=none means > that we get an incorrect bs->request_alignment == 1 and unaligned > requests fail instead of being automatically aligned. > > v2: > Don't continue operating on a qcow2 file with an effectively random > size, but create a new 1 MB file to run our O_DIRECT reopen read tests > on. Otherwise, we'll get into permission trouble because qemu-io does > not take the RESIZE permission by default, which it would need to > auto-align the file size to the sector size. > > Note that the qcow2 file is not even aligned to 512 byte sectors before > the test case (its size is 196616), but the error message doesn't appear > then because qemu internally calculates file sizes in multiples of 512 > bytes (BDRV_SECTOR_SIZE), so a misalignment can only become visible when > the physical sector size exceeds 512 bytes. > (That's "OK" because qemu only counts sizes in multiples of 512, so any > resize below that granularity is not visible as a resize to qemu, and so > does not need the RESIZE permission.) > > Another way we could solve this problem is by having qemu-io take the > RESIZE permission, but I believe it would need to retain it not only > through the reopen, but until the image size is aligned to the sector > size; that is, we would need to resize the image anyway to be able to > drop the permission and not keep it constantly. So it's simpler to just > create an aligned image from the start. I concur that your patch is simpler, and since we are in rc phase, simpler is better. > +++ b/tests/qemu-iotests/142 > @@ -350,6 +350,35 @@ info block backing-file" > > echo "$hmp_cmds" | run_qemu -drive "$files","$ids" | grep "Cache" > > +echo > +echo "--- Alignment after changing O_DIRECT ---" > +echo > + > +# Directly test the protocol level: Can unaligned requests succeed even if > +# O_DIRECT was only enabled through a reopen and vice versa? > + > +# Ensure image size is a multiple of the sector size (required for O_DIRECT) > +$QEMU_IMG create -f file "$TEST_IMG" 1M | _filter_img_create > + > +# And write some data (not strictly necessary, but it feels better to actually > +# have something to be read) > +$QEMU_IO -f file -c 'write 0 4096' "$TEST_IMG" | _filter_qemu_io Looks nice. > + > +$QEMU_IO --cache=writeback -f file $TEST_IMG <<EOF | _filter_qemu_io > +read 42 42 > +reopen -o cache.direct=on > +read 42 42 This particular region of the disk is a sub-sector, but completely contained within one sector. Is it also worth testing the case of an unaligned read that crosses a 4096-byte boundary? But we could do that on top, your patch is already an improvement and catches the particular problem you set out to solve. Reviewed-by: Eric Blake <eblake@redhat.com>
diff --git a/block/file-posix.c b/block/file-posix.c index 7a27c83060..b283093e5b 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -167,6 +167,7 @@ typedef struct BDRVRawState { int page_cache_inconsistent; /* errno from fdatasync failure */ bool has_fallocate; bool needs_alignment; + bool force_alignment; bool drop_cache; bool check_cache_dropped; struct { @@ -351,6 +352,17 @@ static bool dio_byte_aligned(int fd) return false; } +static bool raw_needs_alignment(BlockDriverState *bs) +{ + BDRVRawState *s = bs->opaque; + + if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) { + return true; + } + + return s->force_alignment; +} + /* Check if read is allowed with given memory buffer and length. * * This function is used to check O_DIRECT memory buffer and request alignment. @@ -728,9 +740,6 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, s->has_discard = true; s->has_write_zeroes = true; - if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) { - s->needs_alignment = true; - } if (fstat(s->fd, &st) < 0) { ret = -errno; @@ -784,9 +793,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, * so QEMU makes sure all IO operations on the device are aligned * to sector size, or else FreeBSD will reject them with EINVAL. */ - s->needs_alignment = true; + s->force_alignment = true; } #endif + s->needs_alignment = raw_needs_alignment(bs); #ifdef CONFIG_XFS if (platform_test_xfs_fd(s->fd)) { @@ -1251,7 +1261,9 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) BDRVRawState *s = bs->opaque; struct stat st; + s->needs_alignment = raw_needs_alignment(bs); raw_probe_alignment(bs, s->fd, errp); + bs->bl.min_mem_alignment = s->buf_align; bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size); diff --git a/tests/qemu-iotests/142 b/tests/qemu-iotests/142 index 69fd10ef51..86d65a2d1a 100755 --- a/tests/qemu-iotests/142 +++ b/tests/qemu-iotests/142 @@ -350,6 +350,35 @@ info block backing-file" echo "$hmp_cmds" | run_qemu -drive "$files","$ids" | grep "Cache" +echo +echo "--- Alignment after changing O_DIRECT ---" +echo + +# Directly test the protocol level: Can unaligned requests succeed even if +# O_DIRECT was only enabled through a reopen and vice versa? + +# Ensure image size is a multiple of the sector size (required for O_DIRECT) +$QEMU_IMG create -f file "$TEST_IMG" 1M | _filter_img_create + +# And write some data (not strictly necessary, but it feels better to actually +# have something to be read) +$QEMU_IO -f file -c 'write 0 4096' "$TEST_IMG" | _filter_qemu_io + +$QEMU_IO --cache=writeback -f file $TEST_IMG <<EOF | _filter_qemu_io +read 42 42 +reopen -o cache.direct=on +read 42 42 +reopen -o cache.direct=off +read 42 42 +EOF +$QEMU_IO --cache=none -f file $TEST_IMG <<EOF | _filter_qemu_io +read 42 42 +reopen -o cache.direct=off +read 42 42 +reopen -o cache.direct=on +read 42 42 +EOF + # success, all done echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/142.out b/tests/qemu-iotests/142.out index a92b948edd..e20cfc8eb8 100644 --- a/tests/qemu-iotests/142.out +++ b/tests/qemu-iotests/142.out @@ -747,4 +747,22 @@ cache.no-flush=on on backing-file Cache mode: writeback Cache mode: writeback, direct Cache mode: writeback, ignore flushes + +--- Alignment after changing O_DIRECT --- + +Formatting 'TEST_DIR/t.IMGFMT', fmt=file size=1048576 +wrote 4096/4096 bytes at offset 0 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 42/42 bytes at offset 42 +42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) *** done