Message ID | 20190312091944.2044-1-zlang@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | generic: unaligned direct AIO write test | expand |
On Tue, Mar 12, 2019 at 05:19:44PM +0800, Zorro Lang wrote: > A simply reproducer from Frank Sorenson: > > ftruncate(fd, 65012224) > io_prep_pwrite(iocbs[0], fd, buf[0], 1048576, 63963648); > io_prep_pwrite(iocbs[1], fd, buf[1], 1048576, 65012224); > > io_prep_pwrite(iocbs[2], fd, buf[2], 1048576, 66060800); > io_prep_pwrite(iocbs[3], fd, buf[3], 1048576, 67109376); > > io_submit(io_ctx, 1, &iocbs[0]); > io_submit(io_ctx, 1, &iocbs[1]); > io_submit(io_ctx, 1, &iocbs[2]); > io_submit(io_ctx, 1, &iocbs[3]); > > io_getevents(io_ctx, 4, 4, events, NULL) > > help to find an ext4 corruption: > **************** **************** **************** > * page 1 * * page 2 * * page 3 * > **************** **************** **************** > existing 0000000000000000 0000000000000000 0000000000000000 > write 1 AAAAAAAAAAAAAA AA > write 2 BBBBBBBBBBBBBB BB > > result 00AAAAAAAAAAAAAA 00BBBBBBBBBBBBBB BB00000000000000 > desired 00AAAAAAAAAAAAAA AABBBBBBBBBBBBBB BB00000000000000 > > This issue remind us we might miss unaligned AIO test for long time. > We thought fsx cover this part, but looks like it's not. So this case > trys to cover unaligned direct AIO write test on file with different > initial truncate i_size. > > Signed-off-by: Zorro Lang <zlang@redhat.com> > --- > > Hi, > > The aio-dio-write-verify.c nearly copy from aio-dio-eof-race.c, then change > a few logic. I've changed aio-dio-eof-race.c several times, to fit different > new AIO test cases. But this time I need to truncate the file, then it won't > write at EOF. This against the motive of aio-dio-eof-race.c. So I think it's > time to shift a new program to do more AIO write test. > > It's only my personal opinion :) > > Thanks, > Zorro > > src/aio-dio-regress/aio-dio-write-verify.c | 223 +++++++++++++++++++++ > tests/generic/999 | 71 +++++++ > tests/generic/999.out | 2 + > tests/generic/group | 1 + > 4 files changed, 297 insertions(+) > create mode 100644 src/aio-dio-regress/aio-dio-write-verify.c > create mode 100755 tests/generic/999 > create mode 100644 tests/generic/999.out > > diff --git a/src/aio-dio-regress/aio-dio-write-verify.c b/src/aio-dio-regress/aio-dio-write-verify.c > new file mode 100644 > index 00000000..402ddcdb > --- /dev/null > +++ b/src/aio-dio-regress/aio-dio-write-verify.c > @@ -0,0 +1,223 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +/* > + * Copyright (C) 2019 Red Hat, Inc. All Rights reserved. > + * > + * AIO writes from a start offset on a truncated file, verify there's not > + * data corruption. > + */ > +#include <sys/stat.h> > +#include <sys/types.h> > +#include <errno.h> > +#include <fcntl.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <unistd.h> > +#include <ctype.h> > + > +#include <libaio.h> > + > +unsigned long buf_size = 0; > +unsigned long size_KB = 0; > +#define IO_PATTERN 0x5a > + > +void > +usage(char *progname) > +{ > + fprintf(stderr, "usage: %s [-s datasize] [-b bufsize] [-o startoff] [-t truncsize] filename\n" > + "\t-s datasize: specify the minimum data size(KB), doesn't count holes\n" > + "\t-b bufsize: buffer size\n" > + "\t-o startoff: start offset to write data, 0 by default\n" > + "\t-t truncsize: truncate the file to a special size at first 0 by default\n", > + progname); > + exit(1); > +} > + > +void > +dump_buffer( > + void *buf, > + off64_t offset, > + ssize_t len) > +{ > + int i, j; > + char *p; > + int new; > + > + for (i = 0, p = (char *)buf; i < len; i += 16) { > + char *s = p; > + > + if (i && !memcmp(p, p - 16, 16)) { > + new = 0; > + } else { > + if (i) > + printf("*\n"); > + new = 1; > + } > + > + if (!new) { > + p += 16; > + continue; > + } > + > + printf("%08llx ", (unsigned long long)offset + i); > + for (j = 0; j < 16 && i + j < len; j++, p++) > + printf("%02x ", *p); > + printf(" "); > + for (j = 0; j < 16 && i + j < len; j++, s++) { > + if (isalnum((int)*s)) > + printf("%c", *s); > + else > + printf("."); > + } > + printf("\n"); > + > + } > + printf("%08llx\n", (unsigned long long)offset + i); > +} > + > +int main(int argc, char *argv[]) > +{ > + struct io_context *ctx = NULL; > + struct io_event evs[2]; > + struct iocb iocb1, iocb2; > + struct iocb *iocbs[] = { &iocb1, &iocb2 }; > + void *buf; > + int fd, err = 0; > + off_t bytes; > + int c; > + char *cmp_buf = NULL; > + char *filename = NULL; > + /* start offset to write */ > + long long startoff = 0; > + /* truncate size */ > + off_t tsize = 0; > + > + while ((c = getopt(argc, argv, "s:b:o:t:")) != -1) { > + char *endp; > + > + switch (c) { > + case 's': /* XXX MB size will be extended */ > + size_KB = strtol(optarg, &endp, 0); > + break; > + case 'b': /* buffer size */ > + buf_size = strtol(optarg, &endp, 0); > + break; > + case 'o': /* start offset */ > + startoff = strtoll(optarg, &endp, 0); > + break; > + case 't': /* initial truncate size */ > + tsize = strtoul(optarg, &endp, 0); > + break; > + default: > + usage(argv[0]); > + } > + } > + > + if (size_KB == 0) /* default size is 64KB */ > + size_KB = 64; > + if (buf_size < 2048) /* default minimum buffer size is 2048 bytes */ > + buf_size = 2048; > + > + if (optind == argc - 1) > + filename = argv[optind]; > + else > + usage(argv[0]); > + > + fd = open(filename, O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600); > + if (fd == -1) { > + perror("open"); > + return 1; > + } > + > + /* truncate the file to a special size at first */ > + if (tsize != 0) { > + ftruncate(fd, tsize); > + if (fd == -1) { Huh? ret = ftruncate(...); if (ret) ? > + perror("ftruncate"); > + return 1; > + } > + } > + > + err = posix_memalign(&buf, getpagesize(), buf_size); > + if (err) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "posix_memalign"); > + return 1; > + } > + cmp_buf = malloc(buf_size); > + memset(cmp_buf, IO_PATTERN, buf_size); > + > + err = io_setup(5, &ctx); > + if (err) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_setup"); > + return 1; > + } > + > + bytes = 0; > + /* Keep extending until size_KB */ > + while (bytes < size_KB * 1024) { > + ssize_t sret; > + int i; > + > + memset(buf, IO_PATTERN, buf_size); > + > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > + startoff + bytes + 0*buf_size/2); > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > + startoff + bytes + 1*buf_size/2); > + > + err = io_submit(ctx, 2, iocbs); > + if (err != 2) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_submit"); > + return 1; > + } > + > + err = io_getevents(ctx, 2, 2, evs, NULL); > + if (err != 2) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_getevents"); > + return 1; > + } > + > + for (i = 0; i < err; i++) { > + /* > + * res is unsigned for some reason, so this is the best > + * way to detect that it contains a negative errno. > + */ > + if (evs[i].res > buf_size / 2) { > + fprintf(stderr, "pwrite: %s\n", > + strerror(-evs[i].res)); > + return 1; > + } > + } > + fsync(fd); Ignoring the return value here... --D > + > + /* > + * And then read it back, then compare with original content. > + */ > + sret = pread(fd, buf, buf_size, startoff + bytes); > + if (sret == -1) { > + perror("pread"); > + return 1; > + } else if (sret != buf_size) { > + fprintf(stderr, "short read %zd was less than %zu\n", > + sret, buf_size); > + return 1; > + } > + if (memcmp(buf, cmp_buf, buf_size)) { > + printf("Find corruption\n"); > + dump_buffer(buf, 0, buf_size); > + return 1; > + } > + > + /* Calculate how many bytes we've written after pread */ > + bytes += buf_size; > + } > + > + return 0; > +} > diff --git a/tests/generic/999 b/tests/generic/999 > new file mode 100755 > index 00000000..6fcc593a > --- /dev/null > +++ b/tests/generic/999 > @@ -0,0 +1,71 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2019 Red Hat, Inc. All Rights Reserved. > +# > +# FS QA Test No. 999 > +# > +# Non-page-aligned direct AIO write test. AIO write from unalinged offset > +# on a file with different initial truncate i_size. > +# > +# Uncover "ext4: Fix data corruption caused by unaligned direct AIO" > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# real QA test starts here > +_supported_fs generic > +_supported_os Linux > +_require_test > +_require_aiodio aio-dio-write-verify > +_require_test_program "feature" > + > +localfile=$TEST_DIR/tst-aio-dio-testfile > +diosize=`_min_dio_alignment $TEST_DEV` > +pagesize=`src/feature -s` > +bufsize=$((pagesize * 2)) > +filesize=$((bufsize * 3 / 1024)) > + > +# Need smaller logical block size to do non-page-aligned test > +if [ $diosize -ge $pagesize ];then > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > +fi > + > +rm -rf $localfile 2>/dev/null > +# page-aligned aiodio write verification at first > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > + > +# non-page-aligned aiodio write verification > +i=0 > +while [ $((diosize * i)) -lt $bufsize ];do > + truncsize=$((diosize * i++)) > + # non-page-aligned AIO write on different i_size file > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile > + if [ $? -ne 0 ];then > + echo "[FAIL] $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile" > + fi > +done > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/generic/999.out b/tests/generic/999.out > new file mode 100644 > index 00000000..3b276ca8 > --- /dev/null > +++ b/tests/generic/999.out > @@ -0,0 +1,2 @@ > +QA output created by 999 > +Silence is golden > diff --git a/tests/generic/group b/tests/generic/group > index 15227b67..b1f93d99 100644 > --- a/tests/generic/group > +++ b/tests/generic/group > @@ -534,3 +534,4 @@ > 529 auto quick attr > 530 auto quick unlink > 531 auto quick unlink > +999 auto quick aio > -- > 2.17.2 >
On Tue, Mar 12, 2019 at 06:12:08PM -0700, Darrick J. Wong wrote: > On Tue, Mar 12, 2019 at 05:19:44PM +0800, Zorro Lang wrote: > > A simply reproducer from Frank Sorenson: > > > > ftruncate(fd, 65012224) > > io_prep_pwrite(iocbs[0], fd, buf[0], 1048576, 63963648); > > io_prep_pwrite(iocbs[1], fd, buf[1], 1048576, 65012224); > > > > io_prep_pwrite(iocbs[2], fd, buf[2], 1048576, 66060800); > > io_prep_pwrite(iocbs[3], fd, buf[3], 1048576, 67109376); > > > > io_submit(io_ctx, 1, &iocbs[0]); > > io_submit(io_ctx, 1, &iocbs[1]); > > io_submit(io_ctx, 1, &iocbs[2]); > > io_submit(io_ctx, 1, &iocbs[3]); > > > > io_getevents(io_ctx, 4, 4, events, NULL) > > > > help to find an ext4 corruption: > > **************** **************** **************** > > * page 1 * * page 2 * * page 3 * > > **************** **************** **************** > > existing 0000000000000000 0000000000000000 0000000000000000 > > write 1 AAAAAAAAAAAAAA AA > > write 2 BBBBBBBBBBBBBB BB > > > > result 00AAAAAAAAAAAAAA 00BBBBBBBBBBBBBB BB00000000000000 > > desired 00AAAAAAAAAAAAAA AABBBBBBBBBBBBBB BB00000000000000 > > > > This issue remind us we might miss unaligned AIO test for long time. > > We thought fsx cover this part, but looks like it's not. So this case > > trys to cover unaligned direct AIO write test on file with different > > initial truncate i_size. > > > > Signed-off-by: Zorro Lang <zlang@redhat.com> > > --- > > > > Hi, > > > > The aio-dio-write-verify.c nearly copy from aio-dio-eof-race.c, then change > > a few logic. I've changed aio-dio-eof-race.c several times, to fit different > > new AIO test cases. But this time I need to truncate the file, then it won't > > write at EOF. This against the motive of aio-dio-eof-race.c. So I think it's > > time to shift a new program to do more AIO write test. > > > > It's only my personal opinion :) > > > > Thanks, > > Zorro > > > > src/aio-dio-regress/aio-dio-write-verify.c | 223 +++++++++++++++++++++ > > tests/generic/999 | 71 +++++++ > > tests/generic/999.out | 2 + > > tests/generic/group | 1 + > > 4 files changed, 297 insertions(+) > > create mode 100644 src/aio-dio-regress/aio-dio-write-verify.c > > create mode 100755 tests/generic/999 > > create mode 100644 tests/generic/999.out > > > > diff --git a/src/aio-dio-regress/aio-dio-write-verify.c b/src/aio-dio-regress/aio-dio-write-verify.c > > new file mode 100644 > > index 00000000..402ddcdb > > --- /dev/null > > +++ b/src/aio-dio-regress/aio-dio-write-verify.c > > @@ -0,0 +1,223 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +/* > > + * Copyright (C) 2019 Red Hat, Inc. All Rights reserved. > > + * > > + * AIO writes from a start offset on a truncated file, verify there's not > > + * data corruption. > > + */ > > +#include <sys/stat.h> > > +#include <sys/types.h> > > +#include <errno.h> [snip..] > > + /* truncate the file to a special size at first */ > > + if (tsize != 0) { > > + ftruncate(fd, tsize); > > + if (fd == -1) { > > Huh? ret = ftruncate(...); if (ret) ? Wow, my bad, that's totally a mistake. > > > + perror("ftruncate"); > > + return 1; > > + } > > + } > > + > > + err = posix_memalign(&buf, getpagesize(), buf_size); > > + if (err) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "posix_memalign"); > > + return 1; > > + } > > + cmp_buf = malloc(buf_size); > > + memset(cmp_buf, IO_PATTERN, buf_size); > > + > > + err = io_setup(5, &ctx); > > + if (err) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_setup"); > > + return 1; > > + } > > + > > + bytes = 0; > > + /* Keep extending until size_KB */ > > + while (bytes < size_KB * 1024) { > > + ssize_t sret; > > + int i; > > + > > + memset(buf, IO_PATTERN, buf_size); > > + > > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > > + startoff + bytes + 0*buf_size/2); > > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > > + startoff + bytes + 1*buf_size/2); > > + > > + err = io_submit(ctx, 2, iocbs); > > + if (err != 2) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_submit"); > > + return 1; > > + } > > + > > + err = io_getevents(ctx, 2, 2, evs, NULL); > > + if (err != 2) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_getevents"); > > + return 1; > > + } > > + > > + for (i = 0; i < err; i++) { > > + /* > > + * res is unsigned for some reason, so this is the best > > + * way to detect that it contains a negative errno. > > + */ > > + if (evs[i].res > buf_size / 2) { > > + fprintf(stderr, "pwrite: %s\n", > > + strerror(-evs[i].res)); > > + return 1; > > + } > > + } > > + fsync(fd); > > Ignoring the return value here... This operation is not a necessary step for the reproducer, so just try to do a sync, no matter the result. Hmm... should I remove it? > > --D > > > + > > + /* > > + * And then read it back, then compare with original content. > > + */ > > + sret = pread(fd, buf, buf_size, startoff + bytes); > > + if (sret == -1) { > > + perror("pread"); > > + return 1; > > + } else if (sret != buf_size) { > > + fprintf(stderr, "short read %zd was less than %zu\n", > > + sret, buf_size); > > + return 1; > > + } > > + if (memcmp(buf, cmp_buf, buf_size)) { > > + printf("Find corruption\n"); > > + dump_buffer(buf, 0, buf_size); > > + return 1; > > + } > > + > > + /* Calculate how many bytes we've written after pread */ > > + bytes += buf_size; > > + } > > + > > + return 0; > > +} > > diff --git a/tests/generic/999 b/tests/generic/999 > > new file mode 100755 > > index 00000000..6fcc593a > > --- /dev/null > > +++ b/tests/generic/999 > > @@ -0,0 +1,71 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2019 Red Hat, Inc. All Rights Reserved. > > +# > > +# FS QA Test No. 999 > > +# > > +# Non-page-aligned direct AIO write test. AIO write from unalinged offset > > +# on a file with different initial truncate i_size. > > +# > > +# Uncover "ext4: Fix data corruption caused by unaligned direct AIO" > > +# > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > + > > +# remove previous $seqres.full before test > > +rm -f $seqres.full > > + > > +# real QA test starts here > > +_supported_fs generic > > +_supported_os Linux > > +_require_test > > +_require_aiodio aio-dio-write-verify > > +_require_test_program "feature" > > + > > +localfile=$TEST_DIR/tst-aio-dio-testfile > > +diosize=`_min_dio_alignment $TEST_DEV` > > +pagesize=`src/feature -s` > > +bufsize=$((pagesize * 2)) > > +filesize=$((bufsize * 3 / 1024)) > > + > > +# Need smaller logical block size to do non-page-aligned test > > +if [ $diosize -ge $pagesize ];then > > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > > +fi > > + > > +rm -rf $localfile 2>/dev/null > > +# page-aligned aiodio write verification at first > > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > > + > > +# non-page-aligned aiodio write verification > > +i=0 > > +while [ $((diosize * i)) -lt $bufsize ];do > > + truncsize=$((diosize * i++)) > > + # non-page-aligned AIO write on different i_size file > > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile > > + if [ $? -ne 0 ];then > > + echo "[FAIL] $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile" > > + fi > > +done > > + > > +echo "Silence is golden" > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/generic/999.out b/tests/generic/999.out > > new file mode 100644 > > index 00000000..3b276ca8 > > --- /dev/null > > +++ b/tests/generic/999.out > > @@ -0,0 +1,2 @@ > > +QA output created by 999 > > +Silence is golden > > diff --git a/tests/generic/group b/tests/generic/group > > index 15227b67..b1f93d99 100644 > > --- a/tests/generic/group > > +++ b/tests/generic/group > > @@ -534,3 +534,4 @@ > > 529 auto quick attr > > 530 auto quick unlink > > 531 auto quick unlink > > +999 auto quick aio > > -- > > 2.17.2 > >
Hi Zorr, thanks for putting this together. On Tue, Mar 12, 2019 at 05:19:44PM +0800, Zorro Lang wrote: > A simply reproducer from Frank Sorenson: > > ftruncate(fd, 65012224) > io_prep_pwrite(iocbs[0], fd, buf[0], 1048576, 63963648); > io_prep_pwrite(iocbs[1], fd, buf[1], 1048576, 65012224); > > io_prep_pwrite(iocbs[2], fd, buf[2], 1048576, 66060800); > io_prep_pwrite(iocbs[3], fd, buf[3], 1048576, 67109376); These two are not actually necessary. > > io_submit(io_ctx, 1, &iocbs[0]); > io_submit(io_ctx, 1, &iocbs[1]); > io_submit(io_ctx, 1, &iocbs[2]); > io_submit(io_ctx, 1, &iocbs[3]); > > io_getevents(io_ctx, 4, 4, events, NULL) > > help to find an ext4 corruption: > **************** **************** **************** > * page 1 * * page 2 * * page 3 * > **************** **************** **************** > existing 0000000000000000 0000000000000000 0000000000000000 > write 1 AAAAAAAAAAAAAA AA > write 2 BBBBBBBBBBBBBB BB > > result 00AAAAAAAAAAAAAA 00BBBBBBBBBBBBBB BB00000000000000 > desired 00AAAAAAAAAAAAAA AABBBBBBBBBBBBBB BB00000000000000 > > This issue remind us we might miss unaligned AIO test for long time. > We thought fsx cover this part, but looks like it's not. So this case > trys to cover unaligned direct AIO write test on file with different > initial truncate i_size. > > Signed-off-by: Zorro Lang <zlang@redhat.com> > --- > > Hi, > > The aio-dio-write-verify.c nearly copy from aio-dio-eof-race.c, then change > a few logic. I've changed aio-dio-eof-race.c several times, to fit different > new AIO test cases. But this time I need to truncate the file, then it won't > write at EOF. This against the motive of aio-dio-eof-race.c. So I think it's > time to shift a new program to do more AIO write test. > > It's only my personal opinion :) > > Thanks, > Zorro > > src/aio-dio-regress/aio-dio-write-verify.c | 223 +++++++++++++++++++++ > tests/generic/999 | 71 +++++++ > tests/generic/999.out | 2 + > tests/generic/group | 1 + > 4 files changed, 297 insertions(+) > create mode 100644 src/aio-dio-regress/aio-dio-write-verify.c > create mode 100755 tests/generic/999 > create mode 100644 tests/generic/999.out > > diff --git a/src/aio-dio-regress/aio-dio-write-verify.c b/src/aio-dio-regress/aio-dio-write-verify.c > new file mode 100644 > index 00000000..402ddcdb > --- /dev/null > +++ b/src/aio-dio-regress/aio-dio-write-verify.c > @@ -0,0 +1,223 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +/* > + * Copyright (C) 2019 Red Hat, Inc. All Rights reserved. > + * > + * AIO writes from a start offset on a truncated file, verify there's not > + * data corruption. Can you put the description of the problem here as well for the future generations to understand ? :) > + */ > +#include <sys/stat.h> > +#include <sys/types.h> > +#include <errno.h> > +#include <fcntl.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <unistd.h> > +#include <ctype.h> > + > +#include <libaio.h> > + > +unsigned long buf_size = 0; > +unsigned long size_KB = 0; > +#define IO_PATTERN 0x5a > + > +void > +usage(char *progname) > +{ > + fprintf(stderr, "usage: %s [-s datasize] [-b bufsize] [-o startoff] [-t truncsize] filename\n" > + "\t-s datasize: specify the minimum data size(KB), doesn't count holes\n" > + "\t-b bufsize: buffer size\n" > + "\t-o startoff: start offset to write data, 0 by default\n" > + "\t-t truncsize: truncate the file to a special size at first 0 by default\n", > + progname); > + exit(1); > +} > + > +void > +dump_buffer( > + void *buf, > + off64_t offset, > + ssize_t len) > +{ > + int i, j; > + char *p; > + int new; > + > + for (i = 0, p = (char *)buf; i < len; i += 16) { > + char *s = p; > + > + if (i && !memcmp(p, p - 16, 16)) { > + new = 0; > + } else { > + if (i) > + printf("*\n"); > + new = 1; > + } > + > + if (!new) { > + p += 16; > + continue; > + } > + > + printf("%08llx ", (unsigned long long)offset + i); > + for (j = 0; j < 16 && i + j < len; j++, p++) > + printf("%02x ", *p); > + printf(" "); > + for (j = 0; j < 16 && i + j < len; j++, s++) { > + if (isalnum((int)*s)) > + printf("%c", *s); > + else > + printf("."); > + } > + printf("\n"); > + > + } > + printf("%08llx\n", (unsigned long long)offset + i); > +} > + > +int main(int argc, char *argv[]) > +{ > + struct io_context *ctx = NULL; > + struct io_event evs[2]; > + struct iocb iocb1, iocb2; > + struct iocb *iocbs[] = { &iocb1, &iocb2 }; > + void *buf; > + int fd, err = 0; > + off_t bytes; > + int c; > + char *cmp_buf = NULL; > + char *filename = NULL; > + /* start offset to write */ > + long long startoff = 0; > + /* truncate size */ > + off_t tsize = 0; > + > + while ((c = getopt(argc, argv, "s:b:o:t:")) != -1) { > + char *endp; > + > + switch (c) { > + case 's': /* XXX MB size will be extended */ > + size_KB = strtol(optarg, &endp, 0); > + break; > + case 'b': /* buffer size */ > + buf_size = strtol(optarg, &endp, 0); > + break; > + case 'o': /* start offset */ > + startoff = strtoll(optarg, &endp, 0); > + break; > + case 't': /* initial truncate size */ > + tsize = strtoul(optarg, &endp, 0); > + break; > + default: > + usage(argv[0]); > + } > + } > + > + if (size_KB == 0) /* default size is 64KB */ > + size_KB = 64; > + if (buf_size < 2048) /* default minimum buffer size is 2048 bytes */ > + buf_size = 2048; 2048 is a weird number, but ok. > + > + if (optind == argc - 1) > + filename = argv[optind]; > + else > + usage(argv[0]); > + > + fd = open(filename, O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600); > + if (fd == -1) { > + perror("open"); > + return 1; > + } > + > + /* truncate the file to a special size at first */ > + if (tsize != 0) { > + ftruncate(fd, tsize); > + if (fd == -1) { > + perror("ftruncate"); > + return 1; > + } > + } > + > + err = posix_memalign(&buf, getpagesize(), buf_size); > + if (err) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "posix_memalign"); > + return 1; > + } > + cmp_buf = malloc(buf_size); > + memset(cmp_buf, IO_PATTERN, buf_size); > + > + err = io_setup(5, &ctx); Are you expecting to have 5 events ? > + if (err) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_setup"); > + return 1; > + } > + > + bytes = 0; > + /* Keep extending until size_KB */ > + while (bytes < size_KB * 1024) { > + ssize_t sret; > + int i; > + > + memset(buf, IO_PATTERN, buf_size); > + > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > + startoff + bytes + 0*buf_size/2); > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > + startoff + bytes + 1*buf_size/2); This is just a little bit confusing. I'd expect the write size to be a buffer size, not buffer size/2. Is there a reason you wanted to do it this way ? Also, do we need the size_KB ? I'd expect it would just do two writes. One starting in one block before i_size and ending before i_size but in the same block as i_size and the second starting beyond i_size but still in the same block as i_size. I guess it does not hurt to be able to specify size_KB, but I wonder what was your reason to include it ? Moreover I think it would be interesting to also allow the second write not to be aligned with i_size but still be in the same block. Something like: io_prep_pwrite(&iocb1, fd, buf, buf_size, startoff); io_prep_pwrite(&iocb2, fd, buf, buf_size, startoff + buf_size + shift); where "shift" (or whatever you want to call it) can be user specified. > + > + err = io_submit(ctx, 2, iocbs); > + if (err != 2) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_submit"); > + return 1; > + } > + > + err = io_getevents(ctx, 2, 2, evs, NULL); > + if (err != 2) { > + fprintf(stderr, "error %s during %s\n", > + strerror(err), > + "io_getevents"); > + return 1; > + } > + > + for (i = 0; i < err; i++) { > + /* > + * res is unsigned for some reason, so this is the best > + * way to detect that it contains a negative errno. > + */ > + if (evs[i].res > buf_size / 2) { > + fprintf(stderr, "pwrite: %s\n", > + strerror(-evs[i].res)); > + return 1; > + } > + } > + fsync(fd); > + > + /* > + * And then read it back, then compare with original content. > + */ > + sret = pread(fd, buf, buf_size, startoff + bytes); > + if (sret == -1) { > + perror("pread"); > + return 1; > + } else if (sret != buf_size) { > + fprintf(stderr, "short read %zd was less than %zu\n", > + sret, buf_size); > + return 1; > + } > + if (memcmp(buf, cmp_buf, buf_size)) { > + printf("Find corruption\n"); > + dump_buffer(buf, 0, buf_size); > + return 1; > + } > + > + /* Calculate how many bytes we've written after pread */ > + bytes += buf_size; > + } > + > + return 0; > +} > diff --git a/tests/generic/999 b/tests/generic/999 > new file mode 100755 > index 00000000..6fcc593a > --- /dev/null > +++ b/tests/generic/999 > @@ -0,0 +1,71 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2019 Red Hat, Inc. All Rights Reserved. > +# > +# FS QA Test No. 999 > +# > +# Non-page-aligned direct AIO write test. AIO write from unalinged offset > +# on a file with different initial truncate i_size. > +# > +# Uncover "ext4: Fix data corruption caused by unaligned direct AIO" > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# real QA test starts here > +_supported_fs generic > +_supported_os Linux > +_require_test > +_require_aiodio aio-dio-write-verify > +_require_test_program "feature" > + > +localfile=$TEST_DIR/tst-aio-dio-testfile > +diosize=`_min_dio_alignment $TEST_DEV` > +pagesize=`src/feature -s` > +bufsize=$((pagesize * 2)) > +filesize=$((bufsize * 3 / 1024)) > + > +# Need smaller logical block size to do non-page-aligned test > +if [ $diosize -ge $pagesize ];then > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > +fi > + > +rm -rf $localfile 2>/dev/null > +# page-aligned aiodio write verification at first > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > + > +# non-page-aligned aiodio write verification > +i=0 > +while [ $((diosize * i)) -lt $bufsize ];do > + truncsize=$((diosize * i++)) > + # non-page-aligned AIO write on different i_size file > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile Why are we starting the write at diosize when truncsize is getting bigger and bigger and filesize remains the same. . Is filesize necessary ? -Lukas > + if [ $? -ne 0 ];then > + echo "[FAIL] $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile" > + fi > +done > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/generic/999.out b/tests/generic/999.out > new file mode 100644 > index 00000000..3b276ca8 > --- /dev/null > +++ b/tests/generic/999.out > @@ -0,0 +1,2 @@ > +QA output created by 999 > +Silence is golden > diff --git a/tests/generic/group b/tests/generic/group > index 15227b67..b1f93d99 100644 > --- a/tests/generic/group > +++ b/tests/generic/group > @@ -534,3 +534,4 @@ > 529 auto quick attr > 530 auto quick unlink > 531 auto quick unlink > +999 auto quick aio > -- > 2.17.2 >
On Wed, Mar 13, 2019 at 09:52:38AM +0100, Lukas Czerner wrote: > Hi Zorr, > > thanks for putting this together. > > On Tue, Mar 12, 2019 at 05:19:44PM +0800, Zorro Lang wrote: > > A simply reproducer from Frank Sorenson: > > > > ftruncate(fd, 65012224) > > io_prep_pwrite(iocbs[0], fd, buf[0], 1048576, 63963648); > > io_prep_pwrite(iocbs[1], fd, buf[1], 1048576, 65012224); > > > > io_prep_pwrite(iocbs[2], fd, buf[2], 1048576, 66060800); > > io_prep_pwrite(iocbs[3], fd, buf[3], 1048576, 67109376); > > These two are not actually necessary. Sure. I just paste his original description at here:) > > > > > io_submit(io_ctx, 1, &iocbs[0]); > > io_submit(io_ctx, 1, &iocbs[1]); > > io_submit(io_ctx, 1, &iocbs[2]); > > io_submit(io_ctx, 1, &iocbs[3]); > > > > io_getevents(io_ctx, 4, 4, events, NULL) > > > > help to find an ext4 corruption: > > **************** **************** **************** > > * page 1 * * page 2 * * page 3 * > > **************** **************** **************** > > existing 0000000000000000 0000000000000000 0000000000000000 > > write 1 AAAAAAAAAAAAAA AA > > write 2 BBBBBBBBBBBBBB BB > > > > result 00AAAAAAAAAAAAAA 00BBBBBBBBBBBBBB BB00000000000000 > > desired 00AAAAAAAAAAAAAA AABBBBBBBBBBBBBB BB00000000000000 > > > > This issue remind us we might miss unaligned AIO test for long time. > > We thought fsx cover this part, but looks like it's not. So this case > > trys to cover unaligned direct AIO write test on file with different > > initial truncate i_size. > > > > Signed-off-by: Zorro Lang <zlang@redhat.com> > > --- > > > > Hi, > > > > The aio-dio-write-verify.c nearly copy from aio-dio-eof-race.c, then change > > a few logic. I've changed aio-dio-eof-race.c several times, to fit different > > new AIO test cases. But this time I need to truncate the file, then it won't > > write at EOF. This against the motive of aio-dio-eof-race.c. So I think it's > > time to shift a new program to do more AIO write test. > > > > It's only my personal opinion :) > > > > Thanks, > > Zorro > > > > src/aio-dio-regress/aio-dio-write-verify.c | 223 +++++++++++++++++++++ > > tests/generic/999 | 71 +++++++ > > tests/generic/999.out | 2 + > > tests/generic/group | 1 + > > 4 files changed, 297 insertions(+) > > create mode 100644 src/aio-dio-regress/aio-dio-write-verify.c > > create mode 100755 tests/generic/999 > > create mode 100644 tests/generic/999.out > > > > diff --git a/src/aio-dio-regress/aio-dio-write-verify.c b/src/aio-dio-regress/aio-dio-write-verify.c > > new file mode 100644 > > index 00000000..402ddcdb > > --- /dev/null > > +++ b/src/aio-dio-regress/aio-dio-write-verify.c > > @@ -0,0 +1,223 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +/* > > + * Copyright (C) 2019 Red Hat, Inc. All Rights reserved. > > + * > > + * AIO writes from a start offset on a truncated file, verify there's not > > + * data corruption. > > > Can you put the description of the problem here as well for the future > generations to understand ? :) Sure. But I'd like to put the description into the case script(below generic/999), due to this program is not only to reproduce that bug, it has more common usage. > > > + */ > > +#include <sys/stat.h> > > +#include <sys/types.h> > > +#include <errno.h> > > +#include <fcntl.h> > > +#include <stdio.h> > > +#include <stdlib.h> > > +#include <unistd.h> > > +#include <ctype.h> > > + > > +#include <libaio.h> > > + > > +unsigned long buf_size = 0; > > +unsigned long size_KB = 0; > > +#define IO_PATTERN 0x5a > > + > > +void > > +usage(char *progname) > > +{ > > + fprintf(stderr, "usage: %s [-s datasize] [-b bufsize] [-o startoff] [-t truncsize] filename\n" > > + "\t-s datasize: specify the minimum data size(KB), doesn't count holes\n" > > + "\t-b bufsize: buffer size\n" > > + "\t-o startoff: start offset to write data, 0 by default\n" > > + "\t-t truncsize: truncate the file to a special size at first 0 by default\n", > > + progname); > > + exit(1); > > +} > > + > > +void > > +dump_buffer( > > + void *buf, > > + off64_t offset, > > + ssize_t len) > > +{ > > + int i, j; > > + char *p; > > + int new; > > + > > + for (i = 0, p = (char *)buf; i < len; i += 16) { > > + char *s = p; > > + > > + if (i && !memcmp(p, p - 16, 16)) { > > + new = 0; > > + } else { > > + if (i) > > + printf("*\n"); > > + new = 1; > > + } > > + > > + if (!new) { > > + p += 16; > > + continue; > > + } > > + > > + printf("%08llx ", (unsigned long long)offset + i); > > + for (j = 0; j < 16 && i + j < len; j++, p++) > > + printf("%02x ", *p); > > + printf(" "); > > + for (j = 0; j < 16 && i + j < len; j++, s++) { > > + if (isalnum((int)*s)) > > + printf("%c", *s); > > + else > > + printf("."); > > + } > > + printf("\n"); > > + > > + } > > + printf("%08llx\n", (unsigned long long)offset + i); > > +} > > + > > +int main(int argc, char *argv[]) > > +{ > > + struct io_context *ctx = NULL; > > + struct io_event evs[2]; > > + struct iocb iocb1, iocb2; > > + struct iocb *iocbs[] = { &iocb1, &iocb2 }; > > + void *buf; > > + int fd, err = 0; > > + off_t bytes; > > + int c; > > + char *cmp_buf = NULL; > > + char *filename = NULL; > > + /* start offset to write */ > > + long long startoff = 0; > > + /* truncate size */ > > + off_t tsize = 0; > > + > > + while ((c = getopt(argc, argv, "s:b:o:t:")) != -1) { > > + char *endp; > > + > > + switch (c) { > > + case 's': /* XXX MB size will be extended */ > > + size_KB = strtol(optarg, &endp, 0); > > + break; > > + case 'b': /* buffer size */ > > + buf_size = strtol(optarg, &endp, 0); > > + break; > > + case 'o': /* start offset */ > > + startoff = strtoll(optarg, &endp, 0); > > + break; > > + case 't': /* initial truncate size */ > > + tsize = strtoul(optarg, &endp, 0); > > + break; > > + default: > > + usage(argv[0]); > > + } > > + } > > + > > + if (size_KB == 0) /* default size is 64KB */ > > + size_KB = 64; > > + if (buf_size < 2048) /* default minimum buffer size is 2048 bytes */ > > + buf_size = 2048; > > 2048 is a weird number, but ok. Yeah, I think so, I'll set it to multi-pagesize. > > > + > > + if (optind == argc - 1) > > + filename = argv[optind]; > > + else > > + usage(argv[0]); > > + > > + fd = open(filename, O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600); > > + if (fd == -1) { > > + perror("open"); > > + return 1; > > + } > > + > > + /* truncate the file to a special size at first */ > > + if (tsize != 0) { > > + ftruncate(fd, tsize); > > + if (fd == -1) { > > + perror("ftruncate"); > > + return 1; > > + } > > + } > > + > > + err = posix_memalign(&buf, getpagesize(), buf_size); > > + if (err) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "posix_memalign"); > > + return 1; > > + } > > + cmp_buf = malloc(buf_size); > > + memset(cmp_buf, IO_PATTERN, buf_size); > > + > > + err = io_setup(5, &ctx); > > Are you expecting to have 5 events ? Sorry for this mistake, due to I use 5 events when I did test. I forgot to change this. > > > + if (err) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_setup"); > > + return 1; > > + } > > + > > + bytes = 0; > > + /* Keep extending until size_KB */ > > + while (bytes < size_KB * 1024) { > > + ssize_t sret; > > + int i; > > + > > + memset(buf, IO_PATTERN, buf_size); > > + > > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > > + startoff + bytes + 0*buf_size/2); > > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > > + startoff + bytes + 1*buf_size/2); > > This is just a little bit confusing. I'd expect the write size to be a > buffer size, not buffer size/2. Is there a reason you wanted to do it > this way ? The buf is used for below pread() too. We write twice separately at here, so write buf/2 each time. Then read whole buf size once. > > Also, do we need the size_KB ? I'd expect it would just do two writes. > One starting in one block before i_size and ending before i_size but in > the same block as i_size and the second starting beyond i_size but still > in the same block as i_size. > > I guess it does not hurt to be able to specify size_KB, but I wonder > what was your reason to include it ? > > Moreover I think it would be interesting to also allow the second write > not to be aligned with i_size but still be in the same block. Something > like: > > io_prep_pwrite(&iocb1, fd, buf, buf_size, startoff); > io_prep_pwrite(&iocb2, fd, buf, buf_size, startoff + buf_size + shift); > > where "shift" (or whatever you want to call it) can be user specified. Hmmm, I think I can do that in below generic/999, by provide different arguments to $AIO_TEST. > > > > + > > + err = io_submit(ctx, 2, iocbs); > > + if (err != 2) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_submit"); > > + return 1; > > + } > > + > > + err = io_getevents(ctx, 2, 2, evs, NULL); > > + if (err != 2) { > > + fprintf(stderr, "error %s during %s\n", > > + strerror(err), > > + "io_getevents"); > > + return 1; > > + } > > + > > + for (i = 0; i < err; i++) { > > + /* > > + * res is unsigned for some reason, so this is the best > > + * way to detect that it contains a negative errno. > > + */ > > + if (evs[i].res > buf_size / 2) { > > + fprintf(stderr, "pwrite: %s\n", > > + strerror(-evs[i].res)); > > + return 1; > > + } > > + } > > + fsync(fd); > > + > > + /* > > + * And then read it back, then compare with original content. > > + */ > > + sret = pread(fd, buf, buf_size, startoff + bytes); > > + if (sret == -1) { > > + perror("pread"); > > + return 1; > > + } else if (sret != buf_size) { > > + fprintf(stderr, "short read %zd was less than %zu\n", > > + sret, buf_size); > > + return 1; > > + } > > + if (memcmp(buf, cmp_buf, buf_size)) { > > + printf("Find corruption\n"); > > + dump_buffer(buf, 0, buf_size); > > + return 1; > > + } > > + > > + /* Calculate how many bytes we've written after pread */ > > + bytes += buf_size; > > + } > > + > > + return 0; > > +} > > diff --git a/tests/generic/999 b/tests/generic/999 > > new file mode 100755 > > index 00000000..6fcc593a > > --- /dev/null > > +++ b/tests/generic/999 > > @@ -0,0 +1,71 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2019 Red Hat, Inc. All Rights Reserved. > > +# > > +# FS QA Test No. 999 > > +# > > +# Non-page-aligned direct AIO write test. AIO write from unalinged offset > > +# on a file with different initial truncate i_size. > > +# > > +# Uncover "ext4: Fix data corruption caused by unaligned direct AIO" > > +# > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > + > > +# remove previous $seqres.full before test > > +rm -f $seqres.full > > + > > +# real QA test starts here > > +_supported_fs generic > > +_supported_os Linux > > +_require_test > > +_require_aiodio aio-dio-write-verify > > +_require_test_program "feature" > > + > > +localfile=$TEST_DIR/tst-aio-dio-testfile > > +diosize=`_min_dio_alignment $TEST_DEV` > > +pagesize=`src/feature -s` > > +bufsize=$((pagesize * 2)) > > +filesize=$((bufsize * 3 / 1024)) > > + > > +# Need smaller logical block size to do non-page-aligned test > > +if [ $diosize -ge $pagesize ];then > > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > > +fi > > + > > +rm -rf $localfile 2>/dev/null > > +# page-aligned aiodio write verification at first > > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > > + > > +# non-page-aligned aiodio write verification > > +i=0 > > +while [ $((diosize * i)) -lt $bufsize ];do > > + truncsize=$((diosize * i++)) > > + # non-page-aligned AIO write on different i_size file > > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile > > Why are we starting the write at diosize when truncsize is getting > bigger and bigger and filesize remains the same. . Is filesize > necessary ? The filesize is just used to decide how many times $AIO_TEST will write, then exit. It's not related with this bug, just a limit condition. That .c program is not only to reproduce that ext4 bug. It can be used to do direct AIO write and verify. Then the generic/999 is used to cover that ext4 bug, and maybe cover more operations. Thanks, Zorro > > -Lukas > > > + if [ $? -ne 0 ];then > > + echo "[FAIL] $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile" > > + fi > > +done > > + > > +echo "Silence is golden" > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/generic/999.out b/tests/generic/999.out > > new file mode 100644 > > index 00000000..3b276ca8 > > --- /dev/null > > +++ b/tests/generic/999.out > > @@ -0,0 +1,2 @@ > > +QA output created by 999 > > +Silence is golden > > diff --git a/tests/generic/group b/tests/generic/group > > index 15227b67..b1f93d99 100644 > > --- a/tests/generic/group > > +++ b/tests/generic/group > > @@ -534,3 +534,4 @@ > > 529 auto quick attr > > 530 auto quick unlink > > 531 auto quick unlink > > +999 auto quick aio > > -- > > 2.17.2 > >
On Wed, Mar 13, 2019 at 06:03:42PM +0800, Zorro Lang wrote: > > > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > > > + startoff + bytes + 0*buf_size/2); > > > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > > > + startoff + bytes + 1*buf_size/2); > > > > This is just a little bit confusing. I'd expect the write size to be a > > buffer size, not buffer size/2. Is there a reason you wanted to do it > > this way ? > > The buf is used for below pread() too. We write twice separately at here, > so write buf/2 each time. Then read whole buf size once. Sure, but from the persective of someone using this thing I'd expect the buf_size to be the size of the write. I do not think it matters too much as long as you can explain it in the description of the program. > > > > > Also, do we need the size_KB ? I'd expect it would just do two writes. > > One starting in one block before i_size and ending before i_size but in > > the same block as i_size and the second starting beyond i_size but still > > in the same block as i_size. > > > > I guess it does not hurt to be able to specify size_KB, but I wonder > > what was your reason to include it ? > > > > Moreover I think it would be interesting to also allow the second write > > not to be aligned with i_size but still be in the same block. Something > > like: > > > > io_prep_pwrite(&iocb1, fd, buf, buf_size, startoff); > > io_prep_pwrite(&iocb2, fd, buf, buf_size, startoff + buf_size + shift); > > > > where "shift" (or whatever you want to call it) can be user specified. > > Hmmm, I think I can do that in below generic/999, by provide different arguments > to $AIO_TEST. Looking at the code I do not think you can, but maybe I am wrong. > > > +localfile=$TEST_DIR/tst-aio-dio-testfile > > > +diosize=`_min_dio_alignment $TEST_DEV` > > > +pagesize=`src/feature -s` > > > +bufsize=$((pagesize * 2)) > > > +filesize=$((bufsize * 3 / 1024)) > > > + > > > +# Need smaller logical block size to do non-page-aligned test > > > +if [ $diosize -ge $pagesize ];then > > > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > > > +fi > > > + > > > +rm -rf $localfile 2>/dev/null > > > +# page-aligned aiodio write verification at first > > > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > > > + > > > +# non-page-aligned aiodio write verification > > > +i=0 > > > +while [ $((diosize * i)) -lt $bufsize ];do > > > + truncsize=$((diosize * i++)) > > > + # non-page-aligned AIO write on different i_size file > > > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile > > > > Why are we starting the write at diosize when truncsize is getting > > bigger and bigger and filesize remains the same. . Is filesize > > necessary ? > > The filesize is just used to decide how many times $AIO_TEST will write, then exit. It's > not related with this bug, just a limit condition. > > That .c program is not only to reproduce that ext4 bug. It can be used to do direct AIO > write and verify. Then the generic/999 is used to cover that ext4 bug, and maybe cover > more operations. Fair enough, just please decribe what the program is good for and what it can do. Currently it's not entirely clear. Thanks! -Lukas > > Thanks, > Zorro
On Wed, Mar 13, 2019 at 11:26:10AM +0100, Lukas Czerner wrote: > On Wed, Mar 13, 2019 at 06:03:42PM +0800, Zorro Lang wrote: > > > > + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ > > > > + startoff + bytes + 0*buf_size/2); > > > > + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ > > > > + startoff + bytes + 1*buf_size/2); > > > > > > This is just a little bit confusing. I'd expect the write size to be a > > > buffer size, not buffer size/2. Is there a reason you wanted to do it > > > this way ? > > > > The buf is used for below pread() too. We write twice separately at here, > > so write buf/2 each time. Then read whole buf size once. > > Sure, but from the persective of someone using this thing I'd expect the > buf_size to be the size of the write. I do not think it matters too much > as long as you can explain it in the description of the program. OK, I'll change that. > > > > > > > > > Also, do we need the size_KB ? I'd expect it would just do two writes. > > > One starting in one block before i_size and ending before i_size but in > > > the same block as i_size and the second starting beyond i_size but still > > > in the same block as i_size. > > > > > > I guess it does not hurt to be able to specify size_KB, but I wonder > > > what was your reason to include it ? > > > > > > Moreover I think it would be interesting to also allow the second write > > > not to be aligned with i_size but still be in the same block. Something > > > like: > > > > > > io_prep_pwrite(&iocb1, fd, buf, buf_size, startoff); > > > io_prep_pwrite(&iocb2, fd, buf, buf_size, startoff + buf_size + shift); > > > > > > where "shift" (or whatever you want to call it) can be user specified. > > > > Hmmm, I think I can do that in below generic/999, by provide different arguments > > to $AIO_TEST. > > Looking at the code I do not think you can, but maybe I am wrong. Oh, you mean the 2nd io_prep_pwrite not start from i_size. OK, I'll add another offset argument to affect the 2nd io_prep_pwrite. > > > > > +localfile=$TEST_DIR/tst-aio-dio-testfile > > > > +diosize=`_min_dio_alignment $TEST_DEV` > > > > +pagesize=`src/feature -s` > > > > +bufsize=$((pagesize * 2)) > > > > +filesize=$((bufsize * 3 / 1024)) > > > > + > > > > +# Need smaller logical block size to do non-page-aligned test > > > > +if [ $diosize -ge $pagesize ];then > > > > + _notrun "Need device logical block size($diosize) < page size($pagesize)" > > > > +fi > > > > + > > > > +rm -rf $localfile 2>/dev/null > > > > +# page-aligned aiodio write verification at first > > > > +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile > > > > + > > > > +# non-page-aligned aiodio write verification > > > > +i=0 > > > > +while [ $((diosize * i)) -lt $bufsize ];do > > > > + truncsize=$((diosize * i++)) > > > > + # non-page-aligned AIO write on different i_size file > > > > + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile > > > > > > Why are we starting the write at diosize when truncsize is getting > > > bigger and bigger and filesize remains the same. . Is filesize > > > necessary ? > > > > The filesize is just used to decide how many times $AIO_TEST will write, then exit. It's > > not related with this bug, just a limit condition. > > > > That .c program is not only to reproduce that ext4 bug. It can be used to do direct AIO > > write and verify. Then the generic/999 is used to cover that ext4 bug, and maybe cover > > more operations. > > Fair enough, just please decribe what the program is good for and what > it can do. Currently it's not entirely clear. Sure, Thanks for your review. I'll send a V2 patch later. Thanks, Zorro > > Thanks! > -Lukas > > > > > Thanks, > > Zorro
diff --git a/src/aio-dio-regress/aio-dio-write-verify.c b/src/aio-dio-regress/aio-dio-write-verify.c new file mode 100644 index 00000000..402ddcdb --- /dev/null +++ b/src/aio-dio-regress/aio-dio-write-verify.c @@ -0,0 +1,223 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Red Hat, Inc. All Rights reserved. + * + * AIO writes from a start offset on a truncated file, verify there's not + * data corruption. + */ +#include <sys/stat.h> +#include <sys/types.h> +#include <errno.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <ctype.h> + +#include <libaio.h> + +unsigned long buf_size = 0; +unsigned long size_KB = 0; +#define IO_PATTERN 0x5a + +void +usage(char *progname) +{ + fprintf(stderr, "usage: %s [-s datasize] [-b bufsize] [-o startoff] [-t truncsize] filename\n" + "\t-s datasize: specify the minimum data size(KB), doesn't count holes\n" + "\t-b bufsize: buffer size\n" + "\t-o startoff: start offset to write data, 0 by default\n" + "\t-t truncsize: truncate the file to a special size at first 0 by default\n", + progname); + exit(1); +} + +void +dump_buffer( + void *buf, + off64_t offset, + ssize_t len) +{ + int i, j; + char *p; + int new; + + for (i = 0, p = (char *)buf; i < len; i += 16) { + char *s = p; + + if (i && !memcmp(p, p - 16, 16)) { + new = 0; + } else { + if (i) + printf("*\n"); + new = 1; + } + + if (!new) { + p += 16; + continue; + } + + printf("%08llx ", (unsigned long long)offset + i); + for (j = 0; j < 16 && i + j < len; j++, p++) + printf("%02x ", *p); + printf(" "); + for (j = 0; j < 16 && i + j < len; j++, s++) { + if (isalnum((int)*s)) + printf("%c", *s); + else + printf("."); + } + printf("\n"); + + } + printf("%08llx\n", (unsigned long long)offset + i); +} + +int main(int argc, char *argv[]) +{ + struct io_context *ctx = NULL; + struct io_event evs[2]; + struct iocb iocb1, iocb2; + struct iocb *iocbs[] = { &iocb1, &iocb2 }; + void *buf; + int fd, err = 0; + off_t bytes; + int c; + char *cmp_buf = NULL; + char *filename = NULL; + /* start offset to write */ + long long startoff = 0; + /* truncate size */ + off_t tsize = 0; + + while ((c = getopt(argc, argv, "s:b:o:t:")) != -1) { + char *endp; + + switch (c) { + case 's': /* XXX MB size will be extended */ + size_KB = strtol(optarg, &endp, 0); + break; + case 'b': /* buffer size */ + buf_size = strtol(optarg, &endp, 0); + break; + case 'o': /* start offset */ + startoff = strtoll(optarg, &endp, 0); + break; + case 't': /* initial truncate size */ + tsize = strtoul(optarg, &endp, 0); + break; + default: + usage(argv[0]); + } + } + + if (size_KB == 0) /* default size is 64KB */ + size_KB = 64; + if (buf_size < 2048) /* default minimum buffer size is 2048 bytes */ + buf_size = 2048; + + if (optind == argc - 1) + filename = argv[optind]; + else + usage(argv[0]); + + fd = open(filename, O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600); + if (fd == -1) { + perror("open"); + return 1; + } + + /* truncate the file to a special size at first */ + if (tsize != 0) { + ftruncate(fd, tsize); + if (fd == -1) { + perror("ftruncate"); + return 1; + } + } + + err = posix_memalign(&buf, getpagesize(), buf_size); + if (err) { + fprintf(stderr, "error %s during %s\n", + strerror(err), + "posix_memalign"); + return 1; + } + cmp_buf = malloc(buf_size); + memset(cmp_buf, IO_PATTERN, buf_size); + + err = io_setup(5, &ctx); + if (err) { + fprintf(stderr, "error %s during %s\n", + strerror(err), + "io_setup"); + return 1; + } + + bytes = 0; + /* Keep extending until size_KB */ + while (bytes < size_KB * 1024) { + ssize_t sret; + int i; + + memset(buf, IO_PATTERN, buf_size); + + io_prep_pwrite(&iocb1, fd, buf, buf_size/2, \ + startoff + bytes + 0*buf_size/2); + io_prep_pwrite(&iocb2, fd, buf, buf_size/2, \ + startoff + bytes + 1*buf_size/2); + + err = io_submit(ctx, 2, iocbs); + if (err != 2) { + fprintf(stderr, "error %s during %s\n", + strerror(err), + "io_submit"); + return 1; + } + + err = io_getevents(ctx, 2, 2, evs, NULL); + if (err != 2) { + fprintf(stderr, "error %s during %s\n", + strerror(err), + "io_getevents"); + return 1; + } + + for (i = 0; i < err; i++) { + /* + * res is unsigned for some reason, so this is the best + * way to detect that it contains a negative errno. + */ + if (evs[i].res > buf_size / 2) { + fprintf(stderr, "pwrite: %s\n", + strerror(-evs[i].res)); + return 1; + } + } + fsync(fd); + + /* + * And then read it back, then compare with original content. + */ + sret = pread(fd, buf, buf_size, startoff + bytes); + if (sret == -1) { + perror("pread"); + return 1; + } else if (sret != buf_size) { + fprintf(stderr, "short read %zd was less than %zu\n", + sret, buf_size); + return 1; + } + if (memcmp(buf, cmp_buf, buf_size)) { + printf("Find corruption\n"); + dump_buffer(buf, 0, buf_size); + return 1; + } + + /* Calculate how many bytes we've written after pread */ + bytes += buf_size; + } + + return 0; +} diff --git a/tests/generic/999 b/tests/generic/999 new file mode 100755 index 00000000..6fcc593a --- /dev/null +++ b/tests/generic/999 @@ -0,0 +1,71 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2019 Red Hat, Inc. All Rights Reserved. +# +# FS QA Test No. 999 +# +# Non-page-aligned direct AIO write test. AIO write from unalinged offset +# on a file with different initial truncate i_size. +# +# Uncover "ext4: Fix data corruption caused by unaligned direct AIO" +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_test +_require_aiodio aio-dio-write-verify +_require_test_program "feature" + +localfile=$TEST_DIR/tst-aio-dio-testfile +diosize=`_min_dio_alignment $TEST_DEV` +pagesize=`src/feature -s` +bufsize=$((pagesize * 2)) +filesize=$((bufsize * 3 / 1024)) + +# Need smaller logical block size to do non-page-aligned test +if [ $diosize -ge $pagesize ];then + _notrun "Need device logical block size($diosize) < page size($pagesize)" +fi + +rm -rf $localfile 2>/dev/null +# page-aligned aiodio write verification at first +$AIO_TEST -s $((bufsize * 10)) -b $bufsize $localfile + +# non-page-aligned aiodio write verification +i=0 +while [ $((diosize * i)) -lt $bufsize ];do + truncsize=$((diosize * i++)) + # non-page-aligned AIO write on different i_size file + $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile + if [ $? -ne 0 ];then + echo "[FAIL] $AIO_TEST -s $filesize -b $bufsize -o $diosize -t $truncsize $localfile" + fi +done + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/generic/999.out b/tests/generic/999.out new file mode 100644 index 00000000..3b276ca8 --- /dev/null +++ b/tests/generic/999.out @@ -0,0 +1,2 @@ +QA output created by 999 +Silence is golden diff --git a/tests/generic/group b/tests/generic/group index 15227b67..b1f93d99 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -534,3 +534,4 @@ 529 auto quick attr 530 auto quick unlink 531 auto quick unlink +999 auto quick aio
A simply reproducer from Frank Sorenson: ftruncate(fd, 65012224) io_prep_pwrite(iocbs[0], fd, buf[0], 1048576, 63963648); io_prep_pwrite(iocbs[1], fd, buf[1], 1048576, 65012224); io_prep_pwrite(iocbs[2], fd, buf[2], 1048576, 66060800); io_prep_pwrite(iocbs[3], fd, buf[3], 1048576, 67109376); io_submit(io_ctx, 1, &iocbs[0]); io_submit(io_ctx, 1, &iocbs[1]); io_submit(io_ctx, 1, &iocbs[2]); io_submit(io_ctx, 1, &iocbs[3]); io_getevents(io_ctx, 4, 4, events, NULL) help to find an ext4 corruption: **************** **************** **************** * page 1 * * page 2 * * page 3 * **************** **************** **************** existing 0000000000000000 0000000000000000 0000000000000000 write 1 AAAAAAAAAAAAAA AA write 2 BBBBBBBBBBBBBB BB result 00AAAAAAAAAAAAAA 00BBBBBBBBBBBBBB BB00000000000000 desired 00AAAAAAAAAAAAAA AABBBBBBBBBBBBBB BB00000000000000 This issue remind us we might miss unaligned AIO test for long time. We thought fsx cover this part, but looks like it's not. So this case trys to cover unaligned direct AIO write test on file with different initial truncate i_size. Signed-off-by: Zorro Lang <zlang@redhat.com> --- Hi, The aio-dio-write-verify.c nearly copy from aio-dio-eof-race.c, then change a few logic. I've changed aio-dio-eof-race.c several times, to fit different new AIO test cases. But this time I need to truncate the file, then it won't write at EOF. This against the motive of aio-dio-eof-race.c. So I think it's time to shift a new program to do more AIO write test. It's only my personal opinion :) Thanks, Zorro src/aio-dio-regress/aio-dio-write-verify.c | 223 +++++++++++++++++++++ tests/generic/999 | 71 +++++++ tests/generic/999.out | 2 + tests/generic/group | 1 + 4 files changed, 297 insertions(+) create mode 100644 src/aio-dio-regress/aio-dio-write-verify.c create mode 100755 tests/generic/999 create mode 100644 tests/generic/999.out