Message ID | 1554474244-553661-1-git-send-email-andrey.shinkevich@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block/io.c: fix for the allocation failure | expand |
On 4/5/19 10:24 AM, Andrey Shinkevich wrote: > On a file system used by the customer, fallocate() returns an error > if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > fails. We can handle that case the same way as it is done for the > unsupported cases, namely, call to bdrv_driver_pwritev() that writes > zeroes to an image for the unaligned chunk of the block. > > Suggested-by: Denis V. Lunev <den@openvz.org> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > --- > block/io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/io.c b/block/io.c > index dfc153b..0412a51 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > assert(!bs->supported_zero_flags); > } > > - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { > + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { > /* Fall back to bounce buffer if write zeroes is unsupported */ > BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; > > I suppose that if fallocate fails for any reason and we're allowing fallback, we're either going to succeed ... or fail again very soon thereafter. Are there any cases where it is vital to not ignore the first fallocate failure? I'm a little wary of ignoring the return code from bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" failure here that the following bounce writes will also fail "safely." I'm not completely confident, but I have no tangible objections: Reviewed-by: John Snow <jsnow@redhat.com>
On Fri, Apr 05, 2019 at 05:24:04PM +0300, Andrey Shinkevich wrote: > On a file system used by the customer, fallocate() returns an error > if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > fails. We can handle that case the same way as it is done for the > unsupported cases, namely, call to bdrv_driver_pwritev() that writes > zeroes to an image for the unaligned chunk of the block. > > Suggested-by: Denis V. Lunev <den@openvz.org> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > --- > block/io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to my block-next tree for QEMU 4.1: https://github.com/stefanha/qemu/commits/block-next Stefan
On 06/04/2019 01:50, John Snow wrote: > > > On 4/5/19 10:24 AM, Andrey Shinkevich wrote: >> On a file system used by the customer, fallocate() returns an error >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() >> fails. We can handle that case the same way as it is done for the >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes >> zeroes to an image for the unaligned chunk of the block. >> >> Suggested-by: Denis V. Lunev <den@openvz.org> >> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> >> --- >> block/io.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/io.c b/block/io.c >> index dfc153b..0412a51 100644 >> --- a/block/io.c >> +++ b/block/io.c >> @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, >> assert(!bs->supported_zero_flags); >> } >> >> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { >> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { >> /* Fall back to bounce buffer if write zeroes is unsupported */ >> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; >> >> > > I suppose that if fallocate fails for any reason and we're allowing > fallback, we're either going to succeed ... or fail again very soon > thereafter. > > Are there any cases where it is vital to not ignore the first fallocate > failure? I'm a little wary of ignoring the return code from > bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" > failure here that the following bounce writes will also fail "safely." > > I'm not completely confident, but I have no tangible objections: > Reviewed-by: John Snow <jsnow@redhat.com> > Thank you for your review, John! Let me clarify the circumstances and quote the bug report: "Customer had Win-2012 VM with 50GB system disk which was later resized to 256GB without resizing the partition inside VM. Now, while trying to resize to 50G, the following error will appear 'Failed to reduce the number of L2 tables: Invalid argument' It was found that it is possible to shrink the disk to 128G and any size above that number, but size below 128G will bring the mentioned error." The fallocate() returns no error on that file system if the offset and the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both are aligned to 4K.
On 08/04/2019 12:00, Stefan Hajnoczi wrote: > On Fri, Apr 05, 2019 at 05:24:04PM +0300, Andrey Shinkevich wrote: >> On a file system used by the customer, fallocate() returns an error >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() >> fails. We can handle that case the same way as it is done for the >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes >> zeroes to an image for the unaligned chunk of the block. >> >> Suggested-by: Denis V. Lunev <den@openvz.org> >> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> >> --- >> block/io.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > Thanks, applied to my block-next tree for QEMU 4.1: > https://github.com/stefanha/qemu/commits/block-next > > Stefan > Thank you all!
Am 08.04.2019 um 11:44 hat Andrey Shinkevich geschrieben: > > > On 06/04/2019 01:50, John Snow wrote: > > > > > > On 4/5/19 10:24 AM, Andrey Shinkevich wrote: > >> On a file system used by the customer, fallocate() returns an error > >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > >> fails. We can handle that case the same way as it is done for the > >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes > >> zeroes to an image for the unaligned chunk of the block. > >> > >> Suggested-by: Denis V. Lunev <den@openvz.org> > >> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > >> --- > >> block/io.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/block/io.c b/block/io.c > >> index dfc153b..0412a51 100644 > >> --- a/block/io.c > >> +++ b/block/io.c > >> @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > >> assert(!bs->supported_zero_flags); > >> } > >> > >> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { > >> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { > >> /* Fall back to bounce buffer if write zeroes is unsupported */ > >> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; > >> > >> > > > > I suppose that if fallocate fails for any reason and we're allowing > > fallback, we're either going to succeed ... or fail again very soon > > thereafter. > > > > Are there any cases where it is vital to not ignore the first fallocate > > failure? I'm a little wary of ignoring the return code from > > bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" > > failure here that the following bounce writes will also fail "safely." > > > > I'm not completely confident, but I have no tangible objections: > > Reviewed-by: John Snow <jsnow@redhat.com> > > > > Thank you for your review, John! > > Let me clarify the circumstances and quote the bug report: > "Customer had Win-2012 VM with 50GB system disk which was later resized > to 256GB without resizing the partition inside VM. > Now, while trying to resize to 50G, the following error will appear > 'Failed to reduce the number of L2 tables: Invalid argument' > It was found that it is possible to shrink the disk to 128G and any size > above that number, but size below 128G will bring the mentioned error." > > The fallocate() returns no error on that file system if the offset and > the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both > are aligned to 4K. What is the return value you get from this file system? Maybe turning that into ENOTSUP in file-posix would be less invasive. Just falling back for any error gives me the vague feeling that it could cause problems sooner or later. Kevin
Am 08.04.2019 um 12:04 hat Kevin Wolf geschrieben: > Am 08.04.2019 um 11:44 hat Andrey Shinkevich geschrieben: > > > > > > On 06/04/2019 01:50, John Snow wrote: > > > > > > > > > On 4/5/19 10:24 AM, Andrey Shinkevich wrote: > > >> On a file system used by the customer, fallocate() returns an error > > >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > > >> fails. We can handle that case the same way as it is done for the > > >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes > > >> zeroes to an image for the unaligned chunk of the block. > > >> > > >> Suggested-by: Denis V. Lunev <den@openvz.org> > > >> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > > >> --- > > >> block/io.c | 2 +- > > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > >> > > >> diff --git a/block/io.c b/block/io.c > > >> index dfc153b..0412a51 100644 > > >> --- a/block/io.c > > >> +++ b/block/io.c > > >> @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > > >> assert(!bs->supported_zero_flags); > > >> } > > >> > > >> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { > > >> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { > > >> /* Fall back to bounce buffer if write zeroes is unsupported */ > > >> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; > > >> > > >> > > > > > > I suppose that if fallocate fails for any reason and we're allowing > > > fallback, we're either going to succeed ... or fail again very soon > > > thereafter. > > > > > > Are there any cases where it is vital to not ignore the first fallocate > > > failure? I'm a little wary of ignoring the return code from > > > bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" > > > failure here that the following bounce writes will also fail "safely." > > > > > > I'm not completely confident, but I have no tangible objections: > > > Reviewed-by: John Snow <jsnow@redhat.com> > > > > > > > Thank you for your review, John! > > > > Let me clarify the circumstances and quote the bug report: > > "Customer had Win-2012 VM with 50GB system disk which was later resized > > to 256GB without resizing the partition inside VM. > > Now, while trying to resize to 50G, the following error will appear > > 'Failed to reduce the number of L2 tables: Invalid argument' > > It was found that it is possible to shrink the disk to 128G and any size > > above that number, but size below 128G will bring the mentioned error." > > > > The fallocate() returns no error on that file system if the offset and > > the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both > > are aligned to 4K. > > What is the return value you get from this file system? > > Maybe turning that into ENOTSUP in file-posix would be less invasive. > Just falling back for any error gives me the vague feeling that it could > cause problems sooner or later. Also, which file system is this? This seems to be a file system bug. fallocate() isn't documented to possibly have alignment restrictions for FALLOC_FL_ZERO_RANGE (if this is the operation you're talking about). FALLOC_FL_PUNCH_HOLE even explicitly mentions the behaviour for partial blocks, so there is no doubt that operations for partial blocks are considered valid. Operations that may impose restrictions are explicitly documented as such. So in any case, this would only be a workaround for a buggy file system. The real bug needs to be fixed there. Kevin
On 08/04/2019 13:04, Kevin Wolf wrote: > Am 08.04.2019 um 11:44 hat Andrey Shinkevich geschrieben: >> >> >> On 06/04/2019 01:50, John Snow wrote: >>> >>> >>> On 4/5/19 10:24 AM, Andrey Shinkevich wrote: >>>> On a file system used by the customer, fallocate() returns an error >>>> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() >>>> fails. We can handle that case the same way as it is done for the >>>> unsupported cases, namely, call to bdrv_driver_pwritev() that writes >>>> zeroes to an image for the unaligned chunk of the block. >>>> >>>> Suggested-by: Denis V. Lunev <den@openvz.org> >>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> >>>> --- >>>> block/io.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/block/io.c b/block/io.c >>>> index dfc153b..0412a51 100644 >>>> --- a/block/io.c >>>> +++ b/block/io.c >>>> @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, >>>> assert(!bs->supported_zero_flags); >>>> } >>>> >>>> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { >>>> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { >>>> /* Fall back to bounce buffer if write zeroes is unsupported */ >>>> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; >>>> >>>> >>> >>> I suppose that if fallocate fails for any reason and we're allowing >>> fallback, we're either going to succeed ... or fail again very soon >>> thereafter. >>> >>> Are there any cases where it is vital to not ignore the first fallocate >>> failure? I'm a little wary of ignoring the return code from >>> bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" >>> failure here that the following bounce writes will also fail "safely." >>> >>> I'm not completely confident, but I have no tangible objections: >>> Reviewed-by: John Snow <jsnow@redhat.com> >>> >> >> Thank you for your review, John! >> >> Let me clarify the circumstances and quote the bug report: >> "Customer had Win-2012 VM with 50GB system disk which was later resized >> to 256GB without resizing the partition inside VM. >> Now, while trying to resize to 50G, the following error will appear >> 'Failed to reduce the number of L2 tables: Invalid argument' >> It was found that it is possible to shrink the disk to 128G and any size >> above that number, but size below 128G will bring the mentioned error." >> >> The fallocate() returns no error on that file system if the offset and >> the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both >> are aligned to 4K. > > What is the return value you get from this file system? > > Maybe turning that into ENOTSUP in file-posix would be less invasive. > Just falling back for any error gives me the vague feeling that it could > cause problems sooner or later. > > Kevin > The return value for that custom distributed file system is "Invalid argument", that's "EINVAL: mode is FALLOC_FL_ZERO_RANGE, but the file referred to by fd is not a regular file". When I reproduced the bug, I saw that the alignment in the bdrv_co_do_pwrite_zeroes() was set to '1' in that case: MAX(bs->bl.pwrite_zeroes_alignment /=0/), bs->bl.request_alignment /=1/); With my first patch I had not sent before, a new member of the structure BlockLimits, say pwrite_zeroes_alignment_min, was set to 4K for a raw file and the alignment was made for the block. Then zeroes were written to the image for the unaligned head and tail by invoking the bdrv_driver_pwritev(). That approach has cons also: we would write to the disk always. The file system maintainers say that the bug is a particular case and they may not return the general error such as 'unsupported'.
On Mon, Apr 08, 2019 at 12:14:49PM +0200, Kevin Wolf wrote: > Am 08.04.2019 um 12:04 hat Kevin Wolf geschrieben: > > Am 08.04.2019 um 11:44 hat Andrey Shinkevich geschrieben: > > > > > > > > > On 06/04/2019 01:50, John Snow wrote: > > > > > > > > > > > > On 4/5/19 10:24 AM, Andrey Shinkevich wrote: > > > >> On a file system used by the customer, fallocate() returns an error > > > >> if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > > > >> fails. We can handle that case the same way as it is done for the > > > >> unsupported cases, namely, call to bdrv_driver_pwritev() that writes > > > >> zeroes to an image for the unaligned chunk of the block. > > > >> > > > >> Suggested-by: Denis V. Lunev <den@openvz.org> > > > >> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > > > >> --- > > > >> block/io.c | 2 +- > > > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > > >> > > > >> diff --git a/block/io.c b/block/io.c > > > >> index dfc153b..0412a51 100644 > > > >> --- a/block/io.c > > > >> +++ b/block/io.c > > > >> @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > > > >> assert(!bs->supported_zero_flags); > > > >> } > > > >> > > > >> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { > > > >> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { > > > >> /* Fall back to bounce buffer if write zeroes is unsupported */ > > > >> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; > > > >> > > > >> > > > > > > > > I suppose that if fallocate fails for any reason and we're allowing > > > > fallback, we're either going to succeed ... or fail again very soon > > > > thereafter. > > > > > > > > Are there any cases where it is vital to not ignore the first fallocate > > > > failure? I'm a little wary of ignoring the return code from > > > > bdrv_co_pwrite_zeroes, but I am assuming that if there is a "real" > > > > failure here that the following bounce writes will also fail "safely." > > > > > > > > I'm not completely confident, but I have no tangible objections: > > > > Reviewed-by: John Snow <jsnow@redhat.com> > > > > > > > > > > Thank you for your review, John! > > > > > > Let me clarify the circumstances and quote the bug report: > > > "Customer had Win-2012 VM with 50GB system disk which was later resized > > > to 256GB without resizing the partition inside VM. > > > Now, while trying to resize to 50G, the following error will appear > > > 'Failed to reduce the number of L2 tables: Invalid argument' > > > It was found that it is possible to shrink the disk to 128G and any size > > > above that number, but size below 128G will bring the mentioned error." > > > > > > The fallocate() returns no error on that file system if the offset and > > > the (offset + bytes) parameters of the bdrv_co_do_pwrite_zeroes() both > > > are aligned to 4K. > > > > What is the return value you get from this file system? > > > > Maybe turning that into ENOTSUP in file-posix would be less invasive. > > Just falling back for any error gives me the vague feeling that it could > > cause problems sooner or later. > > Also, which file system is this? This seems to be a file system bug. > fallocate() isn't documented to possibly have alignment restrictions for > FALLOC_FL_ZERO_RANGE (if this is the operation you're talking about). > FALLOC_FL_PUNCH_HOLE even explicitly mentions the behaviour for partial > blocks, so there is no doubt that operations for partial blocks are > considered valid. Operations that may impose restrictions are explicitly > documented as such. > > So in any case, this would only be a workaround for a buggy file system. > The real bug needs to be fixed there. I agree regarding the root cause of the bug, but the fallback behavior is reasonable IMO. Andrey: If you update the patch with a more specific errno I'll queue that patch instead. Stefan
On 4/5/19 9:24 AM, Andrey Shinkevich wrote: > On a file system used by the customer, fallocate() returns an error Which error? > if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() > fails. We can handle that case the same way as it is done for the > unsupported cases, namely, call to bdrv_driver_pwritev() that writes > zeroes to an image for the unaligned chunk of the block. > > Suggested-by: Denis V. Lunev <den@openvz.org> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > --- > block/io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/io.c b/block/io.c > index dfc153b..0412a51 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > assert(!bs->supported_zero_flags); > } > > - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { > + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { This change is a regression of sorts. Now, you are unconditionally attempting the fallback for ALL failures (such as EIO) and for all drivers, even when that was not previously attempted and increases the traffic. I think we should revert this patch and instead fix the fallocate() path to convert whatever ACTUAL errno you got from unaligned fallocate failure into ENOTSUP (that is, just the file-posix.c location that failed), while leaving all other errors as immediately fatal.
On 8/17/19 9:42 AM, Eric Blake wrote: > On 4/5/19 9:24 AM, Andrey Shinkevich wrote: >> On a file system used by the customer, fallocate() returns an error > > Which error? Okay, I read the rest of the thread; EINVAL. But the commit message was not amended before becoming commit 118f9944. >> >> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { >> + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { > > This change is a regression of sorts. Now, you are unconditionally > attempting the fallback for ALL failures (such as EIO) and for all > drivers, even when that was not previously attempted and increases the > traffic. I think we should revert this patch and instead fix the > fallocate() path to convert whatever ACTUAL errno you got from unaligned > fallocate failure into ENOTSUP (that is, just the file-posix.c location > that failed), while leaving all other errors as immediately fatal. And the rest of the thread worried about that exact scenario. Here's how I encountered it. I was trying to debug the nbdkit sh plugin, with: $ cat >script <<\EOF case $1 in get_size) echo 1m;; pread) false;; can_write|can_zero) ;; pwrite) ;; zero) echo ENOTSUP; exit 1 ;; *) exit 2;; esac EOF (the script has a subtle bug; zero) should be using 'echo ENOTSUP >&2', but because it didn't, nbdkit treats the failure as EIO rather than the intended ENOTSUP) coupled with: $ qemu-io -f raw nbd://localhost:10810 -c 'w -z 0 1' but when the script fails with EIO and qemu-io reported that the write was still successful, I was confused (I was debugging a server-side fallback to write, not a client-side one), until I discovered that we changed the semantics in qemu 4.1 that EIO is no longer fatal and attempts the write fallback.
On 8/17/19 9:49 AM, Eric Blake wrote: >> This change is a regression of sorts. Now, you are unconditionally >> attempting the fallback for ALL failures (such as EIO) and for all >> drivers, even when that was not previously attempted and increases the >> traffic. I think we should revert this patch and instead fix the >> fallocate() path to convert whatever ACTUAL errno you got from unaligned >> fallocate failure into ENOTSUP (that is, just the file-posix.c location >> that failed), while leaving all other errors as immediately fatal. Or even better, fix the call site of fallocate() to skip attempting an unaligned fallocate(), and just directly return ENOTSUP, rather than trying to diagnose EINVAL after the fact.
On 8/17/19 5:56 PM, Eric Blake wrote: > On 8/17/19 9:49 AM, Eric Blake wrote: > >>> This change is a regression of sorts. Now, you are unconditionally >>> attempting the fallback for ALL failures (such as EIO) and for all >>> drivers, even when that was not previously attempted and increases the >>> traffic. I think we should revert this patch and instead fix the >>> fallocate() path to convert whatever ACTUAL errno you got from unaligned >>> fallocate failure into ENOTSUP (that is, just the file-posix.c location >>> that failed), while leaving all other errors as immediately fatal. > Or even better, fix the call site of fallocate() to skip attempting an > unaligned fallocate(), and just directly return ENOTSUP, rather than > trying to diagnose EINVAL after the fact. > No way. Single ENOTSUP will turn off fallocate() support on caller side while aligned (99.99% of calls) works normally. Den
On 8/19/19 2:46 PM, Denis V. Lunev wrote: > On 8/17/19 5:56 PM, Eric Blake wrote: >> On 8/17/19 9:49 AM, Eric Blake wrote: >> >>>> This change is a regression of sorts. Now, you are unconditionally >>>> attempting the fallback for ALL failures (such as EIO) and for all >>>> drivers, even when that was not previously attempted and increases the >>>> traffic. I think we should revert this patch and instead fix the >>>> fallocate() path to convert whatever ACTUAL errno you got from unaligned >>>> fallocate failure into ENOTSUP (that is, just the file-posix.c location >>>> that failed), while leaving all other errors as immediately fatal. >> Or even better, fix the call site of fallocate() to skip attempting an >> unaligned fallocate(), and just directly return ENOTSUP, rather than >> trying to diagnose EINVAL after the fact. >> > No way. Single ENOTSUP will turn off fallocate() support on caller side > while > aligned (99.99% of calls) works normally. I didn't mean skip fallocate() unconditionally, only when unaligned: if (request not aligned enough) return -ENOTSUP; fallocate() ... so that the 99.99% requests that ARE aligned get to use fallocate() normally.
On 8/19/19 11:30 PM, Eric Blake wrote: > On 8/19/19 2:46 PM, Denis V. Lunev wrote: >> On 8/17/19 5:56 PM, Eric Blake wrote: >>> On 8/17/19 9:49 AM, Eric Blake wrote: >>> >>>>> This change is a regression of sorts. Now, you are unconditionally >>>>> attempting the fallback for ALL failures (such as EIO) and for all >>>>> drivers, even when that was not previously attempted and increases the >>>>> traffic. I think we should revert this patch and instead fix the >>>>> fallocate() path to convert whatever ACTUAL errno you got from unaligned >>>>> fallocate failure into ENOTSUP (that is, just the file-posix.c location >>>>> that failed), while leaving all other errors as immediately fatal. >>> Or even better, fix the call site of fallocate() to skip attempting an >>> unaligned fallocate(), and just directly return ENOTSUP, rather than >>> trying to diagnose EINVAL after the fact. >>> >> No way. Single ENOTSUP will turn off fallocate() support on caller side >> while >> aligned (99.99% of calls) works normally. > I didn't mean skip fallocate() unconditionally, only when unaligned: > > if (request not aligned enough) > return -ENOTSUP; > fallocate() ... > > so that the 99.99% requests that ARE aligned get to use fallocate() > normally. > static int handle_aiocb_write_zeroes(void *opaque) { ... #ifdef CONFIG_FALLOCATE_ZERO_RANGE if (s->has_write_zeroes) { int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE, aiocb->aio_offset, aiocb->aio_nbytes); if (ret == 0 || ret != -ENOTSUP) { return ret; } s->has_write_zeroes = false; } #endif thus, right now, single ENOTSUP disables fallocate functionality completely setting s->has_write_zeroes to false and that is pretty much correct. ENOTSUP is "static" error code which returns persistent ENOTSUP under any consequences. Its handling usually disables some functionality. This is why original idea is proposed. Den
On 8/19/19 3:53 PM, Denis V. Lunev wrote: >>>> Or even better, fix the call site of fallocate() to skip attempting an >>>> unaligned fallocate(), and just directly return ENOTSUP, rather than >>>> trying to diagnose EINVAL after the fact. >>>> >>> No way. Single ENOTSUP will turn off fallocate() support on caller side >>> while >>> aligned (99.99% of calls) works normally. >> I didn't mean skip fallocate() unconditionally, only when unaligned: >> >> if (request not aligned enough) >> return -ENOTSUP; >> fallocate() ... >> >> so that the 99.99% requests that ARE aligned get to use fallocate() >> normally. >> > static int handle_aiocb_write_zeroes(void *opaque) > { > ... > #ifdef CONFIG_FALLOCATE_ZERO_RANGE > if (s->has_write_zeroes) { > int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE, > aiocb->aio_offset, aiocb->aio_nbytes); > if (ret == 0 || ret != -ENOTSUP) { > return ret; > } > s->has_write_zeroes = false; > } > #endif > > thus, right now, single ENOTSUP disables fallocate > functionality completely setting s->has_write_zeroes > to false and that is pretty much correct. > > ENOTSUP is "static" error code which returns persistent > ENOTSUP under any consequences. Not always true. And the block layer doesn't expect it to be true. It is perfectly fine for one invocation to return ENOTSUP ('I can't handle this request, so fall back to pwrite for me) and the next to just work ('this one was aligned, so I handled it just fine). It just means that you have to be more careful with the logic: never set s->has_write_zeroes=false if you skipped the fallocate, or if the fallocate failed due to EINVAL rather than ENOTSUP (but still report ENOTSUP to the block layer, to document that you want the EINVAL for unaligned request to be turned into a fallback to pwrite).
diff --git a/block/io.c b/block/io.c index dfc153b..0412a51 100644 --- a/block/io.c +++ b/block/io.c @@ -1516,7 +1516,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, assert(!bs->supported_zero_flags); } - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) { + if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) { /* Fall back to bounce buffer if write zeroes is unsupported */ BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
On a file system used by the customer, fallocate() returns an error if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() fails. We can handle that case the same way as it is done for the unsupported cases, namely, call to bdrv_driver_pwritev() that writes zeroes to an image for the unaligned chunk of the block. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)