
block: Don't lose FUA flag during ZERO_WRITE fallback

Message ID 1462052936-16933-1-git-send-email-eblake@redhat.com (mailing list archive)
State New, archived

Commit Message

Eric Blake April 30, 2016, 9:48 p.m. UTC
NBD has situations where it can support FUA but not ZERO_WRITE;
when that happens, the generic block layer fallback was losing
the FUA flag.  The problem of losing flags unrelated to
ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
aa7bfbff, but back then, it did not matter because there was no
FUA flag.  But ever since 93f5e6d8 added bdrv_co_writev_flags(),
the loss of flags can impact correctness.

Compare to commit 9eeb6dd, which got it right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targeted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the
missing flush meant that an ill-timed disconnect could leave
the zeroes unflushed.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
---

As written, this patch applies to 2.7 on top of Kevin's block-next
branch.  Since it's (probably) too late for 2.6, we'll need to
backport it there, but the backport will have to use
bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev().

 block/io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Kevin Wolf May 2, 2016, 3:35 p.m. UTC | #1
On 30.04.2016 at 23:48, Eric Blake wrote:
> NBD has situations where it can support FUA but not ZERO_WRITE;
> when that happens, the generic block layer fallback was losing
> the FUA flag.  The problem of losing flags unrelated to
> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
> aa7bfbff, but back then, it did not matter because there was no
> FUA flag.  But ever since 93f5e6d8 added bdrv_co_writev_flags(),
> the loss of flags can impact correctness.
> 
> Compare to commit 9eeb6dd, which got it right in
> bdrv_co_do_zero_pwritev().
> 
> Symptoms: I tested with qemu-io with default writethrough cache
> (which is supposed to use FUA semantics on every write), and
> targeted an NBD client connected to a server that intentionally
> did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
> the NBD client sent two operations (NBD_CMD_WRITE then
> NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
> 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the
> missing flush meant that an ill-timed disconnect could leave
> the zeroes unflushed.
> 
> CC: qemu-stable@nongnu.org
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> As written, this patch applies to 2.7 on top of Kevin's block-next
> branch.  Since it's (probably) too late for 2.6, we'll need to
> backport it to there, but the backport will have to use
> bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev().
> 
>  block/io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 0db1146..bd46e47 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>              qemu_iovec_init_external(&qiov, &iov, 1);
> 
>              ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
> -                                      num * BDRV_SECTOR_SIZE, &qiov, 0);
> +                                      num * BDRV_SECTOR_SIZE, &qiov,
> +                                      flags & ~BDRV_REQ_ZERO_WRITE);

This is a good change, but it's only in the fallback code. If we succeed
here:

    if (drv->bdrv_co_write_zeroes) {
        ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags);
    }

then we still don't get the necessary flush unless the driver's
.bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
I know, we don't have any driver that implements FUA there.
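
Roughly, I would expect something along these lines on top (completely
untested sketch, just to illustrate the idea; a real patch would also
have to skip the flush for a driver that handles FUA natively):

    bool need_flush = false;

    if (drv->bdrv_co_write_zeroes) {
        ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags);
        if (ret == 0 && (flags & BDRV_REQ_FUA)) {
            /* zeroes were written, but nobody has honoured FUA yet */
            need_flush = true;
        }
    }

    /* ... and once at the end, after the whole request is written: */
    if (ret == 0 && need_flush) {
        ret = bdrv_co_flush(bs);
    }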

Kevin
Eric Blake May 2, 2016, 3:42 p.m. UTC | #2
On 05/02/2016 09:35 AM, Kevin Wolf wrote:
> On 30.04.2016 at 23:48, Eric Blake wrote:
>> NBD has situations where it can support FUA but not ZERO_WRITE;
>> when that happens, the generic block layer fallback was losing
>> the FUA flag.  The problem of losing flags unrelated to
>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
>> aa7bfbff, but back then, it did not matter because there was no
>> FUA flag.  But ever since 93f5e6d8 added bdrv_co_writev_flags(),
>> the loss of flags can impact correctness.
>>

>> +++ b/block/io.c
>> @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>              qemu_iovec_init_external(&qiov, &iov, 1);
>>
>>              ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
>> -                                      num * BDRV_SECTOR_SIZE, &qiov, 0);
>> +                                      num * BDRV_SECTOR_SIZE, &qiov,
>> +                                      flags & ~BDRV_REQ_ZERO_WRITE);
> 
> This is a good change, but it's only in the fallback code. If we succeed
> here:
> 
>     if (drv->bdrv_co_write_zeroes) {
>         ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags);
>     }
> 
> then we still don't get the necessary flush unless the driver's
> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
> I know, we don't have any driver that implements FUA there.

NBD will, once we get to that part of my series.

But I see what you're saying: since we already emulate FUA for all
drivers where .supported_write_flags does not include BDRV_REQ_FUA
during bdrv_driver_pwritev(), we should also emulate it here for all the
same drivers (and any driver that DOES advertise BDRV_REQ_FUA as
supported as well as a write_zeroes callback should be fixed to honor
it).  I'll do that in v2, which I guess means I should post it at the
same time as my work for making .supported_write_flags a per-bds rather
than per-driver designation.
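
For reference, the emulation I'm referring to works conceptually like
this (sketch from memory, not the exact code):

    /* pass through only the flags the driver claims to support ... */
    ret = drv->bdrv_co_writev_flags(bs, sector_num, nb_sectors, qiov,
                                    flags & drv->supported_write_flags);
    flags &= ~drv->supported_write_flags;

    /* ... and emulate whatever is left over, which today is just FUA */
    if (ret == 0 && (flags & BDRV_REQ_FUA)) {
        ret = bdrv_co_flush(bs);
    }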
Eric Blake May 2, 2016, 5:14 p.m. UTC | #3
On 05/02/2016 09:42 AM, Eric Blake wrote:
> On 05/02/2016 09:35 AM, Kevin Wolf wrote:
>> On 30.04.2016 at 23:48, Eric Blake wrote:
>>> NBD has situations where it can support FUA but not ZERO_WRITE;
>>> when that happens, the generic block layer fallback was losing
>>> the FUA flag.  The problem of losing flags unrelated to
>>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
>>> aa7bfbff, but back then, it did not matter because there was no
>>> FUA flag.  But ever since 93f5e6d8 added bdrv_co_writev_flags(),
>>> the loss of flags can impact correctness.
>>>
> 

>> then we still don't get the necessary flush unless the driver's
>> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
>> I know, we don't have any driver that implements FUA there.
> 
> NBD will, once we get to that part of my series.

And looking further, it looks like SCSI does NOT support FUA during
WRITESAME(10/16), only during WRITE(10/16).  Which means we probably
want to start having both .supported_write_flags AND
.supported_write_zero flags, so that at least the iscsi driver can
properly report that it does NOT natively support FUA on a write_zeroes
request.

> 
> But I see what you're saying: since we already emulate FUA for all
> drivers where .supported_write_flags does not include BDRV_REQ_FUA
> during bdrv_driver_pwritev(), we should also emulate it here for all the
> same drivers (and any driver that DOES advertise BDRV_REQ_FUA as
> supported as well as a write_zeroes callback should be fixed to honor
> it).  I'll do that in v2, which I guess means I should post it at the
> same time as my work for making .supported_write_flags a per-bds rather
> than per-driver designation.

It also looks like SCSI would benefit from a per-bds
supported_write_flags, as our iscsi_modesense_sync() is setting
iscsilun->dpofua conditionally; likewise, iscsi_co_writev_flags blindly
ignores BDRV_REQ_FUA if iscsilun->dpofua is not set.  Which means that
not all iSCSI devices support the FUA bit in WRITE(10/16) requests, and
therefore we should not blindly assume that FUA worked on those devices
where it was not sensed.  So just like NBD, iscsi has
cases where we need an automatic FUA fallback at the block layer,
regardless of whether we also need the fallback for WRITESAME(10/16).
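
Concretely, with per-bds fields, iscsi_open() could end up doing
something like this (hypothetical sketch, field names to be decided):

    /* after iscsi_modesense_sync() has probed whether DPO/FUA is usable */
    if (iscsilun->dpofua) {
        bs->supported_write_flags = BDRV_REQ_FUA;
    }
    /* WRITE SAME(10/16) has no FUA bit, so never advertise FUA here */
    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;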
Kevin Wolf May 3, 2016, 7:55 a.m. UTC | #4
On 02.05.2016 at 19:14, Eric Blake wrote:
> On 05/02/2016 09:42 AM, Eric Blake wrote:
> > On 05/02/2016 09:35 AM, Kevin Wolf wrote:
> >> On 30.04.2016 at 23:48, Eric Blake wrote:
> >>> NBD has situations where it can support FUA but not ZERO_WRITE;
> >>> when that happens, the generic block layer fallback was losing
> >>> the FUA flag.  The problem of losing flags unrelated to
> >>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
> >>> aa7bfbff, but back then, it did not matter because there was no
> >>> FUA flag.  But ever since 93f5e6d8 added bdrv_co_writev_flags(),
> >>> the loss of flags can impact correctness.
> >>>
> > 
> 
> >> then we still don't get the necessary flush unless the driver's
> >> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
> >> I know, we don't have any driver that implements FUA there.
> > 
> > NBD will, once we get to that part of my series.
> 
> And looking further, it looks like SCSI does NOT support FUA during
> WRITESAME(10/16), only during WRITE(10/16).  Which means we probably
> want to start having both .supported_write_flags AND
> .supported_write_zero flags, so that at least the iscsi driver can
> properly report that it does NOT natively support FUA on a write_zeroes
> request.

Hm, not sure if it makes sense, but let me write that thought down:

You're going to convert .supported_write_flags to a function anyway.
Instead of adding a second function, how about passing a set of flags
in and having the function return the subset it can support? For write
zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and
in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE.

If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use
.bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the
most consistent interface.
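
Something with this kind of shape (interface sketch only, the name is
made up):

    /* returns the subset of @flags the driver can handle natively for
     * the given request */
    unsigned int (*bdrv_req_supported_flags)(BlockDriverState *bs,
                                             unsigned int flags);

    /* iscsi, for example, would drop FUA whenever zeroes are requested: */
    static unsigned int iscsi_req_supported_flags(BlockDriverState *bs,
                                                  unsigned int flags)
    {
        if (flags & BDRV_REQ_ZERO_WRITE) {
            flags &= ~BDRV_REQ_FUA;   /* WRITE SAME has no FUA bit */
        }
        return flags;
    }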

Kevin
Eric Blake May 3, 2016, 12:28 p.m. UTC | #5
On 05/03/2016 01:55 AM, Kevin Wolf wrote:
>> And looking further, it looks like SCSI does NOT support FUA during
>> WRITESAME(10/16), only during WRITE(10/16).  Which means we probably
>> want to start having both .supported_write_flags AND
>> .supported_write_zero flags, so that at least the iscsi driver can
>> properly report that it does NOT natively support FUA on a write_zeroes
>> request.
> 
> Hm, not sure if it makes sense, but let me write that thought down:
> 
> You're going to convert .supported_write_flags to a function anyway.
> Instead of adding a second function, how about passing a set of flags
> there and the function returns a subset that it can support? For write
> zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and
> in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE.
> 
> If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use
> .bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the
> most consistent interface.

I'm currently leaning towards two flags rather than a function (the
flags are set during .bdrv_open(), and don't change during the life of
the BDS):

bs->supported_write_flags = BDRV_REQ_FUA;
bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP; // | BDRV_REQ_FUA

and keeping BDRV_REQ_ZERO_WRITE as a special case in the block layer
that chooses between .bdrv_co_write_zeroes() or bdrv_driver_pwritev().
MAY_UNMAP makes no sense on a normal write, just as iscsi doesn't allow
FUA on a zero write, so having the two sets of valid flags according to
operations, as well as the two driver entries, seems to make the most
sense.  Then in io.c, we ignore MAY_UNMAP where it is unsupported, and
emulate FUA on both write and write_zero as needed.
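
So the zero-write path in io.c would end up looking roughly like this
(untested sketch):

    if (drv->bdrv_co_write_zeroes) {
        /* only hand the driver the flags it advertises for zero writes */
        ret = drv->bdrv_co_write_zeroes(bs, sector_num, num,
                                        flags & bs->supported_zero_flags);
        if (ret == 0 && (flags & BDRV_REQ_FUA) &&
            !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
            /* zeroes landed, but FUA still needs emulating */
            need_flush = true;
        }
    } else {
        ret = -ENOTSUP;    /* fall back to bdrv_driver_pwritev() */
    }

with a single bdrv_co_flush() at the end of bdrv_co_do_write_zeroes()
when need_flush was set.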

Of course, if we ever have a driver that can support everything through
a single entry, it could advertise supported_write_flags = BDRV_REQ_FUA
| BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP and provide only
a single write callback, but I don't think that will be happening.

Patch

diff --git a/block/io.c b/block/io.c
index 0db1146..bd46e47 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
             qemu_iovec_init_external(&qiov, &iov, 1);

             ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
-                                      num * BDRV_SECTOR_SIZE, &qiov, 0);
+                                      num * BDRV_SECTOR_SIZE, &qiov,
+                                      flags & ~BDRV_REQ_ZERO_WRITE);

             /* Keep bounce buffer around if it is big enough for all
              * all future requests.