Message ID | 1462052936-16933-1-git-send-email-eblake@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 30.04.2016 um 23:48 hat Eric Blake geschrieben: > NBD has situations where it can support FUA but not ZERO_WRITE; > when that happens, the generic block layer fallback was losing > the FUA flag. The problem of losing flags unrelated to > ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since > aa7bfbff, but back then, it did not matter because there was no > FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(), > the loss of flags can impact correctness. > > Compare to commit 9eeb6dd, which got it right in > bdrv_co_do_zero_pwritev(). > > Symptoms: I tested with qemu-io with default writethrough cache > (which is supposed to use FUA semantics on every write), and > targetted an NBD client connected to a server that intentionally > did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', > the NBD client sent two operations (NBD_CMD_WRITE then > NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing > 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the > missing flush meant that an ill-timed disconnect could leave > the zeroes unflushed. > > CC: qemu-stable@nongnu.org > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > As written, this patch applies to 2.7 on top of Kevin's block-next > branch. Since it's (probably) too late for 2.6, we'll need to > backport it to there, but the backport will have to use > bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev(). > > block/io.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/block/io.c b/block/io.c > index 0db1146..bd46e47 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, > qemu_iovec_init_external(&qiov, &iov, 1); > > ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE, > - num * BDRV_SECTOR_SIZE, &qiov, 0); > + num * BDRV_SECTOR_SIZE, &qiov, > + flags & ~BDRV_REQ_ZERO_WRITE); This is a good change, but it's only in the fallback code. If we succeed here: if (drv->bdrv_co_write_zeroes) { ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags); } then we still don't get the necessary flush unless the driver's .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as I know, we don't have any driver that implements FUA there. Kevin
On 05/02/2016 09:35 AM, Kevin Wolf wrote: > Am 30.04.2016 um 23:48 hat Eric Blake geschrieben: >> NBD has situations where it can support FUA but not ZERO_WRITE; >> when that happens, the generic block layer fallback was losing >> the FUA flag. The problem of losing flags unrelated to >> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since >> aa7bfbff, but back then, it did not matter because there was no >> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(), >> the loss of flags can impact correctness. >> >> +++ b/block/io.c >> @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, >> qemu_iovec_init_external(&qiov, &iov, 1); >> >> ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE, >> - num * BDRV_SECTOR_SIZE, &qiov, 0); >> + num * BDRV_SECTOR_SIZE, &qiov, >> + flags & ~BDRV_REQ_ZERO_WRITE); > > This is a good change, but it's only in the fallback code. If we succeed > here: > > if (drv->bdrv_co_write_zeroes) { > ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags); > } > > then we still don't get the necessary flush unless the driver's > .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as > I know, we don't have any driver that implements FUA there. NBD will, once we get to that part of my series. But I see what you're saying: since we already emulate FUA for all drivers where .supported_write_flags does not include BDRV_REQ_FUA during bdrv_driver_pwritev(), we should also emulate it here for all the same drivers (and any driver that DOES advertise BDRV_REQ_FUA as supported as well as a write_zeroes callback should be fixed to honor it). I'll do that in v2, which I guess means I should post it at the same time as my work for making .supported_write_flags a per-bds rather than per-driver designation.
On 05/02/2016 09:42 AM, Eric Blake wrote: > On 05/02/2016 09:35 AM, Kevin Wolf wrote: >> Am 30.04.2016 um 23:48 hat Eric Blake geschrieben: >>> NBD has situations where it can support FUA but not ZERO_WRITE; >>> when that happens, the generic block layer fallback was losing >>> the FUA flag. The problem of losing flags unrelated to >>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since >>> aa7bfbff, but back then, it did not matter because there was no >>> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(), >>> the loss of flags can impact correctness. >>> > >> then we still don't get the necessary flush unless the driver's >> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as >> I know, we don't have any driver that implements FUA there. > > NBD will, once we get to that part of my series. And looking further, it looks like SCSI does NOT support FUA during WRITESAME(10/16), only during WRITE(10/16). Which means we probably want to start having both .supported_write_flags AND .supported_write_zero flags, so that at least the iscsi driver can properly report that it does NOT natively support FUA on a write_zeroes request. > > But I see what you're saying: since we already emulate FUA for all > drivers where .supported_write_flags does not include BDRV_REQ_FUA > during bdrv_driver_pwritev(), we should also emulate it here for all the > same drivers (and any driver that DOES advertise BDRV_REQ_FUA as > supported as well as a write_zeroes callback should be fixed to honor > it). I'll do that in v2, which I guess means I should post it at the > same time as my work for making .supported_write_flags a per-bds rather > than per-driver designation. It also looks like SCSI would benefit from a per-bds supported_write_flags, as our iscsi_modesense_sync() is setting iscsilun->dpofua conditionally; likewise, iscsi_co_writev_flags blindly ignores BDRV_REQ_FUA if iscsilun->dpofua is not set. Which means that not all ISCSI devices support the FUA bit in WRITE(10/16) requests, and that therefore we should not be blindly assuming that FUA worked on those devices where it was not sensed. So just like NBD, iscsi has cases where we need an automatic FUA fallback at the block layer, regardless of whether we also need the fallback for WRITESAME(10/16).
Am 02.05.2016 um 19:14 hat Eric Blake geschrieben: > On 05/02/2016 09:42 AM, Eric Blake wrote: > > On 05/02/2016 09:35 AM, Kevin Wolf wrote: > >> Am 30.04.2016 um 23:48 hat Eric Blake geschrieben: > >>> NBD has situations where it can support FUA but not ZERO_WRITE; > >>> when that happens, the generic block layer fallback was losing > >>> the FUA flag. The problem of losing flags unrelated to > >>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since > >>> aa7bfbff, but back then, it did not matter because there was no > >>> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(), > >>> the loss of flags can impact correctness. > >>> > > > > >> then we still don't get the necessary flush unless the driver's > >> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as > >> I know, we don't have any driver that implements FUA there. > > > > NBD will, once we get to that part of my series. > > And looking further, it looks like SCSI does NOT support FUA during > WRITESAME(10/16), only during WRITE(10/16). Which means we probably > want to start having both .supported_write_flags AND > .supported_write_zero flags, so that at least the iscsi driver can > properly report that it does NOT natively support FUA on a write_zeroes > request. Hm, not sure if it makes sense, but let me write that thought down: You're going to convert .supported_write_flags to a function anyway. Instead of adding a second function, how about passing a set of flags there and the function returns a subset that it can support? For write zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE. If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use .bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the most consistent interface. Kevin
On 05/03/2016 01:55 AM, Kevin Wolf wrote: >> And looking further, it looks like SCSI does NOT support FUA during >> WRITESAME(10/16), only during WRITE(10/16). Which means we probably >> want to start having both .supported_write_flags AND >> .supported_write_zero flags, so that at least the iscsi driver can >> properly report that it does NOT natively support FUA on a write_zeroes >> request. > > Hm, not sure if it makes sense, but let me write that thought down: > > You're going to convert .supported_write_flags to a function anyway. > Instead of adding a second function, how about passing a set of flags > there and the function returns a subset that it can support? For write > zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and > in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE. > > If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use > .bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the > most consistent interface. I'm currently leaning towards two flags rather than a function (the flags are set during .bdrv_open(), and don't change during the life of the BDS): bs->supported_write_flags = BDRV_REQ_FUA; bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP; // | BDRV_REQ_FUA and keeping the BDRV_REQ_ZERO_WRITE as a special case in the block layer that chooses between .bdrv_co_write_zeroes() or .bdrv_driver_pwrite(). MAY_UNMAP makes no sense on a normal write, just as iscsi doesn't allow FUA on a zero write, so having the two sets of valid flags according to operations, as well as the two driver entries, seems to make the most sense. Then io.c, we ignore MAY_UNMAP where it is unsupported, and emulate FUA on both write and write_zero as needed. Of course, if we ever have a driver that can support everything through a single entry, it could advertise supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP and provide only .bdrv_driver_pwrite(), but I don't think that will be happening.
diff --git a/block/io.c b/block/io.c index 0db1146..bd46e47 100644 --- a/block/io.c +++ b/block/io.c @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, qemu_iovec_init_external(&qiov, &iov, 1); ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE, - num * BDRV_SECTOR_SIZE, &qiov, 0); + num * BDRV_SECTOR_SIZE, &qiov, + flags & ~BDRV_REQ_ZERO_WRITE); /* Keep bounce buffer around if it is big enough for all * all future requests.
NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since aa7bfbff, but back then, it did not matter because there was no FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(), the loss of flags can impact correctness. Compare to commit 9eeb6dd, which got it right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targetted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the missing flush meant that an ill-timed disconnect could leave the zeroes unflushed. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> --- As written, this patch applies to 2.7 on top of Kevin's block-next branch. Since it's (probably) too late for 2.6, we'll need to backport it to there, but the backport will have to use bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev(). block/io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)