Message ID | 20190503163028.213823-1-sgarzare@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] block/rbd: increase dynamically the image size |

On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> RBD APIs don't allow us to write more than the size set with
> rbd_create() or rbd_resize().
> In order to support growing images (eg. qcow2), we resize the
> image before write operations that exceed the current size.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> v2:
>   - use bs->total_sectors instead of adding a new field [Kevin]
>   - resize the image only during write operation [Kevin]
>     for read operation, the bdrv_aligned_preadv() already handles reads
>     that exceed the length returned by bdrv_getlength(), so IMHO we can
>     avoid to handle it in the rbd driver
> ---
>  block/rbd.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 0c549c9935..613e8f4982 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
>      }
>
>      switch (cmd) {
> -    case RBD_AIO_WRITE:
> +    case RBD_AIO_WRITE: {
> +        /*
> +         * RBD APIs don't allow us to write more than actual size, so in order
> +         * to support growing images, we resize the image before write
> +         * operations that exceed the current size.
> +         */
> +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {

When will "bs->total_sectors" be refreshed to represent the correct
current size? You wouldn't want a future write whose extent was
greater than the original image size but less than a previous IO that
expanded the image to attempt to shrink the image.

> +            r = rbd_resize(s->image, off + size);
> +            if (r < 0) {
> +                goto failed_completion;
> +            }
> +        }
>  #ifdef LIBRBD_SUPPORTS_IOVEC
>          r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
>  #else
>          r = rbd_aio_write(s->image, off, size, rcb->buf, c);
>  #endif
>          break;
> +    }
>      case RBD_AIO_READ:
>  #ifdef LIBRBD_SUPPORTS_IOVEC
>          r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
> --
> 2.20.1
>
>
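
The shrink hazard raised in this review can be illustrated with a small, self-contained sketch. It does not use librbd or QEMU: MockImage, mock_resize(), write_with_stale_check() and replay_shrink_scenario() are hypothetical names invented for this illustration, and cached_size stands in for "bs->total_sectors * BDRV_SECTOR_SIZE" under the assumption that it is not refreshed between the two writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an RBD image; only its size is modelled. */
typedef struct {
    uint64_t size;
} MockImage;

/* Assumption: like a resize call, this sets the size unconditionally,
 * so a smaller value shrinks the image. */
static int mock_resize(MockImage *img, uint64_t new_size)
{
    img->size = new_size;
    return 0;
}

/*
 * Mirrors the check in the v2 patch: resize when the write ends past the
 * size the caller believes the image has. cached_size is never updated here.
 */
static int write_with_stale_check(MockImage *img, uint64_t cached_size,
                                  uint64_t off, uint64_t len)
{
    if (off + len > cached_size) {
        return mock_resize(img, off + len);
    }
    return 0;
}

/* Replays two writes against a stale cached size; returns the final size. */
static uint64_t replay_shrink_scenario(void)
{
    MockImage img = { .size = 100 };
    const uint64_t cached_size = 100;   /* stale for the second write */

    /* First write ends at 200, so the image grows to 200. */
    write_with_stale_check(&img, cached_size, 150, 50);
    assert(img.size == 200);

    /* Second write ends at 120: past the stale size, below the real one. */
    write_with_stale_check(&img, cached_size, 100, 20);
    return img.size;
}
```

In this model the second write resizes the image to 120 bytes even though an earlier write had already grown it to 200, which is the unintended shrink described above.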

On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > RBD APIs don't allow us to write more than the size set with
> > rbd_create() or rbd_resize().
> > In order to support growing images (eg. qcow2), we resize the
> > image before write operations that exceed the current size.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > v2:
> >   - use bs->total_sectors instead of adding a new field [Kevin]
> >   - resize the image only during write operation [Kevin]
> >     for read operation, the bdrv_aligned_preadv() already handles reads
> >     that exceed the length returned by bdrv_getlength(), so IMHO we can
> >     avoid to handle it in the rbd driver
> > ---
> >  block/rbd.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 0c549c9935..613e8f4982 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> >      }
> >
> >      switch (cmd) {
> > -    case RBD_AIO_WRITE:
> > +    case RBD_AIO_WRITE: {
> > +        /*
> > +         * RBD APIs don't allow us to write more than actual size, so in order
> > +         * to support growing images, we resize the image before write
> > +         * operations that exceed the current size.
> > +         */
> > +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
>
> When will "bs->total_sectors" be refreshed to represent the correct
> current size? You wouldn't want a future write whose extent was
> greater than the original image size but less than a previous IO that
> expanded the image to attempt to shrink the image.
>

Good point!
IIUC it can happen, because in bdrv_aligned_pwritev() we do these
steps:
1. call bdrv_driver_pwritev(), which invokes "drv->bdrv_aio_pwritev" and
   then waits by calling "qemu_coroutine_yield()"
2. call bdrv_co_write_req_finish(), which updates "bs->total_sectors"

Between steps 1 and 2, another request may be executed, and then the
issue that you described can occur.

The solutions that I have in mind are:
a. Add a variable in the BDRVRBDState to track the latest resize.
b. Call rbd_get_size() before rbd_resize() to be sure to avoid shrinking
   the image.
c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
   sure it is allowed.

@Jason, @Kevin Do you have any advice?

Thanks,
Stefano
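
Option (a) from the list above can be sketched as follows. This is only an illustration under stated assumptions, not the v3 patch: MockImage, MockRBDState, its image_size field, mock_resize(), grow_for_write() and replay_with_tracked_size() are hypothetical names, and the real driver state (BDRVRBDState) and librbd calls are replaced by mocks so that the snippet is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an RBD image; only its size is modelled. */
typedef struct {
    uint64_t size;
} MockImage;

/* Assumption: sets the size unconditionally, so it could also shrink. */
static int mock_resize(MockImage *img, uint64_t new_size)
{
    img->size = new_size;
    return 0;
}

/*
 * Hypothetical driver state. image_size is the extra variable from
 * option (a): the size the driver itself last resized the image to.
 */
typedef struct {
    MockImage image;
    uint64_t image_size;
} MockRBDState;

/* Grow the image only when the write ends past the tracked size. */
static int grow_for_write(MockRBDState *s, uint64_t off, uint64_t len)
{
    uint64_t end = off + len;   /* overflow handling omitted in this sketch */

    if (end > s->image_size) {
        int r = mock_resize(&s->image, end);
        if (r < 0) {
            return r;
        }
        /* Record the new size right away, before any later request. */
        s->image_size = end;
    }
    return 0;
}

/* Replays the same two writes as the shrink scenario; returns final size. */
static uint64_t replay_with_tracked_size(void)
{
    MockRBDState s = { .image = { .size = 100 }, .image_size = 100 };

    grow_for_write(&s, 150, 50);    /* grows the image to 200 */
    grow_for_write(&s, 100, 20);    /* ends at 120 < 200: no resize */
    return s.image.size;
}
```

Because the tracked size is updated in the same place where the resize is issued, a later write that ends below it no longer triggers a resize in this model.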

On 06.05.2019 at 11:50, Stefano Garzarella wrote:
> On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >
> > > RBD APIs don't allow us to write more than the size set with
> > > rbd_create() or rbd_resize().
> > > In order to support growing images (eg. qcow2), we resize the
> > > image before write operations that exceed the current size.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > v2:
> > >   - use bs->total_sectors instead of adding a new field [Kevin]
> > >   - resize the image only during write operation [Kevin]
> > >     for read operation, the bdrv_aligned_preadv() already handles reads
> > >     that exceed the length returned by bdrv_getlength(), so IMHO we can
> > >     avoid to handle it in the rbd driver
> > > ---
> > >  block/rbd.c | 14 +++++++++++++-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/rbd.c b/block/rbd.c
> > > index 0c549c9935..613e8f4982 100644
> > > --- a/block/rbd.c
> > > +++ b/block/rbd.c
> > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > >      }
> > >
> > >      switch (cmd) {
> > > -    case RBD_AIO_WRITE:
> > > +    case RBD_AIO_WRITE: {
> > > +        /*
> > > +         * RBD APIs don't allow us to write more than actual size, so in order
> > > +         * to support growing images, we resize the image before write
> > > +         * operations that exceed the current size.
> > > +         */
> > > +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> >
> > When will "bs->total_sectors" be refreshed to represent the correct
> > current size? You wouldn't want a future write whose extent was
> > greater than the original image size but less than a previous IO that
> > expanded the image to attempt to shrink the image.
> >
>
> Good point!
> IIUC it can happen, because in bdrv_aligned_pwritev() we do these
> steps:
> 1. call bdrv_driver_pwritev(), which invokes "drv->bdrv_aio_pwritev" and
>    then waits by calling "qemu_coroutine_yield()"
> 2. call bdrv_co_write_req_finish(), which updates "bs->total_sectors"
>
> Between steps 1 and 2, another request may be executed, and then the
> issue that you described can occur.
>
> The solutions that I have in mind are:
> a. Add a variable in the BDRVRBDState to track the latest resize.

This would work and be relatively simple.

> b. Call rbd_get_size() before rbd_resize() to be sure to avoid shrinking
>    the image.

I'm not sure if rbd_get_size() involves network traffic or other
significant complexity. If so, I'd definitely avoid it.

> c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
>    sure it is allowed.
>
> @Jason, @Kevin Do you have any advice?

We need to make sure to run everything that bdrv_co_write_req_finish()
does for resizing an image:

    bs->total_sectors = end_sector;
    bdrv_parent_cb_resize(bs);
    bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);

Just duplicating that code wouldn't be good; if something is added, we'd
probably forget to update rbd, too. So I think your solution c would at
least involve refactoring the above code into a separate function that
can be called from rbd.

But solution a might actually be the simplest. In this case, sorry for
giving you bad advice in v1 of the patch.

Kevin
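
The refactoring mentioned for solution c could look roughly like the sketch below. Everything in it is a mock written for illustration: MockBDS, mock_parent_cb_resize(), mock_dirty_bitmap_truncate(), mock_update_size_after_grow() and SECTOR_BITS are hypothetical stand-ins, not QEMU functions, and the early return for non-growing sizes is a choice made for this sketch only. The point is that the three resize steps live in one helper that both the generic write path and a driver could call.

```c
#include <stdint.h>

#define SECTOR_BITS 9   /* assumption: 512-byte sectors */

/* Hypothetical stand-in for the block driver state. */
typedef struct {
    int64_t total_sectors;
    int parent_resize_calls;    /* counts mock parent notifications */
    int64_t bitmap_size_bytes;  /* size covered by the mock dirty bitmap */
} MockBDS;

/* Mock of notifying parents that the node changed size. */
static void mock_parent_cb_resize(MockBDS *bs)
{
    bs->parent_resize_calls++;
}

/* Mock of adjusting dirty bitmaps to the new size in bytes. */
static void mock_dirty_bitmap_truncate(MockBDS *bs, int64_t bytes)
{
    bs->bitmap_size_bytes = bytes;
}

/*
 * Hypothetical shared helper: the single place that performs every step
 * needed when an image grows, so callers cannot forget one of them.
 */
static void mock_update_size_after_grow(MockBDS *bs, int64_t end_sector)
{
    if (end_sector <= bs->total_sectors) {
        return;     /* nothing grew; never shrink from this path */
    }
    bs->total_sectors = end_sector;
    mock_parent_cb_resize(bs);
    mock_dirty_bitmap_truncate(bs, end_sector << SECTOR_BITS);
}
```

With a helper of this shape, adding a fourth step later would only require touching one function rather than every driver that grows images on write.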

On Tue, May 07, 2019 at 11:43:50AM +0200, Kevin Wolf wrote:
> On 06.05.2019 at 11:50, Stefano Garzarella wrote:
> > On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > >
> > > > RBD APIs don't allow us to write more than the size set with
> > > > rbd_create() or rbd_resize().
> > > > In order to support growing images (eg. qcow2), we resize the
> > > > image before write operations that exceed the current size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > ---
> > > > v2:
> > > >   - use bs->total_sectors instead of adding a new field [Kevin]
> > > >   - resize the image only during write operation [Kevin]
> > > >     for read operation, the bdrv_aligned_preadv() already handles reads
> > > >     that exceed the length returned by bdrv_getlength(), so IMHO we can
> > > >     avoid to handle it in the rbd driver
> > > > ---
> > > >  block/rbd.c | 14 +++++++++++++-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > index 0c549c9935..613e8f4982 100644
> > > > --- a/block/rbd.c
> > > > +++ b/block/rbd.c
> > > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > > >      }
> > > >
> > > >      switch (cmd) {
> > > > -    case RBD_AIO_WRITE:
> > > > +    case RBD_AIO_WRITE: {
> > > > +        /*
> > > > +         * RBD APIs don't allow us to write more than actual size, so in order
> > > > +         * to support growing images, we resize the image before write
> > > > +         * operations that exceed the current size.
> > > > +         */
> > > > +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> > >
> > > When will "bs->total_sectors" be refreshed to represent the correct
> > > current size? You wouldn't want a future write whose extent was
> > > greater than the original image size but less than a previous IO that
> > > expanded the image to attempt to shrink the image.
> > >
> >
> > Good point!
> > IIUC it can happen, because in bdrv_aligned_pwritev() we do these
> > steps:
> > 1. call bdrv_driver_pwritev(), which invokes "drv->bdrv_aio_pwritev" and
> >    then waits by calling "qemu_coroutine_yield()"
> > 2. call bdrv_co_write_req_finish(), which updates "bs->total_sectors"
> >
> > Between steps 1 and 2, another request may be executed, and then the
> > issue that you described can occur.
> >
> > The solutions that I have in mind are:
> > a. Add a variable in the BDRVRBDState to track the latest resize.
>
> This would work and be relatively simple.
>
> > b. Call rbd_get_size() before rbd_resize() to be sure to avoid shrinking
> >    the image.
>
> I'm not sure if rbd_get_size() involves network traffic or other
> significant complexity. If so, I'd definitely avoid it.
>
> > c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
> >    sure it is allowed.
> >
> > @Jason, @Kevin Do you have any advice?
>
> We need to make sure to run everything that bdrv_co_write_req_finish()
> does for resizing an image:
>
>     bs->total_sectors = end_sector;
>     bdrv_parent_cb_resize(bs);
>     bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);
>
> Just duplicating that code wouldn't be good; if something is added, we'd
> probably forget to update rbd, too. So I think your solution c would at
> least involve refactoring the above code into a separate function that
> can be called from rbd.
>
> But solution a might actually be the simplest. In this case, sorry for
> giving you bad advice in v1 of the patch.
>

I agree with you, 'a' should be the simplest to implement.

I'll send a v3 fixing this.

Thanks,
Stefano

diff --git a/block/rbd.c b/block/rbd.c
index 0c549c9935..613e8f4982 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     }
 
     switch (cmd) {
-    case RBD_AIO_WRITE:
+    case RBD_AIO_WRITE: {
+        /*
+         * RBD APIs don't allow us to write more than actual size, so in order
+         * to support growing images, we resize the image before write
+         * operations that exceed the current size.
+         */
+        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
+            r = rbd_resize(s->image, off + size);
+            if (r < 0) {
+                goto failed_completion;
+            }
+        }
 #ifdef LIBRBD_SUPPORTS_IOVEC
         r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
 #else
         r = rbd_aio_write(s->image, off, size, rcb->buf, c);
 #endif
         break;
+    }
     case RBD_AIO_READ:
 #ifdef LIBRBD_SUPPORTS_IOVEC
         r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);

RBD APIs don't allow us to write more than the size set with
rbd_create() or rbd_resize().
In order to support growing images (eg. qcow2), we resize the
image before write operations that exceed the current size.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
v2:
  - use bs->total_sectors instead of adding a new field [Kevin]
  - resize the image only during write operation [Kevin]
    for read operation, the bdrv_aligned_preadv() already handles reads
    that exceed the length returned by bdrv_getlength(), so IMHO we can
    avoid to handle it in the rbd driver
---
 block/rbd.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)