Message ID | 20190411105025.97397-2-sgarzare@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block/rbd: increase dynamically the image size | expand |
On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > RBD APIs don't allow us to write more than the size set with rbd_create() > or rbd_resize(). > In order to support growing images (eg. qcow2), we resize the image > before RW operations that exceed the current size. What's the use-case for storing qcow2 images within a RBD image? RBD images are already thinly provisioned, they support snapshots, they can form a parent/child linked image hierarchy. > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > --- > block/rbd.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/block/rbd.c b/block/rbd.c > index 0c549c9935..228658e20a 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { > rbd_image_t image; > char *image_name; > char *snap; > + uint64_t image_size; > } BDRVRBDState; > > static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, > @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, > goto failed_open; > } > > + r = rbd_get_size(s->image, &s->image_size); > + if (r < 0) { > + error_setg_errno(errp, -r, "error reading image size from %s", > + s->image_name); > + rbd_close(s->image); > + goto failed_open; > + } > + > /* If we are using an rbd snapshot, we must be r/o, otherwise > * leave as-is */ > if (s->snap != NULL) { > @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, > rcb->buf = acb->bounce; > } > > + /* > + * RBD APIs don't allow us to write more than actual size, so in order > + * to support growing images, we resize the image before RW operations > + * that exceed the current size. > + */ > + if (s->image_size < off + size) { > + r = rbd_resize(s->image, off + size); > + if (r < 0) { > + goto failed; > + } > + > + s->image_size = off + size; > + } > + > acb->ret = 0; > acb->error = 0; > acb->s = s; > @@ -1066,6 +1089,8 @@ static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs, > return r; > } > > + s->image_size = offset; > + > return 0; > } > > -- > 2.20.1 > >
On Thu, Apr 11, 2019 at 08:35:44AM -0400, Jason Dillaman wrote: > On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > or rbd_resize(). > > In order to support growing images (eg. qcow2), we resize the image > > before RW operations that exceed the current size. > > What's the use-case for storing qcow2 images within a RBD image? RBD > images are already thinly provisioned, they support snapshots, they > can form a parent/child linked image hierarchy. > Hi Jason, I understand your point of view, maybe one use case could be if you have a qcow2 image and you want to put it in the rdb pool without convert it. I'm going through this BZ [1] and I'll ask if they have other use cases in mind. Anyway, if we want to support only raw format on RBD driver, maybe we should return something different from current behaviour, also avoiding to create the image: $ qemu-img create -f qcow2 rbd:test/qcow.img 1G qemu-img: rbd:test/qcow.img: Could not write qcow2 header: Invalid argument What do you think? Thanks, Stefano [1] https://bugzilla.redhat.com/show_bug.cgi?id=1171007
On Thu, Apr 11, 2019 at 9:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Thu, Apr 11, 2019 at 08:35:44AM -0400, Jason Dillaman wrote: > > On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > or rbd_resize(). > > > In order to support growing images (eg. qcow2), we resize the image > > > before RW operations that exceed the current size. > > > > What's the use-case for storing qcow2 images within a RBD image? RBD > > images are already thinly provisioned, they support snapshots, they > > can form a parent/child linked image hierarchy. > > > > Hi Jason, > I understand your point of view, maybe one use case could be if you have > a qcow2 image and you want to put it in the rdb pool without convert it. > > I'm going through this BZ [1] and I'll ask if they have other > use cases in mind. Assuming no good use-cases, perhaps it would just be better to make the qemu-img error messages more clear. > Anyway, if we want to support only raw format on RBD driver, maybe we > should return something different from current behaviour, also avoiding > to create the image: > > $ qemu-img create -f qcow2 rbd:test/qcow.img 1G > qemu-img: rbd:test/qcow.img: Could not write qcow2 header: Invalid argument > > What do you think? > > Thanks, > Stefano > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1171007
On Thu, Apr 11, 2019 at 01:06:49PM -0400, Jason Dillaman wrote: > On Thu, Apr 11, 2019 at 9:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > On Thu, Apr 11, 2019 at 08:35:44AM -0400, Jason Dillaman wrote: > > > On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > > or rbd_resize(). > > > > In order to support growing images (eg. qcow2), we resize the image > > > > before RW operations that exceed the current size. > > > > > > What's the use-case for storing qcow2 images within a RBD image? RBD > > > images are already thinly provisioned, they support snapshots, they > > > can form a parent/child linked image hierarchy. > > > > > > > Hi Jason, > > I understand your point of view, maybe one use case could be if you have > > a qcow2 image and you want to put it in the rdb pool without convert it. > > > > I'm going through this BZ [1] and I'll ask if they have other > > use cases in mind. > > Assuming no good use-cases, perhaps it would just be better to make > the qemu-img error messages more clear. > Hi Jason, I asked about use-cases and they want to use qcow2 on rbd in order to take advantage of these qcow2 features [1]: external snapshots, Copy-on-write, and optional compression and encryption. Maybe the more interesting are external snapshots and Copy-on-write, since encryption can be achieved with LUKS and rbd should support compression for a specified pool. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1171007#c13 Cheers, Stefano
On Sun, Apr 14, 2019 at 9:20 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Thu, Apr 11, 2019 at 01:06:49PM -0400, Jason Dillaman wrote: > > On Thu, Apr 11, 2019 at 9:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > On Thu, Apr 11, 2019 at 08:35:44AM -0400, Jason Dillaman wrote: > > > > On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > > > or rbd_resize(). > > > > > In order to support growing images (eg. qcow2), we resize the image > > > > > before RW operations that exceed the current size. > > > > > > > > What's the use-case for storing qcow2 images within a RBD image? RBD > > > > images are already thinly provisioned, they support snapshots, they > > > > can form a parent/child linked image hierarchy. > > > > > > > > > > Hi Jason, > > > I understand your point of view, maybe one use case could be if you have > > > a qcow2 image and you want to put it in the rdb pool without convert it. > > > > > > I'm going through this BZ [1] and I'll ask if they have other > > > use cases in mind. > > > > Assuming no good use-cases, perhaps it would just be better to make > > the qemu-img error messages more clear. > > > > Hi Jason, > I asked about use-cases and they want to use qcow2 on rbd in order to > take advantage of these qcow2 features [1]: external snapshots, > Copy-on-write, and optional compression and encryption. > > Maybe the more interesting are external snapshots and Copy-on-write, Copy-on-write is natively supported by RBD. The concept of external snapshots seems similar to just automating the process of creating a new copy-on-write image. Compression is also supported by Ceph on the cluster side by recent releases. > since encryption can be achieved with LUKS and rbd should support > compression for a specified pool. > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1171007#c13 > > Cheers, > Stefano
Am 14.04.2019 um 17:14 hat Jason Dillaman geschrieben: > On Sun, Apr 14, 2019 at 9:20 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > On Thu, Apr 11, 2019 at 01:06:49PM -0400, Jason Dillaman wrote: > > > On Thu, Apr 11, 2019 at 9:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > On Thu, Apr 11, 2019 at 08:35:44AM -0400, Jason Dillaman wrote: > > > > > On Thu, Apr 11, 2019 at 7:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote: > > > > > > > > > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > > > > or rbd_resize(). > > > > > > In order to support growing images (eg. qcow2), we resize the image > > > > > > before RW operations that exceed the current size. > > > > > > > > > > What's the use-case for storing qcow2 images within a RBD image? RBD > > > > > images are already thinly provisioned, they support snapshots, they > > > > > can form a parent/child linked image hierarchy. > > > > > > > > > > > > > Hi Jason, > > > > I understand your point of view, maybe one use case could be if you have > > > > a qcow2 image and you want to put it in the rdb pool without convert it. > > > > > > > > I'm going through this BZ [1] and I'll ask if they have other > > > > use cases in mind. > > > > > > Assuming no good use-cases, perhaps it would just be better to make > > > the qemu-img error messages more clear. > > > > > > > Hi Jason, > > I asked about use-cases and they want to use qcow2 on rbd in order to > > take advantage of these qcow2 features [1]: external snapshots, > > Copy-on-write, and optional compression and encryption. > > > > Maybe the more interesting are external snapshots and Copy-on-write, > > Copy-on-write is natively supported by RBD. The concept of external > snapshots seems similar to just automating the process of creating a > new copy-on-write image. Compression is also supported by Ceph on the > cluster side by recent releases. I think a potential actual use case could be persistent dirty bitmaps for incremental backup. Though maybe that would be better served by using the rbd image just as a raw external data file and keeping the qcow2 metadata on a filesystem. How fast is rbd_resize()? Does automatically resizing for every write request actually work reasonably well in practice? If it does, there is probably little reason not to allow it, even if the use cases are rather obscure. Kevin
On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > I think a potential actual use case could be persistent dirty bitmaps > for incremental backup. Though maybe that would be better served by > using the rbd image just as a raw external data file and keeping the > qcow2 metadata on a filesystem. Thanks to point it out! I'll take a look to understand how to keep metadata separated from the data. > > How fast is rbd_resize()? Does automatically resizing for every write > request actually work reasonably well in practice? If it does, there is > probably little reason not to allow it, even if the use cases are rather > obscure. I'll try to measure the percentage of the time spent in the rbd_resize. Another solution could be to pass to the rbd driver the virtual size of the image and resize it only one time also if the preallocation is disabled, because RBD will not allocate blocks but IIUC it only set the max size. Do you think make sense? Is it feasible? Thanks, Stefano
Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > I think a potential actual use case could be persistent dirty bitmaps > > for incremental backup. Though maybe that would be better served by > > using the rbd image just as a raw external data file and keeping the > > qcow2 metadata on a filesystem. > > Thanks to point it out! I'll take a look to understand how to keep > metadata separated from the data. I'd consider the feature still experimental, but for local files, it works like this: qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G And then just use test.qcow2. As long as you can put everything you need into an rbd URL, the same approach should work. Otherwise, you may need to use QMP blockdev-create on creation and possibly the data-file option of the qcow2 driver for opening. > > How fast is rbd_resize()? Does automatically resizing for every write > > request actually work reasonably well in practice? If it does, there is > > probably little reason not to allow it, even if the use cases are rather > > obscure. > > I'll try to measure the percentage of the time spent in the rbd_resize. > > Another solution could be to pass to the rbd driver the virtual size of > the image and resize it only one time also if the preallocation is > disabled, because RBD will not allocate blocks but IIUC it only set the max > size. > > Do you think make sense? Is it feasible? In theory yes, though it requires modification of every driver that should be usable together with rbd (i.e. ideally all of the drivers). If automatic resize works good enough, I'd prefer that. Kevin
Hi Kevin, On Wed, Apr 17, 2019 at 10:04:43AM +0200, Kevin Wolf wrote: > Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > > > I think a potential actual use case could be persistent dirty bitmaps > > > for incremental backup. Though maybe that would be better served by > > > using the rbd image just as a raw external data file and keeping the > > > qcow2 metadata on a filesystem. > > > > Thanks to point it out! I'll take a look to understand how to keep > > metadata separated from the data. > > I'd consider the feature still experimental, but for local files, it > works like this: > > qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G > > And then just use test.qcow2. As long as you can put everything you need > into an rbd URL, the same approach should work. Otherwise, you may need > to use QMP blockdev-create on creation and possibly the data-file option > of the qcow2 driver for opening. > Very interesting, I'll try to add this support also in the rbd driver. > > > How fast is rbd_resize()? Does automatically resizing for every write > > > request actually work reasonably well in practice? If it does, there is > > > probably little reason not to allow it, even if the use cases are rather > > > obscure. > > > > I'll try to measure the percentage of the time spent in the rbd_resize. > > > > Another solution could be to pass to the rbd driver the virtual size of > > the image and resize it only one time also if the preallocation is > > disabled, because RBD will not allocate blocks but IIUC it only set the max > > size. > > > > Do you think make sense? Is it feasible? > > In theory yes, though it requires modification of every driver that > should be usable together with rbd (i.e. ideally all of the drivers). If > automatic resize works good enough, I'd prefer that I did some tests and it seems that the cost of rbd_resize() is negligible. IIUC it only updates the metadata without allocating any blocks (if we are growing, like that case). Anyway the automatic resize will not affect the current use-case (raw images on rbd), where the file size is set during the creation, so I think there should not be side effects with this patch. I'm also adding the support for preallocation (i.e. full) in the rbd driver that can be useful for qcow2 images. If you prefer I can resend this patch with the preallocation series. Thanks, Stefano
Am 19.04.2019 um 14:23 hat Stefano Garzarella geschrieben: > Hi Kevin, > > On Wed, Apr 17, 2019 at 10:04:43AM +0200, Kevin Wolf wrote: > > Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > > > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > > > > > I think a potential actual use case could be persistent dirty bitmaps > > > > for incremental backup. Though maybe that would be better served by > > > > using the rbd image just as a raw external data file and keeping the > > > > qcow2 metadata on a filesystem. > > > > > > Thanks to point it out! I'll take a look to understand how to keep > > > metadata separated from the data. > > > > I'd consider the feature still experimental, but for local files, it > > works like this: > > > > qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G > > > > And then just use test.qcow2. As long as you can put everything you need > > into an rbd URL, the same approach should work. Otherwise, you may need > > to use QMP blockdev-create on creation and possibly the data-file option > > of the qcow2 driver for opening. > > Very interesting, I'll try to add this support also in the rbd driver. I don't understand - what is the thing you want to add to the rbd driver? qcow2 doesn't need special protocol driver support for doing this, and I don't think the QEMU rbd driver has any metadata that could be split off. > > > > How fast is rbd_resize()? Does automatically resizing for every write > > > > request actually work reasonably well in practice? If it does, there is > > > > probably little reason not to allow it, even if the use cases are rather > > > > obscure. > > > > > > I'll try to measure the percentage of the time spent in the rbd_resize. > > > > > > Another solution could be to pass to the rbd driver the virtual size of > > > the image and resize it only one time also if the preallocation is > > > disabled, because RBD will not allocate blocks but IIUC it only set the max > > > size. > > > > > > Do you think make sense? Is it feasible? > > > > In theory yes, though it requires modification of every driver that > > should be usable together with rbd (i.e. ideally all of the drivers). If > > automatic resize works good enough, I'd prefer that > > I did some tests and it seems that the cost of rbd_resize() is > negligible. IIUC it only updates the metadata without allocating any > blocks (if we are growing, like that case). > > Anyway the automatic resize will not affect the current use-case (raw > images on rbd), where the file size is set during the creation, so I > think there should not be side effects with this patch. Okay, sounds good. > I'm also adding the support for preallocation (i.e. full) in the rbd > driver that can be useful for qcow2 images. > > If you prefer I can resend this patch with the preallocation series. Let's keep seperate things separate. Huge patch series are always harder to handle. Kevin
On Tue, Apr 23, 2019 at 09:56:19AM +0200, Kevin Wolf wrote: > Am 19.04.2019 um 14:23 hat Stefano Garzarella geschrieben: > > Hi Kevin, > > > > On Wed, Apr 17, 2019 at 10:04:43AM +0200, Kevin Wolf wrote: > > > Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > > > > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > > > > > > > I think a potential actual use case could be persistent dirty bitmaps > > > > > for incremental backup. Though maybe that would be better served by > > > > > using the rbd image just as a raw external data file and keeping the > > > > > qcow2 metadata on a filesystem. > > > > > > > > Thanks to point it out! I'll take a look to understand how to keep > > > > metadata separated from the data. > > > > > > I'd consider the feature still experimental, but for local files, it > > > works like this: > > > > > > qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G > > > > > > And then just use test.qcow2. As long as you can put everything you need > > > into an rbd URL, the same approach should work. Otherwise, you may need > > > to use QMP blockdev-create on creation and possibly the data-file option > > > of the qcow2 driver for opening. > > > > Very interesting, I'll try to add this support also in the rbd driver. > > I don't understand - what is the thing you want to add to the rbd driver? > qcow2 doesn't need special protocol driver support for doing this, and I > don't think the QEMU rbd driver has any metadata that could be split off. > Oh sorry, I didn't understand that was completely independent from the protocol. > > > > > How fast is rbd_resize()? Does automatically resizing for every write > > > > > request actually work reasonably well in practice? If it does, there is > > > > > probably little reason not to allow it, even if the use cases are rather > > > > > obscure. > > > > > > > > I'll try to measure the percentage of the time spent in the rbd_resize. > > > > > > > > Another solution could be to pass to the rbd driver the virtual size of > > > > the image and resize it only one time also if the preallocation is > > > > disabled, because RBD will not allocate blocks but IIUC it only set the max > > > > size. > > > > > > > > Do you think make sense? Is it feasible? > > > > > > In theory yes, though it requires modification of every driver that > > > should be usable together with rbd (i.e. ideally all of the drivers). If > > > automatic resize works good enough, I'd prefer that > > > > I did some tests and it seems that the cost of rbd_resize() is > > negligible. IIUC it only updates the metadata without allocating any > > blocks (if we are growing, like that case). > > > > Anyway the automatic resize will not affect the current use-case (raw > > images on rbd), where the file size is set during the creation, so I > > think there should not be side effects with this patch. > > Okay, sounds good. > > > I'm also adding the support for preallocation (i.e. full) in the rbd > > driver that can be useful for qcow2 images. > > > > If you prefer I can resend this patch with the preallocation series. > > Let's keep seperate things separate. Huge patch series are always harder > to handle. Okay, thanks for the suggestion! Should this patch go through your tree? Thanks, Stefano
Am 23.04.2019 um 10:26 hat Stefano Garzarella geschrieben: > On Tue, Apr 23, 2019 at 09:56:19AM +0200, Kevin Wolf wrote: > > Am 19.04.2019 um 14:23 hat Stefano Garzarella geschrieben: > > > Hi Kevin, > > > > > > On Wed, Apr 17, 2019 at 10:04:43AM +0200, Kevin Wolf wrote: > > > > Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > > > > > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > > > > > > > > > I think a potential actual use case could be persistent dirty bitmaps > > > > > > for incremental backup. Though maybe that would be better served by > > > > > > using the rbd image just as a raw external data file and keeping the > > > > > > qcow2 metadata on a filesystem. > > > > > > > > > > Thanks to point it out! I'll take a look to understand how to keep > > > > > metadata separated from the data. > > > > > > > > I'd consider the feature still experimental, but for local files, it > > > > works like this: > > > > > > > > qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G > > > > > > > > And then just use test.qcow2. As long as you can put everything you need > > > > into an rbd URL, the same approach should work. Otherwise, you may need > > > > to use QMP blockdev-create on creation and possibly the data-file option > > > > of the qcow2 driver for opening. > > > > > > Very interesting, I'll try to add this support also in the rbd driver. > > > > I don't understand - what is the thing you want to add to the rbd driver? > > qcow2 doesn't need special protocol driver support for doing this, and I > > don't think the QEMU rbd driver has any metadata that could be split off. > > > > Oh sorry, I didn't understand that was completely independent from the > protocol. > > > > > > > How fast is rbd_resize()? Does automatically resizing for every write > > > > > > request actually work reasonably well in practice? If it does, there is > > > > > > probably little reason not to allow it, even if the use cases are rather > > > > > > obscure. > > > > > > > > > > I'll try to measure the percentage of the time spent in the rbd_resize. > > > > > > > > > > Another solution could be to pass to the rbd driver the virtual size of > > > > > the image and resize it only one time also if the preallocation is > > > > > disabled, because RBD will not allocate blocks but IIUC it only set the max > > > > > size. > > > > > > > > > > Do you think make sense? Is it feasible? > > > > > > > > In theory yes, though it requires modification of every driver that > > > > should be usable together with rbd (i.e. ideally all of the drivers). If > > > > automatic resize works good enough, I'd prefer that > > > > > > I did some tests and it seems that the cost of rbd_resize() is > > > negligible. IIUC it only updates the metadata without allocating any > > > blocks (if we are growing, like that case). > > > > > > Anyway the automatic resize will not affect the current use-case (raw > > > images on rbd), where the file size is set during the creation, so I > > > think there should not be side effects with this patch. > > > > Okay, sounds good. > > > > > I'm also adding the support for preallocation (i.e. full) in the rbd > > > driver that can be useful for qcow2 images. > > > > > > If you prefer I can resend this patch with the preallocation series. > > > > Let's keep seperate things separate. Huge patch series are always harder > > to handle. > > Okay, thanks for the suggestion! > > Should this patch go through your tree? I think so, yes. Kevin
Am 23.04.2019 um 10:38 hat Kevin Wolf geschrieben: > Am 23.04.2019 um 10:26 hat Stefano Garzarella geschrieben: > > On Tue, Apr 23, 2019 at 09:56:19AM +0200, Kevin Wolf wrote: > > > Am 19.04.2019 um 14:23 hat Stefano Garzarella geschrieben: > > > > Hi Kevin, > > > > > > > > On Wed, Apr 17, 2019 at 10:04:43AM +0200, Kevin Wolf wrote: > > > > > Am 17.04.2019 um 09:34 hat Stefano Garzarella geschrieben: > > > > > > On Mon, Apr 15, 2019 at 10:04:52AM +0200, Kevin Wolf wrote: > > > > > > > > > > > > > > I think a potential actual use case could be persistent dirty bitmaps > > > > > > > for incremental backup. Though maybe that would be better served by > > > > > > > using the rbd image just as a raw external data file and keeping the > > > > > > > qcow2 metadata on a filesystem. > > > > > > > > > > > > Thanks to point it out! I'll take a look to understand how to keep > > > > > > metadata separated from the data. > > > > > > > > > > I'd consider the feature still experimental, but for local files, it > > > > > works like this: > > > > > > > > > > qemu-img create -f qcow2 -o data_file=test.raw test.qcow2 4G > > > > > > > > > > And then just use test.qcow2. As long as you can put everything you need > > > > > into an rbd URL, the same approach should work. Otherwise, you may need > > > > > to use QMP blockdev-create on creation and possibly the data-file option > > > > > of the qcow2 driver for opening. > > > > > > > > Very interesting, I'll try to add this support also in the rbd driver. > > > > > > I don't understand - what is the thing you want to add to the rbd driver? > > > qcow2 doesn't need special protocol driver support for doing this, and I > > > don't think the QEMU rbd driver has any metadata that could be split off. > > > > > > > Oh sorry, I didn't understand that was completely independent from the > > protocol. > > > > > > > > > How fast is rbd_resize()? Does automatically resizing for every write > > > > > > > request actually work reasonably well in practice? If it does, there is > > > > > > > probably little reason not to allow it, even if the use cases are rather > > > > > > > obscure. > > > > > > > > > > > > I'll try to measure the percentage of the time spent in the rbd_resize. > > > > > > > > > > > > Another solution could be to pass to the rbd driver the virtual size of > > > > > > the image and resize it only one time also if the preallocation is > > > > > > disabled, because RBD will not allocate blocks but IIUC it only set the max > > > > > > size. > > > > > > > > > > > > Do you think make sense? Is it feasible? > > > > > > > > > > In theory yes, though it requires modification of every driver that > > > > > should be usable together with rbd (i.e. ideally all of the drivers). If > > > > > automatic resize works good enough, I'd prefer that > > > > > > > > I did some tests and it seems that the cost of rbd_resize() is > > > > negligible. IIUC it only updates the metadata without allocating any > > > > blocks (if we are growing, like that case). > > > > > > > > Anyway the automatic resize will not affect the current use-case (raw > > > > images on rbd), where the file size is set during the creation, so I > > > > think there should not be side effects with this patch. > > > > > > Okay, sounds good. > > > > > > > I'm also adding the support for preallocation (i.e. full) in the rbd > > > > driver that can be useful for qcow2 images. > > > > > > > > If you prefer I can resend this patch with the preallocation series. > > > > > > Let's keep seperate things separate. Huge patch series are always harder > > > to handle. > > > > Okay, thanks for the suggestion! > > > > Should this patch go through your tree? > > I think so, yes. Hm, this is an RFC patch, which suggests that it wasn't originally meant to be merged as it is. Are you going to send a new version, or did it turn out to be exactly what we want and the "RFC" tag was a mistake? Kevin
On Mon, Apr 29, 2019 at 11:58:55AM +0200, Kevin Wolf wrote: > > Hm, this is an RFC patch, which suggests that it wasn't originally meant > to be merged as it is. Are you going to send a new version, or did it > turn out to be exactly what we want and the "RFC" tag was a mistake? I put the "RFC" tag because I was not sure about this patch, but after your comments and some tests on the rbd_resize() impact, I think that it turns out exactly what we want. If you agree, you can queue it as is. Thanks, Stefano
Am 11.04.2019 um 12:50 hat Stefano Garzarella geschrieben: > RBD APIs don't allow us to write more than the size set with rbd_create() > or rbd_resize(). > In order to support growing images (eg. qcow2), we resize the image > before RW operations that exceed the current size. > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > --- > block/rbd.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/block/rbd.c b/block/rbd.c > index 0c549c9935..228658e20a 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { > rbd_image_t image; > char *image_name; > char *snap; > + uint64_t image_size; > } BDRVRBDState; Can't we use bs->total_sectors instead of adding a new image_size field? > static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, > @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, > goto failed_open; > } > > + r = rbd_get_size(s->image, &s->image_size); > + if (r < 0) { > + error_setg_errno(errp, -r, "error reading image size from %s", > + s->image_name); > + rbd_close(s->image); > + goto failed_open; > + } > + > /* If we are using an rbd snapshot, we must be r/o, otherwise > * leave as-is */ > if (s->snap != NULL) { > @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, > rcb->buf = acb->bounce; > } > > + /* > + * RBD APIs don't allow us to write more than actual size, so in order > + * to support growing images, we resize the image before RW operations > + * that exceed the current size. > + */ > + if (s->image_size < off + size) { > + r = rbd_resize(s->image, off + size); > + if (r < 0) { > + goto failed; > + } > + > + s->image_size = off + size; > + } This doesn't check the request type, so it's actually not limited to RW operations, but even reads will try to resize the image. This is at least surprising. For regular files, file-posix extends the file for write requests, but for reads it returns a zeroed buffer without actually changing the file size. Kevin
On Mon, Apr 29, 2019 at 12:25:10PM +0200, Kevin Wolf wrote: > Am 11.04.2019 um 12:50 hat Stefano Garzarella geschrieben: > > RBD APIs don't allow us to write more than the size set with rbd_create() > > or rbd_resize(). > > In order to support growing images (eg. qcow2), we resize the image > > before RW operations that exceed the current size. > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > > --- > > block/rbd.c | 25 +++++++++++++++++++++++++ > > 1 file changed, 25 insertions(+) > > > > diff --git a/block/rbd.c b/block/rbd.c > > index 0c549c9935..228658e20a 100644 > > --- a/block/rbd.c > > +++ b/block/rbd.c > > @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { > > rbd_image_t image; > > char *image_name; > > char *snap; > > + uint64_t image_size; > > } BDRVRBDState; > > Can't we use bs->total_sectors instead of adding a new image_size field? I'm not sure we can use bs->total_sectors. IIUC, for example, it doesn't take care of bytes used by QCOW2 metadata. > > > static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, > > @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, > > goto failed_open; > > } > > > > + r = rbd_get_size(s->image, &s->image_size); > > + if (r < 0) { > > + error_setg_errno(errp, -r, "error reading image size from %s", > > + s->image_name); > > + rbd_close(s->image); > > + goto failed_open; > > + } > > + > > /* If we are using an rbd snapshot, we must be r/o, otherwise > > * leave as-is */ > > if (s->snap != NULL) { > > @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, > > rcb->buf = acb->bounce; > > } > > > > + /* > > + * RBD APIs don't allow us to write more than actual size, so in order > > + * to support growing images, we resize the image before RW operations > > + * that exceed the current size. > > + */ > > + if (s->image_size < off + size) { > > + r = rbd_resize(s->image, off + size); > > + if (r < 0) { > > + goto failed; > > + } > > + > > + s->image_size = off + size; > > + } > > This doesn't check the request type, so it's actually not limited to RW > operations, but even reads will try to resize the image. This is at > least surprising. For regular files, file-posix extends the file for > write requests, but for reads it returns a zeroed buffer without > actually changing the file size. Yes, I'll change the behaviour in the v2. I did some tries (i.e. using qemu-io and reading more than bytes used) and the RBD driver didn't receive 'read' requests that exceed the current size, maybe because it is managed in the QCOW2 protocol, but of course I'll handle also in the RBD driver. Thanks, Stefano
Am 29.04.2019 um 16:04 hat Stefano Garzarella geschrieben: > On Mon, Apr 29, 2019 at 12:25:10PM +0200, Kevin Wolf wrote: > > Am 11.04.2019 um 12:50 hat Stefano Garzarella geschrieben: > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > or rbd_resize(). > > > In order to support growing images (eg. qcow2), we resize the image > > > before RW operations that exceed the current size. > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > > > --- > > > block/rbd.c | 25 +++++++++++++++++++++++++ > > > 1 file changed, 25 insertions(+) > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > index 0c549c9935..228658e20a 100644 > > > --- a/block/rbd.c > > > +++ b/block/rbd.c > > > @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { > > > rbd_image_t image; > > > char *image_name; > > > char *snap; > > > + uint64_t image_size; > > > } BDRVRBDState; > > > > Can't we use bs->total_sectors instead of adding a new image_size field? > > I'm not sure we can use bs->total_sectors. IIUC, for example, it doesn't > take care of bytes used by QCOW2 metadata. bs->total_sectors for the rbd BLockDriverState is the image file size, not the virtual disk size. The only reason not to use it would be if we need byte granularity rather than 512 byte granularity. But I don't think it's a problem to round up offsets to the next 512 bytes (BDRV_SECTOR_SIZE) boundary. > > > static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, > > > @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, > > > goto failed_open; > > > } > > > > > > + r = rbd_get_size(s->image, &s->image_size); > > > + if (r < 0) { > > > + error_setg_errno(errp, -r, "error reading image size from %s", > > > + s->image_name); > > > + rbd_close(s->image); > > > + goto failed_open; > > > + } > > > + > > > /* If we are using an rbd snapshot, we must be r/o, otherwise > > > * leave as-is */ > > > if (s->snap != NULL) { > > > @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, > > > rcb->buf = acb->bounce; > > > } > > > > > > + /* > > > + * RBD APIs don't allow us to write more than actual size, so in order > > > + * to support growing images, we resize the image before RW operations > > > + * that exceed the current size. > > > + */ > > > + if (s->image_size < off + size) { > > > + r = rbd_resize(s->image, off + size); > > > + if (r < 0) { > > > + goto failed; > > > + } > > > + > > > + s->image_size = off + size; > > > + } > > > > This doesn't check the request type, so it's actually not limited to RW > > operations, but even reads will try to resize the image. This is at > > least surprising. For regular files, file-posix extends the file for > > write requests, but for reads it returns a zeroed buffer without > > actually changing the file size. > > Yes, I'll change the behaviour in the v2. > > I did some tries (i.e. using qemu-io and reading more than bytes used) and > the RBD driver didn't receive 'read' requests that exceed the current > size, maybe because it is managed in the QCOW2 protocol, but of course > I'll handle also in the RBD driver. I don't remember the exact scenario where it happened, but I know I implemented it for file-posix to fix a bug. Maybe it actually doesn't happen any more because we have made other changes in the meantime, but I'm not sure. Kevin
On Mon, Apr 29, 2019 at 04:30:14PM +0200, Kevin Wolf wrote: > Am 29.04.2019 um 16:04 hat Stefano Garzarella geschrieben: > > On Mon, Apr 29, 2019 at 12:25:10PM +0200, Kevin Wolf wrote: > > > Am 11.04.2019 um 12:50 hat Stefano Garzarella geschrieben: > > > > RBD APIs don't allow us to write more than the size set with rbd_create() > > > > or rbd_resize(). > > > > In order to support growing images (eg. qcow2), we resize the image > > > > before RW operations that exceed the current size. > > > > > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > > > > --- > > > > block/rbd.c | 25 +++++++++++++++++++++++++ > > > > 1 file changed, 25 insertions(+) > > > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > > index 0c549c9935..228658e20a 100644 > > > > --- a/block/rbd.c > > > > +++ b/block/rbd.c > > > > @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { > > > > rbd_image_t image; > > > > char *image_name; > > > > char *snap; > > > > + uint64_t image_size; > > > > } BDRVRBDState; > > > > > > Can't we use bs->total_sectors instead of adding a new image_size field? > > > > I'm not sure we can use bs->total_sectors. IIUC, for example, it doesn't > > take care of bytes used by QCOW2 metadata. > > bs->total_sectors for the rbd BLockDriverState is the image file size, > not the virtual disk size. > > The only reason not to use it would be if we need byte granularity > rather than 512 byte granularity. But I don't think it's a problem to > round up offsets to the next 512 bytes (BDRV_SECTOR_SIZE) boundary. > I tried and it works as you told me :) I'll remove the image_size in the v2. > > > > static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, > > > > @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, > > > > goto failed_open; > > > > } > > > > > > > > + r = rbd_get_size(s->image, &s->image_size); > > > > + if (r < 0) { > > > > + error_setg_errno(errp, -r, "error reading image size from %s", > > > > + s->image_name); > > > > + rbd_close(s->image); > > > > + goto failed_open; > > > > + } > > > > + > > > > /* If we are using an rbd snapshot, we must be r/o, otherwise > > > > * leave as-is */ > > > > if (s->snap != NULL) { > > > > @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, > > > > rcb->buf = acb->bounce; > > > > } > > > > > > > > + /* > > > > + * RBD APIs don't allow us to write more than actual size, so in order > > > > + * to support growing images, we resize the image before RW operations > > > > + * that exceed the current size. > > > > + */ > > > > + if (s->image_size < off + size) { > > > > + r = rbd_resize(s->image, off + size); > > > > + if (r < 0) { > > > > + goto failed; > > > > + } > > > > + > > > > + s->image_size = off + size; > > > > + } > > > > > > This doesn't check the request type, so it's actually not limited to RW > > > operations, but even reads will try to resize the image. This is at > > > least surprising. For regular files, file-posix extends the file for > > > write requests, but for reads it returns a zeroed buffer without > > > actually changing the file size. > > > > Yes, I'll change the behaviour in the v2. > > > > I did some tries (i.e. using qemu-io and reading more than bytes used) and > > the RBD driver didn't receive 'read' requests that exceed the current > > size, maybe because it is managed in the QCOW2 protocol, but of course > > I'll handle also in the RBD driver. > > I don't remember the exact scenario where it happened, but I know I > implemented it for file-posix to fix a bug. Maybe it actually doesn't > happen any more because we have made other changes in the meantime, but > I'm not sure. > Thanks for the details, I'll check better if we can avoid it, otherwise I'll take care of this in the RBD driver. Cheers, Stefano
diff --git a/block/rbd.c b/block/rbd.c index 0c549c9935..228658e20a 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -102,6 +102,7 @@ typedef struct BDRVRBDState { rbd_image_t image; char *image_name; char *snap; + uint64_t image_size; } BDRVRBDState; static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx, @@ -777,6 +778,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags, goto failed_open; } + r = rbd_get_size(s->image, &s->image_size); + if (r < 0) { + error_setg_errno(errp, -r, "error reading image size from %s", + s->image_name); + rbd_close(s->image); + goto failed_open; + } + /* If we are using an rbd snapshot, we must be r/o, otherwise * leave as-is */ if (s->snap != NULL) { @@ -921,6 +930,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, rcb->buf = acb->bounce; } + /* + * RBD APIs don't allow us to write more than actual size, so in order + * to support growing images, we resize the image before RW operations + * that exceed the current size. + */ + if (s->image_size < off + size) { + r = rbd_resize(s->image, off + size); + if (r < 0) { + goto failed; + } + + s->image_size = off + size; + } + acb->ret = 0; acb->error = 0; acb->s = s; @@ -1066,6 +1089,8 @@ static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs, return r; } + s->image_size = offset; + return 0; }
RBD APIs don't allow us to write more than the size set with rbd_create() or rbd_resize(). In order to support growing images (eg. qcow2), we resize the image before RW operations that exceed the current size. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1171007 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> --- block/rbd.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)