Message ID | 20180727033402.13513-1-famz@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | qemu-img: Disable copy offloading by default | expand |
Am 27.07.2018 um 05:33 hat Fam Zheng geschrieben: > Kevin pointed out that both glibc and kernel provides a slow fallback of > copy_file_range which hurts thin provisioning. This is particularly true for > thin LVs, because host_device driver cannot get allocation info from the > volume, and copy_file_range is called on every sectors, making the dst fully > allocated. > > NFS mount points also doesn't support SEEK_DATA well, so the allocation > information is unknown to QEMU. > > That leaves only iscsi:// which seems to do what we want so far, but it is a > smaller use case. > > Add an option to qemu-img convert, "-C", to enable (attempting) copy offloading > explicitly. And mark it incompatible with "-S" and "-c". Reviewed-by: Kevin Wolf <kwolf@redhat.com> Not sure why you made this an RFC only, but I think we absolutely need this. People are used to using 'qemu-img convert' to compact images and this would regress with automatic copy offloading. Do you think we need more discussion? Kevin
On Fri, Jul 27, 2018 at 6:29 PM Kevin Wolf <kwolf@redhat.com> wrote: > > Am 27.07.2018 um 05:33 hat Fam Zheng geschrieben: > > Kevin pointed out that both glibc and kernel provides a slow fallback of > > copy_file_range which hurts thin provisioning. This is particularly true for > > thin LVs, because host_device driver cannot get allocation info from the > > volume, and copy_file_range is called on every sectors, making the dst fully > > allocated. > > > > NFS mount points also doesn't support SEEK_DATA well, so the allocation > > information is unknown to QEMU. > > > > That leaves only iscsi:// which seems to do what we want so far, but it is a > > smaller use case. > > > > Add an option to qemu-img convert, "-C", to enable (attempting) copy offloading > > explicitly. And mark it incompatible with "-S" and "-c". > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> > > Not sure why you made this an RFC only, but I think we absolutely need > this. People are used to using 'qemu-img convert' to compact images and > this would regress with automatic copy offloading. > > Do you think we need more discussion? I think merging this for 3.0 is the right thing do to. What worries me is the general usability of the feature. We could probably explore ideas about how we can better take advantage of copy offloading. I don't think counting on the user to make the right decision between disk efficiency (thin provisioning) and BW efficiency (copy offloading) will ever work. Even if we don't care about breaking the default '-S 4k' behavior, the lack of SEEK_DATA/SEEK_HOLE support on host NFS and block devices will make it very hard to use. Making it worse, if the network to NFS server is good enough, convert with pread64/pwrite64 with host page cache is also more efficient than copy_file_range, so we'll be slower by trying to play clever. :( Any thought? Fam
On Fri, Jul 27, 2018 at 3:15 PM Fam Zheng <famz@redhat.com> wrote: > On Fri, Jul 27, 2018 at 6:29 PM Kevin Wolf <kwolf@redhat.com> wrote: > > > > Am 27.07.2018 um 05:33 hat Fam Zheng geschrieben: > > > Kevin pointed out that both glibc and kernel provides a slow fallback > of > > > copy_file_range which hurts thin provisioning. This is particularly > true for > > > thin LVs, because host_device driver cannot get allocation info from > the > > > volume, and copy_file_range is called on every sectors, making the dst > fully > > > allocated. > > > > > > NFS mount points also doesn't support SEEK_DATA well, so the allocation > > > information is unknown to QEMU. > NFS >= 4.2 supports SEEK_DATA/HOLE. > > > > > > That leaves only iscsi:// which seems to do what we want so far, but > it is a > > > smaller use case. > > > > > > Add an option to qemu-img convert, "-C", to enable (attempting) copy > offloading > > > explicitly. And mark it incompatible with "-S" and "-c". > > > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> > > > > Not sure why you made this an RFC only, but I think we absolutely need > > this. People are used to using 'qemu-img convert' to compact images and > > this would regress with automatic copy offloading. > > > > Do you think we need more discussion? > > I think merging this for 3.0 is the right thing do to. > > What worries me is the general usability of the feature. We could > probably explore ideas about how we can better take advantage of copy > offloading. I don't think counting on the user to make the right > decision between disk efficiency (thin provisioning) and BW efficiency (copy offloading) will ever work. Even if we don't care about breaking > the default '-S 4k' behavior, the lack of SEEK_DATA/SEEK_HOLE support > on host NFS and block devices will make it very hard to use. Making it > worse, if the network to NFS server is good enough, convert with > pread64/pwrite64 with host page cache is also more efficient than > copy_file_range, so we'll be slower by trying to play clever. :( > In oVirt we always disable caching (-t none -T none), so using host page cache is not an option. I think adding an option for copy offloading is the right thing. This way we can introduce the feature early without breaking the rest of the stuck. For long term it would be nicer if qemu could select the best way to do the copy automatically, keeping the allocation policy specified by the user. oVirt still have bugs related to converting preallocated images to sparse and sparse images to preallocated. Users like to have control on this, so qemu-img cannot change the policy. Do we have some documentation on the useful cases for copy offloading? Did we benchmark this with different kind of storage? The interesting use case for oVirt are: - block storage: copying rregular LV on shared storage to another LV on same VG, but may be on a different PV. - file storage: copying files on same NFS or GlusterFS mont. - copying between different servers or file types, I guess copy_file_range will not help in this case, right? Nir
Am 27.07.2018 um 14:14 hat Fam Zheng geschrieben: > On Fri, Jul 27, 2018 at 6:29 PM Kevin Wolf <kwolf@redhat.com> wrote: > > > > Am 27.07.2018 um 05:33 hat Fam Zheng geschrieben: > > > Kevin pointed out that both glibc and kernel provides a slow fallback of > > > copy_file_range which hurts thin provisioning. This is particularly true for > > > thin LVs, because host_device driver cannot get allocation info from the > > > volume, and copy_file_range is called on every sectors, making the dst fully > > > allocated. > > > > > > NFS mount points also doesn't support SEEK_DATA well, so the allocation > > > information is unknown to QEMU. > > > > > > That leaves only iscsi:// which seems to do what we want so far, but it is a > > > smaller use case. > > > > > > Add an option to qemu-img convert, "-C", to enable (attempting) copy offloading > > > explicitly. And mark it incompatible with "-S" and "-c". > > > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> > > > > Not sure why you made this an RFC only, but I think we absolutely need > > this. People are used to using 'qemu-img convert' to compact images and > > this would regress with automatic copy offloading. > > > > Do you think we need more discussion? > > I think merging this for 3.0 is the right thing do to. Thanks, applied to the block branch. > What worries me is the general usability of the feature. We could > probably explore ideas about how we can better take advantage of copy > offloading. I don't think counting on the user to make the right > decision between disk efficiency (thin provisioning) and BW efficiency > (copy offloading) will ever work. Ultimately it is a decision that QEMU can't make, though. The user needs to tell us at least whether they are trying to compact the image or whether they just want to get it copied as fast as possible. Even if we know this, of course, we may still be lacking information about the storage to make the best decision. > Even if we don't care about breaking the default '-S 4k' behavior, the > lack of SEEK_DATA/SEEK_HOLE support on host NFS and block devices will > make it very hard to use. I hope that NFS 4.2 will gain some ground in the future. Without it, thin provisioning with raw images on NFS is basically impossible. I'm not aware of problems with block devices? > Making it worse, if the network to NFS server is good enough, convert > with pread64/pwrite64 with host page cache is also more efficient than > copy_file_range, so we'll be slower by trying to play clever. :( Really? I could understand not being much faster, but being actually slower sounds wrong. Are we slower because of overhead in copy_file_range() itself or something during preparation? Kevin