Message ID | 20090820205616.GA5503@lst.de (mailing list archive)
---|---
State | New, archived |
On Thursday 20 August 2009 22:56:16, Christoph Hellwig wrote:
> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
> means it does not allow filesystems to use barriers. But the typical use
> case for virtio-blk is to use a backend that uses synchronous I/O, and in
> that case we can simply set QUEUE_ORDERED_DRAIN to make the block layer
> drain the request queue around barrier I/O and provide the semantics that
> the filesystems need. This is what the SCSI disk driver does for disks
> that have the write cache disabled.
>
> With this patch we incorrectly advertise barrier support if someone
> configures qemu with write-back caching. While this displays wrong
> information in the guest, there is nothing the guest could have done
> even if we rightfully told it that we do not support any barriers.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Makes sense to me.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

[...]

> -	/* If barriers are supported, tell block layer that queue is ordered */
> +	/*
> +	 * If barriers are supported, tell block layer that queue is ordered.
> +	 *
> +	 * If no barriers are supported, assume the host uses synchronous
> +	 * writes and just drain the queue before and after the barrier.
> +	 */
> 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
> 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
> +	else
> +		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);

[...]

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
> Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
> means it does not allow filesystems to use barriers. But the typical use
> case for virtio-blk is to use a backend that uses synchronous I/O

Really?  Does qemu open with O_SYNC?

I'm definitely no block expert, but this seems strange...
Rusty.
On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
> On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
> > Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
> > means it does not allow filesystems to use barriers. But the typical use
> > case for virtio-blk is to use a backend that uses synchronous I/O
>
> Really?  Does qemu open with O_SYNC?
>
> I'm definitely no block expert, but this seems strange...
> Rusty.

Qemu can open it various ways, but the only one that is fully safe is
O_SYNC (cache=writethrough).  The O_DIRECT (cache=none) option is also
fully safe with the above patch under some limited circumstances (disk
write caches off and using a host device or fully allocated file).

Fixing the cache=writeback option and the majority case for cache=none
requires implementing a cache flush command, and for the latter also
fixes to the host kernel that I'm working on.  You will get another
patch to implement the proper cache controls in virtio-blk from me in a
couple of days, too.
On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
> On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
> > On Fri, 21 Aug 2009 06:26:16 am Christoph Hellwig wrote:
> > > Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default,
> > > which means it does not allow filesystems to use barriers. But the
> > > typical use case for virtio-blk is to use a backend that uses
> > > synchronous I/O
> >
> > Really?  Does qemu open with O_SYNC?
> >
> > I'm definitely no block expert, but this seems strange...
> > Rusty.
>
> Qemu can open it various ways, but the only one that is fully safe
> is O_SYNC (cache=writethrough).

(Rusty goes away and reads the qemu man page).

	By default, if no explicit caching is specified for a qcow2 disk
	image, cache=writeback will be used.

Are you claiming qcow2 is unusual?  I can believe snapshot is less
common, though I use it all the time.

You'd normally have to add a feature for something like this.  I don't
think this is different.

Sorry,
Rusty.
On 08/26/2009 03:06 PM, Rusty Russell wrote:
> On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
>> On Tue, Aug 25, 2009 at 11:41:37PM +0930, Rusty Russell wrote:
>>> Really?  Does qemu open with O_SYNC?
>>>
>>> I'm definitely no block expert, but this seems strange...
>>> Rusty.
>>
>> Qemu can open it various ways, but the only one that is fully safe
>> is O_SYNC (cache=writethrough).
>
> (Rusty goes away and reads the qemu man page).
>
>	By default, if no explicit caching is specified for a qcow2 disk
>	image, cache=writeback will be used.

It's now switched to writethrough.  In any case, cache=writeback means
"lie to the guest, we don't care about integrity".

> Are you claiming qcow2 is unusual?  I can believe snapshot is less
> common, though I use it all the time.
>
> You'd normally have to add a feature for something like this.  I don't
> think this is different.

Why do we need to add a feature for this?
On Wed, 26 Aug 2009 09:58:13 pm Avi Kivity wrote:
> On 08/26/2009 03:06 PM, Rusty Russell wrote:
> > On Tue, 25 Aug 2009 11:46:08 pm Christoph Hellwig wrote:
> > > Qemu can open it various ways, but the only one that is fully safe
> > > is O_SYNC (cache=writethrough).
> >
> > (Rusty goes away and reads the qemu man page).
> >
> >	By default, if no explicit caching is specified for a qcow2 disk
> >	image, cache=writeback will be used.
>
> It's now switched to writethrough.  In any case, cache=writeback means
> "lie to the guest, we don't care about integrity".

Well, that was the intent of the virtio barrier feature; *don't* lie to
the guest, make it aware of the limitations.  Of course, having read
Christoph's excellent summary of the situation, it's clear I failed.

> > Are you claiming qcow2 is unusual?  I can believe snapshot is less
> > common, though I use it all the time.
> >
> > You'd normally have to add a feature for something like this.  I don't
> > think this is different.
>
> Why do we need to add a feature for this?

Because cache=writeback should *not* lie to the guest?

Rusty.
On 08/27/2009 01:43 PM, Rusty Russell wrote:
>>> Are you claiming qcow2 is unusual?  I can believe snapshot is less
>>> common, though I use it all the time.
>>>
>>> You'd normally have to add a feature for something like this.  I don't
>>> think this is different.
>>
>> Why do we need to add a feature for this?
>
> Because cache=writeback should *not* lie to the guest?

No, it should.  There are two possible semantics to cache=writeback:

- simulate a drive with a huge write cache; use fsync() to implement
  barriers
- tell the host that we aren't interested in data integrity, lie to the
  guest to get best performance

The first semantic is not very useful; guests don't expect huge write
caches, so you can't be sure of your integrity guarantees, and it's
slower than cache=none due to double caching and extra copies.

The second semantic is not useful for production, but is very useful for
testing out things where you aren't worried about host crashes and
you're usually rebooting the guest very often (you can't rely on guest
caches, so you want the host to cache).
On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
> There are two possible semantics to cache=writeback:
>
> - simulate a drive with a huge write cache; use fsync() to implement
>   barriers
> - tell the host that we aren't interested in data integrity, lie to the
>   guest to get best performance

Why lie to the guest?  Just say we're not ordered, and don't support
barriers.  That gets even *better* performance, since it won't drain
the queues.

Maybe you're thinking of full virtualization, where guest ignorance is
bliss.  But lying always gets us in trouble later on when other cases
come up.

> The second semantic is not useful for production, but is very useful for
> testing out things where you aren't worried about host crashes and
> you're usually rebooting the guest very often (you can't rely on guest
> caches, so you want the host to cache).

This is not the ideal world; people will do things for performance "in
production".

Cheers,
Rusty.
On 08/28/2009 04:15 AM, Rusty Russell wrote:
> On Thu, 27 Aug 2009 08:34:19 pm Avi Kivity wrote:
>> There are two possible semantics to cache=writeback:
>>
>> - simulate a drive with a huge write cache; use fsync() to implement
>>   barriers
>> - tell the host that we aren't interested in data integrity, lie to the
>>   guest to get best performance
>
> Why lie to the guest?  Just say we're not ordered, and don't support
> barriers.  That gets even *better* performance, since it won't drain
> the queues.

In that case, honesty is preferable.  It means testing with
cache=writeback exercises different guest code paths, but that's
acceptable.

> Maybe you're thinking of full virtualization, where guest ignorance is
> bliss.  But lying always gets us in trouble later on when other cases
> come up.
>
>> The second semantic is not useful for production, but is very useful for
>> testing out things where you aren't worried about host crashes and
>> you're usually rebooting the guest very often (you can't rely on guest
>> caches, so you want the host to cache).
>
> This is not the ideal world; people will do things for performance "in
> production".

We found that cache=none is faster than cache=writeback when you're
really interested in performance (no qcow2).
Err, I'll take this one back for now pending some more discussion.
What we need more urgently is the writeback cache flag, which is now
implemented in qemu; patch following ASAP.
On Fri, 18 Sep 2009 03:01:42 am Christoph Hellwig wrote:
> Err, I'll take this one back for now pending some more discussion.
> What we need more urgently is the writeback cache flag, which is now
> implemented in qemu; patch following ASAP.

OK, still catching up on mail.  I'll push them out of the queue for now.

Thanks,
Rusty.
Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-08-20 17:41:37.019718433 -0300
+++ linux-2.6/drivers/block/virtio_blk.c	2009-08-20 17:45:40.511747922 -0300
@@ -336,9 +336,16 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
+	/*
+	 * If barriers are supported, tell block layer that queue is ordered.
+	 *
+	 * If no barriers are supported, assume the host uses synchronous
+	 * writes and just drain the queue before and after the barrier.
+	 */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	else
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
means it does not allow filesystems to use barriers.  But the typical use
case for virtio-blk is to use a backend that uses synchronous I/O, and in
that case we can simply set QUEUE_ORDERED_DRAIN to make the block layer
drain the request queue around barrier I/O and provide the semantics that
the filesystems need.  This is what the SCSI disk driver does for disks
that have the write cache disabled.

With this patch we incorrectly advertise barrier support if someone
configures qemu with write-back caching.  While this displays wrong
information in the guest, there is nothing the guest could have done
even if we rightfully told it that we do not support any barriers.

Signed-off-by: Christoph Hellwig <hch@lst.de>