Message ID | 20200310094653.33257-1-johannes.thumshirn@wdc.com (mailing list archive)
---|---
Series | Introduce Zone Append for writing to zoned block devices
On Tue, Mar 10, 2020 at 06:46:42PM +0900, Johannes Thumshirn wrote:
> For null_blk the emulation is way simpler, as null_blk's zoned block
> device emulation support already caches the write pointer position, so we
> only need to report the position back to the upper layers. Additional
> caching is not needed here.
>
> Testing has been conducted by translating RWF_APPEND DIOs into
> REQ_OP_ZONE_APPEND commands in the block device's direct I/O function and
> injecting errors by bypassing the block layer interface and directly
> writing to the disc via the SCSI generic interface.

We really need a user of this to be useful upstream. Didn't you plan
to look into converting zonefs/iomap to use it? Without that it is
at best a RFC. Even better would be converting zonefs and the f2fs
zoned code so that we can get rid of the old per-zone serialization in
the I/O scheduler entirely.
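As a rough illustration of the test setup described in the quoted cover letter: the sketch below shows how an RWF_APPEND direct write could be turned into a zone append in the block device direct I/O path. The helper name is hypothetical and this is not the actual test patch; it only assumes the REQ_OP_ZONE_APPEND operation introduced by this series.

#include <linux/blkdev.h>
#include <linux/fs.h>

/*
 * Hypothetical helper, not the actual test patch: pick the bio
 * operation for a block device direct write.  With the test hack
 * described in the cover letter, an RWF_APPEND write (IOCB_APPEND on
 * the kiocb) to a zoned device is issued as a zone append instead of
 * a regular write, so the device chooses the write location.
 */
static unsigned int bdev_dio_write_op(struct kiocb *iocb,
				      struct block_device *bdev)
{
	if ((iocb->ki_flags & IOCB_APPEND) && bdev_is_zoned(bdev))
		return REQ_OP_ZONE_APPEND;

	return REQ_OP_WRITE;
}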
On 2020/03/11 1:42, Christoph Hellwig wrote:
> On Tue, Mar 10, 2020 at 06:46:42PM +0900, Johannes Thumshirn wrote:
>> For null_blk the emulation is way simpler, as null_blk's zoned block
>> device emulation support already caches the write pointer position, so we
>> only need to report the position back to the upper layers. Additional
>> caching is not needed here.
>>
>> Testing has been conducted by translating RWF_APPEND DIOs into
>> REQ_OP_ZONE_APPEND commands in the block device's direct I/O function and
>> injecting errors by bypassing the block layer interface and directly
>> writing to the disc via the SCSI generic interface.
>
> We really need a user of this to be useful upstream. Didn't you plan
> to look into converting zonefs/iomap to use it? Without that it is
> at best a RFC. Even better would be converting zonefs and the f2fs
> zoned code so that we can get rid of the old per-zone serialization in
> the I/O scheduler entirely.

I do not think we can get rid of it entirely as it is needed for applications
using regular writes on raw zoned block devices. But the zone write locking will
be completely bypassed for zone append writes issued by file systems.
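A minimal sketch of the bypass described above, assuming the semantics of this series: only regular writes to a zoned device take the per-zone write lock in the I/O scheduler, while zone append requests are dispatched freely. This mirrors the idea behind the existing blk_req_needs_zone_write_lock() check but is simplified and not a copy of the kernel code.

#include <linux/blkdev.h>

/*
 * Simplified sketch (not the real kernel code): decide whether a
 * request must hold the per-zone write lock before dispatch.  Zone
 * append requests never take the lock because the device, not the
 * host, chooses the write location.
 */
static bool sketch_needs_zone_write_lock(struct request *rq)
{
	if (blk_rq_is_passthrough(rq))
		return false;

	switch (req_op(rq)) {
	case REQ_OP_WRITE:
	case REQ_OP_WRITE_SAME:
	case REQ_OP_WRITE_ZEROES:
		/* Regular writes to a zoned device need serialization. */
		return blk_queue_is_zoned(rq->q);
	case REQ_OP_ZONE_APPEND:
		/* Bypassed: no write ordering requirement at the host. */
		return false;
	default:
		return false;
	}
}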
On Wed, Mar 11, 2020 at 12:37:33AM +0000, Damien Le Moal wrote:
> I do not think we can get rid of it entirely as it is needed for applications
> using regular writes on raw zoned block devices. But the zone write locking will
> be completely bypassed for zone append writes issued by file systems.

But applications that are aware of zones should not be sending multiple
write commands to a zone anyway. We certainly can't use zone write
locking for nvme if we want to be able to use multiple queues.
On 2020/03/11 15:25, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 12:37:33AM +0000, Damien Le Moal wrote:
>> I do not think we can get rid of it entirely as it is needed for applications
>> using regular writes on raw zoned block devices. But the zone write locking will
>> be completely bypassed for zone append writes issued by file systems.
>
> But applications that are aware of zones should not be sending multiple
> write commands to a zone anyway. We certainly can't use zone write
> locking for nvme if we want to be able to use multiple queues.

True, and that is the main use case I am seeing in the field.

However, even for this to work properly, we will also need a special
bio_add_page() function for regular writes to zones, similarly to zone append,
to ensure that a large BIO does not become multiple requests, won't we?
Otherwise, a write bio submission will generate multiple requests that may get
reordered on dispatch and on requeue (on SAS or on SATA).

Furthermore, we already have aio support. Customers in the field use that with
the fio libaio engine to test drives and for application development. So I am
afraid that removing the zone write locking now would break user space, no?

For nvme, we want to allow the "none" elevator as the default rather than
mq-deadline, which is now the default for all zoned block devices. This is a
very simple change to the default elevator selection that we can make based on
the nonrot queue flag.
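As an illustration of the elevator-default point above, here is a rough sketch of the suggested policy. The helper name is hypothetical and this is not actual block/elevator.c code; it only assumes a caller that invokes it for zoned request queues.

#include <linux/blkdev.h>

/*
 * Hypothetical helper, not actual block/elevator.c code: default
 * scheduler choice for a zoned request queue.  Non-rotational zoned
 * devices (e.g. zoned NVMe namespaces) relying on zone append do not
 * need mq-deadline's zone write locking and can run without a
 * scheduler; rotational (SMR HDD) zoned devices keep mq-deadline to
 * serialize regular writes per zone.
 */
static const char *zoned_default_elevator(struct request_queue *q)
{
	if (blk_queue_nonrot(q))
		return "none";

	return "mq-deadline";
}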
On 10/03/2020 17:42, Christoph Hellwig wrote:

[...]

> We really need a user of this to be useful upstream. Didn't you plan
> to look into converting zonefs/iomap to use it? Without that it is
> at best a RFC. Even better would be converting zonefs and the f2fs
> zoned code so that we can get rid of the old per-zone serialization in
> the I/O scheduler entirely.

Yes, I'm right now working on iomap/zonefs support for zone append.