Message ID | 20240328004409.594888-5-dlemoal@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Zone write plugging | expand |
> - if (req_op(req) == REQ_OP_ZONE_APPEND) > - bio->bi_iter.bi_sector = req->__sector; > - > - if (!is_flush) > + if (!is_flush) { > + blk_zone_update_request_bio(req, bio); > bio_endio(bio); > + } As noted by Bart last time around, the blk_zone_update_request_bio really should stay out of the !is_flush check, as otherwise we'd break zone appends going through the flush state machine. Otherwise this looks good: Reviewed-by: Christoph Hellwig <hch@lst.de>
On 3/28/24 13:14, Christoph Hellwig wrote: >> - if (req_op(req) == REQ_OP_ZONE_APPEND) >> - bio->bi_iter.bi_sector = req->__sector; >> - >> - if (!is_flush) >> + if (!is_flush) { >> + blk_zone_update_request_bio(req, bio); >> bio_endio(bio); >> + } > > As noted by Bart last time around, the blk_zone_update_request_bio > really should stay out of the !is_flush check, as otherwise we'd > break zone appends going through the flush state machine. I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is set, that is, we are in the middle of a flush sequence. And flush sequence progression is handled at the request level, not BIOs. Once the sequence finishes, then and only then the BIO original endio should be done, meaning that we will then take this path and actually do blk_zone_update_request_bio() and bio_endio(). So I still think this is correct. > > Otherwise this looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de>
On Thu, Mar 28, 2024 at 02:20:17PM +0900, Damien Le Moal wrote: > I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is > set, that is, we are in the middle of a flush sequence. And flush sequence > progression is handled at the request level, not BIOs. Once the sequence > finishes, then and only then the BIO original endio should be done, meaning that > we will then take this path and actually do blk_zone_update_request_bio() and > bio_endio(). So I still think this is correct. Well. lk_flush_restore_request with the previous patch now restores rq->__sector, and the blk_mq_end_request call following it will propagate it to the original bio. But blk_flush_restore_request grabs the sector from rq->bio->bi_iter.bi_sector, and we need to actually get it there first, which is done by the data I/O completion that has RQF_FLUSH_SEQ set. I think we really need a good test case for zone append and FUA, i.e. we need the append op for zonefs, which should exercise the fua code if O_SYNC/O_DSYNC is set.
On 3/28/24 14:42, Christoph Hellwig wrote: > On Thu, Mar 28, 2024 at 02:20:17PM +0900, Damien Le Moal wrote: >> I do not think that is corect. Because is_flush indicates that RQF_FLUSH_SEQ is >> set, that is, we are in the middle of a flush sequence. And flush sequence >> progression is handled at the request level, not BIOs. Once the sequence >> finishes, then and only then the BIO original endio should be done, meaning that >> we will then take this path and actually do blk_zone_update_request_bio() and >> bio_endio(). So I still think this is correct. > > Well. > > lk_flush_restore_request with the previous patch now restores rq->__sector, > and the blk_mq_end_request call following it will propagate it to the > original bio. But blk_flush_restore_request grabs the sector from > rq->bio->bi_iter.bi_sector, and we need to actually get it there first, > which is done by the data I/O completion that has RQF_FLUSH_SEQ set. Ah. Yes. There is no issue with the current code for regular writes, but we would get the original sector and not the written sector in the case of zone append. Will make the change. > I think we really need a good test case for zone append and FUA, > i.e. we need the append op for zonefs, which should exercise the > fua code if O_SYNC/O_DSYNC is set. Yep. There is currently no issuer of zone append + FUA. But once I get to add that for zonefs and block dev files, we indeed will have good tstbed. >
On 3/27/24 5:43 PM, Damien Le Moal wrote: > On completion of a zone append request, the request sector indicates the > location of the written data. This value must be returned to the user > through the BIO iter sector. This is done in 2 places: in > blk_complete_request() and in blk_update_request(). Introduce the inline > helper function blk_zone_update_request_bio() to avoid duplicating > this BIO update for zone append requests, and to compile out this > helper call when CONFIG_BLK_DEV_ZONED is not enabled. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
diff --git a/block/blk-mq.c b/block/blk-mq.c index e55af6058cbf..70dfb4af65cf 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -820,11 +820,11 @@ static void blk_complete_request(struct request *req) /* Completion has already been traced */ bio_clear_flag(bio, BIO_TRACE_COMPLETION); - if (req_op(req) == REQ_OP_ZONE_APPEND) - bio->bi_iter.bi_sector = req->__sector; - - if (!is_flush) + if (!is_flush) { + blk_zone_update_request_bio(req, bio); bio_endio(bio); + } + bio = next; } while (bio); @@ -923,8 +923,7 @@ bool blk_update_request(struct request *req, blk_status_t error, /* Don't actually finish bio if it's part of flush sequence */ if (!bio->bi_iter.bi_size && !is_flush) { - if (req_op(req) == REQ_OP_ZONE_APPEND) - bio->bi_iter.bi_sector = req->__sector; + blk_zone_update_request_bio(req, bio); bio_endio(bio); } diff --git a/block/blk.h b/block/blk.h index 5cac4e29ae17..a12cde1d45de 100644 --- a/block/blk.h +++ b/block/blk.h @@ -409,12 +409,29 @@ static inline struct bio *blk_queue_bounce(struct bio *bio, #ifdef CONFIG_BLK_DEV_ZONED void disk_free_zone_bitmaps(struct gendisk *disk); +static inline void blk_zone_update_request_bio(struct request *rq, + struct bio *bio) +{ + /* + * For zone append requests, the request sector indicates the location + * at which the BIO data was written. Return this value to the BIO + * issuer through the BIO iter sector. + */ + if (req_op(rq) == REQ_OP_ZONE_APPEND) + bio->bi_iter.bi_sector = rq->__sector; +} int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd, unsigned long arg); int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode, unsigned int cmd, unsigned long arg); #else /* CONFIG_BLK_DEV_ZONED */ -static inline void disk_free_zone_bitmaps(struct gendisk *disk) {} +static inline void disk_free_zone_bitmaps(struct gendisk *disk) +{ +} +static inline void blk_zone_update_request_bio(struct request *rq, + struct bio *bio) +{ +} static inline int blkdev_report_zones_ioctl(struct block_device *bdev, unsigned int cmd, unsigned long arg) {