Message ID | 570D5D50.6000408@sandisk.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/12/2016 10:40 PM, Bart Van Assche wrote: > Split discard requests as follows: > * If the start sector is not aligned, an initial write request up > to the first aligned sector. > * A discard request from the first aligned sector in the range up > to the last aligned sector in the discarded range. > * If the end sector is not aligned, a final write request from the > last aligned sector up to the end. > > Note: if the start and/or end sectors are not aligned and if the > range is small enough the discard request will be submitted with > bi_size == 0. > > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Mike Snitzer <snitzer@redhat.com> > Cc: Martin K. Petersen <martin.petersen@oracle.com> > Cc: Dmitry Monakhov <dmonakhov@openvz.org> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Sagi Grimberg <sagi@grimberg.me> > --- > block/blk-lib.c | 4 ++-- > block/blk-merge.c | 55 ++++++++++++++++++++++++++++++------------------------- > block/blk.h | 3 +++ > 3 files changed, 35 insertions(+), 27 deletions(-) > Well. I do understand the intent (and, in fact, I need a similar thing for SMR :-), but I'm not sure if the implementation is correct. From my understanding, 'discard' is telling the device to there are no outstanding users, and the device may be re-arranging these blocks as needed. And the 'discard_zeroes_data' is just a hint that the blocks will be zeroed while/after being discarded. WRITE SAME approaches it from the other side, blanking out blocks and optionally discards them. So I wonder if we should plumb this into blkdev_issue_zeroout(), not blk_issue_discard(). Or maybe both ... Cheers, Hannes
On 04/13/2016 07:23 AM, Hannes Reinecke wrote: > So I wonder if we should plumb this into blkdev_issue_zeroout(), not > blk_issue_discard(). That's an excellent question. Let's take a step back and look at the functionality that is exposed to user space. What should the behavior of the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end sectors are not aligned? My patch ensures that these ioctls do something meaningful if the start and/or end sector are not aligned on a discard boundary. Is that the behavior we want or should rather we make these ioctls fail if the start and/or end sectors are not aligned? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/13/2016 05:22 PM, Bart Van Assche wrote: > On 04/13/2016 07:23 AM, Hannes Reinecke wrote: >> So I wonder if we should plumb this into blkdev_issue_zeroout(), not >> blk_issue_discard(). > > That's an excellent question. Let's take a step back and look at the > functionality that is exposed to user space. What should the behavior of > the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end > sectors are not aligned? My patch ensures that these ioctls do something > meaningful if the start and/or end sector are not aligned on a discard > boundary. Is that the behavior we want or should rather we make these > ioctls fail if the start and/or end sectors are not aligned? > Guess what, I've run into the same issue as I've tried to adapt blkdev_issue_discard for SMR drives. There are always some areas on SMR drives which do not necessarily expose a discard functionality, so the same problem applies to that, too. Lightning talk as LSF? Cheers, Hannes
On Wed, Apr 13, 2016 at 08:22:12AM -0700, Bart Van Assche wrote: > That's an excellent question. Let's take a step back and look at the > functionality that is exposed to user space. What should the behavior of > the BLKDISCARD and BLKSECDISCARD ioctls be if the start and/or end sectors > are not aligned? My patch ensures that these ioctls do something meaningful > if the start and/or end sector are not aligned on a discard boundary. Is > that the behavior we want or should rather we make these ioctls fail if the > start and/or end sectors are not aligned? BLKDISCARD should just skip these ranges. You'll need to use BLKZEROOUT or the falloc support from Darrick if you actually want zeroes. BLKSECDISCARD implies it should fully erase data by the name, for which zeroing might actually not be enough. I'd be tempted to fail it, but would like to see input from the people who created it and/or are using it. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/blk-lib.c b/block/blk-lib.c index 9a93ca4..d78ded5 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -30,7 +30,7 @@ static void bio_batch_end_io(struct bio *bio) * Return the largest number that is less than or equal to @s and for which * the remainder of the division by @granularity is @alignment. */ -static sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment) +sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment) { sector_t tmp = s, res = s; u32 remainder; @@ -219,7 +219,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, } EXPORT_SYMBOL(blkdev_issue_write_same); -static void bio_add_zero_pages(struct bio *bio, sector_t nr_sects) +void bio_add_zero_pages(struct bio *bio, sector_t nr_sects) { unsigned int sz; int ret; diff --git a/block/blk-merge.c b/block/blk-merge.c index 2613531..fd15606 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -16,42 +16,47 @@ static struct bio *blk_bio_discard_split(struct request_queue *q, struct bio_set *bs, unsigned *nsegs) { + struct bio *wr; unsigned int max_discard_sectors, granularity; int alignment; - sector_t tmp; - unsigned split_sectors; + sector_t start, start_r, end, end_r, skip; *nsegs = 1; /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q->limits.discard_granularity >> 9, 1U); - + alignment = (q->limits.discard_alignment >> 9) % granularity; max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); - max_discard_sectors -= max_discard_sectors % granularity; - - if (unlikely(!max_discard_sectors)) { - /* XXX: warn */ - return NULL; - } - - if (bio_sectors(bio) <= max_discard_sectors) - return NULL; - - split_sectors = max_discard_sectors; + WARN_ON_ONCE(max_discard_sectors == 0); /* - * If the next starting sector would be misaligned, stop the discard at - * the previous aligned sector. + * If the start or end sector are misaligned, issue a write same + * same request if the discard_zeroes_data flag has been set. */ - alignment = (q->limits.discard_alignment >> 9) % granularity; - - tmp = bio->bi_iter.bi_sector + split_sectors - alignment; - tmp = sector_div(tmp, granularity); - - if (split_sectors > tmp) - split_sectors -= tmp; - - return bio_split(bio, split_sectors, GFP_NOIO, bs); + start = bio->bi_iter.bi_sector; + start_r = blk_round_sect_down(start, granularity, alignment); + end = start + min(max_discard_sectors, bio_sectors(bio)); + end_r = blk_round_sect_down(end, granularity, alignment); + if (start == start_r && start < end_r) { + if (end == end_r && bio_sectors(bio) == end_r - start) + return NULL; + return bio_split(bio, end_r - start, GFP_NOIO, bs); + } + if (q->limits.discard_zeroes_data && start < end) { + end = min(end, start_r + granularity); + wr = bio_alloc_bioset(GFP_NOIO, end - start, bs); + if (WARN_ON_ONCE(!wr)) + return NULL; + wr->bi_rw = REQ_WRITE; + wr->bi_iter.bi_sector = start; + wr->bi_bdev = bio->bi_bdev; + bio_add_zero_pages(wr, end - start); + bio_advance(bio, wr->bi_iter.bi_size); + return wr; + } + skip = (min(start_r + granularity, end) - start) << 9; + bio_advance(bio, skip); + return NULL; } static struct bio *blk_bio_write_same_split(struct request_queue *q, diff --git a/block/blk.h b/block/blk.h index 70e4aee..31b13f9 100644 --- a/block/blk.h +++ b/block/blk.h @@ -36,6 +36,9 @@ extern struct kmem_cache *request_cachep; extern struct kobj_type blk_queue_ktype; extern struct ida blk_queue_ida; +sector_t blk_round_sect_down(sector_t s, u32 granularity, u32 alignment); +void bio_add_zero_pages(struct bio *bio, sector_t nr_sects); + static inline struct blk_flush_queue *blk_get_flush_queue( struct request_queue *q, struct blk_mq_ctx *ctx) {
Split discard requests as follows: * If the start sector is not aligned, an initial write request up to the first aligned sector. * A discard request from the first aligned sector in the range up to the last aligned sector in the discarded range. * If the end sector is not aligned, a final write request from the last aligned sector up to the end. Note: if the start and/or end sectors are not aligned and if the range is small enough the discard request will be submitted with bi_size == 0. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Sagi Grimberg <sagi@grimberg.me> --- block/blk-lib.c | 4 ++-- block/blk-merge.c | 55 ++++++++++++++++++++++++++++++------------------------- block/blk.h | 3 +++ 3 files changed, 35 insertions(+), 27 deletions(-)