diff mbox

block: Avoid executing a report or reset zones while a queue is frozen

Message ID 20180417010034.28676-1-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche April 17, 2018, 1 a.m. UTC
This patch on itself does not change the behavior of either ioctl.
However, this patch is necessary to avoid that these ioctls fail
with -EIO if sd_revalidate_disk() is called while these ioctls are
in progress because the current zoned block command code temporarily
clears data that is needed by these ioctls. See also commit
3ed05a987e0f ("blk-zoned: implement ioctls").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Shaun Tancheff <shaun@tancheff.com>
Cc: stable@vger.kernel.org # v4.10
---
 block/blk-zoned.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

Comments

Christoph Hellwig April 17, 2018, 3:18 p.m. UTC | #1
On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:
> This patch on itself does not change the behavior of either ioctl.
> However, this patch is necessary to avoid that these ioctls fail
> with -EIO if sd_revalidate_disk() is called while these ioctls are
> in progress because the current zoned block command code temporarily
> clears data that is needed by these ioctls. See also commit
> 3ed05a987e0f ("blk-zoned: implement ioctls").

Hmm.  I think we need to avoid clearing that data and update it using
RCU instead.  Calling blk_queue_enter before submitting bios is
something that would make zone reporting very different from any
other block layer user.
Bart Van Assche April 17, 2018, 3:32 p.m. UTC | #2
On Tue, 2018-04-17 at 17:18 +0200, Christoph Hellwig wrote:
> On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:

> > This patch on itself does not change the behavior of either ioctl.

> > However, this patch is necessary to avoid that these ioctls fail

> > with -EIO if sd_revalidate_disk() is called while these ioctls are

> > in progress because the current zoned block command code temporarily

> > clears data that is needed by these ioctls. See also commit

> > 3ed05a987e0f ("blk-zoned: implement ioctls").

> 

> Hmm.  I think we need to avoid clearing that data and update it using

> RCU instead.  Calling blk_queue_enter before submitting bios is

> something that would make zone reporting very different from any

> other block layer user.


Hello Christoph,

As you know some struct members that contain zoned block device information
are in struct request queue and others are in struct scsi_disk:

struct scsi_disk {
	[ ... ]
#ifdef CONFIG_BLK_DEV_ZONED
	u32		nr_zones;
	u32		zone_blocks;
	u32		zone_shift;
	u32		zones_optimal_open;
	u32		zones_optimal_nonseq;
	u32		zones_max_open;
#endif
	[ ... ]
};

struct request_queue {
	[ ... ]
	unsigned int		nr_zones;
	unsigned long		*seq_zones_bitmap;
	unsigned long		*seq_zones_wlock;
	[ ... ]
};

Did you perhaps mean to move these members into two new structures, to use
RCU to update the pointers to these structures and also to protect code that
reads these pointers with rcu_read_lock() / rcu_read_unlock()? If so, how to
make sure that both pointers get updated simultaneously such that no code
ever sees a pointer to the old information in e.g. struct scsi_disk and a
pointer to the new information in struct request_queue?

Thanks,

Bart.
Bart Van Assche April 17, 2018, 5:35 p.m. UTC | #3
On Tue, 2018-04-17 at 17:18 +0200, Christoph Hellwig wrote:
> On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:

> > This patch on itself does not change the behavior of either ioctl.

> > However, this patch is necessary to avoid that these ioctls fail

> > with -EIO if sd_revalidate_disk() is called while these ioctls are

> > in progress because the current zoned block command code temporarily

> > clears data that is needed by these ioctls. See also commit

> > 3ed05a987e0f ("blk-zoned: implement ioctls").

> 

> Hmm.  I think we need to avoid clearing that data and update it using

> RCU instead.  Calling blk_queue_enter before submitting bios is

> something that would make zone reporting very different from any

> other block layer user.


Hello Christoph,

Please have a look at generic_make_request(). I think that function shows
that blk_queue_enter() is called anyway before .make_request_fn() is called.

Thanks,

Bart.
Damien Le Moal April 17, 2018, 10:58 p.m. UTC | #4
Christoph,

On 2018/04/17 8:18, Christoph Hellwig wrote:
> On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:

>> This patch on itself does not change the behavior of either ioctl.

>> However, this patch is necessary to avoid that these ioctls fail

>> with -EIO if sd_revalidate_disk() is called while these ioctls are

>> in progress because the current zoned block command code temporarily

>> clears data that is needed by these ioctls. See also commit

>> 3ed05a987e0f ("blk-zoned: implement ioctls").

> 

> Hmm.  I think we need to avoid clearing that data and update it using

> RCU instead.  Calling blk_queue_enter before submitting bios is

> something that would make zone reporting very different from any

> other block layer user.

> 


With the scsi patches that Bart sent, the data is not cleared anymore until it
is ready to be updated all at once, in between calls to blk_mq_freeze_queue()
and blk_mq_unfreeze_queue(). This prevents the scheduler (deadline and
mq-deadline) from using partially incorrect data since the queue freeze
guarantees no activity on the scheduler side during the data swap.
But for the BIO issuing side, the calls to blk_queue_enter() ensures the same.

This all allows to leave unmodified all the blk-zoned.c helper functions that
use the request queue information (bitmaps and number of zones), without any
need for rcu lock calls. I think this is a simpler approach.

Best regards.

-- 
Damien Le Moal
Western Digital Research
Christoph Hellwig April 19, 2018, 9:14 a.m. UTC | #5
On Tue, Apr 17, 2018 at 05:35:07PM +0000, Bart Van Assche wrote:
> > Hmm.  I think we need to avoid clearing that data and update it using
> > RCU instead.  Calling blk_queue_enter before submitting bios is
> > something that would make zone reporting very different from any
> > other block layer user.
> 
> Hello Christoph,
> 
> Please have a look at generic_make_request(). I think that function shows
> that blk_queue_enter() is called anyway before .make_request_fn() is called.

Yes, that is the point.  We already call blk_queue_enter in
generic_make_request, which the zone report and reset ops pass through.
diff mbox

Patch

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 20bfc37e1852..acc71e8c3473 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -127,15 +127,19 @@  int blkdev_report_zones(struct block_device *bdev,
 	if (!q)
 		return -ENXIO;
 
+	blk_queue_enter(q, 0);
+
+	ret = -EOPNOTSUPP;
 	if (!blk_queue_is_zoned(q))
-		return -EOPNOTSUPP;
+		goto exit_queue;
 
+	ret = 0;
 	if (!nrz)
-		return 0;
+		goto exit_queue;
 
 	if (sector > bdev->bd_part->nr_sects) {
 		*nr_zones = 0;
-		return 0;
+		goto exit_queue;
 	}
 
 	/*
@@ -154,9 +158,10 @@  int blkdev_report_zones(struct block_device *bdev,
 	nr_pages = min_t(unsigned int, nr_pages,
 			 queue_max_segments(q));
 
+	ret = -ENOMEM;
 	bio = bio_alloc(gfp_mask, nr_pages);
 	if (!bio)
-		return -ENOMEM;
+		goto exit_queue;
 
 	bio_set_dev(bio, bdev);
 	bio->bi_iter.bi_sector = blk_zone_start(q, sector);
@@ -166,7 +171,7 @@  int blkdev_report_zones(struct block_device *bdev,
 		page = alloc_page(gfp_mask);
 		if (!page) {
 			ret = -ENOMEM;
-			goto out;
+			goto put_bio;
 		}
 		if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
 			__free_page(page);
@@ -179,7 +184,7 @@  int blkdev_report_zones(struct block_device *bdev,
 	else
 		ret = submit_bio_wait(bio);
 	if (ret)
-		goto out;
+		goto put_bio;
 
 	/*
 	 * Process the report result: skip the header and go through the
@@ -222,11 +227,14 @@  int blkdev_report_zones(struct block_device *bdev,
 	}
 
 	*nr_zones = nz;
-out:
+put_bio:
 	bio_for_each_segment_all(bv, bio, i)
 		__free_page(bv->bv_page);
 	bio_put(bio);
 
+exit_queue:
+	blk_queue_exit(q);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
@@ -256,21 +264,25 @@  int blkdev_reset_zones(struct block_device *bdev,
 	if (!q)
 		return -ENXIO;
 
+	blk_queue_enter(q, 0);
+
+	ret = -EOPNOTSUPP;
 	if (!blk_queue_is_zoned(q))
-		return -EOPNOTSUPP;
+		goto out;
 
+	ret = -EINVAL;
 	if (end_sector > bdev->bd_part->nr_sects)
 		/* Out of range */
-		return -EINVAL;
+		goto out;
 
 	/* Check alignment (handle eventual smaller last zone) */
 	zone_sectors = blk_queue_zone_sectors(q);
 	if (sector & (zone_sectors - 1))
-		return -EINVAL;
+		goto out;
 
 	if ((nr_sectors & (zone_sectors - 1)) &&
 	    end_sector != bdev->bd_part->nr_sects)
-		return -EINVAL;
+		goto out;
 
 	while (sector < end_sector) {
 
@@ -283,7 +295,7 @@  int blkdev_reset_zones(struct block_device *bdev,
 		bio_put(bio);
 
 		if (ret)
-			return ret;
+			goto out;
 
 		sector += zone_sectors;
 
@@ -292,7 +304,11 @@  int blkdev_reset_zones(struct block_device *bdev,
 
 	}
 
-	return 0;
+	ret = 0;
+
+out:
+	blk_queue_exit(q);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_reset_zones);