Message ID | 20190213095044.29628-2-bob.liu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | Block/XFS: Support alternative mirror device retry |
On Feb 13, 2019, at 2:50 AM, Bob Liu <bob.liu@oracle.com> wrote:
>
> When fs data/metadata checksum mismatch, lower block devices may have other
> correct copies. e.g if we did raid1 for protecting fs metadata.
> Then fs could try other copies of metadata instead of panic, but fs need be
> awared how many mirrors the block devices have.
>
> This patch add @nr_mirrors to struct request_queue which is similar as
> blk_queue_nonrot(), filesystem can grab device request queue and check the
> number of mirrors of this block device.
>
> @nr_mirrors is 1 by default which means only one copy, drivers e.g raid1 are
> responsible for setting the right value. The maximum value is
> BITS_PER_LONG which is 32 or 64. That should be big enough else retry lantency
> may be too high.
>
> Also added helper functions for get/set the number of mirrors for a specific
> device request queue.
>
> Todo:
> * Export nr_mirrors through /sysfs.
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 3e7038e475ee..38e4d7e675e6 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -844,6 +844,30 @@ void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
> +/*
> + * Set the number of read redundant mirrors.
> + */
> +bool blk_queue_set_mirrors(struct request_queue *q, unsigned short mirrors)
> +{
> +	if(q->nr_mirrors >= BLKDEV_MAX_MIRRORS) {
> +		printk("blk_queue_set_mirrors: %d exceed max mirrors(%d)\n",
> +				mirrors, BLKDEV_MAX_MIRRORS);

Need to supply a KERN_ level here.

Cheers, Andreas
On Wed, Feb 13, 2019 at 05:50:36PM +0800, Bob Liu wrote:
> @nr_mirrors is 1 by default which means only one copy, drivers e.g raid1 are
> responsible for setting the right value. The maximum value is
> BITS_PER_LONG which is 32 or 64. That should be big enough else retry lantency
> may be too high.

This is admittedly bike-shedding, so feel free to ignore, but...

In the case of Raid 6, "mirrors" will be a bit of a misnomer. Would
"nr_recovery" be better?

Thanks for working on this!! I would be interested in using this for
ext4 once it's available.

- Ted
On 2/14/19 12:04 AM, Theodore Y. Ts'o wrote:
> On Wed, Feb 13, 2019 at 05:50:36PM +0800, Bob Liu wrote:
>> @nr_mirrors is 1 by default which means only one copy, drivers e.g raid1 are
>> responsible for setting the right value. The maximum value is
>> BITS_PER_LONG which is 32 or 64. That should be big enough else retry lantency
>> may be too high.
>
> This is admittedly bike-shedding, so feel free to ignore, but...
>
> In the case of Raid 6, "mirrors" will be a bit of a misnomer. Would
> "nr_recovery" be better?
>

Now the initial/default value is 1 indicating only one copy of data.
Would nr_copy be more accurate?

> Thanks for working on this!! I would be interested in using this for
> ext4 once it's available.
>
> - Ted
>
On Thu, Feb 14, 2019 at 01:57:20PM +0800, Bob Liu wrote:
>
> Now the initial/default value is 1 indicating only one copy of data.
> Would nr_copy be more accurate?
>

Well, it's at least shorter; the problem is that it's not really
another "copy" of the data, it's just that it can simply be different
(multiple) ways of reconstructing the data. I suppose we could say
that it's a virtual copy.

In any case, I can't think of a better term, so nr_copy is probably
as good as any.

Cheers,

- Ted
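To illustrate the point that a parity-based array offers ways of reconstructing data rather than literal copies, here is a minimal userspace sketch using single XOR parity (RAID 5 style). It is a simplification: RAID 6 adds a second, differently computed syndrome that is not shown, and the helper names `xor_parity` and `xor_rebuild` are made up for this example and do not come from the patch or the md driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute the XOR parity of n equally sized data blocks into parity[]. */
static void xor_parity(const uint8_t *const blocks[], size_t n, size_t len,
		       uint8_t *parity)
{
	memset(parity, 0, len);
	for (size_t b = 0; b < n; b++)
		for (size_t i = 0; i < len; i++)
			parity[i] ^= blocks[b][i];
}

/*
 * Rebuild block `missing` from the surviving blocks plus the parity.
 * blocks[missing] is never read, as if that device had failed.
 */
static void xor_rebuild(const uint8_t *const blocks[], size_t n, size_t missing,
			const uint8_t *parity, size_t len, uint8_t *out)
{
	assert(missing < n);
	memcpy(out, parity, len);
	for (size_t b = 0; b < n; b++) {
		if (b == missing)
			continue;
		for (size_t i = 0; i < len; i++)
			out[i] ^= blocks[b][i];
	}
}
```

No second copy of any data block is stored here, yet any single block can be recomputed from the others, which is roughly what "virtual copy" means in the discussion above.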
diff --git a/block/blk-core.c b/block/blk-core.c
index 6b78ec56a4f2..b838c6dc5357 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -537,6 +537,9 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	/* Set queue default mirrors to 1 explicitly. */
+	blk_queue_set_mirrors(q, 1);
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 3e7038e475ee..38e4d7e675e6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -844,6 +844,30 @@ void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
 }
 EXPORT_SYMBOL_GPL(blk_queue_write_cache);
 
+/*
+ * Get the number of read redundant mirrors.
+ */
+unsigned short blk_queue_get_mirrors(struct request_queue *q)
+{
+	return q->nr_mirrors;
+}
+EXPORT_SYMBOL(blk_queue_get_mirrors);
+
+/*
+ * Set the number of read redundant mirrors.
+ */
+bool blk_queue_set_mirrors(struct request_queue *q, unsigned short mirrors)
+{
+	if(q->nr_mirrors >= BLKDEV_MAX_MIRRORS) {
+		printk("blk_queue_set_mirrors: %d exceed max mirrors(%d)\n",
+				mirrors, BLKDEV_MAX_MIRRORS);
+		return false;
+	}
+	q->nr_mirrors = mirrors;
+	return true;
+}
+EXPORT_SYMBOL(blk_queue_set_mirrors);
+
 static int __init blk_settings_init(void)
 {
 	blk_max_low_pfn = max_low_pfn - 1;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 338604dff7d0..0191dc4d3f2d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -570,6 +570,7 @@ struct request_queue {
 
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
+	unsigned long		nr_mirrors; /* Default value is 1 */
 };
 
 #define QUEUE_FLAG_STOPPED	1	/* queue is stopped */
@@ -1071,6 +1072,8 @@ extern void blk_queue_update_dma_alignment(struct request_queue *, int);
 extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
 extern void blk_queue_flush_queueable(struct request_queue *q, bool queueable);
 extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
+extern unsigned short blk_queue_get_mirrors(struct request_queue *q);
+extern bool blk_queue_set_mirrors(struct request_queue *q, unsigned short mirrors);
 
 /*
  * Number of physical segments as sent to the device.
diff --git a/include/linux/types.h b/include/linux/types.h
index c2615d6a019e..a29135772f3a 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -7,6 +7,9 @@
 
 #ifndef __ASSEMBLY__
 
+/* max mirrors of blkdev */
+#define BLKDEV_MAX_MIRRORS BITS_PER_LONG
+
 #define DECLARE_BITMAP(name,bits) \
 	unsigned long name[BITS_TO_LONGS(bits)]
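For readers following the get/set helpers, here is a minimal userspace model of the same idea. It is a sketch, not the kernel code: `struct request_queue_model`, `queue_get_mirrors` and `queue_set_mirrors` are hypothetical names, and `BITS_PER_LONG` is derived from `sizeof(long)` as a stand-in for the kernel macro. Unlike the posted hunk, which tests the queue's current `nr_mirrors`, this model validates the requested value; that may be what was intended, but it is an assumption, as is rejecting 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel macros used by the patch. */
#define BITS_PER_LONG		((int)(sizeof(long) * CHAR_BIT))
#define BLKDEV_MAX_MIRRORS	BITS_PER_LONG

/* Hypothetical model of the one field the patch adds to the queue. */
struct request_queue_model {
	unsigned short nr_mirrors;	/* 1 means a single copy */
};

/* Return the number of copies recorded for this queue. */
static unsigned short queue_get_mirrors(const struct request_queue_model *q)
{
	return q->nr_mirrors;
}

/*
 * Record a new copy count. The requested value is checked (assumption),
 * and the queue is left unchanged when the value is out of range.
 */
static bool queue_set_mirrors(struct request_queue_model *q,
			      unsigned short mirrors)
{
	assert(q != NULL);
	if (mirrors == 0 || mirrors > BLKDEV_MAX_MIRRORS) {
		/* Kernel code would log through printk with a KERN_ level. */
		fprintf(stderr, "queue_set_mirrors: %u is outside 1..%d\n",
			(unsigned int)mirrors, BLKDEV_MAX_MIRRORS);
		return false;
	}
	q->nr_mirrors = mirrors;
	return true;
}
```

Storing the field as `unsigned short` here keeps it the same width as the helper signatures; the patch itself declares the field as `unsigned long`.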
When a filesystem data/metadata checksum mismatches, lower block devices may
hold other correct copies, e.g. if raid1 is used to protect fs metadata.
The fs could then try the other copies of the metadata instead of panicking,
but it needs to be aware of how many mirrors the block device has.

This patch adds @nr_mirrors to struct request_queue. Similar to
blk_queue_nonrot(), a filesystem can grab the device request queue and check
the number of mirrors of the block device.

@nr_mirrors is 1 by default, which means only one copy; drivers such as raid1
are responsible for setting the right value. The maximum value is
BITS_PER_LONG, which is 32 or 64. That should be big enough; with more copies
the retry latency may become too high.

Also add helper functions to get/set the number of mirrors for a specific
device request queue.

Todo:
* Export nr_mirrors through sysfs.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 block/blk-core.c       |  3 +++
 block/blk-settings.c   | 24 ++++++++++++++++++++++++
 include/linux/blkdev.h |  3 +++
 include/linux/types.h  |  3 +++
 4 files changed, 33 insertions(+)
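The retry idea described in the commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct mem_mirrors`, `mem_read_copy`, `toy_checksum` and `read_with_mirror_retry` are hypothetical names, the copies live in memory instead of on a block device, and the checksum is a toy stand-in for whatever the filesystem really uses. How a filesystem would direct a read at one particular copy is not defined by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a device holding several copies. */
struct mem_mirrors {
	const uint8_t *copies;	/* blocks of block_len bytes, back to back */
	size_t block_len;
};

/* Read copy number `mirror` into buf; always succeeds in this model. */
static bool mem_read_copy(const struct mem_mirrors *mm, unsigned int mirror,
			  uint8_t *buf, size_t len)
{
	assert(len == mm->block_len);
	memcpy(buf, mm->copies + (size_t)mirror * mm->block_len, len);
	return true;
}

/* Toy checksum, used only to tell good copies from bad ones here. */
static uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + buf[i];
	return sum;
}

/*
 * Try each of nr_mirrors copies in turn. Returns the index of the first
 * copy whose checksum matches `expected` (buf then holds that copy),
 * or -1 if every copy fails verification.
 */
static int read_with_mirror_retry(const struct mem_mirrors *mm,
				  unsigned int nr_mirrors, uint8_t *buf,
				  size_t len, uint32_t expected)
{
	for (unsigned int m = 0; m < nr_mirrors; m++) {
		if (!mem_read_copy(mm, m, buf, len))
			continue;	/* I/O error: move on to the next copy */
		if (toy_checksum(buf, len) == expected)
			return (int)m;
	}
	return -1;
}
```

With `nr_mirrors` equal to 1 the loop degenerates to a single read, which matches the default described above.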