
[v7,9/9] bcache: stop bcache device when backing device is offline

Message ID 20180227165557.20442-10-colyli@suse.de (mailing list archive)
State New, archived

Commit Message

Coly Li Feb. 27, 2018, 4:55 p.m. UTC
Currently bcache does not handle backing device failure: if the backing
device is offline and disconnected from the system, its bcache device is
still accessible. If the bcache device is in writeback mode, I/O requests
can even succeed when they hit the cache device. That is to say, when and
how bcache handles an offline backing device is undefined.

This patch handles backing device offline in a rather simple way:
- Add a cached_dev->status_update_thread kernel thread to update the
  backing device status every 1 second.
- Add cached_dev->offline_seconds to record how many seconds the backing
  device has been observed to be offline. If the backing device is offline
  for BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and
  call bcache_device_stop() to stop the bcache device linked to the
  offline backing device.

Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds,
its bcache device will be removed; user space applications writing to it
will then get an error immediately and can handle the device failure in
time.

This patch is quite simple and does not handle more complicated situations.
Once the bcache device is stopped, users need to recover the backing
device, then register and attach it manually.
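
The offline-counting logic described above can be sketched in user space.
This is a minimal standalone sketch of the counting/timeout state machine,
not kernel code; struct dev_state, status_tick() and the 'online' flag
(standing in for the blk_queue_dying() check) are hypothetical names, and
the timeout here is the 5 seconds the posted code actually uses:

```c
#include <stdbool.h>

#define BACKING_DEV_OFFLINE_TIMEOUT 5

struct dev_state {
	unsigned offline_seconds;
	bool io_disable;
};

/* One 1-second polling tick, mirroring the loop body of
 * cached_dev_status_update(): reset the counter while the device is
 * reachable, count while it is not, and trip io_disable once the
 * timeout is reached.  The real code would also call
 * bcache_device_stop() at that point. */
void status_tick(struct dev_state *s, bool online)
{
	if (s->io_disable)
		return;
	if (online)
		s->offline_seconds = 0;
	else
		s->offline_seconds++;
	if (s->offline_seconds >= BACKING_DEV_OFFLINE_TIMEOUT)
		s->io_disable = true;
}
```

Note that a transient outage shorter than the timeout resets the counter,
so only a sustained offline period stops the device.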

Changelog:
v3: call wait_for_kthread_stop() before the kernel thread exits.
v2: remove "bcache: " prefix when calling pr_warn().
v1: initial version.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
---
 drivers/md/bcache/bcache.h |  2 ++
 drivers/md/bcache/super.c  | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

Comments

Michael Lyle Feb. 27, 2018, 6:20 p.m. UTC | #1
Hi Coly Li--

Just a couple of questions.

On 02/27/2018 08:55 AM, Coly Li wrote:
> +#define BACKING_DEV_OFFLINE_TIMEOUT 5

I think you wanted this to be 30 (per commit message)-- was this turned
down for testing or deliberate?

> +static int cached_dev_status_update(void *arg)
> +{
> +	struct cached_dev *dc = arg;
> +	struct request_queue *q;
> +	char buf[BDEVNAME_SIZE];
> +
> +	/*
> +	 * If this delayed worker is stopping outside, directly quit here.
> +	 * dc->io_disable might be set via sysfs interface, so check it
> +	 * here too.
> +	 */
> +	while (!kthread_should_stop() && !dc->io_disable) {
> +		q = bdev_get_queue(dc->bdev);
> +		if (blk_queue_dying(q))

I am not sure-- is this the correct test to know if the bdev is offline?
 It's very sparsely used outside of the core block system.  (Is there
any scenario where the queue comes back?  Can the device be "offline"
and still have a live queue?  Another approach might be to set
io_disable when there's an extended period without a successful IO, and
it might be possible to do this with atomics without a thread).

This approach can work if you're certain that blk_queue_dying is an
appropriate test for all failure scenarios (I just don't know).
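
A user-space sketch of that thread-free alternative (the names
io_watchdog and last_good_io are hypothetical, not from the patch):
record the time of the last successful I/O in an atomic, and let the
I/O completion and submission paths trip io_disable once too long has
elapsed without a success:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IO_TIMEOUT_SECONDS 5

struct io_watchdog {
	_Atomic long last_good_io;	/* time (seconds) of last successful I/O */
	atomic_bool io_disable;
};

/* Called from the completion path on every successful I/O. */
void io_watchdog_success(struct io_watchdog *w, long now)
{
	atomic_store(&w->last_good_io, now);
}

/* Called from the submission/error path; no dedicated kthread needed. */
bool io_watchdog_check(struct io_watchdog *w, long now)
{
	if (now - atomic_load(&w->last_good_io) >= IO_TIMEOUT_SECONDS)
		atomic_store(&w->io_disable, true);
	return atomic_load(&w->io_disable);
}
```

One caveat with such a scheme is that an idle device issues no I/O at
all, so "no successful I/O for N seconds" would need a probe or grace
period to avoid false positives.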

Mike
Coly Li Feb. 28, 2018, 4:54 a.m. UTC | #2
On 28/02/2018 2:20 AM, Michael Lyle wrote:
> Hi Coly Li--
> 
> Just a couple of questions.
> 
> On 02/27/2018 08:55 AM, Coly Li wrote:
>> +#define BACKING_DEV_OFFLINE_TIMEOUT 5
> 

Hi Mike,

> I think you wanted this to be 30 (per commit message)-- was this turned
> down for testing or deliberate?
> 

Currently the correct timeout is 5 seconds. The 30 seconds was for the
case where the offline backing device comes back within 30 seconds,
which is not implemented yet. In general, after 5 seconds the offline
device will be deleted from the system; if it comes back again, the device
name might change (e.g. from /dev/sdc to /dev/sdd) with a different
bdev index. I need to recognize that the newly arrived drive is the
previously offline one, and modify bcache internal data structures to link
to the new device name. This requires more details and effort, so I decided
to work on it later as a separate topic. Obviously I didn't update the
patch commit log properly. Thanks for pointing this out :-)

>> +static int cached_dev_status_update(void *arg)
>> +{
>> +	struct cached_dev *dc = arg;
>> +	struct request_queue *q;
>> +	char buf[BDEVNAME_SIZE];
>> +
>> +	/*
>> +	 * If this delayed worker is stopping outside, directly quit here.
>> +	 * dc->io_disable might be set via sysfs interface, so check it
>> +	 * here too.
>> +	 */
>> +	while (!kthread_should_stop() && !dc->io_disable) {
>> +		q = bdev_get_queue(dc->bdev);
>> +		if (blk_queue_dying(q))
> 
> I am not sure-- is this the correct test to know if the bdev is offline?
>  It's very sparsely used outside of the core block system.  (Is there
> any scenario where the queue comes back?  Can the device be "offline"
> and still have a live queue?  Another approach might be to set
> io_disable when there's an extended period without a successful IO, and
> it might be possible to do this with atomics without a thread).
> 

I asked block layer developer Ming Lei; this is the method suggested for
now. And in my tests, it works as expected.

Anyway, using a kernel thread to monitor device status is ugly; the
reason I have to do it this way is that there is no general notifier
for block layer device offline or failure. There is a bus notifier, but
for devices that have no bus it does not work well. So I am also thinking
about the possibility of a generic notifier for block device offline or
failure; if it can be done someday, we can just register a callback
routine and won't need an extra kernel thread.

> This approach can work if you're certain that blk_queue_dying is an
> appropriate test for all failure scenarios (I just don't know).

So far it works well, and I am collaborating with our partners to test
the patch set; there are no negative reports so far.

Thanks.

Coly Li

Patch

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index d338b7086013..1cfe37dba0f1 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -345,6 +345,7 @@  struct cached_dev {
 
 	struct keybuf		writeback_keys;
 
+	struct task_struct	*status_update_thread;
 	/*
 	 * Order the write-half of writeback operations strongly in dispatch
 	 * order.  (Maintain LBA order; don't allow reads completing out of
@@ -392,6 +393,7 @@  struct cached_dev {
 #define DEFAULT_CACHED_DEV_ERROR_LIMIT	64
 	atomic_t		io_errors;
 	unsigned		error_limit;
+	unsigned		offline_seconds;
 };
 
 enum alloc_reserve {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index cae4caac17da..abeb52bd9ebe 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -654,6 +654,11 @@  static int ioctl_dev(struct block_device *b, fmode_t mode,
 		     unsigned int cmd, unsigned long arg)
 {
 	struct bcache_device *d = b->bd_disk->private_data;
+	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
+
+	if (dc->io_disable)
+		return -EIO;
+
 	return d->ioctl(d, mode, cmd, arg);
 }
 
@@ -864,6 +869,45 @@  static void calc_cached_dev_sectors(struct cache_set *c)
 	c->cached_dev_sectors = sectors;
 }
 
+#define BACKING_DEV_OFFLINE_TIMEOUT 5
+static int cached_dev_status_update(void *arg)
+{
+	struct cached_dev *dc = arg;
+	struct request_queue *q;
+	char buf[BDEVNAME_SIZE];
+
+	/*
+	 * If this delayed worker is stopping outside, directly quit here.
+	 * dc->io_disable might be set via sysfs interface, so check it
+	 * here too.
+	 */
+	while (!kthread_should_stop() && !dc->io_disable) {
+		q = bdev_get_queue(dc->bdev);
+		if (blk_queue_dying(q))
+			dc->offline_seconds++;
+		else
+			dc->offline_seconds = 0;
+
+		if (dc->offline_seconds >= BACKING_DEV_OFFLINE_TIMEOUT) {
+			pr_err("%s: device offline for %d seconds",
+				bdevname(dc->bdev, buf),
+				BACKING_DEV_OFFLINE_TIMEOUT);
+			pr_err("%s: disable I/O request due to backing "
+				"device offline", dc->disk.name);
+			dc->io_disable = true;
+			/* let others know earlier that io_disable is true */
+			smp_mb();
+			bcache_device_stop(&dc->disk);
+			break;
+		}
+
+		schedule_timeout_interruptible(HZ);
+	}
+
+	wait_for_kthread_stop();
+	return 0;
+}
+
 void bch_cached_dev_run(struct cached_dev *dc)
 {
 	struct bcache_device *d = &dc->disk;
@@ -906,6 +950,15 @@  void bch_cached_dev_run(struct cached_dev *dc)
 	if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") ||
 	    sysfs_create_link(&disk_to_dev(d->disk)->kobj, &d->kobj, "bcache"))
 		pr_debug("error creating sysfs link");
+
+	dc->status_update_thread = kthread_run(cached_dev_status_update,
+					       dc,
+					      "bcache_status_update");
+	if (IS_ERR(dc->status_update_thread)) {
+		pr_warn("failed to create bcache_status_update kthread, "
+			"continue to run without monitoring backing "
+			"device status");
+	}
 }
 
 /*
@@ -1128,6 +1181,8 @@  static void cached_dev_free(struct closure *cl)
 		kthread_stop(dc->writeback_thread);
 	if (dc->writeback_write_wq)
 		destroy_workqueue(dc->writeback_write_wq);
+	if (!IS_ERR_OR_NULL(dc->status_update_thread))
+		kthread_stop(dc->status_update_thread);
 
 	if (atomic_read(&dc->running))
 		bd_unlink_disk_holder(dc->bdev, dc->disk.disk);