Message ID | 20180103140325.63175-11-colyli@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01/03/2018 03:03 PM, Coly Li wrote: > When there are too many I/O errors on cache device, current bcache code > will retire the whole cache set, and detach all bcache devices. But the > detached bcache devices are not stopped, which is problematic when bcache > is in writeback mode. > > If the retired cache set has dirty data of backing devices, continue > writing to bcache device will write to backing device directly. If the > LBA of write request has a dirty version cached on cache device, next time > when the cache device is re-registered and backing device re-attached to > it again, the stale dirty data on cache device will be written to backing > device, and overwrite latest directly written data. This situation causes > a quite data corruption. > > This patch checkes whether cache_set->io_disable is true in > __cache_set_unregister(). If cache_set->io_disable is true, it means cache > set is unregistering by too many I/O errors, then all attached bcache > devices will be stopped as well. If cache_set->io_disable is not true, it > means __cache_set_unregister() is triggered by writing 1 to sysfs file > /sys/fs/bcache/<UUID>/bcache/stop. This is an exception because users do > it explicitly, this patch keeps existing behavior and does not stop any > bcache device. > > Even the failed cache device has no dirty data, stopping bcache device is > still a desired behavior by many Ceph and data base users. Then their > application will report I/O errors due to disappeared bcache device, and > operation people will know the cache device is broken or disconnected. > > Signed-off-by: Coly Li <colyli@suse.de> > --- > drivers/md/bcache/super.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > index 49d6fedf89c3..20a7a6959506 100644 > --- a/drivers/md/bcache/super.c > +++ b/drivers/md/bcache/super.c > @@ -1458,6 +1458,14 @@ static void __cache_set_unregister(struct closure *cl) > dc = container_of(c->devices[i], > struct cached_dev, disk); > bch_cached_dev_detach(dc); > + /* > + * If we come here by too many I/O errors, > + * bcache device should be stopped too, to > + * keep data consistency on cache and > + * backing devices. > + */ > + if (c->io_disable) > + bcache_device_stop(c->devices[i]); > } else { > bcache_device_stop(c->devices[i]); > } > Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 49d6fedf89c3..20a7a6959506 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1458,6 +1458,14 @@ static void __cache_set_unregister(struct closure *cl) dc = container_of(c->devices[i], struct cached_dev, disk); bch_cached_dev_detach(dc); + /* + * If we come here by too many I/O errors, + * bcache device should be stopped too, to + * keep data consistency on cache and + * backing devices. + */ + if (c->io_disable) + bcache_device_stop(c->devices[i]); } else { bcache_device_stop(c->devices[i]); }
When there are too many I/O errors on cache device, current bcache code will retire the whole cache set, and detach all bcache devices. But the detached bcache devices are not stopped, which is problematic when bcache is in writeback mode. If the retired cache set has dirty data of backing devices, continue writing to bcache device will write to backing device directly. If the LBA of write request has a dirty version cached on cache device, next time when the cache device is re-registered and backing device re-attached to it again, the stale dirty data on cache device will be written to backing device, and overwrite latest directly written data. This situation causes a quite data corruption. This patch checkes whether cache_set->io_disable is true in __cache_set_unregister(). If cache_set->io_disable is true, it means cache set is unregistering by too many I/O errors, then all attached bcache devices will be stopped as well. If cache_set->io_disable is not true, it means __cache_set_unregister() is triggered by writing 1 to sysfs file /sys/fs/bcache/<UUID>/bcache/stop. This is an exception because users do it explicitly, this patch keeps existing behavior and does not stop any bcache device. Even the failed cache device has no dirty data, stopping bcache device is still a desired behavior by many Ceph and data base users. Then their application will report I/O errors due to disappeared bcache device, and operation people will know the cache device is broken or disconnected. Signed-off-by: Coly Li <colyli@suse.de> --- drivers/md/bcache/super.c | 8 ++++++++ 1 file changed, 8 insertions(+)