From patchwork Wed Jan 3 14:03:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 10142445 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id AA06E6035E for ; Wed, 3 Jan 2018 14:04:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9ADBE290C6 for ; Wed, 3 Jan 2018 14:04:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8E76E290C9; Wed, 3 Jan 2018 14:04:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3D418290C6 for ; Wed, 3 Jan 2018 14:04:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752038AbeACOEm (ORCPT ); Wed, 3 Jan 2018 09:04:42 -0500 Received: from mx2.suse.de ([195.135.220.15]:55098 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752506AbeACOEm (ORCPT ); Wed, 3 Jan 2018 09:04:42 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id A5584ADAE; Wed, 3 Jan 2018 14:04:40 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, mlyle@lyle.org, tang.junhui@zte.com.cn, Coly Li Subject: [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set Date: Wed, 3 Jan 2018 22:03:25 +0800 Message-Id: <20180103140325.63175-11-colyli@suse.de> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180103140325.63175-1-colyli@suse.de> References: <20180103140325.63175-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When there are too many I/O errors on cache device, current bcache code will retire the whole cache set, and detach all bcache devices. But the detached bcache devices are not stopped, which is problematic when bcache is in writeback mode. If the retired cache set has dirty data of backing devices, continue writing to bcache device will write to backing device directly. If the LBA of write request has a dirty version cached on cache device, next time when the cache device is re-registered and backing device re-attached to it again, the stale dirty data on cache device will be written to backing device, and overwrite latest directly written data. This situation causes a quite data corruption. This patch checkes whether cache_set->io_disable is true in __cache_set_unregister(). If cache_set->io_disable is true, it means cache set is unregistering by too many I/O errors, then all attached bcache devices will be stopped as well. If cache_set->io_disable is not true, it means __cache_set_unregister() is triggered by writing 1 to sysfs file /sys/fs/bcache//bcache/stop. This is an exception because users do it explicitly, this patch keeps existing behavior and does not stop any bcache device. Even the failed cache device has no dirty data, stopping bcache device is still a desired behavior by many Ceph and data base users. Then their application will report I/O errors due to disappeared bcache device, and operation people will know the cache device is broken or disconnected. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 49d6fedf89c3..20a7a6959506 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1458,6 +1458,14 @@ static void __cache_set_unregister(struct closure *cl) dc = container_of(c->devices[i], struct cached_dev, disk); bch_cached_dev_detach(dc); + /* + * If we come here by too many I/O errors, + * bcache device should be stopped too, to + * keep data consistency on cache and + * backing devices. + */ + if (c->io_disable) + bcache_device_stop(c->devices[i]); } else { bcache_device_stop(c->devices[i]); }