From patchwork Sun Jan 14 14:42:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 10162643
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, Coly Li, Junhui Tang, Michael Lyle
Subject: [PATCH v3 09/13] bcache: stop all attached bcache devices for a retired cache set
Date: Sun, 14 Jan 2018 22:42:32 +0800
Message-Id: <20180114144236.28213-10-colyli@suse.de>
X-Mailer: git-send-email 2.15.1
In-Reply-To:
 <20180114144236.28213-1-colyli@suse.de>
References: <20180114144236.28213-1-colyli@suse.de>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

When there are too many I/O errors on a cache device, the current bcache
code retires the whole cache set and detaches all bcache devices. But the
detached bcache devices are not stopped, which is problematic when bcache
is in writeback mode.

If the retired cache set holds dirty data for backing devices, continuing
to write to the bcache device writes to the backing device directly. If
the LBA of a write request has a dirty version cached on the cache device,
then the next time the cache device is re-registered and the backing
device is re-attached to it, the stale dirty data on the cache device
will be written to the backing device and overwrite the latest directly
written data. This situation causes silent data corruption.

This patch checks whether cache_set->io_disable is true in
__cache_set_unregister(). If cache_set->io_disable is true, the cache set
is being unregistered because of too many I/O errors, so all attached
bcache devices are stopped as well. If cache_set->io_disable is not true,
__cache_set_unregister() was triggered by writing 1 to the sysfs file
/sys/fs/bcache//bcache/stop. This is an exception: because users do it
explicitly, this patch keeps the existing behavior and does not stop any
bcache device.

Even if the failed cache device has no dirty data, stopping the bcache
device is still the behavior desired by many Ceph and database users.
Their applications will then report I/O errors because the bcache device
has disappeared, and operations staff will know that the cache device is
broken or disconnected.

Changelog:
v2: add Reviewed-by from Hannes.
v1: initial version for review.

Signed-off-by: Coly Li
Reviewed-by: Hannes Reinecke
Cc: Junhui Tang
Cc: Michael Lyle
---
 drivers/md/bcache/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4204d75aee7b..97e3bb8e1aee 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1478,6 +1478,14 @@ static void __cache_set_unregister(struct closure *cl)
 				dc = container_of(c->devices[i],
 						  struct cached_dev, disk);
 				bch_cached_dev_detach(dc);
+				/*
+				 * If we come here by too many I/O errors,
+				 * bcache device should be stopped too, to
+				 * keep data consistency on cache and
+				 * backing devices.
+				 */
+				if (test_bit(CACHE_SET_IO_DISABLE, &c->flags))
+					bcache_device_stop(c->devices[i]);
 			} else {
 				bcache_device_stop(c->devices[i]);
 			}
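
For reviewers who want to reason about the branch in isolation, the
decision made in __cache_set_unregister() can be modelled in userspace.
This is only a sketch under stated assumptions: struct model_cache_set,
struct model_device, the takes_detach_path field and model_unregister()
are hypothetical stand-ins and not kernel code. The condition that selects
the detach branch is not visible in the hunk, so it is reduced here to a
plain boolean, and io_disabled stands in for CACHE_SET_IO_DISABLE being
set in c->flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 4

/* Hypothetical stand-in for one bcache device slot (c->devices[i]). */
struct model_device {
	bool takes_detach_path;	/* assumption: models the branch condition not shown in the hunk */
	bool detached;		/* set where the patch calls bch_cached_dev_detach() */
	bool stopped;		/* set where the patch calls bcache_device_stop() */
};

/* Hypothetical stand-in for struct cache_set. */
struct model_cache_set {
	bool io_disabled;	/* models test_bit(CACHE_SET_IO_DISABLE, &c->flags) */
	size_t nr_devs;
	struct model_device devs[MODEL_MAX_DEVS];
};

/* Mirrors the shape of the patched loop body, nothing more. */
static void model_unregister(struct model_cache_set *c)
{
	assert(c != NULL && c->nr_devs <= MODEL_MAX_DEVS);

	for (size_t i = 0; i < c->nr_devs; i++) {
		struct model_device *d = &c->devs[i];

		if (d->takes_detach_path) {
			d->detached = true;
			/* New in this patch: also stop when retired by I/O errors. */
			if (c->io_disabled)
				d->stopped = true;
		} else {
			d->stopped = true;
		}
	}
}
```

In this model, a set with io_disabled true ends with every device
stopped, while a set with io_disabled false leaves devices on the detach
path detached but not stopped, which is the pre-patch behavior that the
commit message says is kept for an explicit sysfs stop.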