From patchwork Sat Jan 27 14:24:01 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 10187705
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, Coly Li, Junhui Tang, Michael Lyle
Subject: [PATCH v4 08/13] bcache: stop all attached bcache devices for a retired cache set
Date: Sat, 27 Jan 2018 22:24:01 +0800
Message-Id: <20180127142406.89741-9-colyli@suse.de>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180127142406.89741-1-colyli@suse.de>
References: <20180127142406.89741-1-colyli@suse.de>
Sender: linux-block-owner@vger.kernel.org
X-Mailing-List: linux-block@vger.kernel.org

When there are too many I/O errors on a cache device, the current bcache
code retires the whole cache set and detaches all bcache devices. The
detached bcache devices are not stopped, however, which is a problem
when bcache is in writeback mode.

If the retired cache set holds dirty data for backing devices, further
writes to a bcache device go directly to its backing device. If the LBA
of such a write has a dirty version on the cache device, then the next
time the cache device is re-registered and the backing device is
re-attached to it, the stale dirty data on the cache device is written
back to the backing device and overwrites the newer, directly written
data. The result is silent data corruption.

This patch checks whether CACHE_SET_IO_DISABLE is set in
cache_set->flags in __cache_set_unregister(). If it is set, the cache
set is being unregistered because of too many I/O errors, and all
attached bcache devices are stopped as well. If it is not set,
__cache_set_unregister() was triggered by writing 1 to the sysfs file
/sys/fs/bcache//bcache/stop. That case is an exception because the user
asked for it explicitly, so this patch keeps the existing behavior and
does not stop any bcache device.

Even if the failed cache device has no dirty data, stopping the bcache
device is still the behavior many Ceph and database users want: their
applications then report I/O errors because the bcache device has
disappeared, and operations staff learn that the cache device is broken
or disconnected.

Changelog:
v2: add reviewed-by from Hannes.
v1: initial version for review.

Signed-off-by: Coly Li
Reviewed-by: Hannes Reinecke
Cc: Junhui Tang
Cc: Michael Lyle
---
 drivers/md/bcache/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4204d75aee7b..97e3bb8e1aee 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1478,6 +1478,14 @@ static void __cache_set_unregister(struct closure *cl)
 				dc = container_of(c->devices[i],
 						  struct cached_dev, disk);
 				bch_cached_dev_detach(dc);
+				/*
+				 * If we come here by too many I/O errors,
+				 * bcache device should be stopped too, to
+				 * keep data consistency on cache and
+				 * backing devices.
+				 */
+				if (test_bit(CACHE_SET_IO_DISABLE, &c->flags))
+					bcache_device_stop(c->devices[i]);
 			} else {
 				bcache_device_stop(c->devices[i]);
 			}
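
For reviewers who want to check the resulting behavior per device, the
branch structure of the hunk above can be modelled in plain userspace C.
This is only a sketch, not kernel code: every model_* and MODEL_* name
is hypothetical, the flag is reduced to a bool, and the condition that
selects the detach branch lies outside the hunk, so it is passed in as a
parameter rather than reproduced.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for one entry of c->devices[]. */
enum model_action {
	MODEL_STOP,            /* else branch: bcache_device_stop() only */
	MODEL_DETACH,          /* bch_cached_dev_detach() only (behavior before this patch) */
	MODEL_DETACH_AND_STOP, /* detach, then stop (the lines added by this patch) */
};

/*
 * detach_branch: true when the outer condition (not visible in the
 *                hunk) selected the branch that calls
 *                bch_cached_dev_detach().
 * io_disable:    stands in for test_bit(CACHE_SET_IO_DISABLE, &c->flags).
 */
static enum model_action model_unregister_action(bool detach_branch,
						 bool io_disable)
{
	if (detach_branch) {
		/* Retired by too many I/O errors: stop the device as well. */
		if (io_disable)
			return MODEL_DETACH_AND_STOP;
		/* Otherwise keep the existing behavior: detach only. */
		return MODEL_DETACH;
	}
	/* The else branch is unchanged by the patch. */
	return MODEL_STOP;
}
```

In this model, only the combination of the detach branch and a set
I/O-disable flag changes behavior; the other three combinations give the
same result as before the patch.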