From patchwork Wed Feb 7 19:41:42 2018
X-Patchwork-Submitter: Michael Lyle
X-Patchwork-Id: 10205893
From: Michael Lyle
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: axboe@fb.com, Coly Li, Junhui Tang
Subject: [PATCH 4/8] bcache: set error_limit correctly
Date: Wed, 7 Feb 2018 11:41:42 -0800
Message-Id: <20180207194146.5095-5-mlyle@lyle.org>
In-Reply-To: <20180207194146.5095-1-mlyle@lyle.org>
References: <20180207194146.5095-1-mlyle@lyle.org>

From: Coly Li

Struct cache uses io_errors for two purposes,
- Error decay: when the cache set's error_decay is set, io_errors is used
  to generate a small delay when an I/O error happens.
- I/O error counter: in order to generate a big enough value for error
  decay, the I/O error counter is stored left-shifted by 20 bits (a.k.a.
  IO_ERROR_SHIFT).

In function bch_count_io_errors(), if the I/O error counter reaches the
cache set error limit, bch_cache_set_error() will be called to retire the
whole cache set. But the current code is problematic when checking the
error limit; see the following code piece from bch_count_io_errors(),

 90         if (error) {
 91                 char buf[BDEVNAME_SIZE];
 92                 unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
 93                                                     &ca->io_errors);
 94                 errors >>= IO_ERROR_SHIFT;
 95
 96                 if (errors < ca->set->error_limit)
 97                         pr_err("%s: IO error on %s, recovering",
 98                                bdevname(ca->bdev, buf), m);
 99                 else
100                         bch_cache_set_error(ca->set,
101                                             "%s: too many IO errors %s",
102                                             bdevname(ca->bdev, buf), m);
103         }

At line 94, errors is right-shifted by IO_ERROR_SHIFT bits, so it is now
the real error count that is compared at line 96.
But ca->set->error_limit is initialized with an amplified value in
bch_cache_set_alloc(),

1545         c->error_limit = 8 << IO_ERROR_SHIFT;

It means that by default, in bch_count_io_errors(), bch_cache_set_error()
won't be called to retire the problematic cache device until 8<<20 errors
have happened. If the average request size is 64KB, bcache won't handle
the failed device until 512GB of data has been requested. This is too
large to be an I/O threshold, so I believe the correct error limit should
be much smaller.

This patch sets the default cache set error limit to 8, so in
bch_count_io_errors(), when the error counter reaches 8 (if the default
value is in use), bch_cache_set_error() will be called to retire the whole
cache set. This patch also removes the bit shifting when storing or
showing the io_error_limit value via the sysfs interface.

Nowadays most SSDs handle internal flash failures automatically by
remapping LBAs. If an I/O error is observed by upper layer code, it is a
notable error, because the SSD could not remap the problematic LBA to an
available flash block. This situation indicates that the whole SSD will
fail very soon. Therefore setting 8 as the default I/O error limit makes
sense; it is enough for most cache devices.

Changelog:
v2: add reviewed-by from Hannes.
v1: initial version for review.

Signed-off-by: Coly Li
Reviewed-by: Hannes Reinecke
Reviewed-by: Tang Junhui
Reviewed-by: Michael Lyle
Cc: Junhui Tang
---
 drivers/md/bcache/bcache.h | 1 +
 drivers/md/bcache/super.c  | 2 +-
 drivers/md/bcache/sysfs.c  | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index a857eb3c10de..b8c2e1bef1f1 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -666,6 +666,7 @@ struct cache_set {
 		ON_ERROR_UNREGISTER,
 		ON_ERROR_PANIC,
 	} on_error;
+#define DEFAULT_IO_ERROR_LIMIT 8
 	unsigned error_limit;
 	unsigned error_decay;
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 133b81225ea9..e29150ec17e7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1553,7 +1553,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 
 	c->congested_read_threshold_us = 2000;
 	c->congested_write_threshold_us = 20000;
-	c->error_limit = 8 << IO_ERROR_SHIFT;
+	c->error_limit = DEFAULT_IO_ERROR_LIMIT;
 
 	return c;
 err:
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 46e7a6b3e7c7..c524305cc9a7 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -568,7 +568,7 @@ SHOW(__bch_cache_set)
 
 	/* See count_io_errors for why 88 */
 	sysfs_print(io_error_halflife, c->error_decay * 88);
-	sysfs_print(io_error_limit, c->error_limit >> IO_ERROR_SHIFT);
+	sysfs_print(io_error_limit, c->error_limit);
 
 	sysfs_hprint(congested,
 		     ((uint64_t) bch_get_congested(c)) << 9);
@@ -668,7 +668,7 @@ STORE(__bch_cache_set)
 	}
 
 	if (attr == &sysfs_io_error_limit)
-		c->error_limit = strtoul_or_return(buf) << IO_ERROR_SHIFT;
+		c->error_limit = strtoul_or_return(buf);
 
 	/* See count_io_errors() for why 88 */
 	if (attr == &sysfs_io_error_halflife)
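
For readers who want to check the 8<<20 and 512GB figures from the commit
message, here is a minimal userspace sketch of the limit check. It is not
kernel code: errors_until_retire() and bytes_until_retire() are
hypothetical helpers, the counter is modelled as a plain 64-bit integer
rather than the kernel's atomic counter, and error decay is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define IO_ERROR_SHIFT 20	/* shift width quoted in the commit message */

/*
 * Hypothetical model of the check in bch_count_io_errors(): count how many
 * consecutive I/O errors occur before "errors < error_limit" turns false.
 * Assumptions: 64-bit counter, no error decay between errors.
 */
static uint64_t errors_until_retire(uint64_t error_limit)
{
	uint64_t io_errors = 0;	/* stands in for ca->io_errors */
	uint64_t count = 0;

	assert(error_limit > 0);
	for (;;) {
		/* each error adds 1 << IO_ERROR_SHIFT to the stored counter */
		io_errors += UINT64_C(1) << IO_ERROR_SHIFT;
		count++;
		/* the comparison uses the counter shifted back down */
		if ((io_errors >> IO_ERROR_SHIFT) >= error_limit)
			return count;
	}
}

/* Bytes requested before retirement if every request of req_bytes fails. */
static uint64_t bytes_until_retire(uint64_t error_limit, uint64_t req_bytes)
{
	return errors_until_retire(error_limit) * req_bytes;
}
```

Under these assumptions, errors_until_retire(8 << IO_ERROR_SHIFT) models
the old default and errors_until_retire(8) models the new one. Passing the
old limit and a 64KB request size to bytes_until_retire() reproduces the
512GB figure quoted above.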