From patchwork Mon Jan 8 20:21:30 2018
X-Patchwork-Submitter: Michael Lyle
X-Patchwork-Id: 10150555
From: Michael Lyle
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: axboe@fb.com, Michael Lyle
Subject: [416 PATCH 13/13] bcache: fix writeback target calc on large devices
Date: Mon, 8 Jan 2018 12:21:30 -0800
Message-Id: <20180108202130.31303-14-mlyle@lyle.org>
In-Reply-To: <20180108202130.31303-1-mlyle@lyle.org>
References: <20180108202130.31303-1-mlyle@lyle.org>
List-ID: linux-block@vger.kernel.org

Bcache needs to scale the dirty data in the cache over the multiple
backing disks in order to calculate writeback rates for each.  The
previous code did this by multiplying the target number of dirty sectors
by the backing device size, and expected it to fit into a uint64_t; this
blows up on relatively large backing devices.

The new approach figures out the bdev's share in 16384ths of the overall
cached data.  This is chosen to cope well when bdevs drastically vary in
size and to ensure that bcache can cross the petabyte boundary for each
backing device.

This has been improved based on Tang Junhui's feedback to ensure that
every device gets a share of dirty data, no matter how small it is
compared to the total backing pool.

The existing mechanism is very limited; this is purely a bug fix to
remove limits on volume size.  However, a further change is still needed
to make this "fair" over many volumes where some are idle.
Reported-by: Jack Douglas
Signed-off-by: Michael Lyle
Reviewed-by: Tang Junhui
---
 drivers/md/bcache/writeback.c | 31 +++++++++++++++++++++++++++----
 drivers/md/bcache/writeback.h |  7 +++++++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 31b0a292a619..51306a19ab03 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -18,17 +18,39 @@
 #include <trace/events/bcache.h>
 
 /* Rate limiting */
-
-static void __update_writeback_rate(struct cached_dev *dc)
+static uint64_t __calc_target_rate(struct cached_dev *dc)
 {
 	struct cache_set *c = dc->disk.c;
+
+	/*
+	 * This is the size of the cache, minus the amount used for
+	 * flash-only devices
+	 */
 	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
 				bcache_flash_devs_sectors_dirty(c);
+
+	/*
+	 * Unfortunately there is no control of global dirty data.  If the
+	 * user states that they want 10% dirty data in the cache, and has,
+	 * e.g., 5 backing volumes of equal size, we try and ensure each
+	 * backing volume uses about 2% of the cache for dirty data.
+	 */
+	uint32_t bdev_share =
+		div64_u64(bdev_sectors(dc->bdev) << WRITEBACK_SHARE_SHIFT,
+				c->cached_dev_sectors);
+
 	uint64_t cache_dirty_target =
 		div_u64(cache_sectors * dc->writeback_percent, 100);
-	int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
-				c->cached_dev_sectors);
 
+	/* Ensure each backing dev gets at least one dirty share */
+	if (bdev_share < 1)
+		bdev_share = 1;
+
+	return (cache_dirty_target * bdev_share) >> WRITEBACK_SHARE_SHIFT;
+}
+
+static void __update_writeback_rate(struct cached_dev *dc)
+{
 	/*
 	 * PI controller:
 	 * Figures out the amount that should be written per second.
@@ -49,6 +71,7 @@ static void __update_writeback_rate(struct cached_dev *dc)
 	 * This acts as a slow, long-term average that is not subject to
 	 * variations in usage like the p term.
 	 */
+	int64_t target = __calc_target_rate(dc);
 	int64_t dirty = bcache_dev_sectors_dirty(&dc->disk);
 	int64_t error = dirty - target;
 	int64_t proportional_scaled =
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index f102b1f9bc51..66f1c527fa24 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -8,6 +8,13 @@
 #define MAX_WRITEBACKS_IN_PASS	5
 #define MAX_WRITESIZE_IN_PASS	5000	/* *512b */
 
+/*
+ * 14 (16384ths) is chosen here as something that each backing device
+ * should be a reasonable fraction of the share, and not to blow up
+ * until individual backing devices are a petabyte.
+ */
+#define WRITEBACK_SHARE_SHIFT	14
+
 static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
 {
 	uint64_t i, ret = 0;