From patchwork Fri Jan 5 21:17:39 2018
From: Michael Lyle
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: Michael Lyle
Subject: [PATCH v3] bcache: fix writeback target calc on large devices
Date: Fri, 5 Jan 2018 13:17:39 -0800
Message-Id: <20180105211739.27305-1-mlyle@lyle.org>
X-Mailer: git-send-email 2.14.1
X-Mailing-List: linux-block@vger.kernel.org

Bcache needs to scale the dirty data in the cache over the multiple
backing disks in order to calculate writeback rates for each.
The previous code did this by multiplying the target number of dirty
sectors by the backing device size, and expected it to fit into a
uint64_t; this blows up on relatively large backing devices.

The new approach figures out the bdev's share in 16384ths of the overall
cached data.  This is chosen to cope well when bdevs drastically vary in
size and to ensure that bcache can cross the petabyte boundary for each
backing device.

This has been improved based on Tang Junhui's feedback to ensure that
every device gets a share of dirty data, no matter how small it is
compared to the total backing pool.

Reported-by: Jack Douglas
Signed-off-by: Michael Lyle
---
 drivers/md/bcache/writeback.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 61f24c04cebd..c7e35180091e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -18,17 +18,43 @@
 #include

 /* Rate limiting */
-
-static void __update_writeback_rate(struct cached_dev *dc)
+static uint64_t __calc_target_rate(struct cached_dev *dc)
 {
 	struct cache_set *c = dc->disk.c;
+
+	/*
+	 * This is the size of the cache, minus the amount used for
+	 * flash-only devices
+	 */
 	uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
 				bcache_flash_devs_sectors_dirty(c);
+
+	/*
+	 * Unfortunately there is no control of global dirty data.  If the
+	 * user states that they want 10% dirty data in the cache, and has,
+	 * e.g., 5 backing volumes of equal size, we try and ensure each
+	 * backing volume uses about 2% of the cache.
+	 *
+	 * 16384 is chosen here as something that each backing device should
+	 * be a reasonable fraction of the share, and not to blow up until
+	 * individual backing devices are a petabyte.
+	 */
+	uint32_t bdev_share_per16k =
+		div64_u64(16384 * bdev_sectors(dc->bdev),
+			  c->cached_dev_sectors);
+
 	uint64_t cache_dirty_target =
 		div_u64(cache_sectors * dc->writeback_percent, 100);
-	int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
-				   c->cached_dev_sectors);
+	/* Ensure each backing dev gets at least 1/16384th dirty share */
+	if (bdev_share_per16k < 1)
+		bdev_share_per16k = 1;
+
+	return div_u64(cache_dirty_target * bdev_share_per16k, 16384);
+}
+
+static void __update_writeback_rate(struct cached_dev *dc)
+{
 	/*
 	 * PI controller:
 	 * Figures out the amount that should be written per second.
@@ -49,6 +75,7 @@ static void __update_writeback_rate(struct cached_dev *dc)
 	 * This acts as a slow, long-term average that is not subject to
 	 * variations in usage like the p term.
 	 */
+	int64_t target = __calc_target_rate(dc);
 	int64_t dirty = bcache_dev_sectors_dirty(&dc->disk);
 	int64_t error = dirty - target;
 	int64_t proportional_scaled =