From patchwork Thu Dec 28 00:47:42 2017
X-Patchwork-Submitter: Michael Lyle
X-Patchwork-Id: 10134337
From: Michael Lyle
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: Michael Lyle
Subject: [for-416 PATCH 1/3] bcache: writeback: collapse contiguous IO better
Date: Wed, 27 Dec 2017 16:47:42 -0800
Message-Id: <20171228004744.3522-1-mlyle@lyle.org>
X-Mailer: git-send-email 2.14.1

Previously, there was some logic that attempted to immediately issue
writeback of backing-contiguous blocks when the writeback rate was fast.

The previous logic did not have any limits on the aggregate size it would
issue, nor the number of keys it would combine at once.  It would also
discard the chance to do a contiguous write when the writeback rate was
low; e.g. at "background" writeback of target rate = 8, it would not
combine two adjacent 4k writes and would instead seek the disk twice.

This patch imposes limits and explicitly understands the size of
contiguous I/O during issue.  It will also combine contiguous I/O in all
circumstances, not just when writeback is requested to be relatively
fast.

It is a win on its own, but also lays the groundwork for skipping writes
to short keys to make the I/O more sequential/contiguous.  It also gets
ready to start using blk_*_plug and to allow issuing of non-contiguous
I/O in parallel, if requested by the user (to make use of the disk
throughput benefits available from higher queue depths).

This patch fixes a previous version where the contiguous information
was not calculated properly.

Signed-off-by: Michael Lyle
---
 drivers/md/bcache/bcache.h    |   6 --
 drivers/md/bcache/writeback.c | 133 ++++++++++++++++++++++++++++++------------
 drivers/md/bcache/writeback.h |   3 +
 3 files changed, 98 insertions(+), 44 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 843877e017e1..1784e50eb857 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -323,12 +323,6 @@ struct cached_dev {
 	struct bch_ratelimit writeback_rate;
 	struct delayed_work writeback_rate_update;
 
-	/*
-	 * Internal to the writeback code, so read_dirty() can keep track of
-	 * where it's at.
-	 */
-	sector_t last_read;
-
 	/* Limit number of writeback bios in flight */
 	struct semaphore in_flight;
 	struct task_struct *writeback_thread;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f3d680c907ae..4e4836c6e7cf 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -248,10 +248,25 @@ static void read_dirty_submit(struct closure *cl)
 	continue_at(cl, write_dirty, io->dc->writeback_write_wq);
 }
 
+static inline bool keys_contiguous(struct cached_dev *dc,
+		struct keybuf_key *first, struct keybuf_key *second)
+{
+	if (KEY_INODE(&second->key) != KEY_INODE(&first->key))
+		return false;
+
+	if (KEY_OFFSET(&second->key) !=
+	    KEY_OFFSET(&first->key) + KEY_SIZE(&first->key))
+		return false;
+
+	return true;
+}
+
 static void read_dirty(struct cached_dev *dc)
 {
 	unsigned delay = 0;
-	struct keybuf_key *w;
+	struct keybuf_key *next, *keys[MAX_WRITEBACKS_IN_PASS], *w;
+	size_t size;
+	int nk, i;
 	struct dirty_io *io;
 	struct closure cl;
 
@@ -262,45 +277,87 @@ static void read_dirty(struct cached_dev *dc)
 	 * mempools.
 	 */
-	while (!kthread_should_stop()) {
-
-		w = bch_keybuf_next(&dc->writeback_keys);
-		if (!w)
-			break;
-
-		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
-
-		if (KEY_START(&w->key) != dc->last_read ||
-		    jiffies_to_msecs(delay) > 50)
-			while (!kthread_should_stop() && delay)
-				delay = schedule_timeout_interruptible(delay);
-
-		dc->last_read = KEY_OFFSET(&w->key);
-
-		io = kzalloc(sizeof(struct dirty_io) + sizeof(struct bio_vec)
-			     * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
-			     GFP_KERNEL);
-		if (!io)
-			goto err;
-
-		w->private = io;
-		io->dc = dc;
-
-		dirty_init(w);
-		bio_set_op_attrs(&io->bio, REQ_OP_READ, 0);
-		io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
-		bio_set_dev(&io->bio, PTR_CACHE(dc->disk.c, &w->key, 0)->bdev);
-		io->bio.bi_end_io = read_dirty_endio;
-
-		if (bio_alloc_pages(&io->bio, GFP_KERNEL))
-			goto err_free;
-
-		trace_bcache_writeback(&w->key);
+	next = bch_keybuf_next(&dc->writeback_keys);
+
+	while (!kthread_should_stop() && next) {
+		size = 0;
+		nk = 0;
+
+		do {
+			BUG_ON(ptr_stale(dc->disk.c, &next->key, 0));
+
+			/*
+			 * Don't combine too many operations, even if they
+			 * are all small.
+			 */
+			if (nk >= MAX_WRITEBACKS_IN_PASS)
+				break;
+
+			/*
+			 * If the current operation is very large, don't
+			 * further combine operations.
+			 */
+			if (size >= MAX_WRITESIZE_IN_PASS)
+				break;
+
+			/*
+			 * Operations are only eligible to be combined
+			 * if they are contiguous.
+			 *
+			 * TODO: add a heuristic willing to fire a
+			 * certain amount of non-contiguous IO per pass,
+			 * so that we can benefit from backing device
+			 * command queueing.
+			 */
+			if (nk != 0 && !keys_contiguous(dc, keys[nk-1], next))
+				break;
+
+			size += KEY_SIZE(&next->key);
+			keys[nk++] = next;
+		} while ((next = bch_keybuf_next(&dc->writeback_keys)));
+
+		/* Now we have gathered a set of 1..5 keys to write back. */
+
+		for (i = 0; i < nk; i++) {
+			w = keys[i];
+
+			io = kzalloc(sizeof(struct dirty_io) +
+				     sizeof(struct bio_vec) *
+				     DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
+				     GFP_KERNEL);
+			if (!io)
+				goto err;
+
+			w->private = io;
+			io->dc = dc;
+
+			dirty_init(w);
+			bio_set_op_attrs(&io->bio, REQ_OP_READ, 0);
+			io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
+			bio_set_dev(&io->bio,
+				    PTR_CACHE(dc->disk.c, &w->key, 0)->bdev);
+			io->bio.bi_end_io = read_dirty_endio;
+
+			if (bio_alloc_pages(&io->bio, GFP_KERNEL))
+				goto err_free;
+
+			trace_bcache_writeback(&w->key);
+
+			down(&dc->in_flight);
+
+			/* We've acquired a semaphore for the maximum
+			 * simultaneous number of writebacks; from here
+			 * everything happens asynchronously.
+			 */
+			closure_call(&io->cl, read_dirty_submit, NULL, &cl);
+		}
 
-		down(&dc->in_flight);
-		closure_call(&io->cl, read_dirty_submit, NULL, &cl);
+		delay = writeback_delay(dc, size);
 
-		delay = writeback_delay(dc, KEY_SIZE(&w->key));
+		while (!kthread_should_stop() && delay) {
+			schedule_timeout_interruptible(delay);
+			delay = writeback_delay(dc, 0);
+		}
 	}
 
 	if (0) {
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index a9e3ffb4b03c..6d26927267f8 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -5,6 +5,9 @@
 #define CUTOFF_WRITEBACK 40
 #define CUTOFF_WRITEBACK_SYNC 70
 
+#define MAX_WRITEBACKS_IN_PASS 5
+#define MAX_WRITESIZE_IN_PASS 5000 /* *512b */
+
 static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
 {
 	uint64_t i, ret = 0;
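
As an aside for reviewers who want to poke at the batching policy outside
the kernel: below is a small stand-alone C sketch of the gather loop
above.  It is an illustration only, not part of the patch.  struct
wb_key, wb_keys_contiguous() and gather_pass() are made-up stand-ins for
the keybuf/bkey machinery (the sketch uses plain start offsets and sector
counts), but the three stopping conditions (per-pass key count, per-pass
aggregate size, loss of contiguity) mirror the new read_dirty() loop.

/*
 * Stand-alone sketch of the writeback batching policy; illustration
 * only.  Types and helpers are simplified stand-ins, not bcache code.
 * Offsets and sizes are in 512-byte sectors.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_WRITEBACKS_IN_PASS 5
#define MAX_WRITESIZE_IN_PASS 5000 /* *512b */

struct wb_key {
	unsigned inode;		/* which backing volume the key belongs to */
	unsigned long offset;	/* start offset, in sectors */
	unsigned long size;	/* length, in sectors */
};

/* Same shape as keys_contiguous(): same inode, and the second key
 * starts exactly where the first one ends. */
static bool wb_keys_contiguous(const struct wb_key *first,
			       const struct wb_key *second)
{
	return second->inode == first->inode &&
	       second->offset == first->offset + first->size;
}

/*
 * Gather dirty keys for one writeback pass, stopping on any of the
 * three conditions used by the new read_dirty() loop: too many keys,
 * too much aggregate size, or a non-contiguous key.  Returns the
 * number of keys batched into out[] and advances *pos.
 */
static int gather_pass(const struct wb_key *in, int n_in, int *pos,
		       const struct wb_key **out, unsigned long *size)
{
	int nk = 0;

	*size = 0;
	while (*pos < n_in) {
		const struct wb_key *next = &in[*pos];

		/* Don't combine too many operations, even if small. */
		if (nk >= MAX_WRITEBACKS_IN_PASS)
			break;
		/* Don't grow a pass past the aggregate size limit. */
		if (*size >= MAX_WRITESIZE_IN_PASS)
			break;
		/* Only contiguous operations are combined. */
		if (nk != 0 && !wb_keys_contiguous(out[nk - 1], next))
			break;

		*size += next->size;
		out[nk++] = next;
		(*pos)++;
	}
	return nk;
}

int main(void)
{
	/* Two adjacent 4k (8-sector) writes, then a key elsewhere. */
	const struct wb_key dirty[] = {
		{ .inode = 1, .offset = 1000, .size = 8 },
		{ .inode = 1, .offset = 1008, .size = 8 },
		{ .inode = 1, .offset = 9000, .size = 8 },
	};
	const struct wb_key *batch[MAX_WRITEBACKS_IN_PASS];
	unsigned long size;
	int pos = 0, nk;

	while ((nk = gather_pass(dirty, 3, &pos, batch, &size)) > 0)
		printf("pass: %d key(s), %lu sectors\n", nk, size);

	return 0;
}

Running this on the commit-message example (two adjacent 4k writes
followed by an unrelated key) yields one pass of 2 keys / 16 sectors and
one pass of 1 key / 8 sectors, whereas the pre-patch behaviour at low
writeback rates would have issued the two adjacent writes as separate
seeks.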