From patchwork Sun Feb 7 15:24:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073057 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7030FC433DB for ; Sun, 7 Feb 2021 15:28:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 40C9960234 for ; Sun, 7 Feb 2021 15:28:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230342AbhBGP2B (ORCPT ); Sun, 7 Feb 2021 10:28:01 -0500 Received: from mx2.suse.de ([195.135.220.15]:35784 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230415AbhBGPZX (ORCPT ); Sun, 7 Feb 2021 10:25:23 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 7070EAF4C; Sun, 7 Feb 2021 15:24:41 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 1/6] bcache: use bucket index for SET_GC_MARK() in bch_btree_gc_finish() Date: Sun, 7 Feb 2021 23:24:18 +0800 Message-Id: <20210207152423.70697-2-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the meta data bucket locations on cache device are reserved after the meta data stored on NVDIMM pages, for the meta data layout consistentcy temporarily. So these buckets are still marked as meta data by SET_GC_MARK() in bch_btree_gc_finish(). When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, the sb.d[] stores linear address of NVDIMM pages and not bucket index anymore. Therefore we should avoid to find bucket index from sb.d[], and directly use bucket index from ca->sb.first_bucket to (ca->sb.first_bucket + ca->sb.njournal_bucketsi) for setting the gc mark of journal bucket. Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/btree.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 910df242c83d..9da91577b358 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1759,8 +1759,10 @@ static void bch_btree_gc_finish(struct cache_set *c) ca = c->cache; ca->invalidate_needs_gc = 0; - for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++) - SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA); + /* Range [first_bucket, first_bucket + keys) is for journal buckets */ + for (i = ca->sb.first_bucket; + i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++) + SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA); for (k = ca->prio_buckets; k < ca->prio_buckets + prio_buckets(ca) * 2; k++) From patchwork Sun Feb 7 15:24:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073059 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1715C433E6 for ; Sun, 7 Feb 2021 15:28:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7963E60234 for ; Sun, 7 Feb 2021 15:28:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229691AbhBGP2F (ORCPT ); Sun, 7 Feb 2021 10:28:05 -0500 Received: from mx2.suse.de ([195.135.220.15]:35846 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230439AbhBGPZm (ORCPT ); Sun, 7 Feb 2021 10:25:42 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 15D34AD29; Sun, 7 Feb 2021 15:25:00 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 2/6] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Date: Sun, 7 Feb 2021 23:24:19 +0800 Message-Id: <20210207152423.70697-3-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into the incompat feature set. When this bit is set by bcache-tools, it indicates bcache meta data should be stored on specific NVDIMM meta device. The bcache meta data mainly includes journal and btree nodes, when this bit is set in incompat feature set, bcache will ask the nvm-pages allocator for NVDIMM space to store the meta data. Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/features.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h index d1c8fd3977fc..333fb5efb6bd 100644 --- a/drivers/md/bcache/features.h +++ b/drivers/md/bcache/features.h @@ -17,11 +17,19 @@ #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001 /* real bucket size is (1 << bucket_size) */ #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002 +/* store bcache meta data on nvdimm */ +#define BCH_FEATURE_INCOMPAT_NVDIMM_META 0x0004 #define BCH_FEATURE_COMPAT_SUPP 0 #define BCH_FEATURE_RO_COMPAT_SUPP 0 +#ifdef CONFIG_BCACHE_NVM_PAGES +#define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ + BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \ + BCH_FEATURE_INCOMPAT_NVDIMM_META) +#else #define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE) +#endif #define BCH_HAS_COMPAT_FEATURE(sb, mask) \ ((sb)->feature_compat & (mask)) @@ -89,6 +97,7 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \ BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET); BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE); +BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META); static inline bool bch_has_unknown_compat_features(struct cache_sb *sb) { From patchwork Sun Feb 7 15:24:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41E44C433DB for ; Sun, 7 Feb 2021 15:28:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D40664E49 for ; Sun, 7 Feb 2021 15:28:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230034AbhBGP2L (ORCPT ); Sun, 7 Feb 2021 10:28:11 -0500 Received: from mx2.suse.de ([195.135.220.15]:35892 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230101AbhBGPZq (ORCPT ); Sun, 7 Feb 2021 10:25:46 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1B30FAF57; Sun, 7 Feb 2021 15:25:04 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 3/6] bcache: initialize bcache journal for NVDIMM meta device Date: Sun, 7 Feb 2021 23:24:20 +0800 Message-Id: <20210207152423.70697-4-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The nvm-pages allocator may store and index the NVDIMM pages allocated for bcache journal. This patch adds the initialization to store bcache journal space on NVDIMM pages if BCH_FEATURE_INCOMPAT_NVDIMM_META bit is set by bcache-tools. If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_nvdimm_journal_space() will return the linear address of NVDIMM pages for bcache journal, - If there is previously allocated space, find it from nvm-pages owner list and return to bch_journal_init(). - If there is no previously allocated space, require a new NVDIMM range from the nvm-pages allocator, and return it to bch_journal_init(). And in bch_journal_init(), keys in sb.d[] store the corresponding linear address from NVDIMM into sb.d[i].ptr[0] where 'i' is the bucket index to iterate all journal buckets. Later when bcache journaling code stores the journaling jset, the target NVDIMM linear address stored (and updated) in sb.d[i].ptr[0] can be used directly in memory copy from DRAM pages into NVDIMM pages. Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 96 +++++++++++++++++++++++++++++++++++++ drivers/md/bcache/journal.h | 2 +- drivers/md/bcache/super.c | 9 ++-- 3 files changed, 100 insertions(+), 7 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index aefbdb7e003b..3a88092ea14c 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -9,6 +9,8 @@ #include "btree.h" #include "debug.h" #include "extents.h" +#include "nvm-pages.h" +#include "features.h" #include @@ -982,3 +984,97 @@ int bch_journal_alloc(struct cache_set *c) return 0; } + +static void *find_journal_nvm_base(struct bch_extent *list, struct cache *ca) +{ + void *ret = NULL; + struct bch_extent *cur, *next; + + next = list; + do { + cur = next; + /* Match journal area's nvdimm address */ + if (cur->kaddr == (void *)ca->sb.d[0]) { + ret = cur->kaddr; + break; + } + next = list_entry(cur->list.next, struct bch_extent, list); + } while (next != list); + + return ret; +} + +static void bch_release_nvm_extent_list(struct bch_extent *list) +{ + struct bch_extent *ext; + struct list_head *cur, *next; + + list_for_each_safe(cur, next, &list->list) { + ext = list_entry(cur, struct bch_extent, list); + kfree(ext); + } +} + +static void *get_nvdimm_journal_space(struct cache *ca) +{ + struct bch_extent *allocated_list = NULL; + void *ret = NULL; + + allocated_list = bch_get_allocated_pages(ca->sb.set_uuid); + if (allocated_list) { + ret = find_journal_nvm_base(allocated_list, ca); + bch_release_nvm_extent_list(allocated_list); + } + + if (!ret) { + int order = ilog2(ca->sb.bucket_size * ca->sb.njournal_buckets / + PAGE_SECTORS); + + ret = bch_nvm_alloc_pages(order, ca->sb.set_uuid); + memset(ret, 0, (1 << order) * PAGE_SIZE); + } + + return ret; +} + +static int __bch_journal_nvdimm_init(struct cache *ca) +{ + int i, ret = 0; + void *journal_nvm_base = NULL; + + journal_nvm_base = get_nvdimm_journal_space(ca); + if (!journal_nvm_base) { + pr_err("Failed to get journal space from nvdimm\n"); + ret = -1; + goto out; + } + + /* Iniialized and reloaded from on-disk super block already */ + if (ca->sb.d[0] != 0) + goto out; + + for (i = 0; i < ca->sb.keys; i++) + ca->sb.d[i] = + (u64)(journal_nvm_base + (ca->sb.bucket_size * i)); + +out: + return ret; +} + +int bch_journal_init(struct cache_set *c) +{ + int i, ret = 0; + struct cache *ca = c->cache; + + ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, + 2, SB_JOURNAL_BUCKETS); + + if (!bch_has_feature_nvdimm_meta(&ca->sb)) { + for (i = 0; i < ca->sb.keys; i++) + ca->sb.d[i] = ca->sb.first_bucket + i; + } else { + ret = __bch_journal_nvdimm_init(ca); + } + + return ret; +} diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..e3a7fa5a8fda 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list); void bch_journal_meta(struct cache_set *c, struct closure *cl); int bch_journal_read(struct cache_set *c, struct list_head *list); int bch_journal_replay(struct cache_set *c, struct list_head *list); - +int bch_journal_init(struct cache_set *c); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 7fffb6ccfb0c..262dae3ef030 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2071,14 +2071,11 @@ static int run_cache_set(struct cache_set *c) if (bch_journal_replay(c, &journal)) goto err; } else { - unsigned int j; - pr_notice("invalidating existing data\n"); - ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, - 2, SB_JOURNAL_BUCKETS); - for (j = 0; j < ca->sb.keys; j++) - ca->sb.d[j] = ca->sb.first_bucket + j; + err = "error initializing journal"; + if (bch_journal_init(c)) + goto err; bch_initial_gc_finish(c); From patchwork Sun Feb 7 15:24:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4678AC433DB for ; Sun, 7 Feb 2021 15:28:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1A57A64E43 for ; Sun, 7 Feb 2021 15:28:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229970AbhBGP2M (ORCPT ); Sun, 7 Feb 2021 10:28:12 -0500 Received: from mx2.suse.de ([195.135.220.15]:35952 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230447AbhBGPZx (ORCPT ); Sun, 7 Feb 2021 10:25:53 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D8DA6AF5B; Sun, 7 Feb 2021 15:25:08 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 4/6] bcache: support storing bcache journal into NVDIMM meta device Date: Sun, 7 Feb 2021 23:24:21 +0800 Message-Id: <20210207152423.70697-5-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch implements two methods to store bcache journal to, 1) __journal_write_unlocked() for block interface device The latency method to compose bio and issue the jset bio to cache device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on cache device to store the journal jset. 2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM Use memory interface to access NVDIMM pages and store the jset by memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear address from the NVDIMM pages to store the journal jset. For lagency configuration without NVDIMM meta device, journal I/O is handled by __journal_write_unlocked() with existing code logic. If the NVDIMM meta device is used (by bcache-tools), the journal I/O will be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM pages. And when NVDIMM meta device is used, sb.d[] stores the linear addresses from NVDIMM pages (no more bucket index), in journal_reclaim() the journaling location in c->journal.key.ptr[0] should also be updated by linear address from NVDIMM pages (no more LBA combined by sectors offset and bucket index). Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 111 ++++++++++++++++++++++++------------ 1 file changed, 75 insertions(+), 36 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 3a88092ea14c..243253974cab 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca) return; } + BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb)); + switch (atomic_read(&ja->discard_in_flight)) { case DISCARD_IN_FLIGHT: return; @@ -661,9 +663,13 @@ static void journal_reclaim(struct cache_set *c) goto out; ja->cur_idx = next; - k->ptr[0] = MAKE_PTR(0, - bucket_to_sector(c, ca->sb.d[ja->cur_idx]), - ca->sb.nr_this_dev); + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + k->ptr[0] = MAKE_PTR(0, + bucket_to_sector(c, ca->sb.d[ja->cur_idx]), + ca->sb.nr_this_dev); + else + k->ptr[0] = ca->sb.d[ja->cur_idx]; + atomic_long_inc(&c->reclaimed_journal_buckets); bkey_init(k); @@ -729,46 +735,21 @@ static void journal_write_unlock(struct closure *cl) spin_unlock(&c->journal.lock); } -static void journal_write_unlocked(struct closure *cl) + +static void __journal_write_unlocked(struct cache_set *c) __releases(c->journal.lock) { - struct cache_set *c = container_of(cl, struct cache_set, journal.io); - struct cache *ca = c->cache; - struct journal_write *w = c->journal.cur; struct bkey *k = &c->journal.key; - unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) * - ca->sb.block_size; - + struct journal_write *w = c->journal.cur; + struct closure *cl = &c->journal.io; + struct cache *ca = c->cache; struct bio *bio; struct bio_list list; + unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) * + ca->sb.block_size; bio_list_init(&list); - if (!w->need_write) { - closure_return_with_destructor(cl, journal_write_unlock); - return; - } else if (journal_full(&c->journal)) { - journal_reclaim(c); - spin_unlock(&c->journal.lock); - - btree_flush_write(c); - continue_at(cl, journal_write, bch_journal_wq); - return; - } - - c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca)); - - w->data->btree_level = c->root->level; - - bkey_copy(&w->data->btree_root, &c->root->key); - bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket); - - w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0]; - w->data->magic = jset_magic(&ca->sb); - w->data->version = BCACHE_JSET_VERSION; - w->data->last_seq = last_seq(&c->journal); - w->data->csum = csum_set(w->data); - for (i = 0; i < KEY_PTRS(k); i++) { ca = PTR_CACHE(c, k, i); bio = &ca->journal.bio; @@ -793,7 +774,6 @@ static void journal_write_unlocked(struct closure *cl) ca->journal.seq[ca->journal.cur_idx] = w->data->seq; } - /* If KEY_PTRS(k) == 0, this jset gets lost in air */ BUG_ON(i == 0); @@ -805,6 +785,65 @@ static void journal_write_unlocked(struct closure *cl) while ((bio = bio_list_pop(&list))) closure_bio_submit(c, bio, cl); +} + +static void __journal_nvdimm_write_unlocked(struct cache_set *c) + __releases(c->journal.lock) +{ + struct journal_write *w = c->journal.cur; + struct cache *ca = c->cache; + unsigned int sectors; + + sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size; + atomic_long_add(sectors, &ca->meta_sectors_written); + + memcpy_flushcache((void *)c->journal.key.ptr[0], w->data, sectors << 9); + + c->journal.key.ptr[0] += sectors << 9; + ca->journal.seq[ca->journal.cur_idx] = w->data->seq; + + atomic_dec_bug(&fifo_back(&c->journal.pin)); + bch_journal_next(&c->journal); + journal_reclaim(c); + + spin_unlock(&c->journal.lock); +} + +static void journal_write_unlocked(struct closure *cl) +{ + struct cache_set *c = container_of(cl, struct cache_set, journal.io); + struct cache *ca = c->cache; + struct journal_write *w = c->journal.cur; + + if (!w->need_write) { + closure_return_with_destructor(cl, journal_write_unlock); + return; + } else if (journal_full(&c->journal)) { + journal_reclaim(c); + spin_unlock(&c->journal.lock); + + btree_flush_write(c); + continue_at(cl, journal_write, bch_journal_wq); + return; + } + + c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca)); + + w->data->btree_level = c->root->level; + + bkey_copy(&w->data->btree_root, &c->root->key); + bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket); + + w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0]; + w->data->magic = jset_magic(&ca->sb); + w->data->version = BCACHE_JSET_VERSION; + w->data->last_seq = last_seq(&c->journal); + w->data->csum = csum_set(w->data); + + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + __journal_write_unlocked(c); + else + __journal_nvdimm_write_unlocked(c); continue_at(cl, journal_write_done, NULL); } From patchwork Sun Feb 7 15:24:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073067 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0198EC433E9 for ; Sun, 7 Feb 2021 15:30:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA68D64E3B for ; Sun, 7 Feb 2021 15:30:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230280AbhBGPaD (ORCPT ); Sun, 7 Feb 2021 10:30:03 -0500 Received: from mx2.suse.de ([195.135.220.15]:36938 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230365AbhBGP2C (ORCPT ); Sun, 7 Feb 2021 10:28:02 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 607B2AF50; Sun, 7 Feb 2021 15:25:12 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 5/6] bache: read jset from NVDIMM pages for journal replay Date: Sun, 7 Feb 2021 23:24:22 +0800 Message-Id: <20210207152423.70697-6-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch implements two methods to read jset from media for journal replay, - __jnl_rd_bkt() for block device This is the legacy method to read jset via block device interface. - __jnl_rd_nvm_bkt() for NVDIMM This is the method to read jset from NVDIMM memory interface, a.k.a memcopy() from NVDIMM pages to DRAM pages. If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in incompat feature set, during running cache set, journal_read_bucket() will read the journal content from NVDIMM by __jnl_rd_nvm_bkt(). The linear addresses of NVDIMM pages to read jset are stored in sb.d[SB_JOURNAL_BUCKETS], which were initialized and maintained in previous runs of the cache set. A thing should be noticed is, when bch_journal_read() is called, the linear address of NVDIMM pages is not loaded and initialized yet, it is necessary to call __bch_journal_nvdimm_init() before reading the jset from NVDIMM pages. Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 81 ++++++++++++++++++++++++++----------- 1 file changed, 57 insertions(+), 24 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 243253974cab..d1c76facb6b2 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -34,60 +34,84 @@ static void journal_read_endio(struct bio *bio) closure_put(cl); } +static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx, + unsigned int len, unsigned int offset, + struct closure *cl) +{ + sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]); + struct bio *bio = &ca->journal.bio; + struct jset *data = ca->set->journal.w[0].data; + + bio_reset(bio); + bio->bi_iter.bi_sector = bucket + offset; + bio_set_dev(bio, ca->bdev); + bio->bi_iter.bi_size = len << 9; + bio->bi_end_io = journal_read_endio; + bio->bi_private = cl; + bio_set_op_attrs(bio, REQ_OP_READ, 0); + bch_bio_map(bio, data); + + closure_bio_submit(ca->set, bio, cl); + closure_sync(cl); + + /* Indeed journal.w[0].data */ + return data; +} + +static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx, + unsigned int len, unsigned int offset) +{ + void *jset_addr = (void *)ca->sb.d[bkt_idx] + (offset << 9); + struct jset *data = ca->set->journal.w[0].data; + + memcpy(data, jset_addr, len << 9); + + /* Indeed journal.w[0].data */ + return data; +} + static int journal_read_bucket(struct cache *ca, struct list_head *list, - unsigned int bucket_index) + unsigned int bucket_idx) { struct journal_device *ja = &ca->journal; - struct bio *bio = &ja->bio; struct journal_replay *i; - struct jset *j, *data = ca->set->journal.w[0].data; + struct jset *j; struct closure cl; unsigned int len, left, offset = 0; int ret = 0; - sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]); closure_init_stack(&cl); - pr_debug("reading %u\n", bucket_index); + pr_debug("reading %u\n", bucket_idx); while (offset < ca->sb.bucket_size) { reread: left = ca->sb.bucket_size - offset; len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS); - bio_reset(bio); - bio->bi_iter.bi_sector = bucket + offset; - bio_set_dev(bio, ca->bdev); - bio->bi_iter.bi_size = len << 9; - - bio->bi_end_io = journal_read_endio; - bio->bi_private = &cl; - bio_set_op_attrs(bio, REQ_OP_READ, 0); - bch_bio_map(bio, data); - - closure_bio_submit(ca->set, bio, &cl); - closure_sync(&cl); + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + j = __jnl_rd_bkt(ca, bucket_idx, len, offset, &cl); + else + j = __jnl_rd_nvm_bkt(ca, bucket_idx, len, offset); /* This function could be simpler now since we no longer write * journal entries that overlap bucket boundaries; this means * the start of a bucket will always have a valid journal entry * if it has any journal entries at all. */ - - j = data; while (len) { struct list_head *where; size_t blocks, bytes = set_bytes(j); if (j->magic != jset_magic(&ca->sb)) { - pr_debug("%u: bad magic\n", bucket_index); + pr_debug("%u: bad magic\n", bucket_idx); return ret; } if (bytes > left << 9 || bytes > PAGE_SIZE << JSET_BITS) { pr_info("%u: too big, %zu bytes, offset %u\n", - bucket_index, bytes, offset); + bucket_idx, bytes, offset); return ret; } @@ -96,7 +120,7 @@ reread: left = ca->sb.bucket_size - offset; if (j->csum != csum_set(j)) { pr_info("%u: bad csum, %zu bytes, offset %u\n", - bucket_index, bytes, offset); + bucket_idx, bytes, offset); return ret; } @@ -158,8 +182,8 @@ reread: left = ca->sb.bucket_size - offset; list_add(&i->list, where); ret = 1; - if (j->seq > ja->seq[bucket_index]) - ja->seq[bucket_index] = j->seq; + if (j->seq > ja->seq[bucket_idx]) + ja->seq[bucket_idx] = j->seq; next_set: offset += blocks * ca->sb.block_size; len -= blocks * ca->sb.block_size; @@ -170,6 +194,8 @@ reread: left = ca->sb.bucket_size - offset; return ret; } +static int __bch_journal_nvdimm_init(struct cache *ca); + int bch_journal_read(struct cache_set *c, struct list_head *list) { #define read_bucket(b) \ @@ -188,6 +214,13 @@ int bch_journal_read(struct cache_set *c, struct list_head *list) unsigned int i, l, r, m; uint64_t seq; + /* + * Linear addresses of NVDIMM pages for journaling is not + * initialized yet, do it before read jset from NVDIMM pages. + */ + if (bch_has_feature_nvdimm_meta(&ca->sb)) + __bch_journal_nvdimm_init(ca); + bitmap_zero(bitmap, SB_JOURNAL_BUCKETS); pr_debug("%u journal buckets\n", ca->sb.njournal_buckets); From patchwork Sun Feb 7 15:24:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12073065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D22FBC433DB for ; Sun, 7 Feb 2021 15:30:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8601464E43 for ; Sun, 7 Feb 2021 15:30:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230247AbhBGP34 (ORCPT ); Sun, 7 Feb 2021 10:29:56 -0500 Received: from mx2.suse.de ([195.135.220.15]:36936 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230364AbhBGP2B (ORCPT ); Sun, 7 Feb 2021 10:28:01 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 84F3CAF69; Sun, 7 Feb 2021 15:25:17 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, jianpeng.ma@intel.com, qiaowei.ren@intel.com, Coly Li Subject: [PATCH 6/6] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Date: Sun, 7 Feb 2021 23:24:23 +0800 Message-Id: <20210207152423.70697-7-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210207152423.70697-1-colyli@suse.de> References: <20210207152423.70697-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch adds a sysfs interface register_nvdimm_meta to register NVDIMM meta device. The sysfs interface file only shows up when CONFIG_BCACHE_NVM_PAGES=y. Then a NVDIMM name space formatted by bcache-tools can be registered into bcache by e.g., echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta Signed-off-by: Coly Li Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/super.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 262dae3ef030..7cbccc768468 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2409,10 +2409,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, static ssize_t bch_pending_bdevs_cleanup(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size); +#ifdef CONFIG_BCACHE_NVM_PAGES +static ssize_t register_nvdimm_meta(struct kobject *k, + struct kobj_attribute *attr, + const char *buffer, size_t size); +#endif kobj_attribute_write(register, register_bcache); kobj_attribute_write(register_quiet, register_bcache); kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup); +#ifdef CONFIG_BCACHE_NVM_PAGES +kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta); +#endif static bool bch_is_open_backing(dev_t dev) { @@ -2526,6 +2534,24 @@ static void register_device_aync(struct async_reg_args *args) queue_delayed_work(system_wq, &args->reg_work, 10); } +#ifdef CONFIG_BCACHE_NVM_PAGES +static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr, + const char *buffer, size_t size) +{ + ssize_t ret = size; + + struct bch_nvm_namespace *ns = bch_register_namespace(buffer); + + if (IS_ERR(ns)) { + pr_err("register nvdimm namespace %s for meta device failed.\n", + buffer); + ret = -EINVAL; + } + + return size; +} +#endif + static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size) { @@ -2858,6 +2884,9 @@ static int __init bcache_init(void) static const struct attribute *files[] = { &ksysfs_register.attr, &ksysfs_register_quiet.attr, +#ifdef CONFIG_BCACHE_NVM_PAGES + &ksysfs_register_nvdimm_meta.attr, +#endif &ksysfs_pendings_cleanup.attr, NULL };