From patchwork Wed Aug 11 17:02:13 2021
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 12431591
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Coly Li, Hannes Reinecke, Jianpeng Ma, Qiaowei Ren
Subject: [PATCH v12 01/12] bcache: add initial data structures for nvm pages
Date: Thu, 12 Aug 2021 01:02:13 +0800
Message-Id: <20210811170224.42837-2-colyli@suse.de>
In-Reply-To: <20210811170224.42837-1-colyli@suse.de>
References: <20210811170224.42837-1-colyli@suse.de>
X-Mailing-List: linux-block@vger.kernel.org

This patch initializes the prototype data structures for the nvm pages
allocator,

- struct bch_nvmpg_sb
  This is the super block allocated on each nvdimm namespace for the nvm
  pages allocator. A nvdimm pages allocator set may have multiple
  namespaces, bch_nvmpg_sb->set_uuid is used to mark which nvdimm set
  this namespace belongs to.

- struct bch_nvmpg_set_header
  This is a table for all heads of all allocation record lists. An
  allocation record list traces all page(s) allocated from nvdimm
  namespace(s) to a specific requester (identified by uuid). After system
  reboot, a requester can retrieve all previously allocated nvdimm pages
  from its record list by a pre-defined uuid.

- struct bch_nvmpg_head
  This is a head of an allocation record list. Each nvdimm pages
  requester (typically a driver) has exactly one allocation record list,
  and an allocated nvdimm page belongs to only one allocation record
  list. Member uuid[] is set to the requester's uuid, e.g. for bcache it
  is the cache set uuid. Member label is optional, it is a human-readable
  string for debugging. The nvm offset format pointers recs_offset[]
  point to the actual allocation record lists on each namespace of the
  nvdimm pages allocator set. Each per-namespace record list is
  represented by the following struct bch_nvmpg_recs.

- struct bch_nvmpg_recs
  This structure represents a requester's allocation record list. Member
  uuid has the same value as the uuid of its corresponding struct
  bch_nvmpg_head. Member recs[] is a table of struct bch_nvmpg_rec
  objects to trace all allocated nvdimm pages. If the table recs[] is
  full, the nvmpg format offset next_offset points to the next struct
  bch_nvmpg_recs object, where the nvm pages allocator will look for a
  free allocation record. All the linked struct bch_nvmpg_recs objects
  compose a requester's allocation record list which is headed by the
  above struct bch_nvmpg_head.

- struct bch_nvmpg_rec
  This structure records a range of allocated nvdimm pages. Member pgoff
  is the offset of this allocation range in units of page size. Member
  order indicates the size of the allocation range as (1 << order) pages.
  Because the nvdimm pages allocator set may have multiple nvdimm
  namespaces, member ns_id is used to identify which namespace the pgoff
  belongs to.
  - Bits  0 - 51: pgoff - page offset of the allocated pages.
  - Bits 52 - 57: order - allocated size is (1 << order) pages.
  - Bits 58 - 60: ns_id - identifies which namespace the pages stay on.
  - Bits 61 - 63: reserved.
  Since the size of each allocation is a power of 2 pages, the 6 bits
  order can represent a maximum allocation size of
  (1 << ((1 << 6) - 1)) * PAGE_SIZE.
  That is a 76 bits wide range size in bytes for 4KB page size, which is
  large enough currently.

All the structure members having the _offset suffix are in a special
format. E.g. bch_nvmpg_sb.{sb_offset, pages_offset, set_header_offset},
bch_nvmpg_head.recs_offset, bch_nvmpg_recs.{head_offset, next_offset}.
The offset value is 64 bits wide, the most significant 3 bits identify
which namespace this offset belongs to, and the remaining 61 bits are
the actual offset inside the namespace. Following patches will have
helper routines to do the conversion between memory pointer and offset.

Signed-off-by: Coly Li
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Hannes Reinecke
Cc: Jens Axboe
Cc: Jianpeng Ma
Cc: Qiaowei Ren
Cc: Ying Huang
---
 include/uapi/linux/bcache-nvm.h | 253 ++++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 include/uapi/linux/bcache-nvm.h

diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
new file mode 100644
index 000000000000..0e1082bb88ee
--- /dev/null
+++ b/include/uapi/linux/bcache-nvm.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_BCACHE_NVM_H
+#define _UAPI_BCACHE_NVM_H
+
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvmpg_sb
+ * This is the super block allocated on each nvdimm namespace for the nvm
+ * pages allocator. A nvdimm pages allocator set may have multiple namespaces,
+ * bch_nvmpg_sb->set_uuid is used to mark which nvdimm set this namespace
+ * belongs to.
+ *
+ * - struct bch_nvmpg_set_header
+ * This is a table for all heads of all allocation record lists. An
+ * allocation record list traces all page(s) allocated from nvdimm
+ * namespace(s) to a specific requester (identified by uuid). After system
+ * reboot, a requester can retrieve all previously allocated nvdimm pages
+ * from its record list by a pre-defined uuid.
+ *
+ * - struct bch_nvmpg_head
+ * This is a head of an allocation record list.
+ * Each nvdimm pages requester (typically a driver) has exactly one allocation
+ * record list, and an allocated nvdimm page belongs to only one allocation
+ * record list. Member uuid[] is set to the requester's uuid, e.g. for bcache
+ * it is the cache set uuid. Member label is optional, it is a human-readable
+ * string for debugging. The nvm offset format pointers recs_offset[] point to
+ * the actual allocation record lists on each namespace of the nvdimm pages
+ * allocator set. Each per-namespace record list is represented by the
+ * following struct bch_nvmpg_recs.
+ *
+ * - struct bch_nvmpg_recs
+ * This structure represents a requester's allocation record list. Member uuid
+ * is same value as the uuid of its corresponding struct bch_nvmpg_head. Member
+ * recs[] is a table of struct bch_nvmpg_rec objects to trace all allocated
+ * nvdimm pages. If the table recs[] is full, the nvmpg format offset
+ * next_offset points to the next struct bch_nvmpg_recs object, where the nvm
+ * pages allocator will look for a free allocation record. All the linked
+ * struct bch_nvmpg_recs objects compose a requester's allocation record list
+ * which is headed by the above struct bch_nvmpg_head.
+ *
+ * - struct bch_nvmpg_rec
+ * This structure records a range of allocated nvdimm pages. Member pgoff is
+ * offset in unit of page size of this allocation range. Member order indicates
+ * size of the allocation range by (1 << order) in unit of page size. Because
+ * the nvdimm pages allocator set may have multiple nvdimm namespaces, member
+ * ns_id is used to identify which namespace the pgoff belongs to.
+ *
+ * All allocation record lists are stored on the first initialized nvdimm
+ * namespace (ns_id 0). The default meta data layout of the nvm pages
+ * allocator on namespace 0 is,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+ <-- BCH_NVMPG_RECLIST_HEAD_OFFSET
+ *      |      bch_nvmpg_set_header       |
+ *      |                                 |
+ * 16KB +---------------------------------+ <-- BCH_NVMPG_SYSRECS_OFFSET
+ *      |         bch_nvmpg_recs          |
+ *      |   (nvm pages internal usage)    |
+ * 24KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |       allocable nvm pages       |
+ *      |       for buddy allocator       |
+ *  end +---------------------------------+
+ *
+ *
+ *
+ * Default meta data layout on the remaining nvdimm namespaces,
+ *
+ *    0 +---------------------------------+
+ *      |                                 |
+ *  4KB +---------------------------------+ <-- BCH_NVMPG_SB_OFFSET
+ *      |          bch_nvmpg_sb           |
+ *  8KB +---------------------------------+
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ *      |                                 |
+ * 16MB +---------------------------------+ <-- BCH_NVMPG_START
+ *      |       allocable nvm pages       |
+ *      |       for buddy allocator       |
+ *  end +---------------------------------+
+ *
+ *
+ * - The nvmpg offset format pointer
+ * All member names ending with _offset in this header are nvmpg offset
+ * format pointers. The offset format is,
+ *       [highest 3 bits: ns_id]
+ *       [remaining 61 bits: offset inside namespace ns_id]
+ *
+ * The above offset is in bytes, the procedure to dereference a nvmpg offset
+ * format pointer is,
+ * 1) Identify the namespace related in-memory structure by ns_id from the
+ *    highest 3 bits of offset value.
+ * 2) Get the DAX mapping base address from the in-memory structure.
+ * 3) Calculate the actual memory address on nvdimm by adding the DAX base
+ *    address to the offset value in the remaining low 61 bits.
+ * All related in-memory structures and conversion routines don't belong to
+ * the user space API, they are defined by the nvm-pages allocator code in
+ * drivers/md/bcache/nvm-pages.{c,h}
+ *
+ */
+
+#include
+
+/* In bytes */
+#define BCH_NVMPG_SB_OFFSET 4096
+#define BCH_NVMPG_START (16 << 20)
+
+#define BCH_NVMPG_LBL_SIZE 32
+#define BCH_NVMPG_NS_MAX 8
+
+#define BCH_NVMPG_RECLIST_HEAD_OFFSET (8<<10)
+#define BCH_NVMPG_SYSRECS_OFFSET (16<<10)
+
+#define BCH_NVMPG_SB_VERSION 0
+#define BCH_NVMPG_SB_VERSION_MAX 0
+
+static const __u8 bch_nvmpg_magic[] = {
+	0x17, 0xbd, 0x53, 0x7f, 0x1b, 0x23, 0xd6, 0x83,
+	0x46, 0xa4, 0xf8, 0x28, 0x17, 0xda, 0xec, 0xa9 };
+static const __u8 bch_nvmpg_recs_magic[] = {
+	0x39, 0x25, 0x3f, 0xf7, 0x27, 0x17, 0xd0, 0xb9,
+	0x10, 0xe6, 0xd2, 0xda, 0x38, 0x68, 0x26, 0xae };
+
+/* takes 64bit width */
+struct bch_nvmpg_rec {
+	union {
+		struct {
+			__u64 pgoff:52;
+			__u64 order:6;
+			__u64 ns_id:3;
+			__u64 reserved:3;
+		};
+		__u64 _v;
+	};
+};
+
+struct bch_nvmpg_recs {
+	union {
+		struct {
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvmpg_head
+			 */
+			__u64 head_offset;
+			/*
+			 * A nvmpg offset format pointer to
+			 * struct bch_nvmpg_recs which contains
+			 * the next recs[] array.
+			 */
+			__u64 next_offset;
+			__u8 magic[16];
+			__u8 uuid[16];
+			__u32 size;
+			__u32 used;
+			__u64 _pad[4];
+			struct bch_nvmpg_rec recs[];
+		};
+		__u8 pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_RECS \
+	((sizeof(struct bch_nvmpg_recs) - \
+	  offsetof(struct bch_nvmpg_recs, recs)) / \
+	 sizeof(struct bch_nvmpg_rec))
+
+#define BCH_NVMPG_HD_STAT_FREE 0x0
+#define BCH_NVMPG_HD_STAT_ALLOC 0x1
+struct bch_nvmpg_head {
+	__u8 uuid[16];
+	__u8 label[BCH_NVMPG_LBL_SIZE];
+	__u32 state;
+	__u32 flags;
+	/*
+	 * Array of offset values from the nvmpg offset format
+	 * pointers, each of the pointer points to a per-namespace
+	 * struct bch_nvmpg_recs.
+	 */
+	__u64 recs_offset[BCH_NVMPG_NS_MAX];
+};
+
+/* heads[0] is always for nvm_pages internal usage */
+struct bch_nvmpg_set_header {
+	union {
+		struct {
+			__u32 size;
+			__u32 used;
+			__u64 _pad[4];
+			struct bch_nvmpg_head heads[];
+		};
+		__u8 pad[8192];
+	};
+};
+
+#define BCH_NVMPG_MAX_HEADS \
+	((sizeof(struct bch_nvmpg_set_header) - \
+	  offsetof(struct bch_nvmpg_set_header, heads)) / \
+	 sizeof(struct bch_nvmpg_head))
+
+/* The on-media bit order is local CPU order */
+struct bch_nvmpg_sb {
+	__u64 csum;
+	__u64 sb_offset;
+	__u64 ns_start;
+	__u64 version;
+	__u8 magic[16];
+	__u8 uuid[16];
+	__u32 page_size;
+	__u32 total_ns;
+	__u32 this_ns;
+	union {
+		__u8 set_uuid[16];
+		__u64 set_magic;
+	};
+
+	__u64 flags;
+	__u64 seq;
+
+	__u64 feature_compat;
+	__u64 feature_incompat;
+	__u64 feature_ro_compat;
+
+	/* For allocable nvm pages from buddy systems */
+	__u64 pages_offset;
+	__u64 pages_total;
+
+	__u64 pad[8];
+
+	/*
+	 * A nvmpg offset format pointer, it points
+	 * to struct bch_nvmpg_set_header which is
+	 * stored only on the first name space.
+	 */
+	__u64 set_header_offset;
+
+	/* Just for csum_set() */
+	__u32 keys;
+	__u64 d[0];
+};
+
+#endif /* _UAPI_BCACHE_NVM_H */

From patchwork Wed Aug 11 17:02:14 2021
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 12431593
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Jianpeng Ma, Randy Dunlap, Qiaowei Ren, Hannes Reinecke
Subject: [PATCH v12 02/12] bcache: initialize the nvm pages allocator
Date: Thu, 12 Aug 2021 01:02:14 +0800
Message-Id: <20210811170224.42837-3-colyli@suse.de>
In-Reply-To: <20210811170224.42837-1-colyli@suse.de>
References: <20210811170224.42837-1-colyli@suse.de>
X-Mailing-List: linux-block@vger.kernel.org

From: Jianpeng Ma

This patch defines the prototype data structures in memory and
initializes the nvm pages allocator.

The nvm address space which is managed by this allocator can consist of
many nvm namespaces, and some namespaces can compose into one nvm set,
like a cache set. For this initial implementation, only one set can be
supported.

The users of this nvm pages allocator need to call
bch_register_namespace() to register the nvdimm device (like /dev/pmemX)
into this allocator as an instance of struct bch_nvmpg_ns.

Reported-by: Randy Dunlap
Signed-off-by: Jianpeng Ma
Co-developed-by: Qiaowei Ren
Signed-off-by: Qiaowei Ren
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Hannes Reinecke
Cc: Jens Axboe
---
 drivers/md/bcache/Kconfig     |  10 +
 drivers/md/bcache/Makefile    |   1 +
 drivers/md/bcache/nvm-pages.c | 339 ++++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h |  96 ++++++++++
 drivers/md/bcache/super.c     |   3 +
 5 files changed, 449 insertions(+)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index d1ca4d059c20..a69f6c0e0507 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -35,3 +35,13 @@ config BCACHE_ASYNC_REGISTRATION
 	  device path into this file will returns immediately and the real
 	  registration work is handled in kernel work queue in asynchronous
 	  way.
+
+config BCACHE_NVM_PAGES
+	bool "NVDIMM support for bcache (EXPERIMENTAL)"
+	depends on BCACHE
+	depends on 64BIT
+	depends on LIBNVDIMM
+	depends on DAX
+	help
+	  Allocate/release NV-memory pages for bcache and provide allocated pages
+	  for each requestor after system reboot.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index 5b87e59676b8..2397bb7c7ffd 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_BCACHE) += bcache.o
 bcache-y := alloc.o bset.o btree.o closure.o debug.o extents.o\
 	io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
 	util.o writeback.o features.o
+bcache-$(CONFIG_BCACHE_NVM_PAGES) += nvm-pages.o
diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
new file mode 100644
index 000000000000..6184c628d9cc
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nvdimm page-buddy allocator
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Copyright (c) 2021, Qiaowei Ren.
+ * Copyright (c) 2021, Jianpeng Ma.
+ */
+
+#include "bcache.h"
+#include "nvm-pages.h"
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct bch_nvmpg_set *global_nvmpg_set;
+
+void *bch_nvmpg_offset_to_ptr(unsigned long offset)
+{
+	int ns_id = BCH_NVMPG_GET_NS_ID(offset);
+	struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[ns_id];
+
+	if (offset == 0)
+		return NULL;
+
+	ns_id = BCH_NVMPG_GET_NS_ID(offset);
+	ns = global_nvmpg_set->ns_tbl[ns_id];
+
+	if (ns)
+		return (void *)(ns->base_addr + BCH_NVMPG_GET_OFFSET(offset));
+
+	pr_err("Invalid ns_id %u\n", ns_id);
+	return NULL;
+}
+
+unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr)
+{
+	int ns_id = ns->ns_id;
+	unsigned long offset = (unsigned long)(ptr - ns->base_addr);
+
+	return BCH_NVMPG_OFFSET(ns_id, offset);
+}
+
+static void release_ns_tbl(struct bch_nvmpg_set *set)
+{
+	int i;
+	struct bch_nvmpg_ns *ns;
+
+	for (i = 0; i < BCH_NVMPG_NS_MAX; i++) {
+		ns = set->ns_tbl[i];
+		if (ns) {
+			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+			set->ns_tbl[i] = NULL;
+			set->attached_ns--;
+			kfree(ns);
+		}
+	}
+
+	if (set->attached_ns)
+		pr_err("unexpected attached_ns: %u\n", set->attached_ns);
+}
+
+static void release_nvmpg_set(struct bch_nvmpg_set *set)
+{
+	release_ns_tbl(set);
+	kfree(set);
+}
+
+/* Namespace 0 contains all meta data of the nvmpg allocation set */
+static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_set_header *set_header;
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first nvmpg namespace.\n",
+		       ns->ns_id);
+		return -EINVAL;
+	}
+
+	set_header = bch_nvmpg_offset_to_ptr(ns->sb->set_header_offset);
+
+	mutex_lock(&global_nvmpg_set->lock);
+	global_nvmpg_set->set_header = set_header;
+	global_nvmpg_set->heads_size = set_header->size;
+	global_nvmpg_set->heads_used = set_header->used;
+	mutex_unlock(&global_nvmpg_set->lock);
+
+	return 0;
+}
+
+static int attach_nvmpg_set(struct bch_nvmpg_ns *ns)
+{
+	struct bch_nvmpg_sb *sb = ns->sb;
+	int rc = 0;
+
+	mutex_lock(&global_nvmpg_set->lock);
+
+	if (global_nvmpg_set->ns_tbl[sb->this_ns]) {
+		pr_err("ns_id %u already attached.\n", ns->ns_id);
+		rc = -EEXIST;
+		goto unlock;
+	}
+
+	if (ns->ns_id != 0) {
+		pr_err("unexpected ns_id %u for first namespace.\n", ns->ns_id);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	if (global_nvmpg_set->attached_ns > 0) {
+		pr_err("multiple namespace attaching not supported yet\n");
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if ((global_nvmpg_set->attached_ns + 1) > sb->total_ns) {
+		pr_err("namespace counters error: attached %u > total %u\n",
+		       global_nvmpg_set->attached_ns,
+		       global_nvmpg_set->total_ns);
+		rc = -EINVAL;
+		goto unlock;
+	}
+
+	memcpy(global_nvmpg_set->set_uuid, sb->set_uuid, 16);
+	global_nvmpg_set->ns_tbl[sb->this_ns] = ns;
+	global_nvmpg_set->attached_ns++;
+	global_nvmpg_set->total_ns = sb->total_ns;
+
+unlock:
+	mutex_unlock(&global_nvmpg_set->lock);
+	return rc;
+}
+
+static int read_nvdimm_meta_super(struct block_device *bdev,
+				  struct bch_nvmpg_ns *ns)
+{
+	struct page *page;
+	struct bch_nvmpg_sb *sb;
+	uint64_t expected_csum = 0;
+	int r;
+
+	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+				BCH_NVMPG_SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+
+	if (IS_ERR(page))
+		return -EIO;
+
+	sb = (struct bch_nvmpg_sb *)
+	     (page_address(page) + offset_in_page(BCH_NVMPG_SB_OFFSET));
+
+	r = -EINVAL;
+	expected_csum = csum_set(sb);
+	if (expected_csum != sb->csum) {
+		pr_info("csum is not match with expected one\n");
+		goto put_page;
+	}
+
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_info("invalid bch_nvmpg_magic\n");
+		goto put_page;
+	}
+
+	if (sb->sb_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_SB_OFFSET)) {
+		pr_info("invalid superblock offset 0x%llx\n", sb->sb_offset);
+		goto put_page;
+	}
+
+	r = -EOPNOTSUPP;
+	if (sb->total_ns != 1) {
+		pr_info("multiple name space not supported yet.\n");
+		goto put_page;
+	}
+
+
+	r = 0;
+	/* Necessary for DAX mapping */
+	ns->page_size = sb->page_size;
+	ns->pages_total = sb->pages_total;
+
+put_page:
+	put_page(page);
+	return r;
+}
+
+struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path)
+{
+	struct bch_nvmpg_ns *ns = NULL;
+	struct bch_nvmpg_sb *sb = NULL;
+	char buf[BDEVNAME_SIZE];
+	struct block_device *bdev;
+	pgoff_t pgoff;
+	int id, err;
+	char *path;
+	long dax_ret = 0;
+
+	path = kstrndup(dev_path, 512, GFP_KERNEL);
+	if (!path) {
+		pr_err("kstrndup failed\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bdev = blkdev_get_by_path(strim(path),
+				  FMODE_READ|FMODE_WRITE|FMODE_EXEC,
+				  global_nvmpg_set);
+	if (IS_ERR(bdev)) {
+		pr_err("get %s error: %ld\n", dev_path, PTR_ERR(bdev));
+		kfree(path);
+		return ERR_PTR(PTR_ERR(bdev));
+	}
+
+	err = -ENOMEM;
+	ns = kzalloc(sizeof(struct bch_nvmpg_ns), GFP_KERNEL);
+	if (!ns)
+		goto bdput;
+
+	err = -EIO;
+	if (read_nvdimm_meta_super(bdev, ns)) {
+		pr_err("%s read nvdimm meta super block failed.\n",
+		       bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EOPNOTSUPP;
+	if (!bdev_dax_supported(bdev, ns->page_size)) {
+		pr_err("%s don't support DAX\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	if (bdev_dax_pgoff(bdev, 0, ns->page_size, &pgoff)) {
+		pr_err("invalid offset of %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -ENOMEM;
+	ns->dax_dev = fs_dax_get_by_bdev(bdev);
+	if (!ns->dax_dev) {
+		pr_err("can't get dax device by %s\n", bdevname(bdev, buf));
+		goto free_ns;
+	}
+
+	err = -EINVAL;
+	id = dax_read_lock();
+	dax_ret = dax_direct_access(ns->dax_dev, pgoff, ns->pages_total,
+				    &ns->base_addr, &ns->start_pfn);
+	if (dax_ret <= 0) {
+		pr_err("dax_direct_access error\n");
+		dax_read_unlock(id);
+		goto free_ns;
+	}
+
+	if (dax_ret < ns->pages_total) {
+		pr_warn("mapped range %ld is less than ns->pages_total %lu\n",
+			dax_ret, ns->pages_total);
+	}
+	dax_read_unlock(id);
+
+	sb = (struct bch_nvmpg_sb *)(ns->base_addr + BCH_NVMPG_SB_OFFSET);
+
+	err = -EINVAL;
+	/* Check magic again to make sure DAX mapping is correct */
+	if (memcmp(sb->magic, bch_nvmpg_magic, 16)) {
+		pr_err("invalid bch_nvmpg_magic after DAX mapping\n");
+		goto free_ns;
+	}
+
+	if ((global_nvmpg_set->attached_ns > 0) &&
+	     memcmp(sb->set_uuid, global_nvmpg_set->set_uuid, 16)) {
+		pr_err("set uuid does not match with ns_id %u\n", ns->ns_id);
+		goto free_ns;
+	}
+
+	if (sb->set_header_offset !=
+	    BCH_NVMPG_OFFSET(sb->this_ns, BCH_NVMPG_RECLIST_HEAD_OFFSET)) {
+		pr_err("Invalid header offset: this_ns %u, ns_id %llu, offset 0x%llx\n",
+		       sb->this_ns,
+		       BCH_NVMPG_GET_NS_ID(sb->set_header_offset),
+		       BCH_NVMPG_GET_OFFSET(sb->set_header_offset));
+		goto free_ns;
+	}
+
+	ns->page_size = sb->page_size;
+	ns->pages_offset = sb->pages_offset;
+	ns->pages_total = sb->pages_total;
+	ns->sb = sb;
+	ns->free = 0;
+	ns->bdev = bdev;
+	ns->set = global_nvmpg_set;
+
+	err = attach_nvmpg_set(ns);
+	if (err < 0)
+		goto free_ns;
+
+	mutex_init(&ns->lock);
+
+	err = init_nvmpg_set_header(ns);
+	if (err < 0)
+		goto free_ns;
+
+	kfree(path);
+	return ns;
+
+free_ns:
+	kfree(ns);
+bdput:
+	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+	kfree(path);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(bch_register_namespace);
+
+int __init bch_nvmpg_init(void)
+{
+	global_nvmpg_set = kzalloc(sizeof(*global_nvmpg_set), GFP_KERNEL);
+	if (!global_nvmpg_set)
+		return -ENOMEM;
+
+	global_nvmpg_set->total_ns = 0;
+	mutex_init(&global_nvmpg_set->lock);
+
+	pr_info("bcache nvm init\n");
+	return 0;
+}
+
+void bch_nvmpg_exit(void)
+{
+	release_nvmpg_set(global_nvmpg_set);
+	pr_info("bcache nvm exit\n");
+}
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
new file mode 100644
index 000000000000..827cff695608
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BCACHE_NVM_PAGES_H
+#define _BCACHE_NVM_PAGES_H
+
+#include
+#include
+
+/*
+ * Bcache NVDIMM in memory data structures
+ */
+
+/*
+ * The following three structures in memory records which page(s) allocated + * to which owner. After reboot from power failure, they will be initialized + * based on nvm pages superblock in NVDIMM device. + */ +struct bch_nvmpg_ns { + struct bch_nvmpg_sb *sb; + void *base_addr; + + unsigned char uuid[16]; + int ns_id; + unsigned int page_size; + unsigned long free; + unsigned long pages_offset; + unsigned long pages_total; + pfn_t start_pfn; + + struct dax_device *dax_dev; + struct block_device *bdev; + struct bch_nvmpg_set *set; + + struct mutex lock; +}; + +/* + * A set of namespaces. Currently only one set can be supported. + */ +struct bch_nvmpg_set { + unsigned char set_uuid[16]; + + int heads_size; + int heads_used; + struct bch_nvmpg_set_header *set_header; + + struct bch_nvmpg_ns *ns_tbl[BCH_NVMPG_NS_MAX]; + int total_ns; + int attached_ns; + + struct mutex lock; +}; + +#define BCH_NVMPG_NS_ID_BITS 3 +#define BCH_NVMPG_OFFSET_BITS 61 +#define BCH_NVMPG_NS_ID_MASK ((1UL<> BCH_NVMPG_OFFSET_BITS) & BCH_NVMPG_NS_ID_MASK) + +#define BCH_NVMPG_GET_OFFSET(offset) ((offset) & BCH_NVMPG_OFFSET_MASK) + +#define BCH_NVMPG_OFFSET(ns_id, offset) \ + ((((ns_id) & BCH_NVMPG_NS_ID_MASK) << BCH_NVMPG_OFFSET_BITS) | \ + ((offset) & BCH_NVMPG_OFFSET_MASK)) + +/* Indicate which field in bch_nvmpg_sb to be updated */ +#define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ + +void *bch_nvmpg_offset_to_ptr(unsigned long offset); +unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); +int bch_nvmpg_init(void); +void bch_nvmpg_exit(void); + +#else + +static inline struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) +{ + return NULL; +} + +static inline int bch_nvmpg_init(void) +{ + return 0; +} + +static inline void bch_nvmpg_exit(void) { } + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + +#endif /* _BCACHE_NVM_PAGES_H */ diff --git 
a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 185246a0d855..4326ffa0d21f 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -14,6 +14,7 @@ #include "request.h" #include "writeback.h" #include "features.h" +#include "nvm-pages.h" #include #include @@ -2809,6 +2810,7 @@ static void bcache_exit(void) { bch_debug_exit(); bch_request_exit(); + bch_nvmpg_exit(); if (bcache_kobj) kobject_put(bcache_kobj); if (bcache_wq) @@ -2907,6 +2909,7 @@ static int __init bcache_init(void) bch_debug_init(); closure_debug_init(); + bch_nvmpg_init(); bcache_is_reboot = false; From patchwork Wed Aug 11 17:02:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F965C432BE for ; Wed, 11 Aug 2021 17:04:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2B6996101E for ; Wed, 11 Aug 2021 17:03:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229918AbhHKREW (ORCPT ); Wed, 11 Aug 2021 13:04:22 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:48304 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKREV (ORCPT ); Wed, 11 Aug 2021 13:04:21 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id AF3A32222E; Wed, 11 
Aug 2021 17:03:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701436; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+mPewLOx0a1S8ZBMqH74ffluZMDAnu2uA7uLvLv6L7M=; b=DX7jTXwYTTqXLq3s63WskLN5neUaghjIx4RB9RunvkEY7H7+89CktyqpLc1pbK8jgjMcVR JlRCx215CeOUrS3Mf7jyzhXaUTxmUs0n5LRn36a4xspablVQlatzpkpnZCcBspH+Ued6ru hctEh1fJQm1MVwyyrK4RZNo4FlX7dZk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701436; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+mPewLOx0a1S8ZBMqH74ffluZMDAnu2uA7uLvLv6L7M=; b=hWQWdCyA5eVmdcVOpPYQMf7qx8VQpGpLSiUL8tHA4lB05vD4o8U3UZ6PviN4s862nkeLm6 E6cZOjAhN+1Zb1Bg== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id A9B44A3D58; Wed, 11 Aug 2021 17:03:32 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Jianpeng Ma , kernel test robot , Dan Carpenter , Qiaowei Ren , Hannes Reinecke Subject: [PATCH v12 03/12] bcache: initialization of the buddy Date: Thu, 12 Aug 2021 01:02:15 +0800 Message-Id: <20210811170224.42837-4-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This nvm pages allocator will implement a simple buddy allocator to manage the nvm address space.
This patch initializes the buddy allocator for a new namespace. The unit of allocation and free in the buddy allocator is one page. A DAX device has its own struct page entries (in DRAM or PMEM):

    struct { /* ZONE_DEVICE pages */
            /** @pgmap: Points to the hosting device page map. */
            struct dev_pagemap *pgmap;
            void *zone_device_data;
            /*
             * ZONE_DEVICE private pages are counted as being
             * mapped so the next 3 words hold the mapping, index,
             * and private fields from the source anonymous or
             * page cache page while the page is migrated to device
             * private memory.
             * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
             * use the mapping, index, and private fields when
             * pmem backed DAX files are mapped.
             */
    };

ZONE_DEVICE pages only use pgmap; the other 4 words (16 or 32 bytes) are unused. So the second and third words are used as a 'struct list_head' that links the page into the buddy free lists. The fourth word (page::index in a normal struct page) stores pgoff, the page offset within the DAX device. The fifth word (page::private in a normal struct page) stores the buddy order. page_type is used to store the buddy flags.
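For illustration only (not part of this patch): a minimal userspace sketch of how a free page range can be carved into aligned power-of-two blocks when the free lists are first populated. The order-selection loop follows the one in bch_nvmpg_init_free_space() below; SKETCH_MAX_ORDER, sketch_pick_order() and sketch_carve() are hypothetical names, and the sketch records orders in an array instead of touching struct page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 20	/* assumption: same value as BCH_MAX_ORDER */

/*
 * Pick the largest order whose block is naturally aligned at pgoff
 * and still fits into the remaining number of free pages.
 */
static int sketch_pick_order(unsigned long pgoff, unsigned long pages)
{
	int i;

	assert(pages > 0);	/* order 0 always matches when pages > 0 */
	for (i = SKETCH_MAX_ORDER - 1; i >= 0; i--) {
		if ((pgoff % (1UL << i) == 0) && (pages >= (1UL << i)))
			break;
	}
	return i;
}

/*
 * Carve the free range [start, end) into buddy blocks. The order of
 * each block is stored in out[]; the number of blocks is returned.
 */
static size_t sketch_carve(unsigned long start, unsigned long end,
			   int *out, size_t max_out)
{
	unsigned long pages = end - start;
	size_t n = 0;

	while (pages && n < max_out) {
		int order = sketch_pick_order(start, pages);

		out[n++] = order;
		start += 1UL << order;
		pages -= 1UL << order;
	}
	return n;
}
```

With this model, the free range of pages 3..15 is split into one order-0 block at page 3, one order-2 block at page 4 and one order-3 block at page 8.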
Reported-by: kernel test robot Reported-by: Dan Carpenter Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvm-pages.c | 212 +++++++++++++++++++++++++++++++++- drivers/md/bcache/nvm-pages.h | 12 ++ 2 files changed, 221 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c index 6184c628d9cc..677fdb62f737 100644 --- a/drivers/md/bcache/nvm-pages.c +++ b/drivers/md/bcache/nvm-pages.c @@ -50,6 +50,36 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr) return BCH_NVMPG_OFFSET(ns_id, offset); } +static struct page *bch_nvmpg_va_to_pg(void *addr) +{ + return virt_to_page(addr); +} + +static void *bch_nvmpg_pgoff_to_ptr(struct bch_nvmpg_ns *ns, pgoff_t pgoff) +{ + return ns->base_addr + (pgoff << PAGE_SHIFT); +} + +static void *bch_nvmpg_rec_to_ptr(struct bch_nvmpg_rec *r) +{ + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[r->ns_id]; + pgoff_t pgoff = r->pgoff; + + return bch_nvmpg_pgoff_to_ptr(ns, pgoff); +} + +static inline void reserve_nvmpg_pages(struct bch_nvmpg_ns *ns, + pgoff_t pgoff, u64 nr) +{ + while (nr > 0) { + unsigned int num = nr > UINT_MAX ? 
UINT_MAX : nr; + + bitmap_set(ns->pages_bitmap, pgoff, num); + nr -= num; + pgoff += num; + } +} + static void release_ns_tbl(struct bch_nvmpg_set *set) { int i; @@ -58,6 +88,10 @@ static void release_ns_tbl(struct bch_nvmpg_set *set) for (i = 0; i < BCH_NVMPG_NS_MAX; i++) { ns = set->ns_tbl[i]; if (ns) { + kvfree(ns->pages_bitmap); + if (ns->recs_bitmap) + bitmap_free(ns->recs_bitmap); + blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC); set->ns_tbl[i] = NULL; set->attached_ns--; @@ -75,10 +109,73 @@ static void release_nvmpg_set(struct bch_nvmpg_set *set) kfree(set); } +static int validate_recs(int ns_id, + struct bch_nvmpg_head *head, + struct bch_nvmpg_recs *recs) +{ + if (memcmp(recs->magic, bch_nvmpg_recs_magic, 16)) { + pr_err("Invalid bch_nvmpg_recs magic\n"); + return -EINVAL; + } + + if (memcmp(recs->uuid, head->uuid, 16)) { + pr_err("Invalid bch_nvmpg_recs uuid\n"); + return -EINVAL; + } + + if (recs->head_offset != + bch_nvmpg_ptr_to_offset(global_nvmpg_set->ns_tbl[ns_id], head)) { + pr_err("Invalid recs head_offset\n"); + return -EINVAL; + } + + return 0; +} + +static int reserve_nvmpg_recs(struct bch_nvmpg_recs *recs) +{ + int i, used = 0; + + for (i = 0; i < recs->size; i++) { + struct bch_nvmpg_rec *r = &recs->recs[i]; + struct bch_nvmpg_ns *ns; + struct page *page; + void *addr; + + if (r->pgoff == 0) + continue; + + ns = global_nvmpg_set->ns_tbl[r->ns_id]; + addr = bch_nvmpg_rec_to_ptr(r); + if (addr < ns->base_addr) { + pr_err("Invalid recorded address\n"); + return -EINVAL; + } + + /* init struct page: index/private */ + page = bch_nvmpg_va_to_pg(addr); + set_page_private(page, r->order); + page->index = r->pgoff; + + reserve_nvmpg_pages(ns, r->pgoff, 1L << r->order); + used++; + } + + if (used != recs->used) { + pr_err("used %d doesn't match recs->used %d\n", + used, recs->used); + return -EINVAL; + } + + return 0; +} + /* Namespace 0 contains all meta data of the nvmpg allocation set */ static int init_nvmpg_set_header(struct 
bch_nvmpg_ns *ns) { struct bch_nvmpg_set_header *set_header; + struct bch_nvmpg_recs *sys_recs; + int i, j, used = 0, rc = 0; if (ns->ns_id != 0) { pr_err("unexpected ns_id %u for first nvmpg namespace.\n", @@ -92,9 +189,83 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) global_nvmpg_set->set_header = set_header; global_nvmpg_set->heads_size = set_header->size; global_nvmpg_set->heads_used = set_header->used; + + /* Reserve the used space from buddy allocator */ + reserve_nvmpg_pages(ns, 0, div_u64(ns->pages_offset, ns->page_size)); + + sys_recs = ns->base_addr + BCH_NVMPG_SYSRECS_OFFSET; + for (i = 0; i < set_header->size; i++) { + struct bch_nvmpg_head *head; + + head = &set_header->heads[i]; + if (head->state == BCH_NVMPG_HD_STAT_FREE) + continue; + + used++; + if (used > global_nvmpg_set->heads_size) { + pr_err("used heads %d > heads size %d.\n", + used, global_nvmpg_set->heads_size); + goto unlock; + } + + for (j = 0; j < BCH_NVMPG_NS_MAX; j++) { + struct bch_nvmpg_recs *recs; + + recs = bch_nvmpg_offset_to_ptr(head->recs_offset[j]); + + /* Iterate the recs list */ + while (recs) { + rc = validate_recs(j, head, recs); + if (rc < 0) + goto unlock; + + rc = reserve_nvmpg_recs(recs); + if (rc < 0) + goto unlock; + + bitmap_set(ns->recs_bitmap, recs - sys_recs, 1); + recs = bch_nvmpg_offset_to_ptr(recs->next_offset); + } + } + } +unlock: mutex_unlock(&global_nvmpg_set->lock); + return rc; +} - return 0; +static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) +{ + unsigned int start, end, pages; + int i; + struct page *page; + pgoff_t pgoff_start; + + bitmap_for_each_clear_region(ns->pages_bitmap, + start, end, 0, ns->pages_total) { + pgoff_start = start; + pages = end - start; + + while (pages) { + void *addr; + + for (i = BCH_MAX_ORDER - 1; i >= 0; i--) { + if ((pgoff_start % (1L << i) == 0) && + (pages >= (1L << i))) + break; + } + + addr = bch_nvmpg_pgoff_to_ptr(ns, pgoff_start); + page = bch_nvmpg_va_to_pg(addr); + set_page_private(page, 
i); + page->index = pgoff_start; + __SetPageBuddy(page); + list_add((struct list_head *)&page->zone_device_data, + &ns->free_area[i]); + + pgoff_start += 1L << i; + pages -= 1L << i; + } + } } static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) @@ -199,7 +370,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) char buf[BDEVNAME_SIZE]; struct block_device *bdev; pgoff_t pgoff; - int id, err; + int id, i, err; char *path; long dax_ret = 0; @@ -303,13 +474,48 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) mutex_init(&ns->lock); + /* + * parameters of bitmap_set/clear are unsigned int. + * Given currently size of nvm is far from exceeding this limit, + * so only add a WARN_ON message. + */ + WARN_ON(BITS_TO_LONGS(ns->pages_total) > UINT_MAX); + ns->pages_bitmap = kvcalloc(BITS_TO_LONGS(ns->pages_total), + sizeof(unsigned long), GFP_KERNEL); + if (!ns->pages_bitmap) { + err = -ENOMEM; + goto clear_ns_nr; + } + + if (ns->sb->this_ns == 0) { + ns->recs_bitmap = + bitmap_zalloc(BCH_MAX_PGALLOC_RECS, GFP_KERNEL); + if (ns->recs_bitmap == NULL) { + err = -ENOMEM; + goto free_pages_bitmap; + } + } + + for (i = 0; i < BCH_MAX_ORDER; i++) + INIT_LIST_HEAD(&ns->free_area[i]); + err = init_nvmpg_set_header(ns); if (err < 0) - goto free_ns; + goto free_recs_bitmap; + + if (ns->sb->this_ns == 0) + /* init buddy allocator */ + bch_nvmpg_init_free_space(ns); kfree(path); return ns; +free_recs_bitmap: + bitmap_free(ns->recs_bitmap); +free_pages_bitmap: + kvfree(ns->pages_bitmap); +clear_ns_nr: + global_nvmpg_set->ns_tbl[sb->this_ns] = NULL; free_ns: kfree(ns); bdput: diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h index 827cff695608..2116086c4d01 100644 --- a/drivers/md/bcache/nvm-pages.h +++ b/drivers/md/bcache/nvm-pages.h @@ -10,6 +10,8 @@ * Bcache NVDIMM in memory data structures */ +#define BCH_MAX_ORDER 20 + /* * The following three structures in memory records which page(s) allocated * to which owner. 
After reboot from power failure, they will be initialized @@ -27,6 +29,11 @@ struct bch_nvmpg_ns { unsigned long pages_total; pfn_t start_pfn; + unsigned long *pages_bitmap; + struct list_head free_area[BCH_MAX_ORDER]; + + unsigned long *recs_bitmap; + struct dax_device *dax_dev; struct block_device *bdev; struct bch_nvmpg_set *set; @@ -68,6 +75,11 @@ struct bch_nvmpg_set { /* Indicate which field in bch_nvmpg_sb to be updated */ #define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ +#define BCH_MAX_PGALLOC_RECS \ + (min_t(unsigned int, 64, \ + (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) / \ + sizeof(struct bch_nvmpg_recs))) + void *bch_nvmpg_offset_to_ptr(unsigned long offset); unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); From patchwork Wed Aug 11 17:02:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 441D6C4320E for ; Wed, 11 Aug 2021 17:04:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 21BA96104F for ; Wed, 11 Aug 2021 17:04:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229958AbhHKRE1 (ORCPT ); Wed, 11 Aug 2021 13:04:27 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:48364 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKRE0 (ORCPT ); Wed, 
11 Aug 2021 13:04:26 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 5314B22230; Wed, 11 Aug 2021 17:04:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701441; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tzL/DK4YyfDaNOt8X4SJPGZyOBuwC/rMUS8f0mHWqOg=; b=nxgg4FjpiXIoqqXYt5+wTDrvudcrr9AaiddY4RTziQ2OR8ZOH61x56z2yjVQ1nq/RXZkcP iuAMc4oJBS/zrBGxxY3flMbzP8PO4vB7HI1rf/NKk8UB7nk/ksFZqlJsg361EyjbCULgm/ Yj5goPTPvjkiq3Oihk/JEG9eZDRXc98= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701441; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tzL/DK4YyfDaNOt8X4SJPGZyOBuwC/rMUS8f0mHWqOg=; b=10PzYtjgae9mK8d6ZpUfX9J3ur62lPz0O/4mfPEBjMImIfuGy66f/eH9/gzxRRfdEkWZ1K HSClKOfb+fTwScBg== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 30A11A3D6B; Wed, 11 Aug 2021 17:03:56 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Jianpeng Ma , Qiaowei Ren , Hannes Reinecke Subject: [PATCH v12 04/12] bcache: bch_nvmpg_alloc_pages() of the buddy Date: Thu, 12 Aug 2021 01:02:16 +0800 Message-Id: <20210811170224.42837-5-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements 
bch_nvmpg_alloc_pages() of the nvm pages buddy allocator. Functionally it is similar to the kernel's existing page buddy allocator, with these differences: a: it takes owner_uuid as a parameter to record the owner information, and makes that information persistent. b: it does not need flags like GFP_*; all allocations are treated equally. c: it does not trigger other operations such as swap/recycle. Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvm-pages.c | 210 ++++++++++++++++++++++++++++++++++ drivers/md/bcache/nvm-pages.h | 9 ++ 2 files changed, 219 insertions(+) diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c index 677fdb62f737..420b7c479057 100644 --- a/drivers/md/bcache/nvm-pages.c +++ b/drivers/md/bcache/nvm-pages.c @@ -50,6 +50,13 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr) return BCH_NVMPG_OFFSET(ns_id, offset); } +static unsigned long bch_nvmpg_ptr_to_pgoff(struct bch_nvmpg_ns *ns, void *ptr) +{ + unsigned long offset = (unsigned long)(ptr - ns->base_addr); + + return offset >> PAGE_SHIFT; +} + static struct page *bch_nvmpg_va_to_pg(void *addr) { return virt_to_page(addr); @@ -268,6 +275,209 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) } } + +/* If not found, it will create if create == true */ +static struct bch_nvmpg_head *find_nvmpg_head(const char *uuid, bool create) +{ + struct bch_nvmpg_set_header *set_header = global_nvmpg_set->set_header; + struct bch_nvmpg_head *head = NULL; + int i; + + if (set_header == NULL) + goto out; + + for (i = 0; i < set_header->size; i++) { + struct bch_nvmpg_head *h = &set_header->heads[i]; + + if (h->state != BCH_NVMPG_HD_STAT_ALLOC) + continue; + + if (!memcmp(uuid, h->uuid, 16)) { + head = h; + break; + } + } + + if (!head && create) { + u32 used = set_header->used; + + if (set_header->size > used) { + head = &set_header->heads[used]; + memset(head,
0, sizeof(struct bch_nvmpg_head)); + head->state = BCH_NVMPG_HD_STAT_ALLOC; + memcpy(head->uuid, uuid, 16); + global_nvmpg_set->heads_used++; + set_header->used++; + } else + pr_info("No free bch_nvmpg_head\n"); + } + +out: + return head; +} + +static struct bch_nvmpg_recs *find_empty_nvmpg_recs(void) +{ + unsigned int start; + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[0]; + struct bch_nvmpg_recs *recs; + + start = bitmap_find_next_zero_area(ns->recs_bitmap, + BCH_MAX_PGALLOC_RECS, 0, 1, 0); + if (start > BCH_MAX_PGALLOC_RECS) { + pr_info("No free struct bch_nvmpg_recs\n"); + return NULL; + } + + bitmap_set(ns->recs_bitmap, start, 1); + recs = (struct bch_nvmpg_recs *) + bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET) + + start; + + memset(recs, 0, sizeof(struct bch_nvmpg_recs)); + return recs; +} + + +static struct bch_nvmpg_recs *find_nvmpg_recs(struct bch_nvmpg_ns *ns, + struct bch_nvmpg_head *head, + bool create) +{ + int ns_id = ns->sb->this_ns; + struct bch_nvmpg_recs *prev_recs = NULL, *recs = NULL; + + recs = bch_nvmpg_offset_to_ptr(head->recs_offset[ns_id]); + + /* If create=false, we return recs[nr] */ + if (!create) + return recs; + + /* + * If create=true, it mean we need a empty struct bch_nvmpg_rec + * So we should find non-empty struct bch_nvmpg_recs or alloc + * new struct bch_nvmpg_recs. 
And return this bch_nvmpg_recs + */ + while (recs && (recs->used == recs->size)) { + prev_recs = recs; + recs = bch_nvmpg_offset_to_ptr(recs->next_offset); + } + + /* Found empty struct bch_nvmpg_recs */ + if (recs) + return recs; + + /* Need alloc new struct bch_nvmpg_recs */ + recs = find_empty_nvmpg_recs(); + if (recs) { + unsigned long offset; + + recs->next_offset = 0; + recs->head_offset = bch_nvmpg_ptr_to_offset(ns, head); + memcpy(recs->magic, bch_nvmpg_recs_magic, 16); + memcpy(recs->uuid, head->uuid, 16); + recs->size = BCH_NVMPG_MAX_RECS; + recs->used = 0; + + offset = bch_nvmpg_ptr_to_offset(ns, recs); + if (prev_recs) + prev_recs->next_offset = offset; + else + head->recs_offset[ns_id] = offset; + } + + return recs; +} + +static void add_nvmpg_rec(struct bch_nvmpg_ns *ns, + struct bch_nvmpg_recs *recs, + void *kaddr, int order) +{ + int i; + + for (i = 0; i < recs->size; i++) { + if (recs->recs[i].pgoff == 0) { + recs->recs[i].pgoff = bch_nvmpg_ptr_to_pgoff(ns, kaddr); + recs->recs[i].order = order; + recs->recs[i].ns_id = ns->sb->this_ns; + recs->used++; + break; + } + } + BUG_ON(i == recs->size); +} + + +void *bch_nvmpg_alloc_pages(int order, const char *uuid) +{ + void *kaddr = NULL; + struct bch_nvmpg_head *head; + int n, o; + + mutex_lock(&global_nvmpg_set->lock); + head = find_nvmpg_head(uuid, true); + + if (!head) { + pr_err("Cannot find bch_nvmpg_recs by uuid.\n"); + goto unlock; + } + + for (n = 0; n < global_nvmpg_set->total_ns; n++) { + struct bch_nvmpg_ns *ns = global_nvmpg_set->ns_tbl[n]; + + if (!ns || (ns->free < (1L << order))) + continue; + + for (o = order; o < BCH_MAX_ORDER; o++) { + struct list_head *list; + struct page *page, *buddy_page; + + if (list_empty(&ns->free_area[o])) + continue; + + list = ns->free_area[o].next; + page = container_of((void *)list, struct page, + zone_device_data); + + list_del(list); + + while (o != order) { + void *addr; + pgoff_t pgoff; + + pgoff = page->index + (1L << (o - 1)); + addr = 
bch_nvmpg_pgoff_to_ptr(ns, pgoff); + buddy_page = bch_nvmpg_va_to_pg(addr); + set_page_private(buddy_page, o - 1); + buddy_page->index = pgoff; + __SetPageBuddy(buddy_page); + list_add((struct list_head *)&buddy_page->zone_device_data, + &ns->free_area[o - 1]); + o--; + } + + set_page_private(page, order); + __ClearPageBuddy(page); + ns->free -= 1L << order; + kaddr = bch_nvmpg_pgoff_to_ptr(ns, page->index); + break; + } + + if (o < BCH_MAX_ORDER) { + struct bch_nvmpg_recs *recs; + + recs = find_nvmpg_recs(ns, head, true); + /* ToDo: handle pgalloc_recs==NULL */ + add_nvmpg_rec(ns, recs, kaddr, order); + break; + } + } + +unlock: + mutex_unlock(&global_nvmpg_set->lock); + return kaddr; +} +EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages); + static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) { struct bch_nvmpg_sb *sb = ns->sb; diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h index 2116086c4d01..1bcd7a4e1fd1 100644 --- a/drivers/md/bcache/nvm-pages.h +++ b/drivers/md/bcache/nvm-pages.h @@ -75,6 +75,9 @@ struct bch_nvmpg_set { /* Indicate which field in bch_nvmpg_sb to be updated */ #define BCH_NVMPG_TOTAL_NS 0 /* total_ns */ +#define BCH_PGOFF_TO_KVADDR(pgoff) \ + ((void *)((unsigned long)(pgoff) << PAGE_SHIFT)) + #define BCH_MAX_PGALLOC_RECS \ (min_t(unsigned int, 64, \ (BCH_NVMPG_START - BCH_NVMPG_SYSRECS_OFFSET) / \ @@ -88,6 +91,7 @@ unsigned long bch_nvmpg_ptr_to_offset(struct bch_nvmpg_ns *ns, void *ptr); struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); int bch_nvmpg_init(void); void bch_nvmpg_exit(void); +void *bch_nvmpg_alloc_pages(int order, const char *uuid); #else @@ -103,6 +107,11 @@ static inline int bch_nvmpg_init(void) static inline void bch_nvmpg_exit(void) { } +static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid) +{ + return NULL; +} + #endif /* CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Wed Aug 11 17:02:17 2021 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA677C4320A for ; Wed, 11 Aug 2021 17:04:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7630060E78 for ; Wed, 11 Aug 2021 17:04:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230096AbhHKREq (ORCPT ); Wed, 11 Aug 2021 13:04:46 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:58290 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKREp (ORCPT ); Wed, 11 Aug 2021 13:04:45 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 3E97E1FEDC; Wed, 11 Aug 2021 17:04:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701461; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oqOSTbWS2JJ3AUVP1NVPqkmZCwkyfGMY0LYovewmgME=; b=p0+ws1Vsmfu1JenGxdNuAm4gQfogifCcUgNE7R3CEQGE5OA0rbkoU46LngyKrP2koM5Dqx uVnSWHsVadFIhAP2fSKwW1cWO8rX56x3qoKlQrFHT7cJ5So5QujX/pmzF3esXpEtXsxoov sYEhDccEm+isiesTTN3krPMpVSQEVpw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701461; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oqOSTbWS2JJ3AUVP1NVPqkmZCwkyfGMY0LYovewmgME=; b=lccT5Sc4X/VasjybDrX6Tm9UbTuMpCWrW1U5W9jsRwclAqhlokFaJjZ+HclUN7OyBwNe0p Fgn3JVClvKtBYvDw== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id D0BC0A3D6F; Wed, 11 Aug 2021 17:04:01 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Jianpeng Ma , Qiaowei Ren , Hannes Reinecke Subject: [PATCH v12 05/12] bcache: bch_nvmpg_free_pages() of the buddy allocator Date: Thu, 12 Aug 2021 01:02:17 +0800 Message-Id: <20210811170224.42837-6-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements bch_nvmpg_free_pages() of the buddy allocator. It differs from the regular page buddy free in that it needs owner_uuid to free the pages allocated to that owner, and the result must be persistent after the free.
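For illustration only (not part of this patch): a minimal userspace sketch of the buddy merge step performed on free. The buddy and parent offsets are computed with the same XOR and mask expressions as __free_space() below. SK_PAGES, SK_MAX_ORDER, free_order[] and the sk_* functions are hypothetical names; the array stands in for the struct page fields and free lists, and the owner lookup by uuid and the persistent record update are left out.

```c
#include <assert.h>

#define SK_PAGES	16	/* hypothetical tiny namespace, in pages */
#define SK_MAX_ORDER	5	/* hypothetical, stands in for BCH_MAX_ORDER */

/* free_order[p] is the order of the free block starting at page p, or -1. */
static int free_order[SK_PAGES];

static void sk_reset(void)
{
	int i;

	for (i = 0; i < SK_PAGES; i++)
		free_order[i] = -1;
}

/*
 * Free the block (pgoff, order) and merge it with its buddy for as
 * long as the buddy is free and has the same order. Returns the order
 * of the block that finally goes onto the free list.
 */
static int sk_free(unsigned long pgoff, int order)
{
	assert(pgoff < SK_PAGES);

	while (order < SK_MAX_ORDER - 1) {
		unsigned long buddy = pgoff ^ (1UL << order);
		unsigned long parent = pgoff & ~(1UL << order);

		/* The merged block must not run past the namespace. */
		if (parent + (1UL << (order + 1)) > SK_PAGES)
			break;
		if (free_order[buddy] != order)
			break;

		free_order[buddy] = -1;	/* take the buddy off its list */
		pgoff = parent;
		order++;
	}

	free_order[pgoff] = order;
	return order;
}
```

In this model, freeing page 4 alone leaves an order-0 block; freeing page 5 afterwards merges both into an order-1 block at page 4, and freeing the order-1 block at page 6 then produces an order-2 block at page 4.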
Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe --- drivers/md/bcache/nvm-pages.c | 167 +++++++++++++++++++++++++++++++++- drivers/md/bcache/nvm-pages.h | 3 + 2 files changed, 167 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c index 420b7c479057..ef61fdaaac28 100644 --- a/drivers/md/bcache/nvm-pages.c +++ b/drivers/md/bcache/nvm-pages.c @@ -240,6 +240,51 @@ static int init_nvmpg_set_header(struct bch_nvmpg_ns *ns) return rc; } +static void __free_space(struct bch_nvmpg_ns *ns, void *addr, int order) +{ + unsigned long add_pages = (1L << order); + pgoff_t pgoff; + struct page *page; + void *va; + + page = bch_nvmpg_va_to_pg(addr); + WARN_ON((!page) || (page->private != order)); + pgoff = page->index; + + while (order < BCH_MAX_ORDER - 1) { + struct page *buddy_page; + + pgoff_t buddy_pgoff = pgoff ^ (1L << order); + pgoff_t parent_pgoff = pgoff & ~(1L << order); + + if ((parent_pgoff + (1L << (order + 1)) > ns->pages_total)) + break; + + va = bch_nvmpg_pgoff_to_ptr(ns, buddy_pgoff); + buddy_page = bch_nvmpg_va_to_pg(va); + WARN_ON(!buddy_page); + + if (PageBuddy(buddy_page) && (buddy_page->private == order)) { + list_del((struct list_head *)&buddy_page->zone_device_data); + __ClearPageBuddy(buddy_page); + pgoff = parent_pgoff; + order++; + continue; + } + break; + } + + va = bch_nvmpg_pgoff_to_ptr(ns, pgoff); + page = bch_nvmpg_va_to_pg(va); + WARN_ON(!page); + list_add((struct list_head *)&page->zone_device_data, + &ns->free_area[order]); + page->index = pgoff; + set_page_private(page, order); + __SetPageBuddy(page); + ns->free += add_pages; +} + static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) { unsigned int start, end, pages; @@ -265,9 +310,9 @@ static void bch_nvmpg_init_free_space(struct bch_nvmpg_ns *ns) page = bch_nvmpg_va_to_pg(addr); set_page_private(page, i); page->index = 
pgoff_start; - __SetPageBuddy(page); - list_add((struct list_head *)&page->zone_device_data, - &ns->free_area[i]); + + /* In order to update ns->free */ + __free_space(ns, addr, i); pgoff_start += 1L << i; pages -= 1L << i; @@ -478,6 +523,121 @@ void *bch_nvmpg_alloc_pages(int order, const char *uuid) } EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages); +static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns) +{ + return ns->base_addr + (ns->pages_total << PAGE_SHIFT); +} + +static inline bool in_nvmpg_ns_range(struct bch_nvmpg_ns *ns, + void *start_addr, void *end_addr) +{ + return (start_addr >= ns->base_addr) && (end_addr < nvm_end_addr(ns)); +} + +static struct bch_nvmpg_ns *find_nvmpg_ns_by_addr(void *addr, int order) +{ + int i; + struct bch_nvmpg_ns *ns; + + for (i = 0; i < global_nvmpg_set->total_ns; i++) { + ns = global_nvmpg_set->ns_tbl[i]; + + if (ns && in_nvmpg_ns_range(ns, addr, addr + (1L << order))) + return ns; + } + + return NULL; +} + +static int remove_nvmpg_rec(struct bch_nvmpg_recs *recs, int ns_id, + void *kaddr, int order) +{ + struct bch_nvmpg_head *head; + struct bch_nvmpg_recs *prev_recs, *sys_recs; + struct bch_nvmpg_ns *ns; + unsigned long pgoff; + int i; + + ns = global_nvmpg_set->ns_tbl[0]; + pgoff = bch_nvmpg_ptr_to_pgoff(ns, kaddr); + + head = bch_nvmpg_offset_to_ptr(recs->head_offset); + prev_recs = recs; + sys_recs = bch_nvmpg_offset_to_ptr(BCH_NVMPG_SYSRECS_OFFSET); + while (recs) { + for (i = 0; i < recs->size; i++) { + struct bch_nvmpg_rec *rec = &(recs->recs[i]); + + if ((rec->pgoff == pgoff) && (rec->ns_id == ns_id)) { + WARN_ON(rec->order != order); + rec->_v = 0; + recs->used--; + + if (recs->used == 0) { + int recs_pos = recs - sys_recs; + + if (recs == prev_recs) + head->recs_offset[ns_id] = + recs->next_offset; + else + prev_recs->next_offset = + recs->next_offset; + + recs->next_offset = 0; + recs->head_offset = 0; + + bitmap_clear(ns->recs_bitmap, recs_pos, 1); + } + goto out; + } + } + prev_recs = recs; + recs = 
bch_nvmpg_offset_to_ptr(recs->next_offset); + } +out: + return (recs ? 0 : -ENOENT); +} + +void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) +{ + struct bch_nvmpg_ns *ns; + struct bch_nvmpg_head *head; + struct bch_nvmpg_recs *recs; + int r; + + mutex_lock(&global_nvmpg_set->lock); + + ns = find_nvmpg_ns_by_addr(addr, order); + if (!ns) { + pr_err("can't find namespace by given kaddr from namespace\n"); + goto unlock; + } + + head = find_nvmpg_head(uuid, false); + if (!head) { + pr_err("can't found bch_nvmpg_head by uuid\n"); + goto unlock; + } + + recs = find_nvmpg_recs(ns, head, false); + if (!recs) { + pr_err("can't find bch_nvmpg_recs by uuid\n"); + goto unlock; + } + + r = remove_nvmpg_rec(recs, ns->sb->this_ns, addr, order); + if (r < 0) { + pr_err("can't find bch_nvmpg_rec\n"); + goto unlock; + } + + __free_space(ns, addr, order); + +unlock: + mutex_unlock(&global_nvmpg_set->lock); +} +EXPORT_SYMBOL_GPL(bch_nvmpg_free_pages); + static int attach_nvmpg_set(struct bch_nvmpg_ns *ns) { struct bch_nvmpg_sb *sb = ns->sb; @@ -674,6 +834,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path) ns->pages_offset = sb->pages_offset; ns->pages_total = sb->pages_total; ns->sb = sb; + /* increase by __free_space() */ ns->free = 0; ns->bdev = bdev; ns->set = global_nvmpg_set; diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h index 1bcd7a4e1fd1..2529dc8b9d49 100644 --- a/drivers/md/bcache/nvm-pages.h +++ b/drivers/md/bcache/nvm-pages.h @@ -92,6 +92,7 @@ struct bch_nvmpg_ns *bch_register_namespace(const char *dev_path); int bch_nvmpg_init(void); void bch_nvmpg_exit(void); void *bch_nvmpg_alloc_pages(int order, const char *uuid); +void bch_nvmpg_free_pages(void *addr, int order, const char *uuid); #else @@ -112,6 +113,8 @@ static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid) return NULL; } +static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { } + #endif /* 
CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Wed Aug 11 17:02:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B6B9C432BE for ; Wed, 11 Aug 2021 17:04:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 46C9260E78 for ; Wed, 11 Aug 2021 17:04:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230196AbhHKREw (ORCPT ); Wed, 11 Aug 2021 13:04:52 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:58610 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKREv (ORCPT ); Wed, 11 Aug 2021 13:04:51 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 2E9B820193; Wed, 11 Aug 2021 17:04:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701467; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=laAwESEAf1HwUo02bMecdYHi6y4qQbivmyWQ7n28vLM=; b=kL8U39qfAGHlfIpFKSdlb2jrwBzFmmUCbuKHdvnI5e85FkNvJFP3O1BJ06q3ES/IeJQhWf yByFx91+xFcC+91UKtqBaioVexFYMwDkYlqbRGPkArYHsLluWVAtFchyc9yWVZREnObCzX 
5GmsU6pjWTsitNFHOpXgJLb32Wm3p8A= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701467; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=laAwESEAf1HwUo02bMecdYHi6y4qQbivmyWQ7n28vLM=; b=C9v12mtL3QesmsinphdNJhXzTH3yKw6zj+xrgVu7N2XGLMqnd1sJx+niQ+a/xSwqRJCffV BTRbzpg67QbE3yBQ== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 0CB9AA3D62; Wed, 11 Aug 2021 17:04:21 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Jianpeng Ma , Qiaowei Ren , Hannes Reinecke Subject: [PATCH v12 06/12] bcache: get recs list head for allocated pages by specific uuid Date: Thu, 12 Aug 2021 01:02:18 +0800 Message-Id: <20210811170224.42837-7-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jianpeng Ma This patch implements bch_get_nvmpg_head() of the buddy allocator to be used to get recs list head for allocated pages by specific uuid. Then the requester (owner) can find all previous allocated nvdimm pages by iterating the recs list. 
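As an illustration only, the iteration described above could look like the following user-space sketch. It is not code from this series: struct demo_rec, struct demo_recs and demo_count_pages() are hypothetical names, and an ordinary pointer stands in for the next_offset link that the real records resolve with bch_nvmpg_offset_to_ptr().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RECS_PER_BLOCK 4	/* hypothetical block capacity */

/* Hypothetical stand-in for one allocation record. */
struct demo_rec {
	unsigned long pgoff;	/* first page of the allocated range */
	unsigned int order;	/* the range spans (1 << order) pages */
	int in_use;		/* non-zero if this slot holds a live record */
};

/* Hypothetical stand-in for one block of records in the owner's list. */
struct demo_recs {
	struct demo_recs *next;	/* the real list links blocks by offset */
	unsigned int size;	/* number of slots to scan in recs[] */
	struct demo_rec recs[DEMO_RECS_PER_BLOCK];
};

/* Walk every block in the list and sum the pages of all live records. */
unsigned long demo_count_pages(const struct demo_recs *recs)
{
	unsigned long pages = 0;
	unsigned int i;

	for (; recs; recs = recs->next) {
		assert(recs->size <= DEMO_RECS_PER_BLOCK);
		for (i = 0; i < recs->size; i++) {
			if (recs->recs[i].in_use)
				pages += 1UL << recs->recs[i].order;
		}
	}
	return pages;
}
```

A requester would start such a walk from the head returned for its uuid. The sketch only counts pages, but the same loop shape fits any per-record work.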
Signed-off-by: Jianpeng Ma Co-developed-by: Qiaowei Ren Signed-off-by: Qiaowei Ren Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe --- drivers/md/bcache/nvm-pages.c | 6 ++++++ drivers/md/bcache/nvm-pages.h | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c index ef61fdaaac28..497360c60f26 100644 --- a/drivers/md/bcache/nvm-pages.c +++ b/drivers/md/bcache/nvm-pages.c @@ -523,6 +523,12 @@ void *bch_nvmpg_alloc_pages(int order, const char *uuid) } EXPORT_SYMBOL_GPL(bch_nvmpg_alloc_pages); +struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid) +{ + return find_nvmpg_head(uuid, false); +} +EXPORT_SYMBOL_GPL(bch_get_nvmpg_head); + static inline void *nvm_end_addr(struct bch_nvmpg_ns *ns) { return ns->base_addr + (ns->pages_total << PAGE_SHIFT); diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h index 2529dc8b9d49..2f6f2ffbfd80 100644 --- a/drivers/md/bcache/nvm-pages.h +++ b/drivers/md/bcache/nvm-pages.h @@ -93,6 +93,7 @@ int bch_nvmpg_init(void); void bch_nvmpg_exit(void); void *bch_nvmpg_alloc_pages(int order, const char *uuid); void bch_nvmpg_free_pages(void *addr, int order, const char *uuid); +struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid); #else @@ -115,6 +116,11 @@ static inline void *bch_nvmpg_alloc_pages(int order, const char *uuid) static inline void bch_nvmpg_free_pages(void *addr, int order, const char *uuid) { } +static inline struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid) +{ + return NULL; +} + #endif /* CONFIG_BCACHE_NVM_PAGES */ #endif /* _BCACHE_NVM_PAGES_H */ From patchwork Wed Aug 11 17:02:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43041C4320A for ; Wed, 11 Aug 2021 17:04:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 191AF61019 for ; Wed, 11 Aug 2021 17:04:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230253AbhHKRFH (ORCPT ); Wed, 11 Aug 2021 13:05:07 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:58626 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKRFG (ORCPT ); Wed, 11 Aug 2021 13:05:06 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 8733A20185; Wed, 11 Aug 2021 17:04:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701481; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wtwFnp4xZMjddc3KByFiXKLm0ST5Cfz+TLDQyLU1/5Q=; b=OoxG+Fdr5sgZySs1KR8cWyYjVrhjTS2nJnHprvqfRGcVMNrmhp6EavaOB2mx38OivNnB2/ CFu8JGkBtOdhkTzIqwpuQ/HWo9I2iNul7/M05qxZ5TAH6G8RQDdAaVYy8GNOeR7gNEoye+ wiL57XghNaw3e8uhLXmgcMQorbwsrEQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701481; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wtwFnp4xZMjddc3KByFiXKLm0ST5Cfz+TLDQyLU1/5Q=; 
b=fA2/MOmq+ILLxRzTje5wEhnlZtkuxCtLLAUPtHcsO42L8SYAaZxdXC7zfRF3OEicHbt920 GHqgDXV2b0YcO3DQ== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id F21DFA3D65; Wed, 11 Aug 2021 17:04:27 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Coly Li , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v12 07/12] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Date: Thu, 12 Aug 2021 01:02:19 +0800 Message-Id: <20210811170224.42837-8-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the meta data bucket locations on the cache device stay reserved even after the meta data is stored on NVDIMM pages, as a temporary measure to keep the meta data layout consistent. So these buckets are still marked as meta data by SET_GC_MARK() in bch_btree_gc_finish(). When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, sb.d[] stores linear addresses of NVDIMM pages and no longer bucket indexes. Therefore we should avoid taking bucket indexes from sb.d[], and directly use the bucket indexes from ca->sb.first_bucket to (ca->sb.first_bucket + ca->sb.njournal_buckets) for setting the gc mark of the journal buckets. 
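For illustration, marking by bucket index over the range named above might be modeled in user space like this. It is a sketch under assumptions, not the kernel code: struct demo_cache, DEMO_MARK_METADATA and demo_mark_journal_buckets() are hypothetical, the mark value is arbitrary, and the extra bound against nbuckets is a defensive addition of the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MARK_METADATA 1	/* arbitrary demo value, not the kernel's */

/* Hypothetical stand-in holding only the fields the loop needs. */
struct demo_cache {
	unsigned int first_bucket;	/* index of the first journal bucket */
	unsigned int njournal_buckets;	/* number of reserved journal buckets */
	unsigned int nbuckets;		/* total buckets, entries in gc_mark[] */
	unsigned char *gc_mark;		/* one mark byte per bucket */
};

/*
 * Mark buckets [first_bucket, first_bucket + njournal_buckets) as meta
 * data using the index alone, without consulting any per-bucket address
 * table. Returns how many buckets were marked.
 */
unsigned int demo_mark_journal_buckets(struct demo_cache *ca)
{
	unsigned int i, marked = 0;

	assert(ca && ca->gc_mark);
	for (i = ca->first_bucket;
	     i < ca->first_bucket + ca->njournal_buckets && i < ca->nbuckets;
	     i++) {
		ca->gc_mark[i] = DEMO_MARK_METADATA;
		marked++;
	}
	return marked;
}
```

The point of the sketch is that the loop depends only on the two counters, so it stays valid when the address table holds NVDIMM linear addresses instead of bucket indexes.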
Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/btree.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 183a58c89377..e0d7135669ca 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1761,8 +1761,10 @@ static void bch_btree_gc_finish(struct cache_set *c) ca = c->cache; ca->invalidate_needs_gc = 0; - for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++) - SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA); + /* Range [first_bucket, first_bucket + keys) is for journal buckets */ + for (i = ca->sb.first_bucket; + i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++) + SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA); for (k = ca->prio_buckets; k < ca->prio_buckets + prio_buckets(ca) * 2; k++) From patchwork Wed Aug 11 17:02:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431605 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03F75C4320A for ; Wed, 11 Aug 2021 17:04:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CACFC6101E for ; Wed, 11 Aug 2021 17:04:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229600AbhHKRFM (ORCPT ); Wed, 11 Aug 2021 13:05:12 -0400 Received: 
from smtp-out2.suse.de ([195.135.220.29]:58642 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKRFM (ORCPT ); Wed, 11 Aug 2021 13:05:12 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id E025520194; Wed, 11 Aug 2021 17:04:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701487; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pDO6KbUU2BBHFR3gnCnc2pGQo+NHb8DjCJX3xMzrcWc=; b=ZlVgU81q6pHRHlmcuUBC++wKpikJ9HDgBaIvtF9LxfMWXbHRVg4hsDj7ZxW5MAUsUILjDQ j1VXcd/Pww+MMbnvLklW9cfbHBiy5730ole/LQAhdYI5+y9xJdQpl3Koi32tbSbjXD4MId c8z9wGkDlblVvzhgmHE9JGadph5nkKc= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701487; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pDO6KbUU2BBHFR3gnCnc2pGQo+NHb8DjCJX3xMzrcWc=; b=ZQ5ywbr2wBpN9hT831K3oGCFwuFRCU54ydL8y1Pe0RUWtYTFwuuxN3j9iRNY6k8+y6Oxi5 iAionqaLD+QzOFBg== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 1E1F8A3D5E; Wed, 11 Aug 2021 17:04:41 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Coly Li , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v12 08/12] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Date: Thu, 12 Aug 2021 01:02:20 +0800 Message-Id: <20210811170224.42837-9-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: 
<20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into the incompat feature set. When this bit is set by bcache-tools, it indicates bcache meta data should be stored on specific NVDIMM meta device. The bcache meta data mainly includes journal and btree nodes, when this bit is set in incompat feature set, bcache will ask the nvm-pages allocator for NVDIMM space to store the meta data. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/features.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h index d1c8fd3977fc..45d2508d5532 100644 --- a/drivers/md/bcache/features.h +++ b/drivers/md/bcache/features.h @@ -17,11 +17,19 @@ #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001 /* real bucket size is (1 << bucket_size) */ #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002 +/* store bcache meta data on nvdimm */ +#define BCH_FEATURE_INCOMPAT_NVDIMM_META 0x0004 #define BCH_FEATURE_COMPAT_SUPP 0 #define BCH_FEATURE_RO_COMPAT_SUPP 0 +#if defined(CONFIG_BCACHE_NVM_PAGES) +#define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ + BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \ + BCH_FEATURE_INCOMPAT_NVDIMM_META) +#else #define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \ BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE) +#endif #define BCH_HAS_COMPAT_FEATURE(sb, mask) \ ((sb)->feature_compat & (mask)) @@ -89,6 +97,7 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \ BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET); BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE); +BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META); 
static inline bool bch_has_unknown_compat_features(struct cache_sb *sb) { From patchwork Wed Aug 11 17:02:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431607 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83808C4320A for ; Wed, 11 Aug 2021 17:04:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 63E2661019 for ; Wed, 11 Aug 2021 17:04:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230369AbhHKRFT (ORCPT ); Wed, 11 Aug 2021 13:05:19 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:48392 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230394AbhHKRFT (ORCPT ); Wed, 11 Aug 2021 13:05:19 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id B45FE2220F; Wed, 11 Aug 2021 17:04:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701494; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ecIa84cgTVNhRGYl/WBq7EpOun0kY/mYu4uzS76R8dA=; b=1E93cL6Djb7Mj1kctKms3Ce3R7RtTZy9SAwZ13NTD7ofkNNBmzaQZ7t0WubRJV5MeJ4x3Y UHzLaM5H6MxO2bkjSB7aKhpF0E4zfGVyj9P3Uu6nw1347w2aNVNU3zk+A3pPF3mk9NClAM 
A3jTbTSoNmhH4keaMtI1RQYlRWy9d+w= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701494; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ecIa84cgTVNhRGYl/WBq7EpOun0kY/mYu4uzS76R8dA=; b=GeRH1c7VyumFjaF2bxVkQP+D7dbHQICt82P1WZwIdaDuk2ox9DOGideAkyg2i1ZOJlV3Vf VXr0V00IERPBXnAg== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id A5519A3D5E; Wed, 11 Aug 2021 17:04:48 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Coly Li , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v12 09/12] bcache: initialize bcache journal for NVDIMM meta device Date: Thu, 12 Aug 2021 01:02:21 +0800 Message-Id: <20210811170224.42837-10-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The nvm-pages allocator may store and index the NVDIMM pages allocated for the bcache journal. This patch adds the initialization to store the bcache journal space on NVDIMM pages if the BCH_FEATURE_INCOMPAT_NVDIMM_META bit is set by bcache-tools. If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_journal_nvmpg_space() will return the linear address of the NVDIMM pages for the bcache journal, - If there is previously allocated space, find it from the nvm-pages owner list and return it to bch_journal_init(). - If there is no previously allocated space, request a new NVDIMM range from the nvm-pages allocator, and return it to bch_journal_init(). 
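The two cases above amount to a find-or-allocate pattern, which the following user-space sketch tries to show. Everything in it is hypothetical (struct demo_owner, demo_get_journal_space()): a single pointer stands in for the owner's record list, and calloc() stands in for a fresh allocation from the nvm-pages allocator followed by zeroing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for what an owner (uuid) already holds. */
struct demo_owner {
	void *journal;		/* previously allocated space, or NULL */
	size_t journal_len;	/* size of that space in bytes */
};

/*
 * Return the owner's journal space, reusing an earlier allocation when
 * one exists and allocating zeroed space otherwise. *was_new reports
 * which path was taken. Returns NULL if a new allocation fails.
 */
void *demo_get_journal_space(struct demo_owner *owner, size_t len,
			     int *was_new)
{
	assert(owner && was_new);
	*was_new = 0;

	/* Case 1: space was allocated before, hand it back unchanged. */
	if (owner->journal)
		return owner->journal;

	/* Case 2: nothing recorded yet, allocate zero-filled space. */
	owner->journal = calloc(1, len);
	if (!owner->journal)
		return NULL;

	owner->journal_len = len;
	*was_new = 1;
	return owner->journal;
}
```

Only the newly allocated path zeroes the space. A reused range is returned untouched so that its earlier contents survive for the caller.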
And in bch_journal_init(), keys in sb.d[] store the corresponding linear address from NVDIMM into sb.d[i].ptr[0] where 'i' is the bucket index to iterate all journal buckets. Later when bcache journaling code stores the journaling jset, the target NVDIMM linear address stored (and updated) in sb.d[i].ptr[0] can be used directly in memory copy from DRAM pages into NVDIMM pages. Signed-off-by: Coly Li Cc: Christoph Hellwig Cc: Dan Williams Cc: Hannes Reinecke Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 117 ++++++++++++++++++++++++++++++++++ drivers/md/bcache/journal.h | 2 +- drivers/md/bcache/nvm-pages.c | 9 +++ drivers/md/bcache/nvm-pages.h | 1 + drivers/md/bcache/super.c | 18 +++--- 5 files changed, 136 insertions(+), 11 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 61bd79babf7a..9fe6c1abfd84 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -9,6 +9,8 @@ #include "btree.h" #include "debug.h" #include "extents.h" +#include "nvm-pages.h" +#include "features.h" #include @@ -982,3 +984,118 @@ int bch_journal_alloc(struct cache_set *c) return 0; } + +#if defined(CONFIG_BCACHE_NVM_PAGES) + +static void *find_journal_nvmpg_base(struct bch_nvmpg_head *nvmpg_head, + struct cache *ca) +{ + void *addr = NULL; + unsigned long jnl_offset, jnl_pgoff, jnl_ns_id; + int i; + + jnl_offset = (unsigned long)ca->sb.d[0]; + jnl_ns_id = BCH_NVMPG_GET_NS_ID(jnl_offset); + jnl_pgoff = BCH_NVMPG_GET_OFFSET(jnl_offset) >> PAGE_SHIFT; + + for (i = 0; i < BCH_NVMPG_NS_MAX; i++) { + struct bch_nvmpg_recs *recs; + struct bch_nvmpg_rec *rec; + unsigned long recs_offset = 0; + int j; + + recs_offset = nvmpg_head->recs_offset[i]; + recs = bch_nvmpg_offset_to_ptr(recs_offset); + while (recs) { + for (j = 0; j < recs->size; j++) { + rec = &recs->recs[j]; + if ((rec->pgoff != jnl_pgoff) || + (rec->ns_id != jnl_ns_id)) + continue; + + addr = bch_nvmpg_offset_to_ptr(jnl_offset); + goto out; + } + 
recs_offset = recs->next_offset; + recs = bch_nvmpg_offset_to_ptr(recs_offset); + } + } + +out: + return addr; +} + +static void *get_journal_nvmpg_space(struct cache *ca) +{ + struct bch_nvmpg_head *head = NULL; + void *ret = NULL; + int order; + + head = bch_get_nvmpg_head(ca->sb.set_uuid); + if (head) { + ret = find_journal_nvmpg_base(head, ca); + if (ret) + goto found; + } + + order = ilog2((ca->sb.bucket_size * + ca->sb.njournal_buckets) / PAGE_SECTORS); + ret = bch_nvmpg_alloc_pages(order, ca->sb.set_uuid); + if (ret) + memset(ret, 0, (1 << order) * PAGE_SIZE); +found: + return ret; +} + +#endif /* CONFIG_BCACHE_NVM_PAGES */ + +static int __bch_journal_nvdimm_init(struct cache *ca) +{ + int ret = -1; + +#if defined(CONFIG_BCACHE_NVM_PAGES) + int i; + void *jnl_base = NULL; + + jnl_base = get_journal_nvmpg_space(ca); + if (!jnl_base) { + pr_err("Failed to get journal space from nvdimm\n"); + goto out; + } + + /* Iniialized and reloaded from on-disk super block already */ + if (ca->sb.d[0] != 0) + goto out; + + for (i = 0; i < ca->sb.keys; i++) { + unsigned long jnl_offset; + + jnl_offset = bch_nvmpg_ptr_to_offset(bch_nvmpg_id_to_ns(0), + jnl_base + (bucket_bytes(ca) * i)); + ca->sb.d[i] = jnl_offset; + } + + ret = 0; +out: +#endif /* CONFIG_BCACHE_NVM_PAGES */ + + return ret; +} + + +int bch_journal_init(struct cache_set *c) +{ + int i, ret = 0; + struct cache *ca = c->cache; + + ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, + 2, SB_JOURNAL_BUCKETS); + + if (!bch_has_feature_nvdimm_meta(&ca->sb)) { + for (i = 0; i < ca->sb.keys; i++) + ca->sb.d[i] = ca->sb.first_bucket + i; + } else + ret = __bch_journal_nvdimm_init(ca); + + return ret; +} diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index f2ea34d5f431..e3a7fa5a8fda 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list); void bch_journal_meta(struct cache_set *c, struct 
closure *cl); int bch_journal_read(struct cache_set *c, struct list_head *list); int bch_journal_replay(struct cache_set *c, struct list_head *list); - +int bch_journal_init(struct cache_set *c); void bch_journal_free(struct cache_set *c); int bch_journal_alloc(struct cache_set *c); diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c index 497360c60f26..55f3f9b7fb0c 100644 --- a/drivers/md/bcache/nvm-pages.c +++ b/drivers/md/bcache/nvm-pages.c @@ -24,6 +24,15 @@ struct bch_nvmpg_set *global_nvmpg_set; +struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id) +{ + if ((ns_id >= 0) && (ns_id < BCH_NVMPG_NS_MAX)) + return global_nvmpg_set->ns_tbl[ns_id]; + + pr_emerg("Invalid ns_id: %d\n", ns_id); + return NULL; +} + void *bch_nvmpg_offset_to_ptr(unsigned long offset) { int ns_id = BCH_NVMPG_GET_NS_ID(offset); diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h index 2f6f2ffbfd80..13cc6a532bda 100644 --- a/drivers/md/bcache/nvm-pages.h +++ b/drivers/md/bcache/nvm-pages.h @@ -94,6 +94,7 @@ void bch_nvmpg_exit(void); void *bch_nvmpg_alloc_pages(int order, const char *uuid); void bch_nvmpg_free_pages(void *addr, int order, const char *uuid); struct bch_nvmpg_head *bch_get_nvmpg_head(const char *uuid); +struct bch_nvmpg_ns *bch_nvmpg_id_to_ns(int ns_id); #else diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 4326ffa0d21f..e66e1d6ef260 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -147,9 +147,11 @@ static const char *read_super_common(struct cache_sb *sb, struct block_device * goto err; err = "Journal buckets not sequential"; - for (i = 0; i < sb->keys; i++) - if (sb->d[i] != sb->first_bucket + i) - goto err; + if (!bch_has_feature_nvdimm_meta(sb)) { + for (i = 0; i < sb->keys; i++) + if (sb->d[i] != sb->first_bucket + i) + goto err; + } err = "Too many journal buckets"; if (sb->first_bucket + sb->keys > sb->nbuckets) @@ -2065,14 +2067,10 @@ static int run_cache_set(struct cache_set 
*c) if (bch_journal_replay(c, &journal)) goto err; } else { - unsigned int j; - pr_notice("invalidating existing data\n"); - ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7, - 2, SB_JOURNAL_BUCKETS); - - for (j = 0; j < ca->sb.keys; j++) - ca->sb.d[j] = ca->sb.first_bucket + j; + err = "error initializing journal"; + if (bch_journal_init(c)) + goto err; bch_initial_gc_finish(c); From patchwork Wed Aug 11 17:02:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 12431609 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6973C4320A for ; Wed, 11 Aug 2021 17:05:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 857CE6101E for ; Wed, 11 Aug 2021 17:05:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230273AbhHKRFg (ORCPT ); Wed, 11 Aug 2021 13:05:36 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:48414 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbhHKRFe (ORCPT ); Wed, 11 Aug 2021 13:05:34 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id D971C22233; Wed, 11 Aug 2021 17:05:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1628701509; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1EDZ1BeYl/GOojhFFjp2dFeR3vjBss5L/uK7qG8U+L8=; b=Npd/2RjVwQ9puTJSuaXQ2goz31Gd9p4NB3NkGJF88LLWiGfyFwKLDGehnGVslyl/GovdOL Zwwk0l7Btrbgz6j7kt0Aj4aqVmLxP7dVtDmKEsL20fXFIyg7+VGib2F1xusb8uLQ56AzgQ wO3hcu6pZJtYO5Qbj+z7SgtJHy/0lQ8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1628701509; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1EDZ1BeYl/GOojhFFjp2dFeR3vjBss5L/uK7qG8U+L8=; b=ArrpR2eYPpi2tfsvYNWcj1gTPGH8VwVwpXIl9TXz/xF5LXbeKx5Da9eY2wQdPJndvxW0PD oyKfoT2oZJUvCACw== Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22]) by relay2.suse.de (Postfix) with ESMTP id 53690A3D5E; Wed, 11 Aug 2021 17:04:55 +0000 (UTC) From: Coly Li To: linux-bcache@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev, axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com, hch@lst.de, ying.huang@intel.com, Coly Li , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren Subject: [PATCH v12 10/12] bcache: support storing bcache journal into NVDIMM meta device Date: Thu, 12 Aug 2021 01:02:22 +0800 Message-Id: <20210811170224.42837-11-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210811170224.42837-1-colyli@suse.de> References: <20210811170224.42837-1-colyli@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org This patch implements two methods to store the bcache journal, 1) __journal_write_unlocked() for block interface device The legacy method to compose a bio and issue the jset bio to the cache device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on the cache device to store the journal jset. 
2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM Use memory interface to access NVDIMM pages and store the jset by memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear address from the NVDIMM pages to store the journal jset. For legacy configuration without NVDIMM meta device, journal I/O is handled by __journal_write_unlocked() with existing code logic. If the NVDIMM meta device is used (by bcache-tools), the journal I/O will be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM pages. And when NVDIMM meta device is used, sb.d[] stores the linear addresses from NVDIMM pages (no more bucket index), in journal_reclaim() the journaling location in c->journal.key.ptr[0] should also be updated by linear address from NVDIMM pages (no more LBA combined by sectors offset and bucket index). Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Cc: Dan Williams Cc: Jens Axboe Cc: Jianpeng Ma Cc: Qiaowei Ren --- drivers/md/bcache/journal.c | 120 +++++++++++++++++++++++++----------- drivers/md/bcache/super.c | 3 +- 2 files changed, 85 insertions(+), 38 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 9fe6c1abfd84..8cd0c4dc9137 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca) return; } + BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb)); + switch (atomic_read(&ja->discard_in_flight)) { case DISCARD_IN_FLIGHT: return; @@ -661,9 +663,16 @@ static void journal_reclaim(struct cache_set *c) goto out; ja->cur_idx = next; - k->ptr[0] = MAKE_PTR(0, - bucket_to_sector(c, ca->sb.d[ja->cur_idx]), - ca->sb.nr_this_dev); + if (!bch_has_feature_nvdimm_meta(&ca->sb)) + k->ptr[0] = MAKE_PTR(0, + bucket_to_sector(c, ca->sb.d[ja->cur_idx]), + ca->sb.nr_this_dev); +#if defined(CONFIG_BCACHE_NVM_PAGES) + else + k->ptr[0] = (unsigned long)bch_nvmpg_offset_to_ptr( + ca->sb.d[ja->cur_idx]); +#endif + 
 	atomic_long_inc(&c->reclaimed_journal_buckets);
 
 	bkey_init(k);
@@ -729,46 +738,21 @@ static void journal_write_unlock(struct closure *cl)
 	spin_unlock(&c->journal.lock);
 }
 
-static void journal_write_unlocked(struct closure *cl)
+
+static void __journal_write_unlocked(struct cache_set *c)
 	__releases(c->journal.lock)
 {
-	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
-	struct cache *ca = c->cache;
-	struct journal_write *w = c->journal.cur;
 	struct bkey *k = &c->journal.key;
-	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
-		ca->sb.block_size;
-
+	struct journal_write *w = c->journal.cur;
+	struct closure *cl = &c->journal.io;
+	struct cache *ca = c->cache;
 	struct bio *bio;
 	struct bio_list list;
+	unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
+		ca->sb.block_size;
 
 	bio_list_init(&list);
 
-	if (!w->need_write) {
-		closure_return_with_destructor(cl, journal_write_unlock);
-		return;
-	} else if (journal_full(&c->journal)) {
-		journal_reclaim(c);
-		spin_unlock(&c->journal.lock);
-
-		btree_flush_write(c);
-		continue_at(cl, journal_write, bch_journal_wq);
-		return;
-	}
-
-	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
-
-	w->data->btree_level = c->root->level;
-
-	bkey_copy(&w->data->btree_root, &c->root->key);
-	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
-
-	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
-	w->data->magic		= jset_magic(&ca->sb);
-	w->data->version	= BCACHE_JSET_VERSION;
-	w->data->last_seq	= last_seq(&c->journal);
-	w->data->csum		= csum_set(w->data);
-
 	for (i = 0; i < KEY_PTRS(k); i++) {
 		ca = c->cache;
 		bio = &ca->journal.bio;
@@ -793,7 +777,6 @@ static void journal_write_unlocked(struct closure *cl)
 
 		ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
 	}
-
 	/* If KEY_PTRS(k) == 0, this jset gets lost in air */
 	BUG_ON(i == 0);
 
@@ -805,6 +788,71 @@ static void journal_write_unlocked(struct closure *cl)
 
 	while ((bio = bio_list_pop(&list)))
 		closure_bio_submit(c, bio, cl);
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static void __journal_nvdimm_write_unlocked(struct cache_set *c)
+	__releases(c->journal.lock)
+{
+	struct journal_write *w = c->journal.cur;
+	struct cache *ca = c->cache;
+	unsigned int sectors;
+
+	sectors = set_blocks(w->data, block_bytes(ca)) * ca->sb.block_size;
+	atomic_long_add(sectors, &ca->meta_sectors_written);
+
+	memcpy_flushcache((void *)c->journal.key.ptr[0], w->data, sectors << 9);
+
+	c->journal.key.ptr[0] += sectors << 9;
+	ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
+
+	atomic_dec_bug(&fifo_back(&c->journal.pin));
+	bch_journal_next(&c->journal);
+	journal_reclaim(c);
+
+	spin_unlock(&c->journal.lock);
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+static void journal_write_unlocked(struct closure *cl)
+{
+	struct cache_set *c = container_of(cl, struct cache_set, journal.io);
+	struct cache *ca = c->cache;
+	struct journal_write *w = c->journal.cur;
+
+	if (!w->need_write) {
+		closure_return_with_destructor(cl, journal_write_unlock);
+		return;
+	} else if (journal_full(&c->journal)) {
+		journal_reclaim(c);
+		spin_unlock(&c->journal.lock);
+
+		btree_flush_write(c);
+		continue_at(cl, journal_write, bch_journal_wq);
+		return;
+	}
+
+	c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
+
+	w->data->btree_level = c->root->level;
+
+	bkey_copy(&w->data->btree_root, &c->root->key);
+	bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
+
+	w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
+	w->data->magic		= jset_magic(&ca->sb);
+	w->data->version	= BCACHE_JSET_VERSION;
+	w->data->last_seq	= last_seq(&c->journal);
+	w->data->csum		= csum_set(w->data);
+
+	if (!bch_has_feature_nvdimm_meta(&ca->sb))
+		__journal_write_unlocked(c);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+	else
+		__journal_nvdimm_write_unlocked(c);
+#endif
 
 	continue_at(cl, journal_write_done, NULL);
 }
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e66e1d6ef260..24734250d005 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1676,7 +1676,7 @@ void bch_cache_set_release(struct kobject *kobj)
 static void cache_set_free(struct closure *cl)
 {
 	struct cache_set *c = container_of(cl, struct cache_set, cl);
-	struct cache *ca;
+	struct cache *ca = c->cache;
 
 	debugfs_remove(c->debug);
 
@@ -1688,7 +1688,6 @@ static void cache_set_free(struct closure *cl)
 	bch_bset_sort_state_free(&c->sort);
 	free_pages((unsigned long) c->uuids, ilog2(meta_bucket_pages(&c->cache->sb)));
 
-	ca = c->cache;
 	if (ca) {
 		ca->set = NULL;
 		c->cache = NULL;

From patchwork Wed Aug 11 17:02:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 12431611
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 70D42C432BE
	for ; Wed, 11 Aug 2021 17:05:18 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 3AB936101E
	for ; Wed, 11 Aug 2021 17:05:18 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230400AbhHKRFl (ORCPT ); Wed, 11 Aug 2021 13:05:41 -0400
Received: from smtp-out2.suse.de ([195.135.220.29]:58662 "EHLO
	smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
	ESMTP id S230391AbhHKRFk (ORCPT ); Wed, 11 Aug 2021 13:05:40 -0400
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out2.suse.de (Postfix) with ESMTP id F2BB120192;
	Wed, 11 Aug 2021 17:05:15 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1628701515;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=1Mi4GxxoJ8bz3IvlDPgn+MPYGjz3ma/b9rGB0Qg8+hQ=;
	b=R1qaMiaLvckFg2eiSKehzmIKHwCDVy5lPLHSWsQUIQqjceMhKCIyg2iEExH2ZO9r8QOUPo
	 u36wJBINAvyyqAf52iI2T1VBzDxhoH4GGHKvR7adu+cWmJBkiArlqKLp6gzhZDIAkQKslh
	 gRlPYzYkeZDDwyYjh4cI2QiO02ADT0Q=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1628701515;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=1Mi4GxxoJ8bz3IvlDPgn+MPYGjz3ma/b9rGB0Qg8+hQ=;
	b=+vsQc7/Bbc4SbdRNP1WoLxhzQvsofdp5YuSQN5l75ikBizzX0mE9wviRL4IFLVP40Sv6fg
	 NGswjFNS+r78niAA==
Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22])
	by relay2.suse.de (Postfix) with ESMTP id A936CA3D5E;
	Wed, 11 Aug 2021 17:05:10 +0000 (UTC)
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev,
	axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com,
	hch@lst.de, ying.huang@intel.com, Coly Li , kernel test robot ,
	Dan Carpenter , Hannes Reinecke , Jianpeng Ma , Qiaowei Ren
Subject: [PATCH v12 11/12] bcache: read jset from NVDIMM pages for journal
 replay
Date: Thu, 12 Aug 2021 01:02:23 +0800
Message-Id: <20210811170224.42837-12-colyli@suse.de>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210811170224.42837-1-colyli@suse.de>
References: <20210811170224.42837-1-colyli@suse.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

This patch implements two methods to read a jset from media for journal
replay:
- __jnl_rd_bkt() for block devices
  This is the legacy method, which reads the jset through the block
  device interface.
- __jnl_rd_nvm_bkt() for NVDIMM
  This method reads the jset through the NVDIMM memory interface, i.e.
  by memcpy() from NVDIMM pages to DRAM pages.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in the incompat feature set,
journal_read_bucket() reads the journal content from NVDIMM with
__jnl_rd_nvm_bkt() while the cache set is running. The linear addresses
of the NVDIMM pages that hold the jsets are stored in
sb.d[SB_JOURNAL_BUCKETS], which was initialized and maintained in
previous runs of the cache set.

Note that when bch_journal_read() is called, the linear addresses of the
NVDIMM pages are not loaded and initialized yet, so
__bch_journal_nvdimm_init() must be called before reading the jset from
NVDIMM pages.

The code comment added in journal_read_bucket() responds to reports from
kernel test robot and Dan Carpenter. It explains why it is safe to check
only the !bch_has_feature_nvdimm_meta() condition in the if() statement
when CONFIG_BCACHE_NVM_PAGES is not configured, which avoids confusion
from the bogus warning emitted by static checking tools.
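
For illustration only (this is not part of the patch), the dispatch
between the two read paths described above can be modelled in user
space. The sketch assumes hypothetical names (struct demo_cache,
demo_rd_bkt(), demo_rd_nvm_bkt(), demo_read_bucket()) and replaces both
the bio submission and the NVDIMM mapping with plain byte arrays; only
the shape of the logic follows journal_read_bucket().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors, like "<< 9" in the patch */
#define DEMO_BUF_SECTORS  4	/* hypothetical size of the DRAM jset buffer */

/* Hypothetical stand-in for struct cache; every field is an assumption. */
struct demo_cache {
	int has_nvdimm_meta;	/* models bch_has_feature_nvdimm_meta() */
	const uint8_t *blkdev;	/* models a journal bucket on a block device */
	const uint8_t *nvdimm;	/* models a journal bucket in NVDIMM pages */
	uint8_t buf[DEMO_BUF_SECTORS << DEMO_SECTOR_SHIFT]; /* journal.w[0].data */
};

/* Legacy path: the kernel submits a bio and waits; here it is a plain copy. */
static uint8_t *demo_rd_bkt(struct demo_cache *ca, unsigned int len,
			    unsigned int offset)
{
	assert(len <= DEMO_BUF_SECTORS);
	memcpy(ca->buf, ca->blkdev + ((size_t)offset << DEMO_SECTOR_SHIFT),
	       (size_t)len << DEMO_SECTOR_SHIFT);
	return ca->buf;
}

/* NVDIMM path: copy straight from the (modelled) mapped pages to DRAM. */
static uint8_t *demo_rd_nvm_bkt(struct demo_cache *ca, unsigned int len,
				unsigned int offset)
{
	assert(len <= DEMO_BUF_SECTORS);
	memcpy(ca->buf, ca->nvdimm + ((size_t)offset << DEMO_SECTOR_SHIFT),
	       (size_t)len << DEMO_SECTOR_SHIFT);
	return ca->buf;
}

/* Pick the read path from the feature bit; len and offset are in sectors. */
static uint8_t *demo_read_bucket(struct demo_cache *ca, unsigned int len,
				 unsigned int offset)
{
	if (!ca->has_nvdimm_meta)
		return demo_rd_bkt(ca, len, offset);
	return demo_rd_nvm_bkt(ca, len, offset);
}
```

Both helpers return the same DRAM buffer, so the caller can parse the
jset the same way whichever medium it came from.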

Signed-off-by: Coly Li
Reported-by: kernel test robot
Reported-by: Dan Carpenter
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Hannes Reinecke
Cc: Jens Axboe
Cc: Jianpeng Ma
Cc: Qiaowei Ren
---
 drivers/md/bcache/journal.c | 88 ++++++++++++++++++++++++++++++-------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 8cd0c4dc9137..987306b4db20 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -34,18 +34,60 @@ static void journal_read_endio(struct bio *bio)
 	closure_put(cl);
 }
 
+static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx,
+				    unsigned int len, unsigned int offset,
+				    struct closure *cl)
+{
+	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]);
+	struct bio *bio = &ca->journal.bio;
+	struct jset *data = ca->set->journal.w[0].data;
+
+	bio_reset(bio);
+	bio->bi_iter.bi_sector	= bucket + offset;
+	bio_set_dev(bio, ca->bdev);
+	bio->bi_iter.bi_size	= len << 9;
+
+	bio->bi_end_io	= journal_read_endio;
+	bio->bi_private = cl;
+	bio_set_op_attrs(bio, REQ_OP_READ, 0);
+	bch_bio_map(bio, data);
+
+	closure_bio_submit(ca->set, bio, cl);
+	closure_sync(cl);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+
+static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx,
+				     unsigned int len, unsigned int offset)
+{
+	void *jset_addr;
+	struct jset *data;
+
+	jset_addr = bch_nvmpg_offset_to_ptr(ca->sb.d[bkt_idx]) + (offset << 9);
+	data = ca->set->journal.w[0].data;
+
+	memcpy(data, jset_addr, len << 9);
+
+	/* Indeed journal.w[0].data */
+	return data;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
 static int journal_read_bucket(struct cache *ca, struct list_head *list,
 			       unsigned int bucket_index)
 {
 	struct journal_device *ja = &ca->journal;
-	struct bio *bio = &ja->bio;
 
 	struct journal_replay *i;
-	struct jset *j, *data = ca->set->journal.w[0].data;
+	struct jset *j;
 	struct closure cl;
 	unsigned int len, left, offset = 0;
 	int ret = 0;
-	sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]);
 
 	closure_init_stack(&cl);
 
@@ -55,26 +97,27 @@ static int journal_read_bucket(struct cache *ca, struct list_head *list,
 reread:		left = ca->sb.bucket_size - offset;
 		len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS);
 
-		bio_reset(bio);
-		bio->bi_iter.bi_sector	= bucket + offset;
-		bio_set_dev(bio, ca->bdev);
-		bio->bi_iter.bi_size	= len << 9;
-
-		bio->bi_end_io	= journal_read_endio;
-		bio->bi_private = &cl;
-		bio_set_op_attrs(bio, REQ_OP_READ, 0);
-		bch_bio_map(bio, data);
-
-		closure_bio_submit(ca->set, bio, &cl);
-		closure_sync(&cl);
+		if (!bch_has_feature_nvdimm_meta(&ca->sb))
+			j = __jnl_rd_bkt(ca, bucket_index, len, offset, &cl);
+		/*
+		 * If CONFIG_BCACHE_NVM_PAGES is not defined, the feature bit
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META won't in incompatible
+		 * support feature set, a cache device format with feature bit
+		 * BCH_FEATURE_INCOMPAT_NVDIMM_META will fail much earlier in
+		 * read_super() by bch_has_unknown_incompat_features().
+		 * Therefore when CONFIG_BCACHE_NVM_PAGES is not define, it is
+		 * safe to ignore the bch_has_feature_nvdimm_meta() condition.
+		 */
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		else
+			j = __jnl_rd_nvm_bkt(ca, bucket_index, len, offset);
+#endif
 
 		/* This function could be simpler now since we no longer write
 		 * journal entries that overlap bucket boundaries; this means
 		 * the start of a bucket will always have a valid journal entry
 		 * if it has any journal entries at all.
 		 */
-
-		j = data;
 		while (len) {
 			struct list_head *where;
 			size_t blocks, bytes = set_bytes(j);
@@ -170,6 +213,8 @@ reread:		left = ca->sb.bucket_size - offset;
 	return ret;
 }
 
+static int __bch_journal_nvdimm_init(struct cache *ca);
+
 int bch_journal_read(struct cache_set *c, struct list_head *list)
 {
 #define read_bucket(b)							\
@@ -188,6 +233,15 @@ int bch_journal_read(struct cache_set *c, struct list_head *list)
 	unsigned int i, l, r, m;
 	uint64_t seq;
 
+	/*
+	 * Linear addresses of NVDIMM pages for journaling is not
+	 * initialized yet, do it before read jset from NVDIMM pages.
+	 */
+	if (bch_has_feature_nvdimm_meta(&ca->sb)) {
+		if (__bch_journal_nvdimm_init(ca) < 0)
+			return -ENXIO;
+	}
+
 	bitmap_zero(bitmap, SB_JOURNAL_BUCKETS);
 	pr_debug("%u journal buckets\n", ca->sb.njournal_buckets);
 

From patchwork Wed Aug 11 17:02:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Coly Li
X-Patchwork-Id: 12431613
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AB3E6C432BE
	for ; Wed, 11 Aug 2021 17:05:24 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 852E86101E
	for ; Wed, 11 Aug 2021 17:05:24 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230401AbhHKRFr (ORCPT ); Wed, 11 Aug 2021 13:05:47 -0400
Received: from smtp-out2.suse.de ([195.135.220.29]:58684 "EHLO
	smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
	ESMTP id S229530AbhHKRFr (ORCPT ); Wed, 11 Aug 2021 13:05:47 -0400
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out2.suse.de (Postfix) with ESMTP id A3A4E20195;
	Wed, 11 Aug 2021 17:05:21 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1628701521;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=efsOw98AEBFshxhhH9/n3GRQdv2ksVkqaTCb57JSsqU=;
	b=FRR4OFK3s6fk7teQ4WATZ3QEJHo5och1ZjfgFXygxnp8Rc/ube6zMpRQGu2Go5USA+MmjH
	 Utk0j9944xnayK9aiIeVUCxOLL1RHlsbT8KKcuJ1jdIazbWH4LiMgfw+Q8EV4uq45Z2KdZ
	 S3K9brjiqqkxdvENy5sG7fkVDCve07Q=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1628701521;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=efsOw98AEBFshxhhH9/n3GRQdv2ksVkqaTCb57JSsqU=;
	b=tZa4CLdOvovYtOXexaq9cWlVntxJgOGmy71E79JNPrcVmpTagz5+uNMhYuXlOlJ+eej4Ta
	 LVmrvUAu1ZQY9xBQ==
Received: from localhost.localdomain (colyli.tcp.ovpn1.nue.suse.de [10.163.16.22])
	by relay2.suse.de (Postfix) with ESMTP id 9AC0AA3D61;
	Wed, 11 Aug 2021 17:05:16 +0000 (UTC)
From: Coly Li
To: linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-nvdimm@lists.linux.dev,
	axboe@kernel.dk, hare@suse.com, jack@suse.cz, dan.j.williams@intel.com,
	hch@lst.de, ying.huang@intel.com, Coly Li , Hannes Reinecke ,
	Jianpeng Ma , Qiaowei Ren
Subject: [PATCH v12 12/12] bcache: add sysfs interface register_nvdimm_meta
 to register NVDIMM meta device
Date: Thu, 12 Aug 2021 01:02:24 +0800
Message-Id: <20210811170224.42837-13-colyli@suse.de>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210811170224.42837-1-colyli@suse.de>
References: <20210811170224.42837-1-colyli@suse.de>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

This patch adds a sysfs interface, register_nvdimm_meta, to register an
NVDIMM meta device. The sysfs file only shows up when
CONFIG_BCACHE_NVM_PAGES=y. An NVDIMM namespace formatted by bcache-tools
can then be registered into bcache with, e.g.,
  echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta

Signed-off-by: Coly Li
Reviewed-by: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Jens Axboe
Cc: Jianpeng Ma
Cc: Qiaowei Ren
---
 drivers/md/bcache/super.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 24734250d005..434739594baf 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2403,10 +2403,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
 					 struct kobj_attribute *attr,
 					 const char *buffer, size_t size);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k,
+				    struct kobj_attribute *attr,
+				    const char *buffer, size_t size);
+#endif
 
 kobj_attribute_write(register,		register_bcache);
 kobj_attribute_write(register_quiet,	register_bcache);
 kobj_attribute_write(pendings_cleanup,	bch_pending_bdevs_cleanup);
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta);
+#endif
 
 static bool bch_is_open_backing(dev_t dev)
 {
@@ -2520,6 +2528,24 @@ static void register_device_async(struct async_reg_args *args)
 	queue_delayed_work(system_wq, &args->reg_work, 10);
 }
 
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr,
+				    const char *buffer, size_t size)
+{
+	ssize_t ret = size;
+
+	struct bch_nvmpg_ns *ns = bch_register_namespace(buffer);
+
+	if (IS_ERR(ns)) {
+		pr_err("register nvdimm namespace %s for meta device failed.\n",
+			buffer);
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+#endif
+
 static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 			       const char *buffer, size_t size)
 {
@@ -2855,6 +2881,9 @@ static int __init bcache_init(void)
 	static const struct attribute *files[] = {
 		&ksysfs_register.attr,
 		&ksysfs_register_quiet.attr,
+#if defined(CONFIG_BCACHE_NVM_PAGES)
+		&ksysfs_register_nvdimm_meta.attr,
+#endif
 		&ksysfs_pendings_cleanup.attr,
 		NULL
 	};