From: Coly Li
To: linux-bcache@vger.kernel.org
Subject: [PATCH v7 00/16] bcache: support NVDIMM for journaling
Date: Sat, 10 Apr 2021 00:43:27 +0800
Message-Id: <20210409164343.56828-1-colyli@suse.de>
CC: linux-block@vger.kernel.org, linux-nvdimm@lists.01.org,
    axboe@kernel.dk, jianpeng.ma@intel.com, qiaowei.ren@intel.com,
    hare@suse.com, jack@suse.cz, Coly Li

This is the 7th effort for bcache to support NVDIMM for journaling
since the first nvm-pages series was posted. This series is a
combination of the v7 nvm-pages allocator developed by Intel developers
and the related bcache changes from me.

The nvm-pages allocator is a buddy-like allocator, which allocates
ranges of power-of-2 pages from the NVDIMM namespace. The user space
tool 'bcache' has a newly added '-M' option to format an NVDIMM
namespace and register it via the sysfs interface as a bcache meta
device. The nvm-pages kernel code does a DAX mapping to map the whole
namespace into the system's memory address range, and allocates pages
to requesters like a typical buddy allocator does. The major difference
is that the nvm-pages allocator tracks the pages allocated to each
requester in an owner list, which is stored on NVDIMM too. The owner
list of each requester is identified by a pre-defined UUID. All the
pages tracked in all owner lists are treated as allocated busy pages
and won't be initialized into the buddy system after the system
reboots.
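
As an illustration only (this is not code from the series), the
power-of-2 sizing that a buddy-like allocator relies on can be sketched
in user space C as below. The names sketch_order_for_pages(),
sketch_buddy_of() and SKETCH_MAX_ORDER are hypothetical and chosen for
this sketch; the real nvm-pages interfaces and limits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound for this sketch, not taken from the patches. */
#define SKETCH_MAX_ORDER 20

/*
 * Return the smallest order such that (1 << order) pages cover nr_pages.
 * Returns -1 for a zero-sized request or one larger than the sketch limit.
 */
static int sketch_order_for_pages(uint64_t nr_pages)
{
	int order = 0;

	if (nr_pages == 0)
		return -1;
	while (((uint64_t)1 << order) < nr_pages) {
		if (++order > SKETCH_MAX_ORDER)
			return -1;
	}
	return order;
}

/*
 * Return the page offset of the buddy of the block that starts at pgoff
 * and spans (1 << order) pages. pgoff must be aligned to the block size.
 */
static uint64_t sketch_buddy_of(uint64_t pgoff, int order)
{
	assert((pgoff & (((uint64_t)1 << order) - 1)) == 0);
	return pgoff ^ ((uint64_t)1 << order);
}
```

Assuming 4KB pages, a 256MB request is 65536 pages and maps to order 16
in this sketch. Two buddies differ only in the bit that corresponds to
their order, which is why a single XOR finds the neighbor to merge with
when a block is freed.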

The bcache journal code may request a block of power-of-2 pages from
the nvm-pages allocator; normally it is a contiguous range of 256MB or
512MB. During meta data journaling, the in-memory jsets are copied to
the calculated NVDIMM page locations by the kernel memcpy routine. So
the journaling I/Os won't go to the block device (e.g. SSD) anymore;
the writes and reads of journal jsets happen on NVDIMM.

The nvm-pages on-NVDIMM data structures are defined as legacy in-memory
objects, because they ARE in-memory objects directly referenced by
linear addresses, both in system DRAM and NVDIMM. They are defined in
the following patch,
- bcache: add initial data structures for nvm pages

Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code
of nvm-pages; the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get allocated pages from specific owner

All the code depends on the Linux libnvdimm and dax drivers; the bcache
nvm-pages allocator can be treated as a user of these two drivers.
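
To make the journaling path described above more concrete, here is a
minimal user space sketch of copying an in-memory jset to a calculated
page location inside a mapped range. It is not code from the patches:
struct sketch_nvm_journal, sketch_journal_write() and SKETCH_PAGE_SIZE
are hypothetical names, an ordinary buffer stands in for the DAX mapped
NVDIMM range, and the sketch does not model wrap-around, journal replay
or persistence ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical descriptor of a journal range inside the mapped area. */
struct sketch_nvm_journal {
	unsigned char *base;	/* start of the mapped journal range */
	size_t nr_pages;	/* total pages in the range */
	size_t next_page;	/* index of the next unused page */
};

/*
 * Copy a jset of len bytes to the next page-aligned location.
 * Returns the destination address, or NULL when len is zero or the
 * remaining pages cannot hold the jset.
 */
static void *sketch_journal_write(struct sketch_nvm_journal *j,
				  const void *jset, size_t len)
{
	size_t pages = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	void *dst;

	if (len == 0 || pages > j->nr_pages - j->next_page)
		return NULL;

	/* The location is plain address arithmetic on the mapping. */
	dst = j->base + j->next_page * SKETCH_PAGE_SIZE;
	memcpy(dst, jset, len);
	j->next_page += pages;
	return dst;
}
```

The point of the sketch is that a journal write becomes address
arithmetic plus a memory copy, with no bio submitted to a block device.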

I modified the bcache code to recognize the nvm meta device feature,
initialize the journal on NVDIMM, and do journal I/Os on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

Also, during code integration and testing, some issues were fixed by
the following patches,
- bcache: nvm-pages fixes for bcache integration testing
- bcache: use div_u64() in init_owner_info()
- bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
- bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled

The above patches can be added or merged into the nvm-pages code, so
that they can be dropped in the next version of this series.

The current series works as expected. Of course it is not perfect, but
the state is fine as a code base for further improvement. Examples of
such future work are power failure tolerance for nvm-pages owner list
operations, more error handling for the journal code, and moving the
B+ tree node I/Os onto NVDIMM.

All the code is EXPERIMENTAL; it won't be enabled by default until we
feel the NVDIMM support is complete and stable. Any comments and
suggestions are warmly welcome :-)

Thank you in advance.

Coly Li
---
Changelog:
v7: Refine nvm-pages allocator code to operate owner list directly in
    dax mapped NVDIMM pages, and remove the meta data copy from DRAM.
v6: The series submitted but not merged in Linux 5.12 merge window.
v1-v5: RFC patches of bcache nvm-pages.

Coly Li (11):
  bcache: add initial data structures for nvm pages
  bcache: nvm-pages fixes for bcache integration testing
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
    in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
    meta device
  bcache: use div_u64() in init_owner_info()
  bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
  bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvm_alloc_pages() of the buddy
  bcache: bch_nvm_free_pages() of the buddy
  bcache: get allocated pages from specific owner

 drivers/md/bcache/Kconfig       |   9 +
 drivers/md/bcache/Makefile      |   2 +-
 drivers/md/bcache/btree.c       |   6 +-
 drivers/md/bcache/features.h    |   9 +
 drivers/md/bcache/journal.c     | 317 +++++++++++---
 drivers/md/bcache/journal.h     |   2 +-
 drivers/md/bcache/nvm-pages.c   | 744 ++++++++++++++++++++++++++++++++
 drivers/md/bcache/nvm-pages.h   |  95 ++++
 drivers/md/bcache/super.c       |  73 +++-
 include/uapi/linux/bcache-nvm.h | 208 +++++++++
 10 files changed, 1392 insertions(+), 73 deletions(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h
 create mode 100644 include/uapi/linux/bcache-nvm.h