Message ID | 20220630134244.685331-1-dennis.wu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | BTT: Use dram freelist and remove bflog to otpimize perf |
Vishal, Dan and Dave,

Can you help review the patch and give some comments?

BR,
Dennis Wu

On 6/30/22 21:42, Wu, Dennis wrote:
> Dependency:
> [PATCH] nvdimm: Add NVDIMM_NO_DEEPFLUSH flag to control btt
> data deepflush
> https://lore.kernel.org/nvdimm/20220629135801.192821-1-dennis.wu@intel.com/T/#u
>
> Reason:
> In BTT, each write writes the sector data, updates a 4-byte btt_map
> entry and updates a 16-byte bflog entry (two 8-byte atomic writes).
> The metadata write overhead is large, so we can optimize the
> algorithm to avoid the bflog. Each write then only updates the
> sector data followed by the 4-byte btt_map entry.
>
> How:
> 1. Scan the btt_map to generate the ABA mapping bitmap; if an
> internal ABA is in use, its bit is set.
> 2. Generate the in-memory freelist from the ABA bitmap. The
> freelist is an array that records all the free ABAs, like:
> | 340 | 422 | 578 |...
> meaning ABAs 340, 422 and 578 are free. The last nfree (nlane)
> records in the array are handed out to the lanes at the beginning.
> 3. Get a free ABA for a lane and write the data to that ABA. If the
> premap btt_map entry is in the initialization state (e_flag=0,
> z_flag=0), take a free ABA from the free ABA array for the lane.
> If the premap btt_map entry is not in the initialization state,
> the ABA already in the btt_map entry becomes the lane's free ABA.
> Once the number of free ABAs equals nfree, the arena is fully
> written and the whole freelist can be freed (not implemented yet).
> 4. In the code, "version_major == 2" selects the new algorithm and
> the else branch keeps the old algorithm.
>
> Result:
> 1. Write performance improves by ~50% and latency drops to about
> 60% of the original algorithm.
> 2. During initialization, scanning the btt_map and generating the
> freelist takes time and lengthens namespace enable. With a 4K
> sector size and a 1TB namespace, the enable time is less than 4s,
> and this only happens once, during initialization.
> 3. Storing the freelist takes 4 bytes of memory per sector, but
> once the arena is fully written the freelist can be freed. In the
> storage case a disk is usually fully written over its lifetime, so
> there is no lasting memory overhead.
>
> Compatibility:
> 1. The new algorithm keeps the bflog layout and only ignores its
> logic, i.e. the bflog is not updated by the new algorithm.
> 2. If a namespace was created with the old algorithm and layout, you
> can switch to the new algorithm seamlessly without any specific
> operation.
> 3. Since the bflog is not updated by the new algorithm, once you
> have written data with the new algorithm you cannot switch back to
> the old algorithm.
>
> Signed-off-by: dennis.wu <dennis.wu@intel.com>
> ---
>  drivers/nvdimm/btt.c | 231 ++++++++++++++++++++++++++++++++++---------
>  drivers/nvdimm/btt.h |  15 +++
>  2 files changed, 199 insertions(+), 47 deletions(-)
dennis.wu wrote:
> Dependency:
> [PATCH] nvdimm: Add NVDIMM_NO_DEEPFLUSH flag to control btt
> data deepflush
> https://lore.kernel.org/nvdimm/20220629135801.192821-1-dennis.wu@intel.com/T/#u
>
> Reason:
> In BTT, each write writes the sector data, updates a 4-byte btt_map
> entry and updates a 16-byte bflog entry (two 8-byte atomic writes).
> The metadata write overhead is large, so we can optimize the
> algorithm to avoid the bflog. Each write then only updates the
> sector data followed by the 4-byte btt_map entry.
>
> How:
> 1. Scan the btt_map to generate the ABA mapping bitmap; if an
> internal ABA is in use, its bit is set.
> 2. Generate the in-memory freelist from the ABA bitmap. The
> freelist is an array that records all the free ABAs, like:
> | 340 | 422 | 578 |...
> meaning ABAs 340, 422 and 578 are free. The last nfree (nlane)
> records in the array are handed out to the lanes at the beginning.
> 3. Get a free ABA for a lane and write the data to that ABA. If the
> premap btt_map entry is in the initialization state (e_flag=0,
> z_flag=0), take a free ABA from the free ABA array for the lane.
> If the premap btt_map entry is not in the initialization state,
> the ABA already in the btt_map entry becomes the lane's free ABA.
> Once the number of free ABAs equals nfree, the arena is fully
> written and the whole freelist can be freed (not implemented yet).
> 4. In the code, "version_major == 2" selects the new algorithm and
> the else branch keeps the old algorithm.
>
> Result:
> 1. Write performance improves by ~50% and latency drops to about
> 60% of the original algorithm.

How does this improvement affect a real-world workload vs a
microbenchmark?

> 2. During initialization, scanning the btt_map and generating the
> freelist takes time and lengthens namespace enable. With a 4K
> sector size and a 1TB namespace, the enable time is less than 4s,
> and this only happens once, during initialization.
> 3. Storing the freelist takes 4 bytes of memory per sector, but
> once the arena is fully written the freelist can be freed. In the
> storage case a disk is usually fully written over its lifetime, so
> there is no lasting memory overhead.
>
> Compatibility:
> 1. The new algorithm keeps the bflog layout and only ignores its
> logic, i.e. the bflog is not updated by the new algorithm.
> 2. If a namespace was created with the old algorithm and layout, you
> can switch to the new algorithm seamlessly without any specific
> operation.
> 3. Since the bflog is not updated by the new algorithm, once you
> have written data with the new algorithm you cannot switch back to
> the old algorithm.

Before digging deeper into the implementation, this needs a better
compatibility story. It is not acceptable to break the on-media format
like this. Consider someone bisecting a kernel problem over this
change, or someone reverting to an older kernel after encountering a
regression. As far as I can see this would need to be a BTT3 layout and
require explicit opt-in to move to the new format.
Hi Dan,

Thank you! Currently we are working with one customer to evaluate
ClickHouse and RocketMQ with the optimization. The preliminary
performance data already shows an improvement, and we will do some
further pathfinding work in Q3.

About the compatibility, we do have the limitation that you cannot
change from the new algorithm back to the old one. I think it is good
to have a new BTT layout version, and I will check how to make that
happen.

Thank you very much!
Dennis Wu

On 7/12/22 13:06, Dan Williams wrote:
> How does this improvement affect a real-world workload vs a
> microbenchmark?
>
> Before digging deeper into the implementation, this needs a better
> compatibility story. It is not acceptable to break the on-media format
> like this. Consider someone bisecting a kernel problem over this
> change, or someone reverting to an older kernel after encountering a
> regression. As far as I can see this would need to be a BTT3 layout and
> require explicit opt-in to move to the new format.
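For illustration only: Dan's "explicit opt-in" suggestion would amount to
gating the new write path on a new on-media major version rather than on
the existing "version_major == 2" check, so every existing layout keeps
the flog path. A minimal sketch under that assumption; BTT3_MAJOR_VERSION,
arena_uses_freezone() and arena_meta_init() are invented names, and only
btt_freelist_init()/btt_freezone_init() come from the driver and the
posted patch:

	/*
	 * Hypothetical opt-in gate, not part of the posted patch: only
	 * arenas whose superblock was explicitly written as BTT3 use the
	 * bflog-less freezone algorithm; older layouts keep the existing
	 * flog behaviour so older kernels can still handle them.
	 */
	#define BTT3_MAJOR_VERSION 3

	static bool arena_uses_freezone(struct arena_info *arena)
	{
		return arena->version_major >= BTT3_MAJOR_VERSION;
	}

	static int arena_meta_init(struct arena_info *arena)
	{
		if (arena_uses_freezone(arena))
			return btt_freezone_init(arena);  /* new: free-ABA array */

		return btt_freelist_init(arena);          /* old: flog-based freelist */
	}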
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index c71ba7a1edd0..1d75e5f4d88e 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -70,10 +70,6 @@ static int btt_info_write(struct arena_info *arena, struct btt_sb *super) dev_WARN_ONCE(to_dev(arena), !IS_ALIGNED(arena->info2off, 512), "arena->info2off: %#llx is unaligned\n", arena->info2off); - /* - * btt_sb is critial information and need proper write - * nvdimm_flush will be called (deepflush) - */ ret = arena_write_bytes(arena, arena->info2off, super, sizeof(struct btt_sb), 0); if (ret) @@ -194,6 +190,8 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping, break; case 3: *mapping = postmap; + z_flag = 1; + e_flag = 1; break; default: return -EIO; @@ -507,6 +505,30 @@ static u64 to_namespace_offset(struct arena_info *arena, u64 lba) return arena->dataoff + ((u64)lba * arena->internal_lbasize); } +static int arena_clear_error(struct arena_info *arena, u32 lba) +{ + int ret = 0; + + void *zero_page = page_address(ZERO_PAGE(0)); + u64 nsoff = to_namespace_offset(arena, lba); + unsigned long len = arena->sector_size; + + mutex_lock(&arena->err_lock); + while (len) { + unsigned long chunk = min(len, PAGE_SIZE); + + ret = arena_write_bytes(arena, nsoff, zero_page, + chunk, 0); + if (ret) + break; + len -= chunk; + nsoff += chunk; + } + mutex_unlock(&arena->err_lock); + + return ret; +} + static int arena_clear_freelist_error(struct arena_info *arena, u32 lane) { int ret = 0; @@ -536,6 +558,82 @@ static int arena_clear_freelist_error(struct arena_info *arena, u32 lane) return ret; } +/* + * get_aba_in_a_lane - get a free block out of the freelist. + * @arena: arena handler + * @lane: the block (postmap) will be put back to free array list + */ +static inline void get_lane_aba(struct arena_info *arena, + u32 lane, u32 *entry) +{ + uint32_t free_num; + + spin_lock(&(arena->list_lock.lock)); + free_num = arena->freezone_array.free_num; + arena->lane_free[lane] = arena->freezone_array.free_array[free_num - 1]; + arena->freezone_array.free_num = free_num - 1; + spin_unlock(&(arena->list_lock.lock)); + + *entry = arena->lane_free[lane]; +} + +static int btt_freezone_init(struct arena_info *arena) +{ + int ret = 0, trim, err; + u32 i; + u32 mapping; + u8 *aba_map_byte, *aba_map; + u32 *free_array; + u32 free_num = 0; + u32 aba_map_size = (arena->internal_nlba>>3) + 1; + + aba_map = vzalloc(aba_map_size); + if (!aba_map) + return -ENOMEM; + + /* + * prepare the aba_map, each aba will be in a bit, occupied bit=1, free bit=0 + * the scan will take times, but it is only once execution during initialization. + */ + for (i = 0; i < arena->external_nlba; i++) { + ret = btt_map_read(arena, i, &mapping, &trim, &err, 0); + if (ret || (trim == 0 && err == 0)) + continue; + if (mapping < arena->internal_nlba) { + aba_map_byte = aba_map + (mapping>>3); + *aba_map_byte |= (u8)(1<<(mapping % 8)); + } + } + + /* + * Scan the aba_bitmap , use the static array, that will take 1% memory. 
+ */ + free_array = vmalloc(arena->internal_nlba*sizeof(u32)); + if (!free_array) { + vfree(aba_map); + return -ENOMEM; + } + + for (i = 0; i < arena->internal_nlba; i++) { + aba_map_byte = aba_map + (i>>3); + if (((*aba_map_byte) & (1<<(i%8))) == 0) { + free_array[free_num] = i; + free_num++; + } + } + spin_lock_init(&(arena->list_lock.lock)); + + for (i = 0; i < arena->nfree; i++) { + arena->lane_free[i] = free_array[free_num - 1]; + free_num--; + } + arena->freezone_array.free_array = free_array; + arena->freezone_array.free_num = free_num; + + vfree(aba_map); + return ret; +} + static int btt_freelist_init(struct arena_info *arena) { int new, ret; @@ -597,8 +695,7 @@ static int btt_freelist_init(struct arena_info *arena) * to complete the map write. So fix up the map. */ ret = btt_map_write(arena, le32_to_cpu(log_new.lba), - le32_to_cpu(log_new.new_map), 0, 0, - NVDIMM_NO_DEEPFLUSH); + le32_to_cpu(log_new.new_map), 0, 0, NVDIMM_NO_DEEPFLUSH); if (ret) return ret; } @@ -813,7 +910,12 @@ static void free_arenas(struct btt *btt) list_del(&arena->list); kfree(arena->rtt); kfree(arena->map_locks); - kfree(arena->freelist); + if (arena->version_major == 2) { + if (arena->freezone_array.free_array) + vfree(arena->freezone_array.free_array); + } else { + kfree(arena->freelist); + } debugfs_remove_recursive(arena->debugfs_dir); kfree(arena); } @@ -892,14 +994,18 @@ static int discover_arenas(struct btt *btt) arena->external_lba_start = cur_nlba; parse_arena_meta(arena, super, cur_off); - ret = log_set_indices(arena); - if (ret) { - dev_err(to_dev(arena), - "Unable to deduce log/padding indices\n"); - goto out; - } + if (arena->version_major == 2) { + ret = btt_freezone_init(arena); + } else { + ret = log_set_indices(arena); + if (ret) { + dev_err(to_dev(arena), + "Unable to deduce log/padding indices\n"); + goto out; + } - ret = btt_freelist_init(arena); + ret = btt_freelist_init(arena); + } if (ret) goto out; @@ -984,9 +1090,11 @@ static int btt_arena_write_layout(struct arena_info *arena) if (ret) return ret; - ret = btt_log_init(arena); - if (ret) - return ret; + if (arena->version_major != 2) { + ret = btt_log_init(arena); + if (ret) + return ret; + } super = kzalloc(sizeof(struct btt_sb), GFP_NOIO); if (!super) @@ -1039,7 +1147,10 @@ static int btt_meta_init(struct btt *btt) if (ret) goto unlock; - ret = btt_freelist_init(arena); + if (arena->version_major == 2) + ret = btt_freezone_init(arena); + else + ret = btt_freelist_init(arena); if (ret) goto unlock; @@ -1233,12 +1344,14 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip, u32 new_map; int new_t, new_e; - if (t_flag) { + /* t_flag = 1, e_flag = 0 or t_flag=0, e_flag=0 */ + if ((t_flag && e_flag == 0) || (t_flag == 0 && e_flag == 0)) { zero_fill_data(page, off, cur_len); goto out_lane; } - if (e_flag) { + /* t_flag = 0, e_flag = 1*/ + if (e_flag && t_flag == 0) { ret = -EIO; goto out_lane; } @@ -1326,6 +1439,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, while (len) { u32 cur_len; int e_flag; + int z_flag; retry: lane = nd_region_acquire_lane(btt->nd_region); @@ -1340,29 +1454,41 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, goto out_lane; } - if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) - arena->freelist[lane].has_err = 1; + if (arena->version_major == 2) { + new_postmap = arena->lane_free[lane]; + if (btt_is_badblock(btt, arena, new_postmap) + || mutex_is_locked(&arena->err_lock)) { + nd_region_release_lane(btt->nd_region, 
lane); + ret = arena_clear_error(arena, new_postmap); + if (ret) + return ret; + /* OK to acquire a different lane/free block */ + goto retry; + } + } else { + if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) + arena->freelist[lane].has_err = 1; - if (mutex_is_locked(&arena->err_lock) - || arena->freelist[lane].has_err) { - nd_region_release_lane(btt->nd_region, lane); + if (mutex_is_locked(&arena->err_lock) + || arena->freelist[lane].has_err) { + nd_region_release_lane(btt->nd_region, lane); - ret = arena_clear_freelist_error(arena, lane); - if (ret) - return ret; + ret = arena_clear_freelist_error(arena, lane); + if (ret) + return ret; - /* OK to acquire a different lane/free block */ - goto retry; - } + /* OK to acquire a different lane/free block */ + goto retry; + } - new_postmap = arena->freelist[lane].block; + new_postmap = arena->freelist[lane].block; + } /* Wait if the new block is being read from */ for (i = 0; i < arena->nfree; i++) while (arena->rtt[i] == (RTT_VALID | new_postmap)) cpu_relax(); - if (new_postmap >= arena->internal_nlba) { ret = -EIO; goto out_lane; @@ -1380,7 +1506,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, } lock_map(arena, premap); - ret = btt_map_read(arena, premap, &old_postmap, NULL, &e_flag, + ret = btt_map_read(arena, premap, &old_postmap, &z_flag, &e_flag, NVDIMM_IO_ATOMIC); if (ret) goto out_map; @@ -1388,17 +1514,25 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, ret = -EIO; goto out_map; } - if (e_flag) - set_e_flag(old_postmap); - - log.lba = cpu_to_le32(premap); - log.old_map = cpu_to_le32(old_postmap); - log.new_map = cpu_to_le32(new_postmap); - log.seq = cpu_to_le32(arena->freelist[lane].seq); - sub = arena->freelist[lane].sub; - ret = btt_flog_write(arena, lane, sub, &log); - if (ret) - goto out_map; + + if (arena->version_major == 2) { + if (z_flag == 0 && e_flag == 0) /* initialization state (00)*/ + get_lane_aba(arena, lane, &old_postmap); + else + arena->lane_free[lane] = old_postmap; + } else { + if (e_flag && z_flag != 1) /* Error State (10) */ + set_e_flag(old_postmap); + + log.lba = cpu_to_le32(premap); + log.old_map = cpu_to_le32(old_postmap); + log.new_map = cpu_to_le32(new_postmap); + log.seq = cpu_to_le32(arena->freelist[lane].seq); + sub = arena->freelist[lane].sub; + ret = btt_flog_write(arena, lane, sub, &log); + if (ret) + goto out_map; + } ret = btt_map_write(arena, premap, new_postmap, 0, 0, NVDIMM_IO_ATOMIC|NVDIMM_NO_DEEPFLUSH); @@ -1408,8 +1542,11 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, unlock_map(arena, premap); nd_region_release_lane(btt->nd_region, lane); - if (e_flag) { - ret = arena_clear_freelist_error(arena, lane); + if (e_flag && z_flag != 1) { + if (arena->version_major == 2) + ret = arena_clear_error(arena, old_postmap); + else + ret = arena_clear_freelist_error(arena, lane); if (ret) return ret; } diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h index 0c76c0333f6e..996af269f854 100644 --- a/drivers/nvdimm/btt.h +++ b/drivers/nvdimm/btt.h @@ -8,6 +8,7 @@ #define _LINUX_BTT_H #include <linux/types.h> +#include "nd.h" #define BTT_SIG_LEN 16 #define BTT_SIG "BTT_ARENA_INFO\0" @@ -185,6 +186,20 @@ struct arena_info { u64 info2off; /* Pointers to other in-memory structures for this arena */ struct free_entry *freelist; + + /*divide the whole arena into #lanes zone. 
*/ + struct zone_free { + u32 free_num; + u32 *free_array; + } freezone_array; + struct aligned_lock list_lock; + + /* + * each lane, keep at least one free ABA + * if in the lane, no ABA, get one from freelist + */ + u32 lane_free[BTT_DEFAULT_NFREE]; + u32 *rtt; struct aligned_lock *map_locks; struct nd_btt *nd_btt;
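For context on the z_flag/e_flag branches in the patch above: each 32-bit
BTT map entry reserves its top two bits for the zero/trim flag (bit 31)
and the error flag (bit 30), giving four states: 00 initial (identity
mapping, block never written), 11 normal written entry, 10 zeroed,
01 error. The following small decode is illustrative only, mirroring the
MAP_TRIM_SHIFT/MAP_ERR_SHIFT definitions already in btt.h and what
btt_map_read() does; it is not part of the patch:

	/* Illustrative decode of a raw BTT map entry, mirroring btt_map_read(). */
	enum btt_map_state { MAP_INITIAL, MAP_ERROR, MAP_ZERO, MAP_NORMAL };

	static enum btt_map_state btt_map_state(u32 raw_mapping)
	{
		u32 z = (raw_mapping >> 31) & 1;	/* zero/trim flag, MAP_TRIM_SHIFT */
		u32 e = (raw_mapping >> 30) & 1;	/* error flag, MAP_ERR_SHIFT */

		switch ((z << 1) | e) {
		case 0:
			return MAP_INITIAL;	/* never written: postmap == premap */
		case 1:
			return MAP_ERROR;	/* media error recorded for this block */
		case 2:
			return MAP_ZERO;	/* block was zeroed/discarded */
		default:
			return MAP_NORMAL;	/* valid postmap in the low 30 bits */
		}
	}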
Dependency:
[PATCH] nvdimm: Add NVDIMM_NO_DEEPFLUSH flag to control btt
data deepflush
https://lore.kernel.org/nvdimm/20220629135801.192821-1-dennis.wu@intel.com/T/#u

Reason:
In BTT, each write writes the sector data, updates a 4-byte btt_map
entry and updates a 16-byte bflog entry (two 8-byte atomic writes).
The metadata write overhead is large, so we can optimize the
algorithm to avoid the bflog. Each write then only updates the
sector data followed by the 4-byte btt_map entry.

How:
1. Scan the btt_map to generate the ABA mapping bitmap; if an
internal ABA is in use, its bit is set.
2. Generate the in-memory freelist from the ABA bitmap. The
freelist is an array that records all the free ABAs, like:
| 340 | 422 | 578 |...
meaning ABAs 340, 422 and 578 are free. The last nfree (nlane)
records in the array are handed out to the lanes at the beginning.
3. Get a free ABA for a lane and write the data to that ABA (see the
sketch below). If the premap btt_map entry is in the initialization
state (e_flag=0, z_flag=0), take a free ABA from the free ABA array
for the lane. If the premap btt_map entry is not in the
initialization state, the ABA already in the btt_map entry becomes
the lane's free ABA. Once the number of free ABAs equals nfree, the
arena is fully written and the whole freelist can be freed (not
implemented yet).
4. In the code, "version_major == 2" selects the new algorithm and
the else branch keeps the old algorithm.

Result:
1. Write performance improves by ~50% and latency drops to about
60% of the original algorithm.
2. During initialization, scanning the btt_map and generating the
freelist takes time and lengthens namespace enable. With a 4K
sector size and a 1TB namespace, the enable time is less than 4s,
and this only happens once, during initialization.
3. Storing the freelist takes 4 bytes of memory per sector, but
once the arena is fully written the freelist can be freed. In the
storage case a disk is usually fully written over its lifetime, so
there is no lasting memory overhead.

Compatibility:
1. The new algorithm keeps the bflog layout and only ignores its
logic, i.e. the bflog is not updated by the new algorithm.
2. If a namespace was created with the old algorithm and layout, you
can switch to the new algorithm seamlessly without any specific
operation.
3. Since the bflog is not updated by the new algorithm, once you
have written data with the new algorithm you cannot switch back to
the old algorithm.

Signed-off-by: dennis.wu <dennis.wu@intel.com>
---
 drivers/nvdimm/btt.c | 231 ++++++++++++++++++++++++++++++++++---------
 drivers/nvdimm/btt.h |  15 +++
 2 files changed, 199 insertions(+), 47 deletions(-)
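To make step 3 of "How" concrete, here is a simplified sketch of one
write with the new algorithm. It is illustrative only: it compresses
btt_write_pg() from the patch into a single helper and omits lane
acquisition, map locking, the RTT wait, bad-block handling and error
clearing; btt_write_sketch() itself is an invented name, while the
helpers it calls exist in the driver and the patch:

	/*
	 * Simplified sketch of one write in the bflog-less path.  The lane's
	 * spare ABA receives the data, then the 4-byte map update is the only
	 * metadata write: the ABA displaced from the map entry simply becomes
	 * the lane's next spare, so no flog record is needed.
	 */
	static int btt_write_sketch(struct arena_info *arena, u32 lane, u32 premap,
				    struct page *page, unsigned int off, u32 len)
	{
		u32 new_postmap = arena->lane_free[lane];
		u32 old_postmap;
		int z_flag, e_flag, ret;

		/* 1. Write the sector data into the lane's spare ABA. */
		ret = btt_data_write(arena, new_postmap, page, off, len);
		if (ret)
			return ret;

		/* 2. Look at the current premap entry to learn which ABA it frees. */
		ret = btt_map_read(arena, premap, &old_postmap, &z_flag, &e_flag,
				   NVDIMM_IO_ATOMIC);
		if (ret)
			return ret;

		if (z_flag == 0 && e_flag == 0)
			/* Initial state: nothing is displaced, pull a fresh spare. */
			get_lane_aba(arena, lane, &old_postmap);
		else
			/* The displaced ABA becomes the lane's next spare. */
			arena->lane_free[lane] = old_postmap;

		/* 3. Atomically repoint the premap entry at the new ABA. */
		return btt_map_write(arena, premap, new_postmap, 0, 0,
				     NVDIMM_IO_ATOMIC | NVDIMM_NO_DEEPFLUSH);
	}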